What DeepSeek Can Teach Product Managers About AI
A Product Manager's Guide to Decoding the Next Big Disruption
Introduction
DeepSeek, a Chinese AI startup, triggered a reported $1.2 trillion drop in global tech market value by achieving what was thought impossible - matching GPT-4's performance at just 3% of the cost.
DeepSeek isn’t just another LLM provider—it’s a case study in how to approach AI product development in a resource-constrained world.
Think of DeepSeek as the "Tesla moment" of AI - or, more aptly, its "Toyota moment" - delivering high performance at a fraction of the cost.
DeepSeek proved that “bigger isn’t always better,” challenging Big Tech’s “scale-at-all-costs” approach.
The DeepSeek AI assistant quickly rose to become the top-rated free app on Apple's App Store in multiple countries, surpassing established players like ChatGPT.
The future of AI products isn't about having the biggest budget. DeepSeek is the proof: it is democratizing AI by making it dramatically more cost-effective and accessible.
Key Numbers That Matter - and Why PMs Should Care
1. Cost Revolution
Achieved GPT-4 level performance using just 2,000 NVIDIA chips (vs. 16,000 typically needed)
Used only 2.8 million GPU hours (vs Meta's Llama using 30.8 million)
Demonstrates that effective AI doesn't require massive budgets
This means AI features that once cost $100,000 might now cost $3,000
Token pricing significantly below industry standards (a ~95% cost reduction)
DeepSeek’s models (e.g., DeepSeek-V3, DeepSeek-R1) achieve GPT-4-level performance at roughly 1/30th the cost
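To make the GPU-hours figure concrete, here is a back-of-envelope sketch. The $2/GPU-hour rental rate is an assumption for illustration, not a disclosed contract price:

```python
# Back-of-envelope training compute cost, using the GPU-hours cited above.
GPU_HOURS = 2_800_000          # DeepSeek's reported GPU hours (vs ~30.8M for Llama)
COST_PER_GPU_HOUR = 2.0        # assumed rental rate in USD - illustrative only

cost_millions = GPU_HOURS * COST_PER_GPU_HOUR / 1e6
print(f"~${cost_millions:.1f}M in training compute")
```

Even with a generous per-hour rate, the total lands in the single-digit millions - orders of magnitude below what frontier training runs were assumed to require.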
2. Market Disruption
Topped App Store rankings globally in late January 2025
Triggered largest tech stock decline since 2022
Changed assumptions about AI development costs
3. Competitive Advantage
Faster response times for users
Makes advanced AI features viable for smaller products
Enables more features within the same budget
The Secret Sauce: Three Core Innovations
1. Efficient Architecture
Sparse activation: only a small fraction of the network fires per query, reducing compute needs.
Dynamic routing: the compute spent adapts to each task's complexity.
Uses a "Mixture of Experts" (MoE) design - activating only 37B of its 671B parameters per token
Implements Multi-Head Latent Attention (MLA), cutting attention memory (the KV cache) by up to 16x
2. Resource Optimization
Custom communication schemes between chips
Memory optimization techniques
Innovative training methods without top-tier hardware
Uses synthetic data and reinforcement learning from AI feedback (RLAIF) to cut training costs.
3. Open Source Approach
Full model architecture and training methodology openly shared
Enables researchers to distill better, smaller models
MIT license allowing free commercial use and modification
Full transparency in reasoning process
Released comprehensive technical report for DeepSeek-V3 on arXiv
Open-sources base models while monetizing enterprise tools—a “Linux-like” strategy for AI.
DeepSeek's team consists primarily of fresh graduates and doctoral students from top universities - roughly 139 engineers in total worked on DeepSeek-V3.
Key Terms to Understand: MoE and MLA
Mixture of Experts (MoE): The Smart Hospital Analogy
Imagine a super-efficient hospital where:
Instead of one doctor trying to treat every condition
You have multiple specialist departments
A smart receptionist (called the "router") who knows exactly which specialist to send each patient to
Only the needed specialists are called in for each case, keeping costs down
This is exactly how MoE works in AI:
Instead of one massive AI system handling everything
It has multiple specialized sub-networks ("experts")
A smart routing system directs each task to the right experts
Only relevant experts are activated, saving computational power
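The hospital analogy maps directly onto code. Below is a minimal, hypothetical sketch - toy sizes and random weights, not DeepSeek's actual architecture - showing how a router ("receptionist") activates only the top-k experts per input:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "experts": each is a small linear layer (one weight matrix).
NUM_EXPERTS, DIM, TOP_K = 8, 16, 2
experts = [rng.standard_normal((DIM, DIM)) / np.sqrt(DIM) for _ in range(NUM_EXPERTS)]
router_w = rng.standard_normal((DIM, NUM_EXPERTS)) / np.sqrt(DIM)

def moe_forward(x):
    """Route input x to the top-k experts only; the rest stay idle."""
    scores = x @ router_w                  # the "receptionist" scores each specialist
    top = np.argsort(scores)[-TOP_K:]      # pick the k most relevant experts
    weights = np.exp(scores[top])
    gate = weights / weights.sum()         # normalized mixing weights for the chosen experts
    # Only TOP_K of NUM_EXPERTS matrices are used -> compute scales with k, not N.
    out = sum(g * (x @ experts[i]) for g, i in zip(gate, top))
    return out, top

x = rng.standard_normal(DIM)
y, used = moe_forward(x)
print(f"activated experts {sorted(used.tolist())} of {NUM_EXPERTS}")
```

Note the key property: total parameters grow with the number of experts, but per-query compute grows only with TOP_K - which is how a 671B-parameter model can run 37B parameters per token.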
Multi-Head Latent Attention (MLA): The Smart Memory System
Think of MLA like a revolutionary filing system:
Traditional systems keep complete copies of every file (expensive)
MLA keeps compressed summaries (latent vectors)
When needed, it can quickly reconstruct the full information
Uses significantly less storage while maintaining performance
Key benefits:
Reduces memory usage by up to 75%
Maintains or improves performance compared to traditional methods
Enables faster processing of information
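The filing-system analogy can be sketched in a few lines. This is a hypothetical toy - made-up dimensions and random projections; real MLA involves more machinery - but it shows the core move: cache a small latent per token and reconstruct full-width keys only when attention needs them.

```python
import numpy as np

rng = np.random.default_rng(1)
SEQ, D_MODEL, D_LATENT = 1024, 512, 64    # the latent is 8x smaller per token

# Learned projections (random here; trained jointly in a real model).
W_down = rng.standard_normal((D_MODEL, D_LATENT)) / np.sqrt(D_MODEL)
W_up_k = rng.standard_normal((D_LATENT, D_MODEL)) / np.sqrt(D_LATENT)

hidden = rng.standard_normal((SEQ, D_MODEL))

# Cache the compressed summaries instead of full keys AND values...
latent_cache = hidden @ W_down            # shape (SEQ, D_LATENT)
# ...and reconstruct full-width keys on demand when attention runs.
keys = latent_cache @ W_up_k              # shape (SEQ, D_MODEL)

full_cache_floats = SEQ * D_MODEL * 2     # keys + values in a standard KV cache
mla_cache_floats = SEQ * D_LATENT
print(f"cache size reduced {full_cache_floats / mla_cache_floats:.0f}x")
```

With these toy numbers the cache shrinks 16x, mirroring the memory-reduction figure cited earlier - the trade is a small extra matrix multiply at read time for a much smaller cache.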
Current Limitations
The model has no real-time internet access
Currently facing scalability challenges due to cyberattacks on its services
Knowledge cutoff of July 2024
Model Variants & Usage Guidance
Quick Selection Guide
For Developers:
Start with DeepSeek Coder for basic coding tasks
Upgrade to DeepSeek-Coder-V2 for complex development projects
For Enterprises:
Use DeepSeek-V3 for large-scale operations
Choose DeepSeek-R1 for advanced analytics
For Mobile/Edge Applications:
DeepSeek-R1-Distill is your best choice
For General Purpose:
DeepSeek LLM (67B) offers good balance
DeepSeek-V2 provides cost-effective scaling
Top 5 Myths DeepSeek Busts About AI
Myth 1: “High Costs Are Inevitable”
Reality: DeepSeek uses dynamic computation (e.g., activating only the relevant neural pathways), cutting inference costs by up to 97%.
Myth 2: “Bigger Models = Better Results”
Reality: Their distilled 7B-parameter models match 70B-parameter models on reasoning tasks through smarter training, not brute force.
Myth 3: “Specialized AI Requires Massive Data”
Reality: DeepSeek’s “knowledge infusion” technique adapts base models to niche domains (e.g., legal, manufacturing) with 10x less data.
Myth 4: “Fast Inference Requires Sacrificing Accuracy”
Reality: Their sparse Mixture-of-Experts architecture delivers 200ms response times without quality loss.
Myth 5: “Only Big Tech Can Innovate”
Reality: DeepSeek’s lean team of roughly 139 engineers outmaneuvered giants by focusing on efficiency-first AI.
Product Management Lessons
1. Efficiency First
✅ Question resource assumptions
✅ Focus on optimization before scale
❌ Don't assume bigger is better
2. Innovation Strategy
✅ Consider open-source advantages
✅ Prioritize user transparency
❌ Don't overlook cost innovation
3. Market Positioning
✅ Challenge industry assumptions
✅ Focus on core user needs
❌ Don't ignore global competition
What the DeepSeek Example Means for Our Product
Short Term:
Lower barriers to entry in AI products
Ask: “What problem needs 10x cheaper AI?” not “How do we use AI?”
Democratization of AI capabilities
Reevaluate AI implementation costs
Cost-Aware Development
Treat computational efficiency as a core feature, not an afterthought.
Example: Prioritize “cost per inference” metrics alongside accuracy
Consider open-source alternatives
Plan for price competition
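One way to operationalize "cost per inference" as a first-class metric is a simple model like the sketch below. All prices and volumes are made-up placeholders - plug in your own provider's current rate card:

```python
# Hypothetical per-million-token output prices, for illustration only.
PRICE_PER_M_TOKENS = {"premium_gpt4_class": 30.00, "budget_deepseek_class": 1.00}  # USD

def monthly_inference_cost(requests_per_day, avg_output_tokens, price_per_m):
    """Estimate monthly spend so cost-per-inference can sit next to accuracy on a dashboard."""
    tokens_per_month = requests_per_day * 30 * avg_output_tokens
    return tokens_per_month / 1_000_000 * price_per_m

for name, price in PRICE_PER_M_TOKENS.items():
    cost = monthly_inference_cost(requests_per_day=50_000, avg_output_tokens=400, price_per_m=price)
    print(f"{name}: ${cost:,.0f}/month")
```

At these placeholder rates, a feature that was uneconomical at tens of thousands of dollars a month becomes a rounding error - which is exactly the kind of threshold that should trigger a roadmap review.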
Long Term:
Prepare for AI commoditization
Focus on unique value layers
Problem-First, Not Model-First
Build for efficiency first
Focus on unique value propositions beyond raw AI capability
Like DeepSeek, adopt a mindset of questioning the status quo and exploring innovative approaches.
Conclusion
DeepSeek proves that world-class AI can be built efficiently, challenging the notion that massive resources are necessary. For PMs, this means rethinking AI strategy, focusing on efficiency, and preparing for a future where AI capabilities are more accessible and affordable than ever.
Remember: The future of AI product management isn't about having the biggest budget – it's about using resources most efficiently to deliver real user value.
Hungry for more? Please subscribe now
AI Product Management – Learn with Me Series
Welcome to my “AI Product Management – Learn with Me Series.”