What DeepSeek Can Teach Product Managers About AI
A Product Manager's Guide to Decoding the Next Big Disruption
Introduction
DeepSeek, a Chinese AI startup, triggered a reported $1.2 trillion drop in global tech market value by achieving what was thought impossible - matching GPT-4's performance at just 3% of the cost.
DeepSeek isn’t just another LLM provider—it’s a case study in how to approach AI product development in a resource-constrained world.
Think of DeepSeek as the "Tesla moment" of AI - or, more aptly, its "Toyota moment" - delivering high performance at a fraction of the cost.
DeepSeek proved that “bigger isn’t always better,” challenging Big Tech’s “scale-at-all-costs” approach.
The DeepSeek AI assistant quickly rose to become the top-rated free app on Apple's App Store in multiple countries, surpassing established players like ChatGPT.
The future of AI products isn't about having the biggest budget. DeepSeek is the proof: it is democratizing AI by making it dramatically more cost-effective and accessible.
Key Numbers That Matter - and Why PMs Should Care
1. Cost Revolution
Achieved GPT-4 level performance using just 2,000 NVIDIA chips (vs. 16,000 typically needed)
Used only 2.8 million GPU hours (vs Meta's Llama using 30.8 million)
Demonstrates that effective AI doesn't require massive budgets
This means AI features that once cost $100,000 might now cost $3,000
Token pricing significantly below industry standards (a ~95% cost reduction)
DeepSeek’s models (e.g., DeepSeek-V3, DeepSeek-R1) achieve GPT-4-level performance at roughly 1/30th the cost
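To make the GPU-hours figure concrete, here is a back-of-envelope sketch. The $2/GPU-hour rental rate is an assumption for illustration, not a disclosed contract price:

```python
# Back-of-envelope training compute cost, using the GPU-hours cited above.
GPU_HOURS = 2_800_000          # DeepSeek's reported GPU hours (vs ~30.8M for Llama)
COST_PER_GPU_HOUR = 2.0        # assumed rental rate in USD - illustrative only

cost_millions = GPU_HOURS * COST_PER_GPU_HOUR / 1e6
print(f"~${cost_millions:.1f}M in training compute")
```

Even with a generous per-hour rate, the total lands in the single-digit millions - orders of magnitude below what frontier training runs were assumed to require.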
2. Market Disruption
Topped App Store rankings globally in late January 2025
Triggered largest tech stock decline since 2022
Changed assumptions about AI development costs
3. Competitive Advantage
Faster response times for users
Makes advanced AI features viable for smaller products
Enables more features within the same budget
The Secret Sauce: Three Core Innovations
1. Efficient Architecture
Sparse activation: only a small fraction of the network fires per query, reducing compute needs.
Dynamic routing: the compute spent adapts to each task's complexity.
Uses a "Mixture of Experts" (MoE) design - activating only 37B of its 671B parameters per token
Implements Multi-Head Latent Attention (MLA), cutting attention memory (the KV cache) by up to 16x
2. Resource Optimization
Custom communication schemes between chips
Memory optimization techniques
Innovative training methods without top-tier hardware
Uses synthetic data and reinforcement learning from AI feedback (RLAIF) to cut training costs.
3. Open Source Approach
Full model architecture and training methodology openly shared
Enables researchers to distill better, smaller models
MIT license allowing free commercial use and modification
Full transparency in reasoning process
Released comprehensive technical report for DeepSeek-V3 on arXiv
Open-sources base models while monetizing enterprise tools—a “Linux-like” strategy for AI.
DeepSeek's team consists primarily of fresh graduates and doctoral students from top universities - roughly 139 engineers in total worked on DeepSeek-V3.
Key Terms to Understand: MoE and MLA
Mixture of Experts (MoE): The Smart Hospital Analogy
Imagine a super-efficient hospital where:
Instead of one doctor trying to treat every condition
You have multiple specialist departments
A smart receptionist (called the "router") who knows exactly which specialist to send each patient to
Only the needed specialists are called in for each case, keeping costs down
This is exactly how MoE works in AI:
Instead of one massive AI system handling everything
It has multiple specialized sub-networks ("experts")
A smart routing system directs each task to the right experts
Only relevant experts are activated, saving computational power
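The hospital analogy maps directly onto code. Below is a minimal, hypothetical sketch - toy sizes and random weights, not DeepSeek's actual architecture - showing how a router ("receptionist") activates only the top-k experts per input:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "experts": each is a small linear layer (one weight matrix).
NUM_EXPERTS, DIM, TOP_K = 8, 16, 2
experts = [rng.standard_normal((DIM, DIM)) / np.sqrt(DIM) for _ in range(NUM_EXPERTS)]
router_w = rng.standard_normal((DIM, NUM_EXPERTS)) / np.sqrt(DIM)

def moe_forward(x):
    """Route input x to the top-k experts only; the rest stay idle."""
    scores = x @ router_w                  # the "receptionist" scores each specialist
    top = np.argsort(scores)[-TOP_K:]      # pick the k most relevant experts
    weights = np.exp(scores[top])
    gate = weights / weights.sum()         # normalized mixing weights for the chosen experts
    # Only TOP_K of NUM_EXPERTS matrices are used -> compute scales with k, not N.
    out = sum(g * (x @ experts[i]) for g, i in zip(gate, top))
    return out, top

x = rng.standard_normal(DIM)
y, used = moe_forward(x)
print(f"activated experts {sorted(used.tolist())} of {NUM_EXPERTS}")
```

Note the key property: total parameters grow with the number of experts, but per-query compute grows only with TOP_K - which is how a 671B-parameter model can run 37B parameters per token.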
Multi-Head Latent Attention (MLA): The Smart Memory System
Think of MLA like a revolutionary filing system:
Traditional systems keep complete copies of every file (expensive)
MLA keeps compressed summaries (latent vectors)
When needed, it can quickly reconstruct the full information
Uses significantly less storage while maintaining performance
Key benefits:
Reduces memory usage by up to 75%
Maintains or improves performance compared to traditional methods
Enables faster processing of information
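The filing-system analogy can be sketched in a few lines. This is a hypothetical toy - made-up dimensions and random projections; real MLA involves more machinery - but it shows the core move: cache a small latent per token and reconstruct full-width keys only when attention needs them.

```python
import numpy as np

rng = np.random.default_rng(1)
SEQ, D_MODEL, D_LATENT = 1024, 512, 64    # the latent is 8x smaller per token

# Learned projections (random here; trained jointly in a real model).
W_down = rng.standard_normal((D_MODEL, D_LATENT)) / np.sqrt(D_MODEL)
W_up_k = rng.standard_normal((D_LATENT, D_MODEL)) / np.sqrt(D_LATENT)

hidden = rng.standard_normal((SEQ, D_MODEL))

# Cache the compressed summaries instead of full keys AND values...
latent_cache = hidden @ W_down            # shape (SEQ, D_LATENT)
# ...and reconstruct full-width keys on demand when attention runs.
keys = latent_cache @ W_up_k              # shape (SEQ, D_MODEL)

full_cache_floats = SEQ * D_MODEL * 2     # keys + values in a standard KV cache
mla_cache_floats = SEQ * D_LATENT
print(f"cache size reduced {full_cache_floats / mla_cache_floats:.0f}x")
```

With these toy numbers the cache shrinks 16x, mirroring the memory-reduction figure cited earlier - the trade is a small extra matrix multiply at read time for a much smaller cache.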
Current Limitations
The model has no real-time internet access
Currently facing scalability challenges due to cyberattacks on its services
Knowledge cutoff of July 2024
Model Variants & Usage Guidance
Quick Selection Guide
For Developers:
Start with DeepSeek Coder for basic coding tasks
Upgrade to DeepSeek-Coder-V2 for complex development projects
For Enterprises:
Use DeepSeek-V3 for large-scale operations
Choose DeepSeek-R1 for advanced analytics
For Mobile/Edge Applications:
DeepSeek-R1-Distill is your best choice
For General Purpose:
DeepSeek LLM (67B) offers good balance
DeepSeek-V2 provides cost-effective scaling
Top 5 Myths DeepSeek Busts About AI
Myth 1: “High Costs Are Inevitable”
Reality: DeepSeek uses dynamic computation (e.g., activating only the relevant neural pathways), cutting inference costs by up to 97%.
Myth 2: “Bigger Models = Better Results”
Reality: Their distilled 7B-parameter models match 70B-parameter models on reasoning tasks through smarter training, not brute force.
Myth 3: “Specialized AI Requires Massive Data”
Reality: DeepSeek’s “knowledge infusion” technique adapts base models to niche domains (e.g., legal, manufacturing) with 10x less data.
Myth 4: “Fast Inference Requires Sacrificing Accuracy”
Reality: Their sparse Mixture-of-Experts architecture delivers 200ms response times without quality loss.
Myth 5: “Only Big Tech Can Innovate”
Reality: DeepSeek’s lean team of roughly 139 engineers outmaneuvered giants by focusing on efficiency-first AI.
Product Management Lessons
1. Efficiency First
✅ Question resource assumptions
✅ Focus on optimization before scale
❌ Don't assume bigger is better
2. Innovation Strategy
✅ Consider open-source advantages
✅ Prioritize user transparency
❌ Don't overlook cost innovation
3. Market Positioning
✅ Challenge industry assumptions
✅ Focus on core user needs
❌ Don't ignore global competition
What the DeepSeek Example Means for Our Product
Short Term:
Lower barriers to entry in AI products
Ask: “What problem needs 10x cheaper AI?” not “How do we use AI?”
Democratization of AI capabilities
Reevaluate AI implementation costs
Cost-Aware Development
Treat computational efficiency as a core feature, not an afterthought.
Example: Prioritize “cost per inference” metrics alongside accuracy
Consider open-source alternatives
Plan for price competition
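One way to operationalize "cost per inference" as a first-class metric is a simple model like the sketch below. All prices and volumes are made-up placeholders - plug in your own provider's current rate card:

```python
# Hypothetical per-million-token output prices, for illustration only.
PRICE_PER_M_TOKENS = {"premium_gpt4_class": 30.00, "budget_deepseek_class": 1.00}  # USD

def monthly_inference_cost(requests_per_day, avg_output_tokens, price_per_m):
    """Estimate monthly spend so cost-per-inference can sit next to accuracy on a dashboard."""
    tokens_per_month = requests_per_day * 30 * avg_output_tokens
    return tokens_per_month / 1_000_000 * price_per_m

for name, price in PRICE_PER_M_TOKENS.items():
    cost = monthly_inference_cost(requests_per_day=50_000, avg_output_tokens=400, price_per_m=price)
    print(f"{name}: ${cost:,.0f}/month")
```

At these placeholder rates, a feature that was uneconomical at tens of thousands of dollars a month becomes a rounding error - which is exactly the kind of threshold that should trigger a roadmap review.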
Long Term:
Prepare for AI commoditization
Focus on unique value layers
Problem-First, Not Model-First
Build for efficiency first
Focus on unique value propositions beyond raw AI capability
Like DeepSeek, adopt a mindset of questioning the status quo and exploring innovative approaches.
Conclusion
DeepSeek proves that world-class AI can be built efficiently, challenging the notion that massive resources are necessary. For PMs, this means rethinking AI strategy, focusing on efficiency, and preparing for a future where AI capabilities are more accessible and affordable than ever.
Remember: The future of AI product management isn't about having the biggest budget – it's about using resources most efficiently to deliver real user value.
Hungry for more? Please subscribe now
AI Product Management – Learn with Me Series
Welcome to my “AI Product Management – Learn with Me Series.”