DeepSeek's new chatbot has made a significant impact on the AI industry, introducing itself with the promise of surprising answers to any question. Its bold market entry even contributed to one of NVIDIA's largest stock price drops, a sign of DeepSeek's growing influence.
Image: ensigame.com
What distinguishes DeepSeek's model is its innovative architecture and training methods, which include:
Multi-token Prediction (MTP): This technique trains the model to predict several upcoming tokens at once rather than only the next one, enhancing both accuracy and efficiency.
Mixture of Experts (MoE): The model contains 256 expert neural networks, of which only eight are activated for each token. This approach speeds up AI training and boosts performance.
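Multi-head Latent Attention (MLA): This attention mechanism focuses on the most important parts of a sentence, repeatedly extracting key details so that important nuances are not missed. It also compresses the attention keys and values into a compact latent representation, which reduces memory use.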
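To make the Mixture of Experts idea more concrete, here is a minimal sketch of top-k expert routing in Python. Only the two numbers (256 experts, eight active per token) come from the description above. The function names, the toy experts, the random router scores, and the softmax normalization are all assumptions made for illustration. This is a generic sketch of the technique, not DeepSeek's actual router.

```python
import math
import random

NUM_EXPERTS = 256  # number of experts, as stated in the article
TOP_K = 8          # experts activated per token, as stated in the article


def route_token(scores, top_k=TOP_K):
    """Return {expert_index: weight} for the top_k highest-scoring experts."""
    chosen = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:top_k]
    # Softmax over the chosen scores only, so the weights sum to 1.
    # (Illustrative normalization; an assumption, not DeepSeek's router.)
    peak = max(scores[i] for i in chosen)
    exps = {i: math.exp(scores[i] - peak) for i in chosen}
    total = sum(exps.values())
    return {i: value / total for i, value in exps.items()}


def moe_forward(x, scores, experts):
    """Combine the outputs of the selected experts, weighted by the router."""
    weights = route_token(scores)
    return sum(w * experts[i](x) for i, w in weights.items())


# Hypothetical toy experts: expert i simply multiplies its input by (i + 1).
experts = [lambda x, scale=i + 1: x * scale for i in range(NUM_EXPERTS)]

# Hypothetical router scores for one token (random here; learned in a real model).
rng = random.Random(0)
scores = [rng.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]

weights = route_token(scores)
output = moe_forward(1.0, scores, experts)
print(f"{len(weights)} of {NUM_EXPERTS} experts ran for this token")
```

The point of the sketch is that only eight of the 256 expert functions are ever called for a given token, which is why a very large model can still be comparatively cheap to run per token.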
DeepSeek, a prominent Chinese startup, claims to have developed its competitive AI model, DeepSeek V3, for just $6 million, using only 2,048 GPUs.
However, a closer look by SemiAnalysis reveals that DeepSeek operates a vast computational infrastructure of around 50,000 NVIDIA Hopper GPUs, including H800, H100, and H20 models. These resources are spread across multiple data centers and are used not only for AI training but also for research and financial modeling. The company's total investment in servers stands at about $1.6 billion, with operating costs of around $944 million.
DeepSeek is a subsidiary of the Chinese hedge fund High-Flyer, which established it as a separate AI-focused division in 2023. Unlike many startups that rely on cloud computing, DeepSeek owns its data centers, which gives it full control over AI model optimization and enables rapid innovation. The company remains self-funded, which enhances its agility and speeds up decision-making.
Additionally, DeepSeek attracts top talent from leading Chinese universities, offering salaries exceeding $1.3 million annually, though it does not hire foreign specialists.
DeepSeek claims to have trained DeepSeek V3 for just $6 million, but this figure accounts only for GPU usage during pre-training. It does not include research, model refinement, data processing, or infrastructure costs. Since its founding, DeepSeek has invested over $500 million in AI development, leveraging its compact structure to implement innovations swiftly.
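To put the headline number in perspective, the short Python sketch below expresses the $6 million figure as a percentage of the other totals reported in this article. The dollar amounts are taken from the text. The variable and function names are illustrative, and "over $500 million" is treated as a lower bound. The totals measure different things over different periods, so the percentages are a rough sense of scale rather than a strict cost breakdown.

```python
# All figures are in US dollars and are taken from the article.
HEADLINE_TRAINING_COST = 6e6   # claimed V3 pre-training GPU cost
SERVER_INVESTMENT = 1.6e9      # total server investment (SemiAnalysis)
OPERATING_COSTS = 944e6        # operating costs (SemiAnalysis)
AI_DEVELOPMENT_SPEND = 500e6   # "over $500 million", used as a lower bound


def share_percent(part, whole):
    """Return `part` as a percentage of `whole`."""
    return 100.0 * part / whole


comparisons = {
    "server investment": SERVER_INVESTMENT,
    "operating costs": OPERATING_COSTS,
    "AI development spend": AI_DEVELOPMENT_SPEND,
}

for label, total in comparisons.items():
    pct = share_percent(HEADLINE_TRAINING_COST, total)
    print(f"Headline cost is {pct:.2f}% of {label}")
```

In every comparison the headline figure is a low single-digit percentage or less, which is consistent with the point that it covers only one narrow slice of the company's spending.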
DeepSeek's case illustrates that a well-funded, independent AI company can challenge industry leaders. However, experts note that the company's success stems from substantial investment, technical breakthroughs, and a strong team rather than from a "revolutionary budget" for AI development. Even so, DeepSeek's costs remain lower than those of its competitors: training R1 cost $5 million, compared with $100 million for ChatGPT-4o.