DeepSeek V4: 1T-Parameter MoE on Domestic Chips, but Release Date Remains Uncertain

Sources told the Financial Times that DeepSeek is preparing to release V4, which would be its most ambitious model yet: a 1-trillion-parameter Mixture of Experts architecture with a 1-million-token context window, native multimodal capabilities, and an Apache 2.0 licence. The model is reportedly built using Huawei Ascend and Cambricon chips rather than Nvidia GPUs. Multiple predicted release dates have passed without a launch.

The uncertainty around timing does not diminish the significance. If V4 delivers on its reported specifications, it challenges several assumptions about what is possible outside the Nvidia ecosystem.

Why does the chip story matter?

Nearly every frontier model to date has been trained on Nvidia GPUs, with Google's TPU-trained Gemini models the main exception. The US export controls on advanced chips were designed to keep Chinese AI labs from accessing the compute needed to train competitive models. DeepSeek V4, if built on Huawei Ascend and Cambricon chips as reported, would be strong evidence that the controls have not achieved that aim.

Huawei's Ascend 910B has been positioned as a domestic alternative to Nvidia's A100 and H100. Independent benchmarks show it trailing Nvidia hardware on raw performance, but the gap narrows when software optimisation compensates for hardware limitations. DeepSeek's engineering team has a track record of extracting exceptional performance from constrained hardware: DeepSeek V2, for example, achieved competitive results with significantly less training compute than Western counterparts.

For practitioners, the chip story matters because it affects the long-term supply dynamics of AI compute. If frontier models can be trained on non-Nvidia hardware, Nvidia's dominance of training compute weakens, and compute pricing faces competitive pressure from multiple hardware ecosystems.

What do the technical specifications signal?

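1 trillion parameters in MoE configuration. DeepSeek V2 and V3 used efficient MoE architectures in which only a fraction of the parameters activate for each token. A 1T-parameter MoE model might activate 100-200B parameters per forward pass, although that range is an estimate rather than a reported figure; DeepSeek V3, for comparison, activates about 37B of its 671B parameters per token. Per-token compute would then be comparable to a dense model of the activated size, while the model draws on a much larger pool of parameters. The architecture trades memory footprint for compute efficiency: every expert must be held in memory, but only a few run for any given token.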

1 million token context. This matches GPT-5.4's context window and exceeds most other models. Long context enables processing entire codebases, full document collections, and extended multi-turn conversations without retrieval augmentation. For DeepSeek's core audience — developers and researchers — this is a practical feature that reduces pipeline complexity.

Native multimodal. Previous DeepSeek models were text-only or had multimodal capabilities bolted on. Native multimodal means image, video, and text understanding are trained together from the start rather than added as a second step. This typically produces more coherent cross-modal reasoning.

Apache 2.0. Consistent with DeepSeek's open-weight strategy. Free for commercial use, modification, and redistribution. This is the licensing choice that maximises adoption and community development at the expense of direct monetisation.
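The MoE trade-off described above can be made concrete with some back-of-envelope arithmetic. The sketch below is illustrative only: the 100-200B activation range is the estimate quoted earlier, while the 1-byte weight precision, the 80 GB device size, and the rule of thumb of roughly 2 FLOPs per active parameter per token are assumptions, not reported V4 specifications.

```python
import math

def moe_footprint(total_params, active_params, bytes_per_param, device_memory_gb=80.0):
    """Rough sizing for an MoE model. All inputs are assumptions, not reported specs."""
    # Every expert must be resident in memory, even though few run per token.
    weights_gb = total_params * bytes_per_param / 1e9
    # Rule of thumb: about 2 FLOPs per active parameter for one forward pass.
    forward_gflops_per_token = 2 * active_params / 1e9
    return {
        "active_fraction": active_params / total_params,
        "weights_gb": weights_gb,
        "forward_gflops_per_token": forward_gflops_per_token,
        # Lower bound only: ignores KV cache, activations, and runtime overhead.
        "min_devices_for_weights": math.ceil(weights_gb / device_memory_gb),
    }

# Compare the low and high ends of the estimated activation range.
for active in (100e9, 200e9):
    print(moe_footprint(total_params=1e12, active_params=active, bytes_per_param=1))
```

Under these assumptions the weights occupy about 1,000 GB whichever activation figure applies, while per-token compute doubles between the two cases. Memory scales with total parameters and compute with active parameters.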

Why have release dates slipped?

The Financial Times report cites sources saying DeepSeek planned to release V4 in early March, but the date passed without a launch. Several explanations are possible:

Training stability issues. Training trillion-parameter models on less mature hardware and software stacks introduces risks that Nvidia's well-optimised CUDA ecosystem has largely worked through. Training runs at this scale fail frequently even on Nvidia hardware; novel chip architectures add failure modes.

Benchmark politics. DeepSeek's releases have been timed for maximum impact — V2 and V3 both launched with benchmark results that challenged Western frontier models. A delay could indicate that V4's benchmarks are not yet at the level DeepSeek wants to present.

External pressure. Chinese AI labs operate under regulatory and geopolitical constraints that can affect release timing. Government review processes, national security considerations, or coordination with chip suppliers could all introduce delays.

What should practitioners watch for?

When V4 does launch — and the consistent reporting from multiple sources suggests it will — evaluate it on three dimensions:

Benchmark versus real-world performance. DeepSeek's previous models showed strong benchmark results but sometimes underperformed expectations on practical tasks, particularly instruction following and safety. Test on your actual workload before committing.

Inference infrastructure availability. An open-weight 1T-parameter model requires substantial hardware to serve. Major inference providers will need to add DeepSeek V4 to their offerings before most teams can use it. Track which providers offer it and at what pricing.

Safety and content policies. Chinese AI labs operate under different content moderation frameworks than Western counterparts. Depending on your use case and regulatory environment, the model's built-in content policies may or may not align with your requirements. Open weights mean you can adjust, but the base behaviour matters for teams deploying without fine-tuning.
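The first and third checks above lend themselves to a small harness run against your own prompts. What follows is a minimal sketch, not a recommended evaluation suite: `pass_rate`, `stub_model`, and the two example cases are hypothetical names invented for illustration, and the stub stands in for whatever client your inference provider exposes.

```python
from typing import Callable, List, Tuple

# A case pairs a prompt with a check that returns True when the output is acceptable.
Case = Tuple[str, Callable[[str], bool]]

def pass_rate(generate: Callable[[str], str], cases: List[Case]) -> float:
    """Return the fraction of cases whose output passes its check."""
    if not cases:
        raise ValueError("need at least one case")
    passed = sum(1 for prompt, check in cases if check(generate(prompt)))
    return passed / len(cases)

def stub_model(prompt: str) -> str:
    """Hypothetical stand-in for a real model call; swap in your provider's client."""
    return prompt.upper()

# Illustrative cases only; real ones should come from your own workload.
cases: List[Case] = [
    ("reply in capitals", lambda out: out.isupper()),  # instruction following
    ("say ok", lambda out: out == "ok"),               # exact-format requirement
]

print(f"pass rate: {pass_rate(stub_model, cases):.0%}")
```

Keeping the checks as plain functions means the same cases can be rerun unchanged against each provider or fine-tuned variant, which makes differences in base behaviour easier to spot.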

The broader significance of V4 extends beyond the model itself. If DeepSeek delivers a frontier-competitive model built on domestic Chinese chips under an open licence, it would reshape the geopolitical landscape of AI development. The assumption that export controls create a durable compute advantage would face its most serious test yet.
