China Telecom's AI research institute released TeleChat3-36B-Thinking in late December, a reasoning-focused language model that the company says was trained end-to-end on Huawei Ascend NPUs using the MindSpore framework. The model weights are available on Hugging Face.
The Hardware Story
What makes this interesting isn't the parameter count. It's the infrastructure.
TeleChat3 runs on Atlas 800T A2 training servers, Huawei's domestic alternative to Nvidia's data center GPUs. The model card explicitly calls out compatibility with Ascend hardware and MindSpore Transformers, Huawei's large-model training toolkit. According to China Telecom's technical report, earlier TeleChat2 models trained on clusters of 8,000 Ascend NPUs using a 4D parallelism strategy.
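To make "4D parallelism" concrete, here is a toy sketch of how 8,000 device ranks might be arranged into a four-dimensional mesh. The report doesn't specify the four dimensions or their sizes; the choice of data, pipeline, tensor, and sequence parallelism below, and the factor sizes, are illustrative assumptions, not China Telecom's actual configuration.

```python
import numpy as np

def build_mesh(world_size, dp, pp, tp, sp):
    """Arrange `world_size` device ranks into a (dp, pp, tp, sp) grid.

    Hypothetical dimensions: data, pipeline, tensor, sequence parallelism.
    """
    assert dp * pp * tp * sp == world_size, "factors must multiply to world size"
    return np.arange(world_size).reshape(dp, pp, tp, sp)

# 125 * 8 * 8 * 1 = 8,000 NPUs (made-up split for illustration)
mesh = build_mesh(8000, dp=125, pp=8, tp=8, sp=1)

# Ranks that agree on every coordinate except the tensor axis form one
# tensor-parallel group; here, ranks 0-7.
tp_group = mesh[0, 0, :, 0]
print(mesh.shape)      # (125, 8, 8, 1)
print(tp_group.tolist())  # [0, 1, 2, 3, 4, 5, 6, 7]
```

The point of a mesh like this is that each parallelism strategy communicates only along its own axis, so collective operations stay within small groups instead of spanning all 8,000 devices.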
The timing matters here. In May 2025, the US Commerce Department issued guidance warning that using Huawei's Ascend 910B, 910C, and 910D chips could constitute a violation of American export controls. The statement initially claimed this applied "anywhere in the world" before the Commerce Department walked back that language. China's Commerce Ministry responded by threatening legal action against anyone who enforces those restrictions.
Architecture Borrowed From DeepSeek
TeleChat3's model card acknowledges a debt to DeepSeek's architecture work. The larger TeleChat3-105B-A4.7B variant uses Multi-head Latent Attention (MLA), the memory-efficient technique DeepSeek introduced in DeepSeek-V2 that compresses key-value vectors into a low-dimensional latent space before caching. MLA dramatically cuts KV cache requirements during inference, which is particularly useful for long-context applications.
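The core of the MLA idea fits in a few lines of NumPy. This is a simplified sketch: the dimensions are made up (not DeepSeek's or TeleChat3's), and real MLA also handles rotary position embeddings separately. What it shows is the accounting trick: cache one small latent vector per token, and re-derive full keys and values from it at attention time.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes only; real models use different values.
d_model, n_heads, d_head, d_latent, seq = 512, 8, 64, 64, 128

W_down = rng.standard_normal((d_model, d_latent))           # compress
W_up_k = rng.standard_normal((d_latent, n_heads * d_head))  # expand to keys
W_up_v = rng.standard_normal((d_latent, n_heads * d_head))  # expand to values

h = rng.standard_normal((seq, d_model))  # hidden states for cached tokens
latent_cache = h @ W_down                # this is ALL that gets cached

# Re-derived on the fly during attention, never stored:
K = (latent_cache @ W_up_k).reshape(seq, n_heads, d_head)
V = (latent_cache @ W_up_v).reshape(seq, n_heads, d_head)

full_kv = 2 * seq * n_heads * d_head  # floats a standard KV cache would hold
mla = seq * d_latent                  # floats the latent cache holds
print(f"cache reduction: {full_kv / mla:.0f}x")  # 16x with these sizes
```

The extra matrix multiplies at decode time are the price for the smaller cache, a trade that pays off when memory bandwidth, not compute, is the bottleneck.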
The 36B dense model uses grouped-query attention instead, a more conventional choice. But the acknowledgment section in China Telecom's documentation explicitly thanks "the DeepSeek team" for architectural inspiration that contributed to training stability and efficiency.
The Benchmark Question
China Telecom published evaluation results across several benchmarks in thinking mode. Some numbers that caught my attention:
On SWE-Bench Verified, they claim 51% for the 36B model. That would put it ahead of both Qwen3-30B-A3B (21%) and Qwen3-32B (28%) according to their table. I'm skeptical. SWE-Bench numbers are notoriously sensitive to scaffolding and evaluation methodology, and we don't know if these were run under comparable conditions.
The AIME 2025 score of 73.3% matches GPT-OSS-120B exactly in their reported results. GPQA-Diamond at 70.56% beats Qwen3-32B's 68.4%. These are self-reported figures, and third-party verification would be useful.
What's missing: any comparison to DeepSeek-V3 or R1, the models whose architectural ideas they borrowed. That absence is conspicuous.
Why This Matters Beyond the Model
The TeleChat series represents something broader than one company's AI efforts. China Telecom has stated publicly that their models demonstrate "total self-sufficiency in domestic LLM training"—a claim that's as much political as technical.
Back in 2024, China Telecom announced a 1-trillion-parameter model also trained on domestic hardware. That model used the same Ascend and MindSpore stack, with the Institute of AI at China Telecom calling it evidence that China had broken free of dependency on foreign semiconductors for large-scale AI training.
The reality is more complicated. Huawei's chip production faces constraints. A Council on Foreign Relations analysis from late 2025 estimated Huawei might produce between 300,000 and 400,000 AI chips in 2025, compared to Nvidia's projected 4-5 million. Even aggressive estimates put Huawei at roughly 5% of Nvidia's aggregate AI computing power.
But compute isn't everything. DeepSeek demonstrated that training efficiency matters enormously. If Chinese labs can squeeze more capability out of fewer chips through architectural innovation and training optimization, raw compute comparisons become less meaningful.
The Full-Stack Gambit
Huawei isn't just making chips. MindSpore, their deep learning framework, provides the software layer. It's designed for what Huawei calls "device-edge-cloud" deployment, running the same models across data centers, edge servers, and mobile devices. The framework supports automatic differentiation using source-to-source transformation rather than operator overloading, which Huawei claims enables better compile-time optimization.
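For contrast, the operator-overloading approach that MindSpore's design avoids can be sketched with dual numbers: every value carries its derivative, and each overloaded operation applies the chain rule at runtime. The `Dual` class below is a toy illustration, not MindSpore or PyTorch API.

```python
class Dual:
    """Toy forward-mode autodiff via operator overloading: each value
    carries its derivative, and overloaded ops apply the chain rule."""

    def __init__(self, val, dot=0.0):
        self.val, self.dot = val, dot

    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.val + other.val, self.dot + other.dot)
    __radd__ = __add__

    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.val * other.val,
                    self.val * other.dot + self.dot * other.val)
    __rmul__ = __mul__

def f(x):
    return 3 * x * x + 2 * x + 1  # f'(x) = 6x + 2

y = f(Dual(2.0, 1.0))  # seed d(x)/dx = 1 at x = 2
print(y.val, y.dot)    # 17.0 14.0
```

The derivative here is computed by tracing operations as they execute. Source-to-source transformation instead rewrites the function into a new derivative program before it runs, which is what gives the compiler a whole graph to optimize rather than a stream of individual operations.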
Whether MindSpore can match PyTorch's ecosystem and developer adoption is another question. But for Chinese companies facing potential restrictions on US-origin software, having a domestic alternative matters for risk management if nothing else.
China Telecom and Huawei jointly launched what they're calling the first commercial "Ascend super node" earlier in 2025, aimed at large-scale AI training workloads. The companies are positioning this as infrastructure that Chinese firms can adopt wholesale, avoiding the compliance uncertainties that come with Nvidia hardware.
What Happens Next
TeleChat3-36B-Thinking is available now on Hugging Face and ModelScope. China Telecom recommends specific inference parameters for reasoning tasks: temperature between 1.1 and 1.2, repetition penalty at 1.0, top_p at 0.95. For general tasks, they suggest lowering temperature to 0.6 and raising repetition penalty to 1.05 to reduce repetitive outputs.
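Translated into a generation config, those recommendations might look like the following. The values come from China Telecom's guidance; the `select_sampling_params` helper itself is hypothetical, not part of any TeleChat3 tooling, and you should confirm the parameter names match your serving stack (vLLM, Transformers, etc.).

```python
# Values from China Telecom's published recommendations; the helper is
# a hypothetical convenience, not official TeleChat3 tooling.
RECOMMENDED = {
    # Reasoning: low end of the suggested 1.1-1.2 temperature range.
    "reasoning": {"temperature": 1.1, "top_p": 0.95, "repetition_penalty": 1.0},
    # General tasks: cooler sampling, mild repetition penalty.
    "general":   {"temperature": 0.6, "top_p": 0.95, "repetition_penalty": 1.05},
}

def select_sampling_params(task: str) -> dict:
    """Return the suggested sampling settings for a task category."""
    if task not in RECOMMENDED:
        raise ValueError(f"unknown task type: {task!r}")
    return dict(RECOMMENDED[task])

params = select_sampling_params("reasoning")
print(params["temperature"])  # 1.1
```

The high reasoning-mode temperature is unusual; most instruction-tuned models recommend 0.6-0.8, so it's worth verifying against the model card before deploying.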
The GitHub repo includes integration with MindSpore Transformers for domestic hardware deployment. They've also released a larger MoE variant, TeleChat3-105B-A4.7B-Thinking, with 192 routed experts.
Whether these models see significant adoption outside China is unclear. The benchmark claims are interesting but unverified. The architecture borrows heavily from DeepSeek. What's genuinely novel is the demonstration that trillion-parameter-scale training is achievable on a fully domestic Chinese hardware and software stack, US export controls notwithstanding.
Bill Gates told CNN recently that American tech bans have "forced the Chinese in terms of chip manufacturing and everything to go full speed ahead." TeleChat3 is one data point suggesting he might be right.