China’s DeepSeek shook the tech world. Its developer just revealed the cost of training the AI model

Reuters
Updated: 4:33 AM EDT, Fri September 19, 2025
Chinese artificial intelligence developer DeepSeek spent just $294,000 on training its R1 model, much less than reported for US rivals, it said in a paper that is likely to reignite debate over Beijing’s place in the AI race.

The rare update from the Hangzhou-based company – the first estimate it has released of R1’s training costs – appeared Wednesday in a peer-reviewed article in the academic journal Nature.

DeepSeek’s release of what it said were lower-cost AI systems in January prompted global investors to dump tech stocks as they worried the new models could threaten the dominance of AI leaders including Nvidia.
Since then, the company and its founder Liang Wenfeng have largely disappeared from public view, apart from pushing out a few product updates.
Sam Altman, CEO of US AI giant OpenAI, said in 2023 that the training of foundational models had cost “much more” than $100 million – though his company has not given detailed figures for any of its releases.

Training costs for the large language models powering AI chatbots refer to the expenses incurred from running a cluster of powerful chips for weeks or months to process vast amounts of text and code.
The Nature article, which listed Liang as one of the co-authors, said DeepSeek’s reasoning-focused R1 model cost $294,000 to train and used 512 Nvidia H800 chips. A previous version of the article, published in January, did not contain this information.
Some of DeepSeek’s statements about its development costs and the technology it used have been questioned by US companies and officials.

The H800 chips it mentioned were designed by Nvidia for the Chinese market after the United States made it illegal in October 2022 for the company to export its more powerful H100 and A100 AI chips to China.

US officials told Reuters in June that DeepSeek had access to “large volumes” of H100 chips procured after US export controls were implemented. Nvidia told Reuters at the time that DeepSeek had used lawfully acquired H800 chips, not H100s.

In a supplementary information document accompanying the Nature article, the company acknowledged for the first time it owns A100 chips and said it had used them in preparatory stages of development.
“Regarding our research on DeepSeek-R1, we utilized the A100 GPUs to prepare for the experiments with a smaller model,” the researchers wrote. After this initial phase, R1 was trained for a total of 80 hours on a cluster of 512 H800 chips, they added.
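As a rough back-of-the-envelope check (assuming – and the article does not say this – that the $294,000 covers only time on the H800 cluster), those figures imply a cost of roughly $7 per GPU-hour:

    # Back-of-envelope check of the reported figures. Assumes the
    # $294,000 covers only H800 cluster time (not stated in the article).
    chips = 512           # Nvidia H800 GPUs in the training cluster
    hours = 80            # total R1 training time reported in Nature
    total_cost = 294_000  # reported training cost in US dollars

    gpu_hours = chips * hours                   # 512 * 80 = 40,960 GPU-hours
    cost_per_gpu_hour = total_cost / gpu_hours  # ~7.18
    print(f"{gpu_hours} GPU-hours at about ${cost_per_gpu_hour:.2f} each")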
Model distillation

DeepSeek also responded for the first time, though not directly, to assertions from a top White House adviser and other US AI figures in January that it had deliberately “distilled” OpenAI’s models into its own.

The term refers to a technique whereby one AI system learns from another, allowing the newer model to reap the benefits of the investments of time and computing power that went into building the earlier model, but without the associated costs.
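The article does not describe how any such distillation was done; purely as a generic illustration of the technique, here is a minimal Python/PyTorch sketch of classic knowledge distillation, in which a student model is trained to match a teacher model’s softened output distribution. Every name and parameter below is illustrative, not taken from DeepSeek or OpenAI code.

    import torch
    import torch.nn.functional as F

    def distillation_loss(student_logits, teacher_logits, temperature=2.0):
        # Soften both output distributions with a temperature, then pull
        # the student toward the teacher's behavior via KL divergence.
        soft_teacher = F.softmax(teacher_logits / temperature, dim=-1)
        log_soft_student = F.log_softmax(student_logits / temperature, dim=-1)
        # The temperature**2 factor keeps gradient magnitudes comparable
        # across temperature settings (standard in the distillation literature).
        return F.kl_div(log_soft_student, soft_teacher,
                        reduction="batchmean") * temperature ** 2

    # Toy usage: a batch of 8 examples over a 4-way output.
    student_out = torch.randn(8, 4, requires_grad=True)  # student's logits
    teacher_out = torch.randn(8, 4)                      # frozen teacher's logits
    loss = distillation_loss(student_out, teacher_out)
    loss.backward()  # gradients flow only into the student side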
DeepSeek has consistently defended distillation as yielding better model performance while being far cheaper, enabling broader access to AI-powered technologies.

DeepSeek said in January that it had used Meta’s open-source Llama AI model for some distilled versions of its own models.
DeepSeek said in Nature that training data for its V3 model relied on crawled web pages that contained a “significant number of OpenAI-model-generated answers, which may lead the base model to acquire knowledge from other powerful models indirectly.” It said this was incidental rather than intentional.
OpenAI did not immediately respond to a request for comment.