Less than two weeks ago, a little-known Chinese company released its latest artificial intelligence (AI) model and sent shockwaves around the world.
DeepSeek claimed in a technical paper uploaded to GitHub that its open-weight R1 model achieved results comparable to, or better than, those of AI models made by some of the leading Silicon Valley giants — namely OpenAI’s ChatGPT, Meta’s Llama and Anthropic’s Claude. And most staggeringly, the model achieved these results while being trained and run at a fraction of the cost.
The market response to the news on Monday was sharp and brutal: As DeepSeek rose to become the most downloaded free app in Apple’s App Store, $1 trillion was wiped from the valuations of leading U.S. tech companies.
And Nvidia, the company that makes the high-end H100 graphics chips presumed essential for AI training, lost $589 billion in market value, the biggest one-day loss for a single company in U.S. stock market history. DeepSeek, after all, said it trained its AI model without them — though it did use less powerful Nvidia H800 chips. U.S. tech companies responded with panic and ire, with OpenAI representatives even suggesting that DeepSeek plagiarized parts of its models.
AI experts say that DeepSeek’s emergence has upended a key dogma underpinning the industry’s approach to growth — showing that bigger isn’t always better.
“The fact that DeepSeek could be built for less money, less computation and less time and can be run locally on less expensive machines, argues that as everyone was racing towards bigger and bigger, we missed the opportunity to build smarter and smaller,” Kristian Hammond, a professor of computer science at Northwestern University, told Live Science in an email.
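Hammond's point about running models locally is, at bottom, memory arithmetic: a model's footprint is roughly its parameter count times the bytes stored per parameter. Below is a minimal sketch of that arithmetic; the model sizes and the 16-bit versus 4-bit weight formats are illustrative assumptions, not specifications of any particular DeepSeek release.

```python
# Rough memory arithmetic behind "run locally on less expensive machines":
# the dominant cost of serving a model is holding its weights, which is
# parameter count times bytes per parameter. Model sizes here are
# illustrative assumptions, not specs of any particular DeepSeek release.

def weight_memory_gb(n_params: float, bytes_per_param: float) -> float:
    """Approximate memory needed just to hold the weights, in gigabytes."""
    return n_params * bytes_per_param / 1e9

for n_params, label in [(7e9, "7B"), (70e9, "70B"), (670e9, "670B")]:
    fp16 = weight_memory_gb(n_params, 2.0)   # standard 16-bit weights
    int4 = weight_memory_gb(n_params, 0.5)   # 4-bit quantized weights
    print(f"{label:>4}: {fp16:7.1f} GB at 16-bit, {int4:6.1f} GB at 4-bit")
```

At 4-bit precision, a model in the single-digit billions of parameters fits comfortably on a consumer graphics card, which is what makes "smarter and smaller" models plausible to run on ordinary hardware.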
But what makes DeepSeek’s V3 and R1 models so disruptive? The key, scientists say, is efficiency.
What makes DeepSeek’s models tick?
“In some ways, DeepSeek’s advances are more evolutionary than revolutionary,” Ambuj Tewari, a professor of statistics and computer science at the University of Michigan, told Live Science. “They are still operating under the dominant paradigm of very large models (100s of billions of parameters) on very large datasets (trillions of tokens) with very large budgets.”
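To put rough numbers on that paradigm, a common rule of thumb estimates training compute as about 6 floating-point operations per parameter per training token. The sketch below compares a generic dense model at the scale Tewari describes against DeepSeek's own claimed figures of roughly 37 billion parameters activated per token across 14.8 trillion training tokens, as reported in the company's V3 technical paper (the company's claims, not independently verified numbers).

```python
# Back-of-the-envelope training-cost arithmetic using the common
# ~6 * N * D FLOPs rule of thumb, where N is the number of parameters
# doing work per token and D is the number of training tokens.
# The dense model is a generic illustration; the DeepSeek figures are
# the company's own reported numbers, not independently verified.

def training_flops(params_active: float, tokens: float) -> float:
    """Approximate training compute: ~6 FLOPs per active parameter per token."""
    return 6 * params_active * tokens

# A generic dense model at the "hundreds of billions of parameters,
# trillions of tokens" scale described in the quote above.
dense = training_flops(params_active=400e9, tokens=15e12)

# DeepSeek V3's claimed setup: only ~37B parameters are activated
# for any given token, over ~14.8T training tokens.
deepseek = training_flops(params_active=37e9, tokens=14.8e12)

print(f"dense:    {dense:.2e} FLOPs")
print(f"deepseek: {deepseek:.2e} FLOPs")
print(f"ratio:    {dense / deepseek:.0f}x less compute per training run")
```

Under these assumptions the claimed setup needs roughly an order of magnitude less compute per training run, which is still enormous in absolute terms, consistent with Tewari's "evolutionary rather than revolutionary" framing.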
If we take DeepSeek’s claims at face value, Tewari said, the main innovation in the company’s approach is how it wields its large and powerful models…