As always, we take up topic suggestions from our team here. This month's request comes from our Managing Director Caro and deals with the current hype surrounding DeepSeek-R1.
When looking at the Chinese language model DeepSeek-R1, one property immediately catches the eye: its immense price advantage over other models, combined with competitive performance. This low price is made possible by an extremely efficient training method and model architecture. DeepSeek-R1 has caused real hype and is presented in the media almost as a quantum leap. But what is actually behind it?
Much ado about nothing?
In fact, DeepSeek-R1 is not a fundamentally new technology, but an impressive combination of already known techniques, such as the Mixture-of-Experts approach, in which each input token is processed by only a few of many specialised sub-networks (a minimal sketch follows below).
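To give a rough idea of what Mixture of Experts means, here is a purely illustrative sketch in Python/NumPy: a small gating network routes each token to only a few of several expert feed-forward networks. The dimensions, the number of experts and the top-k value are arbitrary assumptions for this example and have nothing to do with DeepSeek-R1's actual configuration.

```python
# Minimal, illustrative Mixture-of-Experts layer (NumPy only).
# All sizes below are made up for the example.
import numpy as np

rng = np.random.default_rng(0)
d_model, d_hidden, n_experts, top_k = 16, 32, 4, 2

# Each "expert" is a small independent feed-forward network.
experts = [
    (rng.standard_normal((d_model, d_hidden)) * 0.02,
     rng.standard_normal((d_hidden, d_model)) * 0.02)
    for _ in range(n_experts)
]
# The gate decides which experts process a given token.
gate_w = rng.standard_normal((d_model, n_experts)) * 0.02


def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)


def moe_layer(x):
    """Route each token to its top-k experts and mix their outputs."""
    scores = softmax(x @ gate_w)                   # (tokens, n_experts)
    top = np.argsort(scores, axis=-1)[:, -top_k:]  # indices of the chosen experts
    out = np.zeros_like(x)
    for t in range(x.shape[0]):                    # per token
        weights = scores[t, top[t]]
        weights /= weights.sum()                   # renormalise over chosen experts
        for w, e in zip(weights, top[t]):
            w1, w2 = experts[e]
            out[t] += w * (np.maximum(x[t] @ w1, 0) @ w2)  # ReLU feed-forward
    return out


tokens = rng.standard_normal((8, d_model))
print(moe_layer(tokens).shape)  # (8, 16): only 2 of 4 experts ran per token
```

The point of this architecture is that only a fraction of the model's parameters is active for any given token, which is what keeps the hardware requirements comparatively low.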
It is this combination that allows the model to be operated on comparatively inexpensive hardware. And: DeepSeek-R1 has taken reinforcement learning remarkably far. Put simply, the answers produced by one model version are scored and fed back into the training of the next version, which leads to an enormous improvement in the “intelligence” of that next generation (the toy example below illustrates the principle). OpenAI, for example, currently needs far more effort to improve its own models significantly, which is why the company is coming under increasing pressure from DeepSeek-R1 and the publication of the associated technical details.
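As a purely illustrative toy version of this feedback loop, the following sketch repeatedly samples answers from a “model”, scores them with a reward, and trains the next “model version” only on the best answers. The task, the reward and the one-parameter model are invented for the example; DeepSeek-R1's actual pipeline applies reinforcement learning at the scale of a full language model.

```python
# Toy sketch of "use the model's own outputs to improve the next version".
# Everything here (task, reward, one-parameter "model") is a made-up stand-in.
import random

random.seed(0)
TARGET = 42  # the "task": produce a number close to the target


def generate(policy_mean, n_samples=32):
    """The current 'model' samples candidate answers around its learned mean."""
    return [random.gauss(policy_mean, 10) for _ in range(n_samples)]


def reward(answer):
    """Higher reward for answers closer to the target."""
    return -abs(answer - TARGET)


policy_mean = 0.0  # the only "parameter" of our toy model
for generation in range(5):
    samples = generate(policy_mean)
    # Keep only the best outputs of the current model ...
    best = sorted(samples, key=reward, reverse=True)[:8]
    # ... and train the next model version on them (here: just average them).
    policy_mean = sum(best) / len(best)
    print(f"generation {generation}: mean answer {policy_mean:.1f}")
# The mean drifts toward 42: each generation learns from the previous one's best outputs.
```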
With all the excitement, it should not be forgotten that the field of language models has seen innovative developments almost weekly in recent years, from companies, universities and state-funded actors alike. One example is the Teuken7B model. Neither the open-source approach nor the technical basis of DeepSeek-R1 is really novel.
Rather, the development of DeepSeek-R1 is a reaction to US sanctions: China lacks access to powerful hardware.
DeepSeek-R1 is by no means the first open source language model to have made it to the top of the comparison table for a short period of time.
DeepSeek-R1 can be used by anyone, even without operating it on their own hardware. However, as with other hosted models, the data entered is passed on to the provider. In addition, one should be aware that the stored knowledge is partly politically colored, a circumstance that has also been observed to some extent with other models.

So DeepSeek-R1 has above all achieved one thing: it has found a cost-effective and highly efficient way to train and improve complex language models, and it has made this approach public. It can therefore be assumed that development in this area will pick up even more speed as a result of the publication of DeepSeek-R1. However, it will probably not be enough for a lasting advantage over the competition.