DeepSeek, a Chinese AI start-up founded in 2023, has quickly made waves in the industry. With fewer than 200 employees and backed by the quant fund High-Flyer ($8 billion assets under management), the company released its open-source model, DeepSeek R1, one day before the announcement of OpenAI’s $500 billion Stargate project.
What sets DeepSeek apart is the prospect of radical cost efficiency. The company claims to have trained its model for just $6 million using 2,000 Nvidia H800 graphics processing units (GPUs), versus the $80 million to $100 million cost of GPT-4 and the 16,000 H100 GPUs required for Meta’s LLaMA 3. While the comparisons are far from apples to apples, the implications are worth understanding.
DeepSeek’s rapid adoption underscores its potential impact. Within days, it became the top free app in US app stores, spawned more than 700 open-source derivatives (and growing), and was onboarded by Microsoft, AWS, and Nvidia AI platforms.
DeepSeek’s R1 model rocked the stock markets. On January 20, 2025, China-based AI startup DeepSeek released its open-source R1 reasoning generative AI (GenAI) model. News about R1 spread quickly, and by the start of stock trading on January 27, 2025, the market caps of many major technology companies with large AI footprints had fallen sharply:
NVIDIA, a US-based chip designer and developer best known for its data center GPUs, dropped 18% between the market close on January 24 and the market close on February 3.
Microsoft, the leading hyperscaler in the cloud AI race with its Azure cloud services, dropped 7.5% (Jan 24–Feb 3).
Market participants, and investors in particular, reacted to the narrative that DeepSeek’s model is on par with cutting-edge models, was supposedly trained on only a couple of thousand GPUs, and is open source. Since that initial sell-off, however, reports and analyses have shed some light on the initial hype.
DeepSeek: A breakthrough moment for AI
DeepSeek, a Chinese AI startup, is disrupting the AI landscape with its open-source R1 model, which not only makes advanced AI technology accessible but also demonstrates a distinctive approach to AI development, one that emphasizes performance, cost-effectiveness, and transparency.

Performance: DeepSeek claims that one of R1's standout features is its performance. The platform's latest model is said to rival some of the most advanced closed-source models in speed and accuracy. This is a testament to the power of open-source development, where collective contributions can lead to breakthroughs that individual entities might struggle to achieve on their own.

Cost efficiency: Historically, the first unit of any new technology is prohibitively expensive; consider what the earliest computers cost compared with their equivalents today. As a technology matures and improvements accumulate, overall costs fall at an accelerating rate. This pattern has held across industries, and if history is our guide, the same trend is inevitable for DeepSeek and AI: we will continue to see more cost-efficient LLMs enter the marketplace.
A transformer model is a neural network that learns context and thus meaning by tracking relationships in sequential data like the words in this sentence. Transformer models apply an evolving set of mathematical techniques, called attention or self-attention, to detect subtle ways even distant data elements in a series influence and depend on each other. Transformers are translating text and speech in near real-time, opening meetings and classrooms to diverse and hearing-impaired attendees. They’re helping researchers understand the chains of genes in DNA and amino acids in proteins in ways that can speed drug design.
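To make the attention idea concrete, here is a minimal NumPy sketch of scaled dot-product self-attention, the core operation described above. The matrix names (Wq, Wk, Wv), dimensions, and random inputs are illustrative assumptions, not details from any particular model.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """Scaled dot-product self-attention for one sequence.

    X          : (seq_len, d_model) input embeddings
    Wq, Wk, Wv : (d_model, d_k) learned projection matrices
    Returns (seq_len, d_k) context vectors.
    """
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    # Every token scores every other token, so even distant
    # elements in the sequence can influence each other directly.
    scores = Q @ K.T / np.sqrt(K.shape[-1])
    weights = softmax(scores, axis=-1)  # attention weights, rows sum to 1
    return weights @ V                  # weighted mix of value vectors

# Toy usage: 4 tokens with 8-dimensional embeddings.
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
print(self_attention(X, Wq, Wk, Wv).shape)  # (4, 8)
```

Because the score matrix is computed for all token pairs at once, the whole sequence can be processed in parallel, which is the property that lets transformers scale where recurrent models could not.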
Characterized by their distinctive attention mechanisms and parallel processing abilities, transformer models stand as a testament to the innovative leaps in understanding and generating human language with an accuracy and efficiency previously unattainable. First introduced in Google's 2017 paper “Attention Is All You Need,” the transformer architecture is at the heart of groundbreaking models like ChatGPT, sparking a new wave of excitement in the AI community. Transformers have been instrumental in OpenAI's cutting-edge language models and played a key role in DeepMind's AlphaStar.
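As a brief illustration of transformers in practice, the following sketch runs a small pretrained transformer for text generation through the Hugging Face transformers library. The model choice ("gpt2") and the prompt are assumptions made for the example, not models discussed in this article.

```python
# Minimal sketch: text generation with a pretrained transformer.
# Assumes `pip install transformers torch`; "gpt2" is an illustrative
# small model chosen only because it downloads quickly.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")
result = generator("The transformer architecture", max_new_tokens=20)
print(result[0]["generated_text"])
```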
In this transformative era of AI, the significance of Transformer models for aspiring data scientists and NLP practitioners cannot be overstated.
Because transformers sit at the core of many of the latest technological leaps, this article aims to decipher the secrets behind these models.