DeepSeek, a Chinese AI start-up founded in 2023, has quickly made waves in the industry. With fewer than 200 employees and backed by the quant fund High-Flyer ($8 billion assets under management), the company released its open-source model, DeepSeek R1, one day before the announcement of OpenAI’s $500 billion Stargate project.
What sets DeepSeek apart is the prospect of radical cost efficiency. The company claims to have trained its model for just $6 million using 2,000 Nvidia H800 graphics processing units (GPUs), versus the $80 million to $100 million cost of GPT-4 and the 16,000 H100 GPUs required for Meta’s LLaMA 3. While the comparisons are far from apples to apples, the implications are worth understanding.
DeepSeek’s rapid adoption underscores its potential impact. Within days, it became the top free app in US app stores, spawned more than 700 open-source derivatives (and growing), and was onboarded by Microsoft, AWS, and Nvidia AI platforms.
DeepSeek’s R1 model rocked the stock markets. On January 20, 2025, China-based AI startup DeepSeek released its open-source R1 reasoning generative AI (GenAI) model. News about R1 spread quickly, and by the start of stock trading on January 27, 2025, the market caps of many major technology companies with large AI footprints had fallen sharply:
NVIDIA, a US-based chip designer and developer best known for its data center GPUs, dropped 18% between the market close on January 24 and the market close on February 3.
Microsoft, the leading hyperscaler in the cloud AI race with its Azure cloud services, dropped 7.5% (Jan 24–Feb 3).
Market participants, and investors in particular, reacted to the narrative that DeepSeek’s model is on par with cutting-edge models, was supposedly trained on only a couple of thousand GPUs, and is open source. Since that initial sell-off, however, reports and analyses have shed some light on the initial hype.
DeepSeek: A breakthrough moment for AI
DeepSeek, a Chinese AI startup, is disrupting the AI landscape with its open-source R1 model, which not only makes advanced AI technology accessible but also demonstrates a distinctive approach to AI development, one that emphasizes performance, cost-effectiveness, and transparency.

Performance: DeepSeek claims that one of R1's standout features is its performance. The platform's latest model is said to rival some of the most advanced closed-source models in speed and accuracy. This is a testament to the power of open-source development, where collective contributions can lead to breakthroughs that individual entities might struggle to achieve on their own.

Cost efficiency: Historically, the first unit of any new technology is prohibitively expensive; consider what the earliest computers cost compared with their equivalents today. As a technology matures and improvements accumulate, overall costs fall at an accelerating rate. This pattern has held across industries, and if history is our guide, the same trend is inevitable for DeepSeek and AI: we will continue to see more cost-efficient LLMs enter the marketplace.
A transformer model is a neural network that learns context and thus meaning by tracking relationships in sequential data like the words in this sentence. Transformer models apply an evolving set of mathematical techniques, called attention or self-attention, to detect subtle ways even distant data elements in a series influence and depend on each other. Transformers are translating text and speech in near real-time, opening meetings and classrooms to diverse and hearing-impaired attendees. They’re helping researchers understand the chains of genes in DNA and amino acids in proteins in ways that can speed drug design.
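To make the attention idea concrete, here is a minimal NumPy sketch of scaled dot-product self-attention, the core operation described above. The matrix names (Wq, Wk, Wv), dimensions, and random inputs are illustrative assumptions, not details from any particular model.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """Scaled dot-product self-attention for one sequence.

    X          : (seq_len, d_model) input embeddings
    Wq, Wk, Wv : (d_model, d_k) learned projection matrices
    Returns (seq_len, d_k) context vectors.
    """
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    # Every token scores every other token, so even distant
    # elements in the sequence can influence each other directly.
    scores = Q @ K.T / np.sqrt(K.shape[-1])
    weights = softmax(scores, axis=-1)  # attention weights, rows sum to 1
    return weights @ V                  # weighted mix of value vectors

# Toy usage: 4 tokens with 8-dimensional embeddings.
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
print(self_attention(X, Wq, Wk, Wv).shape)  # (4, 8)
```

Because the score matrix is computed for all token pairs at once, the whole sequence can be processed in parallel, which is the property that lets transformers scale where recurrent models could not.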
Characterized by their distinctive attention mechanisms and parallel processing abilities, transformer models stand as a testament to the innovative leaps in understanding and generating human language with an accuracy and efficiency previously unattainable. First introduced in Google's 2017 paper “Attention Is All You Need,” the transformer architecture is at the heart of groundbreaking models like ChatGPT, sparking a new wave of excitement in the AI community. Transformers have been instrumental in OpenAI's cutting-edge language models and played a key role in DeepMind's AlphaStar.
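As a brief illustration of transformers in practice, the following sketch runs a small pretrained transformer for text generation through the Hugging Face transformers library. The model choice ("gpt2") and the prompt are assumptions made for the example, not models discussed in this article.

```python
# Minimal sketch: text generation with a pretrained transformer.
# Assumes `pip install transformers torch`; "gpt2" is an illustrative
# small model chosen only because it downloads quickly.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")
result = generator("The transformer architecture", max_new_tokens=20)
print(result[0]["generated_text"])
```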
In this transformative era of AI, the significance of Transformer models for aspiring data scientists and NLP practitioners cannot be overstated.
Because transformers sit at the core of many of the latest technological leaps, this article aims to decipher the secrets behind these models.