OpenAI has accused Chinese AI startup DeepSeek of potentially using its proprietary models to train DeepSeek's own AI systems. OpenAI and Microsoft are investigating whether DeepSeek used OpenAI's API to incorporate OpenAI's models into its own offerings, potentially violating OpenAI's terms of service.
Evidence of Distillation: OpenAI claims to have found indications that DeepSeek employed a technique called "distillation" (*) to train its AI models using OpenAI's technology.
OpenAI has evidence that its models helped train China’s DeepSeek - The Verge 29.01.2025
OpenAI and Microsoft are investigating whether the Chinese rival used OpenAI’s API to integrate OpenAI’s AI models into DeepSeek’s own models, according to Bloomberg. The outlet’s sources said Microsoft security researchers detected that large amounts of data were being exfiltrated through OpenAI developer accounts in late 2024, which the company believes are affiliated with DeepSeek.
OpenAI told the Financial Times that it found evidence linking DeepSeek to the use of distillation (*), a common technique developers use to train AI models by extracting data from larger, more capable ones.
President Donald Trump’s artificial intelligence czar David Sacks said “it is possible” that IP theft had occurred. “There’s substantial evidence that what DeepSeek did here is they distilled knowledge out of OpenAI models and I don’t think OpenAI is very happy about this,” Sacks told Fox News on Tuesday.
OpenAI says DeepSeek may have used its AI outputs 'inappropriately' to train new model - Business Insider Jan 29, 2025
Data Exfiltration: Microsoft's security researchers reportedly detected large amounts of data being exfiltrated through OpenAI developer accounts in late 2024, activity suspected to be linked to DeepSeek.
Cost Efficiency: DeepSeek has developed top-performing AI models using less-advanced chips at a fraction of the cost of US rivals such as OpenAI, Google, and Meta.
Terms of Service Violation: While developers may use OpenAI's API to integrate its capabilities into their applications, using the outputs to build competing models violates OpenAI's terms of service.
Government Involvement: OpenAI has stated it will work closely with the US government to protect advanced AI models from exploitation by adversaries and competitors.
Market Impact: News of DeepSeek's affordable yet powerful AI model led to a decline in US tech company stock prices.
(*) Distillation, according to IBM, refers to a machine-learning technique in which the learning of a large pre-trained "teacher model" is transferred to a smaller "student model."
Model distillation in AI training is a technique that transfers knowledge from a large, complex model (the "teacher") to a smaller, more efficient model (the "student"). This process aims to create a compact model that retains much of the performance of the larger model while being faster and less resource-intensive.
The distillation process typically involves the following steps:
Training the teacher model: A large, sophisticated model is trained on a dataset to achieve high accuracy and performance.
Generating soft targets: The teacher model produces probability distributions over classes (soft targets) for the training data, which carry more information than hard labels.
Training the student model: The student model is trained to mimic the teacher's outputs, often using a combination of the original training data and the soft targets generated by the teacher.
Knowledge transfer: The student model learns to replicate the teacher's decision-making process, including the relationships between inputs and outputs.
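The steps above can be sketched in a few lines of numpy. This is a minimal, illustrative implementation of the classic soft-target distillation loss (temperature-scaled softmax plus a hard-label term); the function names, temperature, and blending weight are assumptions for the sketch, not details from the source.

```python
import numpy as np

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax: higher T yields softer distributions."""
    z = logits / temperature
    z = z - z.max(axis=-1, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, hard_labels,
                      temperature=2.0, alpha=0.5):
    """Blend of a soft-target term (KL to the teacher) and hard-label cross-entropy."""
    t = softmax(teacher_logits, temperature)   # soft targets from the teacher
    s = softmax(student_logits, temperature)
    # KL(teacher || student) on the softened distributions, scaled by T^2
    soft_loss = np.sum(t * (np.log(t + 1e-12) - np.log(s + 1e-12)), axis=-1)
    # standard cross-entropy of the student against the hard labels
    hard_probs = softmax(student_logits)
    hard_loss = -np.log(hard_probs[np.arange(len(hard_labels)), hard_labels] + 1e-12)
    return np.mean(alpha * temperature**2 * soft_loss + (1 - alpha) * hard_loss)
```

In a real training loop the student's weights would be updated by gradient descent on this loss; here the point is only that the soft targets expose the teacher's full output distribution, not just its top prediction.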
Soft targets: These provide more nuanced information about the teacher's predictions, allowing the student to learn subtle patterns in the data.
Feature-based distillation: The student model may learn from the teacher's internal features, minimizing the difference between their learned representations.
Relation-based distillation: This advanced technique focuses on transferring the underlying relationships between inputs and outputs from the teacher to the student.
Self-distillation: In some cases, a single model can act as both teacher and student, transferring knowledge from its deeper layers to shallower ones.
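The feature-based variant can likewise be sketched in numpy. Since the student's hidden dimension is typically narrower than the teacher's, a projection is needed to compare their representations; the linear projection and the mean-squared-error objective below are common choices but are assumptions of this sketch.

```python
import numpy as np

def feature_distillation_loss(student_feats, teacher_feats, projection):
    """MSE between teacher features and linearly projected student features.

    `projection` maps the student's (smaller) feature dimension up to the
    teacher's, so the two representations can be compared elementwise.
    """
    projected = student_feats @ projection
    return np.mean((projected - teacher_feats) ** 2)
```

In practice the projection matrix is trained jointly with the student, and this term is added to the output-level distillation loss rather than used alone.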
Model distillation offers several benefits, including model compression, improved generalization, and the ability to deploy high-performing models on resource-constrained devices. The technique has become particularly relevant with the proliferation of large language models (LLMs) and other massive AI systems.
Update 29.01.2025