One of the things that has kept me curious is how a Large Language Model (LLM) works. I have studied and implemented tokenizers and multi-head attention, but I still want to dive deeper into the full architecture of an LLM, down to the level of building the loss function. I also think this is a better starting point than jumping straight into implementing newer attention mechanisms, such as FlashAttention.
In this project, I will build GPT-2 from scratch.
I have worked on prompting and fine-tuning with LLMs; now it's time to build one myself, based on Sebastian Raschka's book "Build a Large Language Model (From Scratch)".