Deep Learning Foundations Starter Track
Overview
This page is for students who are interested in AI but are still at the very beginning: if you are not yet comfortable with deep learning, PyTorch, or neural networks, start here.
The goal is to build enough of a foundation to later study computer vision, multimodal AI, and edge AI with confidence.
Part I. Google Colab
Students who are completely new to deep learning do not need to set up a local environment first. A good way to start is Google Colab, a free browser-based notebook environment with GPU access.
What to learn first in Colab
how to open a notebook
how to run a code cell
how to switch to a GPU runtime (Runtime > Change runtime type)
how to install a package with pip
how to upload a small file or connect Google Drive
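As a first experiment, several of the items above can be tried in one cell. This is a minimal sketch assuming a PyTorch Colab runtime (shell commands such as pip are prefixed with "!" in Colab):

```python
# Minimal first Colab cell: check the runtime and move a tensor to it.
# In Colab, packages are installed with a shell command, e.g.:
#   !pip install wandb
import torch

# Reports "cuda" only if a GPU runtime is active
# (Runtime > Change runtime type > GPU).
device = "cuda" if torch.cuda.is_available() else "cpu"
print("Using device:", device)

# Create a small tensor and move it to the selected device.
x = torch.randn(2, 3).to(device)
print(x.shape)  # torch.Size([2, 3])
```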
Recommended resources
Google Colab: https://colab.research.google.com/
Colab FAQ: https://research.google.com/colaboratory/faq.html
PyTorch in Colab guide: https://pytorch.org/tutorials/beginner/colab.html
Weights & Biases: https://wandb.ai/site
W&B Docs: https://docs.wandb.ai/
W&B Tutorial: https://docs.wandb.ai/tutorials/
Part II. Lecture materials for beginning students
1. Deep Learning Zero To All / 모두를 위한 딥러닝 시즌 2 (Deep Learning for Everyone, Season 2)
Website: https://deeplearningzerotoall.github.io/season2/
PyTorch page: https://deeplearningzerotoall.github.io/season2/lec_pytorch.html
Code: https://github.com/deeplearningzerotoall/PyTorch
Why start here: a clear and accessible starting point for students who are new to deep learning.
2. PyTorch Korean Tutorials
Website: https://tutorials.pytorch.kr/index.html
Beginner basics: https://tutorials.pytorch.kr/beginner/basics/intro.html
GitHub: https://github.com/PyTorchKorea/tutorials-kr
Why use this: a good next step for students who want to learn PyTorch in a more standard and up-to-date way.
3. 모두의 딥러닝 개정 2판 (Deep Learning for Everyone, Revised 2nd Edition)
Code: https://github.com/taehojo/deeplearning-for-everyone-2nd
Why use this: useful for students who prefer learning from a book and following code step by step.
Part III. Papers and code
These are a few landmark papers to read after becoming comfortable with basic deep learning and PyTorch. The goal is not to read many papers at once, but to begin practicing how to read important papers in deep learning and computer vision.
1. AlexNet (NeurIPS 2012)
Paper: ImageNet Classification with Deep Convolutional Neural Networks
Why read it: this is one of the papers that made deep learning for computer vision take off.
Focus on: ReLU, dropout, data augmentation, and large-scale image classification.
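Two of those ideas, ReLU and dropout, are easy to see in a tiny sketch (the layer sizes here are arbitrary, chosen only for illustration):

```python
import torch
import torch.nn as nn

# A toy layer using two ideas AlexNet popularized: ReLU activations
# and dropout for regularization.
layer = nn.Sequential(nn.Linear(8, 8), nn.ReLU(), nn.Dropout(p=0.5))

# Dropout randomly zeroes activations during training but is disabled
# in evaluation mode.
layer.eval()
x = torch.randn(1, 8)
out = layer(x)
print(out.shape)  # torch.Size([1, 8])
```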
2. VGG (ICLR 2015)
Paper: Very Deep Convolutional Networks for Large-Scale Image Recognition
Why read it: this paper is simple and easy to follow, and it shows why deeper convolutional networks matter.
Focus on: repeated 3x3 convolutions, depth, and simple architecture design.
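The repeated-3x3 idea can be sketched as a small block; this is a hypothetical helper, not the full VGG architecture:

```python
import torch
import torch.nn as nn

# A VGG-style block: stacked 3x3 convolutions followed by 2x2 max pooling.
# Two stacked 3x3 convolutions cover a 5x5 receptive field with fewer
# parameters than one 5x5 convolution.
def vgg_block(in_ch, out_ch):
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
        nn.ReLU(inplace=True),
        nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=1),
        nn.ReLU(inplace=True),
        nn.MaxPool2d(kernel_size=2, stride=2),
    )

block = vgg_block(3, 64)
x = torch.randn(1, 3, 32, 32)
y = block(x)
print(y.shape)  # (1, 64, 16, 16): channels grow, spatial size halves
```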
3. ResNet (CVPR 2016)
Paper: Deep Residual Learning for Image Recognition
Why read it: one of the most important papers in modern deep learning.
Focus on: the degradation problem, residual connections, and why skip connections help optimization.
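The core idea is small enough to sketch. This is a simplified basic block (omitting stride and channel changes from the paper):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# A simplified residual block: the skip connection adds the input back
# to the convolutional output, so the block learns a residual F(x)
# rather than a full mapping, which eases optimization in deep networks.
class BasicBlock(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)

    def forward(self, x):
        out = F.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return F.relu(out + x)  # the skip connection

x = torch.randn(1, 16, 8, 8)
y = BasicBlock(16)(x)
print(y.shape)  # same shape as the input
```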
4. U-Net (MICCAI 2015)
Paper: U-Net: Convolutional Networks for Biomedical Image Segmentation
Why read it: this is a very good first paper for understanding segmentation and encoder-decoder structure.
Focus on: contracting path, expanding path, skip connections, and localization.
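The skip connection in U-Net is a channel-wise concatenation rather than an addition; a minimal sketch with made-up feature-map sizes:

```python
import torch

# U-Net skip connection sketch: a feature map saved from the contracting
# path is concatenated with the upsampled feature map in the expanding
# path along the channel dimension, recovering spatial detail lost to
# downsampling.
encoder_feat = torch.randn(1, 64, 56, 56)  # saved from the contracting path
decoder_feat = torch.randn(1, 64, 56, 56)  # upsampled in the expanding path

merged = torch.cat([encoder_feat, decoder_feat], dim=1)
print(merged.shape)  # (1, 128, 56, 56): channels double after the merge
```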
5. Transformer (NeurIPS 2017) / ViT (ICLR 2021)
Paper 1: Attention Is All You Need
Paper 2: An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
Why read these: the original Transformer paper introduced an architecture based purely on attention, and ViT brought that idea into image recognition by treating images as sequences of patches.
Focus on: self-attention, token representation, patch embedding, and how ViT differs from CNNs.
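Patch embedding, the step that turns an image into a token sequence, is commonly implemented as a single strided convolution; a minimal sketch with ViT-like sizes:

```python
import torch
import torch.nn as nn

# ViT patch embedding sketch: a Conv2d with kernel size equal to stride
# splits the image into non-overlapping patches and projects each patch
# to an embedding vector, producing a sequence of tokens.
patch_size, embed_dim = 16, 192
proj = nn.Conv2d(3, embed_dim, kernel_size=patch_size, stride=patch_size)

img = torch.randn(1, 3, 224, 224)
tokens = proj(img).flatten(2).transpose(1, 2)  # (batch, num_patches, embed_dim)
print(tokens.shape)  # (1, 196, 192): a 14 x 14 grid of patches
```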
Part IV. Self-check project
Build a Vision Transformer from scratch in PyTorch and train it on a small dataset
Goal
This project is intended for students who have already finished the lecture materials.
The goal is not to get the best score. The goal is to become comfortable with reading a paper, tracing an implementation, modifying a model, and observing what happens.
Reference code: https://github.com/tintn/vision-transformer-from-scratch
This repository is a simplified PyTorch implementation of the ViT paper and is designed to be easier to understand than a large production codebase.
What to do
Read the ViT paper at a high level. You do not need to understand every equation perfectly at first.
Run the reference implementation in Google Colab.
Identify the main building blocks:
patch embedding
positional embedding
multi-head self-attention
MLP block
classification head
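Before tracing the repository's own implementation, it can help to see the attention block in isolation. This sketch uses PyTorch's built-in module rather than the repository's code, with arbitrary sizes:

```python
import torch
import torch.nn as nn

# Multi-head self-attention sketch: in self-attention, the query, key,
# and value inputs are all the same token sequence, so every token
# attends to every other token.
embed_dim, num_heads = 64, 4
attn = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)

tokens = torch.randn(1, 196, embed_dim)      # (batch, num_patches, embed_dim)
out, weights = attn(tokens, tokens, tokens)  # self-attention: q = k = v
print(out.shape)      # (1, 196, 64): same shape as the input sequence
print(weights.shape)  # (1, 196, 196): one attention row per token
```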
Train the model on a small dataset, such as CIFAR-10.
Change one or two settings yourself, such as:
patch size
embedding dimension
number of transformer layers
number of heads
learning rate
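One practical way to keep track of these settings is a single config dictionary, changing one value at a time so it is clear which change caused which effect. The values below are hypothetical starting points, not recommendations:

```python
# Hypothetical experiment config for the settings listed above.
config = {
    "patch_size": 4,       # e.g. try 4 vs 8 on a 32x32 dataset
    "embed_dim": 128,
    "num_layers": 6,
    "num_heads": 4,
    "learning_rate": 3e-4,
}

# The embedding dimension must divide evenly across attention heads.
assert config["embed_dim"] % config["num_heads"] == 0
print(config)
```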
Record the results with Weights & Biases (W&B).
Write a short memo answering:
What part was hardest to understand?
What changed when you modified the model?
Why is ViT different from CNNs?
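The training and logging steps above follow a standard pattern. This is a minimal sketch on random stand-in data with a toy model, not the ViT itself; in practice the print would be replaced by wandb.log(...) after calling wandb.init():

```python
import torch
import torch.nn as nn

# Toy model and random data standing in for the real model and dataset.
model = nn.Linear(10, 2)
opt = torch.optim.Adam(model.parameters(), lr=3e-4)
loss_fn = nn.CrossEntropyLoss()

x = torch.randn(32, 10)
y = torch.randint(0, 2, (32,))

# Standard training loop: forward pass, loss, backward pass, update.
for step in range(5):
    opt.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    opt.step()
    # In a real run: wandb.log({"loss": loss.item()})
    print(f"step {step}: loss {loss.item():.4f}")
```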