Calendar
- Previous offerings: Fall 2024, Fall 2023, Spring 2023
- The schedule below is tentative and subject to change.
- All course materials can be found on GitHub.
- There is no reference textbook, but some lectures follow material from Speech and Language Processing (JM below) and Dive into Deep Learning (D2L below).
Course Schedule
- Jan 22
-
- Text classification
- HW 1 out: HW1 [pdf]
- Jan 29
-
- Distributed representation of words
- Neural network basics
- Additional readings
- Textbook: JM Ch6
- Original word2vec paper: Efficient estimation of word representations in vector space
- Feb 5
-
- Sequence modeling
- HW 1 due. HW 2 out: HW2 [pdf] [annotated slides]
-
- RNN and its variants
- Attention and Transformers
- Additional readings
- Textbook: D2L Ch9.4-9.7
- Original Transformer paper: Attention is all you need
- Feb 12
-
- Encoder-decoder models
- Decoding algorithms
- Additional readings
- Original attention paper: Neural Machine Translation by Jointly Learning to Align and Translate
- Feb 19
-
- Tokenization
- Architecture, objective, optimization
- Feb 26
-
- Guest lecture: Efficient pretraining and finetuning, by Haitian Jiang
- HW 2 due. HW 3 out: HW3 [pdf] [Hugging Face Transformers]
-
- Flash attention
- Architecture: mixture-of-experts, multi-head latent attention
- Mixed precision training
- Additional readings
- Mar 5
-
- Guest lecture: Scaling language models, by Nick Lourie
-
- Language model basics
- Emergent capabilities
- Scaling laws
- Mar 12
-
- Post-training of language models (basics)
-
- Instruction tuning
- Reinforcement learning basics
- Mar 19
-
- Advanced RLHF techniques
- Alignment
- Mar 26
- Spring Break - No Lecture
- Apr 2
-
- Benchmarking and evaluation
- Apr 9
-
- Guest lecture: Retrieval-augmented LM, by Sewon Min (UC Berkeley)
- HW 4 due
- Apr 16
-
- Guest lecture: Pretraining data, by Hector Liu (MBZUAI)
- Apr 23
-
- Guest lecture: LM agent, by Yu Su (OSU)
- Apr 30
-
- Guest lecture: Qwen models, by Junyang Lin (Alibaba)