I’m reading 20 foundational GenAI research papers in 20 weeks — and writing up every single one.
Not skims of the abstracts. Full reviews: key ideas, enterprise implications, and my honest take.
Why
I build AI systems for a living. But I kept noticing a gap between using the technology and understanding the research behind it.
That gap bites you when you need to make real architectural decisions — choosing between approaches, debugging something unexpected in production, evaluating whether a new technique is worth adopting. Intuition built on demos only gets you so far.
So I’m closing the gap. Publicly, because writing forces precision and because if you’re in the same position, you can follow along.
The reading list
I’ve organised the 20 papers into 8 phases. Each phase builds the vocabulary you’ll need for the next, so read them in order.
Phase 1 — Foundations You Can’t Skip (Weeks 1–3)
- Attention Is All You Need — Vaswani et al., 2017
- Scaling Laws for Neural Language Models — Kaplan et al., 2020
- Training Compute-Optimal Large Language Models (Chinchilla) — Hoffmann et al., 2022
Phase 2 — The Modern Architecture Stack (Weeks 4–6)
- LLaMA: Open and Efficient Foundation Language Models — Touvron et al., 2023
- FlashAttention — Dao et al., 2022
- GPT-4 Technical Report — OpenAI, 2023
Phase 3 — Alignment, Safety & Governance (Weeks 7–9)
- Training Language Models to Follow Instructions (InstructGPT) — Ouyang et al., 2022
- Constitutional AI: Harmlessness from AI Feedback — Bai et al., 2022
- Direct Preference Optimization (DPO) — Rafailov et al., 2023
Phase 4 — Efficient Training & Adaptation (Weeks 10–11)
- LoRA: Low-Rank Adaptation of Large Language Models — Hu et al., 2021
- Textbooks Are All You Need — Gunasekar et al., 2023
Phase 5 — Retrieval, Grounding & Production Systems (Weeks 12–14)
- Retrieval-Augmented Generation (RAG) — Lewis et al., 2020
- Lost in the Middle: How Language Models Use Long Contexts — Liu et al., 2023
- Efficient Memory Management for LLM Serving with PagedAttention (vLLM) — Kwon et al., 2023
Phase 6 — Reasoning, Agents & Frontier Capabilities (Weeks 15–17)
- Chain-of-Thought Prompting Elicits Reasoning — Wei et al., 2022
- ReAct: Synergizing Reasoning and Acting in Language Models — Yao et al., 2022
- DeepSeek-R1 — Guo et al., 2025
Phase 7 — Efficiency at Scale (Week 18)
- Mixtral of Experts — Jiang et al., Mistral AI, 2024
Phase 8 — Evaluation & Interpretability (Weeks 19–20)
- HELM: Holistic Evaluation of Language Models — Liang et al., 2022
- Scaling Monosemanticity — Templeton et al., Anthropic, 2024
The format
Each post follows the same structure:
- One-line summary — what the paper actually says
- Why it still matters in 2026 — context for engineers, not historians
- Key ideas, explained simply — the concepts without the notation wall
- What it means for enterprise AI — architectural implications, practical constraints
- My take — where I agree, where I’d push back, what I’d do differently
Written for engineers who ship, not academics who cite.
Week 1 is live
All paper reviews are tagged paper-review.