Part X — Inference & Serving

High-throughput Serving (vLLM / PagedAttention)

Content coming soon.