Part IX — Post-training
RLHF (PPO derivation, InstructGPT recipe)
Content coming soon.