Part IX — Post-training

RLHF (PPO derivation, InstructGPT recipe)

Content coming soon.