JGKYM

Recent Notes

  • Resolving CUDA Initialization Errors with Accelerate in Kaggle Notebooks (Jun 18, 2025)
  • How We Find ROIs (Jun 10, 2025)
  • Deformable ROI Pooling: A Flexible Approach to Feature Extraction (Jun 10, 2025)
  • The Core Idea of Supervised Contrastive Learning (Jun 08, 2025)
  • Three Main Types of Distributed Training (Jun 04, 2025)


Tag: reinforcement-learning

11 items with this tag.

  • Interpreting The Bradley-Terry Model for Preferences (May 25, 2025) · alignment, reinforcement-learning, dpo
  • Interpreting The Optimization in RL Fine-Tuning within RLHF (May 25, 2025) · finetuning, reinforcement-learning, dpo
  • Bradley-Terry Model Is Just A Logistic Function (May 25, 2025) · alignment, reinforcement-learning, dpo
  • Why DPO Instead of RLHF? RL Is Expensive (May 25, 2025) · finetuning, reinforcement-learning, dpo
  • Understanding Reward Modeling in RLHF (May 25, 2025) · finetuning, reinforcement-learning, dpo
  • Implicit Reward Functions in DPO (May 25, 2025) · alignment, reinforcement-learning, dpo
  • The Partition Function—Making RLHF Computations Difficult (May 25, 2025) · alignment, reinforcement-learning, dpo
  • Understanding Lemma 2 in DPO (May 14, 2025) · alignment, reinforcement-learning, dpo
  • Understanding Lemma 1 in DPO (May 14, 2025) · alignment, reinforcement-learning
  • Understanding The Equivalence Between Two Reward Models in DPO (May 14, 2025) · alignment, reinforcement-learning, dpo
  • Avoiding Explicit Reward Models in DPO (May 12, 2025) · alignment, reinforcement-learning, dpo
