JGKYM

Recent Notes

  • Resolving CUDA Initialization Errors with Accelerate in Kaggle Notebooks (Jun 18, 2025)
  • How We Find ROIs (Jun 10, 2025)
  • Deformable ROI Pooling: A Flexible Approach to Feature Extraction (Jun 10, 2025)
  • The Core Idea of Supervised Contrastive Learning (Jun 08, 2025)
  • Three Main Types of Distributed Training (Jun 04, 2025)


Tag: reinforcement-learning

11 items with this tag.

  • Interpreting The Bradley-Terry Model for Preferences (May 25, 2025) · alignment, reinforcement-learning, dpo
  • Interpreting The Optimization in RL Fine-Tuning within RLHF (May 25, 2025) · finetuning, reinforcement-learning, dpo
  • Bradley-Terry Model Is Just A Logistic Function (May 25, 2025) · alignment, reinforcement-learning, dpo
  • Why DPO Instead of RLHF? RL Is Expensive (May 25, 2025) · finetuning, reinforcement-learning, dpo
  • Understanding Reward Modeling in RLHF (May 25, 2025) · finetuning, reinforcement-learning, dpo
  • Implicit Reward Functions in DPO (May 25, 2025) · alignment, reinforcement-learning, dpo
  • The Partition Function—Making RLHF Computations Difficult (May 25, 2025) · alignment, reinforcement-learning, dpo
  • Understanding Lemma 2 in DPO (May 14, 2025) · alignment, reinforcement-learning, dpo
  • Understanding Lemma 1 in DPO (May 14, 2025) · alignment, reinforcement-learning
  • Understanding The Equivalence Between Two Reward Models in DPO (May 14, 2025) · alignment, reinforcement-learning, dpo
  • Avoiding Explicit Reward Models in DPO (May 12, 2025) · alignment, reinforcement-learning, dpo
