Interpreting The Bradley-Terry Model for Preferences

Bradley-Terry (BT) Model

$$p^{*}(y_{1} \succ y_{2} \mid x) = \frac{\exp(r^{*}(x, y_{1}))}{\exp(r^{*}(x, y_{1})) + \exp(r^{*}(x, y_{2}))}$$
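
As a short worked step (not quoted from the note itself), dividing the numerator and denominator by $\exp(r^{*}(x, y_{1}))$ shows that the preference probability depends only on the difference between the two rewards. Here $\sigma$ denotes the logistic sigmoid, and the reward gap of 1 in the last line is an assumed value chosen purely for illustration:

$$p^{*}(y_{1} \succ y_{2} \mid x) = \frac{1}{1 + \exp\left(-\left(r^{*}(x, y_{1}) - r^{*}(x, y_{2})\right)\right)} = \sigma\left(r^{*}(x, y_{1}) - r^{*}(x, y_{2})\right)$$

$$r^{*}(x, y_{1}) - r^{*}(x, y_{2}) = 1 \;\Rightarrow\; p^{*}(y_{1} \succ y_{2} \mid x) = \sigma(1) \approx 0.731$$

One consequence is that adding the same constant to both rewards leaves the preference probability unchanged.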

Interpretation

This equation models the probability that people prefer one response ($y_{1}$) over another ($y_{2}$) when given a specific prompt ($x$).

A latent (unobserved) reward function $r^{*}(x, y)$: This function quantifies "how good" a response $y$ is for a given prompt $x$. It cannot be observed directly, so this underlying "goodness" is what a reward model tries to estimate from preference data.
Exponential transformation: The $\exp(\cdot)$ function maps each reward, which can be any real number, to a strictly positive score. This is important because the scores are then normalized into probabilities, which must be non-negative.
Softmax: The equation is a softmax over the two rewards. It normalizes the exponentiated rewards so that the two preference probabilities each lie between 0 and 1 and sum to 1. This allows us to read the output directly as the probability that $y_{1}$ is preferred over $y_{2}$.
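
The three points above can be checked numerically. The following is a minimal Python sketch, not code from the paper: the function name `bt_preference_prob`, the helper `sigmoid`, and the reward values are all hypothetical and chosen only for illustration. It computes the Bradley-Terry probability as a two-way softmax and compares it with the logistic sigmoid of the reward difference.

```python
import math


def bt_preference_prob(r1: float, r2: float) -> float:
    """Probability that y1 is preferred over y2, given their scalar rewards."""
    # Subtract the larger reward before exponentiating to avoid overflow.
    # The shift cancels in the ratio, so the result is unchanged.
    m = max(r1, r2)
    e1 = math.exp(r1 - m)
    e2 = math.exp(r2 - m)
    return e1 / (e1 + e2)


def sigmoid(z: float) -> float:
    """Numerically stable logistic sigmoid."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


# Hypothetical rewards for two responses to the same prompt (assumed values).
r_y1, r_y2 = 2.0, 1.0

p = bt_preference_prob(r_y1, r_y2)
print(round(p, 4))  # 0.7311
```

In this sketch, negative rewards are handled without trouble because exponentiation makes every score positive, swapping the two arguments gives the complementary probability, and shifting both rewards by the same constant leaves the result unchanged.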

(Rafailov et al., 2024, p. 3)

Reference

Rafailov, R., Sharma, A., Mitchell, E., Ermon, S., Manning, C. D., & Finn, C. (2024). Direct Preference Optimization: Your Language Model Is Secretly a Reward Model. arXiv. https://doi.org/10.48550/arXiv.2305.18290

