Definition 1

We say that two reward functions $r(x, y)$ and $r'(x, y)$ are equivalent iff $r(x, y) - r'(x, y) = f(x)$ for some function $f$. — Rafailov et al. (2024), p. 5

According to the definition, two reward functions $r(x, y)$ and $r'(x, y)$ are considered equivalent if their difference depends only on the prompt $x$ and not on the response $y$. This difference is expressed as a function $f(x)$.

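For example, suppose that for a given prompt $x$ there are five possible responses $y_1, y_2, y_3, y_4, y_5$. The difference between two equivalent reward functions $r(x, y)$ and $r'(x, y)$ is then the same for all five responses:

$$r(x, y_1) - r'(x, y_1) = r(x, y_2) - r'(x, y_2) = \dots = r(x, y_5) - r'(x, y_5) = f(x)$$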

This means that for the same prompt $x$, the difference between the values of the two reward functions is always the same value, $f(x)$, regardless of which response $y$ is generated. The value $f(x)$ may still differ from one prompt to another.
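
The definition can be checked numerically on a small, finite example. The sketch below is only an illustration and is not taken from the paper: the function name `are_equivalent`, the prompt keys `x1` and `x2`, and all reward values are made up. It represents each reward function as a table mapping a prompt to the rewards of its five responses, and tests whether the per-response differences are constant within each prompt.

```python
import math


def are_equivalent(r, r_prime, tol=1e-9):
    """Check Definition 1 on finite reward tables (hypothetical helper).

    r and r_prime map each prompt to a list of rewards, one per response.
    They are equivalent if, for every prompt, r - r_prime takes the same
    value across all responses to that prompt.
    """
    for prompt in r:
        diffs = [a - b for a, b in zip(r[prompt], r_prime[prompt])]
        # Every difference must match the first one for this prompt.
        if any(not math.isclose(d, diffs[0], abs_tol=tol) for d in diffs):
            return False
    return True


# Made-up rewards for two prompts with five responses each.
r = {
    "x1": [1.0, 0.5, -0.25, 2.0, 0.0],
    "x2": [0.0, 4.0, 1.5, -2.0, 0.75],
}

# Assumed offsets f(x): they depend on the prompt, never on the response.
f = {"x1": 3.0, "x2": -1.0}

# r_prime(x, y) = r(x, y) - f(x), so r - r_prime = f(x) for every response.
r_prime = {x: [v - f[x] for v in vals] for x, vals in r.items()}

# Perturb a single response: the difference now depends on y as well.
r_other = {x: list(vals) for x, vals in r_prime.items()}
r_other["x1"][4] += 1.0

print(are_equivalent(r, r_prime))  # True
print(are_equivalent(r, r_other))  # False
```

Note that the offsets for `x1` and `x2` are different, which is allowed: the definition only requires the difference to be constant across responses to the same prompt.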

See also

Understanding Lemma 1 in DPO
Understanding Lemma 2 in DPO

Reference

Rafailov, R., Sharma, A., Mitchell, E., Ermon, S., Manning, C. D., & Finn, C. (2024). Direct Preference Optimization: Your Language Model Is Secretly a Reward Model. arXiv. https://doi.org/10.48550/arXiv.2305.18290