Definition 1
"We say that two reward functions $r(x, y)$ and $r'(x, y)$ are equivalent iff $r(x, y) - r'(x, y) = f(x)$ for some function $f$." (Rafailov et al., 2024, p. 5)
According to the definition, two reward functions, $r(x, y)$ and $r'(x, y)$, are considered equivalent if their difference depends only on the prompt $x$ and not on the response $y$. This difference is expressed as a function $f(x)$.
For example, if a given prompt $x$ has five possible responses $y_1, \dots, y_5$, the difference between two equivalent reward functions $r$ and $r'$ will be the same for all of them, as shown below:

$$r(x, y_i) - r'(x, y_i) = f(x), \quad i = 1, \dots, 5$$

This means that for the same prompt $x$, the difference between the values of the two reward functions is always the same value, $f(x)$, regardless of which response is generated. That value may still change from one prompt to another.
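The definition can be checked numerically on a toy example. The sketch below is only an illustration: the reward functions `r`, `r_shifted`, and `r_other`, the shift `f`, and the helper `is_equivalent` are all hypothetical names invented here, not part of the paper, and the check covers only the finite set of prompts and responses it is given.

```python
import math

def is_equivalent(r, r_prime, prompts, responses, tol=1e-9):
    """Check, on a finite sample, that r(x, y) - r_prime(x, y) is the same
    for every response y under each prompt x (i.e. it depends on x only)."""
    for x in prompts:
        diffs = [r(x, y) - r_prime(x, y) for y in responses]
        if any(not math.isclose(d, diffs[0], abs_tol=tol) for d in diffs):
            return False
    return True

# Hypothetical toy reward: scores a response by its length.
def r(x, y):
    return 0.5 * len(y) + 0.1 * len(x)

# Hypothetical shift that depends on the prompt only.
def f(x):
    return 2.0 * len(x)

# Equivalent to r: the difference r - r_shifted equals f(x) for every y.
def r_shifted(x, y):
    return r(x, y) - f(x)

# Not equivalent to r: the difference 0.3 * len(y) changes with the response.
def r_other(x, y):
    return r(x, y) - 0.3 * len(y)

prompts = ["hi", "explain DPO"]
responses = ["a", "bb", "ccc", "dddd", "eeeee"]  # five responses per prompt

print(is_equivalent(r, r_shifted, prompts, responses))  # True
print(is_equivalent(r, r_other, prompts, responses))    # False
```

Note that the difference is allowed to vary between the two prompts; it only has to stay fixed across the five responses to the same prompt.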
See also
Understanding Lemma 1 in DPO
Understanding Lemma 2 in DPO