Time-Reversal Provides Unsupervised Feedback to LLMs
Subjects: cs.CL, cs.AI
Release Date: December 3, 2024
Authors: Yerram Varun, Rahul Madhavan¹, Sravanti Addepalli², Arun Suggala², Karthikeyan Shanmugam², Prateek Jain²
Affiliations: ¹Indian Institute of Science; ²Google DeepMind
| Model | Description |
| --- | --- |
| TRLM-Ba | Pre-trained in the reverse token order for previous-token prediction (Alg. 1 in the supplement); the instruction-tuned variant is FLAN fine-tuned (Longpre et al., 2023) in reverse token order. Scores the reversed question given a reversed answer combined with suitable prompts, and generates questions in the reverse direction when conditioned on answers in the reverse direction. Scoring: P(Reverse(Scoring Prompt + Query) \| Reverse(Conditioning Prompt + Answer)) (Alg. 2 in the supplement). Generation: generates Reverse(Query) conditioned on Reverse(Conditioning Prompt + Answer). |
| TRLM-Fo | Pre-trained in the usual forward token order. Scores the question given the answer using a prompt, and generates from the distribution conditioned on the answer. Scoring: P(Query \| Answer + Conditioning Prompt) (Alg. 3 in the supplement). Generation: generates the query conditioned on the answer and a conditioning prompt. |
| TRLM-FoBa (Reverse) | Pre-trained in both forward and reverse token order (Alg. 4 in the supplement), so it understands text in both directions. Scoring: identical to TRLM-Ba. Generation: identical to TRLM-Ba. |
| TRLM-FoBa (Forward) | Pre-trained in both forward and reverse token order. Scoring: identical to TRLM-Fo. Generation: identical to TRLM-Fo. |
| Self Scoring | The model used to generate a given response also scores responses given queries, in the conventional forward scoring direction. Scoring: the model's own perplexity scores serve as feedback for selecting responses. |
| Forward Baseline | A conventional forward model trained for next-token prediction on the same training corpus and with the same model class as TRLM. Scoring: whereas self-scoring uses the perplexity of the generator model itself, this baseline uses the perplexity of a separate forward model. |
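
To make the two feedback directions concrete, the sketch below scores a query–answer pair both ways with a Hugging Face-style causal LM: forward scoring in the TRLM-Fo style, P(Query \| Answer + Conditioning Prompt), and reverse scoring in the TRLM-Ba style over token-reversed text. This is a minimal illustration under assumptions, not the paper's code: the checkpoint names, prompts, and the `conditional_logprob` and `reverse_tokens` helpers are placeholders, and a real TRLM-Ba would be a checkpoint actually pre-trained on reversed token order.

```python
# Minimal sketch (assumed setup, not the paper's code) of forward vs. reverse scoring
# with a Hugging Face-style causal LM. Checkpoints and prompts are placeholders.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")            # placeholder checkpoint
forward_model = AutoModelForCausalLM.from_pretrained("gpt2")
reverse_model = forward_model  # stand-in: a real TRLM-Ba is pre-trained on reversed tokens

def conditional_logprob(model, context: str, target: str) -> float:
    """Sum of log P(target tokens | context tokens) under a causal LM."""
    ctx = tokenizer(context, return_tensors="pt").input_ids
    tgt = tokenizer(target, add_special_tokens=False, return_tensors="pt").input_ids
    ids = torch.cat([ctx, tgt], dim=1)
    with torch.no_grad():
        logits = model(ids).logits
    log_probs = torch.log_softmax(logits[0, :-1], dim=-1)  # position i predicts token i+1
    start = ctx.shape[1] - 1                                # first position predicting a target token
    return sum(log_probs[pos, ids[0, pos + 1]].item() for pos in range(start, ids.shape[1] - 1))

def reverse_tokens(text: str) -> str:
    # Word-level reversal as a stand-in for the paper's token-order reversal.
    return " ".join(reversed(text.split()))

query = "What is the capital of France?"
answer = "Paris is the capital of France."

# TRLM-Fo style: P(Query | Answer + Conditioning Prompt) under a forward model.
fo_score = conditional_logprob(
    forward_model,
    context=answer + "\nWhich question could this text answer?\n",  # illustrative prompt
    target=query,
)

# TRLM-Ba style: P(Reverse(Scoring Prompt + Query) | Reverse(Conditioning Prompt + Answer))
# under a model pre-trained on reversed token order.
ba_score = conditional_logprob(
    reverse_model,
    context=reverse_tokens("Answer: " + answer),
    target=reverse_tokens("Question: " + query),
)
print(f"forward (TRLM-Fo) score: {fo_score:.2f}, reverse (TRLM-Ba) score: {ba_score:.2f}")
```

In the paper's setup the reverse-direction score comes from a model actually trained on reversed token order (TRLM-Ba, or the reverse mode of TRLM-FoBa); the same forward checkpoint stands in for both directions here only to keep the sketch self-contained and runnable.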