What is the parameter 'Top P' in the context of LLMs?

Top P is a sampling parameter used with Large Language Models (LLMs); the technique it controls is known as nucleus sampling. It restricts the model to choosing the next token from the smallest set of most-probable candidates whose cumulative probability reaches P (a value between 0 and 1). The probabilities of the candidates that remain are then renormalized before sampling.
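To make the definition concrete, here is a minimal sketch of nucleus sampling in plain Python. The function names (`top_p_filter`, `sample_top_p`) and the toy distribution are hypothetical and chosen only for illustration; real LLM libraries apply the same idea to the model's full vocabulary, usually on logits, and may differ in details such as tie handling.

```python
import random


def top_p_filter(probs, top_p):
    """Keep the smallest set of most-probable tokens whose cumulative
    probability reaches top_p, then renormalize what remains."""
    if not 0.0 < top_p <= 1.0:
        raise ValueError("top_p must be in (0, 1]")
    # Rank tokens from most to least probable.
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    nucleus, cumulative = [], 0.0
    for token, p in ranked:
        nucleus.append((token, p))
        cumulative += p
        if cumulative >= top_p:
            break  # the nucleus now covers at least top_p of the mass
    # Renormalize so the kept probabilities sum to 1 again.
    total = sum(p for _, p in nucleus)
    return {token: p / total for token, p in nucleus}


def sample_top_p(probs, top_p, rng=random):
    """Draw one token from the renormalized nucleus."""
    filtered = top_p_filter(probs, top_p)
    tokens = list(filtered)
    weights = list(filtered.values())
    return rng.choices(tokens, weights=weights, k=1)[0]


# Hypothetical next-token distribution (illustrative values only).
next_token_probs = {"cat": 0.5, "dog": 0.3, "bird": 0.15, "fish": 0.05}

# With top_p=0.7 only "cat" and "dog" survive; "bird" and "fish" are cut.
print(top_p_filter(next_token_probs, 0.7))
print(sample_top_p(next_token_probs, 0.7))
```

Sorting first matters here: the cutoff is applied to tokens in order of decreasing probability, so the unlikely tail is what gets dropped.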

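Like Temperature, it controls the randomness of the generated output. The two settings are usually exposed side by side, though the guide referenced below recommends adjusting one or the other rather than both.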

Lower Top P values:
- Consider only a small number of high-probability tokens.
- Tend to produce more deterministic, focused responses, which suits tasks that need exact, factual answers.

Higher Top P values:
- Consider a wider range of candidate tokens, including less likely ones.
- Tend to produce more creative and diverse responses.
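
The effect of lower versus higher values can be seen by counting how many candidate tokens survive the cutoff. The sketch below uses a made-up probability list and a hypothetical helper, `nucleus_size`; it illustrates the trend only and is not taken from any particular library.

```python
def nucleus_size(probs, top_p):
    """Count how many of the most probable tokens are needed
    for their cumulative probability to reach top_p."""
    cumulative = 0.0
    for count, p in enumerate(sorted(probs, reverse=True), start=1):
        cumulative += p
        if cumulative >= top_p:
            return count
    # Floating-point rounding can leave the total just under top_p;
    # in that case every token is kept.
    return len(probs)


# Hypothetical next-token probabilities (illustrative values only).
probs = [0.40, 0.25, 0.15, 0.10, 0.05, 0.03, 0.02]

for top_p in (0.1, 0.5, 0.92, 1.0):
    print(f"top_p={top_p}: {nucleus_size(probs, top_p)} candidate token(s)")
# top_p=0.1: 1 candidate token(s)
# top_p=0.5: 2 candidate token(s)
# top_p=0.92: 5 candidate token(s)
# top_p=1.0: 7 candidate token(s)
```

At a low value such as 0.1 the model can only ever pick the single most likely token, which is why the output becomes nearly deterministic. At 1.0 nothing is filtered out.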

Reference

https://www.promptingguide.ai/introduction/settings

JGKYM
