Which sampling method selects the smallest set of tokens whose cumulative probability reaches the threshold?

Prepare for the GARP Risk and AI (RAI) Exam. Master concepts with flashcards and multiple-choice questions, each with hints and clarifications. Get exam-ready with extensive practice!

Multiple Choice

Which sampling method selects the smallest set of tokens whose cumulative probability reaches the threshold?

Explanation:
Top-P (Nucleus) Sampling works on the cumulative probability mass of the candidate tokens. It sorts tokens from most to least probable and keeps adding them until their total probability reaches a predefined threshold. The smallest group of tokens that reaches or exceeds that threshold becomes the nucleus; its probabilities are renormalized to sum to 1, and the next token is sampled from it. The size of the nucleus therefore adapts to how confident the model is: when the distribution is sharply peaked, only a few tokens are needed; when it is flat, more tokens are included.

For example, imagine the top tokens have probabilities 0.4, 0.3, 0.15, 0.08, and 0.07, and the threshold is 0.8. The running sums are 0.4, 0.7, 0.85. After two tokens the total (0.7) is still below 0.8; once you include the third token, you’ve reached or surpassed 0.8, so the nucleus consists of the first three tokens. The last two tokens (0.08 and 0.07) are excluded, and sampling occurs within that small set.
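The worked example can be reproduced with a minimal Python sketch. The function names `nucleus` and `sample_top_p` are hypothetical and chosen only for illustration; the sketch assumes the input is already a list of probabilities rather than raw model logits, and it is not taken from any particular library.

```python
import random

def nucleus(probs, p):
    """Return indices of the smallest set of tokens whose
    cumulative probability reaches or exceeds p."""
    # Rank token indices from most to least probable.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, total = [], 0.0
    for i in order:
        kept.append(i)
        total += probs[i]
        if total >= p:  # threshold reached: stop adding tokens
            break
    return kept

def sample_top_p(probs, p, rng=random):
    """Sample one token index from the nucleus (illustrative helper)."""
    kept = nucleus(probs, p)
    # random.choices treats weights as relative, which has the same
    # effect as renormalizing the nucleus probabilities to sum to 1.
    weights = [probs[i] for i in kept]
    return rng.choices(kept, weights=weights, k=1)[0]

probs = [0.4, 0.3, 0.15, 0.08, 0.07]
print(nucleus(probs, 0.8))       # [0, 1, 2]
print(sample_top_p(probs, 0.8))  # one of 0, 1, or 2
```

With a sharply peaked distribution such as `[0.9, 0.05, 0.05]` and the same threshold, `nucleus` keeps only the first token, which illustrates how the candidate set shrinks when the model is confident.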

This differs from Top-K, which keeps a fixed number of top tokens regardless of their combined probability mass. Temperature reshapes the distribution (sharper at low values, flatter at high values) but doesn’t define which tokens are considered. Statelessness is unrelated to how candidates are selected for sampling.
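To make the contrast concrete, here is a small Python sketch of the two alternatives. The helper names `top_k` and `apply_temperature` are hypothetical, and the temperature helper assumes a softmax over raw logits, which is the usual setup but is not stated in the explanation above.

```python
import math

def top_k(probs, k):
    """Return the indices of the k most probable tokens,
    whatever their combined probability mass is."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return order[:k]

def apply_temperature(logits, temperature):
    """Softmax over logits divided by a temperature (assumed > 0)."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    z = sum(exps)
    return [e / z for e in exps]

# Top-K keeps two tokens in both cases, although the first pair covers
# 0.95 of the probability mass and the second pair only 0.70.
print(top_k([0.9, 0.05, 0.05], 2))              # [0, 1]
print(top_k([0.4, 0.3, 0.15, 0.08, 0.07], 2))   # [0, 1]

# Temperature changes how peaked the distribution is, but every token
# keeps a non-zero probability: nothing is removed from consideration.
logits = [2.0, 1.0, 0.0]
sharp = apply_temperature(logits, 0.5)
flat = apply_temperature(logits, 2.0)
print(sharp[0] > flat[0])  # True
```

In practice these controls are often combined: temperature reshapes the distribution first, and Top-K or Top-P then decides which tokens remain candidates.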
