The probability of any single token when a language model holds maximum entropy. Not zero. Not silence. Every word simultaneously, at equal weight, cancelling.
Four tracks in the machine's own notation — float16 hex values, token indices, probability distributions — read back as spoken word. A second voice runs underneath: the same data, stripped further, looping independently, out of phase.
The A-side plays the human-readable version loud, the data underneath. The B-side inverts this. Neither explains the other.
DORKHOLM 2026
Temperature — the parameter that divides the logits before the softmax, reshaping the probability distribution before sampling. At T=0.0001 the distribution collapses: the highest-probability token is chosen with near-certainty every time. Greedy, deterministic, no deviation possible.
The same temperature parameter pushed far above 1.0, which flattens the distribution — low-probability tokens become competitive with high-probability ones. Output becomes diffuse, unpredictable, prone to incoherence. The model is still choosing; it just can no longer trust its own weights.
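The two regimes can be sketched with a toy softmax — the three logits below are illustrative, not drawn from any model:

```python
import math

def temperature_softmax(logits, temperature):
    """Divide logits by T, then softmax. Low T sharpens; high T flattens."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [4.0, 2.0, 0.0]

cold = temperature_softmax(logits, 0.0001)  # near-greedy: top token takes ~all mass
hot = temperature_softmax(logits, 100.0)    # near-uniform: every token competitive
```

At T=0.0001 the gap between logits is magnified twenty-thousand-fold and the top token's probability rounds to 1; at T=100 the same gap shrinks toward nothing and the three probabilities converge on a third each.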
The probability of any single token when the model holds maximum entropy across a vocabulary of 128,000 tokens. Not zero — every token is simultaneously possible, at identical, cancelling weight. This is the condition the EP is named after. Silence that isn't silence.
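The arithmetic behind the title, assuming the 128,000-token vocabulary named above:

```python
import math

vocab_size = 128_000
p = 1.0 / vocab_size                  # probability of each token under a uniform distribution
entropy_bits = math.log2(vocab_size)  # maximum possible entropy, in bits

# p is roughly 7.8e-06 — vanishingly small, but not zero
# entropy_bits is roughly 16.97 — the ceiling the model sits at
```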
Token index 50257 in GPT-2-style tokenizers — the first index beyond the base vocabulary of 50,257 tokens, typically assigned to a padding token added after the fact, used to fill sequences to a fixed length when batching. It carries no semantic content. It is the model speaking into the space after meaning has ended, or before it has begun.