track titles

T=0.0001

Temperature — the parameter that divides the logits before the softmax turns them into a sampling distribution. At T=0.0001 the distribution collapses: the highest-probability token is chosen with near-certainty every time. Greedy, deterministic, no deviation possible.
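A minimal numerical sketch of that collapse (the logits and four-token vocabulary here are made up for illustration):

```python
import math

def softmax_with_temperature(logits, T):
    """Divide logits by T, then normalize into a probability distribution."""
    scaled = [z / T for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits for a four-token vocabulary.
logits = [2.0, 1.0, 0.5, -1.0]

# At T=0.0001 the scaled gaps between logits become enormous:
# the softmax collapses onto the argmax and sampling is effectively greedy.
probs = softmax_with_temperature(logits, 0.0001)
```

All of the probability mass lands on the first token; every other entry underflows to zero.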

T=1.7

The same temperature parameter pushed far above 1.0, which flattens the distribution — low-probability tokens become competitive with high-probability ones. Output becomes diffuse, unpredictable, prone to incoherence. The model is still choosing; it just can no longer trust its own weights.
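The flattening can be measured directly as a rise in entropy. A self-contained sketch, again with invented logits:

```python
import math

def softmax(logits, T):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [z / T for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    s = sum(exps)
    return [e / s for e in exps]

def entropy_bits(p):
    """Shannon entropy of a distribution, in bits."""
    return -sum(q * math.log2(q) for q in p if q > 0)

logits = [2.0, 1.0, 0.5, -1.0]  # hypothetical logits

p_base = softmax(logits, 1.0)   # the model's native distribution
p_hot = softmax(logits, 1.7)    # pushed far above 1.0

# High temperature flattens the distribution: entropy rises, the top
# token loses mass, and low-probability tokens become competitive.
```

At T=1.7 the leading token's share drops and the tail gains, which is exactly what makes the output diffuse.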

p(x)=1/128000

The probability of any single token when the model holds maximum entropy across a vocabulary of 128,000. Not zero — every token is simultaneously possible, at identical, cancelling weight. This is the condition the EP is named after. Silence that isn't silence.
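The condition works out to two numbers:

```python
import math

V = 128_000  # vocabulary size

p = 1 / V         # probability of any single token at maximum entropy
H = math.log2(V)  # entropy of the uniform distribution, in bits

# p ≈ 7.8e-06: never zero, never favored.
# H ≈ 16.97 bits: the most uncertain a 128,000-token model can be.
```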

[PAD]

Token index 50257 — the slot a padding token occupies when added to GPT-2's 50,257-token vocabulary, used to fill sequences to a fixed length during training. It carries no semantic content. It is the model speaking into the space after meaning has ended, or before it has begun.
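A sketch of what padding does mechanically, assuming a pad token appended after GPT-2's vocabulary at index 50257 (the token ids in the batch are arbitrary):

```python
PAD_ID = 50257  # assumed index of a pad token appended to a 50,257-token vocabulary

def pad_batch(sequences, max_len, pad_id=PAD_ID):
    """Right-pad token-id sequences to a fixed length, with a mask
    marking which positions carry content (1) and which are padding (0)."""
    padded, masks = [], []
    for seq in sequences:
        fill = max_len - len(seq)
        padded.append(seq + [pad_id] * fill)
        masks.append([1] * len(seq) + [0] * fill)
    return padded, masks

batch, mask = pad_batch([[15496, 995], [40]], max_len=4)
```

The mask is what keeps [PAD] semantically silent: attention and loss are computed only where the mask is 1, so the pad positions exist without being heard.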

one over 128000

01 · T=0.0001
02 · T=1.7
03 · p(x)=1/128000
04 · [PAD]