The probability of any single token when a language model holds maximum entropy. Not zero. Not silence. Every word simultaneously, at equal weight, cancelling.
Four tracks in the machine's own notation — float16 hex values, token indices, probability distributions — read back as spoken word. A second voice runs underneath: the same data, stripped further, looping independently, out of phase.
The A-side plays the human-readable version loud, the data underneath. The B-side inverts this. Neither explains the other.
DORKHOLM 2026
Temperature — the parameter that divides the logits before the softmax, reshaping the probability distribution before sampling. At T=0.0001 the distribution collapses: the highest-probability token is chosen with near-certainty every time. Greedy, deterministic, no deviation possible.
The same temperature parameter pushed far above 1.0, which flattens the distribution — low-probability tokens become competitive with high-probability ones. Output becomes diffuse, unpredictable, prone to incoherence. The model is still choosing; it just can no longer trust its own weights.
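The two regimes can be sketched with a toy softmax — the three logits below are illustrative, not drawn from any model:

```python
import math

def temperature_softmax(logits, temperature):
    """Divide logits by T, then softmax. Low T sharpens; high T flattens."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [4.0, 2.0, 0.0]

cold = temperature_softmax(logits, 0.0001)  # near-greedy: top token takes ~all mass
hot = temperature_softmax(logits, 100.0)    # near-uniform: every token competitive
```

At T=0.0001 the gap between logits is magnified twenty-thousand-fold and the top token's probability rounds to 1; at T=100 the same gap shrinks toward nothing and the three probabilities converge on a third each.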
The probability of any single token when the model holds maximum entropy across a vocabulary of 128,000 tokens. Not zero — every token is simultaneously possible, at identical, cancelling weight. This is the condition the EP is named after. Silence that isn't silence.
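The arithmetic behind the title, assuming the 128,000-token vocabulary named above:

```python
import math

vocab_size = 128_000
p = 1.0 / vocab_size                  # probability of each token under a uniform distribution
entropy_bits = math.log2(vocab_size)  # maximum possible entropy, in bits

# p is roughly 7.8e-06 — vanishingly small, but not zero
# entropy_bits is roughly 16.97 — the ceiling the model sits at
```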
Token index 50257 in GPT-2-style tokenizers — the first index beyond the base vocabulary of 50,257 tokens, typically assigned to a padding token added after the fact, used to fill sequences to a fixed length when batching. It carries no semantic content. It is the model speaking into the space after meaning has ended, or before it has begun.