Do language models store memories like Hopfield networks? This experiment log follows a series of encode-decode experiments revealing that a small memory model exhibits attractor dynamics, semantic denoising, and factual error correction — behaving less like a text generator and more like an associative memory system.
The model is a small "memory" system intentionally overfit on TriviaQA contexts: a frozen BGE-M3 encoder (1024-dim) feeds through a MultiEmbeddingAdapter (32 prefix queries with cross-attention) into a GPT-2 decoder (768-dim). It doesn't generalize — by design. It memorizes.
The key signal we're testing is Cycle Gap (CG): encode some text, generate from the encoding, re-encode the generated text, then measure how far the second encoding drifts from the first. If the model "knows" the input well, the cycle is tight. If the input is unfamiliar or corrupted, the cycle tears open.
We compare CG against the standard escalation signal: entropy (average per-token uncertainty during generation). The central question: can CG catch failures that entropy misses?
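Both signals can be sketched in a few lines. This is a minimal sketch, assuming nothing about the real model: `toy_encode` and `toy_generate` are hypothetical stand-ins for the frozen BGE-M3 encoder and the GPT-2 decoder.

```python
import hashlib
import numpy as np

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def cycle_gap(encode, generate, text):
    """CG = 1 - cos(z1, z2): encode, generate, re-encode, measure the drift."""
    z1 = encode(text)
    z2 = encode(generate(z1))
    return 1.0 - cosine(z1, z2)

def mean_token_entropy(step_distributions):
    """Average per-token entropy (nats) over one generation."""
    ents = [-np.sum(np.asarray(p) * np.log(np.asarray(p) + 1e-12))
            for p in step_distributions]
    return float(np.mean(ents))

# Toy stand-ins: a deterministic hash-seeded "encoder" and a constant "decoder".
def toy_encode(text, dim=16):
    seed = int(hashlib.md5(text.encode()).hexdigest()[:8], 16)
    return np.random.default_rng(seed).standard_normal(dim)

toy_generate = lambda z: "a fixed memorized answer"

gap = cycle_gap(toy_encode, toy_generate, "Who directed 2001: A Space Odyssey?")
```

With the constant toy decoder, feeding its own output back produces a perfectly tight cycle (CG = 0), which is exactly the "model knows the input" regime.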
The first experiments establish when CG works and when it doesn't. The critical variable: the input must be in-distribution. When it is, CG dominates entropy. When it isn't, both signals are blind.
CG detects confident hallucination — the dangerous case where the model is certain but wrong. In Experiment 6, 28 samples had low entropy (model "confident") but high CG (cycle torn open). Entropy alone would miss these entirely.
What happens when you run the encode-decode cycle repeatedly? Text → encode → decode → text → encode → decode → ... The system reveals its dynamical structure.
Without anchoring, the system is dissipative. Each cycle adds sampling noise that accumulates. In-distribution contexts converge fastest (6.4 iterations on average), combined questions slowest (18.0). Semantic drift is real: "2001: A Space Odyssey" → Spielberg → Kubrick → Hitchcock — the system wanders through related attractors.
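The unanchored loop can be sketched as below. The convergence criterion (embedding drift below a tolerance) is an assumption, and the hash-seeded encoder is a hypothetical toy, not the real BGE-M3.

```python
import hashlib
import numpy as np

def embed(text, dim=16):
    """Toy deterministic encoder: one fixed random vector per distinct text."""
    seed = int(hashlib.md5(text.encode()).hexdigest()[:8], 16)
    return np.random.default_rng(seed).standard_normal(dim)

def iterations_to_converge(encode, generate, text, max_iters=30, tol=1e-3):
    """text -> encode -> decode -> text -> ... until the embedding stops moving."""
    z_prev = encode(text)
    for i in range(1, max_iters + 1):
        text = generate(z_prev)
        z = encode(text)
        drift = 1.0 - np.dot(z, z_prev) / (np.linalg.norm(z) * np.linalg.norm(z_prev))
        z_prev = z
        if drift < tol:
            return i
    return max_iters

# A generator that always lands on one memorized text converges almost immediately.
n = iterations_to_converge(embed, lambda z: "Stanley Kubrick directed it.",
                           "Who directed 2001?")
```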
When the original query drives each iteration (concatenated to the generated text), two failure modes emerge. The surprising one: completely out-of-distribution queries (cooking, physics, biology) stabilize at low CG — similar to correct single-topic queries. Why? The model drifts to the nearest training content. "Sourdough bread" → "sugary drink" → "kitchen" → "silk fiber." The generated text IS training data, so CG is low.

CG alone is insufficient. It measures "does the model know what it's generating" — not "is it answering the right question."
You need a second signal: cos(generated, query) — the relevance between output and input.
Together they give a three-way classification:
| Pattern | CG | Relevance | Interpretation |
|---|---|---|---|
| ✓ Reliable | Low | High | Confident and relevant |
| ⚠ Hallucination | High / oscillating | Medium | Uncertain, mixing topics |
| ✗ OOD drift | Low | Low | Confident but irrelevant |
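The table reduces to a two-threshold rule. The cutoff values below are illustrative assumptions, not calibrated numbers from these experiments.

```python
def classify(cg, relevance, cg_drift=0.02, rel_low=0.5):
    """Three-way escalation decision from Cycle Gap and cos(generated, query).
    Thresholds are illustrative placeholders."""
    if cg >= cg_drift:
        return "hallucination"   # cycle torn open or oscillating
    if relevance >= rel_low:
        return "reliable"        # tight cycle, on-topic output
    return "ood_drift"           # tight cycle, but answering the wrong question
```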
If the model has attractor dynamics, it should behave like a Hopfield network: recover clean patterns from noisy inputs. We test this by randomly replacing content words with random English words, then feeding the corrupted text through the encode-decode cycle.
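The corruption procedure can be sketched as follows. One simplification: this swaps any word, whereas the experiment targets content words (which would need a POS filter), and `vocab` stands in for a random-English-word list.

```python
import random

def corrupt(text, noise_level, vocab, seed=0):
    """Replace a fraction of the words with random words from vocab."""
    rng = random.Random(seed)
    words = text.split()
    n_swap = round(noise_level * len(words))
    for i in rng.sample(range(len(words)), n_swap):
        words[i] = rng.choice(vocab)
    return " ".join(words)

noisy = corrupt("the treaty ended the war after long negotiations",
                0.25, ["marsh", "horse", "silk", "kitchen"])
```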
The denoising is semantic, not lexical. The model doesn't copy surviving words — it generates correct replacements from learned associations.
Near an attractor (low noise), the model converges to the stored pattern. Far from any attractor (50% noise), it drifts to wrong patterns. Graceful degradation — the denoising ratio drops smoothly: 5.37 → 3.30 → 1.45 as noise increases. Each stored pattern has a basin of attraction with finite radius.
Two types of perturbation: rephrasing (different words, same facts) and factual errors (same words, wrong facts). Both are corrected 70% of the time — but through different mechanisms.
Rephrasing: human-quality paraphrases with completely different wording. The model converges toward the original 70% of the time. The strongest example: the Korean War passage — original and rephrase produce literally identical output despite completely different input wording. The weakest: specific narrative details with proper names.
Factual errors: same sentence structure, wrong names/dates/numbers. Also 70% correction — but the mechanism is different. Familiar structure acts as a scaffold that helps the model slot in correct facts:
Factual errors: convergence 0.945, entropy ratio 2.8×.
Rephrases: convergence 0.919, entropy ratio 6.3×.
The model is more confident with familiar structure + wrong facts than with unfamiliar structure + correct facts.
Structure scaffolds correction.
The hardest test: change both the wording and the facts simultaneously. If the two attractor systems (form and content) were independent, we'd expect 49% correction (0.7 × 0.7). We get 20%.
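The independence prediction is simple arithmetic: if form correction and content correction were independent events, combined recovery would be the product of the marginal rates.

```python
p_form = 0.70        # recovery rate when only wording is changed
p_content = 0.70     # recovery rate when only facts are changed
predicted_if_independent = p_form * p_content   # 0.49
observed = 0.20                                 # measured combined recovery
coupling_gap = predicted_if_independent - observed
```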
The proof is the Coolidge pair: with familiar structure and wrong facts, the model produces perfect output (cos = 1.000). With unfamiliar structure and the same wrong facts, it fails (cos = 0.843). Losing form actively undermines factual correction. The attractor systems are coupled.
Form and content attractors reinforce each other. When both are perturbed, the model doesn't degrade gracefully — it collapses. Only technical terminology (Red Book, CD-ROM) survives the combined assault, suggesting an attractor robustness hierarchy: technical terms > proper names > narrative details.
Does iterating the correction cycle help? We run 8 iterations on each perturbed input.
The first encode-decode cycle does all the correction. Additional iterations always degrade. In 7–8 out of 10 cases across all perturbation types, quality decreases with more iterations. The system is dissipative — sampling noise accumulates.
But there's an exception. Strong-attractor content reaches true fixed points: CG = 0.000, identical output every cycle. These are genuine fixed points of the dynamical system — the model generates exactly the same text, character for character, on every subsequent iteration.
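Fixed-point detection is just an exact-repetition check. In the sketch below, `identity_encode` and the two toy generators are hypothetical stand-ins (the "embedding" is the text itself) to keep the example runnable.

```python
def fixed_point_iteration(encode, generate, text, max_iters=8):
    """Return the iteration at which output first exactly repeats (CG = 0),
    or None if the trajectory never settles within max_iters."""
    prev = None
    for i in range(1, max_iters + 1):
        out = generate(encode(text))
        if out == prev:
            return i - 1          # output was already fixed one step earlier
        prev, text = out, out
    return None

identity_encode = lambda t: t                          # toy: embedding == text
stable = lambda z: "The Red Book defines CD audio."    # deep attractor
drifting = lambda z: z + " etc."                       # dissipative drift
```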
| Content | Group | Fixed at iter | cos to original |
|---|---|---|---|
| Barriers / society | rephrase | 1 | 0.991 |
| Monkees | factual error | 1 | 0.981 |
| CD specifications | factual error | 2 | 0.983 |
| CD specifications | rephrase | 3 | 0.974 |
| Grand Prix | rephrase | 4 | 0.952 |
CG after one iteration predicts which class a pattern belongs to. CG ≈ 0 means stable fixed point. CG > 0.02 means drift. This refines the Hopfield analogy: the system has deep basins (true fixed points for strong patterns) and shallow basins (dissipative for weak patterns).
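As a predictor, this is a one-threshold split on CG after the first iteration. The cutoffs below are the values quoted in the log (CG ≈ 0 and CG > 0.02); how to treat the band in between is left open, so it is labeled as such here.

```python
def basin_from_cg(cg_iter1, zero_eps=1e-6, drift_thresh=0.02):
    """Predict basin depth from Cycle Gap after one iteration."""
    if cg_iter1 < zero_eps:
        return "deep"            # true fixed point: identical output every cycle
    if cg_iter1 > drift_thresh:
        return "shallow"         # dissipative: sampling noise accumulates
    return "indeterminate"       # between the two quoted thresholds
```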
At low noise, 96% of outputs converge to the original. The corrections are semantic — "horse" → "negotiations", "marsh" → "leader" — not lexical copying. The Hopfield analogy holds: encode-decode cycles have basins of attraction around stored patterns.
CG beats entropy precisely when it matters most: when the model is confident but wrong. Entropy says "all good." CG says "the cycle is torn." This is the escalation signal the System 1/System 2 architecture needs.
Changing form OR content alone: 70% recovery. Changing both: 20% — far worse than the 49% predicted by independence. Familiar structure scaffolds factual correction. This explains why LLMs struggle with novel framings of familiar facts.
Technical terminology (Red Book, CD-ROM) > Proper names (Ralph, Coolidge) > Narrative details (dates, descriptions). Conceptually distinctive terms form the deepest basins of attraction.
The first encode-decode cycle extracts all available correction. More iterations add noise. But strong attractors reach true fixed points (CG = 0) — genuinely stable states of the dynamical system. CG at iteration 1 predicts whether a pattern is in a deep or shallow basin.
Ongoing work — part of the cognitive offloading project at IBM Research.
Model: BGE-M3 → MultiEmbeddingAdapter → GPT-2, trained on TriviaQA.