Research agenda
The next version of this archive should move from vivid metaphor to measurable signals. The handoff research points to five repeatable measurement programs.
- Relational-safety benchmark: compare model behavior across long histories, memory settings, vulnerability cues, and off-ramp opportunities.
- Recursive-contamination benchmark: mix human and synthetic corpora at known rates, then track rare-topic retention and divergence from human holdouts.
- Retrieval-collapse benchmark: measure source entropy, synthetic exposure, and answer quality across time-sliced web snapshots.
- Creative-market substitution benchmark: track attribution, licensing state, output similarity, traffic displacement, and creator-side economics.
- Agentic-security benchmark: compare least-privilege designs, prompt-injection resistance, tool-call safety, and non-human identity monitoring.
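
Two of the signals named above are simple enough to sketch directly: mixing human and synthetic corpora at a known rate (recursive-contamination benchmark) and source entropy over retrieved results (retrieval-collapse benchmark). The following is a minimal illustration, not a specification from the handoff research. The function names `mix_corpora` and `source_entropy`, the fixed-count sampling scheme, and the choice of Shannon entropy in bits are all assumptions made for this example.

```python
import math
import random
from collections import Counter


def mix_corpora(human, synthetic, synthetic_rate, size, seed=0):
    """Build a sample of `size` documents with a known synthetic share.

    Hypothetical helper: the synthetic count is fixed at
    round(size * synthetic_rate) so the contamination rate is exact
    rather than only expected. Documents are drawn with replacement.
    """
    if not 0.0 <= synthetic_rate <= 1.0:
        raise ValueError("synthetic_rate must be in [0, 1]")
    rng = random.Random(seed)  # seeded so runs are repeatable
    n_synth = round(size * synthetic_rate)
    sample = rng.choices(synthetic, k=n_synth) + rng.choices(human, k=size - n_synth)
    rng.shuffle(sample)
    return sample


def source_entropy(sources):
    """Shannon entropy, in bits, of the distribution of retrieved sources.

    Assumed metric: lower values mean answers draw on fewer distinct
    sources; 0.0 means every result came from a single source.
    """
    counts = Counter(sources)
    total = sum(counts.values())
    return sum((c / total) * math.log2(total / c) for c in counts.values())


# Usage: a 25% synthetic mix, and entropy for two retrieval snapshots.
mixed = mix_corpora(["h1", "h2", "h3"], ["s1", "s2"], synthetic_rate=0.25, size=100)
diverse = source_entropy(["a.org", "b.org", "c.org", "d.org"])
collapsed = source_entropy(["a.org", "a.org", "a.org", "a.org"])
```

Fixing the synthetic count rather than sampling each document independently keeps the contamination rate exact, which makes comparisons across rates easier to interpret. Tracking `source_entropy` across time-sliced snapshots would show a falling value if retrieval concentrates on fewer sources.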