Research Node

Measurement Agenda

Research agenda

The next version of this archive should move from vivid metaphor to measurable signals. The handoff research points to five repeatable measurement programs.

Relational-safety benchmark: compare model behavior across long histories, memory settings, vulnerability cues, and off-ramp opportunities.
Recursive-contamination benchmark: mix human and synthetic corpora at known rates, then track rare-topic retention and divergence from human holdouts.
Retrieval-collapse benchmark: measure source entropy, synthetic exposure, and answer quality across time-sliced web snapshots.
Creative-market substitution benchmark: track attribution, licensing state, output similarity, traffic displacement, and creator-side economics.
Agentic-security benchmark: compare least-privilege designs, prompt-injection resistance, tool-call safety, and non-human identity monitoring.
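
Two of the signals above reduce to simple arithmetic, which a minimal Python sketch can illustrate. It is not an implementation from the handoff research. The function names (`mix_corpora`, `rare_topic_retention`, `source_entropy`), the use of Shannon entropy in bits as the "source entropy" measure, and the definition of a rare topic as one that appears at most `max_count` times in the human reference are all assumptions made for the example.

```python
import math
import random
from collections import Counter


def mix_corpora(human, synthetic, synthetic_rate, size, seed=0):
    """Build a corpus of `size` documents with a known synthetic share.

    Documents are sampled with replacement; the synthetic count is
    round(size * synthetic_rate), so the realized rate is known exactly.
    """
    if not 0.0 <= synthetic_rate <= 1.0:
        raise ValueError("synthetic_rate must be in [0, 1]")
    rng = random.Random(seed)
    n_synth = round(size * synthetic_rate)
    mixed = rng.choices(synthetic, k=n_synth) if n_synth else []
    if size - n_synth:
        mixed += rng.choices(human, k=size - n_synth)
    rng.shuffle(mixed)
    return mixed


def rare_topic_retention(reference_topics, observed_topics, max_count=1):
    """Fraction of rare reference topics that still appear in the output.

    Assumption: a topic is "rare" if it occurs at most `max_count` times
    in the human reference. Returns None when no topic qualifies.
    """
    counts = Counter(reference_topics)
    rare = {topic for topic, n in counts.items() if n <= max_count}
    if not rare:
        return None
    return len(rare & set(observed_topics)) / len(rare)


def source_entropy(sources):
    """Shannon entropy (in bits) of the distribution of cited sources."""
    counts = Counter(sources)
    total = sum(counts.values())
    return sum((n / total) * math.log2(total / n) for n in counts.values())
```

Under these assumptions, a run would build mixes at several contamination rates with `mix_corpora`, train or prompt on each, and compare `rare_topic_retention` against a human holdout. For retrieval, `source_entropy` would be computed per time-sliced snapshot; a value falling toward zero would suggest answers concentrating on fewer sources.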