Mechanism
Search and retrieval systems can become hosts when synthetic pages, copied summaries, low-value article farms, or adversarial documents exploit ranking signals and are later retrieved as evidence by retrieval-augmented generation (RAG) systems.
Indicators
- Synthetic-content share in crawl or retrieval samples.
- Source-domain entropy and repeated citation clusters.
- Exposure contamination: how much bad source material actually reaches users.
- Prompt-injection or instruction-leakage patterns in retrieved documents.
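Two of the indicators above lend themselves to a small calculation. The sketch below is one possible way to compute them, not a prescribed method. It assumes source-domain entropy means the Shannon entropy of the domains in a retrieval sample, and that exposure contamination means the impression-weighted share of flagged sources. The function names and input shapes are hypothetical.

```python
import math
from collections import Counter
from urllib.parse import urlparse


def domain_entropy(urls):
    """Shannon entropy (in bits) of the source-domain distribution.

    Assumes each URL includes a scheme (e.g. "https://"); without one,
    urlparse returns an empty netloc and all such URLs collapse together.
    Low entropy means a few domains dominate the sample.
    """
    counts = Counter(urlparse(u).netloc.lower() for u in urls)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    # sum of p * log2(1/p) over domains, with p = c / total
    return sum((c / total) * math.log2(total / c) for c in counts.values())


def exposure_contamination(impressions):
    """Fraction of user-facing impressions that came from flagged sources.

    `impressions` is a list of (is_flagged, impression_count) pairs;
    how a source gets flagged is outside this sketch.
    """
    total = sum(n for _, n in impressions)
    if total == 0:
        return 0.0
    flagged = sum(n for is_flagged, n in impressions if is_flagged)
    return flagged / total
```

Weighting by impressions rather than by document count reflects the point of the exposure indicator: a bad source that is never shown matters less than one that is shown often.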
Containment
Defenses should combine anti-spam enforcement, provenance tracking, source-diversity thresholds, document quarantine, retrieval-time risk scoring, and manual review of high-impact answers.
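As a rough illustration of how retrieval-time risk scoring, quarantine, and a source-diversity threshold might fit together, here is a minimal sketch. Everything in it is an assumption made for the example: the `Doc` fields, the weights in `risk_score`, the quarantine cutoff, and the per-domain cap as one reading of "source-diversity threshold". A real system would calibrate these against labeled data.

```python
from dataclasses import dataclass


@dataclass
class Doc:
    doc_id: str
    domain: str
    synthetic_score: float  # hypothetical classifier output in [0, 1]
    has_provenance: bool    # hypothetical: provenance metadata was verified
    injection_flag: bool    # hypothetical: injection pattern was detected


def risk_score(doc):
    """Combine signals into a score in [0, 1]; weights are illustrative."""
    score = 0.6 * doc.synthetic_score
    if not doc.has_provenance:
        score += 0.2
    if doc.injection_flag:
        score += 0.5
    return min(score, 1.0)


def filter_retrieved(docs, quarantine_at=0.7, max_per_domain=2):
    """Split ranked results into (kept, quarantined).

    High-risk documents are quarantined for review; lower-risk documents
    beyond the per-domain cap are dropped so no single domain dominates
    the evidence set.
    """
    kept, quarantined = [], []
    per_domain = {}
    for doc in docs:
        if risk_score(doc) >= quarantine_at:
            quarantined.append(doc)
            continue
        if per_domain.get(doc.domain, 0) >= max_per_domain:
            continue
        per_domain[doc.domain] = per_domain.get(doc.domain, 0) + 1
        kept.append(doc)
    return kept, quarantined
```

Quarantined documents are returned rather than discarded so they can feed manual review, which matters most for high-impact answers.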