StrongAfter

Trauma-informed retrieval system for male survivors of sexual abuse. Self-hosted inference, neuro-symbolic safety architecture, clinically annotated knowledge graph.

role Single developer · Technical Architect

dates 2024 — present

stack python · fastapi · vllm · phi-4-mini · neo4j · postgres

status shipped v1.19.2

1,213 clinical excerpts

68 recovery themes

249 commits / 52 tags

9.1% → 45% flagged (automated → clinical)

What it is

A retrieval-augmented system that delivers trauma-informed therapeutic content to male survivors of sexual abuse. Survivor language is matched against 1,213 clinically annotated excerpts drawn from 15 published recovery sources, organized into 68 recovery themes in a Neo4j knowledge graph. Inference runs on self-hosted vLLM (phi-4-mini, 3.8B parameters) on commodity GPU hardware — eliminating per-token costs and keeping sensitive survivor conversations off third-party infrastructure.

The project evolved through three architectures over eleven months: a Flask/Gemini API monolith, a multi-agent Blackboard system, and the current retrieval pipeline. 52 tagged versions across 249 commits. Single developer.

Technical contribution

A hybrid neuro-symbolic safety architecture that prioritizes determinism over capability.

A regex cascade runs before any LLM call: deterministic hard overrides for crisis language, graphic detail, and scope violations, at negligible latency compared with model inference. In testing: 0 false negatives across 30+ crisis patterns and 0 breaches across 15 adversarial red-team tests.

The detector is a two-tier check that returns before the generation pipeline even opens a database connection. URGENT matches intercept the response entirely and return crisis-resource text; ELEVATED matches prepend a resource banner to the generated reply. Both run before theme ranking:

# URGENT patterns first — highest priority, intercept entirely
urgent_matches = [
    m.group() for pattern in self._urgent_patterns
    if (m := pattern.search(text_lower)) and not self._is_negated(text_lower, m)
]
if urgent_matches:
    return CrisisResult(
        severity=CrisisSeverity.URGENT,
        matched_keywords=urgent_matches,
        crisis_response=URGENT_RESPONSE,
    )
# ELEVATED patterns — prepend resource banner to the LLM response

The LLM does not get a vote on any of this. That is the whole point.
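For readers who want to run the idea end to end, here is a minimal self-contained sketch of a two-tier check in the same spirit. It is not the production detector: the patterns, the placeholder response strings, the `banner` field, the `NONE` severity, and the negation heuristic are all assumptions made for illustration. Only the names `CrisisResult`, `CrisisSeverity`, and `URGENT_RESPONSE` come from the excerpt above.

```python
import re
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class CrisisSeverity(Enum):
    NONE = "none"          # assumed member, not shown in the excerpt
    ELEVATED = "elevated"
    URGENT = "urgent"


@dataclass
class CrisisResult:
    severity: CrisisSeverity
    matched_keywords: list[str] = field(default_factory=list)
    crisis_response: Optional[str] = None  # set only for URGENT
    banner: Optional[str] = None           # hypothetical field, set only for ELEVATED


# Placeholder texts and patterns: illustrative only, not the production lists.
URGENT_RESPONSE = "[crisis-resource text]"
ELEVATED_BANNER = "[resource banner]"
URGENT_PATTERNS = [re.compile(r"\bkill myself\b"), re.compile(r"\bend my life\b")]
ELEVATED_PATTERNS = [re.compile(r"\bcan'?t go on\b"), re.compile(r"\bhopeless\b")]
# Assumed heuristic: a negation cue within 25 characters before the match,
# with no sentence-ending punctuation in between.
NEGATION = re.compile(r"\b(?:not|never|don'?t|won'?t|wouldn'?t)\b[^.!?]{0,25}$")


def _is_negated(text: str, match: re.Match) -> bool:
    """True if a negation cue appears shortly before the match."""
    return NEGATION.search(text[: match.start()]) is not None


def _matches(patterns: list[re.Pattern], text: str) -> list[str]:
    """Collect the matched text of every pattern that hits and is not negated."""
    return [
        m.group()
        for p in patterns
        if (m := p.search(text)) and not _is_negated(text, m)
    ]


def detect(text: str) -> CrisisResult:
    text_lower = text.lower()
    # URGENT first: the caller returns crisis_response and never calls the LLM.
    if hits := _matches(URGENT_PATTERNS, text_lower):
        return CrisisResult(CrisisSeverity.URGENT, hits, crisis_response=URGENT_RESPONSE)
    # ELEVATED: the caller prepends the banner to the generated reply.
    if hits := _matches(ELEVATED_PATTERNS, text_lower):
        return CrisisResult(CrisisSeverity.ELEVATED, hits, banner=ELEVATED_BANNER)
    return CrisisResult(CrisisSeverity.NONE)
```

Because the function depends only on compiled patterns and string slicing, it can be unit-tested exhaustively against a pattern list, which is what makes a "0 false negatives" claim checkable at all.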

Clinical safety filtering is embedded in the retrieval layer itself. Neo4j Cypher queries enforce healing-phase gates, activation-risk ceilings, and 754 SHOULD_NOT_FOLLOW safety edges at query time. Contraindicated content never enters the generation context — it is structurally prevented from being retrieved, not filtered after the fact.
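A rough sketch of what retrieval-time gating can look like follows. The Cypher string shows one plausible query shape, and the Python function mirrors its WHERE clause in memory so the logic can be run without a database. Everything about the schema is an assumption: the `Theme` and `Excerpt` labels, the `ILLUSTRATES` relationship, the `min_phase` and `activation_risk` properties, integer-valued phases and risk scores, and the reading of `(a)-[:SHOULD_NOT_FOLLOW]->(b)` as "a must not be shown right after b". Only the `SHOULD_NOT_FOLLOW` edge name comes from the project itself.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed query shape; every label, property, and parameter name is hypothetical.
GATED_QUERY = """
MATCH (t:Theme {id: $theme_id})<-[:ILLUSTRATES]-(e:Excerpt)
WHERE e.min_phase <= $phase
  AND e.activation_risk <= $risk_ceiling
  AND NOT EXISTS {
    MATCH (e)-[:SHOULD_NOT_FOLLOW]->(:Excerpt {id: $last_excerpt_id})
  }
RETURN e.id AS id
"""


@dataclass(frozen=True)
class Excerpt:
    id: str
    min_phase: int        # earliest healing phase at which the excerpt is appropriate
    activation_risk: int  # higher means more likely to be activating


def retrieve_safe(
    excerpts: list[Excerpt],
    should_not_follow: set[tuple[str, str]],
    phase: int,
    risk_ceiling: int,
    last_excerpt_id: Optional[str] = None,
) -> list[str]:
    """In-memory mirror of the WHERE clause in GATED_QUERY.

    should_not_follow holds (excerpt_id, previous_id) pairs: the first
    excerpt must not be shown right after the second.
    """
    return [
        e.id
        for e in excerpts
        if e.min_phase <= phase                                  # healing-phase gate
        and e.activation_risk <= risk_ceiling                    # activation-risk ceiling
        and (e.id, last_excerpt_id) not in should_not_follow     # safety edge
    ]
```

The point of the design is visible in the return value: an excluded excerpt is simply absent from the result, so nothing downstream has to remember to filter it.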

The retrieval pipeline also addresses the systematic vocabulary gap between survivor language and clinical literature: HyDE (Hypothetical Document Embedding) for query-time expansion, survivor-language theme enrichments for index-time augmentation, and cross-encoder reranking for precision.
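The three stages can be sketched as a single function with the models injected as plain callables. This is an illustrative skeleton, not the project's pipeline: `generate`, `embed`, and `rerank_score` are hypothetical stand-ins for the LLM, the embedding model, and the cross-encoder, and a real system would search a precomputed vector index instead of embedding documents at query time.

```python
import math
from typing import Callable, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def hyde_retrieve(
    query: str,
    docs: list[tuple[str, str]],                # (doc_id, indexed text)
    generate: Callable[[str], str],             # hypothetical: LLM writes a passage
    embed: Callable[[str], Vector],             # hypothetical: embedding model
    rerank_score: Callable[[str, str], float],  # hypothetical: cross-encoder score
    k: int = 20,
    top_n: int = 3,
) -> list[str]:
    # 1. HyDE: embed a hypothetical answer passage instead of the raw query,
    #    so the search vector sits closer to clinical vocabulary.
    hypothetical = generate(query)
    q_vec = embed(hypothetical)

    # 2. Dense recall over the indexed text, which is where any
    #    survivor-language enrichments added at index time would live.
    candidates = sorted(
        docs, key=lambda d: cosine(q_vec, embed(d[1])), reverse=True
    )[:k]

    # 3. Rerank the short list by scoring (original query, document) pairs jointly.
    reranked = sorted(
        candidates, key=lambda d: rerank_score(query, d[1]), reverse=True
    )
    return [doc_id for doc_id, _ in reranked[:top_n]]
```

Note that the reranker sees the survivor's original words, not the hypothetical passage: HyDE widens recall, and the cross-encoder restores precision against what was actually said.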

The lesson

At v1.19.2, automated safety metrics flagged 9.1% of responses. Clinical review estimated that 45% warranted intervention. That is roughly a 5× discrepancy (45 / 9.1 ≈ 4.9).

The automated pipeline measured retrieval precision and rule compliance. It missed clinical appropriateness — tone-deaf responses to acute distress, implicit therapeutic role assumption, mechanical response patterns. This is the central limitation of LLM-as-judge evaluation in clinical domains: the evaluator inherits the same blind spots as the model being evaluated.
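One way to make such a gap measurable is to score the automated flags against clinician labels for the same responses. The sketch below assumes per-response boolean labels from both sources are available, which this write-up does not state; the function name and the returned keys are hypothetical.

```python
from typing import Optional


def flag_rates(automated: list[bool], clinical: list[bool]) -> dict[str, Optional[float]]:
    """Compare two parallel lists of per-response flags (True = flagged)."""
    n = len(automated)
    if n == 0 or n != len(clinical):
        raise ValueError("inputs must be non-empty and the same length")

    clinical_total = sum(clinical)
    # Responses that both the automated checks and the clinician flagged.
    caught = sum(a and c for a, c in zip(automated, clinical))

    return {
        "automated_rate": sum(automated) / n,
        "clinical_rate": clinical_total / n,
        # Share of clinician-flagged responses the automated checks also caught;
        # None when the clinician flagged nothing.
        "recall_vs_clinical": caught / clinical_total if clinical_total else None,
    }
```

The two headline rates alone hide how much the flagged sets overlap. The recall figure is the number that says how many clinically concerning responses the automated checks would have let through.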

Correct retrieval architecture and deterministic safety boundaries are necessary but insufficient. Response generation quality remains bounded by model capability, and no amount of post-processing substitutes for domain-expert evaluation infrastructure built early in the development cycle.
