EchoVessel Runtime Interactions & Memory Flows

"how each layer wakes up" edition · 2026.04 · main @ 1f9bdf6 · companion to architecture.html
What this page answers: when a user sends one message, which memory layers wake up in what order, which write their output back where, and how information loops between layers. The static architecture.html shows what exists; this one shows what happens.
L1 · core_blocks · persona frame
L2 · recall_messages · raw log
L3 · concept_nodes/EVENT · episodic
L4 · concept_nodes/THOUGHT · reflected

1. How each memory layer comes into being · the distillation rules that graduate raw text into persistent memory

Memory is the whole point · four layers, three transitions
L1 is authored · L2 is captured · L3 is distilled · L4 is reflected
CORE

L1 · core_blocks — how the 5 blocks are born

L1 is the only layer that is authored, not distilled. The 5 blocks exist from persona inception onward, written directly by a human (or a bootstrap LLM pass on imported material).

Path
Trigger
Writer
Which blocks
Mechanism
Onboarding
First run · /api/admin/persona/onboarding
persona / self / user / relationship / mood
memory.onboarding.bootstrap_persona() · one CoreBlock row per label
Bootstrap from material
Onboarding path 2 · upload → LLM drafts 5 blocks
all 5
LLM synthesises from imported events/thoughts · user reviews before commit
Admin edit
Admin → Persona tab · save
any block
POST /api/admin/persona · partial update · optimistic on-disk write
Append-to-block
Import pipeline or admin → append existing block
any block except mood
memory.append_to_core_block() · atomic · writes audit row to core_block_appends
Mood replace (manual)
Admin edit mood block OR v1.0 reflection-driven path (not yet)
mood only
memory.update_mood_block() · replace-in-place · fires on_mood_updated hook · SSE chat.mood.update
MVP truth: L1 is NOT auto-updated per turn. The mood_block in particular is static unless someone edits it. v1.0 plans to have reflection update mood as a side-effect; today it's manual-only.
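The append path can be sketched as a single SQLite transaction: update the block and write the audit row together, or not at all. This is a minimal sketch with a hypothetical two-table schema — the real core_blocks / core_block_appends columns are almost certainly richer.

```python
import sqlite3

def append_to_core_block(db: sqlite3.Connection, label: str, text: str) -> None:
    """Atomic append + audit row (sketch; table/column names hypothetical)."""
    if label == "mood":
        # mood is replace-only per the table above; appends are disallowed
        raise ValueError("mood is replace-in-place via update_mood_block()")
    with db:  # one transaction: both writes commit or neither does
        db.execute(
            "UPDATE core_blocks SET content = content || char(10) || ? WHERE label = ?",
            (text, label),
        )
        db.execute(
            "INSERT INTO core_block_appends (label, appended_text) VALUES (?, ?)",
            (text and label, text) if False else (label, text),
        )

# minimal demo schema
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE core_blocks (label TEXT PRIMARY KEY, content TEXT)")
db.execute("CREATE TABLE core_block_appends (label TEXT, appended_text TEXT)")
db.execute("INSERT INTO core_blocks VALUES ('self', 'I am calm.')")
append_to_core_block(db, "self", "I notice I like mornings.")
```

The `with db:` context manager is what makes the pair atomic: if the audit insert fails, the block update rolls back too.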

L2 · recall_messages — every message captured, verbatim

L2 is the only layer that is written during the user-facing turn. There's no distillation here — every incoming line and every persona reply is persisted verbatim as an atomic row.

When
What is written
Schema fields
Indexes
Output
🟢
User msg arrives
memory.ingest_message(..., role=USER) before LLM sees anything
role, content, channel_id, session_id, turn_id, created_at, token_count, day
FTS5 trigram · session join · day bucket
1 row + FTS index update
🟢
Persona reply finalised
ingest_message(..., role=PERSONA) after streaming completes
same shape · same turn_id · same session_id
FTS5 gets both sides of the turn
1 row
🟢
Session open/close
Session row tracks status, first_message_at, last_message_at, message count, token count
sessions table
status transitions: open → closing → consolidating → closed
1 row per session
Why verbatim: L2 is the ground truth. Later distillations (L3, L4) can go wrong — LLM hallucinates, eval catches a drift. When that happens, we need to go back to L2 to see what actually happened. Never paraphrase at ingest time.
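The verbatim-ingest rule plus FTS indexing can be sketched with plain SQLite: a row table, an FTS5 shadow table, and a trigger that indexes each row on insert. This is an illustrative sketch — the real schema has more columns (session, turn, token counts) and uses the trigram tokenizer, while this demo uses the default unicode61 tokenizer for portability.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE recall_messages (
    id INTEGER PRIMARY KEY, role TEXT, content TEXT,
    channel_id TEXT, session_id TEXT, turn_id TEXT
);
-- demo FTS table; the real index uses the trigram tokenizer
CREATE VIRTUAL TABLE recall_fts USING fts5(content);
CREATE TRIGGER recall_ai AFTER INSERT ON recall_messages BEGIN
    INSERT INTO recall_fts(rowid, content) VALUES (new.id, new.content);
END;
""")

def ingest_message(role, content, channel_id, session_id, turn_id):
    # verbatim: no paraphrase, no trimming — L2 is ground truth
    with db:
        db.execute(
            "INSERT INTO recall_messages (role, content, channel_id, session_id, turn_id)"
            " VALUES (?, ?, ?, ?, ?)",
            (role, content, channel_id, session_id, turn_id),
        )

ingest_message("user", "my cat jumped on my face again", "discord", "s1", "t1")
hits = db.execute("SELECT content FROM recall_fts WHERE recall_fts MATCH 'cat'").fetchall()
```

The trigger keeps the FTS index in lockstep with the row table, so the "1 row + FTS index update" in the table above is one transaction.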

L3 · concept_nodes WHERE type=EVENT — the first real distillation

When a session closes, prompts.extraction.EXTRACTION_SYSTEM_PROMPT asks a SMALL-tier LLM: "given this closed conversation, what is worth remembering?" Output is 0–3 event rows, each with a signed emotional impact and structured tags. This is the layer where raw text first becomes episodic memory.

What the extraction prompt instructs the LLM to produce

For each event:
   description · 1–3 sentences in the source language · third-person reference to user
   emotional_impact · signed integer in [-10, +10]
   emotion_tags · 0–4 free-form lowercase labels
   relational_tags · 0–3 from the closed vocabulary of 6 values

Emotional impact scale (from the prompt)

Value
Meaning (extraction prompt verbatim)
Example
-10
catastrophic loss, trauma, crisis
death of close family · suicidal ideation voiced · violence disclosed
-7
severe sadness, grief, serious conflict
breakup · job loss · long-buried secret first disclosed
-4
meaningful stress, disappointment
argument with boss · sleep deprivation · anxiety attack
-1
mild low, slight frustration
bad commute · minor annoyance
0
pure neutral · rare (most shared things have some valence)
used sparingly
+1
mild pleasant
nice weather · good meal
+4
meaningful joy, satisfaction, connection
promotion at work · fun weekend · first real laugh in weeks
+7
major positive milestone
engagement · big win · deep reconciliation
+10
life-defining joy
birth of child · surviving a crisis · long-awaited reunion
Why a signed scale: |impact| drives retrieval salience, but the sign drives the distinction between grief and joy. "用户妈妈去世了" ("the user's mother passed away") is -9 — storing it as +9 would make it retrieve alongside joyful memories. The prompt explicitly warns against sign flips.

Relational tag vocabulary (closed · exactly 6)

Tag
Triggers on
Example
identity-bearing
a core fact about who the user is
"user is a single mom" · "user has depression"
unresolved
an emotional thread opened but not closed this session
user hints at a fight but changes topic before resolution
vulnerability
a rare moment of being unusually open or exposed
"I've never told anyone this…"
turning-point
a shift in the relationship itself
first real trust · first conflict · first private share
correction
user corrected something the persona assumed
"实际上不是那样" ("actually it's not like that") · "actually that's not what I meant"
commitment
explicit promise or follow-up
"下次聊" ("let's talk next time") · "I'll tell you how it goes"
Retrieval bonus hook: having any relational_tag grants a +0.5 bonus to retrieval score (see §8). The prompt warns the LLM: "if you're tagging every event, you're over-tagging." Target ~20–30% of events should carry a relational tag.
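The structured-output rules above (signed impact range, 0–4 emotion tags, 0–3 closed-vocabulary relational tags) lend themselves to a small validation pass after the LLM responds. A hedged sketch — the function name and dict shape are hypothetical, not the real parser:

```python
# Closed vocabulary from the extraction prompt (exactly 6 tags)
RELATIONAL_TAGS = {
    "identity-bearing", "unresolved", "vulnerability",
    "turning-point", "correction", "commitment",
}

def validate_event(description, emotional_impact, emotion_tags, relational_tags):
    """Normalise one extracted event; raise on hard violations (sketch)."""
    if not (-10 <= emotional_impact <= 10):
        raise ValueError("emotional_impact outside [-10, +10]")
    bad = [t for t in relational_tags if t not in RELATIONAL_TAGS]
    if bad:
        raise ValueError(f"unknown relational tags: {bad}")
    return {
        "description": description.strip(),
        "emotional_impact": int(emotional_impact),
        "emotion_tags": [t.lower() for t in emotion_tags][:4],   # 0–4 free-form lowercase
        "relational_tags": list(relational_tags)[:3],            # 0–3 closed-vocab
    }

ok = validate_event("user's mother passed away", -9, ["Grief"], ["identity-bearing"])
```

Rejecting unknown relational tags (rather than silently keeping them) is what keeps the +0.5 retrieval bonus meaningful: only the 6 curated tags can earn it.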

Mandatory self-check (the safety net)

The prompt requires the LLM, after drafting events, to ask itself: "did this session contain any emotional PEAKS I failed to extract?" Typical missed peaks listed in the prompt:

  • a single casual mention of someone dying buried in mundane chat
  • a quick vulnerable disclosure followed by deflection
  • a question that is actually a cry for help ("你觉得活着累吗?")
  • understated positive milestones the user downplays ("btw, I got engaged")

The self_check_notes output field records the LLM's answer. Missing a peak is the #1 reason the Emotional Peak Retention eval metric fails — so the prompt treats this as non-skippable.

L4 · concept_nodes WHERE type=THOUGHT — reflection, not summary

After extraction writes L3 events, if the session passes the reflection gate (see §8, Algorithm C), a MEDIUM-tier LLM runs prompts.reflection.REFLECTION_SYSTEM_PROMPT. It's told: "you are the reflective inner voice · produce 1 or 2 quiet, honest impressions about what you have been noticing."

Tonal constraints (from the prompt · most important)

Rule
What's forbidden
What's required instead
no clinical
"subject exhibits patterns of avoidance"
"something Alan does is…" · "Alan tends to…"
no advice
"Alan should talk to a professional"
impressions describe · they do not prescribe
no labels
"depression" · "anxiety disorder" · "PTSD"
describe what you observe without naming it
source language
translating Chinese events into English thoughts
match whatever language the events were in
no meta
"the session was 10 messages long" · "via Discord"
describe the user, not the conversation's metadata (F10 iron rule)

Good vs bad reflections (from the prompt)

✅ Good
"Alan 把真正重的话都留到深夜才说。白天的他稳定只是保护色。" ("Alan saves the really heavy things for late at night. His daytime steadiness is just protective coloring.")
"I've been noticing that Alan only lets himself be tired when no one is around to see it."
❌ Bad
"Subject demonstrates nocturnal disclosure pattern suggestive of attachment avoidance."
"The user appears to be experiencing suppressed grief symptoms."

Structural constraints

#
Field
Constraint
Why
Enforcement
Parse behaviour on violation
1
count
1 or 2 thoughts, never 0 (unless empty input), never 3+
most sessions produce 1 · a second only if genuinely distinct
MAX_THOUGHTS = 2
truncate to 2; warn
2
description
1–3 sentences · natural warm observation · first person or neutral third
long rants → hard to read · short clinical labels → wrong tone
soft cap ~500, hard cap 2000
truncate at hard cap; warn
3
emotional_impact
signed [-10, +10] · but reflections rarely exceed ±8
impressions are processed · rarely touch raw extremes
RECOMMENDED_IMPACT_BOUND = 8
warn at ±9/±10 · clamp to -10/+10
4
filling (evidence)
MUST cite ≥1 L3 event id from the input · no invention allowed
without provenance, forgetting-rights cascade can't work
every id must exist in input
rejected · thought dropped
The filling chain is load-bearing: it's what lets "forget this event" cascade correctly. If the user deletes an L3 event that five L4 thoughts cited, the system can either drop those thoughts (cascade) or mark their filling as orphaned (preserve thought, lose evidence). Without cited filling, neither option is possible — the chain is untraceable and the thought becomes a dangling insight. That's why the prompt treats missing filling as a rejection rather than a warning.
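The cascade-vs-orphan choice can be sketched over plain in-memory structures; the dict/edge-list representation here is illustrative, not the real concept_node_filling schema:

```python
def forget_event(event_id, events, thoughts, filling, cascade=True):
    """Delete an L3 event and handle L4 thoughts that cite it (sketch).

    events/thoughts: {id: row} dicts · filling: list of (thought_id, event_id)
    edges. cascade=True drops citing thoughts; cascade=False keeps them but
    marks their evidence as orphaned.
    """
    events.pop(event_id, None)
    citing = {t for (t, e) in filling if e == event_id}
    filling[:] = [(t, e) for (t, e) in filling if e != event_id]
    if cascade:
        for t in citing:
            thoughts.pop(t, None)
            filling[:] = [(tt, e) for (tt, e) in filling if tt != t]
    else:
        for t in citing:
            thoughts[t]["orphaned"] = True
    return citing

events = {"e1": {}, "e2": {}}
thoughts = {"t1": {}, "t2": {}}
filling = [("t1", "e1"), ("t2", "e2")]
dropped = forget_event("e1", events, thoughts, filling, cascade=True)
```

Note the walk starts from the filling edges: without cited filling ids, `citing` would be empty and the dependent thoughts would silently survive with invisible, broken evidence — exactly the "dangling insight" the prompt's rejection rule prevents.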
Zoom-out: L1 is authored (by humans or bootstrap LLM). L2 is captured (verbatim, never distilled). L3 is the first place where raw text becomes memory — extracted by LLM per session close, with strict rules on valence sign and the 6 relational tags. L4 is the inner voice — fewer thoughts, softer tone, cited evidence. Each transition is an LLM call guarded by the gates in §7.

2. One turn in motion · every layer, every second, chronological

User types one sentence → persona streams one reply
runtime.interaction.assemble_turn · called by TurnDispatcher from IncomingTurn.channel_id → channel.send() back
HOT PATH
#
Who wakes up
What it does
Reads
Writes / emits
1
Channel
Debounce burst into one IncomingTurn; mint a turn_id.
internal: per-user debounce timer
queue → TurnDispatcher
2
L2 write
Persist each raw user line as a RecallMessage row before the LLM sees anything.
L2 · recall_messages, FTS5 index, session last_message_at
3
L1 read
Pull the 5 core blocks verbatim — persona / self / user / relationship / mood. Always first in the prompt.
L1 · core_blocks (1 row per label)
4
L3+L4 retrieve
Embed the user's message · run vector search over L3+L4 concept_nodes · walk the relational graph for bonus scoring. No channel_id filter (iron rule D4).
L3+L4 · concept_nodes, concept_nodes_vec (sqlite-vec ANN), concept_node_filling (graph edges)
5
L2 read (window)
Load the last memory.recent_window_size messages from the current session as short-term chat context.
L2 · recall_messages WHERE session_id = current, DESC LIMIT N
6
Prompt assemble
Splice system prompt + L1 blocks + retrieved L3/L4 memories + L2 window + new user message. No transport identity anywhere (iron rule F10).
ephemeral prompt string → LLM
7
LLM stream
Call llm.complete(prompt, stream=True); every token arrives via on_token callback.
network: OpenAI / Anthropic / stub
tokens → channel + SSE broadcaster
8
Channel send + mirror
Channel delivers the reply on its native surface (Discord DM / SSE frame). Runtime also mirrors to the runtime-owned broadcaster tagged with source_channel_id.
SSE · user_appended, token×N, done, voice_ready
9
L2 write (persona)
Persist the persona's reply as its own RecallMessage row (role=persona). Same session_id, same turn_id.
L2 · recall_messages, FTS5 index, cost ledger
10
on_turn_done
Clear the channel's in_flight_turn_id; log the LLM call into the cost ledger; hand control back to the dispatcher queue.
llm_calls; chat.settings.updated (if flags toggled)
Zoom-out: the reply to the user takes steps 1-8 (channel → memory → LLM → channel). Steps 9-10 are post-turn housekeeping — they happen while the user is reading the reply. This is why consolidation to L3/L4 is not here: it's asynchronous background work, covered in §4. Mood block is NOT auto-updated per turn in MVP — it only rewrites via admin edit or the v1.0 reflection-driven path (see §7).
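The hot-path ordering above (L2 write first, then L1/L3/L4 reads, splice, stream, persist) can be sketched with stub objects. Everything here is illustrative — the class, method names, and prompt splicing are stand-ins, not the real EchoVessel API:

```python
from dataclasses import dataclass, field

@dataclass
class StubMemory:
    rows: list = field(default_factory=list)           # stands in for recall_messages
    def ingest_message(self, role, content):            # steps 2 and 9 · L2 writes
        self.rows.append((role, content))
    def get_core_blocks(self):                          # step 3 · L1 read (all 5 labels)
        return ["[persona]", "[self]", "[user]", "[relationship]", "[mood]"]
    def retrieve(self, query):                          # step 4 · L3+L4 fusion retrieval
        return []                                       # cold DB: no memories yet
    def recent_window(self):                            # step 5 · L2 short-term window
        return [c for _, c in self.rows[-8:]]

def assemble_turn(user_text, mem, complete):
    """Steps 2-9 of the hot path, in order (sketch)."""
    mem.ingest_message("user", user_text)               # L2 write before the LLM sees anything
    parts = mem.get_core_blocks() + mem.retrieve(user_text) + mem.recent_window()
    prompt = "\n\n".join(parts + [user_text])           # step 6 · splice
    reply = "".join(complete(prompt))                   # step 7 · consume the token stream
    mem.ingest_message("persona", reply)                # step 9 · persist reply verbatim
    return reply

mem = StubMemory()
reply = assemble_turn("hello", mem, lambda p: iter(["hi ", "there"]))
```

The key invariant the sketch preserves: the user line is in L2 before any read, and the persona line lands in L2 after the stream completes, both under the same turn.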

3. Same 10 steps, 8-column sequence · who is active when

Channel
Runtime
L1
L2
L3
L4
LLM
UI
debounce
Step 1 · Channel mints turn_id, emits IncomingTurn
→ runtime
ingest
write
user_appended
Step 2 · Runtime writes each IncomingMessage into L2. UI sees the user's bubble via SSE.
assemble
read 5 blocks
Step 3 · L1 reads — persona / self / user / relationship / mood. Always first.
retrieve
vector search
vector search
Step 4 · L3+L4 retrieve via sqlite-vec + relational-graph walk. No channel_id filter (D4).
recent window
read N recent
Step 5 · Pull recent_window_size L2 messages from current session.
prompt
Step 6 · Splice system prompt + L1 + retrieved L3/L4 + L2 window + new user line. F10 guard.
stream
token×N
Step 7 · LLM streams tokens. Each token also mirrors to the runtime broadcaster (tag source_channel_id).
send
mirror
done
Step 8 · Originating channel delivers reply; runtime publishes chat.message.done to every SSE subscriber.
persist persona
write
Step 9 · Write persona's reply to L2 with same session_id + turn_id.
on_turn_done
log cost
Step 10 · Dispatcher releases the slot; llm_calls ledger records tokens + cost. Mood block write is NOT per-turn in MVP — it only runs via admin edit or v1.0 reflection path.

4. After you stop typing · session lifecycle & consolidation — the slow write

Idle → session closes → memory graduates L2 → L3 → L4
consolidate_worker + idle_scanner · background coroutines · user never waits on this
SLOW WRITE
t = 0
first user msg
+3s
persona reply
+15s
user follow-up
+20s
persona reply
+8 min
user goes quiet
+30 min
idle_scanner closes session
+30m 5s
L3 extraction
+30m 20s
L4 reflection

Step-by-step · what runs, in order

When
Worker
What it does
Reads
Writes
🌛
idle_scanner
Every idle_scanner.interval_seconds (default 60s) · finds sessions where now - last_message_at > SESSION_IDLE_MINUTES (default 30min).
sessions WHERE status='open'
sessions.status = 'closing'
📦
consolidate_worker
Polls every consolidate.worker_poll_seconds (default 5s) for sessions in status='closing'. Picks one at a time.
sessions WHERE status='closing'
sessions.status = 'consolidating'
📝
L2 read
Loads all recall_messages for the session. Skips entirely if < trivial_message_count OR < trivial_token_count (default 3 / 200).
L2 · all messages in session
🧩
LLM extract
Calls prompts.extract_fn(session_messages) with SMALL tier model. Produces a short list of ExtractedEvent(description, emotional_impact, tags...).
L2 · session window
LLM tokens → memory ingester
💚
L3 write
Each ExtractedEvent becomes a concept_node row (type=EVENT). Sentence-transformers embedder generates a 384-d vector; sqlite-vec stores it.
L1 · self_block (for context)
L3 · concept_nodes, concept_nodes_vec, concept_nodes_fts
🎯
reflection gate
Counts L4 thoughts written in last 24h. If ≥ reflection_hard_gate_24h (default 3) · skip reflection. Otherwise continue.
L4 · COUNT(*) WHERE created_at > now-24h
🪞
LLM reflect
Calls prompts.reflect_fn(recent_events) with MEDIUM tier. Produces 0-2 ExtractedThought(description, filling_ids).
L3 · recent events (last N sessions)
LLM tokens → memory ingester
🧡
L4 write
Each thought becomes a concept_node (type=THOUGHT) + one concept_node_filling row per source event (parent=thought, child=event).
L4 · concept_nodes, concept_node_filling (graph edges), concept_nodes_vec
close
Marks session status='closed'. Fires on_session_closed hook → SSE chat.session.boundary event → Web UI draws a timestamped rule.
SSE · chat.session.boundary
Why this is async: extracting events needs an LLM call; reflecting needs another. If we did this inline per turn, every user message would wait 2-3 seconds for extraction before seeing the persona's reply. Instead, we let the reply hit the user immediately (steps 1-8 in §2) and push L3/L4 writing to the background (this section). The cost is: L3/L4 are stale by ~30 minutes of idle time. The benefit: the chat feels real-time.
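One idle_scanner tick can be sketched as a pure function over session rows; the dict shape and constant names mirror the table above but are hypothetical:

```python
import time

SESSION_IDLE_SECONDS = 30 * 60  # SESSION_IDLE_MINUTES default, in seconds

def idle_scan(sessions, now=None):
    """One idle_scanner pass (sketch): stale open sessions move to 'closing'.

    sessions: {session_id: {"status": str, "last_message_at": epoch_seconds}}.
    Returns the ids it transitioned; consolidate_worker picks them up later.
    """
    now = now if now is not None else time.time()
    closing = []
    for sid, s in sessions.items():
        if s["status"] == "open" and now - s["last_message_at"] > SESSION_IDLE_SECONDS:
            s["status"] = "closing"
            closing.append(sid)
    return closing

sessions = {
    "fresh": {"status": "open", "last_message_at": 1000.0},
    "stale": {"status": "open", "last_message_at": 1000.0 - 31 * 60},
}
closed = idle_scan(sessions, now=1000.0)
```

Running this every interval_seconds is all the scanner does; the state machine's heavier transitions (closing → consolidating → closed) belong to the worker.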

5. Story trace · one real sentence, all four layers lit up

Scenario · user DMs the Discord bot

"我养了只白猫,叫小黑。他超调皮,老在半夜跳到我脸上。" ("I have a white cat named 小黑. He's super mischievous and keeps jumping onto my face in the middle of the night.")
t+0.0s
CH
discord.py on_message → DiscordChannel.push_user_message · debounce timer armed for 2s.
t+2.0s
CH
Debounce expires · mint turn_id = turn-33bec7 · push IncomingTurn into queue.
t+2.1s
L2
Write ↘ One row into recall_messages · role=user · content = the full sentence · channel_id="discord" · session_id="s_d4e82a" · linked to turn-33bec7.
t+2.2s
L1
Read ↗ 5 core_blocks loaded · persona="她的性格是…" ("her personality is…") · user="你是一个在旧金山的软件工程师…" ("you are a software engineer in San Francisco…") · mood="此刻平静" ("calm at this moment") · etc.
t+2.3s
L3
Read ↗ Embed the sentence · sqlite-vec ANN search · cold DB so 0 hits. Relational walk also 0 edges. No episodic memory to pull in yet.
t+2.3s
L4
Read ↗ Same search on thoughts · 0 hits · persona has no long-term impressions about you yet.
t+2.4s
LLM
Prompt assembled (core_blocks + empty retrieval + empty window + new sentence) · sent to gpt-4o · streaming begins.
t+2.4–5.8s
CH
Tokens stream back · Discord channel builds the reply · Web broadcaster mirrors every token tagged source_channel_id="discord".
t+5.8s
L2
Write ↘ Persona reply written as second row · role=persona · same turn_id.
t+5.9s
L1
Write ↘ (v1.0 path · skipped in MVP, see §7) mood_block refreshes to "带点笑意 · 对你家小动物的好奇" ("a hint of a smile · curiosity about your little pet") · broadcast as chat.mood.update.
t+31m
BG
idle_scanner tick · session s_d4e82a last_message_at is 31 min ago · mark status='closing'.
t+31m 5s
L3
Write ↘ Extract LLM produces: {description: "user has a white cat named 小黑 that's energetic and jumps on them at night", emotional_impact: +2, tags: [pet, night]} · stored with 384-d embedding. This is the moment the persona "remembers".
t+31m 20s
L4
Gate check · 24h thought count = 0 < 3 · OK to reflect. Reflection LLM produces: {description: "user has a playful bond with animals · their life has warmth at night", filling_ids: [<event_id>]}. Filling edge created thought → event.
later
CH
User returns, asks "你还记得我的猫吗" ("do you still remember my cat?") on Web (not Discord).
later
L3
Read ↗ Embed "你还记得我的猫吗" · sqlite-vec finds the 小黑 event with high cosine similarity. Persona mentions "小黑" by name. Cross-channel memory works — because D4 never filtered by channel_id.
Key moments: the raw sentence lives in L2 immediately (step at +2.1s). But the fact "user has a cat named 小黑" only becomes memory at +31m when the session idle-closes and consolidation runs. Ask the persona at +10min and it won't know about 小黑 via L3 retrieval — it would only remember because the raw text is still in the L2 recent-window of the same session. Once the session closes and re-opens, L2 window is reset, but L3 now carries the permanent fact.

6. Cross-layer read/write matrix · who touches each layer, when

L1
Turn hot path reads: get_core_blocks() · every prompt · all 5 labels
Turn hot path writes: mood_block refresh · post-turn observer
Background reads: extract_fn may read self_block for context
Background writes: reflect_fn may append to self_block via append_to_core_block
Admin API:
  • GET /api/admin/persona
  • POST /api/admin/persona
  • onboarding · bootstrap-from-material
L2
Turn hot path reads: recent window · recent_window_size msgs
Turn hot path writes: every ingested message · user + persona
Background reads: whole-session read for extraction
Background writes: —
Admin API:
  • GET /api/chat/history
  • GET /api/admin/memory/search (FTS)
  • DEL /api/admin/memory/messages/{id}
  • DEL /api/admin/memory/sessions/{id}
L3
Turn hot path reads: vector + graph walk · every turn
Turn hot path writes: —
Background reads: reflect_fn reads recent events for input
Background writes: extract_fn output · per closed session
Admin API:
  • GET /api/admin/memory/events (paginated)
  • GET /api/admin/memory/events/{id}/dependents
  • DEL /api/admin/memory/events/{id} (orphan or cascade)
  • import pipeline also writes L3
L4
Turn hot path reads: vector search · every turn
Turn hot path writes: —
Background reads: gate reads 24h count
Background writes: reflect_fn output · 0-2 thoughts per session
Admin API:
  • GET /api/admin/memory/thoughts
  • GET /api/admin/memory/thoughts/{id}/trace
  • DEL /api/admin/memory/thoughts/{id}
Observation: hot path (per-turn) is mostly L1+L2+L3+L4 reads, with L2 writes as the only hot-path write. Everything that creates new memory (L3/L4) is background. This is by design — it's what keeps turn latency low while still building up long-term knowledge.

7. Mood feedback loop · how an admin edit or future reflection-driven update reaches the browser

MVP note: the mood block is not auto-updated per turn today. It changes only when (a) admin UI edits it, (b) v1.0 reflection path writes it as a side-effect of L4 thought creation. The loop below describes the event-propagation path that is wired — it just isn't triggered per turn yet.
L1.mood_block → MoodObserver → SSE → Web UI header
not a static field — a live signal · updated per turn once the v1.0 path lands
LIVE
1. Turn completes
on_turn_done · step 10 of §1
2. MoodObserver fires
reads L2 recent window + previous L1.mood_block
3. Optional SMALL LLM
summarises current emotional tone
Result funnels through memory.update_mood_block().
4. L1 write
core_blocks[mood].content replaced + audit row appended to core_block_appends
5. SSE broadcast
chat.mood.update · runtime-owned broadcaster · every subscriber
6. Web UI re-renders
usePersona hook slices mood into persona.core_blocks.mood · Chat header reacts
Feedback becomes input: the next turn's Step 3 (L1 read) pulls whatever mood this loop wrote. So if the user's last message made the persona anxious, the next reply is generated in an anxious tone — memory and mood are closed-loop.

8. Emotion & scoring algorithms · the math behind what gets remembered and what gets surfaced

Three distinct algorithms · three distinct jobs
retrieval fusion score · consolidation trivial-gate · shock / timer / hard-gate reflection triggers
MATH

Algorithm A · Retrieval fusion score

Every L3/L4 candidate returned from sqlite-vec ANN gets re-ranked by a weighted sum of 4 signals. The weights are in src/echovessel/memory/retrieve.py.

total = 0.5 · recency + 3.0 · relevance + 2.0 · |impact| + 1.0 · relational_bonus
Key
Signal
Meaning
Formula
Range
R
recency
how fresh this memory is
exp(-ln(2) · days_since / 14) · 14-day half-life
[0, 1]
V
relevance
semantic similarity of query → candidate
1 - distance/2 · clamped · min-floor 0.4 drops orthogonal hits
[0, 1]
I
|impact|
how emotionally loaded the memory is
min(|emotional_impact| / 10, 1.0)
[0, 1]
G
relational_bonus
has the node got relational_tags?
0.5 if relational_tags non-empty, else 0
{0, 0.5}
Why these weights? Relevance (3.0) dominates — what a memory is about matters most. Impact (2.0) ensures peak emotional moments surface even when their semantics drift from the query. Recency (0.5) gently prefers fresh memories without drowning old-but-important ones. Relational bonus (1.0) pulls in graph-connected nodes. The min-relevance floor of 0.4 is important: without it, strictly orthogonal candidates would occasionally bubble up on pure |impact| × relational_bonus, causing false-positive recall. With the floor, truly unrelated memories can't enter the ranked set even if they're emotionally loaded.
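The fusion score above is small enough to sketch end-to-end. Weights, half-life, and floor follow the formula and table as documented; treat them as configurable values read from retrieve.py rather than constants of the system:

```python
import math

def fusion_score(days_since, distance, emotional_impact, relational_tags):
    """Re-rank score for one L3/L4 candidate (sketch of the formula above).

    Returns None when the candidate falls below the min-relevance floor,
    i.e. it is dropped before ranking.
    """
    recency = math.exp(-math.log(2) * days_since / 14)       # 14-day half-life
    relevance = max(0.0, min(1.0, 1 - distance / 2))
    if relevance < 0.4:              # floor: orthogonal hits can't ride on impact
        return None
    impact = min(abs(emotional_impact) / 10, 1.0)            # sign stripped here only
    bonus = 0.5 if relational_tags else 0.0
    return 0.5 * recency + 3.0 * relevance + 2.0 * impact + 1.0 * bonus

fresh_hit = fusion_score(0, 0.2, 9, ["identity-bearing"])    # high on every signal
orthogonal = fusion_score(0, 1.6, 10, ["vulnerability"])     # relevance 0.2 < floor
```

Note the sign of emotional_impact is stripped only inside the score — salience is symmetric, but the stored sign still controls what kind of memory surfaces.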

Algorithm B · Strong-emotion override (consolidation)

When deciding whether a session is "trivial" (and so skips extraction), a below-threshold session can still be promoted to L3 if it contains any strong-emotion keyword. This keyword-matched safety net catches peak emotional moments that happen in short sessions.

Category
zh keywords
en keywords
Why this list
Effect
💧
Bereavement / loss
走了 · 去世 · 死了 · 离世 · 葬礼 · 没了
died · passed away · funeral
loss events are almost always L3-worthy regardless of session length
override trivial gate
⚠️
Crisis
撑不住 · 不想活 · 活不下去 · 自杀 · 崩溃
can't go on · suicide · breakdown
safety-critical · never silently dropped
override trivial gate
🎭
Major milestones
分手 · 离婚 · 被裁
breakup · divorce · fired
large identity shifts · persona should remember even from a one-liner
override trivial gate
Logic: is_trivial(session, messages) returns True only when BOTH below-threshold AND no strong emotion keyword. _has_strong_emotion(messages) is a case-insensitive substring match — optimized for recall, not precision. False positives occasionally push a mundane sentence through extraction; that's an acceptable cost vs. losing a late-night single line about a breakup.
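The gate plus override composes to a few lines. The keyword list here is abbreviated from the table above (the real one covers all three categories), and thresholds use the documented defaults:

```python
TRIVIAL_MESSAGE_COUNT = 3    # consolidate.trivial_message_count default
TRIVIAL_TOKEN_COUNT = 200    # consolidate.trivial_token_count default
# Abbreviated sample of the keyword table above
STRONG_EMOTION_KEYWORDS = [
    "去世", "葬礼", "自杀", "分手", "被裁",
    "passed away", "suicide", "breakup", "fired",
]

def _has_strong_emotion(messages):
    # case-insensitive substring match · tuned for recall, not precision
    text = " ".join(messages).lower()
    return any(kw.lower() in text for kw in STRONG_EMOTION_KEYWORDS)

def is_trivial(messages, token_count):
    """True only when BOTH below-threshold AND no strong-emotion keyword."""
    below = len(messages) < TRIVIAL_MESSAGE_COUNT or token_count < TRIVIAL_TOKEN_COUNT
    return below and not _has_strong_emotion(messages)
```

A one-line session mentioning a death is extracted; a one-line "hey" is not, which is exactly the asymmetry the safety net buys.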

Algorithm C · Reflection triggers (shock · timer · hard gate)

Once a session clears extraction (L3 events exist), the next question is whether to run reflection (L4 thoughts). Three decision rules converge.

SHOCK_IMPACT_THRESHOLD
If any freshly-extracted L3 event has |emotional_impact| ≥ 8 · force reflection NOW even if the timer hasn't elapsed. Rationale: peak moments should reshape persona's impression immediately, not wait 24 hours.
SHOCK_IMPACT_THRESHOLD = 8
memory/consolidate.py
TIMER_REFLECTION_HOURS
Even without a shock event · if > 24 hours have passed since the last reflection · run one. Keeps persona's long-term impressions slowly updating even during routine chats.
TIMER_REFLECTION_HOURS = 24
memory/consolidate.py
REFLECTION_HARD_LIMIT_24H
Regardless of shock or timer · no more than 3 reflections per rolling 24-hour window. Prevents L4 explosion on chatty days or debugging sessions. The hardest of the three gates — wins over shock and timer.
REFLECTION_HARD_LIMIT_24H = 3
(configurable via consolidate.reflection_hard_gate_24h)
Decision order:
  1. count reflections in last 24h → if ≥ 3, skip (hard gate wins)
  2. any fresh event with |impact| ≥ 8? → reflect (shock path)
  3. last reflection > 24h ago? → reflect (timer path)
  4. otherwise → skip this session
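The four-step decision order above, as a sketch; constant names follow the boxes, and the inputs (a 24h count, fresh event impacts, hours since last reflection) are assumed to be queried by the caller:

```python
SHOCK_IMPACT_THRESHOLD = 8
TIMER_REFLECTION_HOURS = 24
REFLECTION_HARD_LIMIT_24H = 3

def should_reflect(reflections_last_24h, fresh_impacts, hours_since_last):
    """Reflection trigger decision (sketch): hard gate, then shock, then timer."""
    if reflections_last_24h >= REFLECTION_HARD_LIMIT_24H:
        return False                                           # 1 · hard gate wins
    if any(abs(i) >= SHOCK_IMPACT_THRESHOLD for i in fresh_impacts):
        return True                                            # 2 · shock path
    if hours_since_last > TIMER_REFLECTION_HOURS:
        return True                                            # 3 · timer path
    return False                                               # 4 · skip this session
```

The ordering is the point: the hard gate is checked first, so even a -10 shock event cannot force a fourth reflection in a rolling 24h window.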

Where does emotional_impact come from?

The emotional_impact signed integer on each L3 event isn't computed by a formula — it's produced by the extraction LLM itself. The prompt in prompts/extract.py instructs the model to rate -10 (catastrophic loss / grief) to +10 (peak joy / breakthrough), with 0 for mood-neutral facts. The algorithm layer here is the prompt engineering rather than a numerical rule, which is why tweaking it requires editing prompt text, not code.

9. How retrieval actually ranks · inside the L3+L4 read of step 4

Ranked fusion · vector similarity + relational-graph bonus
memory.retrieve() · lives in src/echovessel/memory/retrieve.py
DETAIL
Phase
Source
What
Formula / knob
Result
A
Query
Embed the new user message with sentence-transformers.
all-MiniLM-L6-v2 · 384-d
query vector q
B
ANN search
sqlite-vec returns top-K nodes by cosine similarity.
K = memory.retrieve_k (default 10)
candidates[] with cos_sim
C
Graph walk
For each candidate, sum relational edges from concept_node_filling weighted by recency and type.
bonus = edges × memory.relational_bonus_weight (default 1.0)
candidates[] now have bonus
D
Score
Final score = cos_sim + relational_bonus_weight · graph_bonus. Sort DESC.
_score_node() · pure function
ranked[] for prompt
E
Trim
Cut to top N by prompt budget. Favour higher emotional_impact on ties.
prompt budget = remaining context tokens
selected[] attached to prompt
Why the graph bonus? Vector similarity catches semantically-close memories but misses causal chains. "小黑 跳科目三" ("小黑 doing the 科目三 dance") and "买了新的跳舞游戏" ("bought a new dancing game") aren't semantically close, but they share a relational edge via the L4 thought "user likes playful things at home". The bonus pulls in graph-connected nodes even when cosine similarity alone wouldn't rank them.
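Phases C–E can be sketched as a small re-rank: add the weighted graph bonus to cosine similarity, sort, and break score ties on |emotional_impact|. The candidate dict shape is hypothetical, and the full §8A fusion weights would fold into the score in the real code; only the graph bonus and the trim tiebreak are shown here:

```python
def rank_candidates(candidates, bonus_weight=1.0, top_n=5):
    """Phases C-E (sketch): score = cos_sim + bonus_weight * graph_edges,
    sorted descending, with |impact| breaking ties at the trim step."""
    scored = [
        (c["cos_sim"] + bonus_weight * c["graph_edges"], abs(c["impact"]), c["id"])
        for c in candidates
    ]
    scored.sort(key=lambda t: (t[0], t[1]), reverse=True)
    return [cid for _, _, cid in scored[:top_n]]

cands = [
    {"id": "semantic", "cos_sim": 0.80, "graph_edges": 0, "impact": 1},
    {"id": "graph",    "cos_sim": 0.30, "graph_edges": 1, "impact": 2},  # pulled in by an edge
    {"id": "tie_low",  "cos_sim": 0.50, "graph_edges": 0, "impact": 1},
    {"id": "tie_high", "cos_sim": 0.50, "graph_edges": 0, "impact": 7},  # wins its tie
]
ranked = rank_candidates(cands, top_n=3)
```

Note how "graph" outranks "semantic" purely on its relational edge — the causal-chain behaviour the paragraph above describes.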

10. Policy gates · the checks that can stop a write

Four gates guard memory writes & proactive outputs
each gate has a config knob and a test · fail-closed by design
GUARDS
trivial_session
Session's total messages < consolidate.trivial_message_count OR tokens < consolidate.trivial_token_count → skip extraction. Rationale: "hi / hey / how's it going" sessions don't earn a permanent L3 event.
memory.consolidate.is_trivial()
reflection_hard_gate_24h
Persona has already written ≥ consolidate.reflection_hard_gate_24h thoughts in last 24 hours → skip reflection for this session. Rationale: prevents L4 explosion on chatty days.
memory.consolidate.consolidate_session()
no_in_flight_turn
Proactive scheduler wants to send an autonomous message, but channel has in_flight_turn_id set → defer. Rationale: don't interrupt an active conversation.
proactive.policy.PolicyEngine
quiet_hours
Proactive attempts between proactive.quiet_hours_start and quiet_hours_end → drop. Respects user's sleep.
proactive.policy.quiet_hours_check()
rate_limit
Persona has already sent ≥ proactive.max_per_24h autonomous messages in last 24h → drop. Rationale: cap the volume of unsolicited outreach.
proactive.policy.rate_limit_check()
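The three proactive gates combine fail-closed: any one of them blocks the send. A sketch with hypothetical argument names (the real PolicyEngine reads these from config and channel state), including the midnight-wrapping quiet-hours case:

```python
def proactive_allowed(now_hour, quiet_start, quiet_end,
                      sent_last_24h, max_per_24h, in_flight_turn_id):
    """Combine no_in_flight_turn, quiet_hours, rate_limit (sketch).

    Fail-closed: returns True only when every gate passes.
    """
    if in_flight_turn_id is not None:
        return False                       # don't interrupt an active conversation
    # quiet hours may wrap midnight, e.g. 23 → 7
    if quiet_start <= quiet_end:
        quiet = quiet_start <= now_hour < quiet_end
    else:
        quiet = now_hour >= quiet_start or now_hour < quiet_end
    if quiet:
        return False                       # respect the user's sleep
    if sent_last_24h >= max_per_24h:
        return False                       # bounded autonomous outreach
    return True
```

The wrap branch is the easy bug here: `23 ≤ h < 7` is never true with a naive comparison, so a 23:00–07:00 window needs the OR form.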

11. SSE mirror as the nervous system · what events flow, in both directions

Runtime broadcaster carries both turn events and lifecycle events
every SSE subscriber is a god-view observer
NERVOUS SYSTEM
Event
Origin · when it fires
Carries
UI reacts
chat.message.user_appended
New user message ingested (per message inside a turn)
user_id, content, turn_id, source_channel_id
append user bubble to timeline
chat.message.token
LLM emits one streaming token
delta, turn_id, source_channel_id
concatenate into persona bubble
chat.message.done
Channel.send() succeeds, turn complete
content, delivery, source_channel_id
finalise bubble · render channel pill
chat.message.voice_ready
TTS finishes, voice artifact cached
audio_url, duration
show ▶ voice play button
chat.mood.update
L1.mood_block refreshed (after turn)
mood_summary text
re-render chat header mood
chat.session.boundary
Session transitions open → closed
closed_session_id, new_session_id, timestamp
insert timestamped horizontal rule
chat.settings.updated
voice_enabled / provider / etc flipped (e.g. SIGHUP)
changed_fields
cross-tab config sync
chat.connection.ready
SSE handshake complete
green dot in status pill
chat.connection.heartbeat
Every 30s · NAT keepalive
no visual
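On the wire, each broadcaster event is one text/event-stream frame. The `event:`/`data:` lines and blank-line terminator are the SSE standard; the event names come from the table above, and the payload keys are the documented ones:

```python
import json

def sse_frame(event, payload):
    """Serialise one broadcaster event as an SSE frame (sketch)."""
    # event name line, JSON payload line, blank-line terminator
    return f"event: {event}\ndata: {json.dumps(payload)}\n\n"

frame = sse_frame("chat.message.token", {
    "delta": "你",
    "turn_id": "turn-33bec7",
    "source_channel_id": "discord",
})
```

Because every subscriber gets every frame, tagging each one with source_channel_id is what lets the Web UI render the channel pill for replies that originated on Discord.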
Companion: the static-view equivalent of this page is architecture.html — layout, tables, endpoints, iron rules. If this page is the nervous system, that one is the anatomy.