EchoVessel Runtime Interactions & Memory Flows

"how each layer wakes up" edition · 2026.04 · main @ 1f9bdf6 · companion to architecture.html
What this page answers: when a user sends one message, which memory layers wake up in what order, which write their output back where, and how information loops between layers. The static architecture.html shows what exists; this one shows what happens.
L1 · core_blocks · persona frame
L2 · recall_messages · raw log
L3 · concept_nodes/EVENT · episodic
L4 · concept_nodes/THOUGHT · reflected

1. How each memory layer comes into being · the distillation rules that graduate raw text into persistent memory

Memory is the whole point · four layers, three transitions
L1 is authored · L2 is captured · L3 is distilled · L4 is reflected
CORE

L1 · core_blocks — how the 5 blocks are born

L1 is the only layer that is authored, not distilled. The 5 blocks exist from persona inception onward, written directly by a human (or a bootstrap LLM pass on imported material).

Path
Trigger
Writer
Which blocks
Mechanism
Onboarding
First run · /api/admin/persona/onboarding
persona / self / user / relationship / mood
memory.onboarding.bootstrap_persona() · one CoreBlock row per label
Bootstrap from material
Onboarding path 2 · upload → LLM drafts 5 blocks
all 5
LLM synthesises from imported events/thoughts · user reviews before commit
Admin edit
Admin → Persona tab · save
any block
POST /api/admin/persona · partial update · optimistic on-disk write
Append-to-block
Import pipeline or admin → append existing block
any block except mood
memory.append_to_core_block() · atomic · writes audit row to core_block_appends
Mood replace (manual)
Admin edit mood block OR v1.0 reflection-driven path (not yet)
mood only
memory.update_mood_block() · replace-in-place · fires on_mood_updated hook · SSE chat.mood.update
MVP truth: L1 is NOT auto-updated per turn. The mood_block in particular is static unless someone edits it. v1.0 plans to have reflection update mood as a side-effect; today it's manual-only.
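The append path can be sketched as a single SQLite transaction: update the block and write the audit row together, or not at all. This is a minimal sketch with a hypothetical two-table schema — the real core_blocks / core_block_appends columns are almost certainly richer.

```python
import sqlite3

def append_to_core_block(db: sqlite3.Connection, label: str, text: str) -> None:
    """Atomic append + audit row (sketch; table/column names hypothetical)."""
    if label == "mood":
        # mood is replace-only per the table above; appends are disallowed
        raise ValueError("mood is replace-in-place via update_mood_block()")
    with db:  # one transaction: both writes commit or neither does
        db.execute(
            "UPDATE core_blocks SET content = content || char(10) || ? WHERE label = ?",
            (text, label),
        )
        db.execute(
            "INSERT INTO core_block_appends (label, appended_text) VALUES (?, ?)",
            (text and label, text) if False else (label, text),
        )

# minimal demo schema
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE core_blocks (label TEXT PRIMARY KEY, content TEXT)")
db.execute("CREATE TABLE core_block_appends (label TEXT, appended_text TEXT)")
db.execute("INSERT INTO core_blocks VALUES ('self', 'I am calm.')")
append_to_core_block(db, "self", "I notice I like mornings.")
```

The `with db:` context manager is what makes the pair atomic: if the audit insert fails, the block update rolls back too.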

L2 · recall_messages — every message captured, verbatim

L2 is the only layer that is written during the user-facing turn. There's no distillation here — every incoming line and every persona reply is persisted verbatim as an atomic row.

When
What is written
Schema fields
Indexes
Output
🟢
User msg arrives
memory.ingest_message(..., role=USER) before LLM sees anything
role, content, channel_id, session_id, turn_id, created_at, token_count, day
FTS5 trigram · session join · day bucket
1 row + FTS index update
🟢
Persona reply finalised
ingest_message(..., role=PERSONA) after streaming completes
same shape · same turn_id · same session_id
FTS5 gets both sides of the turn
1 row
🟢
Session open/close
Session row tracks status, first_message_at, last_message_at, message count, token count
sessions table
status transitions: open → closing → consolidating → closed
1 row per session
Why verbatim: L2 is the ground truth. Later distillations (L3, L4) can go wrong — LLM hallucinates, eval catches a drift. When that happens, we need to go back to L2 to see what actually happened. Never paraphrase at ingest time.
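The verbatim-ingest rule plus FTS indexing can be sketched with plain SQLite: a row table, an FTS5 shadow table, and a trigger that indexes each row on insert. This is an illustrative sketch — the real schema has more columns (session, turn, token counts) and uses the trigram tokenizer, while this demo uses the default unicode61 tokenizer for portability.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE recall_messages (
    id INTEGER PRIMARY KEY, role TEXT, content TEXT,
    channel_id TEXT, session_id TEXT, turn_id TEXT
);
-- demo FTS table; the real index uses the trigram tokenizer
CREATE VIRTUAL TABLE recall_fts USING fts5(content);
CREATE TRIGGER recall_ai AFTER INSERT ON recall_messages BEGIN
    INSERT INTO recall_fts(rowid, content) VALUES (new.id, new.content);
END;
""")

def ingest_message(role, content, channel_id, session_id, turn_id):
    # verbatim: no paraphrase, no trimming — L2 is ground truth
    with db:
        db.execute(
            "INSERT INTO recall_messages (role, content, channel_id, session_id, turn_id)"
            " VALUES (?, ?, ?, ?, ?)",
            (role, content, channel_id, session_id, turn_id),
        )

ingest_message("user", "my cat jumped on my face again", "discord", "s1", "t1")
hits = db.execute("SELECT content FROM recall_fts WHERE recall_fts MATCH 'cat'").fetchall()
```

The trigger keeps the FTS index in lockstep with the row table, so the "1 row + FTS index update" in the table above is one transaction.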

L3 · concept_nodes WHERE type=EVENT — the first real distillation

When a session closes, prompts.extraction.EXTRACTION_SYSTEM_PROMPT asks a SMALL-tier LLM: "given this closed conversation, what is worth remembering?" Output is 0–3 event rows, each with a signed emotional impact and structured tags. This is the layer where raw text first becomes episodic memory.

What the extraction prompt instructs the LLM to produce

For each event:
   description · 1–3 sentences in the source language · third-person reference to user
   emotional_impact · signed integer in [-10, +10]
   emotion_tags · 0–4 free-form lowercase labels
   relational_tags · 0–3 from the closed vocabulary of 6 values

Emotional impact scale (from the prompt)

Value
Meaning (extraction prompt verbatim)
Example
-10
catastrophic loss, trauma, crisis
death of close family · suicidal ideation voiced · violence disclosed
-7
severe sadness, grief, serious conflict
breakup · job loss · long-buried secret first disclosed
-4
meaningful stress, disappointment
argument with boss · sleep deprivation · anxiety attack
-1
mild low, slight frustration
bad commute · minor annoyance
0
pure neutral · rare (most shared things have some valence)
used sparingly
+1
mild pleasant
nice weather · good meal
+4
meaningful joy, satisfaction, connection
promotion at work · fun weekend · first real laugh in weeks
+7
major positive milestone
engagement · big win · deep reconciliation
+10
life-defining joy
birth of child · surviving a crisis · long-awaited reunion
Why a signed scale: |impact| drives retrieval salience, but the sign drives the distinction between grief and joy. "用户妈妈去世了" ("the user's mother passed away") is -9 — storing it as +9 would make it retrieve alongside joyful memories. The prompt explicitly warns against sign flips.

Relational tag vocabulary (closed · exactly 6)

Tag
Triggers on
Example
identity-bearing
a core fact about who the user is
"user is a single mom" · "user has depression"
unresolved
an emotional thread opened but not closed this session
user hints at a fight but changes topic before resolution
vulnerability
a rare moment of being unusually open or exposed
"I've never told anyone this…"
turning-point
a shift in the relationship itself
first real trust · first conflict · first private share
correction
user corrected something the persona assumed
"实际上不是那样" ("actually it's not like that") · "actually that's not what I meant"
commitment
explicit promise or follow-up
"下次聊" ("let's talk next time") · "I'll tell you how it goes"
Retrieval bonus hook: having any relational_tag grants a +0.5 bonus to retrieval score (see §8). The prompt warns the LLM: "if you're tagging every event, you're over-tagging." Target ~20–30% of events should carry a relational tag.
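The structured-output rules above (signed impact range, 0–4 emotion tags, 0–3 closed-vocabulary relational tags) lend themselves to a small validation pass after the LLM responds. A hedged sketch — the function name and dict shape are hypothetical, not the real parser:

```python
# Closed vocabulary from the extraction prompt (exactly 6 tags)
RELATIONAL_TAGS = {
    "identity-bearing", "unresolved", "vulnerability",
    "turning-point", "correction", "commitment",
}

def validate_event(description, emotional_impact, emotion_tags, relational_tags):
    """Normalise one extracted event; raise on hard violations (sketch)."""
    if not (-10 <= emotional_impact <= 10):
        raise ValueError("emotional_impact outside [-10, +10]")
    bad = [t for t in relational_tags if t not in RELATIONAL_TAGS]
    if bad:
        raise ValueError(f"unknown relational tags: {bad}")
    return {
        "description": description.strip(),
        "emotional_impact": int(emotional_impact),
        "emotion_tags": [t.lower() for t in emotion_tags][:4],   # 0–4 free-form lowercase
        "relational_tags": list(relational_tags)[:3],            # 0–3 closed-vocab
    }

ok = validate_event("user's mother passed away", -9, ["Grief"], ["identity-bearing"])
```

Rejecting unknown relational tags (rather than silently keeping them) is what keeps the +0.5 retrieval bonus meaningful: only the 6 curated tags can earn it.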

Mandatory self-check (the safety net)

The prompt requires the LLM, after drafting events, to ask itself: "did this session contain any emotional PEAKS I failed to extract?" Typical missed peaks listed in the prompt:

  • a single casual mention of someone dying buried in mundane chat
  • a quick vulnerable disclosure followed by deflection
  • a question that is actually a cry for help ("你觉得活着累吗?")
  • understated positive milestones the user downplays ("btw, I got engaged")

The self_check_notes output field records the LLM's answer. Missing a peak is the #1 reason the Emotional Peak Retention eval metric fails — so the prompt treats this as non-skippable.

L4 · concept_nodes WHERE type=THOUGHT — reflection, not summary

After extraction writes L3 events, if the session passes the reflection gate (see §8, Algorithm C), a MEDIUM-tier LLM runs prompts.reflection.REFLECTION_SYSTEM_PROMPT. It's told: "you are the reflective inner voice · produce 1 or 2 quiet, honest impressions about what you have been noticing."

Tonal constraints (from the prompt · most important)

Rule
What's forbidden
What's required instead
no clinical
"subject exhibits patterns of avoidance"
"something Alan does is…" · "Alan tends to…"
no advice
"Alan should talk to a professional"
impressions describe · they do not prescribe
no labels
"depression" · "anxiety disorder" · "PTSD"
describe what you observe without naming it
source language
translating Chinese events into English thoughts
match whatever language the events were in
no meta
"the session was 10 messages long" · "via Discord"
describe the user, not the conversation's metadata (F10 iron rule)

Good vs bad reflections (from the prompt)

✅ Good
"Alan 把真正重的话都留到深夜才说。白天的他稳定只是保护色。" ("Alan saves the really heavy things for late at night. His daytime steadiness is just protective coloring.")
"I've been noticing that Alan only lets himself be tired when no one is around to see it."
❌ Bad
"Subject demonstrates nocturnal disclosure pattern suggestive of attachment avoidance."
"The user appears to be experiencing suppressed grief symptoms."

Structural constraints

#
Field
Constraint
Why
Enforcement
Parse behaviour on violation
1
count
1 or 2 thoughts, never 0 (unless empty input), never 3+
most sessions produce 1 · a second only if genuinely distinct
MAX_THOUGHTS = 2
truncate to 2; warn
2
description
1–3 sentences · natural warm observation · first person or neutral third
long rants → hard to read · short clinical labels → wrong tone
soft cap ~500, hard cap 2000
truncate at hard cap; warn
3
emotional_impact
signed [-10, +10] · but reflections rarely exceed ±8
impressions are processed · rarely touch raw extremes
RECOMMENDED_IMPACT_BOUND = 8
warn at ±9/±10 · clamp to -10/+10
4
filling (evidence)
MUST cite ≥1 L3 event id from the input · no invention allowed
without provenance, forgetting-rights cascade can't work
every id must exist in input
rejected · thought dropped
The filling chain is load-bearing: it's what lets "forget this event" cascade correctly. If the user deletes an L3 event that five L4 thoughts cited, the system can either drop those thoughts (cascade) or mark their filling as orphaned (preserve thought, lose evidence). Without cited filling, neither option is possible — the chain is untraceable and the thought becomes a dangling insight. That's why the prompt treats missing filling as a rejection rather than a warning.
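The cascade-vs-orphan choice can be sketched over plain in-memory structures; the dict/edge-list representation here is illustrative, not the real concept_node_filling schema:

```python
def forget_event(event_id, events, thoughts, filling, cascade=True):
    """Delete an L3 event and handle L4 thoughts that cite it (sketch).

    events/thoughts: {id: row} dicts · filling: list of (thought_id, event_id)
    edges. cascade=True drops citing thoughts; cascade=False keeps them but
    marks their evidence as orphaned.
    """
    events.pop(event_id, None)
    citing = {t for (t, e) in filling if e == event_id}
    filling[:] = [(t, e) for (t, e) in filling if e != event_id]
    if cascade:
        for t in citing:
            thoughts.pop(t, None)
            filling[:] = [(tt, e) for (tt, e) in filling if tt != t]
    else:
        for t in citing:
            thoughts[t]["orphaned"] = True
    return citing

events = {"e1": {}, "e2": {}}
thoughts = {"t1": {}, "t2": {}}
filling = [("t1", "e1"), ("t2", "e2")]
dropped = forget_event("e1", events, thoughts, filling, cascade=True)
```

Note the walk starts from the filling edges: without cited filling ids, `citing` would be empty and the dependent thoughts would silently survive with invisible, broken evidence — exactly the "dangling insight" the prompt's rejection rule prevents.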
Zoom-out: L1 is authored (by humans or bootstrap LLM). L2 is captured (verbatim, never distilled). L3 is the first place where raw text becomes memory — extracted by LLM per session close, with strict rules on valence sign and the 6 relational tags. L4 is the inner voice — fewer thoughts, softer tone, cited evidence. Each transition is an LLM call guarded by the gates in §7.

2. One turn in motion · every layer, every second, chronological

User types one sentence → persona streams one reply
runtime.interaction.assemble_turn · called by TurnDispatcher from IncomingTurn.channel_id → channel.send() back
HOT PATH
#
Who wakes up
What it does
Reads
Writes / emits
1
Channel
Debounce burst into one IncomingTurn; mint a turn_id.
internal: per-user debounce timer
queue → TurnDispatcher
2
L2 write
Persist each raw user line as a RecallMessage row before the LLM sees anything.
L2 · recall_messages, FTS5 index, session last_message_at
3
L1 read
Pull the 5 core blocks verbatim — persona / self / user / relationship / mood. Always first in the prompt.
L1 · core_blocks (1 row per label)
4
L3+L4 retrieve
Embed the user's message · run vector search over L3+L4 concept_nodes · walk the relational graph for bonus scoring. No channel_id filter (iron rule D4).
L3+L4 · concept_nodes, concept_nodes_vec (sqlite-vec ANN), concept_node_filling (graph edges)
5
L2 read (window)
Load the last memory.recent_window_size messages from the current session as short-term chat context.
L2 · recall_messages WHERE session_id = current, DESC LIMIT N
6
Prompt assemble
Splice system prompt + L1 blocks + retrieved L3/L4 memories + L2 window + new user message. No transport identity anywhere (iron rule F10).
ephemeral prompt string → LLM
7
LLM stream
Call llm.complete(prompt, stream=True); every token arrives via on_token callback.
network: OpenAI / Anthropic / stub
tokens → channel + SSE broadcaster
8
Channel send + mirror
Channel delivers the reply on its native surface (Discord DM / SSE frame). Runtime also mirrors to the runtime-owned broadcaster tagged with source_channel_id.
SSE · user_appended, token×N, done, voice_ready
9
L2 write (persona)
Persist the persona's reply as its own RecallMessage row (role=persona). Same session_id, same turn_id.
L2 · recall_messages, FTS5 index, cost ledger
10
on_turn_done
Clear the channel's in_flight_turn_id; log the LLM call into the cost ledger; hand control back to the dispatcher queue.
llm_calls; chat.settings.updated (if flags toggled)
Zoom-out: the reply to the user takes steps 1-8 (channel → memory → LLM → channel). Steps 9-10 are post-turn housekeeping — they happen while the user is reading the reply. This is why consolidation to L3/L4 is not here: it's asynchronous background work, covered in §4. Mood block is NOT auto-updated per turn in MVP — it only rewrites via admin edit or the v1.0 reflection-driven path (see §7).
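The hot-path ordering above (L2 write first, then L1/L3/L4 reads, splice, stream, persist) can be sketched with stub objects. Everything here is illustrative — the class, method names, and prompt splicing are stand-ins, not the real EchoVessel API:

```python
from dataclasses import dataclass, field

@dataclass
class StubMemory:
    rows: list = field(default_factory=list)           # stands in for recall_messages
    def ingest_message(self, role, content):            # steps 2 and 9 · L2 writes
        self.rows.append((role, content))
    def get_core_blocks(self):                          # step 3 · L1 read (all 5 labels)
        return ["[persona]", "[self]", "[user]", "[relationship]", "[mood]"]
    def retrieve(self, query):                          # step 4 · L3+L4 fusion retrieval
        return []                                       # cold DB: no memories yet
    def recent_window(self):                            # step 5 · L2 short-term window
        return [c for _, c in self.rows[-8:]]

def assemble_turn(user_text, mem, complete):
    """Steps 2-9 of the hot path, in order (sketch)."""
    mem.ingest_message("user", user_text)               # L2 write before the LLM sees anything
    parts = mem.get_core_blocks() + mem.retrieve(user_text) + mem.recent_window()
    prompt = "\n\n".join(parts + [user_text])           # step 6 · splice
    reply = "".join(complete(prompt))                   # step 7 · consume the token stream
    mem.ingest_message("persona", reply)                # step 9 · persist reply verbatim
    return reply

mem = StubMemory()
reply = assemble_turn("hello", mem, lambda p: iter(["hi ", "there"]))
```

The key invariant the sketch preserves: the user line is in L2 before any read, and the persona line lands in L2 after the stream completes, both under the same turn.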

3. Same 10 steps, 8-column sequence · who is active when

Channel
Runtime
L1
L2
L3
L4
LLM
UI
debounce
Step 1 · Channel mints turn_id, emits IncomingTurn
→ runtime
ingest
write
user_appended
Step 2 · Runtime writes each IncomingMessage into L2. UI sees the user's bubble via SSE.
assemble
read 5 blocks
Step 3 · L1 reads — persona / self / user / relationship / mood. Always first.
retrieve
vector search
vector search
Step 4 · L3+L4 retrieve via sqlite-vec + relational-graph walk. No channel_id filter (D4).
recent window
read N recent
Step 5 · Pull recent_window_size L2 messages from current session.
prompt
Step 6 · Splice system prompt + L1 + retrieved L3/L4 + L2 window + new user line. F10 guard.
stream
token×N
Step 7 · LLM streams tokens. Each token also mirrors to the runtime broadcaster (tag source_channel_id).
send
mirror
done
Step 8 · Originating channel delivers reply; runtime publishes chat.message.done to every SSE subscriber.
persist persona
write
Step 9 · Write persona's reply to L2 with same session_id + turn_id.
on_turn_done
log cost
Step 10 · Dispatcher releases the slot; llm_calls ledger records tokens + cost. Mood block write is NOT per-turn in MVP — it only runs via admin edit or v1.0 reflection path.

4. After you stop typing · session lifecycle & consolidation — the slow write

Idle → session closes → memory graduates L2 → L3 → L4
consolidate_worker + idle_scanner · background coroutines · user never waits on this
SLOW WRITE
t = 0
first user msg
+3s
persona reply
+15s
user follow-up
+20s
persona reply
+8 min
user goes quiet
+30 min
idle_scanner closes session
+30m 5s
L3 extraction
+30m 20s
L4 reflection

Step-by-step · what runs, in order

When
Worker
What it does
Reads
Writes
🌛
idle_scanner
Every idle_scanner.interval_seconds (default 60s) · finds sessions where now - last_message_at > SESSION_IDLE_MINUTES (default 30min).
sessions WHERE status='open'
sessions.status = 'closing'
📦
consolidate_worker
Polls every consolidate.worker_poll_seconds (default 5s) for sessions in status='closing'. Picks one at a time.
sessions WHERE status='closing'
sessions.status = 'consolidating'
📝
L2 read
Loads all recall_messages for the session. Skips entirely if < trivial_message_count OR < trivial_token_count (default 3 / 200).
L2 · all messages in session
🧩
LLM extract
Calls prompts.extract_fn(session_messages) with SMALL tier model. Produces a short list of ExtractedEvent(description, emotional_impact, tags...).
L2 · session window
LLM tokens → memory ingester
💚
L3 write
Each ExtractedEvent becomes a concept_node row (type=EVENT). Sentence-transformers embedder generates a 384-d vector; sqlite-vec stores it.
L1 · self_block (for context)
L3 · concept_nodes, concept_nodes_vec, concept_nodes_fts
🎯
reflection gate
Counts L4 thoughts written in last 24h. If ≥ reflection_hard_gate_24h (default 3) · skip reflection. Otherwise continue.
L4 · COUNT(*) WHERE created_at > now-24h
🪞
LLM reflect
Calls prompts.reflect_fn(recent_events) with MEDIUM tier. Produces 0-2 ExtractedThought(description, filling_ids).
L3 · recent events (last N sessions)
LLM tokens → memory ingester
🧡
L4 write
Each thought becomes a concept_node (type=THOUGHT) + one concept_node_filling row per source event (parent=thought, child=event).
L4 · concept_nodes, concept_node_filling (graph edges), concept_nodes_vec
close
Marks session status='closed'. Fires on_session_closed hook → SSE chat.session.boundary event → Web UI draws a timestamped rule.
SSE · chat.session.boundary
Why this is async: extracting events needs an LLM call; reflecting needs another. If we did this inline per turn, every user message would wait 2-3 seconds for extraction before seeing the persona's reply. Instead, we let the reply hit the user immediately (steps 1-8 in §2) and push L3/L4 writing to the background (this section). The cost is: L3/L4 are stale by ~30 minutes of idle time. The benefit: the chat feels real-time.
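One idle_scanner tick can be sketched as a pure function over session rows; the dict shape and constant names mirror the table above but are hypothetical:

```python
import time

SESSION_IDLE_SECONDS = 30 * 60  # SESSION_IDLE_MINUTES default, in seconds

def idle_scan(sessions, now=None):
    """One idle_scanner pass (sketch): stale open sessions move to 'closing'.

    sessions: {session_id: {"status": str, "last_message_at": epoch_seconds}}.
    Returns the ids it transitioned; consolidate_worker picks them up later.
    """
    now = now if now is not None else time.time()
    closing = []
    for sid, s in sessions.items():
        if s["status"] == "open" and now - s["last_message_at"] > SESSION_IDLE_SECONDS:
            s["status"] = "closing"
            closing.append(sid)
    return closing

sessions = {
    "fresh": {"status": "open", "last_message_at": 1000.0},
    "stale": {"status": "open", "last_message_at": 1000.0 - 31 * 60},
}
closed = idle_scan(sessions, now=1000.0)
```

Running this every interval_seconds is all the scanner does; the state machine's heavier transitions (closing → consolidating → closed) belong to the worker.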

5. Story trace · one real sentence, all four layers lit up

Scenario · user DMs the Discord bot

"我养了只白猫,叫小黑。他超调皮,老在半夜跳到我脸上。" ("I have a white cat named 小黑. He's super mischievous and keeps jumping onto my face in the middle of the night.")
t+0.0s
CH
discord.py on_message → DiscordChannel.push_user_message · debounce timer armed for 2s.
t+2.0s
CH
Debounce expires · mint turn_id = turn-33bec7 · push IncomingTurn into queue.
t+2.1s
L2
Write ↘ One row into recall_messages · role=user · content = the full sentence · channel_id="discord" · session_id="s_d4e82a" · linked to turn-33bec7.
t+2.2s
L1
Read ↗ 5 core_blocks loaded · persona="她的性格是…" ("her personality is…") · user="你是一个在旧金山的软件工程师…" ("you are a software engineer in San Francisco…") · mood="此刻平静" ("calm at this moment") · etc.
t+2.3s
L3
Read ↗ Embed the sentence · sqlite-vec ANN search · cold DB so 0 hits. Relational walk also 0 edges. No episodic memory to pull in yet.
t+2.3s
L4
Read ↗ Same search on thoughts · 0 hits · persona has no long-term impressions about you yet.
t+2.4s
LLM
Prompt assembled (core_blocks + empty retrieval + empty window + new sentence) · sent to gpt-4o · streaming begins.
t+2.4–5.8s
CH
Tokens stream back · Discord channel builds the reply · Web broadcaster mirrors every token tagged source_channel_id="discord".
t+5.8s
L2
Write ↘ Persona reply written as second row · role=persona · same turn_id.
t+5.9s
L1
Write ↘ (v1.0 path · skipped in MVP, see §7) mood_block refreshes to "带点笑意 · 对你家小动物的好奇" ("a hint of a smile · curiosity about your little pet") · broadcast as chat.mood.update.
t+31m
BG
idle_scanner tick · session s_d4e82a last_message_at is 31 min ago · mark status='closing'.
t+31m 5s
L3
Write ↘ Extract LLM produces: {description: "user has a white cat named 小黑 that's energetic and jumps on them at night", emotional_impact: +2, tags: [pet, night]} · stored with 384-d embedding. This is the moment the persona "remembers".
t+31m 20s
L4
Gate check · 24h thought count = 0 < 3 · OK to reflect. Reflection LLM produces: {description: "user has a playful bond with animals · their life has warmth at night", filling_ids: [<event_id>]}. Filling edge created thought → event.
later
CH
User returns, asks "你还记得我的猫吗" ("do you still remember my cat?") on Web (not Discord).
later
L3
Read ↗ Embed "你还记得我的猫吗" · sqlite-vec finds the 小黑 event with high cosine similarity. Persona mentions "小黑" by name. Cross-channel memory works — because D4 never filtered by channel_id.
Key moments: the raw sentence lives in L2 immediately (step at +2.1s). But the fact "user has a cat named 小黑" only becomes memory at +31m when the session idle-closes and consolidation runs. Ask the persona at +10min and it won't know about 小黑 via L3 retrieval — it would only remember because the raw text is still in the L2 recent-window of the same session. Once the session closes and re-opens, L2 window is reset, but L3 now carries the permanent fact.

6. Cross-layer read/write matrix · who touches each layer, when

L1
Turn hot path reads: get_core_blocks() · every prompt · all 5 labels
Turn hot path writes: mood_block refresh · post-turn observer
Background reads: extract_fn may read self_block for context
Background writes: reflect_fn may append to self_block via append_to_core_block
Admin API:
  • GET /api/admin/persona
  • POST /api/admin/persona
  • onboarding · bootstrap-from-material
L2
Turn hot path reads: recent window · recent_window_size msgs
Turn hot path writes: every ingested message · user + persona
Background reads: whole-session read for extraction
Background writes: —
Admin API:
  • GET /api/chat/history
  • GET /api/admin/memory/search (FTS)
  • DEL /api/admin/memory/messages/{id}
  • DEL /api/admin/memory/sessions/{id}
L3
Turn hot path reads: vector + graph walk · every turn
Turn hot path writes: —
Background reads: reflect_fn reads recent events for input
Background writes: extract_fn output · per closed session
Admin API:
  • GET /api/admin/memory/events (paginated)
  • GET /api/admin/memory/events/{id}/dependents
  • DEL /api/admin/memory/events/{id} (orphan or cascade)
  • import pipeline also writes L3
L4
Turn hot path reads: vector search · every turn
Turn hot path writes: —
Background reads: gate reads 24h count
Background writes: reflect_fn output · 0-2 thoughts per session
Admin API:
  • GET /api/admin/memory/thoughts
  • GET /api/admin/memory/thoughts/{id}/trace
  • DEL /api/admin/memory/thoughts/{id}
Observation: hot path (per-turn) is mostly L1+L2+L3+L4 reads, with L2 writes as the only hot-path write. Everything that creates new memory (L3/L4) is background. This is by design — it's what keeps turn latency low while still building up long-term knowledge.

7. Mood feedback loop · how an admin edit or future reflection-driven update reaches the browser

MVP note: the mood block is not auto-updated per turn today. It changes only when (a) admin UI edits it, (b) v1.0 reflection path writes it as a side-effect of L4 thought creation. The loop below describes the event-propagation path that is wired — it just isn't triggered per turn yet.
L1.mood_block → MoodObserver → SSE → Web UI header
not a static field — a live signal · updated per turn once the v1.0 path lands
LIVE
1. Turn completes
on_turn_done · step 10 of §1
2. MoodObserver fires
reads L2 recent window + previous L1.mood_block
3. Optional SMALL LLM
summarises current emotional tone
Result funnels through memory.update_mood_block().
4. L1 write
core_blocks[mood].content replaced + audit row appended to core_block_appends
5. SSE broadcast
chat.mood.update · runtime-owned broadcaster · every subscriber
6. Web UI re-renders
usePersona hook slices mood into persona.core_blocks.mood · Chat header reacts
Feedback becomes input: the next turn's Step 3 (L1 read) pulls whatever mood this loop wrote. So if the user's last message made the persona anxious, the next reply is generated in an anxious tone — memory and mood are closed-loop.

8. Emotion & scoring algorithms · the math behind what gets remembered and what gets surfaced

Three distinct algorithms · three distinct jobs
retrieval fusion score · consolidation trivial-gate · shock / timer / hard-gate reflection triggers
MATH

Algorithm A · Retrieval fusion score

Every L3/L4 candidate returned from sqlite-vec ANN gets re-ranked by a weighted sum of 4 signals. The weights are in src/echovessel/memory/retrieve.py.

total = 0.5 · recency + 3.0 · relevance + 2.0 · |impact| + 1.0 · relational_bonus
Key
Signal
Meaning
Formula
Range
R
recency
how fresh this memory is
exp(-ln(2) · days_since / 14) · 14-day half-life
[0, 1]
V
relevance
semantic similarity of query → candidate
1 - distance/2 · clamped · min-floor 0.4 drops orthogonal hits
[0, 1]
I
|impact|
how emotionally loaded the memory is
min(|emotional_impact| / 10, 1.0)
[0, 1]
G
relational_bonus
has the node got relational_tags?
0.5 if relational_tags non-empty, else 0
{0, 0.5}
Why these weights? Relevance (3.0) dominates — what a memory is about matters most. Impact (2.0) ensures peak emotional moments surface even when their semantics drift from the query. Recency (0.5) gently prefers fresh memories without drowning old-but-important ones. Relational bonus (1.0) pulls in graph-connected nodes. The min-relevance floor of 0.4 is important: without it, strictly orthogonal candidates would occasionally bubble up on pure |impact| × relational_bonus, causing false-positive recall. With the floor, truly unrelated memories can't enter the ranked set even if they're emotionally loaded.
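The fusion score above is small enough to sketch end-to-end. Weights, half-life, and floor follow the formula and table as documented; treat them as configurable values read from retrieve.py rather than constants of the system:

```python
import math

def fusion_score(days_since, distance, emotional_impact, relational_tags):
    """Re-rank score for one L3/L4 candidate (sketch of the formula above).

    Returns None when the candidate falls below the min-relevance floor,
    i.e. it is dropped before ranking.
    """
    recency = math.exp(-math.log(2) * days_since / 14)       # 14-day half-life
    relevance = max(0.0, min(1.0, 1 - distance / 2))
    if relevance < 0.4:              # floor: orthogonal hits can't ride on impact
        return None
    impact = min(abs(emotional_impact) / 10, 1.0)            # sign stripped here only
    bonus = 0.5 if relational_tags else 0.0
    return 0.5 * recency + 3.0 * relevance + 2.0 * impact + 1.0 * bonus

fresh_hit = fusion_score(0, 0.2, 9, ["identity-bearing"])    # high on every signal
orthogonal = fusion_score(0, 1.6, 10, ["vulnerability"])     # relevance 0.2 < floor
```

Note the sign of emotional_impact is stripped only inside the score — salience is symmetric, but the stored sign still controls what kind of memory surfaces.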

Algorithm B · Strong-emotion override (consolidation)

When deciding whether a session is "trivial" (and so skips extraction), a below-threshold session can still be promoted to L3 if it contains any strong-emotion keyword. This keyword-matched safety net catches peak emotional moments that happen in short sessions.

Category
zh keywords
en keywords
Why this list
Effect
💧
Bereavement / loss
走了 · 去世 · 死了 · 离世 · 葬礼 · 没了
died · passed away · funeral
loss events are almost always L3-worthy regardless of session length
override trivial gate
⚠️
Crisis
撑不住 · 不想活 · 活不下去 · 自杀 · 崩溃
can't go on · suicide · breakdown
safety-critical · never silently dropped
override trivial gate
🎭
Major milestones
分手 · 离婚 · 被裁
breakup · divorce · fired
large identity shifts · persona should remember even from a one-liner
override trivial gate
Logic: is_trivial(session, messages) returns True only when BOTH below-threshold AND no strong emotion keyword. _has_strong_emotion(messages) is a case-insensitive substring match — optimized for recall, not precision. False positives occasionally push a mundane sentence through extraction; that's an acceptable cost vs. losing a late-night single line about a breakup.
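The gate plus override composes to a few lines. The keyword list here is abbreviated from the table above (the real one covers all three categories), and thresholds use the documented defaults:

```python
TRIVIAL_MESSAGE_COUNT = 3    # consolidate.trivial_message_count default
TRIVIAL_TOKEN_COUNT = 200    # consolidate.trivial_token_count default
# Abbreviated sample of the keyword table above
STRONG_EMOTION_KEYWORDS = [
    "去世", "葬礼", "自杀", "分手", "被裁",
    "passed away", "suicide", "breakup", "fired",
]

def _has_strong_emotion(messages):
    # case-insensitive substring match · tuned for recall, not precision
    text = " ".join(messages).lower()
    return any(kw.lower() in text for kw in STRONG_EMOTION_KEYWORDS)

def is_trivial(messages, token_count):
    """True only when BOTH below-threshold AND no strong-emotion keyword."""
    below = len(messages) < TRIVIAL_MESSAGE_COUNT or token_count < TRIVIAL_TOKEN_COUNT
    return below and not _has_strong_emotion(messages)
```

A one-line session mentioning a death is extracted; a one-line "hey" is not, which is exactly the asymmetry the safety net buys.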

Algorithm C · Reflection triggers (shock · timer · hard gate)

Once a session clears extraction (L3 events exist), the next question is whether to run reflection (L4 thoughts). Three decision rules converge.

SHOCK_IMPACT_THRESHOLD
If any freshly-extracted L3 event has |emotional_impact| ≥ 8 · force reflection NOW even if the timer hasn't elapsed. Rationale: peak moments should reshape persona's impression immediately, not wait 24 hours.
SHOCK_IMPACT_THRESHOLD = 8
memory/consolidate.py
TIMER_REFLECTION_HOURS
Even without a shock event · if > 24 hours have passed since the last reflection · run one. Keeps persona's long-term impressions slowly updating even during routine chats.
TIMER_REFLECTION_HOURS = 24
memory/consolidate.py
REFLECTION_HARD_LIMIT_24H
Regardless of shock or timer · no more than 3 reflections per rolling 24-hour window. Prevents L4 explosion on chatty days or debugging sessions. The hardest of the three gates — wins over shock and timer.
REFLECTION_HARD_LIMIT_24H = 3
(configurable via consolidate.reflection_hard_gate_24h)
Decision order:
  1. count reflections in last 24h → if ≥ 3, skip (hard gate wins)
  2. any fresh event with |impact| ≥ 8? → reflect (shock path)
  3. last reflection > 24h ago? → reflect (timer path)
  4. otherwise → skip this session
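The four-step decision order above, as a sketch; constant names follow the boxes, and the inputs (a 24h count, fresh event impacts, hours since last reflection) are assumed to be queried by the caller:

```python
SHOCK_IMPACT_THRESHOLD = 8
TIMER_REFLECTION_HOURS = 24
REFLECTION_HARD_LIMIT_24H = 3

def should_reflect(reflections_last_24h, fresh_impacts, hours_since_last):
    """Reflection trigger decision (sketch): hard gate, then shock, then timer."""
    if reflections_last_24h >= REFLECTION_HARD_LIMIT_24H:
        return False                                           # 1 · hard gate wins
    if any(abs(i) >= SHOCK_IMPACT_THRESHOLD for i in fresh_impacts):
        return True                                            # 2 · shock path
    if hours_since_last > TIMER_REFLECTION_HOURS:
        return True                                            # 3 · timer path
    return False                                               # 4 · skip this session
```

The ordering is the point: the hard gate is checked first, so even a -10 shock event cannot force a fourth reflection in a rolling 24h window.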

Where does emotional_impact come from?

The emotional_impact signed integer on each L3 event isn't computed by a formula — it's produced by the extraction LLM itself. The prompt in prompts/extract.py instructs the model to rate -10 (catastrophic loss / grief) to +10 (peak joy / breakthrough), with 0 for mood-neutral facts. The algorithm layer here is the prompt engineering rather than a numerical rule, which is why tweaking it requires editing prompt text, not code.

9. How retrieval actually ranks · inside the L3+L4 read of step 4

Ranked fusion · vector similarity + relational-graph bonus
memory.retrieve() · lives in src/echovessel/memory/retrieve.py
DETAIL
Phase
Source
What
Formula / knob
Result
A
Query
Embed the new user message with sentence-transformers.
all-MiniLM-L6-v2 · 384-d
query vector q
B
ANN search
sqlite-vec returns top-K nodes by cosine similarity.
K = memory.retrieve_k (default 10)
candidates[] with cos_sim
C
Graph walk
For each candidate, sum relational edges from concept_node_filling weighted by recency and type.
bonus = edges × memory.relational_bonus_weight (default 1.0)
candidates[] now have bonus
D
Score
Final score = cos_sim + relational_bonus_weight · graph_bonus. Sort DESC.
_score_node() · pure function
ranked[] for prompt
E
Trim
Cut to top N by prompt budget. Favour higher emotional_impact on ties.
prompt budget = remaining context tokens
selected[] attached to prompt
Why the graph bonus? Vector similarity catches semantically-close memories but misses causal chains. "小黑 跳科目三" ("小黑 doing the 科目三 dance") and "买了新的跳舞游戏" ("bought a new dancing game") aren't semantically close, but they share a relational edge via the L4 thought "user likes playful things at home". The bonus pulls in graph-connected nodes even when cosine similarity alone wouldn't rank them.
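Phases C–E can be sketched as a small re-rank: add the weighted graph bonus to cosine similarity, sort, and break score ties on |emotional_impact|. The candidate dict shape is hypothetical, and the full §8A fusion weights would fold into the score in the real code; only the graph bonus and the trim tiebreak are shown here:

```python
def rank_candidates(candidates, bonus_weight=1.0, top_n=5):
    """Phases C-E (sketch): score = cos_sim + bonus_weight * graph_edges,
    sorted descending, with |impact| breaking ties at the trim step."""
    scored = [
        (c["cos_sim"] + bonus_weight * c["graph_edges"], abs(c["impact"]), c["id"])
        for c in candidates
    ]
    scored.sort(key=lambda t: (t[0], t[1]), reverse=True)
    return [cid for _, _, cid in scored[:top_n]]

cands = [
    {"id": "semantic", "cos_sim": 0.80, "graph_edges": 0, "impact": 1},
    {"id": "graph",    "cos_sim": 0.30, "graph_edges": 1, "impact": 2},  # pulled in by an edge
    {"id": "tie_low",  "cos_sim": 0.50, "graph_edges": 0, "impact": 1},
    {"id": "tie_high", "cos_sim": 0.50, "graph_edges": 0, "impact": 7},  # wins its tie
]
ranked = rank_candidates(cands, top_n=3)
```

Note how "graph" outranks "semantic" purely on its relational edge — the causal-chain behaviour the paragraph above describes.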

10. Policy gates · the checks that can stop a write

Four gates guard memory writes & proactive outputs
each gate has a config knob and a test · fail-closed by design
GUARDS
trivial_session
Session's total messages < consolidate.trivial_message_count OR tokens < consolidate.trivial_token_count → skip extraction. Rationale: "hi / hey / how's it going" sessions don't earn a permanent L3 event.
memory.consolidate.is_trivial()
reflection_hard_gate_24h
Persona has already written ≥ consolidate.reflection_hard_gate_24h thoughts in last 24 hours → skip reflection for this session. Rationale: prevents L4 explosion on chatty days.
memory.consolidate.consolidate_session()
no_in_flight_turn
Proactive scheduler wants to send an autonomous message, but channel has in_flight_turn_id set → defer. Rationale: don't interrupt an active conversation.
proactive.policy.PolicyEngine
quiet_hours
Proactive attempts between proactive.quiet_hours_start and quiet_hours_end → drop. Respects user's sleep.
proactive.policy.quiet_hours_check()
rate_limit
Persona has already sent ≥ proactive.max_per_24h autonomous messages in last 24h → drop. Rationale: cap the volume of unsolicited outreach.
proactive.policy.rate_limit_check()
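The three proactive gates combine fail-closed: any one of them blocks the send. A sketch with hypothetical argument names (the real PolicyEngine reads these from config and channel state), including the midnight-wrapping quiet-hours case:

```python
def proactive_allowed(now_hour, quiet_start, quiet_end,
                      sent_last_24h, max_per_24h, in_flight_turn_id):
    """Combine no_in_flight_turn, quiet_hours, rate_limit (sketch).

    Fail-closed: returns True only when every gate passes.
    """
    if in_flight_turn_id is not None:
        return False                       # don't interrupt an active conversation
    # quiet hours may wrap midnight, e.g. 23 → 7
    if quiet_start <= quiet_end:
        quiet = quiet_start <= now_hour < quiet_end
    else:
        quiet = now_hour >= quiet_start or now_hour < quiet_end
    if quiet:
        return False                       # respect the user's sleep
    if sent_last_24h >= max_per_24h:
        return False                       # bounded autonomous outreach
    return True
```

The wrap branch is the easy bug here: `23 ≤ h < 7` is never true with a naive comparison, so a 23:00–07:00 window needs the OR form.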

11. SSE mirror as the nervous system · what events flow, in both directions

Runtime broadcaster carries both turn events and lifecycle events
every SSE subscriber is a god-view observer
NERVOUS SYSTEM
Event
Origin · when it fires
Carries
UI reacts
chat.message.user_appended
New user message ingested (per message inside a turn)
user_id, content, turn_id, source_channel_id
append user bubble to timeline
chat.message.token
LLM emits one streaming token
delta, turn_id, source_channel_id
concatenate into persona bubble
chat.message.done
Channel.send() succeeds, turn complete
content, delivery, source_channel_id
finalise bubble · render channel pill
chat.message.voice_ready
TTS finishes, voice artifact cached
audio_url, duration
show ▶ voice play button
chat.mood.update
L1.mood_block refreshed (after turn)
mood_summary text
re-render chat header mood
chat.session.boundary
Session transitions open → closed
closed_session_id, new_session_id, timestamp
insert timestamped horizontal rule
chat.settings.updated
voice_enabled / provider / etc flipped (e.g. SIGHUP)
changed_fields
cross-tab config sync
chat.connection.ready
SSE handshake complete
green dot in status pill
chat.connection.heartbeat
Every 30s · NAT keepalive
no visual
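On the wire, each broadcaster event is one text/event-stream frame. The `event:`/`data:` lines and blank-line terminator are the SSE standard; the event names come from the table above, and the payload keys are the documented ones:

```python
import json

def sse_frame(event, payload):
    """Serialise one broadcaster event as an SSE frame (sketch)."""
    # event name line, JSON payload line, blank-line terminator
    return f"event: {event}\ndata: {json.dumps(payload)}\n\n"

frame = sse_frame("chat.message.token", {
    "delta": "你",
    "turn_id": "turn-33bec7",
    "source_channel_id": "discord",
})
```

Because every subscriber gets every frame, tagging each one with source_channel_id is what lets the Web UI render the channel pill for replies that originated on Discord.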
Companion: the static-view equivalent of this page is architecture.html — layout, tables, endpoints, iron rules. If this page is the nervous system, that one is the anatomy.