Every agent ships without memory. Then someone uses it for a second conversation and the magic breaks. The agent doesn't remember they prefer Python. Doesn't remember their project is called Octane. Doesn't remember the last support ticket was about webhook latency. Every session starts cold.
The fix is a memory MCP server: a place the agent writes facts to and reads facts from, typed and namespaced. This article is about what that server should look like, and which of the three memory patterns is right for your agent.
The three patterns
Pattern 1: key-value memory
Named facts. user_pref_lang = "python". last_seen_at = "2026-05-25". billing_plan = "ruby".
This is the boring answer and the right one most of the time. Three properties make it good for agents:
- Stable keys. The agent picks a key once and reads it back forever. No "I remembered this but I called it user_lang last time, not lang_pref."
- Atomic updates. Overwrite is the default. No reconciliation, no merging, no "is this an update or a new fact?"
- Bounded scope. Namespacing is just key prefixing. user:42:pref_lang beats any graph traversal for "what's user 42's preference?"
The downsides show up when relationships start mattering. If you find yourself writing keys like project_octane_team_member_5 or user_42_owns_repo_3, you have outgrown KV. The relationships should be data, not embedded in the key name.
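A minimal sketch of the KV pattern, assuming an in-memory dict as the backing store. The class name KVMemory and the argument order are illustrative choices, not something any particular MCP server prescribes.

```python
from typing import Optional


class KVMemory:
    """In-memory key-value memory; namespacing is plain key prefixing."""

    def __init__(self) -> None:
        self._data: dict[str, str] = {}

    @staticmethod
    def _full_key(namespace: str, key: str) -> str:
        # Namespace "user:42" + key "pref_lang" -> "user:42:pref_lang".
        # Assumes keys themselves never contain ":".
        return f"{namespace}:{key}"

    def remember(self, namespace: str, key: str, value: str) -> None:
        # Overwrite is the default: no merging, no reconciliation.
        self._data[self._full_key(namespace, key)] = value

    def recall(self, namespace: str, key: str) -> Optional[str]:
        return self._data.get(self._full_key(namespace, key))

    def forget(self, namespace: str, key: str) -> bool:
        # Returns True if a fact was actually removed.
        return self._data.pop(self._full_key(namespace, key), None) is not None


memory = KVMemory()
memory.remember("user:42", "pref_lang", "ruby")
memory.remember("user:42", "pref_lang", "python")  # atomic overwrite
print(memory.recall("user:42", "pref_lang"))  # python
print(memory.recall("user:43", "pref_lang"))  # None: different namespace
```

A production version would swap the dict for a durable store, but the shape of the three operations stays the same.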
Pattern 2: fact graph
Subject-predicate-object triples. (user_42, works_on, project_octane). (project_octane, uses_lang, python). (project_octane, has_member, user_53).
Two strengths:
- Traversal. "Who else is on this project?" is a 1-hop query, not a key-pattern scan. "What languages does this team use?" is 2 hops.
- Symmetric relationships. "User 42 works on project Octane" is the same edge as "project Octane has member user 42" — the graph stores it once, the agent reads it either direction.
The break-even with KV is around 100 entities, or sooner if the agent's questions are already shaped like "what's connected to X?" rather than "what's the value of X?"
Fact graphs are not free. The agent has to remember predicate names; you need a controlled vocabulary or the graph fills with synonyms. Most teams that ship a fact graph publish a small predicate list (10-20 verbs) and reject writes outside it.
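The triple store and its controlled vocabulary can be sketched in a few lines. This is an illustrative in-memory version: the FactGraph class, the ALLOWED_PREDICATES set, and the method names objects and subjects are assumptions made for the example.

```python
# Hypothetical controlled vocabulary; a real list would hold 10-20 verbs.
ALLOWED_PREDICATES = {"works_on", "uses_lang"}


class FactGraph:
    """Stores (subject, predicate, object) triples in a set."""

    def __init__(self) -> None:
        self._triples: set[tuple[str, str, str]] = set()

    def link(self, subject: str, predicate: str, obj: str) -> None:
        # Reject writes outside the vocabulary so synonyms never enter the graph.
        if predicate not in ALLOWED_PREDICATES:
            raise ValueError(f"unknown predicate: {predicate!r}")
        self._triples.add((subject, predicate, obj))

    def unlink(self, subject: str, predicate: str, obj: str) -> None:
        self._triples.discard((subject, predicate, obj))

    def objects(self, subject: str, predicate: str) -> set[str]:
        # Forward read: subject -> objects.
        return {o for s, p, o in self._triples if s == subject and p == predicate}

    def subjects(self, predicate: str, obj: str) -> set[str]:
        # Reverse read of the same stored edge: object -> subjects.
        return {s for s, p, o in self._triples if p == predicate and o == obj}


graph = FactGraph()
graph.link("user_42", "works_on", "project_octane")
graph.link("user_53", "works_on", "project_octane")
graph.link("project_octane", "uses_lang", "python")

# 1 hop: who else is on this project?
teammates = graph.subjects("works_on", "project_octane") - {"user_42"}

# 2 hops: which languages do user 42's projects use?
languages = {
    lang
    for project in graph.objects("user_42", "works_on")
    for lang in graph.objects(project, "uses_lang")
}
print(teammates, languages)  # {'user_53'} {'python'}
```

Each membership is stored once as a works_on edge and read in either direction, so a separate has_member edge is not needed in this sketch.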
Pattern 3: hybrid (KV + graph + vector)
Three layers, picked by tool:
- KV for hot identity facts. Preferences, settings, role, recently-active timestamps. Fast, exact-match.
- Fact graph for evolving relationships. Project memberships, ownership, "who is involved in what."
- Vector index over both for retrieval. Semantic recall: "remember that conversation about webhook latency?" Embeddings find it; the source layer (KV or graph) returns the structured data.
This is what most production memory systems look like once they have been in service for a year. It is also what you should not build on day one — premature optimization, three sources of truth to keep consistent.
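To make the layering concrete, here is a toy sketch of a vector index that stores only pointers back to the KV layer. The embed function is a bag-of-words stand-in for a real embedding model, and the keys and values are invented for the example.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    # Toy stand-in for a real embedding model: bag-of-words counts.
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[token] * b[token] for token in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Source layer: structured facts (hypothetical contents).
kv = {
    "user:42:last_ticket": "webhook latency on the Octane project",
    "user:42:pref_lang": "python",
}

# The index holds (embedding, key) pairs; the KV layer stays the source of truth.
index = [(embed(f"{key} {value}"), key) for key, value in kv.items()]


def search_memory(query: str, limit: int = 1) -> list[tuple[str, str]]:
    q = embed(query)
    ranked = sorted(index, key=lambda pair: cosine(q, pair[0]), reverse=True)
    # Resolve each hit through the source layer to return the current value.
    return [(key, kv[key]) for _, key in ranked[:limit]]


print(search_memory("that conversation about webhook latency"))
# [('user:42:last_ticket', 'webhook latency on the Octane project')]
```

Even in this toy version the index has to be rebuilt whenever a KV value changes, which is the consistency cost the paragraph above warns about.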
The MCP tool surface
Four tools cover the KV layer:
- remember(key, value, namespace): write or overwrite.
- recall(key, namespace): read.
- search_memory(query, limit, namespace): semantic search.
- forget(key, namespace): delete.
For fact graphs, add:
- link(subject, predicate, object, namespace): assert a relationship.
- traverse(subject, predicate, max_depth, namespace): walk the graph.
- unlink(subject, predicate, object, namespace): remove an edge.
The namespace argument is the most important and the most often-screwed-up parameter. Treat it as a server-side value: it should be derived from the token, not passed by the agent. The agent never sees its own namespace; the server resolves it from the token on every call. Otherwise a prompt injection ("write to namespace = user_999") leaks memory across users.
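One way to sketch token-derived namespacing, assuming a static token table. TOKEN_NAMESPACES, resolve_namespace, and the handle_* functions are hypothetical names; a real server would validate a signed token rather than look up a dict.

```python
from typing import Optional

# Hypothetical token table; in practice this comes from your auth layer.
TOKEN_NAMESPACES = {"tok_alice": "user:42", "tok_bob": "user:999"}

_store: dict[tuple[str, str], str] = {}


def resolve_namespace(token: str) -> str:
    try:
        return TOKEN_NAMESPACES[token]
    except KeyError:
        raise PermissionError("unknown token") from None


def handle_remember(token: str, key: str, value: str) -> None:
    # No namespace parameter is exposed to the agent, so nothing the
    # model writes in its tool call can redirect the write.
    _store[(resolve_namespace(token), key)] = value


def handle_recall(token: str, key: str) -> Optional[str]:
    return _store.get((resolve_namespace(token), key))


handle_remember("tok_alice", "pref_lang", "python")
print(handle_recall("tok_alice", "pref_lang"))  # python
print(handle_recall("tok_bob", "pref_lang"))    # None: other tenant's namespace
```

The point of the design is that the tool schema the agent sees has only key and value; the namespace exists solely on the server side of the call.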
Vector search is a different cost shape
Treat search_memory as a separate tool with separate quotas. KV reads are 1ms and cost ~0. Vector searches are 50-200ms and cost real money (the embedding plus the index hit). Agents that get unlimited search_memory access burn budget and slow down.
Rate-limit it. 10-20 searches per session is enough. Cache embeddings of recent queries. Tell the agent in the tool description that search_memory is more expensive than recall — the agent will respect the hint and use recall when it knows the key.
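A per-session quota with an embedding cache might look like the sketch below. SearchBudget, embed_fn, and index_fn are assumed names, and the default of 15 searches is simply a value inside the 10-20 range suggested above.

```python
from typing import Callable


class SearchBudget:
    """Per-session quota for search_memory, with cached query embeddings."""

    def __init__(
        self,
        embed_fn: Callable[[str], list[float]],
        index_fn: Callable[[list[float]], list[str]],
        max_searches: int = 15,
    ) -> None:
        self._embed_fn = embed_fn
        self._index_fn = index_fn
        self._remaining = max_searches
        self._embeddings: dict[str, list[float]] = {}

    def search(self, query: str) -> list[str]:
        if self._remaining <= 0:
            raise RuntimeError("search_memory quota exhausted; use recall instead")
        self._remaining -= 1
        # Normalize case and whitespace so near-identical queries share an embedding.
        normalized = " ".join(query.lower().split())
        if normalized not in self._embeddings:
            self._embeddings[normalized] = self._embed_fn(normalized)
        return self._index_fn(self._embeddings[normalized])


# Fake backends so the sketch runs without an embedding service.
embed_calls: list[str] = []


def fake_embed(text: str) -> list[float]:
    embed_calls.append(text)
    return [float(len(text))]


def fake_index(vector: list[float]) -> list[str]:
    return [f"hit for {vector}"]


budget = SearchBudget(fake_embed, fake_index, max_searches=2)
budget.search("Webhook  latency")
budget.search("webhook latency")
print(len(embed_calls))  # 1: the second query reused the cached embedding
```

In this version a cached embedding saves the embedding call but still counts against the quota, because the index hit is not free either.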
The traps
Trap 1: trusting the agent to pick good keys
Agents pick keys inconsistently. Today's user_pref_lang is tomorrow's pref_language is next week's preferred_language. Three keys, same fact.
Two fixes. The blunt one: predefine the key space and reject writes to unknown keys. The lenient one: store a controlled vocabulary in the memory itself and have the agent read it before writing. Both work; the blunt one is faster to ship and easier to audit.
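The blunt fix is small enough to show in full. In this sketch ALLOWED_KEYS and validate_key are hypothetical names, and the three keys are taken from the examples earlier in the article.

```python
# Hypothetical predefined key space.
ALLOWED_KEYS = frozenset({"pref_lang", "billing_plan", "last_seen_at"})


def validate_key(key: str) -> str:
    """Return the key unchanged if it is in the key space, else raise."""
    if key not in ALLOWED_KEYS:
        allowed = ", ".join(sorted(ALLOWED_KEYS))
        # Listing the valid keys lets the agent correct itself on retry.
        raise KeyError(f"unknown key {key!r}; allowed keys: {allowed}")
    return key


facts: dict[str, str] = {}


def remember(key: str, value: str) -> None:
    facts[validate_key(key)] = value


remember("pref_lang", "python")
try:
    remember("preferred_language", "python")
except KeyError as err:
    print(err)  # message names the rejected key and the allowed ones
```

Returning the allowed keys in the error is a design choice: the rejection doubles as documentation for the agent's next attempt.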
Trap 2: defaulting to vector DB
"I'll just store everything as embeddings and search by similarity" is the seductive path. It is wrong for most agents because:
- Exact-match queries (what's user 42's plan?) work badly with similarity.
- Updates are awkward — you can't easily overwrite an embedding-backed fact.
- The cost is significant at scale.
The short version is that vector search is a useful retrieval layer over a structured memory, not a replacement for one. Pure-vector agent memory loses the typed lookups that everyday agent questions ("what's user 42's plan?", "remind me what we decided on X") actually depend on.
Trap 3: shared memory across tenants
"All my agents are friendly; let them share a memory pool" sounds efficient. It is the same mistake as a single admin token: one prompt injection and your competitor's agent reads your customer's preferences. Namespace by tenant from day one. Same rule as the billing MCP server in this article; same blast radius.
Trap 4: storing conversation transcripts as memory
The agent's session log is not memory. Memory is the small set of facts the agent decided are worth keeping. Storing transcripts confuses the retrieval surface — you find the conversation, not the fact. Have the agent summarize each session into 1-5 facts via remember(); discard the transcript or store it separately.
Where to start
- KV with namespacing. Four tools. One week to ship.
- Vector search as a separate tool once agents start asking "remember when…" — usually 2-3 months in.
- Fact graph when relationships start to outnumber attributes in your domain. For most teams, never.
- Hybrid as a graduation move, not a starting move.
The general progression: KV is enough for most personal-assistant agents; KV + vector is enough for most domain agents; full hybrid is for agents whose domain is genuinely graph-shaped (teams, hierarchies, networks). Pick the smallest pattern your agent's questions actually need.
Several dedicated memory MCP services exist in 2026; comparing their tool surfaces is worth doing before you decide between rolling your own on AppElixir and using a hosted memory backend.