Every agent ships without memory. Then someone uses it for a second conversation and the magic breaks. The agent doesn't remember they prefer Python. Doesn't remember their project is called Octane. Doesn't remember the last support ticket was about webhook latency. Every session starts cold.

The fix is a memory MCP server: a place the agent writes facts to and reads facts from, typed and namespaced. This article is about what that server should look like, and which of the three memory patterns is right for your agent.

The three patterns

Pattern 1: key-value memory

Named facts. user_pref_lang = "python". last_seen_at = "2026-05-25". billing_plan = "ruby".

This is the boring answer and the right one most of the time. Three properties make it good for agents:

The downsides show up when relationships start mattering. If you find yourself writing keys like project_octane_team_member_5 or user_42_owns_repo_3, you have outgrown KV. The relationships should be data, not embedded in the key name.
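A minimal sketch of the key-value pattern, assuming a plain in-memory dict stands in for the real backing store. The class name KVMemory and its method signatures are illustrative, not a published API, and the namespace argument is something the server layer would supply, not the agent.

```python
from typing import Dict, Optional


class KVMemory:
    """Hypothetical in-memory stand-in for a namespaced key-value store."""

    def __init__(self) -> None:
        # namespace -> {key -> value}
        self._data: Dict[str, Dict[str, str]] = {}

    def remember(self, namespace: str, key: str, value: str) -> None:
        """Write (or overwrite) one named fact."""
        self._data.setdefault(namespace, {})[key] = value

    def recall(self, namespace: str, key: str) -> Optional[str]:
        """Read one named fact; returns None if it was never written."""
        return self._data.get(namespace, {}).get(key)


memory = KVMemory()
memory.remember("user_42", "user_pref_lang", "python")
memory.remember("user_42", "billing_plan", "ruby")

print(memory.recall("user_42", "user_pref_lang"))   # python
print(memory.recall("user_42", "favorite_editor"))  # None
```

Every read is an exact lookup by name, which is what keeps this pattern cheap and easy to audit.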

Pattern 2: fact graph

Subject-predicate-object triples. (user_42, works_on, project_octane). (project_octane, uses_lang, python). (project_octane, has_member, user_53).

Two strengths:

The break-even with KV is around 100 entities, or sooner if the agent's questions are already shaped like "what's connected to X?" rather than "what's the value of X?"

Fact graphs are not free. The agent has to remember predicate names; you need a controlled vocabulary or the graph fills with synonyms. Most teams that ship a fact graph publish a small predicate list (10-20 verbs) and reject writes outside it.
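The controlled-vocabulary idea can be sketched as a triple store that rejects predicates outside a published list. This is an illustration under assumptions: the FactGraph class, the add_fact and connected_to names, and the three-verb vocabulary are all hypothetical, and a set of tuples stands in for a real graph database.

```python
from typing import FrozenSet, List, Set, Tuple

Triple = Tuple[str, str, str]

# Hypothetical controlled vocabulary; a real list would run to 10-20 verbs.
ALLOWED_PREDICATES: FrozenSet[str] = frozenset(
    {"works_on", "uses_lang", "has_member"}
)


class FactGraph:
    """Stores (subject, predicate, object) triples with a fixed predicate list."""

    def __init__(self, allowed_predicates: FrozenSet[str]) -> None:
        self._allowed = allowed_predicates
        self._triples: Set[Triple] = set()

    def add_fact(self, subject: str, predicate: str, obj: str) -> None:
        # Reject synonyms up front so the graph does not fill with near-duplicates.
        if predicate not in self._allowed:
            raise ValueError(
                f"unknown predicate {predicate!r}; allowed: {sorted(self._allowed)}"
            )
        self._triples.add((subject, predicate, obj))

    def connected_to(self, entity: str) -> List[Triple]:
        """Answer "what's connected to X?": every triple touching the entity."""
        return sorted(t for t in self._triples if entity in (t[0], t[2]))


graph = FactGraph(ALLOWED_PREDICATES)
graph.add_fact("user_42", "works_on", "project_octane")
graph.add_fact("project_octane", "uses_lang", "python")
graph.add_fact("project_octane", "has_member", "user_53")

try:
    graph.add_fact("user_42", "is_working_on", "project_octane")
except ValueError as err:
    print(err)  # the error lists the allowed predicates so the agent can retry
```

Returning the allowed list in the error message gives the agent what it needs to correct the write on the next attempt.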

Pattern 3: hybrid (KV + graph + vector)

Three layers, picked by tool:

This is what most production memory systems look like once they have been in service for a year. It is also what you should not build on day one: it is premature optimization, and it leaves you three sources of truth to keep consistent.

The MCP tool surface

Four tools cover the KV layer:

For fact graphs, add:

The namespace argument is the most important parameter and the one most often gotten wrong. The server should derive it from the auth token; the agent should never pass it, and never needs to see its own namespace at all. Otherwise a prompt injection ("write to namespace = user_999") leaks memory across users.
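One way to sketch server-side namespace resolution, assuming the transport layer hands each tool handler the caller's token. The MemoryServer class, the token strings, and the token-to-namespace table are hypothetical; the point is only that neither tool takes a namespace argument the model could fill in.

```python
from typing import Dict, Optional


class MemoryServer:
    """Hypothetical handler layer: the namespace comes from the token, never the agent."""

    def __init__(self, token_to_namespace: Dict[str, str]) -> None:
        self._token_to_namespace = dict(token_to_namespace)
        self._data: Dict[str, Dict[str, str]] = {}

    def _namespace_for(self, token: str) -> str:
        try:
            return self._token_to_namespace[token]
        except KeyError:
            raise PermissionError("unknown or revoked token") from None

    # Note the tool signatures: key and value only, no namespace parameter.
    def remember(self, token: str, key: str, value: str) -> None:
        namespace = self._namespace_for(token)
        self._data.setdefault(namespace, {})[key] = value

    def recall(self, token: str, key: str) -> Optional[str]:
        namespace = self._namespace_for(token)
        return self._data.get(namespace, {}).get(key)


server = MemoryServer({"tok_a": "user_42", "tok_b": "user_999"})
server.remember("tok_a", "user_pref_lang", "python")

print(server.recall("tok_a", "user_pref_lang"))  # python
print(server.recall("tok_b", "user_pref_lang"))  # None
```

Because the namespace never appears in the tool arguments, an injected instruction has nothing to overwrite: the worst it can do is write inside the caller's own namespace.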

Vector search is a different cost shape

Treat search_memory as a separate tool with separate quotas. KV reads take about 1 ms and cost next to nothing. Vector searches take 50-200 ms and cost real money (the embedding call plus the index hit). Agents with unlimited search_memory access burn budget and slow down.

Rate-limit it. 10-20 searches per session is enough. Cache embeddings of recent queries. Say in the tool description that search_memory is more expensive than recall; agents generally respect the hint and use recall when they know the key.
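A rough sketch of a per-session quota with a small embedding cache. Everything here is an assumption for illustration: the SessionSearch class, the default of 15 searches, the cache size, and the fake embed and index functions that stand in for a real embedding model and vector index.

```python
from collections import OrderedDict
from typing import Callable, List, Sequence

Vector = Sequence[float]


class SearchBudgetExceeded(Exception):
    """Raised when a session has used up its search_memory quota."""


class SessionSearch:
    def __init__(
        self,
        embed: Callable[[str], Vector],
        query_index: Callable[[Vector], List[str]],
        max_searches: int = 15,
        cache_size: int = 64,
    ) -> None:
        self._embed = embed
        self._query_index = query_index
        self._max_searches = max_searches
        self._cache_size = cache_size
        self._cache = OrderedDict()  # normalized query -> embedding, LRU order
        self.searches_used = 0

    def _embedding_for(self, query: str) -> Vector:
        key = " ".join(query.lower().split())  # fold case and whitespace
        if key in self._cache:
            self._cache.move_to_end(key)
            return self._cache[key]
        vector = self._embed(query)
        self._cache[key] = vector
        if len(self._cache) > self._cache_size:
            self._cache.popitem(last=False)  # evict the least recently used
        return vector

    def search_memory(self, query: str) -> List[str]:
        if self.searches_used >= self._max_searches:
            raise SearchBudgetExceeded("search_memory quota reached; use recall")
        self.searches_used += 1  # a cache hit still costs an index query
        return self._query_index(self._embedding_for(query))


embed_calls: List[str] = []


def fake_embed(text: str) -> Vector:
    embed_calls.append(text)
    return [float(len(text))]


def fake_index(vector: Vector) -> List[str]:
    return [f"fact near {vector[0]:.0f}"]


session = SessionSearch(fake_embed, fake_index, max_searches=2)
first = session.search_memory("webhook latency")
second = session.search_memory("Webhook  latency")  # embedding served from cache
```

The cache saves the embedding call but not the index hit, so repeated queries still count against the quota in this sketch.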

The traps

Trap 1: trusting the agent to pick good keys

Agents pick keys inconsistently. Today's user_pref_lang is tomorrow's pref_language is next week's preferred_language. Three keys, same fact.

Two fixes. The blunt one: predefine the key space and reject writes to unknown keys. The lenient one: store a controlled vocabulary in the memory itself and have the agent read it before writing. Both work; the blunt one is faster to ship and easier to audit.
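The blunt fix might look like the following sketch, assuming the key space is a fixed set checked on every write. The ALLOWED_KEYS contents reuse the example keys from earlier in the article; the UnknownKeyError type and the remember signature are hypothetical.

```python
from typing import Dict, FrozenSet

# Hypothetical predefined key space.
ALLOWED_KEYS: FrozenSet[str] = frozenset(
    {"user_pref_lang", "last_seen_at", "billing_plan"}
)


class UnknownKeyError(ValueError):
    """Raised when the agent tries to write a key outside the key space."""


def remember(store: Dict[str, str], key: str, value: str) -> None:
    # Reject the write and name the valid keys so the agent can retry correctly.
    if key not in ALLOWED_KEYS:
        raise UnknownKeyError(
            f"unknown key {key!r}; use one of {sorted(ALLOWED_KEYS)}"
        )
    store[key] = value


store: Dict[str, str] = {}
remember(store, "user_pref_lang", "python")

try:
    remember(store, "pref_language", "python")  # a synonym for the same fact
except UnknownKeyError as err:
    print(err)
```

The rejected write never reaches the store, so the three-keys-one-fact drift cannot start.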

Trap 2: defaulting to vector DB

"I'll just store everything as embeddings and search by similarity" is the seductive path. It is wrong for most agents because:

The short version is that vector search is a useful retrieval layer over a structured memory, not a replacement for one. Pure-vector agent memory loses the typed lookups that everyday agent questions ("what's user 42's plan?", "remind me what we decided on X") actually depend on.

Trap 3: shared memory across tenants

"All my agents are friendly; let them share a memory pool" sounds efficient. It is the same mistake as a single admin token: one prompt injection and your competitor's agent reads your customer's preferences. Namespace by tenant from day one. Same rule as the billing MCP server in this article; same blast radius.

Trap 4: storing conversation transcripts as memory

The agent's session log is not memory. Memory is the small set of facts the agent decided are worth keeping. Storing transcripts muddies the retrieval surface: a search finds the conversation, not the fact. Have the agent summarize each session into 1-5 facts via remember(); discard the transcript or store it separately.
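A server-side guard for that end-of-session step could be sketched as below. The summarizing itself is the agent's job; this only enforces the 1-5 bound before anything is written. The commit_session_facts name and the example keys are hypothetical, and the remember callback here is just a dict write.

```python
from typing import Callable, Dict

MIN_FACTS = 1
MAX_FACTS = 5


def commit_session_facts(
    facts: Dict[str, str], remember: Callable[[str, str], None]
) -> int:
    """Write a session summary as facts, refusing transcript-sized dumps."""
    if not MIN_FACTS <= len(facts) <= MAX_FACTS:
        raise ValueError(
            f"expected {MIN_FACTS}-{MAX_FACTS} facts per session, got {len(facts)}"
        )
    for key, value in facts.items():
        remember(key, value)
    return len(facts)


stored: Dict[str, str] = {}
count = commit_session_facts(
    {"last_ticket_topic": "webhook latency", "project_name": "Octane"},
    stored.__setitem__,  # stand-in for the real remember() tool
)
print(count)  # 2
```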

Where to start

  1. KV with namespacing. Four tools. One week to ship.
  2. Vector search as a separate tool once agents start asking "remember when…", usually 2-3 months in.
  3. Fact graph when relationships start to outnumber attributes in your domain. For most teams, never.
  4. Hybrid as a graduation move, not a starting move.

The general progression: KV is enough for most personal-assistant agents; KV + vector is enough for most domain agents; full hybrid is for agents whose domain is genuinely graph-shaped (teams, hierarchies, networks). Pick the smallest pattern your agent's questions actually need.

Several dedicated memory MCP services exist in 2026; comparing their tool surfaces is worth doing before you decide between rolling your own on AppElixir and using a hosted memory backend.