What is the collection abstraction in AppElixir?

A collection is a uniform interface over a single data source. A Google Sheet, a Postgres, MySQL, or SQLite table, and a REST endpoint all present the same small contract: list fields with types, read records by query, optionally write records. The compiler and runtime never see the source type; they see a collection. That is what lets one engine serve every source identically.

How does AppElixir infer types from a spreadsheet?

A sheet hands the engine all strings. The collection driver samples a window of rows per column and proposes a type: if all values parse as integers, the field is an integer; if all parse as ISO dates, it is a date; a small fixed value set means enum. SQL gives types for free from the database's own catalog. A REST endpoint's types are inferred from one example payload. Inferred types are proposals you can correct, not guesses the engine hides.

Why does AppElixir default to read-only tools?

A read tool that returns the wrong row is a bug. A write tool that writes the wrong row is data loss. Read-only is the safe default because the cost of a mistake is bounded. Write capability on a collection is opt-in per source and produces a separately named tool, so the engine never silently grants an agent the ability to mutate your data.

Should one MCP server proxy two data sources?

No. A megaserver that fronts a sheet and a database couples two failure domains, two auth models, and two rate-limit budgets behind one tool surface. The engine encourages one collection per server. MCP servers compose cleanly in any client, so two focused servers beat one server pretending to be a gateway.

One Collection Abstraction Over Spreadsheets, SQL, and REST

This is chapter three of the AppElixir engine teardown, a walk through how a no-code schema becomes a live MCP server. Start at chapter one.

In chapter two the schema compiler turned a form into a tool's inputSchema and turned a collection into the tool's output type. That sentence quietly assumed the hard part was already solved: that "a collection" is a single, well-typed thing with fields and records. It is not. A collection is a Google Sheet one minute and a Postgres table the next. The job of this chapter's subsystem is to make that assumption true.

The thesis of the engine is that the plumbing is identical for every server, so only the data and the tool design differ. The collection abstraction is where that thesis either holds or collapses. If the compiler had to know whether it was reading a sheet or a SQL row or an HTTP response, every later stage would inherit that branching, and the no-code promise would fray into three special cases. So the engine draws a hard line: above the line is one uniform collection; below the line, a driver per source type does the dirty work.

The collection interface

A collection is defined by a small contract. Conceptually it looks like this, with the source type appearing nowhere in the part the rest of the engine touches:

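// Supporting types. These shapes are illustrative assumptions; only Collection is the contract.
type FieldType = "string" | "integer" | "number" | "boolean" | "date" | "datetime" | "enum"
interface Field { key: string; type: FieldType; nullable: boolean; description?: string }
type CollectionRecord = { [key: string]: unknown }  // named to avoid TypeScript's built-in Record<K, V>
type Cursor = string                                  // opaque: only the driver that issued it reads it
type Query = { [field: string]: unknown }             // stand-in for the typed filter

interface Collection {
  // Static description used by the schema compiler.
  describe(): {
    name: string
    fields: Field[]
    capabilities: {
      read: true                 // always
      write: boolean             // opt-in, per source
    }
  }

  // Read side. The query is a typed filter built from the tool's inputs.
  list(query: Query, page: Cursor | null): Promise<{
    records: CollectionRecord[]
    next: Cursor | null          // null means last page
  }>

  // Write side. Present only when capabilities.write is true.
  put?(record: CollectionRecord): Promise<CollectionRecord>
}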

That is the whole surface. The compiler reads describe() to build output types. The runtime calls list() to answer a read tool and put() to answer a write tool. Neither of them imports a spreadsheet client or a SQL connection pool; they import Collection. The three source types are three classes that implement this interface, and they are the only place in the engine that knows what a Google Sheets API row or a SELECT result set actually is. The Model Context Protocol reference servers follow a similar per-backend shape: the official servers repository has shipped separate Postgres and filesystem servers that expose very different backends through the same protocol. The collection abstraction pulls that pattern inside one engine, with drivers in place of whole servers.
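To make the line concrete, here is a minimal sketch of one driver and one consumer. The supporting types (FieldType, CollectionRecord, a string Cursor, an equality-only Query) are hypothetical stand-ins, not the engine's actual definitions. ArrayCollection is an invented read-only driver over an in-memory array, and countAll is an invented consumer that pages through any collection using only the Collection type.

```typescript
// Hypothetical supporting types; they mirror the contract above but are assumptions.
type FieldType = "string" | "integer" | "number" | "boolean" | "date" | "enum";
interface Field {
  key: string;
  type: FieldType;
  nullable: boolean;
  description?: string;
}
type CollectionRecord = { [key: string]: unknown };
type Cursor = string;
type Query = { [field: string]: unknown }; // equality filters only, for illustration
type Page = { records: CollectionRecord[]; next: Cursor | null };

interface Collection {
  describe(): { name: string; fields: Field[]; capabilities: { read: true; write: boolean } };
  list(query: Query, page: Cursor | null): Promise<Page>;
  put?(record: CollectionRecord): Promise<CollectionRecord>;
}

// Below the line: a read-only driver over an in-memory array (a hypothetical source).
class ArrayCollection implements Collection {
  constructor(
    private readonly name: string,
    private readonly fields: Field[],
    private readonly rows: CollectionRecord[],
    private readonly pageSize = 2,
  ) {}

  describe() {
    // No put() method and write: false, so no write tool can be compiled from this source.
    return { name: this.name, fields: this.fields, capabilities: { read: true as const, write: false } };
  }

  async list(query: Query, page: Cursor | null): Promise<Page> {
    // Keep rows whose values equal every entry in the query.
    const matches = this.rows.filter((row) =>
      Object.keys(query).every((key) => row[key] === query[key]),
    );
    // This driver's cursor is a row offset; callers never look inside it.
    const start = page === null ? 0 : Number(page);
    const records = matches.slice(start, start + this.pageSize);
    const end = start + records.length;
    return { records, next: end < matches.length ? String(end) : null };
  }
}

// Above the line: code that knows only the Collection type.
async function countAll(collection: Collection, query: Query): Promise<number> {
  let total = 0;
  let page: Cursor | null = null;
  do {
    const result: Page = await collection.list(query, page);
    total += result.records.length;
    page = result.next;
  } while (page !== null);
  return total;
}
```

Swapping ArrayCollection for a sheet, SQL, or REST driver would leave countAll untouched, which is the point of the boundary.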

Type inference, three ways

Every source has a different relationship with types, and the driver's first job is to produce the fields array that the rest of the engine treats as ground truth.

A spreadsheet hands you nothing

A sheet column is all strings. "42", "2026-06-04", and "shipped" arrive identically as text. The spreadsheet driver samples a window of rows per column and proposes a type by consensus: if every sampled cell parses as an integer, the field is an integer; if every cell parses as an ISO date, it is a date; if the column holds a small fixed set of distinct values, it is an enum (which, as chapter two showed, is the field type agents respect most). Inference is a proposal, not a silent guess. The builder shows you the inferred type next to a sample value and lets you correct it before the contract is compiled. The principle: infer aggressively, but never hide the inference.
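The consensus rule fits in a few lines. This is an illustrative version, not the engine's inference code: the function name inferColumnType, the threshold of five distinct values for an enum, the requirement that enum values repeat, and skipping blank cells are all assumptions chosen for the example.

```typescript
// Possible proposed types for a sheet column (hypothetical names).
type InferredType = "integer" | "date" | "enum" | "string";

const INTEGER = /^-?\d+$/;
// ISO 8601 calendar date (YYYY-MM-DD); a real driver would likely accept date-times too.
const ISO_DATE = /^\d{4}-\d{2}-\d{2}$/;

function isIsoDate(cell: string): boolean {
  if (!ISO_DATE.test(cell)) return false;
  // Reject strings like 2026-13-45 or 2026-02-30 that fit the pattern but are not real dates.
  const parsed = new Date(`${cell}T00:00:00Z`);
  return !Number.isNaN(parsed.getTime()) && parsed.toISOString().startsWith(cell);
}

// Propose a type only when every sampled, non-blank cell agrees.
function inferColumnType(samples: string[], maxEnumValues = 5): InferredType {
  const cells = samples.map((s) => s.trim()).filter((s) => s !== "");
  if (cells.length === 0) return "string";
  if (cells.every((c) => INTEGER.test(c))) return "integer";
  if (cells.every(isIsoDate)) return "date";
  // A handful of distinct values repeated across rows reads as an enum.
  const distinct = new Set(cells);
  if (distinct.size <= maxEnumValues && distinct.size < cells.length) return "enum";
  return "string";
}
```

The result is only a proposal: the builder would show it next to a sample value so a column of mostly-integer IDs with one stray "N/A" falls back to string visibly rather than silently.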

SQL hands you types for free

A Postgres, MySQL, or SQLite table already carries a schema. The SQL driver reads the database's catalog (information_schema in Postgres and MySQL, PRAGMA table_info in SQLite) and maps column types straight across: integer to integer, text to string, boolean to boolean, timestamptz to a date-time string, and a native enum type or a CHECK constraint that limits a column to a fixed list of values to an enum. Nullability comes from NOT NULL. There is no sampling because there is nothing to infer; the database already did the work. This is why SQL-backed collections produce the cleanest contracts.
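A sketch of the mapping, assuming a Postgres-style row from information_schema.columns. ColumnInfo keeps only three real columns of that view (column_name, data_type, is_nullable); the mapping table and the fieldFromColumn helper are hypothetical. Native enums (reported as USER-DEFINED, with their labels in pg_enum) and CHECK constraints are left out to keep it short.

```typescript
// One row of information_schema.columns, trimmed to the columns this sketch reads.
interface ColumnInfo {
  column_name: string;
  data_type: string; // e.g. "integer", "text", "timestamp with time zone"
  is_nullable: "YES" | "NO";
}

type FieldType = "integer" | "number" | "string" | "boolean" | "date" | "datetime";
interface Field {
  key: string;
  type: FieldType;
  nullable: boolean;
}

// Postgres spellings of data_type; MySQL and SQLite would need their own tables.
const SQL_TO_FIELD: { [sqlType: string]: FieldType } = {
  smallint: "integer",
  integer: "integer",
  bigint: "integer",
  numeric: "number",
  "double precision": "number",
  text: "string",
  "character varying": "string",
  boolean: "boolean",
  date: "date",
  "timestamp with time zone": "datetime",
};

function fieldFromColumn(col: ColumnInfo): Field {
  // Anything unmapped falls back to string rather than a guess.
  const type = SQL_TO_FIELD[col.data_type] ?? "string";
  return { key: col.column_name, type, nullable: col.is_nullable === "YES" };
}
```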

REST hands you one example

A REST endpoint has no schema you can introspect, only a response. The REST driver asks for one sample payload, walks the JSON, and infers a field per leaf: a number becomes a number, a quoted string becomes a string, a recognizable date string becomes a date, a small repeated value set becomes an enum. The sample is the schema. As with sheets, the inferred shape is shown for correction before it is trusted, because a single payload can miss an optional field or a nullable one.
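The leaf walk might look like this, as a sketch under stated assumptions: nested keys become dotted paths, array elements are assumed to share the first element's shape (marked with []), and a null in the sample yields an unknown type for the builder to flag. inferFromSample and its helpers are invented for illustration, and enum detection is omitted because one payload rarely shows a repeated value set.

```typescript
type LeafType = "number" | "boolean" | "string" | "date" | "unknown";
interface InferredField {
  key: string;
  type: LeafType;
}

// What counts as "recognizable" here is an assumption: ISO 8601 dates and date-times.
const ISO_DATE = /^\d{4}-\d{2}-\d{2}(T[\d:.]+(Z|[+-]\d{2}:\d{2})?)?$/;

function leafType(value: unknown): LeafType {
  if (typeof value === "number") return "number";
  if (typeof value === "boolean") return "boolean";
  if (typeof value === "string") return ISO_DATE.test(value) ? "date" : "string";
  return "unknown"; // a null in the sample says nothing about the real type
}

// Walk one sample payload and emit a field per leaf, keyed by its dotted path.
function inferFromSample(value: unknown, path = ""): InferredField[] {
  if (Array.isArray(value)) {
    // Assume every element shares the first element's shape.
    return value.length > 0 ? inferFromSample(value[0], `${path}[]`) : [];
  }
  if (value !== null && typeof value === "object") {
    const fields: InferredField[] = [];
    for (const [k, v] of Object.entries(value)) {
      fields.push(...inferFromSample(v, path === "" ? k : `${path}.${k}`));
    }
    return fields;
  }
  return [{ key: path, type: leafType(value) }];
}
```

An empty array in the sample produces no fields at all, which is exactly the kind of gap the correction step exists to catch.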

Once describe() returns, type information stops being source-specific. Coercion runs the same way regardless of origin: when a read tool returns records, the driver coerces each raw value into the declared field type, so the agent always receives an integer where the contract promised an integer, whether that value came from a SQL column or a string cell in row 19.
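Coercion can be sketched as one function over the declared field type. The name coerce and its rules (which strings count as booleans, treating a blank cell like SQL NULL, normalizing dates to ISO 8601 strings) are assumptions for illustration. It shows why one function can serve every source: a "42" from a sheet and a 42 from SQL both come out as the integer 42.

```typescript
type FieldType = "integer" | "number" | "boolean" | "string" | "date";
interface Field {
  key: string;
  type: FieldType;
  nullable: boolean;
}

// Coerce one raw value (a sheet string, a SQL value, a JSON leaf) into the declared type.
function coerce(raw: unknown, field: Field): unknown {
  // Treat a blank sheet cell the same as SQL NULL or a missing JSON key.
  const value = typeof raw === "string" ? raw.trim() : raw;
  if (value === null || value === undefined || value === "") {
    if (field.nullable) return null;
    throw new Error(`${field.key}: missing value for a non-nullable field`);
  }
  switch (field.type) {
    case "integer":
    case "number": {
      // Native numbers pass through; strings such as "42" are parsed.
      const n = typeof value === "number" ? value : Number(value);
      const ok = field.type === "integer" ? Number.isInteger(n) : Number.isFinite(n);
      if (!ok) throw new Error(`${field.key}: cannot read ${String(value)} as ${field.type}`);
      return n;
    }
    case "boolean": {
      if (typeof value === "boolean") return value;
      const text = String(value).toLowerCase();
      if (text === "true" || text === "1") return true;
      if (text === "false" || text === "0") return false;
      throw new Error(`${field.key}: cannot read ${String(value)} as boolean`);
    }
    case "date": {
      // SQL drivers may hand back Date objects; sheets and JSON hand back strings.
      const d = value instanceof Date ? value : new Date(String(value));
      if (Number.isNaN(d.getTime())) throw new Error(`${field.key}: cannot read ${String(value)} as date`);
      return d.toISOString();
    }
    case "string":
      return String(value);
  }
}
```

Failing loudly on "4.5" for an integer field is a design choice: it surfaces a contract mismatch instead of handing the agent a value the schema never promised.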

By hand vs with the engine

By hand

You write a Sheets tool that returns strings and remember to parseInt every quantity at the call site. Next quarter you add a Postgres-backed tool and copy the structure, but Postgres returns native numbers, so the copied parseInt is redundant at best, and on a nullable column it silently turns NULL into NaN. The REST tool you add after that returns null for a field your sample never showed, and the agent gets a record that does not match the schema you advertised. Three tools, three type models, three places where a future you will get it subtly wrong.

With the engine

Each source is a driver that produces a typed fields array and coerces on the way out. The tool, the contract, and the agent see one consistent record shape. Adding a fourth source type later means writing one more driver against the same interface; nothing upstream changes, because nothing upstream ever knew the source type to begin with.

Read tools and write tools

Every collection can be read. Not every collection should be written. The engine encodes that asymmetry directly: capabilities.read is always true, capabilities.write is false unless you turn it on for that specific source. A read-only collection simply has no put(), so the compiler cannot emit a write tool from it. There is no path by which an agent gains the ability to mutate data without an explicit, source-level decision.

Read-only is the safe default for a blunt reason. A read tool that returns the wrong row is a bug you fix. A write tool that writes the wrong row, or writes twice because the agent retried, is data you may not get back. The cost of a read mistake is bounded; the cost of a write mistake is not. So write capability is opt-in, and when you opt in, the compiler produces a separately named tool (create_ticket, rather than a mode flag on an overloaded tickets tool) so that granting write is always a visible act. The deeper enforcement of this boundary (who is allowed to call the write tool, and how it is rate-limited and audited) is the subject of chapter six.
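A sketch of how the compiler might gate the write tool, assuming a hypothetical compileTools function and ToolSpec shape. The write tool appears only when the source has opted in and the driver actually implements put(), and it always gets its own name. The list_ prefix and the naive singularization are illustrative choices, not documented engine behavior.

```typescript
// Hypothetical shapes: just enough of a collection to decide which tools to emit.
interface CollectionLike {
  describe(): { name: string; capabilities: { read: true; write: boolean } };
  put?(record: { [key: string]: unknown }): Promise<{ [key: string]: unknown }>;
}

interface ToolSpec {
  name: string;
  mutates: boolean;
}

function compileTools(collection: CollectionLike): ToolSpec[] {
  const { name, capabilities } = collection.describe();
  // Every collection gets a read tool.
  const tools: ToolSpec[] = [{ name: `list_${name}`, mutates: false }];
  // The write tool needs both the source-level opt-in and a real put() implementation.
  if (capabilities.write && typeof collection.put === "function") {
    // Naive singularization for the example: "tickets" becomes "ticket".
    const singular = name.endsWith("s") ? name.slice(0, -1) : name;
    tools.push({ name: `create_${singular}`, mutates: true });
  }
  return tools;
}
```

Requiring both conditions means a misconfigured flag alone cannot produce a write tool backed by nothing, and a driver that happens to implement put() cannot expose it without the opt-in.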

Pagination over things that paginate differently

A list tool can match more records than any single response should carry, and each source disagrees about how to walk them. The interface hides that behind one opaque Cursor: the runtime passes whatever cursor the last page returned and stops when it gets null back. What the cursor contains is the driver's private business.

Spreadsheet: the cursor is a row offset. The driver reads a fixed window of rows, returns the next offset, and reports null when it walks off the end of the used range.
SQL: the cursor is keyset-based, encoding the last seen value of an ordered key so the next page is fetched with WHERE key > ? instead of a large OFFSET that degrades as the table grows.
REST: the cursor wraps whatever the upstream API uses: a page number, a next URL, or an opaque continuation token the endpoint handed back. The driver translates the engine's cursor to and from the API's own scheme.
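The three schemes can share one opaque string if each driver serializes its own state. The sketch below assumes a hypothetical CursorState union and a URI-encoded JSON encoding; keysetQuery shows how the SQL driver could turn a cursor into the next page's query. Placeholder syntax (? here) varies by database, and the table and key names are interpolated only for brevity; a real driver would quote or whitelist identifiers.

```typescript
// Hypothetical per-driver cursor state; the runtime only ever sees the encoded string.
type CursorState =
  | { kind: "offset"; row: number } // spreadsheet: next row to read
  | { kind: "keyset"; lastKey: string | number } // SQL: key of the last row already returned
  | { kind: "upstream"; token: string }; // REST: the API's own page token or next URL

function encodeCursor(state: CursorState): string {
  return encodeURIComponent(JSON.stringify(state));
}

function decodeCursor(cursor: string): CursorState {
  return JSON.parse(decodeURIComponent(cursor)) as CursorState;
}

// SQL driver: build the next page's query. With an index on the key, there is no
// OFFSET to scan past, so the cost does not grow with page depth.
function keysetQuery(
  table: string,
  key: string,
  pageSize: number,
  cursor: string | null,
): { sql: string; params: unknown[] } {
  if (cursor === null) {
    return { sql: `SELECT * FROM ${table} ORDER BY ${key} LIMIT ?`, params: [pageSize] };
  }
  const state = decodeCursor(cursor);
  if (state.kind !== "keyset") throw new Error("cursor was not issued by the SQL driver");
  return {
    sql: `SELECT * FROM ${table} WHERE ${key} > ? ORDER BY ${key} LIMIT ?`,
    params: [state.lastKey, pageSize],
  };
}
```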

To the runtime, all three are the same loop. A tools/call result carries the page of records, and the cursor is threaded through the protocol's structured result so the agent can ask for more without knowing it is paging a sheet, a keyset query, or a third-party API.
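A sketch of the runtime side, assuming the engine places each page in the tools/call result's structuredContent and mirrors it as serialized JSON text, which the MCP specification recommends for backwards compatibility when a tool returns structured content. handleListCall, the cursor argument, and the next field are hypothetical names. The handler receives only a list function, so it cannot tell which driver it is paging.

```typescript
type CollectionRecord = { [key: string]: unknown };
interface Page {
  records: CollectionRecord[];
  next: string | null;
}

// A minimal slice of an MCP tools/call result.
interface CallToolResult {
  content: { type: "text"; text: string }[];
  structuredContent: { records: CollectionRecord[]; next: string | null };
}

// Hypothetical runtime handler shared by every list tool, whatever the driver.
async function handleListCall(
  list: (cursor: string | null) => Promise<Page>,
  args: { cursor?: string },
): Promise<CallToolResult> {
  const page = await list(args.cursor ?? null);
  const structured = { records: page.records, next: page.next };
  return {
    // Serialized copy for clients that read only text content.
    content: [{ type: "text", text: JSON.stringify(structured) }],
    structuredContent: structured,
  };
}
```

The agent continues by calling the same tool with the returned next value as its cursor argument, and stops when next comes back null.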

Why a megaserver is the wrong shape

The interface makes it tempting to build one server that fronts two collections: a sheet and a database behind a single tool surface. The engine deliberately discourages it. A megaserver couples two failure domains (the sheet's quota and the database's connection limit now share one uptime), two auth models, and two rate-limit budgets behind one set of tools, and it turns a clean single-source contract into a gateway you have to reason about as a whole.

The better shape is one collection per server, composed at the client. MCP servers compose cleanly: in the Model Context Protocol architecture, a host application runs one client per server connection, so a single agent can use many independent servers side by side. Two focused servers (one for the sheet, one for the database) each have one auth scope, one failure domain, one audit trail, and one rate-limit budget. If the sheet's quota is exhausted, the database tools keep answering. The engine nudges you toward small single-source servers for the same reason a good codebase prefers small modules over a god object: the seams are where you keep your sanity.

This is the through-line of the whole teardown. Because every source is reduced to the same collection contract, the layers above (the runtime in chapter four, the model-facing descriptions in chapter five, the security plane in chapter six) can be written once and reused for every server. The collection abstraction is not a convenience. It is the load-bearing wall that lets the rest of the engine pretend there is only one kind of data in the world.