
The five primitives behind every autonomous AI agent

APIs, MCP, Skills, RAG, and Tools. The names sound interchangeable; they aren't. Each one gives an LLM a specific capability it does not have on its own, and a real agent uses all five together. This page anchors them in a concrete story first, then walks through each primitive with a sequence diagram showing the actual data on the wire, so that by the end you can decide which one to reach for when you add a new capability.

0. 4:47pm Friday — what an autonomous agent actually does

The CFO Slacks: "why is gross margin down 3 points this quarter?" A human analyst would open BigQuery, dig through invoices, cross-reference cost reports, check a couple of policy docs, draft a memo. About four hours of work. With an autonomous agent the same flow looks like this:

MCP: The agent connects to the company's warehouse-mcp server and the policies-mcp server at session start. Both expose their tools via one standard handshake.

Skill: The prompt "why is gross margin down" matches the trigger of a margin-investigation skill the finance team wrote. The skill's procedure (run these queries, in this order, with these guardrails) is now in the agent's working memory.

Tool: Following the skill, the agent emits a run_sql tool call to pull this-quarter vs last-quarter unit economics by SKU. The runtime validates that the query is read-only, runs it, and feeds the result back into context.

RAG: Two SKUs show abnormal cost growth. The agent searches the policies KB for any pricing-floor or supplier-contract rule that applies to those SKUs. RAG returns the relevant policy clauses with citations.

API: The agent calls POST /v1/memos on the internal API to file a draft memo with the findings, citations, and a recommended action, then sends the CFO a Slack DM with a permalink.

Four hours of work, six seconds of agent time. None of those five steps is interchangeable. Strip out RAG and the memo is unsourced. Strip out the skill and the agent picks a different query ordering every run. Strip out MCP and you are writing per-vendor integration code instead. The rest of this page is the disciplined version of that intuition.

1. TL;DR

An API is how programs talk to a system. An MCP server is a standard way to expose tools to AI assistants. A Skill is a packaged playbook the assistant pulls into its working memory when a trigger matches. RAG is the pattern of fetching relevant chunks from a search index and feeding them into a model's prompt before it answers. A Tool is the typed function the model chooses to invoke during generation. APIs and MCP are infrastructure; RAG and skills are knowledge; tools are the interface through which the model reaches all of them.

2. The 5 primitives, one card each

Each card states what the primitive is, what the model literally cannot do without it, and what becomes possible when it's present. The sequence diagrams below the grid show the actual shape of the data moving on the wire.

API

An HTTP endpoint a caller invokes to read or write a system.

Without it: The agent has nothing real to act on. It can describe sending an email but it cannot actually send one.
With it: The agent can cause side effects: send mail, file a ticket, run a query, place a trade, deploy a build.

MCP

A vendor-neutral JSON-RPC protocol for exposing tools to AI assistants.

Without it: You hand-write a per-vendor integration for every external system. 20 systems = 20 SDK wrappers.
With it: One AI client picks up 11+ toolkits via the same handshake. Plug in Linear, Brightdata, filesystem, memory, and you are done.

Skill

A self-contained instruction bundle the assistant loads when a trigger matches.

Without it: Every run is non-deterministic. Two runs of 'rebalance my portfolio' use different procedures.
With it: One agreed procedure runs every time. Audit-friendly. Crucial for regulated, repeatable, multi-step work.

RAG

Embed query → vector search → splice top-k chunks into the prompt → generate.

Without it: The model answers from training data and guesses. Your private docs and yesterday's news don't exist to it.
With it: Answers are grounded in your data with citations. The model can say 'I don't know' instead of hallucinating.

Tool

A typed function the LLM chooses to invoke during generation.

Without it: The model produces text only. It cannot ACT mid-turn. It cannot defer the answer until it has fetched data.
With it: Mid-turn the model can pause, call a tool, see the result, decide what to do next. This is what makes the loop 'agentic'.

2.1 API — the protocol any program already speaks

time
 │
 ▼
 ┌──── step 1: the caller knows the contract up front ─────────────────┐
 │                                                                     │
 │   from the docs / OpenAPI spec:                                     │
 │     POST /v1/leads                                                  │
 │     headers: { Authorization: "Bearer sk_…" }                       │
 │     body:    { name, email, source }                                │
 │     returns: 201 { id, createdAt } | 4xx { code, message }          │
 │                                                                     │
 └─────────────────────────────────────────────────────────────────────┘
 ┌──── step 2: one round-trip = one resource op ───────────────────────┐
 │                                                                     │
 │  ┌─────────┐  POST /v1/leads        ┌───────────────────────────┐   │
 │  │ client  │ ─────────────────────► │  server (your stack       │   │
 │  │ (any    │                        │   or theirs):             │   │
 │  │  HTTP-  │                        │   • auth                  │   │
 │  │  speaker│  ◄──────────────────── │   • validate              │   │
 │  └─────────┘  201 { id: "lead_42" } │   • insert into DB        │   │
 │                                     │   • enqueue webhook       │   │
 │                                     └───────────────────────────┘   │
 │                                                                     │
 │   stateless: the server doesn't remember the previous call          │
 │   idempotent reads: GET /v1/leads/42 returns the same row twice     │
 └─────────────────────────────────────────────────────────────────────┘

 ── what an API gives an agent:  the ability to actually do things in
    the world. without one, the model is a chatbot.
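
A minimal TypeScript sketch of the round-trip above. It only builds the request, so it runs without a server. The base URL `https://api.example.test`, the token value, and the `buildCreateLeadRequest` helper are illustrative assumptions, not part of any real API.

```typescript
// Shape of the body from the contract above: { name, email, source }.
interface LeadInput {
  name: string;
  email: string;
  source: string;
}

// What would be handed to fetch(url, init). Kept as plain data so it can be inspected.
interface PreparedRequest {
  url: string;
  init: {
    method: "POST";
    headers: Record<string, string>;
    body: string;
  };
}

// Hypothetical helper: turns a lead into the POST /v1/leads request described above.
function buildCreateLeadRequest(baseUrl: string, token: string, lead: LeadInput): PreparedRequest {
  return {
    // Trim trailing slashes so the path is joined exactly once.
    url: `${baseUrl.replace(/\/+$/, "")}/v1/leads`,
    init: {
      method: "POST",
      headers: {
        Authorization: `Bearer ${token}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify(lead),
    },
  };
}

// Placeholder values throughout; none of these identify a real system.
const leadRequest = buildCreateLeadRequest("https://api.example.test/", "sk_placeholder", {
  name: "Ada Lovelace",
  email: "ada@example.test",
  source: "webinar",
});
console.log(leadRequest.url);
```

Passing `leadRequest.url` and `leadRequest.init` to `fetch` would perform the actual call. Per the contract in step 1, a 201 response carries `{ id, createdAt }` and a 4xx carries `{ code, message }`.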

2.2 MCP — one protocol, many servers, zero per-vendor SDK

time
 │
 ▼
 ┌──── step 1: client connects to one MCP server ──────────────────────┐
 │                                                                     │
 │  AI client ──► initialize {                                         │
 │                  protocolVersion: "2025-06-18",                     │
 │                  clientInfo: { name:"neww-agent", version:"1.4" }   │
 │                }                                                    │
 │  AI client ◄── result {                                             │
 │                  serverInfo: { name:"linear-server", version:"…" }, │
 │                  capabilities: { tools:{…}, resources:{…} }         │
 │                }                                                    │
 └─────────────────────────────────────────────────────────────────────┘
 ┌──── step 2: client asks "what can you do?" ─────────────────────────┐
 │                                                                     │
 │  AI client ──► tools/list                                           │
 │  AI client ◄── { tools: [                                           │
 │       { name:"linear_create_issue",                                 │
 │         inputSchema: { type:"object", required:["title","team"] }}, │
 │       { name:"linear_list_issues",  inputSchema: { … } },           │
 │       { name:"linear_update_status", inputSchema: { … } }           │
 │     ]}                                                              │
 │                                                                     │
 │     ─────── the LLM now "knows" these 3 tools exist ───────         │
 └─────────────────────────────────────────────────────────────────────┘
 ┌──── step 3: model decides to call one ──────────────────────────────┐
 │                                                                     │
 │  USER: "open a ticket for the staging deploy failure"               │
 │                                                                     │
 │  LLM emits:                                                         │
 │     { name:"linear_create_issue",                                   │
 │       args:{ title:"Staging deploy fails on apply step",            │
 │              team:"INFRA", priority:"high" }}                       │
 │                                                                     │
 │  runtime ──► tools/call { name, arguments } ──► MCP server          │
 │                                                                     │
 │  MCP server runs the REAL Linear API under the hood, returns:       │
 │           ◄── { content:[{ type:"text",                             │
 │                            text:"Created INFRA-412" }]}             │
 │                                                                     │
 │  LLM continues:  "Filed INFRA-412 ✓"                                │
 └─────────────────────────────────────────────────────────────────────┘

 ── what MCP gives an agent:  one universal way to use any vendor's
    tools without writing a per-vendor SDK. plug in 11 servers, get
    11 toolkits the LLM can use immediately.
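
The three steps above map onto three JSON-RPC 2.0 requests. The sketch below only constructs the messages; the `rpcRequest` helper and the id counter are assumptions made for illustration, and no transport is included.

```typescript
// JSON-RPC 2.0 request envelope, which MCP uses for its messages.
interface JsonRpcRequest {
  jsonrpc: "2.0";
  id: number;
  method: string;
  params?: Record<string, unknown>;
}

let nextRpcId = 1;

// Hypothetical helper that stamps each request with a fresh id.
function rpcRequest(method: string, params?: Record<string, unknown>): JsonRpcRequest {
  const request: JsonRpcRequest = { jsonrpc: "2.0", id: nextRpcId++, method };
  if (params !== undefined) request.params = params;
  return request;
}

// Step 1: handshake. Version and client info are copied from the diagram above.
const initializeMsg = rpcRequest("initialize", {
  protocolVersion: "2025-06-18",
  capabilities: {},
  clientInfo: { name: "neww-agent", version: "1.4" },
});

// Step 2: discovery.
const listToolsMsg = rpcRequest("tools/list");

// Step 3: invocation. The model's { name, args } becomes { name, arguments } on the wire.
const callToolMsg = rpcRequest("tools/call", {
  name: "linear_create_issue",
  arguments: { title: "Staging deploy fails on apply step", team: "INFRA", priority: "high" },
});

for (const msg of [initializeMsg, listToolsMsg, callToolMsg]) {
  console.log(JSON.stringify(msg));
}
```

A real client also sends a `notifications/initialized` notification once the handshake completes and needs a transport such as stdio or HTTP; both are left out here to keep the message shapes in focus.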

2.3 Skill — a procedure the org agreed on, made loadable

time
 │
 ▼
 ┌──── step 1: a skill is a file on disk ──────────────────────────────┐
 │                                                                     │
 │  ~/.claude/skills/finance-rebalance/skill.md                        │
 │  ─────────────────────────────────────────────────                  │
 │  ---                                                                │
 │  name: finance-rebalance                                            │
 │  trigger: ["rebalance", "drift > 5%"]                               │
 │  resources: ["policies/allocation.md"]                              │
 │  ---                                                                │
 │  Procedure (for the assistant to follow):                           │
 │    1. read_holdings(workspaceId)                                    │
 │    2. compare allocation vs target band; if drift < 1% stop         │
 │    3. require human approval before any execute_trade               │
 │    4. emit a 1-page memo citing the policy file                     │
 │                                                                     │
 └─────────────────────────────────────────────────────────────────────┘
 ┌──── step 2: trigger matches → skill loaded into context ────────────┐
 │                                                                     │
 │   USER: "rebalance my IRA"                                          │
 │                                                                     │
 │   matcher: "rebalance" → finance-rebalance ✓                        │
 │   runtime inlines skill.md + cited resources into the system        │
 │   prompt, before any user turn is processed                         │
 │                                                                     │
 │      ┌──────────────────────────────────────────────┐               │
 │      │  system context now contains:                │               │
 │      │  • the skill body (procedure + safety rules) │               │
 │      │  • policies/allocation.md (the cited file)   │               │
 │      └──────────────────────────────────────────────┘               │
 └─────────────────────────────────────────────────────────────────────┘
 ┌──── step 3: agent follows the procedure, not its instincts ─────────┐
 │                                                                     │
 │   instead of guessing what "rebalance" means, the agent uses the    │
 │   exact 4 steps in the skill — same procedure every time, across    │
 │   every user, with audit-friendly citations                         │
 │                                                                     │
 └─────────────────────────────────────────────────────────────────────┘

 ── what a skill gives an agent:  a fixed playbook the org has agreed
    on. removes variance run-to-run. crucial for regulated workflows.
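
Step 2 can be sketched as a substring matcher plus string concatenation. This is one possible reading of "trigger matches", not a description of any particular runtime; the `Skill` interface, `matchSkill`, and `buildSystemPrompt` are hypothetical names, and the frontmatter is assumed to be parsed already.

```typescript
// Hypothetical in-memory form of a skill file after its frontmatter has been parsed.
interface Skill {
  name: string;
  triggers: string[]; // from the `trigger:` frontmatter field
  body: string;       // the procedure text
}

const financeRebalance: Skill = {
  name: "finance-rebalance",
  triggers: ["rebalance", "drift > 5%"],
  body: [
    "1. read_holdings(workspaceId)",
    "2. compare allocation vs target band; if drift < 1% stop",
    "3. require human approval before any execute_trade",
    "4. emit a 1-page memo citing the policy file",
  ].join("\n"),
};

// Return the first skill with a trigger phrase contained in the prompt (case-insensitive).
function matchSkill(prompt: string, skills: Skill[]): Skill | undefined {
  const haystack = prompt.toLowerCase();
  return skills.find((skill) =>
    skill.triggers.some((trigger) => haystack.includes(trigger.toLowerCase())),
  );
}

// Inline the matched skill into the system prompt; leave the prompt unchanged when nothing matches.
function buildSystemPrompt(base: string, prompt: string, skills: Skill[]): string {
  const skill = matchSkill(prompt, skills);
  return skill ? `${base}\n\n# Skill: ${skill.name}\n${skill.body}` : base;
}

const systemPrompt = buildSystemPrompt("You are a finance assistant.", "rebalance my IRA", [financeRebalance]);
console.log(systemPrompt);
```

Because the skill is only text, nothing here executes a trade or reads holdings; the procedure takes effect only when the model follows it and calls the corresponding tools.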

2.4 RAG — answers grounded in your own data

time
 │
 ▼
 ┌──── step 1: embed the query into a vector ──────────────────────────┐
 │                                                                     │
 │   USER: "what's our refund policy for enterprise customers?"        │
 │                                                                     │
 │   query  ──►  embedder  ──►  [0.0142, -0.221, 0.087, … 1535 more]   │
 │                                                                     │
 └─────────────────────────────────────────────────────────────────────┘
 ┌──── step 2: find similar chunks in the vector store ────────────────┐
 │                                                                     │
 │   vector  ──►  Qdrant.search( index="company_kb", k=4 )             │
 │                                                                     │
 │   returns top-4 matches with similarity score + source:             │
 │     0.92  policies/refund.md#L41-L78    "...enterprise SKUs may..." │
 │     0.88  contracts/MSA-v3.md#L210-218  "...refund window of 30..." │
 │     0.84  faq/billing.md#L12-L20        "...standard tier excl..."  │
 │     0.71  blog/2024-refund-update.md    "...we changed our..."      │
 │                                                                     │
 └─────────────────────────────────────────────────────────────────────┘
 ┌──── step 3: splice chunks into the prompt and generate ─────────────┐
 │                                                                     │
 │   final prompt to the LLM:                                          │
 │     [system: you answer using ONLY the citations below]             │
 │     [context: {chunk_1}, {chunk_2}, {chunk_3}, {chunk_4}]           │
 │     [user: what's our refund policy for enterprise customers?]      │
 │                                                                     │
 │   LLM responds, with citations pinned to the chunks:                │
 │     "Enterprise SKUs allow 30 days [policies/refund.md L41],        │
 │      subject to MSA §4.3 [contracts/MSA-v3.md L210]."               │
 │                                                                     │
 └─────────────────────────────────────────────────────────────────────┘

 ── what RAG gives an agent:  ground-truth answers from YOUR data,
    even data the model has never seen. solves "the AI hallucinated
    something that contradicts our policy."
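
The retrieve-then-splice steps can be sketched with an in-memory list in place of a vector store. The three-dimensional vectors below are toy values standing in for real embeddings, and `topK` does an exact scan where a production store would use an approximate index; all function names are illustrative.

```typescript
interface Chunk {
  source: string;
  text: string;
  vector: number[];
}

// Cosine similarity between two equal-length vectors.
function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Step 2: rank every chunk by similarity to the query vector and keep the best k.
function topK(queryVector: number[], chunks: Chunk[], k: number): Chunk[] {
  return [...chunks]
    .sort((x, y) => cosine(queryVector, y.vector) - cosine(queryVector, x.vector))
    .slice(0, k);
}

// Step 3: splice the retrieved chunks into the prompt, each tagged with its source.
function buildRagPrompt(question: string, retrieved: Chunk[]): string {
  const context = retrieved.map((c) => `[${c.source}] ${c.text}`).join("\n");
  return `Answer using ONLY the citations below.\n\n${context}\n\nQuestion: ${question}`;
}

// Toy knowledge base; the snippets are the truncated ones shown in the diagram.
const kb: Chunk[] = [
  { source: "policies/refund.md", text: "...enterprise SKUs may...", vector: [0.9, 0.1, 0.0] },
  { source: "faq/billing.md", text: "...standard tier excl...", vector: [0.5, 0.5, 0.0] },
  { source: "blog/2024-refund-update.md", text: "...we changed our...", vector: [0.0, 0.2, 0.9] },
];
const queryVector = [1, 0, 0]; // pretend embedder output for the refund question
const hits = topK(queryVector, kb, 2);
const ragPrompt = buildRagPrompt("what's our refund policy for enterprise customers?", hits);
console.log(ragPrompt);
```

Keeping the source path next to each chunk is what lets the model pin its citations to specific files.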

2.5 Tool — the mechanism every other primitive reaches the model through

time
 │
 ▼
 ┌──── step 1: model is shown the available tools ─────────────────────┐
 │                                                                     │
 │   system:                                                           │
 │     "you are an agent. use the tools when helpful."                 │
 │   tools (schema sent with every turn):                              │
 │     [                                                               │
 │       { name:"read_holdings",  args:{ workspaceId:string } },       │
 │       { name:"search_kb",      args:{ query:string, k:int } },      │
 │       { name:"run_sql",        args:{ sql:string }   }              │
 │     ]                                                               │
 │   user: "top 5 customers by revenue last quarter?"                  │
 │                                                                     │
 └─────────────────────────────────────────────────────────────────────┘
 ┌──── step 2: model emits a tool call instead of an answer ───────────┐
 │                                                                     │
 │   LLM ──►  { name:"run_sql",                                        │
 │             args:{ sql: "SELECT name, SUM(amount) AS rev            │
 │                          FROM invoices                              │
 │                          WHERE paid_at BETWEEN '2026-01-01' AND     │
 │                                                '2026-03-31'         │
 │                          GROUP BY name ORDER BY rev DESC LIMIT 5"}} │
 │                                                                     │
 │   the model has NOT answered yet — it has asked the runtime to do   │
 │   the work and feed the result back in                              │
 │                                                                     │
 └─────────────────────────────────────────────────────────────────────┘
 ┌──── step 3: runtime executes (sandboxed) and returns the result ────┐
 │                                                                     │
 │   runtime.executeTool("run_sql", {sql})                             │
 │     ──► validates: SELECT-only? not on auth table? row cap?         │
 │     ──► prisma.$queryRawUnsafe(sql)                                 │
 │     ──► tool_result = [                                             │
 │           { name:"Acme",     rev:412300 },                          │
 │           { name:"Globex",   rev:308100 },                          │
 │           { name:"InitVivo", rev:285450 },                          │
 │           { name:"Yotsuba",  rev:201020 },                          │
 │           { name:"Hooli",    rev:177540 }                           │
 │         ]                                                           │
 │                                                                     │
 └─────────────────────────────────────────────────────────────────────┘
 ┌──── step 4: model uses the result to answer ────────────────────────┐
 │                                                                     │
 │   LLM (final): "Last quarter the top 5 customers were Acme          │
 │    ($412K), Globex ($308K), InitVivo ($285K), Yotsuba ($201K),      │
 │    and Hooli ($178K)."                                              │
 │                                                                     │
 └─────────────────────────────────────────────────────────────────────┘

 ── what tool-use gives an agent:  the ability to ACT during generation,
    not just produce text. every other primitive on this page reaches
    the model through this mechanism.
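
The four steps above form a loop, sketched here with a stubbed model so it runs without an LLM or a database. Everything in it is an assumption for illustration: `fakeModel` replaces the real model, `run_sql` returns a canned row, and the SELECT-only check is a deliberately naive stand-in for real SQL validation. A real loop would also be asynchronous.

```typescript
// A tool call the model can emit, and the two kinds of model output.
interface ToolCall {
  tool: string;
  args: Record<string, string>;
}
type ModelTurn = { kind: "tool_call"; call: ToolCall } | { kind: "final"; text: string };

// Everything the model has seen so far in this run.
type Transcript = string[];

// Hypothetical registry: tool name -> implementation.
const toolRegistry: Record<string, (args: Record<string, string>) => string> = {
  run_sql: (args) => {
    // Guardrail from step 3: refuse anything that is not a single SELECT statement.
    const sql = (args["sql"] ?? "").trim();
    if (!/^select\b/i.test(sql) || sql.includes(";")) {
      return "error: only single SELECT statements are allowed";
    }
    return JSON.stringify([{ name: "Acme", rev: 412300 }]); // canned row, no real query
  },
};

// Stub standing in for the LLM: asks for data first, answers once a tool result is present.
function fakeModel(transcript: Transcript): ModelTurn {
  const lastResult = transcript.find((line) => line.startsWith("tool_result:"));
  if (!lastResult) {
    const sql = "SELECT name, SUM(amount) AS rev FROM invoices GROUP BY name";
    return { kind: "tool_call", call: { tool: "run_sql", args: { sql } } };
  }
  return { kind: "final", text: `Top customer data: ${lastResult.slice("tool_result:".length).trim()}` };
}

// The loop: call the model, execute any tool call, feed the result back, stop on a final answer.
function runAgent(userPrompt: string, model: (t: Transcript) => ModelTurn, maxSteps = 5): string {
  const transcript: Transcript = [`user: ${userPrompt}`];
  for (let step = 0; step < maxSteps; step++) {
    const turn = model(transcript);
    if (turn.kind === "final") return turn.text;
    const impl = toolRegistry[turn.call.tool];
    const result = impl ? impl(turn.call.args) : `error: unknown tool ${turn.call.tool}`;
    transcript.push(`tool_result: ${result}`);
  }
  return "stopped: step limit reached";
}

const agentAnswer = runAgent("top 5 customers by revenue last quarter?", fakeModel);
console.log(agentAnswer);
```

The step limit matters as much as the loop itself: without it, a model that keeps requesting tools would never return control to the caller.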

3. Six real-world scenarios across industries

Same five primitives, different verticals. Each step is tagged with the primitive carrying it so you can see the composition.

Legal · contract review

Audit this 80-page MSA for unusual indemnity clauses and flag anything that diverges from our standard.

  1. Skill: contract-review skill loaded; it defines the diff procedure and citation requirements.
  2. RAG: embeds each section and retrieves the org's standard MSA clauses for comparison.
  3. Tool: clause_diff tool produces a structured before/after for every divergence.
  4. API: POST to Linear opens a review ticket with the diff attached.

A junior associate used to spend 5 hours on this. The agent finishes in under a minute, with citations to the standard.

Sales · lead enrichment

For each new lead this week, find their company size, funding stage, and recent product launches.

  1. API: GET /v1/leads?since=7d returns the new leads from the CRM.
  2. MCP: brightdata MCP scrapes LinkedIn, Crunchbase, and the company blog for each lead.
  3. Tool: score_lead tool merges enriched fields and computes a fit score.
  4. API: PATCH /v1/leads/{id} writes the enrichment back to the CRM.

What an SDR researches in 8 hours, an autonomous agent grinds through in 4 minutes for a whole week of leads.

Engineering · deploy monitor

Watch the staging deploy. If it fails, open a Linear ticket with the failing step and ping me.

  1. API: polls GET /v1/deploys/{id}/status every 15s until 'failed' or 'success'.
  2. Tool: parse_logs tool extracts the failing step and the first stack trace from build output.
  3. MCP: linear-server.create_issue opens INFRA-### with title, step, trace, and labels.
  4. MCP: slack-server.dm posts to the on-call engineer with the ticket link.

Replaces a CI bot + Slack webhook + ticket template. One agent, one prompt, no glue scripts.

Customer support · personalized rebuttals

Draft a personalized reply to every negative review from last week, citing the actual feature we shipped that addresses it.

  1. API: GET /v1/reviews?rating<=3&since=7d returns negative reviews.
  2. RAG: embeds each review and retrieves matching changelog entries from the product KB.
  3. Skill: review-reply skill enforces tone, length, and 'no false claims' rules.
  4. Tool: send_review_reply tool drafts each response and stages it for human approval.

Support team approves drafts in 30 seconds each instead of writing from scratch. Quality goes up, time per review drops 10x.

Finance · daily reconciliation

Reconcile yesterday's Stripe payouts against the GL. Surface anything that doesn't match.

  1. MCP: stripe-mcp lists payouts; quickbooks-mcp lists GL entries.
  2. Tool: match_records tool diffs the two sets and groups by payout id.
  3. Skill: reconciliation skill enforces the 'flag don't fix' rule: the agent never adjusts the GL silently.
  4. API: POST /v1/reports/recon writes the report; if mismatches exist, opens a ticket.

A nightly cron now does what a controller checked manually each morning. Catches errors hours earlier.

Operations · executive Q&A

why is gross margin down 3 points this quarter?

  1. Skill: margin-investigation skill loaded; it defines the queries and the memo format.
  2. Tool: run_sql tool runs the cohort-by-SKU cost decomposition.
  3. RAG: retrieves applicable pricing-floor and supplier-contract clauses.
  4. API: POST /v1/memos files the draft memo; Slack DM with the link.

The four-hour analyst job from the opening story. Now: six seconds, fully cited, ready for review.
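
To make one of the tool steps concrete, here is a sketch of what the match_records step from the finance reconciliation scenario might look like. The record shapes, the `Mismatch` variants, and the assumption that payout ids are unique are all illustrative; nothing here reflects the real Stripe or QuickBooks data models.

```typescript
interface Payout {
  payoutId: string;
  amountCents: number;
}
interface GlEntry {
  payoutId: string;
  amountCents: number;
}

type Mismatch =
  | { payoutId: string; kind: "missing_in_gl"; payoutCents: number }
  | { payoutId: string; kind: "missing_in_payouts"; glCents: number }
  | { payoutId: string; kind: "amount_differs"; payoutCents: number; glCents: number };

// Diff the two sets by payout id. It only reports differences and never edits either side,
// which is the "flag don't fix" rule expressed in code.
function matchRecords(payouts: Payout[], glEntries: GlEntry[]): Mismatch[] {
  // Sum GL entries per payout id, since one payout may be booked as several lines.
  const glById = new Map<string, number>();
  for (const entry of glEntries) {
    glById.set(entry.payoutId, (glById.get(entry.payoutId) ?? 0) + entry.amountCents);
  }

  const mismatches: Mismatch[] = [];
  const seenPayoutIds = new Set<string>();
  for (const payout of payouts) {
    seenPayoutIds.add(payout.payoutId);
    const glCents = glById.get(payout.payoutId);
    if (glCents === undefined) {
      mismatches.push({ payoutId: payout.payoutId, kind: "missing_in_gl", payoutCents: payout.amountCents });
    } else if (glCents !== payout.amountCents) {
      mismatches.push({ payoutId: payout.payoutId, kind: "amount_differs", payoutCents: payout.amountCents, glCents });
    }
  }
  // Anything left in the GL that no payout referenced.
  glById.forEach((glCents, payoutId) => {
    if (!seenPayoutIds.has(payoutId)) {
      mismatches.push({ payoutId, kind: "missing_in_payouts", glCents });
    }
  });
  return mismatches;
}

const reconMismatches = matchRecords(
  [
    { payoutId: "po_1", amountCents: 10000 },
    { payoutId: "po_2", amountCents: 2500 },
    { payoutId: "po_3", amountCents: 700 },
  ],
  [
    { payoutId: "po_1", amountCents: 10000 },
    { payoutId: "po_2", amountCents: 2400 },
    { payoutId: "po_4", amountCents: 999 },
  ],
);
console.log(reconMismatches.map((m) => `${m.payoutId}:${m.kind}`).join(", "));
```

Amounts are kept in integer cents so the equality check is not affected by floating-point rounding.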

4. Things people get wrong

MCP is just APIs with a new name.

APIs are endpoints defined by the system that owns the data. MCP is a protocol that wraps any system so an AI client can discover its tools, call them, and stream results — using one handshake regardless of vendor. The MCP server usually calls APIs under the hood; that's the wrapping, not the equivalence.

RAG is just a database.

A database stores rows; RAG is the pattern of using a vector index (or hybrid search) to fetch only the relevant fragments and inject them into the prompt before generation. The store is one ingredient; retrieval + splicing + grounded generation is the recipe.

A skill is the same as a tool.

A tool executes something — runs SQL, sends an email. A skill tells the assistant how — it's text loaded into context. Skills don't run; they instruct. Most useful skills tell the agent which tools to use in which order.

If we have an API, we have an AI-ready surface.

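APIs were designed for programs, not for LLMs. To be agent-ready a surface needs typed schemas, predictable error shapes, idempotency, and a discovery mechanism. An MCP server supplies the schemas, a standard error shape, and discovery on top of your API; tool definitions supply them inside your agent runtime. Idempotency still has to come from the API or from the tool implementation itself.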
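
As a rough sketch of what "typed schema plus predictable error shape" can mean in practice, the tool below validates its arguments and always returns one of two result variants. The `create_lead` tool, the `ToolResult` type, and the error codes are hypothetical and not taken from any codebase.

```typescript
// Predictable result shape: a call returns one of these two variants instead of throwing.
type ToolResult<T> =
  | { ok: true; data: T }
  | { ok: false; code: "invalid_input" | "upstream_error"; message: string };

interface CreateLeadArgs {
  name: string;
  email: string;
  source: string;
}

// Hypothetical tool definition: a machine-readable schema plus a validating entry point.
const createLeadTool = {
  name: "create_lead",
  inputSchema: {
    type: "object",
    required: ["name", "email", "source"],
    properties: {
      name: { type: "string" },
      email: { type: "string" },
      source: { type: "string" },
    },
  },
  // Validate untrusted model output before anything touches the underlying API.
  parse(raw: unknown): ToolResult<CreateLeadArgs> {
    if (typeof raw !== "object" || raw === null) {
      return { ok: false, code: "invalid_input", message: "arguments must be an object" };
    }
    const record = raw as Record<string, unknown>;
    for (const field of ["name", "email", "source"]) {
      if (typeof record[field] !== "string" || record[field] === "") {
        return { ok: false, code: "invalid_input", message: `missing or empty field: ${field}` };
      }
    }
    return {
      ok: true,
      data: {
        name: record["name"] as string,
        email: record["email"] as string,
        source: record["source"] as string,
      },
    };
  },
};

console.log(createLeadTool.parse({ name: "Ada" }));
```

Returning a structured error instead of throwing gives the model something it can read and react to on the next turn, for example by asking the user for the missing field.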

An agent is just an LLM with a system prompt.

A chatbot is an LLM with a system prompt. An agent is an LLM in a loop that can pause mid-turn, call tools, observe results, and decide its next step. The loop is the agent; the prompt is just the starting condition.

RAG and fine-tuning solve the same problem.

Use RAG for facts that change (this week's policies, yesterday's tickets, this customer's history). Use fine-tuning for behaviour that should be the same across all users (tone, format, hard skills the base model lacks). You will almost always end up using both.

5. Which primitive do I need? — a decision flow

┌─── "I want to add a new capability to the agent." ──────────────────┐
│                                                                     │
│                              start here                             │
│                                  │                                  │
│   ┌──────────────────────────────┴─────────────────────────────┐    │
│   │ Is the capability a recurring procedure                    │    │
│   │ that humans should agree on once and re-use forever?       │    │
│   └──────────────────────────────┬─────────────────────────────┘    │
│                                  │                                  │
│                  yes ◄───────────┴───────────► no                   │
│                   │                             │                   │
│                   ▼                             │                   │
│              add a SKILL                        │                   │
│              (file on disk)                     ▼                   │
│                                  ┌────────────────────────────┐     │
│                                  │ Will OTHER AI clients      │     │
│                                  │ (Cursor, Claude Code, …)   │     │
│                                  │ also want to reach this    │     │
│                                  │ capability?                │     │
│                                  └────────────┬───────────────┘     │
│                                               │                     │
│                                yes ◄──────────┴──────────► no       │
│                                 │                          │        │
│                                 ▼                          ▼        │
│                          expose it as MCP          add it as a TOOL │
│                          (one server, many         (typed function  │
│                           clients reuse it)         your agent uses)│
│                                                                     │
│   ─────── if the capability READS data the model didn't see in      │
│           training ── always pair it with RAG.                      │
│                                                                     │
│   ─────── if the capability is "talk to a system the company        │
│           already exposes" ── that system already has an API; you   │
│           are just wrapping it.                                     │
└─────────────────────────────────────────────────────────────────────┘
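
The flow above is small enough to write down as a function. This is just the diagram transcribed; the field names on `CapabilityQuestions` and the `choosePrimitives` function are made up for the example.

```typescript
// The questions asked in the decision flow.
interface CapabilityQuestions {
  recurringProcedure: boolean;   // should humans agree on it once and reuse it?
  neededByOtherClients: boolean; // will other AI clients want to reach it too?
  readsUnseenData: boolean;      // does it read data the model did not see in training?
}

type Primitive = "Skill" | "MCP" | "Tool" | "RAG";

// Walk the two yes/no questions in order, then apply the "pair it with RAG" side rule.
function choosePrimitives(q: CapabilityQuestions): Primitive[] {
  const primary: Primitive = q.recurringProcedure
    ? "Skill"
    : q.neededByOtherClients
      ? "MCP"
      : "Tool";
  return q.readsUnseenData ? [primary, "RAG"] : [primary];
}

console.log(
  choosePrimitives({ recurringProcedure: false, neededByOtherClients: true, readsUnseenData: true }).join(" + "),
);
```

The second side note in the diagram is not encoded: whichever primitive comes out, an existing company system is still reached through the API it already exposes.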

6. Side-by-side

| Aspect | API | MCP | Skill | RAG | Tool |
|---|---|---|---|---|---|
| What it is | HTTP endpoint | Tool-exposure protocol | Markdown playbook | Retrieve-then-generate pattern | Typed function the LLM can call |
| Who calls it | Any program with creds | Any MCP-aware AI client | The assistant on trigger | Your app code / agent runtime | The LLM during generation |
| Stateful? | No (REST norm) | Yes (session per connection) | No (content only) | No (per-query) | Per-call |
| Auth model | API key / OAuth / session | None / token / OAuth | None (it's a file) | Inherits vector store auth | Inherits underlying surface |
| Latency profile | 100 ms – 5 s | 50 ms – 30 s (stdio cold start) | Free (in-context) | ~200 ms embed + ANN | Whatever it wraps |
| Discovery | Docs / OpenAPI | tools/list JSON-RPC | Frontmatter trigger | N/A | Schema declared at call time |
| Versioning | URL or header | protocolVersion in handshake | File version / git | Index version | Argument schema |
| Composes with | Anything | Tools, RAG, other MCP | Tools, RAG, MCP | Tools, MCP, APIs | RAG, MCP, APIs |
| neww.ai surface | apps/web/src/app/api/v1/* | 11 user-scope servers | Roadmap (not yet 1st-class) | Qdrant + lib/web-data/ | lib/agent/tools/index.ts |

7. The recipe for an autonomous agent

An autonomous agent, one that can take a goal and pursue it without a human in the inner loop, needs a loop plus all five primitives, each doing the specific job nothing else can do:

  1. Loop: Wrap an LLM in a planner that can call tools, observe results, and decide whether to continue or stop. This is the agent.
  2. Skills: Give the agent the org's playbooks for recurring tasks so the procedure is the same on every run. Removes variance.
  3. MCP: Wire it to every external system through one protocol. The agent picks up new toolkits with zero new SDK code.
  4. RAG: Connect it to the org's living knowledge: policies, tickets, docs, code, recent decisions. Grounds every claim.
  5. Tools: Wrap the primitives above in typed functions with input validation, output schemas, and sandboxing. This is the interface the model actually uses.
  6. APIs: Behind each tool, the real side effect: sending the email, filing the ticket, placing the trade, running the SQL.

Drop any one of those layers and you regress: drop the loop and it's a chatbot; drop skills and you get inconsistency; drop MCP and you write SDKs forever; drop RAG and you hallucinate; drop tools and the model can't act; drop APIs and there's nothing real to act on.

8. In the neww.ai codebase

Every primitive on this page except Skill maps to a real file or route you can read today; the platform skill layer is still on the roadmap.

| Primitive | neww.ai surface |
|---|---|
| API | apps/web/src/app/api/v1/* |
| | apps/web/src/app/api/agent/master/dispatch/route.ts |
| MCP | 11 user-scope servers in ~/.claude.json |
| | apps/web/src/app/api/mcp/route.ts (outbound) |
| | allsystemsmvp/tests/testmcps.py (connectivity probe) |
| Skill | Roadmap: Claude Code skills used by builders today; platform skill layer planned at lib/agent/skills/. |
| RAG | apps/web/src/lib/web-data/fabric.ts |
| | apps/web/src/lib/web-data/router.ts |
| | apps/web/src/lib/web-data/connectors/* (40 connectors) |
| | Qdrant + Meilisearch hybrid arm |
| Tool | apps/web/src/lib/agent/tools/index.ts |
| | apps/web/src/lib/agent/tools/data.ts |
| | apps/web/src/lib/agent/tools/security.ts |
| | Wired into the model via apps/web/src/lib/ai/orchestrator/tool-loop.ts |
| Routing | apps/web/src/lib/ai-router.ts: provider selection, retry, budget enforcement, cross-provider fallback |

9. Further reading