If you are trying to scale a single GraphQL endpoint across dozens of teams, the answer is GraphQL federation: a pattern that lets you split one large schema into independently owned subgraphs, each backed by its own service, while presenting clients a single unified graph. This GraphQL federation tutorial walks through the architecture, schema design, and operational practices I use to build enterprise APIs that scale with both traffic and team count. Federation solves the core problem of the monolithic GraphQL server: as your schema and your engineering org grow, a single codebase becomes a bottleneck for deploys, ownership, and reliability.
Below, I cover the federation model, how to design subgraph boundaries, how to wire entities together, and how to run the whole thing in production without creating a new monolith in disguise.
Why Federation Beats a Monolithic GraphQL Server
A first GraphQL deployment usually starts as one server resolving every field. That works until multiple teams need to ship changes to the same schema. You then hit familiar friction:
- Deploy contention. Every schema change goes through one repository and one release.
- Unclear ownership. No team fully owns the Order type when five teams touch it.
- Scaling mismatch. The product catalog and the checkout flow have very different load profiles, but they share a process.
Federation addresses this by letting each team own a subgraph. A subgraph is a standalone GraphQL service that defines part of the overall schema. A gateway (or router) composes these subgraphs into a single supergraph and routes incoming queries to the right services.
The most common implementation is Apollo Federation, but any spec-compatible router and subgraph library will work. The examples below use Apollo Federation v2 directives because they are widely adopted; the design principles apply regardless of vendor.
Designing Subgraph Boundaries
Before writing a single resolver, decide how to split the graph. This is the most consequential decision in enterprise API design, and it follows the same reasoning as microservices and platform engineering: align service boundaries with business capabilities and team ownership, not with database tables.
Use these heuristics:
- One subgraph per bounded context. Group types that change together. Users, accounts, and authentication belong together. Products and inventory belong together.
- Own your entities. Each entity should have exactly one subgraph that is its source of truth. Other subgraphs extend it.
- Minimize cross-subgraph chatter. If two types are queried together constantly and require many resolver round-trips, that is a signal the boundary is wrong.
A typical e-commerce supergraph might decompose into:
- Accounts subgraph: User, Address
- Catalog subgraph: Product, Category
- Orders subgraph: Order, LineItem
- Reviews subgraph: Review
The Order type lives in Orders but references a User from Accounts and Product from Catalog. Federation lets you stitch these together without Orders importing the other services' code.
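The "own your entities" heuristic is easy to check mechanically. Here is a minimal sketch, assuming you keep a hand-maintained ownership map; the ownership object and the findSharedOwners helper are hypothetical and not part of any federation tooling.

```javascript
// Hypothetical ownership map: subgraph name -> entity types it claims
// as source of truth (taken from the decomposition above).
const ownership = {
  accounts: ["User", "Address"],
  catalog: ["Product", "Category"],
  orders: ["Order", "LineItem"],
  reviews: ["Review"],
};

// Return every type claimed by more than one subgraph, with its claimants.
function findSharedOwners(map) {
  const owners = new Map();
  for (const [subgraph, types] of Object.entries(map)) {
    for (const type of types) {
      owners.set(type, [...(owners.get(type) ?? []), subgraph]);
    }
  }
  return [...owners].filter(([, subgraphs]) => subgraphs.length > 1);
}

console.log(findSharedOwners(ownership)); // no type has two owners in this map
```

A check like this could run alongside composition in CI so that a second team claiming Product is caught in review rather than in production.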
A Practical GraphQL Federation Tutorial: Defining Entities
The core mechanic of federation is the entity: a type that can be resolved across subgraphs, identified by a @key. Let's build the Accounts subgraph first.
# accounts subgraph
extend schema
  @link(url: "https://specs.apollo.dev/federation/v2.3",
        import: ["@key"])

type User @key(fields: "id") {
  id: ID!
  email: String!
  displayName: String!
}

type Query {
  user(id: ID!): User
}
The @key(fields: "id") directive tells the gateway that User can be uniquely identified by its id. Any other subgraph can now reference and extend User.
Next, the Orders subgraph references User without owning it:
# orders subgraph
extend schema
  @link(url: "https://specs.apollo.dev/federation/v2.3",
        import: ["@key", "@external"])

type Order @key(fields: "id") {
  id: ID!
  total: Float!
  buyer: User!
}

type User @key(fields: "id") {
  id: ID! @external
  orders: [Order!]!
}

type Query {
  order(id: ID!): Order
}
Two important things happen here:
- The Orders subgraph extends User by adding an orders field. It marks id as @external because Accounts owns it. Federation 2 no longer requires @external on key fields, so treat the marker here as a statement of intent.
- When a client asks for order.buyer.email, the gateway resolves the order in Orders, then calls Accounts with the buyer's id to fetch email.
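On the Orders side, the buyer field only needs to return a reference to the user, not the user itself. A minimal sketch of what those resolvers might look like, assuming an in-memory array in place of a real data store; the orders data, the buyerId column, and the ordersResolvers name are all illustrative.

```javascript
// Illustrative in-memory data; a real subgraph would query its own store.
const orders = [
  { id: "o1", total: 42.5, buyerId: "123" },
  { id: "o2", total: 10.0, buyerId: "123" },
];

const ordersResolvers = {
  Order: {
    // Return only the key field. The gateway uses this stub to fetch the
    // remaining User fields (such as email) from Accounts.
    buyer: (order) => ({ id: order.buyerId }),
  },
  User: {
    // Orders contributes this field; `user` carries at least the key { id }.
    orders: (user) => orders.filter((o) => o.buyerId === user.id),
  },
  Query: {
    order: (_, { id }) => orders.find((o) => o.id === id) ?? null,
  },
};
```

Orders never loads user data here. It hands back a key and leaves everything else to the subgraph that owns User.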
To make that cross-service hop work, the owning subgraph implements a reference resolver. In Apollo Server, this looks like:
// accounts subgraph resolvers
const resolvers = {
  User: {
    __resolveReference(reference) {
      // reference = { id: "123" } sent by the gateway
      return getUserById(reference.id);
    },
  },
  Query: {
    user: (_, { id }) => getUserById(id),
  },
};
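The gateway batches these reference lookups into a single request to the owning subgraph, so resolving twenty orders does not produce twenty separate network calls to Accounts. The subgraph still runs the reference resolver once per reference, so implement DataLoader-style batching inside getUserById to avoid twenty separate database lookups.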
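To make the batching idea concrete, here is a simplified sketch of a DataLoader-style loader with no dependencies. It is not the DataLoader library itself (it has no caching or batch-size limits), and createBatchLoader and getUsersByIds are hypothetical names. It assumes the batch function returns results in the same order as the keys it receives.

```javascript
// Collect every key requested in the same synchronous pass, then fetch
// them all with one call to batchFn.
function createBatchLoader(batchFn) {
  let queue = [];
  return function load(key) {
    return new Promise((resolve, reject) => {
      queue.push({ key, resolve, reject });
      if (queue.length === 1) {
        // First key of this pass: schedule a single flush.
        queueMicrotask(async () => {
          const batch = queue;
          queue = [];
          try {
            const results = await batchFn(batch.map((item) => item.key));
            batch.forEach((item, i) => item.resolve(results[i]));
          } catch (err) {
            batch.forEach((item) => item.reject(err));
          }
        });
      }
    });
  };
}

// Hypothetical batch fetch: one query for many ids, results in key order.
let batchCalls = 0;
async function getUsersByIds(ids) {
  batchCalls += 1;
  return ids.map((id) => ({ id, email: `user${id}@example.com` }));
}

const getUserById = createBatchLoader(getUsersByIds);
```

With this in place, twenty calls to getUserById made while resolving one entity batch collapse into a single getUsersByIds call. In a real service I would create one loader per request so results never leak between callers.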
Composing the Supergraph
Once subgraphs publish their schemas, you compose them into a single supergraph schema that the router serves. With the Rover CLI:
# Validate that all subgraphs compose without conflicts
rover supergraph compose --config ./supergraph.yaml > supergraph.graphql
Your supergraph.yaml lists each subgraph and its routing URL:
federation_version: =2.3.0
subgraphs:
  accounts:
    routing_url: https://accounts.internal/graphql
    schema:
      subgraph_url: https://accounts.internal/graphql
  orders:
    routing_url: https://orders.internal/graphql
    schema:
      subgraph_url: https://orders.internal/graphql
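A CI gate can be a short wrapper around the same Rover command. This is a sketch under assumptions: it expects rover on the PATH, the function names are mine, and the command runner is injectable only so the logic can be exercised without Rover installed.

```javascript
const { spawnSync } = require("node:child_process");
const { writeFileSync } = require("node:fs");

// Run composition and report { ok, output }. `run` defaults to spawnSync
// and can be swapped for a fake in tests.
function composeSupergraph(configPath, run = spawnSync) {
  const result = run(
    "rover",
    ["supergraph", "compose", "--config", configPath],
    { encoding: "utf8" }
  );
  const ok = !result.error && result.status === 0;
  return {
    ok,
    output: ok ? result.stdout : result.stderr || String(result.error),
  };
}

// CI entry point: fail the pipeline on a composition error, otherwise
// write the artifact that a later step publishes to the registry.
function main() {
  const { ok, output } = composeSupergraph("./supergraph.yaml");
  if (!ok) {
    console.error(output);
    process.exit(1);
  }
  writeFileSync("supergraph.graphql", output);
}
```

A pipeline step would call main() and rely on the non-zero exit code to block the merge.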
Operational Practices for Scalable GraphQL
A federated graph is only as good as its operational discipline. The following practices separate a scalable GraphQL platform from a distributed monolith.
1. Schema checks in CI
Every subgraph change should run two checks before merge:
- Composition check: does the new schema still compose with the rest of the supergraph?
- Operation check: does the change break any client query currently in use?
Apollo's schema checks and similar tools compare proposed schemas against recorded operations from production traffic. This is how you safely evolve a graph shared by many teams.
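The idea behind an operation check fits in a few lines. This is an illustration of the concept only, not how any particular tool implements it: fields are plain "Type.field" strings, and the schema field lists and recorded operations are made-up inputs that real tools derive from parsed schemas and production traffic.

```javascript
// Report every recorded operation that reads a field the proposed
// schema no longer defines.
function findBrokenOperations(currentFields, proposedFields, recordedOperations) {
  const proposed = new Set(proposedFields);
  const removed = currentFields.filter((field) => !proposed.has(field));
  return recordedOperations
    .map((op) => ({
      name: op.name,
      brokenFields: op.fields.filter((field) => removed.includes(field)),
    }))
    .filter((op) => op.brokenFields.length > 0);
}

// Illustrative inputs: the proposal drops User.displayName.
const current = ["User.id", "User.email", "User.displayName"];
const proposed = ["User.id", "User.email"];
const operations = [
  { name: "ProfileHeader", fields: ["User.id", "User.displayName"] },
  { name: "LoginCheck", fields: ["User.id", "User.email"] },
];

// Reports ProfileHeader, which still reads User.displayName.
console.log(findBrokenOperations(current, proposed, operations));
```

The important input is the recorded traffic: a field that looks unused in the codebase may still be read by an old mobile release.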
2. Guard against the N+1 explosion
Federation makes cross-service joins easy, which makes accidental N+1 patterns easy too. Mitigate with:
- DataLoader batching inside every reference resolver.
- Query plan inspection. Apollo's router exposes the query plan; review expensive plans during code review.
- @requires and @provides directives to reduce round-trips when one subgraph already holds data another needs.
type Order @key(fields: "id") {
  id: ID!
  shippingEstimate: Float! @requires(fields: "weight zone")
  weight: Float! @external
  zone: String! @external
}
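In the subgraph that declares shippingEstimate, the resolver can read the required fields from its parent argument, because the gateway fetches them from the owning subgraph first. A minimal sketch, assuming a made-up rate table; RATE_PER_KG and its zone names are hypothetical.

```javascript
// Hypothetical rate table: cost per kilogram by shipping zone.
const RATE_PER_KG = { domestic: 1.5, international: 4.0 };

const resolvers = {
  Order: {
    // Because of @requires(fields: "weight zone"), `order` arrives with
    // weight and zone already filled in alongside the key.
    shippingEstimate(order) {
      const rate = RATE_PER_KG[order.zone];
      if (rate === undefined) {
        throw new Error(`Unknown shipping zone: ${order.zone}`);
      }
      return order.weight * rate;
    },
  },
};
```

The resolver never calls the subgraph that owns weight and zone. The cost is an extra fetch in the query plan whenever a client asks for shippingEstimate.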
3. Authorization at the right layer
Do not centralize all auth in the gateway. The gateway should authenticate the caller and forward identity (a verified JWT or signed header) to subgraphs. Each subgraph enforces field-level authorization for the data it owns. This keeps the source of truth for access control next to the data.
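Field-level enforcement inside a subgraph can be a small resolver wrapper. The sketch below assumes the router has already verified the caller and that the subgraph's context setup has turned the forwarded identity into context.user. The requireSelfOrRole helper and the "support" role are hypothetical.

```javascript
// Wrap a resolver so it only runs for the user who owns the record
// or for a caller holding the given role.
function requireSelfOrRole(role, resolve) {
  return (parent, args, context, info) => {
    const user = context.user;
    if (!user) throw new Error("Unauthenticated");
    const isSelf = user.id === parent.id;
    const hasRole = (user.roles ?? []).includes(role);
    if (!isSelf && !hasRole) throw new Error("Forbidden");
    return resolve(parent, args, context, info);
  };
}

const resolvers = {
  User: {
    // Only the user themselves or a support agent may read the email.
    email: requireSelfOrRole("support", (user) => user.email),
  },
};
```

The rule lives in the Accounts subgraph next to the data it protects, so it holds no matter which query path reaches User.email.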
4. Observability across the graph
Propagate a trace context (W3C traceparent) from the router through every subgraph call. Without distributed tracing, a slow query that fans out across four services is nearly impossible to debug. Capture per-resolver timing and per-subgraph error rates.
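In practice a tracing library such as OpenTelemetry usually handles propagation for you, but the mechanics are simple. A simplified sketch, assuming that starting a new trace with the sampled flag set is acceptable when no valid header arrives; childTraceparent is a hypothetical helper name.

```javascript
const { randomBytes } = require("node:crypto");

// W3C traceparent layout: version-traceId-parentId-flags, lowercase hex.
const TRACEPARENT = /^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$/;

// Build the header for an outgoing subgraph call: keep the trace id and
// flags from the incoming request and mint a new span id for this hop.
function childTraceparent(incoming) {
  const match = TRACEPARENT.exec(incoming ?? "");
  // Assumption: with no valid incoming header, start a new sampled trace.
  const traceId = match ? match[2] : randomBytes(16).toString("hex");
  const flags = match ? match[4] : "01";
  const spanId = randomBytes(8).toString("hex");
  return `00-${traceId}-${spanId}-${flags}`;
}
```

Every hop shares the trace id, which lets a tracing backend reassemble a query that fanned out across four services into one timeline.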
Common Pitfalls to Avoid
From building federated graphs for teams in regulated and high-throughput industries, the recurring mistakes are predictable:
- Treating the gateway as a place for business logic. Keep it thin. Logic belongs in subgraphs.
- One subgraph owning too many entities. That recreates the monolith.
- Shared entity ownership. Two subgraphs both claiming to own Product causes composition conflicts and data drift.
- Skipping client operation checks. You will break a mobile app that you forgot was querying a deprecated field.
- Synchronous composition at boot. A single slow subgraph should not prevent the router from starting on a known-good supergraph.
A Sensible Rollout Sequence
If you are migrating from a GraphQL monolith, I recommend this order:
- Stand up a router in front of your existing monolith as a single subgraph. Nothing changes for clients.
- Carve out one well-bounded domain into a second subgraph.
- Move its types out of the monolith, define @key directives, and verify composition in CI.
- Repeat domain by domain. Let the monolith shrink rather than attempting a big-bang split.
This incremental path keeps the unified graph stable for clients while teams take ownership piece by piece.
FAQ
What is the difference between schema stitching and federation?
Schema stitching merges schemas at the gateway by writing delegation logic centrally, which puts cross-service knowledge in one place. Federation inverts this: each subgraph declares its own entities and relationships using directives, and composition is declarative. Federation scales better across many teams because ownership stays distributed.
Do I need Apollo to use GraphQL federation?
No. The federation specification is open, and multiple routers and server libraries implement it across languages including Java, Go, Rust, and Python. Apollo's tooling is common, but you can mix any spec-compatible subgraph server with any compatible router.
How does federation handle authentication and authorization?
The recommended pattern is to authenticate once at the router, then forward a verified identity token to each subgraph. Each subgraph performs authorization for the fields it owns. This avoids duplicating access rules and keeps enforcement close to the data.
Will federation hurt performance compared to a single GraphQL server?
It can if you ignore N+1 patterns, but a well-designed federated graph performs comparably. Use DataLoader batching in reference resolvers, review query plans, and apply @requires/@provides to reduce cross-service calls. The router also batches entity lookups automatically.
When is federation overkill?
If you have one or two teams and a modest schema, a single GraphQL server is simpler and easier to operate. Adopt federation when independent deployability, clear domain ownership, and divergent scaling needs justify the added operational surface.