An API contract is a formal, machine-readable agreement between a service provider and its consumers that specifies exactly what requests are valid, what responses to expect, and what guarantees the API offers around versioning and compatibility. Contracts can describe synchronous HTTP/RPC APIs (OpenAPI 3.x, gRPC/Protobuf), asynchronous event-driven APIs (AsyncAPI 3.x), or data schemas flowing through pipelines (Avro, JSON Schema, Protobuf in Confluent Schema Registry).
This guide covers API contracts at intermediate depth for data engineers and backend engineers preparing for technical interviews. It targets OpenAPI 3.1, AsyncAPI 3.0, Avro 1.11, and Pact v4 — all current GA versions as of 2025.
Schema-first (also called design-first) means writing the contract document before any implementation code. The spec becomes the single source of truth: server stubs, client SDKs, mock servers, and documentation are all generated from it. The alternative — code-first — derives the spec from annotations on existing code, which often results in incomplete or inaccurate contracts.
Key artefacts produced from a schema-first workflow:
- Server stubs and client SDKs (generated, e.g., with openapi-generator)
- Mock servers that let consumers develop before the provider exists
- Reference documentation rendered directly from the spec

Schema evolution rules determine which changes are safe to deploy without coordinating a lockstep upgrade across all producers and consumers.
| Compatibility Mode | What it means | Allowed changes | Forbidden changes |
|---|---|---|---|
| Backward | New schema can read data written by the old schema | Add optional fields, remove fields with defaults | Add required fields, rename/retype fields |
| Forward | Old schema can read data written by the new schema | Remove fields, add fields with defaults | Add new required fields that old readers don't understand |
| Full | Both backward and forward at once | Add/remove optional fields with defaults only | Any breaking change |
| None | No compatibility checks enforced | Any change | — |
Confluent Schema Registry enforces compatibility modes per subject. The default is BACKWARD.
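The Backward rule from the table can be illustrated with a small pure-Python sketch of schema resolution. This is only an illustration of the rule (real Avro libraries resolve schemas internally); the helper name and sentinel convention are made up:

```python
# Toy illustration of the BACKWARD rule: a reader on the NEW schema
# fills in declared defaults for fields absent from OLD-schema data.
# (Real Avro resolution is done by the serialization library, not by hand.)

def read_with_schema(record: dict, reader_fields: dict) -> dict:
    """reader_fields maps field name -> default (use ... for 'no default')."""
    out = {}
    for name, default in reader_fields.items():
        if name in record:
            out[name] = record[name]
        elif default is not ...:
            out[name] = default  # missing in old data: apply the default
        else:
            raise ValueError(f"field '{name}' missing and has no default")
    return out

# A v1 writer produced this record (no 'venue' field yet)
old_record = {"trade_id": "t-1", "symbol": "AAPL"}

# A v2 reader schema adds optional 'venue' with a default -> BACKWARD-safe
v2_fields = {"trade_id": ..., "symbol": ..., "venue": None}
print(read_with_schema(old_record, v2_fields))
# A v2 reader adding 'venue' as REQUIRED (no default) would raise instead.
```

Adding a required field breaks this resolution step, which is exactly why it appears in the "Forbidden changes" column for Backward mode.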
A breaking change causes existing consumers to fail without code changes on their side. Common examples:
- Removing or renaming a field or endpoint
- Changing a field's type (e.g., int → string)
- Tightening a constraint (e.g., reducing maxLength, or making an optional field required)

Non-breaking changes include adding new optional fields, adding new endpoints, adding new enum values (with caution — see Gotchas), loosening constraints, and deprecating (not removing) endpoints.
There is no universally correct versioning strategy; the best choice depends on your consumer base and deployment model.
- URL path versioning (/v1/users, /v2/users) — explicit, easy to cache, easy to route; leads to duplication if not managed well.
- Header versioning (Accept: application/vnd.myapi.v2+json) — clean URLs; harder to test in a browser or curl.
- Query parameter versioning (?version=2) — simple but semantically odd for REST (version is not a resource filter).

In consumer-driven contract testing (CDCT), the consumer defines the subset of the provider's API it actually uses, and that subset becomes a binding contract the provider must not break. Pact is the dominant open-source framework for CDCT.
Workflow:
1. The consumer writes tests against a Pact mock provider, which records each interaction into a .json pact file describing interactions.
2. The pact file is published to a Pact Broker (or shared as a build artifact).
3. The provider replays the pact's interactions against its real implementation in CI (provider verification).
4. Verification results are reported back to the broker, enabling can-i-deploy checks.

A Schema Registry is a centralised store for schemas referenced by Kafka messages (and other event streams). Producers serialise messages with an Avro/Protobuf/JSON Schema and register the schema; consumers look up the schema by ID embedded in the message wire format. Confluent Schema Registry and AWS Glue Schema Registry are the two dominant implementations. Benefits: schema enforcement at produce-time, evolution compatibility checks, and a single source of truth for data contracts in pipelines.
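The "schema ID embedded in the message wire format" is concrete: Confluent's wire format prefixes each message with a magic byte (0x00) and a 4-byte big-endian schema ID, followed by the encoded payload. A minimal framing sketch:

```python
import struct

# Confluent wire format: 1 magic byte (0x00) + 4-byte big-endian schema ID,
# then the Avro-encoded payload. Consumers read the ID and fetch the schema
# from the registry before decoding the payload.

def frame(schema_id: int, payload: bytes) -> bytes:
    """Prefix an encoded payload with the Confluent wire-format header."""
    return struct.pack(">bI", 0, schema_id) + payload

def unframe(message: bytes) -> tuple[int, bytes]:
    """Split a framed message into (schema_id, payload)."""
    magic, schema_id = struct.unpack(">bI", message[:5])
    assert magic == 0, "not Confluent wire format"
    return schema_id, message[5:]

msg = frame(42, b"avro-bytes")
print(unframe(msg))  # (42, b'avro-bytes')
```

This is why Avro messages on Kafka stay compact: the full schema lives in the registry once, and each message carries only a 5-byte header.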
A large e-commerce platform decomposes into Order, Inventory, Payment, and Notification services. Each service publishes its REST API as an OpenAPI 3.1 spec stored in a central spec repository. A CI gate runs oasdiff breaking on every PR to prevent accidental breaking changes. The Notification service (consumer) publishes Pact files, and provider verification runs in the Order service's pipeline, so the Order service can confidently add new webhooks without manual coordination calls.
A trading firm streams trade execution events into Kafka topics consumed by risk, compliance, and analytics systems — each owned by a different team. All topics are registered in Confluent Schema Registry with FULL compatibility. A data contract YAML file (following the Open Data Contract Standard) is co-located with each Avro schema, documenting SLAs, data quality rules, and owner contact. Schema changes require an automated PR review from all registered consumers before merging.
A platform team exposes infrastructure APIs (provision a database, create a service account) via an internal API gateway. APIs are spec-first OpenAPI documents, used to generate Terraform provider schemas and CLI tool completions. Contract tests run in a staging environment after every deployment, using Dredd to replay the spec's example requests against the live service and assert expected responses.
A data team ingests clickstream events from a mobile app. The mobile app's schema evolves frequently. By registering the event schema in AWS Glue Schema Registry with BACKWARD_ALL compatibility, the Glue ETL job reading from Kinesis can handle old messages still in the stream while the mobile app ships a new schema version. Downstream Redshift tables are auto-updated via schema evolution in the Glue job.
```yaml
# openapi: 3.1.0 — uses JSON Schema 2020-12 dialect
openapi: "3.1.0"
info:
  title: Order Service API
  version: "2.1.0"
  description: Manages customer orders. Breaking changes follow SemVer.
paths:
  /v2/orders:
    post:
      operationId: createOrder
      summary: Create a new order
      requestBody:
        required: true
        content:
          application/json:
            schema:
              $ref: "#/components/schemas/CreateOrderRequest"
      responses:
        "201":
          description: Order created
          content:
            application/json:
              schema:
                $ref: "#/components/schemas/Order"
        "422":
          description: Validation error
components:
  schemas:
    CreateOrderRequest:
      type: object
      required: [customer_id, items]
      properties:
        customer_id: { type: string, format: uuid }
        items:
          type: array
          minItems: 1
          items: { $ref: "#/components/schemas/OrderItem" }
        coupon_code: { type: ["string", "null"] } # optional; 3.1 uses type unions, not `nullable`
    OrderItem:
      type: object
      required: [sku, quantity]
      properties:
        sku: { type: string }
        quantity: { type: integer, minimum: 1 }
```
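To make the contract's validation rules concrete, here is a toy, hand-rolled check of CreateOrderRequest's required/minItems/minimum constraints. This is a sketch only; real services use generated validators or a JSON Schema library rather than code like this:

```python
# Toy request validation mirroring CreateOrderRequest's rules:
# required [customer_id, items], items minItems: 1, quantity minimum: 1.

def validate_create_order(body: dict) -> list[str]:
    """Return a list of validation errors (empty list means valid)."""
    errors = []
    for field in ("customer_id", "items"):
        if field not in body:
            errors.append(f"missing required field: {field}")
    items = body.get("items", [])
    if "items" in body and len(items) < 1:
        errors.append("items: minItems is 1")
    for i, item in enumerate(items):
        for field in ("sku", "quantity"):
            if field not in item:
                errors.append(f"items[{i}]: missing required field: {field}")
        if item.get("quantity", 1) < 1:
            errors.append(f"items[{i}]: quantity minimum is 1")
    return errors

ok = {"customer_id": "a-customer-uuid", "items": [{"sku": "SKU-1", "quantity": 2}]}
bad = {"items": []}
print(validate_create_order(ok))   # []
print(validate_create_order(bad))  # two errors: missing customer_id, minItems
```

The point of schema-first is that this logic never has to be written by hand: it is derived mechanically from the spec, so provider and consumers cannot drift apart.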
```json
// v1 — original schema registered in Schema Registry
{
  "type": "record",
  "name": "TradeEvent",
  "namespace": "com.example.trading",
  "fields": [
    { "name": "trade_id", "type": "string" },
    { "name": "symbol", "type": "string" },
    { "name": "quantity", "type": "int" },
    { "name": "price", "type": "double" },
    { "name": "timestamp_ms", "type": "long" }
  ]
}
```

```json
// v2 — adds optional "venue" field with a default: BACKWARD compatible
{
  "type": "record",
  "name": "TradeEvent",
  "namespace": "com.example.trading",
  "fields": [
    { "name": "trade_id", "type": "string" },
    { "name": "symbol", "type": "string" },
    { "name": "quantity", "type": "int" },
    { "name": "price", "type": "double" },
    { "name": "timestamp_ms", "type": "long" },
    {
      "name": "venue",
      "type": ["null", "string"],  // union: null | string
      "default": null              // default must match the first union branch (null)
    }
  ]
}
```
```python
import pytest
import requests
from pact import Consumer, Provider

# Consumer: Notification service expects Order service to return this shape
pact = Consumer("notification-service").has_pact_with(
    Provider("order-service"),
    pact_dir="./pacts",
)

@pytest.fixture(scope="module", autouse=True)
def pact_mock_server():
    pact.start_service()   # spin up the Pact mock provider
    yield
    pact.stop_service()

def test_get_order_returns_expected_shape():
    expected_body = {
        "order_id": "abc-123",
        "status": "confirmed",
        "customer_id": "cust-456",
    }
    (
        pact
        .given("order abc-123 exists and is confirmed")
        .upon_receiving("a request for order abc-123")
        .with_request(method="GET", path="/v2/orders/abc-123")
        .will_respond_with(200, body=expected_body)
    )
    with pact:  # verifies the registered interaction on exit
        resp = requests.get(f"{pact.uri}/v2/orders/abc-123")
        assert resp.json()["status"] == "confirmed"

# Pact writes ./pacts/notification-service-order-service.json
```
```python
from confluent_kafka.schema_registry import Schema, SchemaRegistryClient

client = SchemaRegistryClient({"url": "http://schema-registry:8081"})

# Register v1 schema under subject "trades-value"
schema_v1 = Schema(
    schema_str=open("trade_event_v1.avsc").read(),
    schema_type="AVRO",
)
schema_id = client.register_schema("trades-value", schema_v1)
print(f"Registered schema ID: {schema_id}")

# Set subject-level compatibility to FULL before registering v2
client.set_compatibility("trades-value", level="FULL")

schema_v2 = Schema(
    schema_str=open("trade_event_v2.avsc").read(),
    schema_type="AVRO",
)

# Dry-run: check compatibility before attempting to register
is_compatible = client.test_compatibility("trades-value", schema_v2)
print(f"Will v2 pass? {is_compatible}")

# Register v2 — Schema Registry rejects it if it violates FULL compatibility
schema_id_v2 = client.register_schema("trades-value", schema_v2)
print(f"Registered v2 schema ID: {schema_id_v2}")
```
```yaml
# .github/workflows/api-contract-check.yml
name: API Contract Check
on: [pull_request]
jobs:
  breaking-changes:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with: { fetch-depth: 0 }
      - name: Install oasdiff
        run: |
          curl -sSfL https://raw.githubusercontent.com/tufin/oasdiff/main/install.sh | sh
      - name: Check for breaking changes
        run: |
          # Materialise the base-branch spec, then compare it to the PR branch spec
          git show origin/main:openapi/order-service.yaml > /tmp/base.yaml
          oasdiff breaking \
            /tmp/base.yaml \
            openapi/order-service.yaml \
            --fail-on ERR  # exit non-zero on any breaking change
```
| Technology | Best For | Schema Format | Sync/Async | Code Generation | Streaming Support |
|---|---|---|---|---|---|
| OpenAPI 3.1 | REST HTTP APIs, public/partner APIs | JSON Schema 2020-12 | Sync (request/response) | Excellent (50+ languages via openapi-generator) | No (use AsyncAPI for events) |
| gRPC / Protobuf | Internal microservices, low-latency RPC, mobile backends | .proto files | Sync + streaming (HTTP/2) | Excellent (native) | Server-side, client-side, bidirectional streaming |
| AsyncAPI 3.0 | Event-driven APIs, Kafka topics, WebSocket channels, MQTT | JSON Schema / Avro / Protobuf | Async (pub/sub) | Good (asyncapi-generator) | Native — designed for events |
| Avro + Schema Registry | Kafka data pipelines, schema evolution in streams | .avsc JSON | Async (embedded in messages) | Moderate (avro-tools, fastavro) | Excellent — compact binary, schema embedded by ID |
| GraphQL | Client-driven queries, BFF layer, product APIs | SDL (Schema Definition Language) | Sync (+ subscriptions for events) | Good (graphql-codegen) | Subscriptions only |
| JSON Schema | Document validation, config validation, webhook payloads | JSON Schema | Any (schema only) | Moderate | No native transport |
Adding an optional field to a response is non-breaking per the spec, but strict consumers that reject unknown properties (e.g., Pydantic models with model_config = ConfigDict(extra="forbid")) will fail on it. Always check consumer-side strictness. Similarly, adding a new enum value is technically non-breaking in the spec but will crash deserializers that use exhaustive switch/match statements.
In an Avro union such as "type": ["null", "string"], the default must be null (the first type). Declaring "default": "" causes a schema parse error that can be silent until runtime. Always put null first in nullable unions in Avro schemas.
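This rule is easy to lint for in CI. Below is a minimal sketch of such a check (a hypothetical helper, covering only a few primitive first-branch types):

```python
# Toy lint for the Avro union-default rule: a field's default must be
# valid for the FIRST branch of its union type. Hypothetical helper,
# not part of any Avro library.

FIRST_BRANCH_CHECK = {
    "null": lambda v: v is None,
    "string": lambda v: isinstance(v, str),
    "int": lambda v: isinstance(v, int),
    "long": lambda v: isinstance(v, int),
    "double": lambda v: isinstance(v, (int, float)),
}

def lint_union_defaults(schema: dict) -> list[str]:
    """Return a problem message per union field whose default mismatches."""
    problems = []
    for field in schema.get("fields", []):
        if isinstance(field["type"], list) and "default" in field:
            first = field["type"][0]
            check = FIRST_BRANCH_CHECK.get(first)
            if check and not check(field["default"]):
                problems.append(
                    f"{field['name']}: default {field['default']!r} "
                    f"is not valid for first union branch '{first}'"
                )
    return problems

good = {"fields": [{"name": "venue", "type": ["null", "string"], "default": None}]}
bad = {"fields": [{"name": "venue", "type": ["null", "string"], "default": ""}]}
print(lint_union_defaults(good))  # []
print(lint_union_defaults(bad))   # one problem reported
```

Running a check like this at PR time surfaces the mistake immediately instead of at deserialization time in production.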
Take the TradeEvent v1 schema from Example 2 and register it in a local Confluent Schema Registry (Docker Compose). Then attempt to register a v2 schema that renames price to trade_price. Observe the compatibility error. Fix it by adding trade_price as a new optional field alongside the original and verify v2 registers successfully under BACKWARD mode.
A consumer calls GET /v2/orders?page=1&limit=10 and expects a response with fields items (array), total (integer), and next_cursor (nullable string). Write the Pact interaction, run it to generate a pact file, then implement a minimal Flask provider that satisfies the pact and run provider verification.
Author an OpenAPI 3.1 spec for a User resource with a GET /users/{id} endpoint. Commit it to a git repo. Then create a branch that removes the email field from the response schema. Add an oasdiff breaking step to a GitHub Actions (or local shell) workflow that compares main vs the branch and fails the build. Confirm the gate triggers, then make a non-breaking change (add optional phone field) and confirm the gate passes.
Backward compatibility means the new schema can read data written by the old schema — consumers can upgrade independently. Forward compatibility means the old schema can read data written by the new schema — producers can upgrade independently. Full compatibility requires both. In practice: adding a new field with a default satisfies Backward; removing a field satisfies Forward; adding an optional field with a default satisfies Full.
Avro's specification requires that the default value for a union field must be valid for the first type listed in the union. So ["null", "string"] requires "default": null, while ["string", "null"] would require a string default. This is a common gotcha: reversing the union order to get a non-null default can accidentally break backward compatibility, since old readers won't know about the null variant if it's not first.
The Pact triangle refers to the three-way relationship between consumer, provider, and the Pact Broker. Consumer tests generate pact files; the broker stores them; provider verification runs against the broker. The key operational benefit is the "can-i-deploy" query: before deploying service X to production, the CI pipeline asks the broker whether the currently deployed consumers are compatible with X's latest pact verification results. This lets teams deploy independently without manual coordination, as long as all pacts are verified.
Many deserializers and code generators produce exhaustive switch/match/when statements over enum values. When a producer sends a new enum value that the consumer's generated code doesn't know about, the consumer throws an unhandled case exception. In Protobuf this is handled gracefully (unknown values are preserved as integers), but in Avro, JSON Schema, and OpenAPI-generated clients, new enum values can cause hard failures. The safe pattern is to always include an UNKNOWN or catch-all case in consumer logic.
Prefer gRPC/Protobuf when: (1) performance is critical — Protobuf binary encoding is 3–10× smaller than JSON and faster to serialize; (2) you need bidirectional streaming (not possible with REST); (3) the API is internal-only and all consumers can handle the binary protocol; (4) you want strong typing enforced at compile time via generated stubs. Prefer OpenAPI when: the API is public or partner-facing (HTTP+JSON is universally accessible), you need browser or curl accessibility, or your consumers can't run a gRPC stack.