Building a Monorepo CI Pipeline That Doesn't Fall Apart at Scale
Monorepo CI should only build what changed. Learn affected-service detection with git diff, Nx, Turborepo, and Bazel, plus remote caching, shared library versioning, and practical GitHub Actions configurations.

The Monorepo Tax on CI
A monorepo sounds great until your CI pipeline takes 45 minutes to run on every push because it's rebuilding every service, even when you only changed a README. Monorepo CI pipelines need to be smart about what to build, test, and deploy. The key challenge is affected-service detection -- figuring out which packages actually changed and only running their pipelines.
This isn't a solved problem with a single tool. It's a spectrum of approaches from simple git-diff scripts to sophisticated build systems like Bazel that model your entire dependency graph. The right choice depends on your repo size, team count, and how much you're willing to invest in build infrastructure.
What Is Affected-Service Detection?
Definition: Affected-service detection is the process of determining which packages, services, or applications in a monorepo need to be rebuilt and tested based on the files that changed in a given commit or pull request. It prevents unnecessary CI work by only running pipelines for code that was actually modified or depends on modified code.
Approaches to Affected Detection
Here are the major strategies, from simplest to most sophisticated:
| Approach | Complexity | Accuracy | Best For |
|---|---|---|---|
| Path filters (CI-native) | Low | Low -- misses cross-package deps | Simple repos with independent services |
| Git diff + scripts | Medium | Medium -- manual dep mapping | Small monorepos (<20 packages) |
| Nx affected | Medium | High -- understands project graph | JS/TS monorepos |
| Turborepo | Low-Medium | High -- hash-based caching | JS/TS monorepos wanting simplicity |
| Bazel | Very High | Perfect -- hermetic builds | Large polyglot monorepos (100+ packages) |
Level 1: Path Filters in GitHub Actions
The simplest approach -- use paths filters to only trigger workflows when relevant files change. This is built into GitHub Actions and requires zero additional tooling.
```yaml
# .github/workflows/api.yml
name: API CI
on:
  push:
    branches: [main]
    paths:
      - 'packages/api/**'
      - 'packages/shared-utils/**' # API depends on this
      - 'package.json'
      - 'pnpm-lock.yaml'
  pull_request:
    paths:
      - 'packages/api/**'
      - 'packages/shared-utils/**'
      - 'package.json'
      - 'pnpm-lock.yaml'
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: pnpm/action-setup@v4
      - run: pnpm install --frozen-lockfile
      - run: pnpm --filter api test
```
This works for repos where services have clearly separated directories and minimal shared dependencies. It falls apart when you have a shared library used by 15 services -- you'd need to list every service's workflow in the shared library's path filter, and every new dependency relationship requires a YAML update.
Level 2: Git Diff With Dependency Mapping
A step up: write a script that checks git diff against a dependency map to determine which services are affected.
```bash
#!/bin/bash
# scripts/affected.sh -- determine affected services from git diff
BASE_REF=${1:-"origin/main"}
CHANGED_FILES=$(git diff --name-only "$BASE_REF"...HEAD)

# Map directories to the services that depend on them
declare -A DEPS
DEPS[packages/shared-utils]="api web worker"
DEPS[packages/api]="api"
DEPS[packages/web]="web"
DEPS[packages/worker]="worker"
DEPS[packages/db]="api worker"

AFFECTED=()
for dir in "${!DEPS[@]}"; do
  if echo "$CHANGED_FILES" | grep -q "^$dir/"; then
    for svc in ${DEPS[$dir]}; do
      AFFECTED+=("$svc")
    done
  fi
done

# Deduplicate and output
echo "${AFFECTED[@]}" | tr ' ' '\n' | sort -u
```
```yaml
# Use in GitHub Actions
jobs:
  detect:
    runs-on: ubuntu-latest
    outputs:
      affected: ${{ steps.affected.outputs.services }}
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0 # need full history for git diff
      - id: affected
        run: |
          # flatten to one line: multi-line values break the key=value form of $GITHUB_OUTPUT
          SERVICES=$(bash scripts/affected.sh | tr '\n' ' ')
          echo "services=$SERVICES" >> "$GITHUB_OUTPUT"
  test-api:
    needs: detect
    if: contains(needs.detect.outputs.affected, 'api')
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: pnpm --filter api test
```
Watch out: Always use `fetch-depth: 0` when running git diff in CI. The default shallow clone only fetches a single commit, so `git diff origin/main...HEAD` will fail or produce incorrect results. For very large repos where full clones are slow, use `fetch-depth: 100` or similar to get enough history.
Level 3: Nx Affected
Nx understands your project's dependency graph by analyzing imports, package.json dependencies, and its own project configuration. The nx affected command computes which projects are impacted by changes since a base commit.
```bash
# List which projects are affected
npx nx show projects --affected --base=origin/main

# Run tests only for affected projects
npx nx affected -t test --base=origin/main --parallel=5

# Run lint + test + build for affected projects
npx nx affected -t lint test build --base=origin/main
```
```yaml
# .github/workflows/ci.yml with Nx
name: CI
on:
  push:
    branches: [main]
  pull_request:
jobs:
  main:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0
      - uses: nrwl/nx-set-shas@v4 # sets NX_BASE and NX_HEAD
      - uses: pnpm/action-setup@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 22
          cache: 'pnpm'
      - run: pnpm install --frozen-lockfile
      - run: npx nx affected -t lint test build --parallel=3
```
Nx's project graph is the critical advantage. If you change shared-utils, Nx knows that api, web, and worker import from it and will test all three -- without you maintaining a manual dependency map.
Level 4: Turborepo and Remote Caching
Turborepo takes a different approach. Instead of computing what's "affected," it hashes each task's inputs (source files, dependencies, environment variables) and caches the outputs. If the hash matches a previous run, it skips the task entirely and replays the cached output.
```jsonc
// turbo.json
{
  "tasks": {
    "build": {
      "dependsOn": ["^build"],
      "inputs": ["src/**", "tsconfig.json", "package.json"],
      "outputs": ["dist/**"]
    },
    "test": {
      "dependsOn": ["build"],
      "inputs": ["src/**", "test/**", "tsconfig.json"],
      "outputs": ["coverage/**"]
    },
    "lint": {
      "inputs": ["src/**", ".eslintrc.*"]
    }
  }
}
```
```bash
# Run tasks with caching
turbo run build test lint

# Output shows cache hits
#   api:build -- cache hit, replaying logs
#   web:build -- cache miss, running
#   shared-utils:test -- cache hit, replaying logs
```
Remote caching is where Turborepo really shines. Local caching only helps when you re-run the same task on the same machine. Remote caching (via Vercel or a self-hosted server) shares the cache across all developers and CI runners. If another developer already built shared-utils with the same inputs, your CI run reuses their cached output.
```yaml
# GitHub Actions with Turborepo remote cache
jobs:
  ci:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: pnpm/action-setup@v4
      - run: pnpm install --frozen-lockfile
      - run: pnpm turbo run build test lint
        env:
          TURBO_TOKEN: ${{ secrets.TURBO_TOKEN }}
          TURBO_TEAM: my-team
```
Level 5: Bazel for Large-Scale Monorepos
Bazel is the nuclear option. It models every source file, dependency, and build action in a directed acyclic graph with hermetic (fully reproducible) builds. It caches at the action level -- not the package level -- so even a single file change only rebuilds the actions that depend on that file.
Bazel is the right choice when you have hundreds of packages across multiple languages (Go, Java, TypeScript, Python) and need build times that scale sublinearly with repo size. Google, Stripe, and Uber use Bazel for this reason. But the learning curve is steep, BUILD files are verbose, and integrating third-party dependencies is more work than with language-native package managers.
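To make that concrete, here is a minimal sketch of what a Bazel target could look like for a hypothetical Go service. The load path, target names, and deps below are illustrative assumptions (load paths in particular vary between classic WORKSPACE setups and bzlmod), not from a real repo:

```starlark
# packages/api/BUILD.bazel -- illustrative sketch; names and load path are assumptions
load("@io_bazel_rules_go//go:def.bzl", "go_library", "go_test")

go_library(
    name = "api_lib",
    srcs = glob(["*.go"]),
    importpath = "example.com/monorepo/packages/api",
    # An explicit dependency edge: Bazel uses these edges to know exactly
    # which targets a change to shared-utils invalidates.
    deps = ["//packages/shared-utils:shared_lib"],
)

go_test(
    name = "api_test",
    srcs = glob(["*_test.go"]),
    embed = [":api_lib"],
)
```

With edges like these declared, `bazel query "rdeps(//..., //packages/shared-utils:shared_lib)"` lists every target that transitively depends on the shared library -- affected detection falls out of the graph rather than being bolted on.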
| Tool | Learning Curve | Language Support | Cache Granularity | Cost |
|---|---|---|---|---|
| Nx | Low-Medium | JS/TS primary, plugins for others | Project-level | Free (OSS) / Nx Cloud from $0 |
| Turborepo | Low | JS/TS | Task-level (hash-based) | Free (OSS) / Vercel remote cache |
| Bazel | Very High | Any language | Action-level (finest) | Free (OSS) / remote exec costs |
| Pants | High | Python, Go, Java, Shell | Action-level | Free (OSS) |
Shared Library Versioning Strategies
Shared libraries in a monorepo present a versioning challenge. When shared-utils changes, do you version it independently? Do consuming packages pin to a specific version or always use the latest?
Three Approaches
- Workspace protocol (no versioning) -- all packages use the latest code via `"shared-utils": "workspace:*"`. Changes are immediately visible to all consumers. This is the simplest approach and works well when all packages are deployed together.
- Internal versioning -- shared libraries follow semver and consumers pin to a version range. Changes require a version bump and consumers update on their own schedule. More isolation, more coordination overhead.
- Changesets -- automated versioning based on changeset files. Developers describe their changes, and the tool bumps versions and generates changelogs automatically. Best balance of automation and control.
```bash
# Using changesets in a monorepo
npx changeset add
# Prompts: which packages changed? major/minor/patch? describe the change.
npx changeset version
# Bumps versions based on accumulated changesets
npx changeset publish
# Publishes changed packages to npm (if applicable)
```
Pro tip: For internal monorepos where all packages deploy together, skip independent versioning entirely. Use `workspace:*` references and deploy all affected services on every merge to main. Independent versioning only pays for itself when different teams own different services and need to control their own upgrade schedule.
Optimizing CI Run Time
Beyond affected detection, these techniques keep monorepo CI fast:
Parallelization
Run independent tasks concurrently. Both Nx and Turborepo support --parallel to run tasks across packages simultaneously. In GitHub Actions, use a matrix strategy to run service tests as parallel jobs.
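A sketch of the matrix pattern, assuming a `detect` job (like the one in Level 2) whose `affected` output is a JSON array such as `["api","worker"]` -- that output format is an assumption, so adapt the detection script accordingly:

```yaml
jobs:
  test:
    needs: detect
    strategy:
      fail-fast: false
      matrix:
        # hypothetical: detect job emits affected services as a JSON array
        service: ${{ fromJson(needs.detect.outputs.affected) }}
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: pnpm --filter ${{ matrix.service }} test
```

Each affected service gets its own runner, so total wall-clock time is roughly that of the slowest service rather than the sum of all of them.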
Dependency Caching
Cache node_modules, Go modules, Python venv -- whatever your package manager downloads. The actions/cache action with a lockfile hash key prevents redundant downloads.
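A sketch using `actions/cache` keyed on the lockfile hash -- the store path shown is pnpm's default on Linux (an assumption worth verifying for your setup; `actions/setup-node` with `cache: 'pnpm'` achieves the same thing with less YAML):

```yaml
- uses: actions/cache@v4
  with:
    path: ~/.local/share/pnpm/store  # pnpm's default store location on Linux
    key: pnpm-${{ runner.os }}-${{ hashFiles('pnpm-lock.yaml') }}
    restore-keys: |
      pnpm-${{ runner.os }}-
- run: pnpm install --frozen-lockfile
```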
Docker Layer Caching
If your CI builds Docker images, cache the layers. Docker BuildKit's inline cache or GitHub Actions' docker/build-push-action with cache-from/cache-to prevents rebuilding unchanged layers.
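For example, with `docker/build-push-action` and the GitHub Actions cache backend (`type=gha`); the context path and image tag are placeholders:

```yaml
- uses: docker/setup-buildx-action@v3
- uses: docker/build-push-action@v6
  with:
    context: packages/api                  # placeholder
    tags: registry.example.com/api:latest  # placeholder
    push: false
    cache-from: type=gha
    cache-to: type=gha,mode=max  # mode=max also caches intermediate layers
```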
Selective Checkout
For very large repos, use sparse checkout to only pull the directories your CI job needs:
```yaml
- uses: actions/checkout@v4
  with:
    sparse-checkout: |
      packages/api
      packages/shared-utils
      package.json
      pnpm-lock.yaml
```
Frequently Asked Questions
What is a monorepo CI pipeline?
A monorepo CI pipeline is a continuous integration setup designed for repositories containing multiple packages, services, or applications. Unlike single-project CI, it must determine which parts of the repo changed, only build and test affected packages, and manage shared dependencies between them. The goal is to avoid rebuilding everything on every commit.
How does Nx affected work?
Nx builds a project dependency graph by analyzing imports and package.json dependencies across your monorepo. When you run nx affected, it compares the current branch against a base commit, identifies changed files, maps those files to projects, then traverses the dependency graph to find all projects that depend on the changed ones. Only those projects are built and tested.
What is the difference between Nx and Turborepo?
Nx focuses on understanding your project graph and computing affected projects explicitly. Turborepo focuses on hash-based caching -- it hashes task inputs and replays cached outputs when the hash matches. Nx has a richer plugin ecosystem and more features (code generation, project graph visualization). Turborepo is simpler to set up and has tight Vercel integration for remote caching.
When should I use Bazel instead of Nx or Turborepo?
Use Bazel when your monorepo spans multiple programming languages (Go, Java, TypeScript, Python), has hundreds of packages, and you need action-level caching granularity for fast builds at scale. Bazel's learning curve and maintenance overhead are significant, so it's typically justified only at Google-scale repos or when language-specific tools can't keep up with your build times.
How do I handle shared library changes in a monorepo?
Three strategies: use workspace protocol references (workspace:*) for all packages to always use the latest code, use independent semver versioning with pinned ranges for isolation, or use Changesets for automated version management. For internal repos with coordinated deployments, workspace protocol is simplest. For repos where teams deploy independently, Changesets provides the best balance.
Does fetch-depth matter for monorepo CI?
Yes, critically. Affected detection tools use git diff to compare your branch against main. The default shallow clone (fetch-depth: 1) only includes the latest commit, so the diff comparison fails. Use fetch-depth: 0 for full history, or a value like 100 for large repos where full clones are slow. The nrwl/nx-set-shas action handles this automatically for Nx.
How do I keep monorepo CI under 10 minutes?
Combine four techniques: affected detection (only run what changed), aggressive caching (dependencies, build outputs, Docker layers), parallelization (concurrent tasks and matrix jobs), and remote caching (share build outputs across CI runs and developers). Most teams can get from 30+ minutes to under 10 by implementing affected detection and remote caching alone.
Conclusion
Monorepo CI isn't hard -- it just requires being intentional about what runs and when. Start with path filters if your services are independent. Move to Nx or Turborepo when shared dependencies make path filters unmaintainable. Save Bazel for when you genuinely need polyglot, action-level caching at massive scale.
The biggest mistake is accepting slow CI as the cost of a monorepo. A well-configured pipeline should only build and test what changed, cache everything else, and finish in the same time as a single-project pipeline. If your monorepo CI takes longer than 10 minutes on a typical PR, there's optimization left on the table.
Written by
Abhishek Patel
Infrastructure engineer with 10+ years building production systems on AWS, GCP, and bare metal. Writes practical guides on cloud architecture, containers, networking, and Linux for developers who want to understand how things actually work under the hood.