
CI/CD: Delivery Engineering

Pipeline design, build-test-scan-artifact, secrets management, deployment strategies, and failure handling


CI vs CD

|              | Continuous Integration (CI)        | Continuous Delivery (CD)           | Continuous Deployment                    |
|--------------|------------------------------------|------------------------------------|------------------------------------------|
| What         | Merge + build + test automatically | Also packages + deploys to staging | Also deploys to production automatically |
| Goal         | Catch integration bugs fast        | Always have a deployable artifact  | Ship to users on every commit            |
| Human gate   | After tests                        | Before production                  | None                                     |
| Required for | All teams                          | Most product teams                 | High-maturity teams with great tests     |

The confusion: "CI/CD" usually means CI + Continuous Delivery. True Continuous Deployment to production is rare and requires very mature test coverage.


Pipeline Design Principles

  1. Fast feedback: developers should know if something broke in minutes, not hours
  2. Fail fast: run the fastest checks first (linting before integration tests)
  3. Reproducible: same input always produces same output
  4. Idempotent: running the pipeline twice doesn’t cause problems
  5. Observable: logs, artifacts, and metrics at every stage
  6. Secure: secrets never in logs, minimal permissions per stage
Commit → [Lint] → [Unit Test] → [Build] → [Security Scan] → [Integration Test] → [Push Artifact] → [Deploy Staging] → [Deploy Production]
↑ fast ↑ parallel possible ↑ gate here ↑ ↑ manual approval?
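
The ordering above can be sketched as a small shell runner that executes stages cheapest-first and stops at the first failure. This is a minimal sketch, not a replacement for a CI system; the `stage_*` function names are hypothetical placeholders for real lint, test, and build commands.

```shell
# Minimal fail-fast runner. Each stage is a shell function named stage_<name>
# (hypothetical names; define them to wrap your real commands).
run_stages() {
  for stage in "$@"; do
    echo "==> $stage"
    if ! "stage_$stage"; then
      echo "FAILED at stage: $stage" >&2
      return 1   # stop here: later, slower stages never run
    fi
  done
  echo "all stages passed"
}

# Placeholder stages so the sketch runs as-is; replace the bodies.
stage_lint() { true; }        # e.g. npm run lint
stage_unit_test() { true; }   # e.g. npm test
stage_build() { true; }       # e.g. docker build ...
```

Calling `run_stages lint unit_test build` runs the three stages in order; if `stage_unit_test` fails, `stage_build` is never started.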

Pipeline as Code

Everything in version control. No clicking in UIs to configure pipelines.

# GitHub Actions example
name: CI/CD Pipeline
on:
  push:
    branches: [main]
  pull_request:
    branches: [main]
jobs:
  lint:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: '20'
          cache: 'npm'
      - run: npm ci
      - run: npm run lint
      - run: npm run typecheck
  test:
    runs-on: ubuntu-latest
    needs: lint # only run if lint passes
    services:
      postgres:
        image: postgres:16
        env:
          POSTGRES_PASSWORD: test
          POSTGRES_DB: test # create the database that DATABASE_URL points at
        ports:
          - 5432:5432 # expose the service to the runner on localhost
        options: >-
          --health-cmd pg_isready
          --health-interval 10s
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: '20'
          cache: 'npm'
      - run: npm ci
      - run: npm test
        env:
          DATABASE_URL: postgres://postgres:test@localhost:5432/test
  build-and-push:
    runs-on: ubuntu-latest
    needs: test
    if: github.ref == 'refs/heads/main' # only on main branch
    permissions:
      id-token: write # for OIDC auth to AWS
      contents: read
    steps:
      - uses: actions/checkout@v4
      - name: Configure AWS credentials
        uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: arn:aws:iam::123456789012:role/github-actions-role
          aws-region: us-east-1
      - name: Log in to Amazon ECR # required before docker push
        uses: aws-actions/amazon-ecr-login@v2
      - name: Build and push to ECR
        run: |
          # Tag the image with the full registry path so the push target exists locally
          IMAGE="123456789012.dkr.ecr.us-east-1.amazonaws.com/myapp:${{ github.sha }}"
          docker build -t "$IMAGE" .
          docker push "$IMAGE"

Build → Test → Scan → Artifact

Build

# Reproducible builds: always specify exact versions
- name: Build Docker image
  run: |
    docker build \
      --build-arg BUILD_DATE="$(date -u +%Y-%m-%dT%H:%M:%SZ)" \
      --build-arg GIT_SHA=${{ github.sha }} \
      --build-arg VERSION=${{ github.ref_name }} \
      -t myapp:${{ github.sha }} \
      .
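
One way to enforce "always specify exact versions" is to reject a Dockerfile whose base image is unpinned before building it. The following is a simplified sketch: `check_pinned_bases` is a hypothetical helper, and it does not resolve `ARG` substitutions in `FROM` lines.

```shell
# Print every unpinned base image in a Dockerfile; return 1 if any is found.
# "Unpinned" here means: no tag, or the :latest tag, and no @sha256 digest.
check_pinned_bases() {
  awk '
    BEGIN { bad = 0 }
    toupper($1) == "FROM" {
      i = 2
      while ($i ~ /^--/) i++                 # skip flags such as --platform=...
      image = $i
      alias = (toupper($(i + 1)) == "AS") ? $(i + 2) : ""
      # scratch, earlier build stages, and digests count as pinned
      ok = (image == "scratch" || (image in stages) || image ~ /@sha256:/)
      if (!ok) {
        n = split(image, parts, "/")         # ignore ":" in registry host:port
        last = parts[n]
        if (last !~ /:/ || last ~ /:latest$/) { print image; bad = 1 }
      }
      if (alias != "") stages[alias] = 1
    }
    END { exit bad }
  ' "$1"
}
```

Running `check_pinned_bases Dockerfile` as an early pipeline step fails the build before any image is produced from a floating base.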

Test

# Run tests with coverage
- name: Run tests
  run: npm test -- --coverage --ci
- name: Upload coverage
  uses: codecov/codecov-action@v4

Test pyramid: More unit tests (fast, cheap) → fewer integration tests → even fewer E2E tests (slow, expensive).

Scan

# Security scanning: multiple layers
- name: Scan dependencies for vulnerabilities
  run: npm audit --audit-level=high
- name: Scan Docker image
  uses: aquasecurity/trivy-action@master # pin to a release tag or commit SHA in real pipelines
  with:
    image-ref: myapp:${{ github.sha }}
    severity: 'CRITICAL,HIGH'
    exit-code: '1' # fail pipeline if found
- name: Scan for secrets in code
  uses: trufflesecurity/trufflehog@main # likewise, prefer a pinned version
  with:
    path: ./
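
Settings such as `--audit-level=high` and `severity: 'CRITICAL,HIGH'` both express a threshold: findings at or above it fail the pipeline, the rest are reported only. A minimal sketch of that gating logic follows. The function names are hypothetical, and real scanners implement this internally.

```shell
# Map a severity label to a rank; unknown labels rank 0.
severity_rank() {
  case "$1" in
    LOW) echo 1 ;;
    MEDIUM) echo 2 ;;
    HIGH) echo 3 ;;
    CRITICAL) echo 4 ;;
    *) echo 0 ;;
  esac
}

# Read one severity per line on stdin; return 1 if any meets the threshold.
scan_gate() {
  threshold=$(severity_rank "$1")
  if [ "$threshold" -eq 0 ]; then
    echo "unknown threshold: $1" >&2
    return 2
  fi
  blocking=0
  while read -r sev; do
    if [ "$(severity_rank "$sev")" -ge "$threshold" ]; then
      blocking=$((blocking + 1))
    fi
  done
  echo "blocking findings: $blocking"
  [ "$blocking" -eq 0 ]
}
```

For example, piping the severities `LOW`, `HIGH`, `CRITICAL` into `scan_gate HIGH` reports two blocking findings and returns a non-zero status.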

Artifact

# Tag with both git SHA (immutable) and version tag
- name: Tag and push artifact
  run: |
    GIT_SHA="${{ github.sha }}"
    VERSION="1.2.3"
    docker tag myapp:$GIT_SHA myrepo/myapp:$GIT_SHA
    docker tag myapp:$GIT_SHA myrepo/myapp:$VERSION
    docker tag myapp:$GIT_SHA myrepo/myapp:latest
    docker push myrepo/myapp:$GIT_SHA # immutable reference
    docker push myrepo/myapp:$VERSION
    docker push myrepo/myapp:latest
# Store build artifacts
- name: Upload build artifacts
  uses: actions/upload-artifact@v4
  with:
    name: build-artifacts
    path: ./dist/
    retention-days: 30
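
The tagging scheme can be factored into a helper that validates its inputs before anything is pushed. This is a sketch under assumptions: `image_tags` is a hypothetical name, and the MAJOR.MINOR.PATCH check is a simplification of full semantic versioning.

```shell
# Print the tags to push for one build: the immutable SHA tag first,
# then the mutable convenience tags.
image_tags() {
  repo=$1; sha=$2; version=$3
  printf '%s\n' "$sha" | grep -Eq '^[0-9a-f]{40}$' || {
    echo "expected a full 40-character git SHA, got: $sha" >&2
    return 1
  }
  printf '%s\n' "$version" | grep -Eq '^[0-9]+\.[0-9]+\.[0-9]+$' || {
    echo "expected MAJOR.MINOR.PATCH, got: $version" >&2
    return 1
  }
  echo "$repo:$sha"
  echo "$repo:$version"
  echo "$repo:latest"
}

# Possible use inside a run: step (not executed here):
#   image_tags myrepo/myapp "$GIT_SHA" "$VERSION" | while read -r tag; do
#     docker tag "myapp:$GIT_SHA" "$tag" && docker push "$tag"
#   done
```

Deploying by the SHA tag and treating the version and `latest` tags as pointers keeps every deployment traceable to one commit.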

Secrets Management

Never Do This

# WRONG: secrets in code, in logs, in history
- run: docker login -u myuser -p mysecretpassword123
  env:
    API_KEY: sk-live-abc123 # visible in git history forever

Do This Instead

# Use GitHub Secrets
- run: echo "${{ secrets.DOCKER_PASSWORD }}" | docker login -u ${{ secrets.DOCKER_USERNAME }} --password-stdin
# Use OIDC (no long-lived credentials at all)
- uses: aws-actions/configure-aws-credentials@v4
  with:
    role-to-assume: arn:aws:iam::123456789012:role/github-actions
    aws-region: us-east-1
# The job exchanges a GitHub OIDC token for temporary AWS credentials, so no
# long-lived secrets are stored anywhere (the job needs permissions: id-token: write)
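
Before a step consumes secrets, it can help to fail early when one is missing, without ever echoing a value. A minimal sketch follows; `require_secrets` is a hypothetical helper, not part of GitHub Actions.

```shell
# Check that each named environment variable is set and non-empty.
# Only variable NAMES are printed, never values.
require_secrets() {
  missing=0
  for name in "$@"; do
    # Refuse anything that is not a plain identifier before using eval.
    case "$name" in
      ''|*[!A-Za-z0-9_]*) echo "invalid variable name" >&2; return 2 ;;
    esac
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing secret: $name" >&2
      missing=1
    fi
  done
  unset value
  return "$missing"
}
```

A step could call `require_secrets DOCKER_USERNAME DOCKER_PASSWORD` before `docker login`, so a misconfigured repository fails with a clear message instead of a confusing authentication error.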

Secrets management tools:

| Tool                      | Use case                              |
|---------------------------|---------------------------------------|
| GitHub/GitLab Secrets     | CI/CD pipeline secrets                |
| AWS Secrets Manager       | Runtime secrets, auto-rotation        |
| HashiCorp Vault           | Multi-cloud, complex secret workflows |
| AWS Parameter Store       | Config + simple secrets (cheaper)     |
| SOPS                      | Encrypted secrets in Git              |
| External Secrets Operator | Sync secrets into K8s                 |

Deployment Strategies

Rolling Deployment

Replace instances one at a time. Zero downtime, but rollback is slow because it is itself another rolling update.

Before: [v1] [v1] [v1] [v1]
Step 1: [v2] [v1] [v1] [v1]
Step 2: [v2] [v2] [v1] [v1]
Step 3: [v2] [v2] [v2] [v1]
After: [v2] [v2] [v2] [v2]
# Kubernetes rolling update (default)
spec:
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1 # create 1 extra pod before removing old
      maxUnavailable: 0 # never reduce capacity during update

Best for: Standard deployments where you can tolerate both versions running simultaneously.
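
The two rolling-update knobs translate into simple bounds on the fleet during a rollout: at least `replicas - maxUnavailable` pods stay available, and at most `replicas + maxSurge` pods exist at once. A small sketch of that arithmetic, assuming absolute integer values rather than percentages (`rollout_bounds` is a hypothetical helper):

```shell
# Compute capacity bounds during a rolling update from integer settings.
rollout_bounds() {
  replicas=$1; max_surge=$2; max_unavailable=$3
  # With both at 0 the rollout could neither add nor remove a pod.
  if [ "$max_surge" -eq 0 ] && [ "$max_unavailable" -eq 0 ]; then
    echo "invalid: maxSurge and maxUnavailable cannot both be 0" >&2
    return 1
  fi
  echo "min_available=$((replicas - max_unavailable)) max_total=$((replicas + max_surge))"
}
```

With four replicas, `maxSurge: 1`, and `maxUnavailable: 0`, this gives `min_available=4 max_total=5`: capacity never drops, at the cost of one extra pod during the update.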

Blue-Green Deployment

Run two identical environments (blue = current, green = new). Switch traffic instantly.

Blue (v1):  [LB] → [v1] [v1] [v1] ← receives traffic
Green (v2): [  ]   [v2] [v2] [v2] ← idle, being deployed
Switch:
Blue (v1):  [  ]   [v1] [v1] [v1] ← idle (kept for rollback)
Green (v2): [LB] → [v2] [v2] [v2] ← now receives traffic

Pros: Instant rollback (just switch LB back), no mixed versions in production
Cons: Double the infrastructure cost during deployment, DB migration complexity
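
The switch can be sketched with a state file standing in for the load balancer's target. That file is an assumption for illustration; a real switch would update the LB or DNS. Rollback is the same operation run a second time.

```shell
# Map the active color to the idle one.
idle_color() {
  case "$1" in
    blue) echo green ;;
    green) echo blue ;;
    *) return 1 ;;
  esac
}

# Flip traffic to the idle environment. $1 is a file holding the active color.
switch_traffic() {
  state_file=$1
  current=$(cat "$state_file") || return 1
  target=$(idle_color "$current") || {
    echo "unknown color: $current" >&2
    return 1
  }
  # Write to a temp file and rename, so readers never see a partial value.
  printf '%s\n' "$target" > "$state_file.tmp" && mv "$state_file.tmp" "$state_file"
  echo "traffic: $current -> $target"
}
```

Because the old environment is left untouched, calling `switch_traffic` again restores the previous version without redeploying anything.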

Canary Deployment

Send a small percentage of traffic to the new version, gradually increase.

v1: ████████████████████ 95% traffic
v2: █ 5% traffic ← canary
Monitor metrics... if healthy:
v1: ████████████ 50% traffic
v2: ████████████ 50% traffic
Continue until...
v1: 0%
v2: ████████████████████ 100%

Best for: High-risk changes where you want to validate with real traffic before full rollout. Requires good observability to detect regressions.
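
The "monitor, then widen" loop comes down to a promotion decision at each step. A minimal sketch follows, assuming error rates are supplied as errors per 10,000 requests so the arithmetic stays integer-only. The function name, the 5 → 50 → 100 schedule, and the tolerance parameter are illustrative assumptions.

```shell
# Decide the next canary traffic weight (percent).
#   $1 current weight, $2 canary errors per 10k requests,
#   $3 baseline (v1) errors per 10k requests, $4 allowed excess over baseline
next_canary_weight() {
  weight=$1; canary_errors=$2; baseline_errors=$3; tolerance=$4
  if [ "$canary_errors" -gt $((baseline_errors + tolerance)) ]; then
    echo 0          # regression detected: send all traffic back to v1
    return 0
  fi
  case "$weight" in
    0) echo 5 ;;
    5) echo 50 ;;
    *) echo 100 ;;
  esac
}
```

A canary at 5% with 12 errors per 10k against a baseline of 10 and a tolerance of 5 is promoted to 50%; at 30 errors per 10k it is withdrawn.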


Rollback Mechanisms

# Kubernetes: roll back a deployment
kubectl rollout undo deployment/myapp
kubectl rollout undo deployment/myapp --to-revision=3
# Check rollout history
kubectl rollout history deployment/myapp
# Docker Swarm
docker service update --rollback myapp
# ECS
aws ecs update-service \
  --cluster production \
  --service myapp \
  --task-definition myapp:42 # previous task definition
# General principle: tag images with git SHA, deploy by SHA, rollback = redeploy old SHA
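
Redeploying the old SHA presupposes a record of what was deployed. A minimal sketch, assuming a hypothetical plain-text history file with one deployed SHA per line, oldest first:

```shell
# Append a deployed SHA to the history file ($1 = file, $2 = SHA).
record_deploy() {
  printf '%s\n' "$2" >> "$1"
}

# Print the SHA to roll back to: the entry just before the current one.
rollback_target() {
  count=$(wc -l < "$1" | tr -d ' ')
  [ "$count" -ge 2 ] || {
    echo "no previous deploy recorded" >&2
    return 1
  }
  tail -n 2 "$1" | head -n 1
}
```

The printed SHA can then be fed to whatever deploys by SHA, for example as the image tag in a `kubectl set image` or task-definition update.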

Failure Handling

In the Pipeline

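# Allow a step to fail without failing the whole pipeline
- name: Optional security scan
  run: trivy image myapp:latest
  continue-on-error: true
# Cap flaky steps with a timeout (GitHub Actions has no built-in step retry;
# wrap the command in a retry loop or a retry action if you need one)
- name: Integration tests
  run: npm run test:integration
  timeout-minutes: 10
# Notifications on failure
- name: Notify on failure
  if: failure()
  uses: slackapi/slack-github-action@v1
  with:
    payload: |
      {
        "text": "Pipeline failed on ${{ github.repository }} (${{ github.sha }})"
      }
  env:
    SLACK_WEBHOOK_URL: ${{ secrets.SLACK_WEBHOOK_URL }} # assumed secret name
    SLACK_WEBHOOK_TYPE: INCOMING_WEBHOOK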
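
Retrying a flaky command can be done inside a `run:` step with a small shell wrapper. The following is a minimal sketch with exponential backoff; `retry` is a hypothetical helper name, and the attempt count and delay are arbitrary.

```shell
# retry ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Re-run the command until it succeeds or attempts are exhausted,
# doubling the delay after each failure.
retry() {
  attempts=$1; delay=$2; shift 2
  n=1
  while true; do
    "$@" && return 0
    if [ "$n" -ge "$attempts" ]; then
      echo "giving up after $n attempts: $*" >&2
      return 1
    fi
    echo "attempt $n failed; retrying in ${delay}s" >&2
    sleep "$delay"
    delay=$((delay * 2))
    n=$((n + 1))
  done
}
```

For example, `retry 3 5 npm run test:integration` tries up to three times, waiting 5 and then 10 seconds between attempts. Retries hide flakiness rather than fix it, so it is worth logging how often they trigger.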

In Deployment

# Always verify deployment after rolling out
# Health check endpoint pattern:
#   GET /health → 200 OK
#   {
#     "status": "ok",
#     "version": "1.2.3",
#     "git_sha": "abc1234"
#   }
# Automated smoke test after deploy; exit non-zero so the pipeline is marked failed
curl -fsS https://myapp.example.com/health || {
  echo "Health check failed, rolling back"
  kubectl rollout undo deployment/myapp
  exit 1
}
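
A 200 response only shows that something is serving. Comparing the reported `git_sha` with the SHA just deployed confirms it is the new version. A minimal sketch, assuming the small flat JSON shape shown above: `json_field` and `verify_deploy` are hypothetical helpers, and `jq` is a better choice for real parsing.

```shell
# Extract a string field from a small flat JSON document using sed.
json_field() {
  printf '%s' "$1" | tr -d '\n' |
    sed -n "s/.*\"$2\"[[:space:]]*:[[:space:]]*\"\([^\"]*\)\".*/\1/p"
}

# Check that the health response is ok and reports the expected SHA.
verify_deploy() {
  body=$1; expected_sha=$2
  [ "$(json_field "$body" status)" = "ok" ] || {
    echo "service reports unhealthy status" >&2
    return 1
  }
  [ "$(json_field "$body" git_sha)" = "$expected_sha" ] || {
    echo "running version does not match $expected_sha" >&2
    return 1
  }
  echo "deploy verified: $expected_sha"
}

# Possible use after a rollout (not executed here):
#   verify_deploy "$(curl -fsS https://myapp.example.com/health)" "$GIT_SHA" \
#     || kubectl rollout undo deployment/myapp
```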

Pipeline Observability (Logs & Artifacts)

# Always upload test results
- name: Upload test results
  uses: actions/upload-artifact@v4
  if: always() # upload even on failure
  with:
    name: test-results
    path: |
      ./test-results/
      ./coverage/
# Annotate failures with test details
- uses: dorny/test-reporter@v1
  if: always()
  with:
    name: Test Results
    path: test-results/*.xml
    reporter: java-junit
# Track deployment in external system
- name: Create deployment record
  uses: chrnorm/deployment-action@v2
  with:
    token: ${{ secrets.GITHUB_TOKEN }}
    environment: production
    ref: ${{ github.sha }}

What to track per pipeline run:

  • Git SHA being built/deployed
  • Start time, end time, duration
  • Test results (pass/fail counts, flaky tests)
  • Artifact versions and locations
  • Deploy target and version before/after
  • Who triggered the pipeline (human or automated)
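
Several of the items above can be captured as one structured log line per run. A minimal sketch, assuming epoch-second timestamps and values that contain no quotes or backslashes (no JSON escaping is done); the field names are illustrative, not a standard schema.

```shell
# Emit one JSON line describing a pipeline run.
#   $1 git SHA, $2 start (epoch s), $3 end (epoch s),
#   $4 tests passed, $5 tests failed, $6 who triggered the run
pipeline_record() {
  sha=$1; started=$2; finished=$3; passed=$4; failed=$5; actor=$6
  printf '{"git_sha":"%s","started_at":%s,"finished_at":%s,"duration_s":%s,"tests_passed":%s,"tests_failed":%s,"triggered_by":"%s"}\n' \
    "$sha" "$started" "$finished" "$((finished - started))" "$passed" "$failed" "$actor"
}
```

Appending these lines to a file or shipping them to a log store makes it possible to chart duration and failure trends across runs.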