Introduction to DevOps
DevOps is a set of practices, cultural philosophies, and tools that combine software development (Dev) and IT operations (Ops) to shorten the development lifecycle and deliver high-quality software continuously.
What is DevOps?
Before DevOps, development and operations teams worked in silos. Developers wrote code, threw it "over the wall" to operations, and ops struggled to deploy and maintain it. DevOps breaks down that wall.
What are Silos?
A silo is an organizational pattern where a team works in isolation — with its own goals, tools, processes, and definition of success — largely disconnected from adjacent teams.
In a siloed setup, a typical software company looked like this:
```
[ Dev Team ]        [ QA Team ]        [ Ops Team ]
Writes code    →    Tests code    →    Deploys & runs it
Goal: ship          Goal: find         Goal: stability
features fast       bugs               (resist change)
```
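Each team optimized for its own KPIs, and those KPIs often conflicted: Dev was rewarded for shipping features fast, while Ops was rewarded for stability and therefore resisted change.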
The result: releases became rare, high-risk events. When something broke in production, teams pointed fingers because no one had full ownership of the outcome.
Concrete signs of silo thinking:
Core idea: automate and collaborate so that code goes from a developer's laptop to production reliably, quickly, and repeatedly.
The Three Ways (Principles)
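DevOps thinking is often framed around three foundational principles, known as the Three Ways: Flow (optimize the fast, smooth movement of work from development through operations to the customer), Feedback (amplify feedback loops so problems are detected and fixed as early as possible), and Continual Learning and Experimentation (foster a culture that experiments, takes calculated risks, and learns from failure).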
The DevOps Lifecycle
```
Plan → Code → Build → Test → Release → Deploy → Operate → Monitor
 ^                                                           |
 |___________________________________________________________|
                       (continuous loop)
```
| Stage | What happens |
|---|---|
| Plan | Define features, track work (Jira, Linear, GitHub Issues) |
| Code | Write source code, peer review via pull requests |
| Build | Compile or package the application |
| Test | Automated unit, integration, and end-to-end tests |
| Release | Tag a version, produce an artifact (Docker image, binary) |
| Deploy | Push the artifact to an environment (staging, production) |
| Operate | Run and maintain the live system |
| Monitor | Observe metrics, logs, and alerts; feed insights back to Plan |
Key Practices
Continuous Integration (CI)
Developers merge code changes to a shared branch frequently — at least once a day. Each merge triggers an automated build and test run.
Why it matters: integration bugs are caught early, while the changes are small and cheap to fix.
Push code → CI server picks it up → Build → Run tests → Pass/Fail notification
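The push-to-notification flow above can be sketched as a tiny pipeline runner. This is a minimal illustration, not how any particular CI server is implemented. It assumes each stage is a named command, runs the stages in order, stops at the first failing stage, and reports pass/fail. The names `run_pipeline`, `notify`, and `StageResult` are hypothetical.

```python
import subprocess
import sys
from dataclasses import dataclass


@dataclass
class StageResult:
    name: str
    passed: bool


def run_pipeline(stages: list[tuple[str, list[str]]]) -> list[StageResult]:
    """Run each (name, command) stage in order; stop at the first failure."""
    results: list[StageResult] = []
    for name, command in stages:
        # A non-zero exit code marks the stage (and the whole run) as failed.
        completed = subprocess.run(command, capture_output=True)
        passed = completed.returncode == 0
        results.append(StageResult(name, passed))
        if not passed:
            break  # later stages are skipped, as in a typical CI run
    return results


def notify(results: list[StageResult]) -> str:
    """Summarize the run the way a pass/fail notification might."""
    if all(r.passed for r in results):
        return "PASS"
    failed = next(r.name for r in results if not r.passed)
    return f"FAIL at {failed}"


if __name__ == "__main__":
    # Hypothetical stages: the Python interpreter stands in for real build/test tools.
    stages = [
        ("build", [sys.executable, "-c", "print('compiling...')"]),
        ("test", [sys.executable, "-c", "assert 1 + 1 == 2"]),
    ]
    print(notify(run_pipeline(stages)))
```

Stopping at the first failure mirrors the usual CI convention: there is no point running tests against a build that did not compile.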
Continuous Delivery (CD)
Every change that passes CI is automatically packaged and made ready for deployment to any environment at the push of a button.
Continuous Deployment
One step further than Continuous Delivery — passing changes are deployed to production automatically, with no manual approval.
CI (pass) → staging deploy → smoke test → production deploy
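The difference between Continuous Delivery and Continuous Deployment comes down to one gate in front of production. The sketch below models the flow above for an artifact that has already passed CI. `release`, `deploy`, `smoke_test`, and `approve` are hypothetical stand-ins for real tooling, and the smoke test is assumed to be a simple pass/fail health check.

```python
from typing import Callable


def release(
    artifact: str,
    deploy: Callable[[str, str], None],
    smoke_test: Callable[[str], bool],
    continuous_deployment: bool,
    approve: Callable[[str], bool] = lambda artifact: False,
) -> str:
    """Promote an artifact that already passed CI through staging to production."""
    deploy(artifact, "staging")
    if not smoke_test("staging"):
        return "stopped: smoke test failed on staging"

    # Continuous Delivery: the artifact is ready, but a human presses the button.
    # Continuous Deployment: no manual approval; production follows automatically.
    if not continuous_deployment and not approve(artifact):
        return "ready: awaiting manual approval"

    deploy(artifact, "production")
    return "deployed to production"


if __name__ == "__main__":
    log: list[str] = []

    def fake_deploy(artifact: str, env: str) -> None:
        log.append(f"{artifact} -> {env}")

    def healthy(env: str) -> bool:
        return True

    print(release("app:1.4.0", fake_deploy, healthy, continuous_deployment=False))
    print(release("app:1.4.0", fake_deploy, healthy, continuous_deployment=True))
    print(log)
```

Both practices share every step except the approval check, which is why teams often move from one to the other by removing that single gate once their test suite is trusted.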
Infrastructure as Code (IaC)
Servers, networks, and databases are defined in code files and version-controlled like application code.
Benefits:
Popular tools: Terraform, Pulumi, AWS CloudFormation
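The core mechanic behind these tools is comparing the desired state declared in code with the state that actually exists, then producing a list of changes (what Terraform shows as a plan and Pulumi as a preview). The toy sketch below illustrates only that idea. It does not use any real tool's API, and the resource names and attributes are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Plan:
    create: list[str] = field(default_factory=list)
    update: list[str] = field(default_factory=list)
    delete: list[str] = field(default_factory=list)


def plan(desired: dict[str, dict], actual: dict[str, dict]) -> Plan:
    """Diff declared resources against live ones, keyed by resource name."""
    result = Plan()
    for name, spec in desired.items():
        if name not in actual:
            result.create.append(name)
        elif actual[name] != spec:
            result.update.append(name)  # attributes changed in code or drifted live
    for name in actual:
        if name not in desired:
            result.delete.append(name)  # exists live but is no longer declared
    return result


if __name__ == "__main__":
    # Desired state, as it would live in a version-controlled file.
    desired = {
        "web-server": {"size": "medium", "count": 3},
        "app-db": {"engine": "postgres", "version": "16"},
    }
    # Live state, as a tool would read it back from the provider.
    actual = {
        "web-server": {"size": "small", "count": 3},
        "old-cache": {"engine": "redis"},
    }
    print(plan(desired, actual))
```

Because the plan is computed before anything changes, it can be reviewed in a pull request like any other code change.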
Monitoring and Observability
You can't improve what you can't measure. Observability is built on three pillars:
| Pillar | What it tells you | Tools |
|---|---|---|
| Metrics | Numeric measurements over time (CPU, request rate, error %) | Prometheus, Datadog |
| Logs | Timestamped event records | Loki, CloudWatch, ELK Stack |
| Traces | End-to-end path of a single request | Jaeger, Zipkin, OpenTelemetry |
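The pillars are most useful when they can be correlated. The sketch below uses only the Python standard library. It emits a structured (JSON) log line that carries a trace ID, so the entry can be matched to the trace of the same request, and it computes an error-rate metric from request status codes. The field names (`trace_id`, `service`) and both helper functions are assumptions for illustration. In practice a library such as OpenTelemetry would generate and propagate trace IDs.

```python
import json
import logging
import uuid
from datetime import datetime, timezone


def log_event(logger: logging.Logger, message: str, trace_id: str, **fields: object) -> str:
    """Emit one JSON log line; the shared trace_id links it to a trace."""
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "message": message,
        "trace_id": trace_id,
        **fields,
    }
    line = json.dumps(record)
    logger.info(line)
    return line


def error_rate(status_codes: list[int]) -> float:
    """Metric: percentage of requests that returned a 5xx status."""
    if not status_codes:
        return 0.0
    errors = sum(1 for code in status_codes if 500 <= code <= 599)
    return 100.0 * errors / len(status_codes)


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO, format="%(message)s")
    log = logging.getLogger("checkout")

    # One ID per request; every log line for that request reuses it.
    trace_id = uuid.uuid4().hex
    log_event(log, "payment failed", trace_id, service="checkout", status=502)

    print(f"error rate: {error_rate([200, 200, 502, 200]):.1f}%")
```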
DORA Metrics
DORA (DevOps Research and Assessment) metrics are the four industry-standard measurements used to assess how well a software team is delivering. They were established through years of research by the DORA team (now part of Google Cloud) and published in the State of DevOps reports.
The key insight: high performers excel at all four simultaneously — speed and stability are not a trade-off, they reinforce each other.
| Metric | What it measures | Elite benchmark |
|---|---|---|
| Deployment Frequency | How often you deploy to production | On-demand (multiple times/day) |
| Lead Time for Changes | Time from commit to running in production | Less than 1 hour |
| Change Failure Rate | % of deployments that cause a production incident | 0–5% |
| Mean Time to Recovery (MTTR) | How fast you restore service after an incident | Less than 1 hour |
Deployment Frequency
Measures how often your team ships to production. Low frequency usually signals large, risky batches — the bigger the release, the harder it is to isolate what broke.
Low performer: monthly or less. Elite performer: multiple deploys per day.
Lead Time for Changes
The elapsed time from a developer committing code to that code running in production. Long lead times point to slow CI pipelines, manual approval gates, or infrequent merges.
Low performer: 1 month to 6 months. Elite performer: less than 1 hour.
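Because long lead times can come from several places (slow CI, approval gates, infrequent merges), it helps to split the total into stages. The sketch below assumes you can export four timestamps per change: commit, merge, build finished, and deployed. The `Change` fields and `lead_time_breakdown` are hypothetical names, and the dates in the usage example are made up.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Change:
    committed: datetime
    merged: datetime
    built: datetime
    deployed: datetime


def lead_time_breakdown(change: Change) -> dict[str, timedelta]:
    """Split commit-to-production time into review, CI, and release stages."""
    return {
        "waiting for merge": change.merged - change.committed,
        "CI pipeline": change.built - change.merged,
        "waiting to deploy": change.deployed - change.built,
        "total lead time": change.deployed - change.committed,
    }


if __name__ == "__main__":
    change = Change(
        committed=datetime(2024, 5, 1, 9, 0),
        merged=datetime(2024, 5, 2, 15, 0),
        built=datetime(2024, 5, 2, 15, 40),
        deployed=datetime(2024, 5, 6, 10, 0),
    )
    for stage, duration in lead_time_breakdown(change).items():
        print(f"{stage:>18}: {duration}")
```

In this made-up example the CI pipeline takes only 40 minutes, while most of the five-day lead time is spent waiting to deploy, which would point at a manual gate rather than a slow pipeline.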
Change Failure Rate
The percentage of production deployments that result in a degraded service or require a hotfix/rollback. A high rate signals insufficient testing, missing feature flags, or poor deployment practices.
Low performer: 46–60%. Elite performer: 0–5%.
Mean Time to Recovery (MTTR)
How quickly you restore normal service after a production incident. Teams with low MTTR invest in observability, runbooks, and on-call processes so they can diagnose and fix fast.
Low performer: 1 week to 1 month. Elite performer: less than 1 hour.
Tracking DORA Metrics
You can derive these from your existing tools:
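As a rough illustration, the sketch below computes all four metrics from two exported lists: deployments (each with the commit time of the change it shipped and whether it caused an incident) and incidents (each with start and resolve times). The record shapes and `dora_metrics` are hypothetical. Real data would come from your CI/CD system, version control, and incident tracker. The sketch uses the median for lead time and the mean for recovery time, and it assumes each incident can be attributed to a deployment, which in practice is often the hardest part.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median


@dataclass
class Deployment:
    deployed_at: datetime
    commit_at: datetime      # when the shipped change was committed
    caused_incident: bool    # did this deploy degrade production?


@dataclass
class Incident:
    started_at: datetime
    resolved_at: datetime


def _hours(delta: timedelta) -> float:
    return delta / timedelta(hours=1)


def dora_metrics(deploys: list[Deployment], incidents: list[Incident], days: int) -> dict[str, float]:
    """Compute the four DORA metrics over a reporting window of `days` days."""
    lead_times = [d.deployed_at - d.commit_at for d in deploys]
    recoveries = [i.resolved_at - i.started_at for i in incidents]
    failures = sum(d.caused_incident for d in deploys)
    mean_recovery = sum(recoveries, timedelta()) / len(recoveries) if recoveries else timedelta()
    return {
        "deploys_per_day": len(deploys) / days,
        "median_lead_time_hours": _hours(median(lead_times)) if lead_times else 0.0,
        "change_failure_rate_pct": 100.0 * failures / len(deploys) if deploys else 0.0,
        "mttr_hours": _hours(mean_recovery),
    }


if __name__ == "__main__":
    t = datetime(2024, 6, 3, 12, 0)
    deploys = [
        Deployment(t, t - timedelta(hours=2), caused_incident=False),
        Deployment(t + timedelta(days=1), t + timedelta(hours=20), caused_incident=True),
        Deployment(t + timedelta(days=2), t + timedelta(days=2, hours=-1), caused_incident=False),
    ]
    incidents = [Incident(t + timedelta(days=1, minutes=10), t + timedelta(days=1, minutes=55))]
    print(dora_metrics(deploys, incidents, days=7))
```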
Common DevOps Tools
Version Control
CI/CD Pipelines
Containerization
Infrastructure as Code
Cloud Platforms
DevOps vs. Traditional IT
| Aspect | Traditional | DevOps |
|---|---|---|
| Release cadence | Monthly / quarterly | Daily / on-demand |
| Team structure | Dev and Ops siloed | Cross-functional teams |
| Failure response | Blame-oriented | Blameless post-mortems |
| Infra changes | Manual, undocumented | Code-reviewed, automated |
| Feedback loop | Weeks | Minutes |
A Simple CI/CD Example
A minimal GitHub Actions workflow that installs dependencies and runs the test suite for a Node.js app on every push and pull request:
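```yaml
# .github/workflows/ci.yml
name: CI

# Trigger on every push and on every pull request
on: [push, pull_request]

jobs:
  build-and-test:
    runs-on: ubuntu-latest   # fresh GitHub-hosted Linux runner
    steps:
      # Fetch the repository contents into the runner
      - uses: actions/checkout@v4
      # Install Node.js 20 on the runner
      - uses: actions/setup-node@v4
        with:
          node-version: 20
      # Clean install of the exact versions pinned in package-lock.json
      - run: npm ci
      # Run the "test" script defined in package.json
      - run: npm test
```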
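This single file gives you an automated check on every push and pull request: a fresh Ubuntu runner, a Node.js 20 toolchain, a clean dependency install from the lockfile (npm ci), and a pass/fail result from npm test.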
Where to Go Next
Once comfortable with the basics, explore these areas in depth: