Metrics You Must Measure From Day One

Most teams only start measuring after something breaks. But there are metrics that should be tracked from the very first line of code — even before you have a single user.

Yudi Nugraha
May 16, 2026

There is one mistake almost every engineering team makes early on: they only start measuring after something breaks.

Error rates spike before anyone sets up monitoring. Deployments keep failing before anyone thinks about change failure rate. Code becomes unmaintainable before anyone realizes there are no tests.

But metrics are not just alarms. Metrics are how you understand a system before it becomes a problem.

---

Why Metrics Matter

Without metrics, you are managing a system by gut feeling.

You do not know whether the code you wrote yesterday made the system more stable or more fragile. You do not know whether that last deployment slowed response times. You do not know how long it will take your team to recover from an incident.

Metrics turn feelings into facts. And facts are the only solid foundation for decisions.

One thing that often gets overlooked: technical metrics are leading indicators, while user complaints are lagging indicators. By the time a user reports a problem, the damage has already been done — and in business, damage that has already happened usually comes with a cost. Teams that only monitor complaints are always reacting. Teams that monitor metrics can be proactive.

What is interesting is that not all metrics need to be measured at the same time. Some can — and should — be tracked from day one of coding. Others become relevant once the system is running in production. Others only matter once the team starts collaborating.

---

From Day One of Coding

These are metrics that do not require production. You can start measuring them today, even before you have a single user.

| Metric | Target | Why it matters |
| --- | --- | --- |
| Test coverage | ≥ 80% | Your primary safety net |
| Cyclomatic complexity | ≤ 10 per function | Complex code is a bug magnet |
| Linting errors | 0 | Consistent code standards |

Test Coverage: Your Primary Safety Net

Test coverage measures how much of your code is executed when the test suite runs. The 80% target is not a magic number — it is realistic enough to achieve while staying productive.

More important than the coverage number itself is what is not covered. Low coverage in critical business logic is far more dangerous than low coverage in configuration boilerplate.
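A small sketch of that idea: an overall number can clear 80% while a critical module stays badly under-tested. The `FileCoverage` type, the `critical` flag, the 90% floor for critical files, and the file names are all assumptions made for illustration; a real coverage tool would supply the per-file line counts.

```python
from dataclasses import dataclass


@dataclass
class FileCoverage:
    path: str
    covered_lines: int
    total_lines: int
    critical: bool = False  # hypothetical flag marking business-critical code

    @property
    def percent(self) -> float:
        if self.total_lines == 0:
            return 100.0
        return 100.0 * self.covered_lines / self.total_lines


def overall_percent(files: list[FileCoverage]) -> float:
    """Line coverage across all files, weighted by file size."""
    total = sum(f.total_lines for f in files)
    covered = sum(f.covered_lines for f in files)
    return 100.0 * covered / total if total else 100.0


def risky_gaps(files: list[FileCoverage], critical_minimum: float = 90.0) -> list[str]:
    """Paths of critical files whose coverage is below the assumed floor."""
    return [f.path for f in files if f.critical and f.percent < critical_minimum]


# Hypothetical report: the numbers are made up for the example.
report = [
    FileCoverage("billing/invoice.py", 45, 100, critical=True),
    FileCoverage("config/settings.py", 20, 20),
    FileCoverage("api/handlers.py", 275, 280),
]

print(overall_percent(report))  # 85.0, above the 80% target
print(risky_gaps(report))       # ['billing/invoice.py'], only 45% covered
```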

Start by writing tests for the happy path first. Then add edge cases gradually. Do not chase 100% — chase coverage in the places that carry the most risk.

In business terms, low test coverage is debt that compounds over time. The larger the codebase, the longer each debugging session becomes whenever something changes. Teams that skip this measurement usually only notice when velocity has already dropped sharply after 6–12 months — and by then, the cost per feature is far higher than it should be.

Cyclomatic Complexity: Complex Code Is a Bug Magnet

Cyclomatic complexity counts the linearly independent paths through a function. A function with no branches scores 1, and every if, for, while, and case adds one more path.
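To make the counting concrete, here is a rough approximation built on Python's standard `ast` module. It is a sketch, not a replacement for a linter rule: the set of node types treated as decision points is an assumption, and established tools count a few more constructs (for example `match` cases and comprehensions).

```python
import ast

# Node types treated as decision points in this sketch (an approximation).
_BRANCH_NODES = (ast.If, ast.For, ast.While, ast.IfExp, ast.ExceptHandler)


def cyclomatic_complexity(func_source: str) -> int:
    """Return 1 plus the number of decision points found in the source."""
    tree = ast.parse(func_source)
    complexity = 1  # a function with no branches has exactly one path
    for node in ast.walk(tree):
        if isinstance(node, _BRANCH_NODES):
            complexity += 1
        elif isinstance(node, ast.BoolOp):
            # `a and b and c` adds two extra short-circuit paths
            complexity += len(node.values) - 1
    return complexity


# Hypothetical function used only as input for the counter.
SAMPLE = '''
def shipping_cost(weight, express, country):
    if weight <= 0:
        raise ValueError("weight must be positive")
    cost = 5
    if express and country == "ID":
        cost += 10
    for _ in range(int(weight)):
        cost += 1
    return cost
'''

print(cyclomatic_complexity(SAMPLE))  # 5: base 1 + two ifs + one `and` + one for
```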

A function with complexity above 10 usually signals one of two things: it is taking on too many responsibilities, or its logic can be simplified.

The higher the complexity, the more test cases are needed. The more test cases needed, the more likely something gets missed. That is why complex code is so often where bugs hide.

Most modern linters can measure this automatically. Turn on the rule from the start.

From a business perspective, high complexity means longer onboarding time for new engineers. A new engineer who needs two weeks to understand a single module is an engineer who is not productive for two weeks — that is a real cost that never appears in a report but always shows up in timelines.

Linting Errors: Consistent Code Standards

Zero linting errors is not about aesthetics. Linting catches code patterns that are statistically more likely to cause bugs — unused variables, unintentional type coercion, unawaited promises.

Beyond that, linting makes team code feel like it was written by one person. Consistency reduces cognitive load during code review. And lower cognitive load means reviewers can focus on logic, not writing style.

In business terms, this is about hidden overhead. Every PR review spent debating formatting and style is time from a senior engineer that could have gone toward something more valuable. Zero linting errors is not a neatness obsession — it is an investment in process efficiency.

---

From the First Deploy

Once the system is running in production — or even in staging — a new set of metrics needs to be monitored.

| Metric | Target | Why it matters |
| --- | --- | --- |
| Error rate | < 1% | Know about problems before users report them |
| Response time (p95) | < 500ms | Performance baseline |
| Availability / uptime | ≥ 99.5% | A reliable system |

Error Rate: Know About Problems Before Users Report Them

Error rate is the percentage of requests that result in an error compared to total requests. A target below 1% does not mean one error per 100 requests is acceptable — at scale, 1% can mean thousands of users experiencing problems every hour.

What makes error rate important is not just the number itself, but the trend. An error rate that gradually rises after a deployment is a clear signal that something is wrong — even before the first user sends a complaint email.

Set up alerting from the start. A single email notification when error rate crosses a threshold can save hours of investigation later.
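The check behind such an alert can be very small. The sketch below assumes you already have request and error counts for a time window; the function names, the 1% default threshold, and the `min_requests` guard (which avoids alerting on a handful of requests) are illustrative choices, not part of any particular monitoring tool.

```python
def error_rate(error_count: int, total_count: int) -> float:
    """Errors as a percentage of all requests in the window."""
    if total_count == 0:
        return 0.0
    return 100.0 * error_count / total_count


def should_alert(
    error_count: int,
    total_count: int,
    threshold_percent: float = 1.0,
    min_requests: int = 100,  # assumed guard against noisy, low-traffic windows
) -> bool:
    """True when the window has enough traffic and the rate meets the threshold."""
    if total_count < min_requests:
        return False
    return error_rate(error_count, total_count) >= threshold_percent


print(error_rate(5, 1000))     # 0.5
print(should_alert(5, 1000))   # False
print(should_alert(30, 1000))  # True
```

Running the same check on the window just before and just after a deployment is one simple way to watch the trend rather than the absolute number.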

The business translation is direct: every failed request on a purchase path is a transaction that may never complete. If only 3% of visits convert and 2% of checkout requests fail, roughly one in fifty of those hard-won conversions is lost to errors alone. Error rate is not purely a technical metric; it is a revenue metric that happens to be monitored by engineering.

Response Time (p95): Performance Baseline

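Response time p95 is the value that 95% of requests complete at or under; only the slowest 5% take longer. It is more meaningful than the average because a handful of extreme outliers cannot drag it around, and because it shows what your slower requests actually look like instead of hiding them in a mean.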
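A nearest-rank percentile makes the difference between p95 and the average concrete. This is a minimal sketch: the `percentile` helper and the sample latencies are made up for illustration, and monitoring tools often use interpolated or approximate percentiles instead.

```python
import math
from statistics import mean


def percentile(values: list[float], pct: float) -> float:
    """Nearest-rank percentile: the smallest sample with at least pct% of samples at or below it."""
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


# Hypothetical sample: 18 typical requests, one slow one, one extreme outlier.
latencies_ms = [120] * 18 + [450, 4000]

print(mean(latencies_ms))            # 330.5, pulled up by the single 4000ms request
print(percentile(latencies_ms, 50))  # 120, the typical request
print(percentile(latencies_ms, 95))  # 450, what the slower requests experience
```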

A target below 500ms for p95 is a fairly conservative standard: even the most "unlucky" requests, apart from the slowest 5%, still get a reasonable response.

The important thing is to establish a baseline early. When you change a database query or add new middleware, any shift in p95 will be immediately visible. Without a baseline, you will not know when the system starts slowing down.

Widely cited experiments at Amazon and Google have linked added latency to measurable drops in sales and traffic; the figure most often quoted is roughly 1% of sales for every additional 100ms. This is not something you feel directly, since there is no alert when revenue quietly drops, but it accumulates in silence. Response time p95 is a business metric disguised as a technical one.

Availability / Uptime: A System You Can Rely On

99.5% uptime sounds high. But it means the system can be down for about 3.6 hours (216 minutes) in a 30-day month; the often-quoted 43 minutes per month corresponds to 99.9%. For most business applications, 99.5% is a reasonable starting target.
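The downtime an availability target permits is plain arithmetic over the length of the period. A minimal sketch, assuming a 30-day month; the function name is illustrative.

```python
def allowed_downtime_minutes(availability_percent: float, days: int = 30) -> float:
    """Minutes of downtime permitted in the period at the given availability."""
    total_minutes = days * 24 * 60
    return total_minutes * (100 - availability_percent) / 100


for target in (99.0, 99.5, 99.9):
    # Rounded to one decimal to hide floating-point noise.
    print(target, round(allowed_downtime_minutes(target), 1))
# 99.0 432.0
# 99.5 216.0
# 99.9 43.2
```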

Low uptime is not just about users being unable to access the system. Downtime erodes trust — and trust is far harder to rebuild than a broken system.

Monitor uptime from external monitoring, not just from inside the server. The server can be running while the application is dead. External monitoring is the only way to see what users are actually experiencing.
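An external check can start as a script run on a schedule from a machine outside your infrastructure. The sketch below uses only the standard library; the health-check URL is a placeholder, and the `fetch` parameter is an assumption added so the check can be exercised without a network. A hosted uptime service does the same thing from several locations.

```python
import urllib.request


def fetch_status(url: str, timeout_seconds: float) -> int:
    """Perform a real HTTP GET and return the status code."""
    with urllib.request.urlopen(url, timeout=timeout_seconds) as response:
        return response.status


def is_up(url: str, timeout_seconds: float = 5.0, fetch=fetch_status) -> bool:
    """True only if the endpoint answers with a 2xx status within the timeout."""
    try:
        status = fetch(url, timeout_seconds)
    except OSError:
        # Covers connection failures and timeouts, plus the HTTPError that
        # urlopen raises for 4xx/5xx responses (all are OSError subclasses).
        return False
    return 200 <= status < 300


if __name__ == "__main__":
    # Placeholder URL: replace with your application's own health endpoint.
    print(is_up("https://example.com/health"))
```

Probing an application endpoint rather than pinging the host is the point: the server process can answer pings while the application behind it is returning errors.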

For the business, downtime carries two types of cost: direct and indirect. The direct cost is the transactions that fail while the system is down. The indirect cost is lost trust, which lingers long after the system is back up. If you have SLAs with clients, uptime is no longer just a technical target; it is a contractual obligation.

---

From When the Team Starts Collaborating

Once more than one person is working on the same codebase, the dynamics change. There are engineering metrics that only become meaningful in the context of collaboration.

| Metric | Target | Why it matters |
| --- | --- | --- |
| Lead time for changes | < 1 day | Delivery speed |
| Change failure rate | < 15% | Stability of every deploy |
| MTTR | < 1 hour | Recovery capability |

All three come from the DORA metrics framework, the result of years of research by DevOps Research and Assessment (DORA) analyzing thousands of engineering teams worldwide.

Lead Time for Changes: Delivery Speed

Lead time measures how long it takes from a code commit to it running in production. A target of under one day does not mean you have to deploy every day — it means your pipeline is efficient enough that you could, if needed.
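If you can export commit and deploy timestamps from your CI/CD tooling, lead time is a subtraction. A minimal sketch, assuming each change is recorded as a (committed, deployed) pair; the timestamps are invented, and the median is used here so that one unusually slow change does not dominate.

```python
from datetime import datetime
from statistics import median


def lead_times_hours(changes: list[tuple[datetime, datetime]]) -> list[float]:
    """Hours from commit to running in production, one value per change."""
    return [
        (deployed - committed).total_seconds() / 3600
        for committed, deployed in changes
    ]


# Hypothetical (committed_at, deployed_at) pairs.
changes = [
    (datetime(2026, 5, 4, 9, 0), datetime(2026, 5, 4, 15, 0)),   # 6 hours
    (datetime(2026, 5, 5, 10, 0), datetime(2026, 5, 6, 6, 0)),   # 20 hours
    (datetime(2026, 5, 6, 14, 0), datetime(2026, 5, 8, 14, 0)),  # 48 hours
]

hours = lead_times_hours(changes)
print(median(hours))       # 20.0
print(median(hours) < 24)  # True: within the one-day target
```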

Long lead time is usually caused by one of three things: slow review processes, a test suite that takes too long, or layers of approval steps that add no real value.

Find the bottleneck. Sometimes one small change — like parallelizing tests — can cut lead time from a week to a day.

In business terms, lead time is your speed of response to the market. A team that can deploy in one day can respond to user feedback, capitalize on momentum, or fix a critical bug before a competitor notices there is a gap. A team with a two-week lead time is always one step behind.

Change Failure Rate: Stability of Every Deploy

Change failure rate measures the percentage of deployments that cause problems in production — whether that means a hotfix, a rollback, or a full incident.

A target below 15% does not mean failing 15 times out of 100 deploys is fine. It is more of an upper bound that separates teams with healthy deployment processes from those without.
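The calculation only needs a flag per deployment recording whether it led to a hotfix, rollback, or incident. This is a sketch with made-up data; how you decide that a deploy "failed" is a team convention the code does not settle.

```python
def change_failure_rate(deploy_failed: list[bool]) -> float:
    """Percentage of deployments that caused a problem in production."""
    if not deploy_failed:
        return 0.0
    return 100.0 * sum(deploy_failed) / len(deploy_failed)


# Hypothetical history: 20 deployments, 2 of which needed a rollback or hotfix.
history = [False] * 18 + [True] * 2

print(change_failure_rate(history))       # 10.0
print(change_failure_rate(history) < 15)  # True
```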

Teams with high change failure rates are usually missing one of two things: sufficient test coverage, or a staging environment that accurately mirrors production.

The business impact is often invisible in the short term but very real: teams that frequently experience failed deployments will start to fear deploying. When teams fear deploying, features get stuck. When features get stuck, business opportunities are lost. A high change failure rate is not just a technical problem — it is a brake that slows down the entire organization.

MTTR: Recovery Capability

Mean Time to Recovery measures how long it takes on average to recover from an incident. A target of under one hour does not mean you have to fix every bug within an hour — it means your system needs fast rollback capability, enough observability for diagnosis, and clear runbooks for common scenarios.
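Measuring it requires two timestamps per incident: when the problem started (or was detected) and when service was restored. A minimal sketch with invented incidents; which of those start points you use is an assumption you should fix once and keep consistent.

```python
from datetime import datetime


def mttr_minutes(incidents: list[tuple[datetime, datetime]]) -> float:
    """Mean minutes from incident start to recovery."""
    if not incidents:
        raise ValueError("no incidents recorded")
    durations = [
        (resolved - started).total_seconds() / 60
        for started, resolved in incidents
    ]
    return sum(durations) / len(durations)


# Hypothetical (started_at, resolved_at) pairs.
incidents = [
    (datetime(2026, 5, 1, 10, 0), datetime(2026, 5, 1, 10, 30)),  # 30 minutes
    (datetime(2026, 5, 9, 22, 15), datetime(2026, 5, 9, 23, 0)),  # 45 minutes
    (datetime(2026, 5, 20, 3, 0), datetime(2026, 5, 20, 4, 30)),  # 90 minutes
]

print(mttr_minutes(incidents))  # 55.0, under the one-hour target
```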

Low MTTR reflects team readiness. Teams that recover quickly are not teams that never experience problems — they are teams that have practiced and built the infrastructure to handle them.

MTTR is the number CFOs and investors ask about first after a major incident: "how long was the downtime?" Behind that question is a simple calculation — how much was lost per minute, and has the engineering team prepared to minimize it. Teams with low MTTR are not just more technically reliable; they are safer as a business.

---

How to Talk About Metrics With Non-Technical Stakeholders

Technical metrics often fail to land not because the concepts are difficult, but because they are presented the wrong way. Engineers speak in the language of systems; business speaks in the language of money, time, and risk.

Three translations that almost always work:

Money: calculate the concrete loss. Not "uptime dropped 0.5%", but "the system was down for about 3.6 hours this month, and during that time no transactions could be processed."

Time: show the consequence to the timeline. Not "our lead time is two weeks", but "our competitors can release new features two weeks faster than us every cycle."

Risk: frame it as insurance. Not "we need to invest in test coverage", but "without this, every code change is a gamble, and we do not know we have lost until a user reports it."

Once technical metrics can be translated into these three languages, conversations with stakeholders shift from defense to dialogue.

---

Start With What Is Closest to You

You do not need to measure all of these metrics at once.

If you are just starting to code: enable the linter, start writing tests, and set up a tool to monitor complexity. That alone puts you ahead of most teams just getting started.

If you just made your first deploy: set up error tracking and uptime monitoring before you tell anyone it exists. No matter how small the app.

If your team is just starting to collaborate: start recording how long a change takes to get from PR creation to merge (a first approximation of lead time), and how many hotfixes appear after each deployment (a rough proxy for change failure rate).

Metrics do not need to be perfect to be useful. Even rough data is better than no data at all.

The most important thing is to start — and start with what is closest to where you are right now.

Tags: Software Engineering, Engineering Metrics, DevOps, Code Quality, Observability
Yudi Nugraha

Software Engineer | Builder
