When reading "Working Backwards" by Amazon employees, Colin Bryar and Bill Carr, a particular quote resonated with me, given my experiences working with successful engineering teams.
The most successful teams invested much of their early time in removing dependencies and building "instrumentation"—our term for infrastructure used to measure every important action—before they began to innovate, meaning, add new features.
In the rush to release new products, it's tempting to skip adding metrics and monitoring. Many software teams cut corners and ship features without these; after all, instrumentation isn't a feature customers see.
I saw that first-hand. On a past project with an overpromised delivery date, the final stretch ended in tension between the C-level and the team building a strategically critical feature.
"Is it done?"
"No, we need one more week."
"Is it done?"
"No, again..."
"Is it done finally?"
"It is! Live and working."
"Perfect! So, how is the adoption going? Do we have any first transactions already?"
"We don't know... Ask the BI team, maybe...?"
It's not like we didn't have any metrics. We measured resource usage and crashes with some default Grafana dashboards. But no one thought about adding critical metrics related to product and ops.
Actually, no one knew whether what they had built for 6 months served real customers.
Innovate with Eyes Open
Moving fast without visibility is just flying blind. And flying blind isn't bold – it's reckless. A counterintuitive truth is that moving fast alone doesn't guarantee innovation. You also need to move smart, and this is where instrumentation shines.
Instrumentation means baking measurement into everything you build. Conceptually, it's not about any specific tool or dashboard – it's the practice of designing your software and business processes to collect data on their own behavior.
Every time a user clicks a button, every API call, every product purchase or error event can be captured. The purpose is simple: to know what's going on under the hood in real time. If your team relies on a Product Manager to define the desired metrics in the tickets they give you, you are seriously lagging behind high-performing organizations.
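To make that concrete, capturing an event can be as simple as appending a structured record wherever something meaningful happens. This is a minimal in-process sketch; `track` and `EVENT_LOG` are illustrative names, not a specific vendor's API.

```python
import json
import time

# Minimal in-process event tracker. In production this would forward
# to an analytics pipeline; here events just accumulate in a list.
EVENT_LOG: list[dict] = []

def track(event: str, **props) -> None:
    """Record one structured event with a timestamp."""
    EVENT_LOG.append({"event": event, "ts": time.time(), **props})

# Instrument the actions mentioned above: clicks, API calls, purchases.
track("button_click", button_id="signup")
track("api_call", endpoint="/v1/orders", status=200)
track("purchase", amount_cents=4999, currency="EUR")

print(json.dumps(EVENT_LOG[-1], indent=2))
```

The key habit is that the call site decides what is worth recording, without waiting for someone else to specify it in a ticket.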
"But no one told us what the success metrics are for this project 🤷" - this was our explanation for not having any critical telemetry. "BI team will calculate that from databases anyway, no?"
What Is Instrumentation and Why Should You Care?
At its core, instrumentation serves as the nervous system of innovation. It feeds continuous, unbiased feedback to your team. This enables quicker course corrections and smarter risk-taking (read more about why it's important: The Three Ways: Flow, Feedback, and Continuous Learning).
For example, by instrumenting user interactions and system performance, you gain early signals (like a sudden drop in sign-ups or a spike in load time) before they become catastrophes. And let's be clear - if your team owns a certain domain (user authentication, payments system, complex data processing), part of your job is to know the volume of such events. If someone has to tell you about unexpected spikes or drops in the system you own, that's bad.
You don't need to know all the nuances around revenue, churn rates, or how your piece of code influences the company's EBITDA. But if, for instance, you own a payments system, it would be great to understand how many transactions are processed, what the rejection rate is, and what the root causes are when something goes wrong (whether it's your system failing, a third party, or simply a customer's mistake).
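Those three ownership questions, volume, rejection rate, and root-cause breakdown, reduce to a few lines of aggregation once outcomes are recorded as structured events. The field names and causes below are illustrative.

```python
from collections import Counter

# Illustrative payment outcomes; in practice these would come from
# instrumented payment-processing code, not a hard-coded list.
payments = [
    {"status": "ok"}, {"status": "ok"}, {"status": "ok"},
    {"status": "rejected", "cause": "customer_error"},      # e.g. wrong card number
    {"status": "rejected", "cause": "third_party_outage"},  # e.g. PSP down
]

volume = len(payments)
rejections = [p for p in payments if p["status"] == "rejected"]
rejection_rate = len(rejections) / volume
by_cause = Counter(p["cause"] for p in rejections)

print(f"volume={volume} rejection_rate={rejection_rate:.0%} causes={dict(by_cause)}")
```

The breakdown by cause is what turns "payments are failing" into "a third party is failing, and here's the evidence."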
Good instrumentation reduces risk by catching issues early and improves decision-making by grounding discussions in facts, not hunches.
The Types of Instrumentation
Instrumentation can be classified in multiple ways. Here's an example breakdown:
System metrics: CPU usage, memory consumption, disk I/O, and network throughput.
Application metrics: Request rates, error rates, and response times.
Product metrics: System events, feature adoption, user interactions.
Business metrics: User engagement, conversion rates, revenue, volumes, and other metrics tied to business outcomes.
While each category provides unique insights, real power comes when you combine them. For instance, monitoring application metrics (like response times and errors) alongside product metrics (like feature usage) helps you directly correlate technical performance with user engagement and, ultimately, critical business metrics.
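One way to get that correlation is to emit a single event stream that carries both an application metric (response time) and a product metric (which feature was used), then summarize per feature. This is a minimal sketch with illustrative names, not a specific telemetry library.

```python
import time

# One event per request, carrying both metric categories side by side.
events: list[dict] = []

def handle_request(feature: str) -> None:
    start = time.perf_counter()
    # ... real request handling would happen here ...
    latency_ms = (time.perf_counter() - start) * 1000
    events.append({"feature": feature,        # product metric
                   "latency_ms": latency_ms}) # application metric

for f in ["search", "search", "checkout"]:
    handle_request(f)

# Per-feature view: usage count and total latency in one place.
summary: dict = {}
for e in events:
    s = summary.setdefault(e["feature"], {"uses": 0, "total_ms": 0.0})
    s["uses"] += 1
    s["total_ms"] += e["latency_ms"]

print({k: v["uses"] for k, v in summary.items()})
```

Because both numbers live on the same event, questions like "did checkout slow down when its usage spiked?" become a query rather than a cross-team investigation.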
In mature companies, such a combination often falls under reliability, which is seen as a product feature and prioritized accordingly (read more: Intro to SLA, SLO, and SLI).
For example, Google discovered that every 500 ms increase in search page load time led to a 20% drop in traffic. Amazon, likewise, found that a mere 100 ms of latency costs 1% in sales. Multiple research papers and online publications state how, e.g., loading time and corresponding progress bars influence our perception of "waiting."
The Temptation to Skip Instrumentation
Under intense pressure, instrumentation can feel like an optional "nice-to-have" – something that slows you down. After all, writing analytics events or setting up dashboards doesn't visibly push new features to customers. The assumption is that you can always add monitoring after the feature proves itself.
But skipping instrumentation is dangerous. It's like flying a plane without cockpit instruments.
Here's why:
The uncomfortable truth is that most of the features we build are only hypotheses. Neither a Product Manager nor your CEO knows if the functionality you pushed to customers is the one they truly need and are willing to pay for (read more here: The Role of Engineering in Product Model Transformation - Changing the Way Problems Are Solved).
The first version of the product is very rarely successful. Usually, you need countless iterations, fixes, and amendments, preceded by customer interviews, A/B tests, and research.
Reliable instrumentation fuels your decision-making at the system, application, and product levels. Teams that charge ahead without metrics often pay the price in nasty surprises: an outage goes unnoticed until users complain, or a costly feature that nobody actually uses gets built, and no one realizes it for months.
Amplification
Amplification is a powerful concept introduced by Gene Kim in Wiring Winning Organizations. It emphasizes the need to make problems visible early and consistently, allowing organizations to swarm, contain, and resolve issues before they escalate into larger, systemic failures.
Amplification ensures that problems are clearly identified, communicated, and resolved. Why is this so important? According to the DevOps Handbook, processes should be designed so that defects are swarmed and fixed immediately rather than passed down the value stream.
By amplifying issues and addressing them swiftly, leaders can create a culture of continuous improvement. In this culture, problems are tackled before they escalate, and teams are empowered to learn from their mistakes. This process reduces the risk of systemic failures and drives faster decision-making and better collaboration across teams.

Data Before Ideas: How Instrumentation Fuels Innovation
Instrumentation provides visibility that enables creativity with confidence. For example, Netflix, famous for its experimentation culture, runs countless A/B tests on its streaming platform. It can innovate with new features (like personalized thumbnails or playback speed options) because it has robust instrumentation to immediately tell it how those changes affect user engagement.
When Netflix introduced a new algorithm tweak, they weren't just hoping it would improve retention – they were measuring every click and play to see if it did. This data-driven trial-and-error accelerates innovation by quickly killing bad ideas and doubling down on good ones.
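The measurement loop described above, assign a user to a variant, log exposures and outcomes, compare rates, can be sketched in a few lines. This is not Netflix's actual system; the experiment name and hashing scheme are illustrative assumptions.

```python
import hashlib

# Deterministic bucketing: the same user always lands in the same arm.
def variant(user_id: str, experiment: str,
            arms=("control", "treatment")) -> str:
    h = int(hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest(), 16)
    return arms[h % len(arms)]

exposures = {"control": 0, "treatment": 0}
plays = {"control": 0, "treatment": 0}

def log_exposure(user_id: str) -> None:
    exposures[variant(user_id, "thumbnail_v2")] += 1

def log_play(user_id: str) -> None:
    plays[variant(user_id, "thumbnail_v2")] += 1

for uid in ("u1", "u2", "u3", "u4"):
    log_exposure(uid)
log_play("u1")

# Play-through rate per arm answers "did the change actually work?"
rates = {arm: plays[arm] / max(exposures[arm], 1) for arm in plays}
print(rates)
```

Hashing on `experiment:user_id` keeps assignment stable across sessions without storing any state, which is why this pattern is common in experimentation frameworks.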
The best ideas win because the metrics prove them out, not because the HiPPO (the Highest Paid Person's Opinion) 👨💼 decreed them.
In “Working Backwards”, the authors emphasize that Amazon's innovation is driven by a relentless focus on metrics and "input" instrumentation.
Amazon's process starts by defining how they'll measure success for a new initiative. Then, teams work backwards from the desired customer outcome and figure out what metrics (inputs) will lead to that outcome. Crucially, they instrument those inputs from day one. Why? Because having real-time feedback on these controllable factors lets them experiment safely.
When you can see the effect of a change immediately, you're free to try more daring ideas with less risk. If an experiment moves the needle in the wrong direction, strong instrumentation will show it quickly and clearly – allowing a fast rollback or adjustment.
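A "fast rollback when the needle moves the wrong way" can be expressed as a simple guardrail check. The 5% relative-drop threshold below is an illustrative assumption; real systems would also require statistical significance before acting.

```python
# Minimal guardrail sketch: signal a rollback when the treatment arm's
# key metric underperforms control by more than the allowed margin.
def should_roll_back(control_rate: float, treatment_rate: float,
                     max_relative_drop: float = 0.05) -> bool:
    """True when treatment drops more than `max_relative_drop` vs control."""
    if control_rate == 0:
        return False  # no baseline to compare against
    return (control_rate - treatment_rate) / control_rate > max_relative_drop

print(should_roll_back(control_rate=0.10, treatment_rate=0.08))   # 20% drop: roll back
print(should_roll_back(control_rate=0.10, treatment_rate=0.099))  # 1% drop: keep going
```

The point is that the rollback decision is mechanical once the metrics exist, so daring experiments stay cheap to undo.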
As a side note, good metrics are also part of my Problem-Solving Framework. They are essential for defining your problem before pursuing immediate solutions.
Principles of Effective Instrumentation
Instrumentation isn't a one-time task; it's a philosophy. To implement it effectively, consider these core principles, no matter the tech stack or tool:
Instrument Early and Everywhere: Don't wait until after launch to add metrics. Bake instrumentation into the initial design and development. Every feature should come with the question: "How will we know if this works and how will we detect issues?"
Measure What Matters (Not Just What's Easy): It's easy to count logins or page views, but the real power comes from measuring the right things. Focus on key actionable metrics tied to customer experience and business outcomes. Amazon calls these "controllable input metrics" – the levers you can pull that drive results.
Make Data Accessible and Actionable: Instrumentation isn't helpful if the data just sits in a silo. Ensure that dashboards, alerts, and logs are readily visible to the team. Set up regular reviews. When an alert fires or a dashboard trend dips, have clear owners who investigate. The goal is to turn raw data into decisions.
Iterate and Refine Your Metrics: Just as products evolve, your metrics should too. Treat metrics as living code – periodically ask if they're still serving their purpose. Amazon literally audits and revises its metrics over time to ensure they stay aligned with reality.
Embed Ownership and Accountability: Effective instrumentation has clear ownership. Assign team members or roles (like a "telemetry owner") to ensure instrumentation is working and data quality remains high. Netflix's example is instructive – they have clear ownership of telemetry quality with regular reviews. This means someone is thinking about whether events are firing correctly, whether logs are too noisy, or if alert thresholds need tuning.
Treat Observability as a Culture, Not a Box to Tick: Perhaps most importantly, foster a culture where instrumentation and observability are valued. This starts with engineering leadership setting the tone. Celebrate team members who catch a problem because of an alert – that positive reinforcement shows that finding issues early is a win, not a nuisance.
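Two of the principles above, instrumenting everything from day one and embedding ownership, combine naturally in a decorator that records timing, success, and an explicit owner for every call. The decorator, registry, and team names are illustrative, not a specific library's API.

```python
import functools
import time

# In-memory telemetry registry; a real system would export these records.
TELEMETRY: list[dict] = []

def instrumented(name: str, owner: str):
    """Wrap a function so every call leaves a data trail with a named owner."""
    def decorate(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            ok = True
            try:
                return fn(*args, **kwargs)
            except Exception:
                ok = False
                raise
            finally:
                TELEMETRY.append({
                    "name": name,
                    "owner": owner,  # who investigates when this metric dips
                    "ok": ok,
                    "duration_ms": (time.perf_counter() - start) * 1000,
                })
        return wrapper
    return decorate

@instrumented("checkout.submit", owner="payments-team")
def submit_order(order_id: str) -> str:
    return f"accepted:{order_id}"

submit_order("o-123")
print(TELEMETRY[0]["name"], TELEMETRY[0]["owner"])
```

Attaching the owner to the telemetry itself means an alert never fires into a void: the record says who should look.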
The Flywheel of Observability
Here's what separates winning engineering leaders from the rest: Instrumentation isn't infrastructure—it's intelligence.
For engineering leaders, the ultimate goal is to weave instrumentation into the fabric of their company's operations. This is more than just installing a monitoring tool—it's about establishing a mindset that every meaningful action should leave a data trail.
In cultivating this culture, remember that it's a journey. Early wins build momentum. Perhaps your startup begins by instrumenting key user flows and using those insights to improve onboarding. That success can justify expanding instrumentation more broadly. As the data-driven mindset takes hold, you'll see a shift: debates get settled faster because the numbers are available, and risky ideas get tested more often because you trust your monitoring to catch issues.
In effect, you create a flywheel of observability – the more you measure, the more confidently you innovate; the more you innovate, the more there is to measure.
When teams know they have visibility into outcomes, they approach innovation more like scientists and less like gamblers. It turns gut-feeling decisions into measured bets. I've seen teams shift from arguing opinions to debating which input metric to monitor... and that's a powerful upgrade in culture.
Complementary Articles
Here are more articles from Practical Engineering Management that can help you implement a culture of observability: