Managing Complexity of Software Systems
You Cannot Create or Remove Complexity. You Can Only Shift It.
It’s already a tradition for AWS to host its annual conference, re:Invent, in the last quarter of the year. Like Google I/O, Apple WWDC, and others, it’s a time to present the latest advancements in the company’s portfolio, showcase use cases, customer success stories, and more.
There is a particular presentation from AWS I always await with impatience—Werner Vogels’ keynote. I view Amazon’s CTO presentations primarily as philosophical considerations in computer science (yes, there is also a sales pitch element, but I find it minimal).
In 2022, Vogels discussed an asynchronous world, mainly around event-driven architectures. In 2023, he touched on cost as part of a product’s Non-Functional Requirements and software architecture (it brought us great content available here: thefrugalarchitect.com).
This year, Werner Vogels walked us through the complexity of engineering systems, which is the subject I’d like to explore today.
Here, you can find a PDF cheat sheet summarizing this article.
Engineers Manage Complexity
Let’s zoom out. As repeatedly stated here, engineering's role is to solve customers’ problems. We engineers build products to make things possible or easier. It’s as simple as that.
The complexity of our lives is reduced by technology. The First Industrial Revolution harnessed steam, coal, and railway networks to reduce the complexity of moving around. The Second Industrial Revolution built on that, adding electricity, mass production, and the automotive industry, so traveling, transportation, and communication became even more straightforward. The Third Industrial Revolution brought the digital age: computation, IT, and high-tech communication, enabling the true globalization of information. The Fourth Industrial Revolution democratizes access to data and adds various forms of “intelligence” to that.
Each of these advancements makes things less complex for us people.
Tesler’s Law of Conservation of Complexity
Although today you can send a text message across the globe at the speed of light, it’s not entirely true that the complexity of such an operation was reduced. In reality, it was just moved somewhere else.
Your smartphone has computational power orders of magnitude greater than the guidance computer that flew on the Apollo missions. Your message travels through satellites or fiber cables under the sea. Such systems are anything but simple. What happened here is that complexity was moved from customers to service providers.
According to Tesler’s Law, any system has a certain amount of complexity that cannot be reduced. If that’s true, we can only shift complexity elsewhere, taking this burden away from our users.
Explore Tesler’s Law and other laws of UX in this excellent resource: Laws of UX.
“Complexity Can Neither Be Created nor Destroyed, Only Moved Somewhere Else.”
In his keynote, Werner Vogels splits complexity into two categories: Intended and Unintended.
It’s crucial to distinguish between these two because the former is necessary for system functionality and growth—it’s exactly about shifting the complexity burden from customers to us. The latter, unintended complexity, often arises from ad hoc changes and a lack of architectural oversight. It can hinder progress and maintainability, making our system more fragile.
From my experience in the fintech world, making a money transfer between two countries can be as simple as scanning a fingerprint in Google Pay. But under the hood, we have banks, card issuers, 3DS security checks, payment gateways, KYC processes, and a network of payout partners. It’s a highly complex system, but when built correctly, it can remove 90% of these difficulties from the end user.
But engineers take shortcuts: they hardcode values, rush delivery, skip architectural oversight, and don’t write tests or automate their work. It leads to declining feature velocity, more errors and support tickets, and tons of debugging.
This is unintended complexity, or as we often call it—technical debt.
Complexity Warning Signs
In his presentation, Werner Vogels highlights several signs of unintended complexity:
Declining feature velocity
Frequent escalations
Time-consuming debugging
Excessive codebase growth
Inconsistent patterns
Dependencies everywhere
Undifferentiated work
Many of these points may sound familiar, especially if you have worked in an organization for an extended time.
As systems become increasingly complex, implementing new features becomes more challenging. This leads to more errors and issues, resulting in a surge of escalations and support tickets. Because you spend more time debugging and fixing issues, the work becomes undifferentiated for customers and stakeholders.
This isn’t a one-way street. This situation requires intervention and can be improved. Just start with good telemetry. It could be DORA’s deployment frequency, change fail rate, and others—or some SLIs and SLOs. While one-time measures aren’t always insightful, tracking these metrics over time will help you understand the rise of complexity.
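As a rough illustration of tracking such metrics over time, here is a minimal Python sketch that computes deployment frequency and change fail rate for a date window. The Deployment record and both function names are assumptions made for this example, not part of any DORA tooling; in practice the data would come from your CI/CD system.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Deployment:
    """Hypothetical record of one production deployment."""
    day: date             # when the deployment reached production
    caused_failure: bool  # True if it needed a rollback or hotfix


def deployment_frequency(deployments, start, end):
    """Average number of deployments per week within [start, end]."""
    in_range = [d for d in deployments if start <= d.day <= end]
    weeks = ((end - start).days + 1) / 7
    return len(in_range) / weeks


def change_fail_rate(deployments, start, end):
    """Share of deployments within [start, end] that caused a failure."""
    in_range = [d for d in deployments if start <= d.day <= end]
    if not in_range:
        return 0.0
    return sum(d.caused_failure for d in in_range) / len(in_range)
```

Calling both functions for consecutive windows (for example, month by month) and comparing the results is what turns a one-time measurement into a trend.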
Once you know something is off, it’s worth doing some classification to pinpoint the exact causes. You can either use the Ten Types of Software Engineering Waste or the Ten Types of Technical Debt from Google or ThoughtWorks.
"Simplicity Requires Discipline"
Complexity can be addressed not only reactively but also proactively. Some practices will help you manage it more effectively, but it requires continuous and conscious effort.
Here are the highlights from the keynote:
Intentional Design from Day One: Even if a system starts simple, consider its future evolution from the initial design phase. During the presentation, Canva’s CTO, Brendan Humphreys, discussed how they initially launched a stateless monolith. This approach allowed them to get to market quickly while maintaining a clear path to scale. They modeled the monolith around key entities and encapsulated each with a service interface, which later became a good foundation for a distributed microservices architecture.
Resisting the Temptation of Quick Fixes: Examples include pushing changes without additional tests, endlessly extending existing components until they turn into the “mega-service” anti-pattern, or taking the “we’ll fix it later” approach (read more: Optimizing Your 20% Time For Tech Debt). While these may seem like quick wins or low-hanging fruit, when accumulated, they lead to unmanageable complexity. You must be disciplined when dealing with these temptations.
Investing in Manageability Upfront: The time spent running a system far exceeds the time spent building it. When planning your work, remember that ownership is much broader than delivery (more here: You Build It, You Run It). Don’t ignore critical infrastructure for your work, such as proper telemetry, alerting, documentation, and fast CI/CD processes.
Maintaining simplicity is not a one-time task but rather a discipline that needs to be embedded in the engineering culture and practices. At the very least, as an engineering leader, define some basic Non-Functional Requirements that are non-negotiable when doing engineering work.
How to Manage Complexity - Amazon’s Principles
According to Amazon, managing complexity can be distilled into 6 Lessons in Simplexity:
Make Evolvability a Requirement
Break Complexity into Pieces
Align Organizations with Architecture
Remove Uncertainty
Automate Complexity
Embrace Time as a Building Block
1. Make Evolvability a Requirement
Remember that your product’s first push is only the beginning. There will be endless iterations and adjustments and, hopefully, constant growth. Ensure that your initial design accommodates at least some future changes. This can be done through modular architecture or simple and fast deployment processes. For inspiration, see Simplification Through Incrementalization.
2. Break Complexity into Pieces
No single person fully understands every detail of how an iPhone is built. The only way to make it possible to build such complex systems is to decompose them into modules designed to work together but also be developed and understood in isolation.
There are many different approaches to modularization—microservices, modular monoliths, plug-in architectures, and others. The important part is to:
Decompose complexity into smaller modules or business domains
Make module development independent
Define clear interfaces and boundaries for each module
One way to achieve this is through Domain-Driven Design. For more, see my article "Simplification Through Modularization."
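The three points above can be sketched in a few lines of Python. This is only an illustration under assumed names: ExchangeRates, FixedRates, and quote_transfer are hypothetical and stand in for two modules of a payments product. The point is that the “transfers” code depends only on a declared interface, so the “rates” module can be developed and replaced independently.

```python
from dataclasses import dataclass
from typing import Protocol


class ExchangeRates(Protocol):
    """Public boundary of a hypothetical 'rates' module."""

    def rate(self, source: str, target: str) -> float: ...


class FixedRates:
    """One possible implementation; callers never touch its internals."""

    def __init__(self, table: dict[tuple[str, str], float]) -> None:
        self._table = table

    def rate(self, source: str, target: str) -> float:
        return self._table[(source, target)]


@dataclass(frozen=True)
class TransferQuote:
    amount_out: float
    currency_out: str


def quote_transfer(
    rates: ExchangeRates, amount: float, source: str, target: str
) -> TransferQuote:
    """Hypothetical 'transfers' module: relies only on the interface."""
    converted = round(amount * rates.rate(source, target), 2)
    return TransferQuote(converted, target)
```

Swapping FixedRates for an implementation that calls a remote pricing service would not require any change to quote_transfer, which is the practical meaning of a clear module boundary.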
3. Align Organizations with Architecture
Conway’s Law states: “Any organization that designs a system will produce a design whose structure is a copy of the organization’s communication structure.”
Once you decide to modularize your system, ensure the team structure mirrors that. It will help with ownership practices and reduce cross-team dependencies.
An excellent resource for designing the organization is Team Topologies.
4. Remove Uncertainty
Another way to manage complexity is, unsurprisingly, to reduce the potential for unexpected issues and to simplify management. It means ensuring that data processing and behavior are predictable. For example, if writing tests for your software is difficult, it might be due to too much uncertainty. Treat that as a bug to fix.
This is also about not overcomplicating your solutions. One nice example from Vogels’ keynote was the network configuration for Amazon S3. Yes, it would be possible to implement it with an event-based approach, making the system react to every occurrence independently. But was it really needed? Does the system need to reconfigure itself every second? Probably not.
Here, they adopted a “Constant Work” pattern, where the configuration is periodically reloaded by pulling the full configuration file and constantly starting from a clean slate. This creates a self-healing mechanism because even if you provide a broken configuration, it’ll be fixed by the next pass.
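To make the idea concrete, here is a minimal Python sketch of a periodic full reload. It is not how S3 implements it; the JSON file format, the function names, and the 10-second interval are all assumptions chosen for illustration. One design choice to note: when the file cannot be parsed, this sketch keeps the last good state and lets the next pass retry.

```python
import json
import time
from pathlib import Path


def load_full_config(path: Path) -> dict:
    """Read and parse the whole configuration file on every pass."""
    config = json.loads(path.read_text())
    if not isinstance(config, dict):
        raise ValueError("top-level configuration must be a JSON object")
    return config


def apply_config(config: dict, state: dict) -> None:
    """Rebuild state from a clean slate; nothing from the last pass survives."""
    state.clear()
    state.update(config)


def run_once(path: Path, state: dict) -> bool:
    """One pass of the loop. Returns False if the file could not be applied."""
    try:
        config = load_full_config(path)
    except (OSError, ValueError):
        return False  # keep the last good state; the next pass retries
    apply_config(config, state)
    return True


def run_forever(path: Path, state: dict, interval_seconds: float = 10.0) -> None:
    """Do the same amount of work on every tick, changed or not."""
    while True:
        run_once(path, state)
        time.sleep(interval_seconds)
```

Because every pass replaces the entire state, there is no delta logic to get wrong, and a corrected file takes effect on the next tick without any special recovery path.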
5. Automate Complexity
Automation is key to managing complexity at scale. Werner Vogels promotes an approach in which, rather than asking, “What else can we automate?”, you should go all in and exclude only those cases where human input is critical.
Good automation is a baseline for practices like Continuous Deployment. One critical area related to automation is your approach to testing (read more: Do You Need More Testers or Better Tests?).
6. Embrace Time as a Building Block
Werner Vogels’ last principle focused heavily on Amazon’s Time Sync Service and related products. I don’t have deep hands-on experience with this topic, but let’s at least scratch the surface.
Today, we know time is relative. Time dilation is the difference in elapsed time measured by two clocks that move relative to each other or sit at different gravitational potentials. For example, GPS satellites must account for relativistic effects on time, meaning corrections must be applied for proper synchronization.
Achieving precise time synchronization is difficult and has introduced additional complexity in distributed systems, where algorithms must implement conflict resolution or distributed transactions.
The availability of Amazon’s highly accurate and synchronized clocks helps make these traditionally complex problems significantly simpler.
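One way to see why accurate clocks simplify things: if every timestamp comes with a known worst-case error, two events can be ordered directly whenever their uncertainty intervals do not overlap. The Python sketch below illustrates that general idea only. It is not Amazon’s API, and BoundedTimestamp, definitely_before, and pick_winner are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class BoundedTimestamp:
    """A clock reading plus an assumed worst-case error bound."""
    reading_us: int  # local clock reading, in microseconds
    error_us: int    # maximum clock error, in microseconds

    @property
    def earliest(self) -> int:
        return self.reading_us - self.error_us

    @property
    def latest(self) -> int:
        return self.reading_us + self.error_us


def definitely_before(a: BoundedTimestamp, b: BoundedTimestamp) -> bool:
    """True only if a happened before b even in the worst case."""
    return a.latest < b.earliest


def pick_winner(
    a: BoundedTimestamp, b: BoundedTimestamp
) -> Optional[BoundedTimestamp]:
    """Last-writer-wins; None when the intervals overlap and order is unknown."""
    if definitely_before(a, b):
        return b
    if definitely_before(b, a):
        return a
    return None
```

The smaller the error bound, the less often pick_winner returns None, and the less often a system has to fall back to heavier coordination to resolve a conflict.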
End Words
Understanding that complexity cannot be eliminated—only moved—and applying principles such as modularization, organizational alignment, and intentional design can help us better manage the challenges we face as engineers.
In the end, managing complexity is not a one-time event but a disciplined practice woven into the fabric of our engineering culture.
For better insight, I highly recommend watching this year’s (and previous years’) re:Invent keynotes by Werner Vogels.
This was a great read, especially now that a lot of people claim that by using “x” you reduce complexity. It feels good to see these different perspectives. I also wrote a piece about complexity handling, more or less on the same principle, but I like to use a noise-canceling analogy to explain how complexity appears and how you can handle it.
https://buildsimple.substack.com/p/the-conservation-of-complexity-an?r=66p49
A nice overview, Mirek. I watched it also. Along with these things, I liked the part on operating complex systems by Andy Warfield, where he talked about some principles they employ to manage S3, such as challenging everything and focusing on ownership.