What to Blame: Infrastructure or Code?
- David Peček
- Feb 3, 2020
Updated: Sep 13, 2020

I want to bring this up because I keep seeing a common trend in troubleshooting: the infrastructure supporting an application gets blamed for the application's issues. I will admit it can be easy to blame the infrastructure first, since it is one of the easiest things to change. However, doing so often costs more than fixing the root cause of the issue.
It's important to take a balanced approach to troubleshooting and look at all of the symptoms before requesting changes to infrastructure.
Weigh all the Data
Consider some of these factors when troubleshooting an issue to see where the problem lies:
Application memory usage. Is the container the application runs in large enough to support what it needs? Does the application have a memory leak that causes it to use an excessive amount of its allotted memory? Conversely, is the application allotted too much memory? That can cause issues too.
Database size and performance. When you see exhausted connection pools or high CPU usage on a database, there may be other factors at play. Why are the applications making so many connections? Are the tables optimized for the types of queries you are running? Are connections being closed correctly?
Larger instance or more nodes? When a clustered or load-balanced application is crashing due to resource constraints, consider the characteristics of the application: would it benefit from more instances to spread the load, or would moving to the next instance size help more?
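To make the "are connections being closed correctly?" question concrete, here is a minimal Python sketch of one way to guarantee a connection is released. It assumes the standard-library sqlite3 driver purely for illustration; the function name fetch_user_count and the users table are hypothetical, and your own driver or connection pool may offer a different mechanism.

```python
import sqlite3
from contextlib import closing


def fetch_user_count(db_path: str) -> int:
    """Run one query and always release the connection afterwards.

    Hypothetical helper for illustration; the `users` table is an assumption.
    """
    # closing() calls conn.close() on exit, even if the query raises.
    # With sqlite3, a plain `with conn:` only manages the transaction
    # and leaves the connection open, which is an easy way to leak one.
    with closing(sqlite3.connect(db_path)) as conn:
        with closing(conn.cursor()) as cur:
            # Create the table so the sketch runs on an empty database.
            cur.execute(
                "CREATE TABLE IF NOT EXISTS users (id INTEGER PRIMARY KEY)"
            )
            cur.execute("SELECT COUNT(*) FROM users")
            return cur.fetchone()[0]
```

If code paths like this one skip the close step when an exception is thrown, the database can run out of connections no matter how large the instance is, so it is worth checking before asking for a bigger one.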
When Infrastructure is the Answer
These are the circumstances in which I have seen infrastructure be the answer to an issue.
Increased transactional traffic through a system. When you are looking at the natural growth of a system, it is logical to upgrade a database instance or add more nodes to the cluster. If possible, use auto scaling on your clusters so that resources are released when they are no longer needed.
More applications accessing resources. As you continue to add services to your stack, the extra volume of requests can strain messaging services and databases. At this point it makes sense to upgrade these resources to handle the additional volume.
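As a rough illustration of how auto scaling can follow traffic in both directions, the sketch below sizes a cluster in proportion to its load. It is an assumption-laden example, not any particular provider's implementation: the function name desired_nodes, the use of average CPU as the metric, and the default limits are all hypothetical.

```python
import math


def desired_nodes(
    current_nodes: int,
    current_cpu_pct: float,
    target_cpu_pct: float,
    min_nodes: int = 1,
    max_nodes: int = 10,
) -> int:
    """Proportional scaling rule (illustrative only).

    Scale the node count by how far the observed average CPU is from
    the target, then clamp to the allowed range.
    """
    raw = math.ceil(current_nodes * current_cpu_pct / target_cpu_pct)
    # Clamp so a traffic spike cannot scale without bound and a quiet
    # period cannot scale below the minimum.
    return max(min_nodes, min(max_nodes, raw))


# Four nodes at 90% CPU with a 60% target: scale out to six.
print(desired_nodes(4, 90, 60))  # 6
# The same four nodes at 20% CPU: scale in to two.
print(desired_nodes(4, 20, 60))  # 2
```

The scale-in case is the part that saves money: once traffic drops, the extra nodes go away instead of staying provisioned.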