In control theory, observability is a measure of how well the internal states of a system can be inferred from knowledge of its external outputs.
In software, observability is our ability to know and discover what is going on in our systems. It helps us to have a holistic view and deep understanding of our systems, identify issues faster, understand what caused the issue, and ultimately offer better customer experiences.
As systems grow more complex, the number of things that can go wrong grows too. Often, we find ourselves looking for answers to questions that are different from yesterday's. This increasing complexity is why observability is so important and necessary today: an observable system allows us to ask any question at any point in time and helps us find our way from effect to cause.
Observability helps us to understand what’s slow, what needs to be optimized, when an error or an issue happens, and more importantly why.
An observable system can also tell us so many things, like:
So it can also help us answer questions about our users, validate (or invalidate) our ideas, and make decisions.
In other words, observability can give us a much deeper, shared understanding of our systems and of what needs a quick response.
Observability focuses on asking any question about how the system works. That means we need to start asking questions and gather good data to be able to answer them.
Traditionally, observability relies on a combination of telemetry data: metrics, logs, and traces (also referred to as the “three pillars of observability”).
These are not the only sources of information, but they are usually the main ones. The important thing is to decide what is valuable and relevant for your systems. The next step is to correlate these different sources so that we can use them to answer our questions quickly. For example, a unique request ID can tie together all the context from a user’s request at a specific point in time, such as the moment when the user complained but your monitors said everything was fine.
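To make the request ID idea concrete, here is a minimal sketch in Python using only the standard library. It assumes a hypothetical `checkout` service; the names `request_id_var`, `handle_request`, and `events_for` are made up for illustration and do not come from any particular observability tool. Every log line emitted while a request is being handled carries that request's ID, so all of its events can be pulled back out later.

```python
import contextvars
import io
import json
import logging
import uuid

# Hypothetical context variable holding the ID of the request in progress.
request_id_var = contextvars.ContextVar("request_id", default="-")


class RequestIdFilter(logging.Filter):
    """Copy the current request ID onto every log record."""

    def filter(self, record):
        record.request_id = request_id_var.get()
        return True


class JsonFormatter(logging.Formatter):
    """Emit one JSON object per line so events are easy to query."""

    def format(self, record):
        return json.dumps({
            "level": record.levelname,
            "request_id": record.request_id,
            "message": record.getMessage(),
        })


def build_logger(stream):
    """Create a logger (named 'checkout' for illustration) writing to stream."""
    logger = logging.getLogger("checkout")
    logger.handlers.clear()
    logger.setLevel(logging.INFO)
    logger.propagate = False
    handler = logging.StreamHandler(stream)
    handler.addFilter(RequestIdFilter())
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    return logger


def handle_request(logger, user):
    """Handle one request; everything logged inside shares one request ID."""
    request_id = uuid.uuid4().hex
    token = request_id_var.set(request_id)
    try:
        logger.info("request started for %s", user)
        logger.info("payment authorized")
        return request_id
    finally:
        # Restore the previous value so IDs never leak between requests.
        request_id_var.reset(token)


def events_for(stream, request_id):
    """Return every logged event that belongs to the given request."""
    events = [json.loads(line) for line in stream.getvalue().splitlines()]
    return [event for event in events if event["request_id"] == request_id]
```

When a user reports a problem, the ID attached to their request is enough to retrieve everything the system recorded about it, for example `events_for(stream, request_id)`. In a real system the same ID would typically also be attached to traces and propagated to downstream services, which this sketch does not attempt.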