Much is being written about how our applications and systems should be observable. Of course, with something as new and popular as Observability, there are many different definitions for it. A review of those definitions, and my contribution to them, can be found in Observability is About the Data.

Gaps in Observability

Applications are not Observable

The first gap in Observability is with applications. Much of what is being written about Observability states that if you do things in a certain way, or have a certain kind of data, then your system is Observable. Most of this writing focuses on whether logging, metric collection, or monitoring agents inserted into the application lead to Observability. This leads to the first point: applications are not observable unless you take steps to make them so. There is no standard API for applications like there is for software infrastructure (JMX, for example). Therefore the developers of every commercial packaged application have to build the instrumentation into their application, and the developers of every custom-developed application have to either build in the instrumentation or deploy an appropriate APM tool.
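
To make "building in the instrumentation" concrete, here is a minimal sketch of what hand-written instrumentation might look like in a custom application. It uses only the Python standard library. The names METRICS, instrumented, checkout, and summary are hypothetical and chosen purely for illustration; a real application would more likely ship these measurements to a metrics backend or rely on an APM agent.

```python
import time
from collections import defaultdict
from functools import wraps

# Hypothetical in-process metric store: metric name -> list of call durations (seconds).
METRICS = defaultdict(list)


def instrumented(name):
    """Decorator that records the wall-clock duration of every call under `name`."""
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return func(*args, **kwargs)
            finally:
                # Record the duration even if the wrapped function raises.
                METRICS[name].append(time.perf_counter() - start)
        return wrapper
    return decorator


@instrumented("checkout")
def checkout(order_total):
    # Stand-in for real business logic (assumed 8% tax, for illustration only).
    return round(order_total * 1.08, 2)


def summary(name):
    """Return the call count and mean latency recorded for one metric."""
    samples = METRICS[name]
    mean = sum(samples) / len(samples) if samples else 0.0
    return {"count": len(samples), "mean_seconds": mean}
```

Calling checkout(100.0) and then summary("checkout") would report one recorded call and its latency. The point of the sketch is that none of this data exists until a developer writes code like it, function by function.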

Networks are not Observable

While every physical network device and every virtual network service exposes metrics that tell you something about how the device or service is working, all of the data from the network layer is completely lacking in application and transaction context. The root cause of this problem is that it is currently impossible to know through which network devices or services a particular transaction or set of microservice interactions is flowing. NetFlow is the closest thing that the networking vendors have provided, but it only provides the source and destination IP addresses, the ports, and the amount of traffic, which again completely lacks application and transaction context.

Therefore, if the goal is to make an entire system observable, the network represents a huge gap in this endeavor.
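
The loss of context can be illustrated with a small sketch. The FlowRecord type below is a deliberately simplified stand-in that holds only the fields described above (addresses, ports, traffic volume); it is not the actual NetFlow record format, and the request dictionaries and the to_flow helper are hypothetical. Two different business transactions between the same endpoints collapse into records that cannot be told apart.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FlowRecord:
    # Simplified illustration: only the fields the text attributes to flow data.
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    bytes_sent: int


def to_flow(request):
    """Project a hypothetical application request onto what a flow record keeps."""
    return FlowRecord(
        src_ip=request["src_ip"],
        dst_ip=request["dst_ip"],
        src_port=request["src_port"],
        dst_port=request["dst_port"],
        bytes_sent=request["bytes_sent"],
    )


# Two different transactions over the same connection (made-up example values).
login = {"transaction": "login", "src_ip": "10.0.0.5", "dst_ip": "10.0.1.9",
         "src_port": 50122, "dst_port": 443, "bytes_sent": 2048}
payment = {"transaction": "payment", "src_ip": "10.0.0.5", "dst_ip": "10.0.1.9",
           "src_port": 50122, "dst_port": 443, "bytes_sent": 2048}

# The application can tell these apart; the flow view cannot.
indistinguishable = to_flow(login) == to_flow(payment)
```

Here indistinguishable evaluates to True: the "transaction" field has nowhere to go in the flow record, so the network layer cannot say which business operation the traffic belonged to.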

The Public Cloud is not Observable

If the goal is true end-to-end Observability through the entire stack, then we have to know what hardware our applications and services are running on, what level of contention exists for the CPU, memory, network, and disk resources associated with that hardware, and what else is using that hardware and causing that contention. This level of visibility into the operation of the underlying environment is simply not provided by the three major public cloud vendors (AWS, Azure, or Google). This is the principal reason why many enterprises choose to run VMware vSphere, either on their own hardware or on hardware rented from the three major cloud providers: in these cases the behavior and metrics of all of the virtual and physical objects are exposed through the vCenter API.

Summary

Full-stack Observability is currently impossible due to the inherent lack of Observability in applications and the gaps in Observability that exist in networks and public clouds.