Sunday, October 01, 2006

The Irrelevance Of Elements

An executive (who will remain nameless to protect the guilty) once remarked to me, "Response time is an over-hyped metric. If you monitor all of the elements of an application system with a sufficient level of frequency and granularity, you will detect things that can cause response time problems before they actually do cause response time problems."

I could not disagree more. Monitoring every element of an application system (all of the hardware and all of the software) is both a hopelessly complex task and an increasingly irrelevant one. Let's deal with the complexity first. Exactly why did Enterprise Management Frameworks fail (or, if you don't think they have "failed," why are they hated by the people who pay for them and use them)? The answer: it is a hopelessly complex undertaking for a single management product to keep up with how fast every aspect of the hardware and software industry evolves. For example, most management frameworks do not do a good job with wireless networks, VoIP, .NET-based applications, or virtualized servers. Why? Because these technologies are all too new to have been properly integrated into the agents for the various platforms on which they run.

On to the point about irrelevance. The infrastructure (network, server hardware, and systems software) is becoming so redundant that the failure of any single element often simply does not matter. Traffic can either be routed around the failure, or the workload can be shifted somewhere else (for example, to another server in a load-balanced farm).
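To put a rough number on why a single failed element matters less in a redundant setup, here is a minimal sketch of the textbook parallel-availability formula, 1 - (1 - a)^n. It assumes that server failures are independent and that any one surviving server can carry the whole load; both are simplifying assumptions, and the function name is hypothetical.

```python
def farm_availability(server_availability: float, servers: int) -> float:
    """Probability that at least one server in a load-balanced farm is up.

    Assumes (hypothetically) independent failures and that a single
    surviving server can absorb the entire workload.
    """
    if not 0.0 <= server_availability <= 1.0:
        raise ValueError("availability must be between 0 and 1")
    if servers < 1:
        raise ValueError("need at least one server")
    # The farm is unavailable only if every server is down at the same time.
    return 1.0 - (1.0 - server_availability) ** servers


if __name__ == "__main__":
    for n in (1, 2, 3):
        print(n, round(farm_availability(0.99, n), 6))
```

With servers that are each up 99% of the time, one, two, and three servers give roughly 0.99, 0.9999, and 0.999999, which is why a red light on any one box frequently says little about what users actually experience.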

The last point is that monitoring the elements of the system is not only irrelevant but also a misguided reaction to how the infrastructure has evolved. Because the frameworks are not doing the job, each group responsible for a supporting technology (the database server team, for example) gets its own tool. This is good for point-tool vendors, but it does nothing to help ensure a good user experience. In fact, having each team monitor its own slice of the application system makes the problem worse: people either waste time on things that do not matter, or they fool themselves (and their management) into thinking that everything is fine because their point product shows all green lights.

Perhaps it is time to rethink the problem entirely. Let's start with what matters: the ability of the system's users to do their jobs. Let's measure that, and when it degrades, figure out why. Nothing else matters, does it?
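As a minimal sketch of "measure that, and when it is degraded figure out why," the code below times a user transaction and flags degradation when a chosen percentile of response times exceeds a threshold. The function names, the 2-second threshold, the 90th percentile, and the stand-in transaction are all hypothetical choices for illustration, not a method prescribed here; the "figure out why" step is left to the people investigating.

```python
import statistics
import time
from typing import Callable, List


def time_transaction(transaction: Callable[[], None]) -> float:
    """Run one user transaction and return its wall-clock duration in seconds."""
    start = time.perf_counter()
    transaction()
    return time.perf_counter() - start


def is_degraded(samples: List[float], threshold_s: float, percentile: int = 90) -> bool:
    """Return True when the given percentile of response times exceeds threshold_s.

    threshold_s and percentile are illustrative values, not recommendations.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    if not 1 <= percentile <= 99:
        raise ValueError("percentile must be between 1 and 99")
    # quantiles(..., n=100) returns the 99 cut points for the 1st..99th percentiles.
    cut_points = statistics.quantiles(samples, n=100)
    return cut_points[percentile - 1] > threshold_s


if __name__ == "__main__":
    def sample_transaction() -> None:
        # Hypothetical stand-in for a real user action, such as submitting an order.
        time.sleep(0.01)

    samples = [time_transaction(sample_transaction) for _ in range(20)]
    if is_degraded(samples, threshold_s=2.0):
        print("Users are waiting too long: time to find out why.")
    else:
        print("Response time is within the threshold.")
```

Using a percentile rather than an average keeps a handful of very slow transactions from being hidden by many fast ones, which is closer to what an individual user actually feels.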

Bernd Harzog
CEO
End-User-Experience.com