NETSCOUT, QUANTIVA and the Role of Statistics in APM
NETSCOUT SYSTEMS TO ACQUIRE QUANTIVA
One of the difficult things about implementing an effective APM solution (one that accurately measures all user transactions and accurately points the finger at the real cause of the problem) is that the things that can go wrong are both numerous and greatly varied in character. Furthermore, problems with end user experience often lack any kind of repeatable pattern in their cause (it is something different every time).
There are two basic approaches that one can take to figuring out what went wrong with application service level or end user experience. The first approach is a deterministic one. This means that the APM product has to know all of the interactions that an application has with its own components and its supporting infrastructure. Examples of deterministic approaches that work are products from Wily Technology, Symantec/Veritas, Quest, Tivoli, Identify Software, and AviCode, all of which measure performance at the web server, time how long each module of code takes to do its job, and then point out the guilty line of code. This works just fine when the application is written to J2EE or .Net and there is specific support in the code inspection tool for applications written to that infrastructure.
There is a far larger problem of how to do APM when the application includes pieces that may not be J2EE or its Windows equivalent (previously COM+ and now .Net), and the infrastructure for the application includes many different types of servers. The problem is also expanded in complexity once you include the LAN and WAN components as part of the application infrastructure.
The second approach is a statistical one. In order to deal with both the complexity of application systems and the fact that each problem tends to have a unique cause, several vendors have deployed rich and complex statistical analysis of the data that is collected by both their own APM systems and by agents from other vendors. In the late 1990s CA tried this with Neugents (and failed), and ProactiveNet and Netuitive tried this with their technology. Today, ProactiveNet focuses upon predicting performance problems and is selling its product as a statistical overlay on top of existing infrastructure management products from HP, CA, IBM/Tivoli, NetIQ and BMC. Netuitive is focusing its statistics technology upon analyzing alarms from BMC and NetIQ to determine which ones are valid (alarm reduction). VIEO employs statistical analysis in its APM solution for the Citrix market, which helps its offering do problem determination for applications hosted in Citrix MetaFrame environments.
Augmenting deterministic features in an APM offering with statistical capabilities allows these vendors to bring the advantages of statistical analysis to their customers. Some of the advantages of statistics in APM are:
- A statistics engine can learn normal behavior patterns, and can stop alerts related to "normal busyness" from flooding the administrators of systems. So, if a particular application system is busy at 10 AM and 2 PM on weekdays, and CPU regularly spikes to 70% on application servers during those times, admins can be spared a large number of "false positive" alerts.
- A statistics engine can detect emerging correlations between degradations in performance (user experience) and resource bottlenecks somewhere in the application system. The ability to figure out that contention for a particular resource on a particular server is the cause of a performance problem is probably the single most valuable aspect of applying statistics to APM. However, this is only true if the statistical engine can dynamically and automatically evolve its model as the behavior of the system changes.
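To make the two advantages above concrete, here is a minimal Python sketch. It is not how any of the vendors named here implement their products; it simply assumes a per-weekday, per-hour baseline of CPU readings and a plain Pearson correlation between response time and each resource metric. All function names, metric names, and thresholds (`k`, `min_band`, `static_threshold`) are hypothetical choices made for illustration.

```python
from collections import defaultdict
from statistics import mean, pstdev


def learn_baseline(samples):
    """Build {(weekday, hour): (mean, stdev)} from (weekday, hour, cpu_percent) samples."""
    buckets = defaultdict(list)
    for weekday, hour, cpu in samples:
        buckets[(weekday, hour)].append(cpu)
    return {slot: (mean(vals), pstdev(vals)) for slot, vals in buckets.items()}


def is_anomalous(baseline, weekday, hour, cpu, k=3.0, min_band=5.0, static_threshold=90.0):
    """Flag a reading only if it falls outside the band learned for that time slot."""
    slot = (weekday, hour)
    if slot not in baseline:
        # No history for this slot: fall back to a fixed threshold (assumption).
        return cpu > static_threshold
    mu, sigma = baseline[slot]
    # min_band keeps the band from collapsing when the history is very steady.
    return abs(cpu - mu) > max(k * sigma, min_band)


def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 if either series is constant."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / (vx * vy) ** 0.5


def rank_suspects(response_times, resource_metrics):
    """Order resource metrics by how strongly they move with response time."""
    scores = {name: pearson(response_times, series)
              for name, series in resource_metrics.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Monday (weekday 0): busy at 10 AM, quiet at 3 AM.
    history = [(0, 10, 70), (0, 10, 72), (0, 10, 68), (0, 3, 10), (0, 3, 12), (0, 3, 8)]
    baseline = learn_baseline(history)
    print(is_anomalous(baseline, 0, 10, 71))  # the usual 10 AM load
    print(is_anomalous(baseline, 0, 3, 70))   # the same level of load at 3 AM
    print(rank_suspects([100, 120, 300, 110],
                        {"db_disk_queue": [1, 2, 9, 1], "web_cpu": [50, 50, 50, 50]}))
```

The point of the sketch is that the same 70% CPU reading is treated as normal at 10 AM and as suspicious at 3 AM, and that the resource whose behavior tracks the response-time degradation rises to the top of the suspect list. Correlation alone does not establish cause, so a ranking like this is a starting point for investigation rather than a verdict.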
On the other hand, some implementations of statistical systems have issues that constrain their usefulness and increase the cost and complexity of their use. These issues are:
- Some statistical systems need regular daily patterns of utilization to learn and establish a baseline of normal operations. This is fine for a stock trading web site, but may be a deal killer for many internal applications that have much more varied usage patterns.
- Some statistical systems need an up-front training period to build a model (this was the flaw in the neural network technology that was the basis of CA Neugents), and this model must be rebuilt whenever demand and resource utilization patterns change.
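For contrast with a model that must be trained up front and rebuilt when patterns change, the sketch below shows one simple way a baseline can keep adjusting itself: an exponentially weighted moving average and variance, updated on every sample. This technique is an illustration chosen here, not something attributed to Quantiva or any other vendor mentioned, and the class name and the parameters `alpha`, `k`, and `warmup` are hypothetical.

```python
class EwmaBaseline:
    """Online baseline that adapts with every sample (illustrative sketch)."""

    def __init__(self, alpha=0.1, k=3.0, warmup=5):
        self.alpha = alpha    # weight given to the newest sample
        self.k = k            # width of the normal band, in standard deviations
        self.warmup = warmup  # samples to observe before flagging anything
        self.count = 0
        self.mean = None
        self.var = 0.0

    def update(self, value):
        """Return True if value is outside the current band, then fold it into the model."""
        self.count += 1
        if self.mean is None:
            self.mean = float(value)
            return False
        deviation = value - self.mean
        anomalous = (
            self.count > self.warmup
            and self.var > 0
            and abs(deviation) > self.k * self.var ** 0.5
        )
        # Exponentially weighted updates of the mean and variance.
        self.mean += self.alpha * deviation
        self.var = (1 - self.alpha) * (self.var + self.alpha * deviation ** 2)
        return anomalous


if __name__ == "__main__":
    baseline = EwmaBaseline()
    for cpu in [50, 52, 48, 51, 49, 50, 52, 48]:
        baseline.update(cpu)
    print(baseline.update(95))  # a sudden jump away from the learned level
```

A model like this flags the first readings after a jump, then absorbs the new level if it persists, with no separate retraining step. The trade-off is that a slow degradation can be absorbed in the same way, which is one reason the fit between the statistical technique and the application's real demand patterns needs to be evaluated.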
The acquisition of Quantiva by Netscout is another example of how rich statistical analysis can be applied to large amounts of data in order to find patterns and speed the problem resolution process. Netscout already has a strong APM offering, and it will be made stronger by the addition of the statistical performance analytics purchased from Quantiva. Enterprises should be careful, however, to evaluate the Quantiva technology and make sure that it fits the demand patterns, utilization patterns, and speed of change inherent in the underlying application system.
Bernd Harzog
CEO
APM Experts
bernd.harzog@apmexperts.com
