The Changing Face of IT Monitoring
The goal of IT infrastructure and application monitoring solutions must be more than to provide insight into IT operations. Here's what to look for.
Monitoring applications need to do more today than yesterday, and certainly more than they did 10 years ago. Cloud computing, continuous integration (CI)/continuous delivery (CD), automation and increased application-to-administrator ratios have changed how practitioners treat (and rely on) monitoring solutions. So what should a top-notch IT infrastructure and application monitoring solution consist of?
Applications -- and how we use them -- have evolved greatly over the past few decades. How we monitor these applications should have, as well. Unfortunately, monitoring solutions on the whole have been slow to evolve. While the installation difficulties that plagued IT monitoring and management suites during the mid-2000s have largely been resolved, ease of configuration and simplicity of day-to-day usage remain common problems for most solutions.
A new generation of cloud-based monitoring startups has aimed to change this. While none of them offer the breadth of the larger, more established monitoring solutions, these startups have excelled at their own chosen slice of monitoring. This redefinition of excellence and utility has in turn applied pressure to the established monitoring packages to up their game.
Slowly, cautiously, and with seemingly enormous effort, the large and established monitoring solutions are beginning to change. The question that remains is: What should they seek to change into? Looking specifically at IT infrastructure and application monitoring, what is important that modern monitoring applications do? And where are the most common areas of needed improvement?
For the most part, today's monitoring solutions get the underlying technical bits right. There's generally a split between network monitoring and server and application monitoring. Network monitoring is its own product, with everything else falling under server and application monitoring.
Server and application monitoring solutions mostly seem to cover the same basic components, such as IP address space monitoring, DHCP and DNS. Is-alive monitoring and in-depth application monitoring are almost always part of the package, with event log handling, log file analysis and performance monitoring rounding out the common technical bits that are monitored.
Full network monitoring solutions go much more in-depth on network issues than server and application monitoring solutions. In addition to gathering network insights from owned network hardware, modern network monitoring solutions use advanced techniques to perform root cause analysis on network issues, even when those networks do not belong to the organizations monitoring the links in question.
Unfortunately, there remain areas where many of the large monitoring solutions could stand to see substantial improvement on basic technical monitoring capabilities. One such area is that of IT asset management.
Creating a comprehensive inventory of IT assets and monitoring that inventory for changes should be one of the most basic features of monitoring solutions. However, it's one of the least-updated features. Every few years vendors might pay it some attention, but hardware is constantly evolving and this needs to be a perpetual priority.
As a group, the large monitoring solutions are also appallingly terrible at detecting runaway event generators. Log and event management is a critical part of monitoring package functionality, and yet it remains disconcertingly easy for one rogue event generator to fill up a database without the monitoring solution flagging to administrators that something may have gone awry.
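Detecting a runaway event generator doesn't require anything exotic. One possible approach, sketched below with entirely illustrative names and thresholds, is to compare each event source's volume in the current window against its historical baseline and flag anything that has blown far past it:

```python
# Sketch of runaway-event-generator detection: flag any source whose
# event volume in the current window is far above its baseline.
# All thresholds and source names here are illustrative assumptions.
from collections import Counter

def find_runaway_sources(current_counts, baseline_counts, ratio=10, floor=1000):
    """Return (source, current, baseline) for sources emitting far above baseline.

    ratio: how many multiples of baseline counts as "runaway".
    floor: ignore sources below this absolute volume to avoid noise.
    """
    runaways = []
    for source, count in current_counts.items():
        baseline = baseline_counts.get(source, 0)
        if count >= floor and count > ratio * max(baseline, 1):
            runaways.append((source, count, baseline))
    return runaways

current = Counter({"app01": 250_000, "app02": 900, "switch17": 1_200})
baseline = {"app01": 2_000, "app02": 850, "switch17": 1_100}
# app01 is emitting 125x its baseline and gets flagged; the others don't
print(find_runaway_sources(current, baseline))
```

A check this simple, run against the event database's own ingest counters, would catch the rogue generator long before it fills the disk.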
Similarly, automated discovery of infrastructure components (and especially of applications) borders on pathetic for many of the large monitoring solutions. While basic discovery capabilities do tend to exist, correlating discovered items in a way that administrators can understand the interrelationships remains beyond most of these solutions.
For example, a SQL administrator doesn't just want to see that a SQL server exists on the network. He needs to see which operating system environment (OSE) it's running on, on which hypervisor that OSE is executing, which physical server that hypervisor is executing on, to which cluster it belongs, and which storage and networking resources are underlying those clusters.
Administrators will want to know which workloads depend on that SQL server. They'll want to know about the backup status of that SQL server, whether it's replicating to a partner, and what the update status of the entire chain of dependencies is.
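The chain described above is, at heart, a simple graph walk. A minimal sketch, with hypothetical component names standing in for what a real discovery engine would populate from inventory data:

```python
# The SQL-server-to-storage dependency chain, modeled as a parent map.
# Component names are hypothetical; a discovery engine would build this
# map from its inventory rather than hard-coding it.
DEPENDS_ON = {
    "sql-prod-01": "ose-win2022-14",   # SQL instance -> OS environment
    "ose-win2022-14": "hv-esxi-03",    # OSE -> hypervisor
    "hv-esxi-03": "blade-7",           # hypervisor -> physical server
    "blade-7": "cluster-a",            # server -> cluster
    "cluster-a": "san-array-1",        # cluster -> storage
}

def dependency_chain(component, depends_on):
    """Walk from a component down to the infrastructure it rests on."""
    chain = [component]
    while component in depends_on:
        component = depends_on[component]
        chain.append(component)
    return chain

print(" -> ".join(dependency_chain("sql-prod-01", DEPENDS_ON)))
```

The hard part for vendors isn't the walk; it's discovering and maintaining the map automatically, which is exactly where most large solutions fall down.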
Today's monitoring solutions may be able to see all of these individual items, but they are not very good at revealing the interrelationships and dependencies. This problem is magnified when dealing with hybrid infrastructure, and is especially true when dealing with hybrid applications.
Another area where most monitoring solutions could stand to improve is extending their automatic discovery beyond infrastructure. Application discovery is among the most important features of any IT infrastructure monitoring solution, whether the solution be networking-focused or server-and-application-focused.
It is the applications that administrators care about, and our diagnostics capabilities should focus on those applications. That should be true whether you're trying to trace why an application is performing slowly, or why you can't connect to it. It's the applications that matter.
Because it's unrealistic to expect any monitoring solution to know about all applications, support for the creation of templates is critical. Templates need to be easy to share with the wider community that (hopefully) exists around the monitoring product, and community-created templates need to be easily discoverable and consumable. In turn, this requires a strong community into which the vendor puts continuous effort.
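What might such a template look like? One possible shape, expressed as plain data with field names that are assumptions rather than any vendor's actual schema, pairs discovery hints with threshold checks:

```python
# A possible shape for a shareable application-monitoring template.
# Field names, metrics and thresholds are illustrative assumptions,
# not any vendor's actual schema.
TEMPLATE = {
    "name": "postgresql-basic",
    "discovery": {"process": "postgres", "port": 5432},
    "checks": [
        {"metric": "connections_active", "warn": 80, "crit": 95},
        {"metric": "replication_lag_seconds", "warn": 30, "crit": 300},
    ],
}

def evaluate(template, sample):
    """Return (metric, severity) pairs for every threshold a sample breaches."""
    alerts = []
    for check in template["checks"]:
        value = sample.get(check["metric"])
        if value is None:
            continue  # metric not collected this cycle
        if value >= check["crit"]:
            alerts.append((check["metric"], "critical"))
        elif value >= check["warn"]:
            alerts.append((check["metric"], "warning"))
    return alerts

print(evaluate(TEMPLATE, {"connections_active": 88, "replication_lag_seconds": 5}))
```

Because the template is plain data, it's trivially shareable: a community member can publish it, and anyone else can import and tweak the thresholds for their environment.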
The existence of a templating capability cannot, however, be used as a crutch. The vendor needs to be constantly updating the monitoring solution's application awareness, an area of effort where virtually every vendor has been sorely lacking.
Data visualization is perhaps the single most important thing that a modern monitoring solution does. Human beings aren't robots. We're not meant to look at endless streams of text and somehow deduce both actionable information and context from it. We can't simply stare at the raw code, Matrix-style, and see what's going on. We need summaries and graphs; charts, maps and other pretty pictures.
This makes a monitoring solution's data visualization among the most important feature sets that a modern monitoring solution has. Without at the very least some robust reporting and analytics capabilities, the monitoring solution in question is more liability than benefit.
Data visualization should ideally also include fully featured Network Operations Center (NOC) displays and customizable dashboards. Those dashboards should be fully integrated into the complete suite of monitoring and management applications offered by the monitoring vendor, letting administrators click on a visually displayed error or warning and go directly to the tools that let them visualize and remediate the problem.
Monitoring solutions need to allow for integration of external data. This allows administrators to add important context to their data visualization.
Another feature that monitoring applications need to adopt from the startups is the visual mapping of application and service chains. As discussed earlier, monitoring applications need to not simply monitor individual applications, but understand the entire context of the infrastructure upon which that application depends.
This needs to be represented visually, especially when reviewing historic performance data. Infrastructure context chain visualizations allow administrators to look at a badly performing application and examine how the infrastructure elements upon which it depends are responding -- or have responded -- all aligned by time.
If at 5:00 p.m. the Web server throws a wobble, then administrators need to be able to quickly and visually see that the proxy server, the load balancer, the underlying OSE, hypervisor and storage array were all fine, but the storage network seemed to be constrained at exactly that time, allowing for instant root cause analysis.
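That instant root cause analysis is really a time-aligned comparison across the dependency chain. A minimal sketch, using illustrative component names and health states, of how a monitoring backend might surface the suspects at the moment of an incident:

```python
# Sketch of a time-aligned root-cause view: given per-component health
# samples keyed by timestamp, report which dependencies were unhealthy
# when the Web tier misbehaved. Components and states are illustrative.
SAMPLES = {
    "proxy":         {"17:00": "ok"},
    "load-balancer": {"17:00": "ok"},
    "ose":           {"17:00": "ok"},
    "hypervisor":    {"17:00": "ok"},
    "storage-array": {"17:00": "ok"},
    "storage-net":   {"17:00": "congested"},
}

def suspects_at(timestamp, samples):
    """Components not reporting 'ok' at the moment of the incident."""
    return [component for component, series in samples.items()
            if series.get(timestamp, "ok") != "ok"]

print(suspects_at("17:00", SAMPLES))
```

Lining the whole chain up by timestamp is what turns five minutes of log spelunking into one glance at a dashboard: everything upstream was fine, the storage network wasn't.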
All of this could have been written 10 years ago and it would have been just as true. Monitoring is a slow-moving field, and the large monitoring applications take a long time to adapt to demand. There are, however, emerging requirements and higher-level considerations that have only entered mainstream necessity in the past few years.
The most important high-level missing feature of many of today's large monitoring applications is hybrid multi-cloud support. Organizations of all sizes are using on-premises solutions and service provider clouds, as well as offerings from the major public cloud providers. An application that monitors only on-premises workloads, monitors only the big four public clouds, or (worst of all) favors one cloud provider above the others is useless.
Monitoring solutions need to monitor the infrastructure that organizations actually use, and that infrastructure is increasingly diverse. Microsoft-only organizations ceased to be a thing a decade ago, and in the intervening 10 years monitoring applications that don't offer robust support for Linux, macOS and mobile platforms have proven themselves to be wholly inadequate.
In a similar fashion, multi-cloud support is now table stakes. Organizations are also demanding support for the myriad Internet of Things devices they have to monitor, and -- increasingly -- support for non-x86 infrastructure and workloads. Mainframes are growing again, and it's time they came in from the cold. ARM solutions are increasingly part of the modern datacenter, and support must be extended to them.
Application monitoring needs to evolve to include native support for distributed applications. It's great to be able to look at a SQL server and understand the context in which that individual instance operates, but the wider context of the application may have it participating in a SQL cluster that includes multiple SQL instances running on multiple different clouds, as well as on-premises.
That SQL cluster will also produce backups somewhere, which have to be tracked and monitored. Applications will depend on that cluster, and they need to be monitored; they may even need to be monitored from multiple locations in order to understand how those applications respond to different users around the world, especially when they're relying on a clustered database solution where any individual database instance can have its own performance issues and failures.
Last, monitoring solutions need full integration into service desks. When an error is detected by monitoring software, that software needs to instantly raise tickets. Those tickets should contain the context of the error, or at the very least links to data visualization solutions that can provide the context.
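A minimal sketch of what that integration might build, assuming a hypothetical alert record and dashboard URL rather than any particular service desk's actual API:

```python
# Sketch of monitoring-to-service-desk integration: when a check fails,
# build a ticket carrying the error's context and a link back to the
# relevant dashboard. Fields, priorities and the URL are hypothetical.
import json

def build_ticket(alert):
    return {
        "title": f"[{alert['severity'].upper()}] {alert['component']}: {alert['message']}",
        "body": "\n".join([
            f"Component: {alert['component']}",
            f"Detected: {alert['timestamp']}",
            f"Dependency chain: {' -> '.join(alert['chain'])}",
            f"Dashboard: https://monitor.example.com/ctx/{alert['component']}",
        ]),
        "priority": {"critical": 1, "warning": 3}.get(alert["severity"], 4),
    }

alert = {
    "component": "sql-prod-01",
    "severity": "critical",
    "message": "replication lag exceeded 300s",
    "timestamp": "2024-05-01T17:00:00Z",
    "chain": ["sql-prod-01", "ose-win2022-14", "hv-esxi-03"],
}
print(json.dumps(build_ticket(alert), indent=2))
```

The point is that the ticket arrives pre-populated: the on-call administrator opens it and finds the dependency chain and a link to the visualization, rather than a bare "something is broken" alert.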
The goal of IT infrastructure and application monitoring solutions must be more than to provide insight into IT operations. The goal of IT infrastructure and application monitoring solutions must be to make remediation of problems simple, and to spot patterns that typically lead toward issues so that those issues can be avoided before they happen.
Today, organizations can do all of the above, but it often takes lashing together multiple offerings. The result is a mess of overlapping solutions, with poor-to-nonexistent integration between them. Fortunately, some of the larger monitoring solutions have started to evolve, and organizations of all sizes will be better for it.
Trevor Pott is a full-time nerd from Edmonton, Alberta, Canada. He splits his time between systems administration, technology writing, and consulting. As a consultant he helps Silicon Valley startups better understand systems administrators and how to sell to them.