In-Depth

From the Trenches: Have a Crash-Free Day

Heroix RoboMon acts as a beacon for systems administrators whose networks are about to flounder on the rocks.

Tony Canella didn't want to have to make excuses--not to anyone. If he could keep his network up and running, crash-free, he wouldn't have to.

"RoboMon has given me the confidence and flexibility that I don't have to be in a situation with a crashed network, in the middle of the day especially." Canella is the Director of Network Administration at HITT Contracting, the fifth-largest construction firm in the Mid-Atlantic, with clients such as AOL Corporate Campus in Dulles, Virginia, and GEICO Direct in Fredericksburg, Virginia. HITT has offices in Fairfax, Atlanta, and Charleston, with 300 users, a third of whom are scattered up and down the coast using dial-in.

Product Information
RoboMon 7.5, $595 to $3,295
Heroix Corporate Headquarters, USA
617-527-1550, 800-229-6500
www.heroix.com
The higher price includes the base product, along with management console, event monitor, and reporting and graphics modules. For a cool, interactive demo, go to www.heroix.com/aspscript/WalkthroughReg.asp.

The network is running a mixed environment of Microsoft Windows NT 4.0, SCO Unix version 5, and Novell NetWare 4.2. Canella uses most of the protocols NT has to offer--WINS, DHCP, TCP/IP, and IPX. He also has a WAN, VPN, email services, and enterprise resource systems software installed. He monitors most of these components with Heroix's RoboMon 7.5, an enterprise-wide package that detects issues in the network, notifies the administrator of the situation, and can even act proactively to prevent problems from actually arising.

So, Why RoboMon?

What led Canella to RoboMon? The bottom line was system crashes. About every other month something would lock up and bring the system down. He wasn't getting good reports with NT's Performance Monitor, and it was difficult to pin down what was causing the lock-ups. Canella read trade journals to see what other people were using and recommending. He also looked into Tivoli but, after demoing RoboMon, was sold on the product, seeing an improvement in performance within the first week. He's now been using it for a little over a year and is still sold on it.

RoboMon has helped Canella avert the usual catastrophes: users filling up the hard drive and impending crashes caused by memory resource depletion. At HITT the fax server is attached to mail servers, and in one situation, RoboMon helped him prevent a crash; someone had been sending out 315-page faxes--just a bit taxing on resources.

Of course, there have been other perks, in addition to feeling confident that he can keep the network up and running. "Prior to using RoboMon, the IT department spent most of its time monitoring the network and putting out fires." Canella estimates that he easily saves four to five hours a week with RoboMon.

Two reviews in MCP Magazine have evaluated RoboMon:

"Ease Your Network Management Pains" by Scott R. Burgess in the January 1999 issue.

"Check the Pulse of Your SQL Server 7.0 Apps" by Mike Gunderloy in the August 1999 issue.

Out-of-the-Box Rules

RoboMon is rules-based systems management software designed to work out of the box. Canella says he didn't do much tweaking because RoboMon comes with every rule he could think of needing. The only thing he had to do was set up the email and pager notifications. The programmers at Heroix had already figured out what tweaks each service or product, such as WINS, SQL Server, and Exchange Server, needs to take advantage of system resources. Canella has, however, modified some parameters based on the information RoboMon has gathered and the potential problems it has uncovered: not enough swap file space, not enough RAM on a given machine, not enough drive space, or drives that are too slow. As Canella says, "That's why I'm so fond of the product--it's helped me tweak my systems to the point that they're quite reliable."

Figure 1. Heroix's RoboMon allows the systems administrator to manage the NT enterprise by establishing rules for network processes; a large number of rules are set by default out of the box.

RoboMon's developers built expertise into the Rules Engine by watching how systems administrators actually solve problems. A rule defines a condition to check for and one or more actions to take when that condition is met, such as paging the sys admin if a disk falls below 20 percent free. RoboMon consults a variety of data sources, including NT Event Logs, COM objects, SNMP traps and variables, and databases. Rules monitor network services and resources, as well as DHCP, WINS, and any proprietary applications, to perform these functions:

  • Condition detection
  • Event correlation
  • Problem investigation
  • Notification
  • Corrective action
  • Follow up
  • Escalation
  • Resolution
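The condition-plus-actions structure of a rule can be illustrated with a short sketch. This is not RoboMon's rule format or API, which the article doesn't show; the Rule class, the disk_free_pct metric name, and the two action functions are hypothetical names chosen only to mirror the "page the sys admin if a disk falls below 20 percent free" example.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical metric snapshot, e.g. {"disk_free_pct": 12.0}
Metrics = Dict[str, float]


@dataclass
class Rule:
    """Illustrative rule: one condition, one or more actions (not RoboMon's API)."""
    name: str
    condition: Callable[[Metrics], bool]
    actions: List[Callable[[str, Metrics], str]] = field(default_factory=list)

    def evaluate(self, metrics: Metrics) -> List[str]:
        # Run every action only when the condition holds; collect what each reports.
        if not self.condition(metrics):
            return []
        return [action(self.name, metrics) for action in self.actions]


def page_admin(rule_name: str, metrics: Metrics) -> str:
    # Stand-in for a real pager gateway; here it only formats a message.
    return f"PAGE: {rule_name} (disk {metrics['disk_free_pct']:.0f}% free)"


def log_event(rule_name: str, metrics: Metrics) -> str:
    # Stand-in for writing to an event log.
    return f"LOG: {rule_name} triggered"


# The 20 percent threshold comes from the article's example.
low_disk = Rule(
    name="low-disk-space",
    condition=lambda m: m["disk_free_pct"] < 20.0,
    actions=[page_admin, log_event],
)
```

Calling low_disk.evaluate({"disk_free_pct": 12.0}) returns one message per action, while a reading of 35.0 returns an empty list because the condition is not met.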

RoboMon's Rules Engine runs on every machine where the administrator wants to monitor, detect, and correct problems locally. Because data is sent across the network only when RoboMon performs a remote action or notifies a central monitoring location, network traffic is kept to a minimum. Although RoboMon loads five or six services on the server, Canella says there hasn't been any performance degradation on his network.

Event Monitor

Event Monitor runs as a client/server application, but because the RoboMon processes monitoring the servers run as autonomous agents, there's no single point of failure. The administrator can monitor and manage all sites from a central location and consolidate enterprise-wide events across NT, Unix, OpenVMS, and any SNMP agent. (The software doesn't encompass monitoring of NetWare directly.) Enterprise Manager lets you make a change at any level, on any process, computer, or domain across the entire enterprise, which makes RoboMon easy to scale. First, you view the rule properties in Enterprise Manager; then, use Solutions Manager to tailor rules quickly by customizing detection thresholds, selections, and other settings--without writing code.

Self-configuring sensors dynamically adjust to changes in system configuration to eliminate the need for ongoing maintenance. This means you can add software or devices, and RoboMon automatically adjusts its monitoring to match. It observes an application--for example, Exchange--determines its typical utilization, and then automatically monitors for deviations from normal.
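One way to picture "determine typical utilization, then watch for deviations" is a running baseline with a tolerance band. The sketch below is only an assumption about how such a sensor could work; the article doesn't describe Heroix's actual method, and the BaselineSensor class, its min_samples and tolerance parameters, and the use of mean and standard deviation are all illustrative choices.

```python
from statistics import mean, stdev
from typing import List


class BaselineSensor:
    """Hypothetical sensor: learns a baseline, then flags readings far from it."""

    def __init__(self, min_samples: int = 5, tolerance: float = 3.0):
        self.samples: List[float] = []
        self.min_samples = min_samples  # readings needed before judging anything
        self.tolerance = tolerance      # allowed distance, in standard deviations

    def observe(self, value: float) -> bool:
        """Return True if value deviates from the learned baseline.

        Normal readings are folded into the baseline; deviations are not,
        so a single spike doesn't redefine what counts as typical.
        """
        if len(self.samples) >= self.min_samples:
            avg = mean(self.samples)
            # Guard against a zero spread when all prior readings are identical.
            spread = max(stdev(self.samples), 1e-9)
            if abs(value - avg) > self.tolerance * spread:
                return True
        self.samples.append(value)
        return False
```

Feeding the sensor utilization readings around 40 percent and then a reading of 95 makes observe return True for the spike, while readings near the established range keep returning False.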

Remote RoboMon Help for NT
Heroix RoboMon Emergency Repair (ER), working with RoboMon NT, lets you remotely repair locked or unresponsive mission-critical servers running NT 4.0 or higher (on Intel and Alpha), Exchange, IIS, SQL or other BackOffice components without rebooting. RoboMon ER connects to the Internet via TCP/IP or a serial host.

In DOS mode, RoboMon ER gives you command-line access to any NT system. This allows you to reach and repair systems that are inaccessible via graphical or Web-based interfaces. The product provides diagnostic and repair commands to free up memory, shut down services, or fix other problems causing lock-ups.

The ER option includes a remote console and an agent that resides on the server, operating as a real-time process. You remotely access the NT server through Telnet and, after being authenticated, can view performance statistics, change the characteristics of a service, view processes and the resources they're using, and manipulate the NT Registry. That way, if you do have to reboot, at least it's a clean one.
--Sandy Burd

RoboMon in the Future

Canella says that if he were installing RoboMon today, "I might do an enterprise-wide installation. Until it was up and running for a while and I had had a chance to play with it, I wasn't fully aware of all its capabilities." Canella had set RoboMon up on individual servers, thinking that the firewalls and schemes for the WAN would be too complex to configure. Now, he doesn't think it would be any problem to do so; but at this point, with everything in place and working fine, he sees no reason to change it. Soon, HITT Contracting will be moving to a new system using Microsoft's Terminal Server and Citrix's MetaFrame, which RoboMon supports, and which will simplify things. Remote users will be able to log in over the Internet and have full use of their desktops, even on low-speed connections.

Canella says when he upgrades, he'll definitely consolidate events from many of his company's various OSs under RoboMon. He expects to upgrade to Windows 2000 six months or so after it's out--when the jury is in and he's convinced--and will do a network-wide rollout to avoid any problems that might occur in the domain from having some boxes running NT 4.0 and others Win2K. In fact, Canella expects the Win2K version of RoboMon--expected out at the same time as Win2K itself--to help him with the rollout by alerting him to any services that are failing because of the installation. Canella also expects to deploy J.D. Edwards' OneWorld, an enterprise framework, in the same timeframe and eliminate the SCO and NetWare platforms.

Canella says what he likes best about RoboMon is that he can see the health of his mission-critical production servers. "RoboMon shows me what's occurring on the machines, so I can see if there are any issues I need to address or if I can take the rest of the day off."
