Datacenter Automation Crash Course: How To Get (Almost) Hands-Free Datacenter Management
As data volumes explode and IT complexity skyrockets, your datacenter needs AI -- and fast. This session dives into the pressing realities of modern infrastructure and the urgent shift toward intelligent, hands-free operations.
Transcript
Hi, everyone. Welcome to the latest installment of our ongoing editorial webinar series, Coffee Talk. Each hour-long, information-packed episode, organized by the hard-working folks at Redmond magazine, features the observations and insights of an independent expert on a wide range of tech industry topics. Many thanks to the underwriting sponsor of this episode, high-performance enterprise networking solution provider Juniper Networks. Without their support, this series would not be possible. And thanks to you for joining us.
I'm John K. Waters, Editor in Chief of the Converge360 group of 1105 Media, and I'll be your moderator. Today, we're bringing you "Datacenter Automation Crash Course: How to Get Almost Hands-Free Datacenter Management." Our lead presenter is technologist, creator of compelling content, and senior resultant Howard M. Cohen.
But before we get started, just a bit of housekeeping: this episode is being recorded for later access. Keep an eye out for an email with a link to that recording—it'll be coming your way in the next few days. We'll make some time at the end of each presentation for questions, but please feel free to type your questions into the Q&A box at any time. We'll do our very best to get to all of them. Our sponsor has provided some extra resources you won't want to miss—they're available now on your console. And as a small thank you to the first 200 attendees who stick with us to the end, we will be sending you a $5 gift certificate to Starbucks. It's a cup of joe to go with the info.
Now, I'd like to introduce our first presenter. Howard M. Cohen has spent more than 40 years in the IT industry. During that time, he has held senior executive positions at many of the top channel partner organizations, and he currently writes for and about IT and the IT channel. He's a sought-after speaker and insightful observer of the technology landscape—and one of our very favorite presenters. You're in for a great session.
Take it away, Howard.
Howard M. Cohen's Presentation
Thank you, John, and hello everybody. Welcome to our presentation today. As John said, I'm Howard M. Cohen. I'm a creator of compelling content who spent 40 years in the IT channel and writes about it and writes for it. Now, the two QR codes you see there will take you to more information about me, and I hope you'll check them out.
Those of you who have attended my sessions in the past know that I love to begin with the word of the day, and today's word, if you haven't guessed it already, is volume. Pump up the volume. Now, I'm not going to pump up volume, and we're not going to blare any music behind me. I'm not talking about a volume of sound. I'm talking about a volume of data. What you see here is an example of a data stream coming in from a typical server across the network. There's tons of traffic constantly crossing the internet, and that generates tons and tons of data that's constantly coming through your network. And the datacenter is meant to capture all that data, log it, and so keep a record of everything that's gone into and out of the datacenter.
It's one thing when we think about a server, right? It's a very small—it's not even a datacenter. It's a network room, a closet, maybe. But what if you have two, three, four, five—what if you have 500 servers? Many datacenters have had hundreds of servers, and each one of them is generating data constantly, and there's data coming across the internet toward them constantly. That's an awful lot of data. To put the challenge best, I'll share a quote from our friends at Red Hat. Red Hat has said, as IT systems continue to evolve and grow—and we all know, the evolution of technology hasn't stopped in 50 years, and it shows no sign of slowing down—and as it evolves, it grows, more and more people adopt more and more technology, and as the scale grows, as the size grows, the complexity is growing right along with it, and that becomes harder and harder to manage.
So at this point, I love the way they say the sheer volume of data these systems generate is overwhelming. And indeed, that's the problem. It's all overwhelming. There's no way people can keep up with it. Try to imagine sitting at a console watching this data and getting any meaning out of it, as you do. It's crazy.
Without sufficiently intelligent monitoring and analysis tools: that's the key sentence here. There's a deep, deep need for sufficiently intelligent monitoring and analysis tools. If you don't have them, you're going to miss alerts. When you miss an alert, a problem goes unchecked that can turn into a disaster. You miss opportunities to improve things, and that's going to cost you, one way or another, and of course, you're going to experience a lot of downtime, and downtime is very, very, very costly.
So the problem is a big one. Well, what I'd like to do—you see that sentence, "sufficiently intelligent monitoring"? I'd like to focus in on that word "intelligent," okay? Because it's all about adding intelligence to the way in which we manage the datacenter. And when we talk about an intelligent datacenter, or when we talk about intelligence and the datacenter, it's really a two-way street we're talking about. What do I mean?
On the one hand, we obviously need artificial intelligence and machine learning to help us better automate the datacenter so we can run better. So that we don't have to capture all that data ourselves. Maybe it can capture all that data, and with enough intelligence, it can analyze that data and determine what actions need to be taken to remediate problems. Great. On the other side of the argument, we need better datacenters, more intelligent datacenters, to better support and run artificial intelligence applications. Artificial intelligence is needed to make a better datacenter. Better datacenters are needed to make better artificial intelligence. It's kind of a catch-22, but it's one we're going to have to manage.
To bring that all together, I grabbed up a Gartner diagram that illustrates the role of the datacenter in today's business world. Now, of course, the datacenter performs all of its classic functions. It manages the servers, the storage, the routers, the switches, the CSU/DSUs—all the different equipment that it takes to operate a network—and that's all still in the datacenter. But more recently, and I'm talking the last 20–25 years, instead of expanding the datacenter and getting more space, many companies have opted to use a co-location facility and put some of their equipment there. So they're not paying as much in rent or heating and cooling or electricity. They are simply paying a fixed fee on a regular basis to a co-location facility to house some of their network infrastructure, and the datacenter still manages it from where they are. In some cases, the co-lo can manage it, but that's not what we're talking about today.
Now, when you think about a co-location center, of course, ultimately, your mind is going to turn to the cloud. Because if you think about it, the cloud really represents the biggest co-location facility imaginable—servers in thousands upon thousands, maybe millions, of datacenters all around the world at this point—and you can connect to any one of them. And basically, when you do that, the original NIST definition tells us that you're probably using software as a service. So the application is running in that far-flung datacenter, but your users are using it on their PCs and tablets and smartphones.
Or you're using the infrastructure in those datacenters to replace infrastructure that you would otherwise have to buy. So many people are shedding capital investments, instead choosing to use infrastructure as a service delivered from a cloud datacenter. And many are also supporting their developers with platform as a service, providing them with the databases and the tools and the analytics and the languages, and also with the ability to spin up environments quickly for development and testing and so forth. So the original three models that the NIST definition of cloud talked about, they're still deeply in use today, and we're managing them all from our datacenter.
In addition to that, of course, we have to talk about branch locations. Many organizations have had branch locations for as far as anybody can remember, and indeed, there's computing going on in those branch locations. And the headquarters datacenter usually has to participate in the management of those in some way, shape, or form.
Now, the whole concept of a branch location took on a dramatically new meaning when we suffered through the COVID pandemic, and all of a sudden, we had little teeny branch offices in all of our employees' homes. So instead of having hundreds of branch offices around the country, suddenly, we may have had thousands of little branch offices with driveways, and we have to manage those too. The datacenter has to be concerned about those. And the fact is, a lot of those people are not coming back to headquarters. They're not coming back to an office. They like working from home, and it's become a far more efficient and far more cost-effective way for a lot of companies to work. So we're going to be managing that for a long time to come.
Now, everything we've talked about so far has been about people on the network. Great. The fact is, though, that we have to look at the edge—the ever-expanding edge of the network. Used to be all the users in our building. Now the edge is global. We've created an Internet of Things. And this is not new. I constantly say this. There's a company that suggests that there have been more things on the internet than people since like 2008, and it's possible. But the fact is, today there are many more. We see all kinds of sensors and controls and switches, gauges. There's just tons and tons of stuff out there that your datacenter is now responsible for monitoring and managing, and most of those are little, tiny things. They're very low powered, they're very lossy. They need a lot of help, and so the management burden on the datacenter just keeps increasing and increasing.
To sum it up, we're dealing with more endpoints than ever that need to be managed and protected because they're the most vulnerable points on the network. More applications are running—of more types—with more demands than ever before. If you have more applications running on more endpoints, you probably have a lot more users. Although these days, every user comes with three to five different devices each.
And then finally, you take all of this together, and it's going to have to generate more data than ever, because at the end of the day, we all know it's all about the data. And I'm talking big data. And yeah, John, I included the sizzling bacon slide just for you. John likes the sizzling bacon. Who doesn't like bacon? Everybody loves bacon. The only thing better than bacon is more bacon. And everybody loves data. And the only thing better than data is more data. So data is indeed the new bacon, and we need help managing all that data, because people can't possibly keep track of it on their own. There's too much of it.
And so we're talking about how to engage artificial intelligence and machine learning to help us manage the data. Why artificial intelligence? Because without it, technology can track ons and offs, thresholds being exceeded—simple stuff like that. But it can't detect relationships between things that have any level of subtlety. It can't interpolate or extrapolate. We need something that can really look at a pattern and read into that pattern what it means, what it's telling us.
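To make that contrast concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions: the CPU readings are made up, the function names are hypothetical, and a rolling z-score is a simple statistical stand-in for the far richer pattern recognition a real AI-driven monitoring tool would apply.

```python
from statistics import mean, stdev


def threshold_alerts(samples, limit):
    """Classic monitoring: flag every sample above a fixed limit."""
    return [i for i, value in enumerate(samples) if value > limit]


def pattern_alerts(samples, window=5, z_limit=3.0):
    """Flag samples that break sharply from the recent pattern.

    Each sample is compared with the mean and standard deviation of the
    preceding `window` samples (a rolling z-score).
    """
    flagged = []
    for i in range(window, len(samples)):
        recent = samples[i - window:i]
        mu, sigma = mean(recent), stdev(recent)
        if sigma == 0:
            # Perfectly flat history: any change at all is a deviation.
            if samples[i] != mu:
                flagged.append(i)
        elif abs(samples[i] - mu) / sigma > z_limit:
            flagged.append(i)
    return flagged


# Hypothetical CPU utilization readings, in percent (illustrative data only).
cpu = [20, 21, 19, 20, 22, 21, 20, 55, 21, 20]

print(threshold_alerts(cpu, limit=90))  # the fixed 90% threshold never fires
print(pattern_alerts(cpu))              # the jump to 55% stands out from its neighbors
```

The fixed threshold stays silent because 55% is well under the limit, while the pattern-based check notices that 55% is wildly out of line with the readings around it.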
People can do that. People can interpret—when they've seen the same pattern, they know how to recognize patterns. Well, so does artificial intelligence, except that artificial intelligence preserves it much better, remembers it much better than we can, and more of it, and can analyze much faster than we can. And so artificial intelligence has a big role. And in fact, it will change the shape. It will change the look of datacenter control.
We see a list here of all the different things that are going on in a datacenter. And we're looking at people sitting at consoles, each with multiple monitors, trying to keep track of all this stuff. And clearly it's getting to be so much that this will not be the datacenter model of the future. We will not see tons of people sitting at tons of consoles for a variety of reasons. But then I'll ask you, how many of you want to be one of those people whose entire day is spent sitting at consoles trying to figure out what's going wrong? It's a difficult job, it's a demanding job, and any help—any artificial intelligence we can bring to help—is going to be a really good thing.
What these people are looking for is represented by the challenges that a datacenter faces. We've already talked about the first datacenter challenge: the volume of data that's coming at us every day. And the fact is that volume is only one of the three V's. It's not just the volume of data, which increases geometrically all the time. It's also the variety of different data types that we're facing, and that continues changing as well. And also the velocity. We're seeing data move through the network faster and faster. We're seeing improvements in bandwidth and processors and so forth. And so it's going to come bigger, faster, and in more different varieties than ever before, and that's not going to slow down. That growth is accelerating, and the acceleration is also accelerating.
We're confronted with a huge maze that we have to navigate through. There's increasing complexity. I mean, we get better and better at this stuff, the systems we develop and the systems we put in place are more sophisticated all the time, and we would have it no other way. You know that we crave more sophistication. We crave more compute power. We actually don't want to admit it, but we crave more complexity. But then we have to deal with it.
And also, it's getting bigger. It's just growing all the time. Not only is it growing bigger, but it's growing more difficult to understand. The fact is, the sheer analysis of the data, the sheer analysis of the network and the components of the network and routing across the network, and the nature of DNS. I mean, we're supposed to be shifting from 32-bit addresses to 128-bit addresses, and we haven't done that yet. IPv4 to IPv6 was supposed to happen 13 years ago, and because it's so complex, we've been delayed and delayed and delayed and delayed. And imagine the explosion of new capability that we'll get when we do make that change.
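For reference, IPv4 addresses are 32 bits wide and IPv6 addresses are 128 bits wide. A short sketch using Python's standard `ipaddress` module shows the scale of that change; the variable names are illustrative only.

```python
import ipaddress

# The "/0" networks cover the entire address space of each protocol version.
ipv4_space = ipaddress.ip_network("0.0.0.0/0")
ipv6_space = ipaddress.ip_network("::/0")

# max_prefixlen is the address width in bits for that protocol version.
print(ipv4_space.max_prefixlen, "bits:", ipv4_space.num_addresses, "addresses")
print(ipv6_space.max_prefixlen, "bits:", ipv6_space.num_addresses, "addresses")

# How many complete IPv4 address spaces would fit inside the IPv6 space.
ratio = ipv6_space.num_addresses // ipv4_space.num_addresses
print("IPv6 is", ratio, "times larger")
```

Every extra bit doubles the number of available addresses, so going from 32 to 128 bits multiplies the address space by 2 to the 96th power.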
But just analyzing the network has become so complex that few humans can actually understand it, and that's why we need artificial intelligence, which can gather the data faster, analyze the data faster, identify patterns within the data, and give us some meaningful reports so that we can take some useful action.
Another thing is we're still in the forest, and we still don't know the forest from the trees. That is to say that all the things we do today look at the symptoms of what's going on on our network. We see an anomaly. We see something not working the way it's supposed to. We see something coming in that's an unusual pattern that doesn't look like healthy data, or it resembles a signature that we've downloaded, that we're comparing it against, because there are companies that are constantly detecting new signatures and sharing them with everybody so that we have new signatures to compare with.
That helps us identify symptoms. But the fact is that the more important identification is taking place underground, at the roots of the tree. That is—those are the causes, the things that are causing the symptoms we see. And the fact is, were we able to identify one cause and correct it, we'd solve for dozens of different symptoms. So it's a much more efficient way to proceed—to really focus in on the causes—than it is to focus in on the symptoms that are currently visible.
Bottom line, you've all heard the phrase "we need to improve root cause analysis." We need to deepen—excuse the pun—we need deeper root cause analysis than we have today to help us get to a place where we're knocking down those symptoms faster than we ever have before.
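The idea that one cause can sit underneath many symptoms can be sketched in a few lines of Python. This is a toy illustration, not how any particular product does root cause analysis: the dependency map, the component names, and the function names are all hypothetical, and real tools weigh much more evidence than a dependency graph.

```python
# Hypothetical dependency map: each component lists what it directly depends on.
DEPENDS_ON = {
    "web-1": ["app-1"],
    "web-2": ["app-1"],
    "app-1": ["db-1", "switch-a"],
    "db-1": ["switch-a"],
    "switch-a": [],
}


def upstream(component, graph):
    """Return the component plus everything it transitively depends on."""
    seen, stack = set(), [component]
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(graph.get(node, []))
    return seen


def common_root_causes(alerting, graph):
    """Find the deepest components that every alerting component depends on."""
    if not alerting:
        return []
    # Components that could explain all of the symptoms at once.
    shared = set.intersection(*(upstream(c, graph) for c in alerting))
    roots = []
    for candidate in shared:
        deeper = (upstream(candidate, graph) - {candidate}) & shared
        if not deeper:  # nothing further down the tree is also shared
            roots.append(candidate)
    return sorted(roots)


# Three separate symptoms that all trace back to the same switch.
symptoms = ["web-1", "web-2", "db-1"]
print(common_root_causes(symptoms, DEPENDS_ON))
```

Three alerts go in and one candidate cause comes out, which is the efficiency argument in miniature: fix the switch and all three symptoms clear.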
Okay, so once again, everything we're talking about so far talks about the way in which people are using the network, using the internet, and the way the datacenter supports them. We can't leave out the edge. The edge has gone global. Literally any one of us can put edge devices literally anywhere, if it can connect to the internet. And so we have these pockets of different kinds of devices that we're deploying in all kinds of places, and there may be a whole bunch of sensors coming back to a concentrator that connects back across the internet to your datacenter. And as I said before, those are crappy little devices. They're very, very small. They're mass produced. They leech power from wherever they can. They tend to lose packets by the droves. They're hard to manage, but our datacenter has to take them on. And the bigger the edge gets, and the more we try to reach across that global edge, the more difficult it's going to become for us to keep track of all that. So we have to deal with the constantly expanding network edge. And for those of you Trekkers amongst us in the audience: yeah, that is a picture of the Borg-assimilated Earth. So you don't have to wonder about that anymore.
Another challenge—or two other challenges—that face us are cybersecurity and compliance. Another thing that's increasing every day out there in the wild woolly world of the internet is that there's an increase in security threats and an increase in compliance requirements. We're being victimized both by the internet itself, where bad actors, cybercriminals of all kinds, are launching new kinds of campaigns, new kinds of assaults, and we're victimized by the government, which is constantly inflicting new regulations upon us. And I'm sure, for all very good reason.
But the fact of the matter is that this is not one problem. This is two completely separate problems. For example, it is conceivable that you are fully compliant with all the regulations that your company is subject to—congratulations, that's great. Doesn't mean that you're totally secure, or even secure, very secure, secure at all. The fact is that fulfilling regulatory compliance involves making sure that people are doing what they're supposed to do. And yeah, computers are doing what they're supposed to do, and everything in your operation is doing what it's supposed to do, the way it's supposed to do it, and operating within the confines of the regulation. That really has nothing to do with security—very little to do with security. There's plenty that still has to be handled on the security side. And similarly, you may have the best imaginable security going on in your organization. It has no bearing upon whether or not you're really compliant with all your regulations.
So—security and compliance: two big challenges that you're facing, in addition to all the performance challenges and the operational challenges and so forth. And here's the one I think that's going to probably resonate with you best and probably aggravate you most. That is: red alert. Quick show of hands—how many of you have never uttered the phrase, "I'm sorry, I'm too busy putting out fires"? If you're in IT, you're putting out fires all the time. And put out enough of them, it leads to alert fatigue. You just get to the point where you're exhausted from constantly responding to all these emergencies that are happening. And alert fatigue is a real thing, and it diminishes your ability to concentrate on what's going on in front of you. And you may miss alerts, or you may misinterpret alerts, or you may misdetermine what you need to do about those alerts, and that's bad.
So alert fatigue is a challenge. By the way, several surveys say that more than 50% of companies today get more than 50,000 alerts per day, and a quarter of them get more than 100,000 alerts per day. We need to cut that down badly. And if we can get automation to do it, if we can get artificial intelligence to capture those alerts and action those alerts, we're way, way better off. Especially because of the expectations of our user community. That is to say that our user community expects what we affectionately refer to as rapid detection and response. Okay.
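To put the 50,000 figure in perspective, and to show one simple way automation can cut alert volume, here is a minimal Python sketch. The alert tuples, the five-minute window, and the function name are assumptions made for illustration; suppressing repeats is only the most basic form of alert reduction, well short of what an AI-driven system would do.

```python
# 50,000 alerts per day works out to roughly 35 alerts every minute, around the clock.
alerts_per_minute = 50_000 / (24 * 60)
print(round(alerts_per_minute, 1))


def suppress_duplicates(alerts, window_seconds=300):
    """Keep the first alert per (source, check) pair within each time window.

    `alerts` is a list of (timestamp_seconds, source, check) tuples.
    """
    last_kept = {}  # (source, check) -> timestamp of the last alert we kept
    kept = []
    for timestamp, source, check in sorted(alerts):
        key = (source, check)
        if key not in last_kept or timestamp - last_kept[key] >= window_seconds:
            kept.append((timestamp, source, check))
            last_kept[key] = timestamp
    return kept


# Hypothetical alert stream: one disk problem reported over and over.
raw_alerts = [
    (0, "db-1", "disk_full"),
    (30, "db-1", "disk_full"),
    (60, "web-1", "high_latency"),
    (90, "db-1", "disk_full"),
    (400, "db-1", "disk_full"),
]

for alert in suppress_duplicates(raw_alerts):
    print(alert)
```

Five raw alerts shrink to three: the repeated disk warnings inside the five-minute window are dropped, and the one at 400 seconds is kept because the problem is still there after the window has passed.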
Well, what it really means is that the user expects that before they even realize there's a problem, you've fixed it. And that is, at best, difficult. You're not this guy who can scan the world in an instant and come swooping in from space in seconds to solve a problem, but that really is what many users expect of us. And the problem is, there just aren't enough of us to do that. The fact is that the skills gap is something we first started talking about in like 2006–2007. Well, here we are, and last year, I said that there are more than 4 million unfilled tech jobs around the world. More than 4 million jobs waiting for people to take them. That number is closer to 5 million now, and at least a quarter of them are here in the United States. We can't find people. We simply can't find enough well-qualified, trained people who can come in, participate in the datacenter, and help us solve all these challenges. And so even if we don't want to engage artificial intelligence, it's really all that's available to us. And even then, the artificial intelligence that's available to us now needs an upgrade. We need bigger, better processing power to keep up with it all, and that's coming too, right?
You're looking at a quantum computer. And yeah, it looks a little bit like Robby the Robot from Lost in Space, but that is a true quantum computer operating at unimaginable speeds, and we need those to drive more powerful datacenters that can support artificial intelligence and machine learning. Because we need artificial intelligence and machine learning to better support our datacenters. And we're back to that original catch-22—we need to pump up the processing power.
In closing, we've talked about artificial intelligence and we've mentioned automation, and people often ask me, "Well, which one is it? Are we going to use artificial intelligence or are we going to use automation? Which one is it?" And I don't know. I think it's becoming more and more obvious that it's artificial intelligence plus automation. The two together. Actually, it's automation driven, powered by artificial intelligence. And there's a word for that. It's, of course, another compound word, but it is today referred to as AIOps: using artificial intelligence to operate our datacenters.
Now, in the title of this session, we suggested that that could lead to hands-free operation. But we conditioned it there, and I'll condition it here to say that it's almost hands-free. Ultimately, we will get to a place where operating a datacenter is almost hands-free. A larger and larger proportion of it is being managed by the automation driven by artificial intelligence. But some of it still requires that human touch. In other words, we can look forward to seeing humans and artificial intelligence operating—ready for it—hand in hand.
Okay. Enough puns. As we gaze upon our friends here who are working together to operate our datacenters, I'm going to turn back to John and ask if we have any questions from the audience.
[Audience Q&A follows.]