Microsoft's First, Best Customer: A Q&A with Microsoft CIO Ron Markezich

More than 300,000 devices. About 10,000 servers, 2,000 IT staff and the same number of contractors and vendors. The most attacked network in the world. Meet the man responsible for keeping it all going.

As CIO for Microsoft IT, Ron Markezich's role is a lot like the top IT management job at nearly any organization: He worries about security threats, reducing costs and driving up SLAs. But as the head of a 2,000-person IT department, with 10,000 servers on a 300,000-device network at the world's largest software company, it's also very different.

Markezich started working with Microsoft in 1995 while he was employed by Andersen Consulting (now Accenture), and joined Microsoft's IT staff in November 1998. Once inside Microsoft, he started out handling the IT side of Microsoft's finances. From there, he moved up to running Microsoft's IT infrastructure for two years. About a year ago, he was tapped as chief information officer.

In a wide-ranging interview with Redmond magazine News Editor Scott Bekker, Markezich talks about how he approaches his unique role, the cultural issues he faces at Microsoft, what he's looking forward to in the Longhorn wave of products (getting rid of passwords, for one thing), why he thinks virtualization's potential is overblown, and his big plan for grid computing.

Redmond: As CIO of Microsoft, what do you think are some of the main ways that your job at Microsoft is similar to other CIOs, and how is it very different?
I think of my job as three jobs in one. One is a typical CIO job. I talk to small businesses; I talk to companies, like GE, that are larger than us; I talk with governments. The issues I deal with are exactly the same—security threats, proving the value of IT, dealing with reduced costs, keeping infrastructure at a high availability. It's very similar.

The two jobs of mine that aren't very similar are my customer role and working with the product teams. About one-third of my time is spent externally talking to customers about how we use the technology. A lot of CIOs do that—I probably do that more because we're running the stuff that we sell.

The other third is working with the product teams. The most important part of my job is making our products better. So I meet with BillG [Microsoft Chairman and Chief Software Architect Bill Gates] regularly and meet with the product group executives regularly. I really look at our organization, the IT organization, as an extension of the product teams. So we're constantly running test versions of products before they're released, and the product teams get feedback from us on how to make them better.

One of the things we do is sign off on all products before they're released. So, we have to run them, we have to show the capabilities.

But of those three roles, the first one is very similar to any other CIO's.

One of the things I love about this job, which is also one of the challenges, is that there are very few jobs at Microsoft so broad in terms of technology. Bill Gates' is, obviously. But this is one of those [positions] where you have to cover everything from business intelligence to networking in the same job.

Describe Microsoft's overall IT operations.
We have about 58,000 employees now, and I have about 90,000 users [including] contractors and vendors. Of that population, one of the big advantages I have is that everyone has a computer and network access. By default we give everyone e-mail. That means I can push a lot out via technology.

As you would expect, Microsoft IT is the ultimate Microsoft shop. There are no non-Windows PCs outside of Microsoft's competitive labs, and every production database runs SQL Server, Markezich says.

If you look at the IT organization, we support across our network about 300,000 devices. Most of those are PCs and servers with some routers and network devices, but primarily PCs or servers. We have about 280,000 PCs and servers. They are Windows PCs—we don't have any non-Windows machines.

I have about 10,000 servers that I manage [from] our data centers. Those are primarily production data-center servers, infrastructure servers and lab servers. The rest of the 280,000 are employee machines, client machines or other lab servers.

Employee-wise, I have about 2,000 [IT] employees and about another 2,000 contractors and vendors.

We do most of our work in Redmond. Over the years we've consolidated a lot of our infrastructure and our activities around the world. We also have a team in India, so those are the two hubs. We have operations centers in Dublin, Singapore and Reno [Nevada]; those centers have some IT staff, but not very large teams.

Describe your hiring process.
New hires mostly come from outside. Actually, IT is a feeding ground for our product teams. I very rarely get a product person to come into IT, but I quite often send IT people into the product groups. It's painful for me, but it's by design; the skills they build in IT—the operational aspects and manageability aspects—are great skills for our product development teams. My folks are in touch with the product teams. The person who runs messaging goes to Dave Thompson's staff meetings. [Thompson is corporate vice president of the Exchange Server Product Group.] That's attractive to a lot of people coming from the outside.

Redmond is Microsoft's main IT hub, with 6,000 of the company's 10,000 production servers there. Microsoft has another IT hub in India and data centers in Reno, Nevada, where all U.S. licensing is handled; Ireland; and Singapore. The company also has a business continuance data center in Silicon Valley. Explains Markezich, "People are going to say 'Silicon Valley is a funny place to have a business continuance center.' But when we bought Hotmail it came with that facility, and they already had a data center in Silicon Valley. The chance of having earthquakes in Seattle and at that center at the same time is very small. And bandwidth is actually very inexpensive between the sites, so you can replicate."

I also compete for talent with the product teams. I get most of my folks from two places. One is straight off campus. We have a very large intern program, and the interns come in for a six-month internship before they graduate. We have a very high success rate of hiring those people back after they graduate; we've never had an intern turn down an offer.

The other big category is contractors; I have about 2,000 contractors. Quite often those contractors and vendors will choose to come work at Microsoft.

How much do you have to worry about servers that are used by Microsoft developers in the product teams?
It depends how you define 'worry.' When it comes to patch compliance, I worry about them a lot. I need to ensure that they're patched and that they're not providing a potential vulnerability to my network, because they're all on the network; all 300,000 devices are on that network. Outside of that, I don't do that much. We scan every machine in the environment at least once a day to make sure it's adhering to certain security policies.

There's a cultural issue around consolidation of computing power, especially in our product labs. Developers like to see those computers when they come in, and see them when they go home at the end of the day. What we're doing now is consolidating those into our off-site data centers, where we can have a high-availability environment and lower rent.

The largest servers Microsoft uses in production are 32-processor Unisys systems. The only 64-processor HP Superdomes at Microsoft are for product testing and benchmarking. In production, Microsoft doesn't have any truly CPU-intensive functions, given that partners handle roughly 80 percent of day-to-day transactions and feed data to Microsoft monthly.

As we move more and more to those environments, we're going to start managing those more and more like data-center servers.

One of the things we've done in the last few years is consolidate quite a few servers, especially on the infrastructure side. Of that 10,000 number, most are application servers. We've reduced our infrastructure servers (Exchange servers, Active Directory domain controllers, WINS, DHCP) by about one-third. When I talk about consolidation of the infrastructure outside of Redmond, that's really what we did a lot of.

[Ed. Note—This next section makes frequent reference to Longhorn—a wave of products that includes the desktop OS, a server OS, Office 12 and other products. Microsoft officially named the desktop OS component of the Longhorn wave "Windows Vista" after this interview took place.]

What features are you hoping to take advantage of in Longhorn?
Most of our focus right now is on how we're going to take advantage of Longhorn in our environment.

[A big area is] Network Access Protection. From a security standpoint, right now, we treat internal access differently from external access. So if you, as an employee or vendor, want to get into Microsoft and you're outside a Microsoft building, we force you to use a smart card. We put you in a quarantine state. We scan your machine before you get an IP address.

What Markezich considers his mission-critical systems at Microsoft (in order):
1) E-mail: "Microsoft e-mail is like oxygen. We're geographically distributed and culturally dependent on it."
2) Core infrastructure: "We're doing a lot of development around the world, a lot of product support around the world."
3) Customer support systems.

With Longhorn, we'll switch that. We will treat internal users the same as external users. To get into the corporate network, you'll have to use a smart card and we'll put you into a quarantine state using Network Access Protection before we let you in.

That's a fundamental shift for us. I think it's a bad assumption that bad guys are [only] on the outside. There will also be bad people on the inside who want to do something. We'll keep building Longhorn to help us with that.

Are you planning to get rid of domain credentials at Microsoft in the Longhorn timeframe?
Everyone who requires RAS [Remote Access Services] access has a smart card. We have RAS credentials to log on. With Longhorn we'll actually get rid of domain credentials. So you'll have your smart card and your PIN. You won't have your domain credentials. The reason is that domain credentials add risk because someone can harvest those or steal them—people aren't always that good with their passwords. And if they gain access to a building that doesn't require smart cards, then they gain access to the network. Requiring smart cards everywhere will also eliminate the need for domain credentials. [It will require] something you have and something you know to gain access.

How is Microsoft IT taking advantage of virtualization?
Virtualization is obviously continuing to grow, and people say virtualization benefits are good. But I'll tell you, they're not as significant as what we could do to help decrease labor costs or complexity. By consolidating and centrally managing infrastructure, we've taken our infrastructure costs down significantly. Virtualization decreases the cost of your existing infrastructure. But we've taken out so much of our infrastructure spending [already through consolidation and central management that the opportunity to save additional money on infrastructure through virtualization is not large]. On the app side, about 95 percent of spending is labor [developers, consultants and administrators]. Even though there's a lot of talk about virtualization, and we'll drive that as well, it's not going to drive that much improvement in the organization, because I need a way to keep adding new business benefits through efficiencies in my app development and new ways to deliver solutions.

How much has Microsoft reduced its infrastructure spending through consolidation and centralized management?
Of my total IT investment, two-thirds is on apps and one-third is on infrastructure. Just three years ago we were 50/50: 50 percent infrastructure, 50 percent apps. Five years ago we were three-quarters infrastructure, one-quarter apps. We saved that money and invested in apps that meet new business needs or provide new functions.

Microsoft's corporate standards for data-center servers are Dell and Hewlett-Packard.

Any plans to use grid computing within Microsoft?
Yes. The biggest benefit we'd probably get from grid computing is on the product side: using the computing power we have across the company to do our builds, especially because our build process is fairly computing-intensive. One of the things we're looking at, and we're working with Bill Gates' technical assistant on this, is how to use that whole environment, all 300,000 machines in the company, to help the build process. Other than builds and the product development functions, we don't have much in the way of hugely CPU-intensive activities.

You've been in this job about a year. What are you proudest of? A project, a process or anything?
What I'm proudest of is the influence we have on the product teams. We sit down with Bill at least once a quarter [or as often as twice in two weeks] to help influence what Longhorn looks like, to help influence the telephony strategy, to help influence our business intelligence strategy. I'll feel good when those ideas, the input we've given Bill over the last year, make it into the product and customers start benefiting from them.

More Information

More of our Conversation with Ron Markezich

Redmond: What third-party software do you use in Microsoft IT?

Markezich: We use quite a few third-party tools actually. Where Microsoft doesn't have a product, we'll fill that gap with a third-party tool. We run our entire ERP system on SAP -- we have one instance, one database server.
We use Siebel for our core CRM system, and we also use Microsoft CRM. We have a CRM product for mid-size companies. We use that inside Microsoft on a departmental basis. When we look at our enterprise CRM system, because MSCRM isn't targeted at a 90,000-person enterprise, we use Siebel for that.
We also use Clarify for product support, so if you were to call with a Microsoft issue, your incident is tracked with Clarify. We use Siebel for internal support. So internal users who call are tracked with Siebel. We also have some third-party products in our treasury system. Our goal is to get as much running on core Microsoft enterprise products as possible.

Is there anything unusual that Microsoft IT is doing with Visual Studio 2005 in advance of its release?

One of the big challenges for any company is governance compliance -- Sarbanes-Oxley, HIPAA. What we've had a role in influencing is how our products help you in that compliance area. We can now use Visual Studio Team System to embed our software development lifecycle (SDLC) controls within that product. So we have the different templates and control checks that are required for the SDLC embedded within Visual Studio Team System, which ensures the developers are complying with them.
The big internal control is making sure all your apps adhere to the same software development lifecycle. When your developers move to another team, you still have the application running. The person who comes in to support that application needs a common standard to have been followed when it was developed, so they can support it. You'll also understand, if there's a financial aspect to the application, whether there's a built-in control check at the point where it sends a feed to another location.

You mentioned that Exchange is a mission-critical application for Microsoft. How do you guarantee high availability for your Exchange servers?

For e-mail, we cluster our SANs. We cluster to drive up availability. We also consolidated a lot of sites with Exchange 2003. We were at about 75 sites worldwide; I think we're at five now. We will get down to three sites worldwide. It's hard to have high availability at 75 sites because you can't have a clustered SAN at every one. Sometimes it's counterintuitive: we actually looked to consolidate first and then wrap in the high-availability infrastructure. With e-mail, we actually don't have geographical diversity. If an e-mail server went down in Redmond, the users on that server would have downtime.

With some applications, like product activation, [you can't have downtime]. You go to activate Windows -- that better work. We have geographic diversity for product activation, so if Redmond went down for some reason you'd fail over to Dublin.
We cluster our environments where possible, and we have geographic diversity where necessary. We also have a business continuance site, so our 30 most mission-critical applications are all replicated out to Silicon Valley. So if we had downtime in Redmond, we'd fail over to Silicon Valley in a certain period of time.

