In-Depth

What Does Microsoft Know About You?

We take a look at all the various sources of data Microsoft collects from customers, how it stores and uses that data, and how its use of it stacks up against Google and other competitors.

Just about every software vendor or Web service collects information about its users. Some do it with more subtlety than others, but the fact is that there's hardly an application or Web site that doesn't gather some sort of intelligence about you every time you use it. Microsoft, of course, is no exception. From Windows Activation Technologies (WAT) to Bing, Microsoft stockpiles information on you even when you don't sign up for services such as Hotmail. But what does Microsoft know about you?

Actually, a more appropriate question might be: What kind of information about you can Microsoft see? It could see a lot, but there are some things that the company chooses not to view or store. For example, the unpopular WAT, formerly known as Windows Genuine Advantage, is perhaps the least intrusive of the Microsoft information-gathering tools. Bing and Hotmail get a little more personal, but experts and IT professionals say that they're less worried about Microsoft regarding privacy than they are about some other high-profile vendors.

What Microsoft Knows
Microsoft starts collecting information on you and your system within minutes of you starting up a brand-new system. We asked Brendon Lynch, senior director of privacy strategy at Microsoft, to help us compile a step-by-step explanation of what Microsoft knows and when it knows it.

The flow begins when you first start your system, log on to Windows and go through the WAT validation process.

"WAT doesn't store your name, address, e-mail address or any other information that Microsoft can use to identify you or your computer," Lynch says. "The tools will collect information used for aggregate reporting, which include computer make and model; version information for the operating system and software; region and language settings; a unique number assigned to your computer by the tools; [hashed] product key and product ID; BIOS name, revision number and revision date; and [hashed] hard drive volume serial number.

"When a system is identified as non-genuine," Lynch continues, "additional information may be sent to Microsoft to better understand why your system failed validation. This information can include error codes and the names of file paths that compromise the integrity of your system."

Make the choice to use Microsoft Online Services, and the transfer of data to Redmond continues. Microsoft Online Services go a little deeper than WAT. Here, for instance, is what Microsoft collects on a random user who searches via Bing: "When Microsoft receives a Bing search query, we collect a number of pieces of information, including the search query provided, IP address, unique identifiers contained in cookies, browser configuration and the time and date of the search," says Lynch, rattling off a list of information that's standard collection fare for search providers.

But, Lynch adds, while Microsoft might know a lot about your browsing habits, it doesn't really know -- or want to know -- that much about you. "As part of our privacy safeguards, search terms are stored separately from account information that could personally and directly identify an individual, such as e-mail address or phone number," he says. "This helps protect against unauthorized correlation of these details. In early 2010, we announced that we'd enhance our existing search data-retention processes by deleting the entire IP address from search queries after six months. This step provides even stronger privacy protections for Bing users."

Lynch also says that Hotmail follows a similar policy. "It's important to note that when demographic information is used for advertising purposes, we use a technical method known as a one-way cryptographic hash to ensure that personal information, such as name or a user's personal e-mail address, isn't used to select which ads a [Windows Live] user receives," he says. "Our advertising platform selects ads based only on data that doesn't personally and directly identify the individual. As a matter of policy, Microsoft takes steps to separate any information that can be used to personally and directly identify a user from the information in its ad-selection system."
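To make the idea of a one-way cryptographic hash concrete, here is a minimal Python sketch. It is an illustration only: Lynch doesn't say which algorithm, salt or normalization Microsoft uses, so the choice of SHA-256, the `pseudonymize` function name and the `example-salt` value are all assumptions.

```python
import hashlib


def pseudonymize(value: str, salt: str = "example-salt") -> str:
    """Return a hex SHA-256 digest of a salted, normalized value.

    Hypothetical helper: the algorithm and salt are assumptions,
    not a description of any vendor's actual system.
    """
    normalized = value.strip().lower()
    return hashlib.sha256((salt + normalized).encode("utf-8")).hexdigest()


token = pseudonymize("user@example.com")
print(len(token))  # 64 hex characters, regardless of input length
```

The same input always produces the same token, so records can still be grouped, but the token can't simply be reversed to recover the original e-mail address.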

Run into your first application crash, and you've got the option to send more information to Microsoft. One of the biggest concerns users have about Microsoft and privacy is about error reporting. The Microsoft privacy policy on error reporting spells out very clearly what kind of information the company collects when a user reports that an application has crashed. According to the policy, error reporting involves collecting information on:

  • Where the problem happened in the software or hardware
  • Type or severity of the problem, if known
  • Files that help describe the problem (typically system- or report-generated files about software behavior before or after the problem occurred)
  • Basic software and hardware information (such as OS version and language, device models and manufacturers, or memory and hard disk size)

The error-reporting process also collects your IP address, but the Microsoft policy says that the company doesn't use IP addresses to identify or contact users. Microsoft says that users who are concerned about the privacy of error reporting simply shouldn't submit reports at all.

The policy states: "Reports might unintentionally contain personal information, but this information is not used to identify you or contact you. For example, a report that contains a snapshot of memory might include your name, part of a document you were working on or data that you recently submitted to a Web site. If you are concerned that a report might contain personal or confidential information, you should not send the report."

All of that adds up to a solid data-retention policy that respects privacy, especially in comparison to the data collection that some other vendors do, says Marius Oiaga, editor in chief of Softpedia, a software Web site based in Romania. In fact, he says, the average computer user would do well to follow a privacy policy as strict as that of Microsoft.

"Microsoft certainly does a better job at protecting information it collects compared to some users that share even the most intimate details of their life on social networks, or with customers that hand over their username and password after receiving a message claiming their e-mail accounts will shut down unless they comply," Oiaga explains.

Useful Information
So, Microsoft could know a lot about you, but it chooses not to -- or at least not to associate specific information with individuals. But what does Microsoft do with all of the information it collects? Good things, mostly, IT pros and experts say. The Microsoft error-reporting process, driven by data collected from users, has led to more stable versions of Windows and other improvements in recent years, says Dave Nickason, an IT pro at a law firm in Western New York. Microsoft also tracks information such as how customers use drop-down menus as opposed to buttons in Word, or how people navigate in Excel. The company takes that information into consideration when updating and developing new applications.

"I've seen the results of the customer-experience stuff," Nickason, a Microsoft Small Business Server MVP for more than six years, says. "All they're trying to do is make a better product, and I really believe we've seen that a lot. The computer hangs less often. I don't ever get blue screen from one year to the next. I think error reporting is behind a lot of that."

Of course, Microsoft data collection isn't all about improving interfaces and stabilizing products. WAT, for instance, is an anti-piracy measure in large part, and one that has been the subject of quite a bit of criticism since Microsoft introduced the first version of it more than five years ago. Still, IT pros don't report having too much of a problem with it.

In fact, Redmond readers generally defend Microsoft and its right to protect its intellectual property from piracy. "Microsoft has every right to protect all products and software it develops," says one reader, who prefers to remain anonymous. "I've purchased legit Windows XP, Vista and Windows 7, and I know Microsoft wants me to prove it somehow through Windows updates. No problem."

A Matter of Policy
One common complaint about WAT, the purpose of which is to ensure that users are running licensed software, is that it "phones home" to Redmond with user information. As Lynch notes, WAT does phone home -- and it tells the user that it's going to do so. A lot of other vendors' wares do something similar.

With many vendors' privacy policies being long, complex and loaded with legalese, even many savvy users don't bother to read them before agreeing to them. In fact, we rarely, if ever, lack the ability to control which bits of information we send where. Many users just choose not to actively control the dissemination of their info.

It's not hard to see why. For one thing, with most big vendors sending information back to home base in one way or another, it's hard to use a computer or a major Web site without agreeing to some sort of (generally not too negotiable) privacy policy. Those policies aren't always simple and straightforward. In fact, they almost never are.

For instance, the Google Inc. terms-of-service agreement runs more than 4,000 words, and only about 50 of the words in it concern privacy. The section on privacy directs the user to a different link entirely, one that goes to the Google "Privacy Center" page, which contains a separate, 2,100-word privacy policy that even manages to feature links of its own to other, separate privacy documents.

The Microsoft Online Privacy Statement also runs more than 4,000 words and offers a host of links to other Web pages. One leads to the Bing Privacy Supplement, a subsection of the Online Privacy Statement that lives in a separate document and checks in at a relatively paltry 800 words or so. Added up, and tossed out by vendor after vendor, these policies end up being far too dense for even most techies to digest.

"Practically nobody reads the stuff," says privacy expert Lauren Weinstein, co-founder of People for Internet Responsibility. "If you try, your eyes glaze over; your brain freezes; your hands shake."

Plus, Weinstein says, how well many privacy policies and license agreements would stand a legal test is questionable. "It's not even clear how much of this stuff is enforceable," he says. "The whole issue of 'click here and accept' is still the kind of thing that has all kinds of interesting legal wrinkles to it. Most of them are ephemeral, these agreements. From the standpoint of the average computer user, they don't exist. Even to the extent that those agreements spell out what's being done, it's not clear to me that they really represent consent."

Nevertheless, privacy policies do exist, and for the intrepid, they do include specifics. It's in those specifics where Microsoft actually starts to look like one of the most privacy-conscious vendors in the industry.

"User data collection is a wide practice in the IT industry and not limited to Microsoft in any way," Oiaga says. "Google does it, and so do Yahoo!, Apple, Novell ... almost everybody. [Microsoft] is extremely transparent as far as user data collection goes, more so than others."

For the most part, IT professionals don't seem overly worried about the data-collection habits of Microsoft. Most of the readers who contacted Redmond on this topic show little or no concern over how much and what kind of information Microsoft is collecting on them and their companies. For many IT pros, other matters take priority.

"I think we're less concerned with privacy and more concerned with security," says Jeff Cornish, senior systems architect at Teracai Corp. in Syracuse, N.Y. "It doesn't matter to me whether people are gathering statistics and using that in a larger demographic."

Exposure of company information is even less of a concern than a lack of personal privacy, says Nickason, the Western New York IT pro. "[Microsoft doesn't] ask for any of that stuff," he says. "They're looking for our feedback in more general terms. They're not looking for any kind of data other than how big your Exchange database is."

Besides, Nickason reckons, even if Microsoft were hoarding critical user information, it would be unlikely that anybody in Redmond would have the time or inclination to dig through it. "People think that [Microsoft is] looking at their information," Nickason opines. "Who do they think is poring through 90 million people's information? What would they be looking at?"

Fear and Googling in IT
If there's a company that strikes a privacy nerve with IT pros, it's Google. Cornish's older brother sequestered himself in the woods in order to maintain privacy. While Cornish says that he doesn't plan to take such drastic measures, he does wonder exactly what Google has on him and what the company might someday do with that information.

"[Google has] some fantastic magical algorithms they deploy to target you," Cornish says of Gmail. "The technology is there, and there's not much you can do about it unless you take the path of my older brother. Your name and/or your IP address [could be] collected and shipped off to spam companies. I think that's kind of malicious. That's what concerns me."

Nickason takes things further: "Google has your Gmail; they've got your searches; they've got Google Docs, and they're monetizing that by selling search," he says. "What happens if a health-insurance industry group goes and starts seeking information from Google? You can see a slope where Google's interest is in having all that data and monetizing it. Where are they going to sell that information next? If they want to sell the fact that I searched for information on diabetes, there's nothing stopping them from doing that. I'm not saying that they're doing it or that they would do it, but they have that information, and they don't seem overly predisposed to getting rid of it."

Weinstein says that such a scenario is not possible under the current Google policy. "If you read the Google privacy policies, it's very clear that something like that just doesn't happen," he says. "The ramifications of [inappropriately selling user data] are so extraordinary for either [Microsoft or Google] that neither would risk it even if they wanted to."

For the record, the Google privacy policy defines "sensitive information" thusly: "'Sensitive personal information' includes information we know to be related to confidential medical information, racial or ethnic origins, political or religious beliefs or sexuality and tied to personal information," the policy says.

The company's privacy policy lays out specific use cases for sensitive information: "We will not collect or use sensitive information for purposes other than those described in this Privacy Policy and/or in the supplementary service privacy notices, unless we have obtained your prior consent," the policy states online.

How, then, does the policy address the usage of user data? First of all, Google search gets more personal than Bing does. While Google "anonymizes" IP addresses from search terms and deletes the last quarter of each address after nine months, Microsoft Bing deletes the addresses altogether after six. Google also does not explicitly state -- as Microsoft does with Bing -- that it doesn't correlate personal information with search terms or that it doesn't use personal information in determining which ads it feeds to users.
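The difference between the two retention approaches can be sketched in a few lines of Python. This is a hedged illustration, not either company's implementation: it assumes IPv4 addresses, reads "the last quarter of each address" as the final octet, and uses hypothetical function names.

```python
import ipaddress
from typing import Optional


def drop_last_octet(ip: str) -> str:
    """Zero the final octet of an IPv4 address (partial anonymization)."""
    network = ipaddress.ip_network(f"{ip}/24", strict=False)
    return str(network.network_address)


def delete_ip(ip: str) -> Optional[str]:
    """Discard the address entirely (full deletion)."""
    return None


# Hypothetical log record; 203.0.113.0/24 is a documentation-only range.
record = {"query": "weather", "ip": "203.0.113.77"}
print(drop_last_octet(record["ip"]))  # 203.0.113.0
print(delete_ip(record["ip"]))        # None
```

With the last octet zeroed, the record still points to a block of up to 256 addresses. With full deletion, nothing about the network location survives.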

Still, the Google policy regarding sharing of information with third parties does appear to validate Weinstein's assertion that the company won't sell search terms about diabetes to the health-care industry. His assertion that doing so would land Google in a world of hurt from several camps is very likely accurate as well.

The Google Response
Google officials also confirm Weinstein's notions. Alma Whitten, lead privacy engineer for Google, says Nickason's diabetes scenario would simply never happen: "Even if we had information in the way that he's picturing it, we don't share [it]," Whitten says. "That's not part of the picture."

Whitten adds that Google doesn't make money by selling information; it derives revenue from advertising. "One of the things that we're finding is that a lot of people are actually confused about how Google makes its money," Whitten says. "People are thinking of older industries that work with data. Selling data -- that's not our model at all.

"One of the points that I think people don't realize is that anyone can come and act with Google systems as an advertiser," she continues. "Go into the part of your account that says, 'I'd like to buy an ad,' and you can see how the system works. It's not the behind the scenes. Being in the position of the matchmaker where the advertiser pays you to show your ad to an appropriate person -- we like that model."

So, what kind of information does Google collect on a random searcher? According to Whitten, it records an IP address (which could be that of a laptop or a cable modem that's hooked up to a wireless device and supports multiple laptops), the date and time, the search query itself and the browser's characteristics. Google also installs a cookie.

"[Searches] are not processed with any kind of profile that's aimed at identifying the individual behind it," Whitten says. She adds that Google, like Microsoft, relies on user information to improve its product. For instance, Google might record how far down a user scrolled before clicking on a link -- say, the fifth link rather than the first or second. The idea is to take that information and make Google search faster and easier to use, Whitten says. "Those sequences are really rich and powerful for us," she explains.

Google, however, can't seem to escape controversy regarding privacy. The company recently came under fire from privacy chiefs in 10 countries -- including the United Kingdom, France, Germany and Canada -- for its handling of the ill-fated launch of the Google Buzz social-networking application, as well as for the controversial Google Street View tool. Google has gone to great lengths to make amends for the Buzz fiasco, but it nevertheless continues to make headlines that drive privacy concerns. News broke in late April that Google was recording locations of Wi-Fi hotspots and collecting users' MAC addresses.

Whitten says the company is making efforts not only to comply with regulatory bodies but also to be more transparent about how its products work. "We try to be really forthcoming to the general public and to regulatory bodies," she says. "In so much of this, what we're working on is trying to provide more information not just to say [what] we don't do but to explain what it is that we do. The short explanation of how we learn to do search quality from the log data: It took us a bunch of work to figure out how to do that well. It was obvious to us but that didn't mean the whole world, that it was obvious to them."

And Google doesn't always come off as a bad guy; sometimes, Microsoft does. Google recently pulled out of China after the government there refused to address its concerns regarding the privacy of searches. Microsoft still does business in the country with Bing.

"[Google was] never happy about being involved in that kind of Chinese censorship regime," Weinstein says. "Microsoft is happy to participate. There are clearly discernable differences in the way each of them has shaped its trajectories."

Weinstein also notes that Google resisted demands for search information from the U.S. government. "When the government did try to get at a lot of the Google data, Google resisted. They said no," he says. "Microsoft got itself into a lot of trouble by using tactics that were eventually found to be inappropriate," Weinstein adds, referring to the software giant's antitrust problems. "Google has gotten big by providing services people like."

Subject to Change
Whether they're ironclad or not, there's one major snafu with most vendors' privacy policies, including those of Microsoft and Google: Like most corporate policies, they could change at any time, and the only notice the companies would provide would be a re-post of their (already largely ignored) privacy policies or possibly an e-mail to registered users featuring some slippery legal language.

Given that, Weinstein says he worries more about Microsoft than he does about Google, as Microsoft has access to so many facets of a user's computing operation. (That distinction could disappear soon, however, with the many Google initiatives to increase its footprint through the Chrome browser and OS.) Already, he says, Redmond is moving toward making privacy-reducing choices the defaults in software installations. He recently upgraded a Windows XP laptop to Windows 7 and made some observations.

"One of the things I was noting to myself was that during the install procedure, when you get to the point where it's pushing you on those update settings, it tries really hard to push you toward 'sharing' choices," Weinstein says. "If you look at the recommended settings, one of the things it includes is turning on the customer-experience module, which results in ongoing statistical reporting of your activities. Google has gone the other way with their options. They're defaulting those [to] 'off' generally now. There's a check box, but it tends to be unchecked. Microsoft seems to be going toward it's already checked. This is one of the areas where there seems to be some divergence between Microsoft and Google."

With both Microsoft and Google, as with most big vendors, there's generally no way to opt out when it comes to information sharing. Sending some information back to the vendor is, most of the time, part and parcel of using the vendor's products. Google, for instance, says in its privacy policy that some of its services won't work if users set their browsers to refuse cookies.

Oiaga makes a point about Windows: "It's perfectly legal to run the OS on a computer that's not connected to the Internet," he says. "In this regard, no information is harvested from the machine directly, as there's no pipeline to Microsoft servers. However, customers are still required to activate their product, and they can do so over the phone. Otherwise, there's no data exchange between the computer and Microsoft servers. But Windows customers have to agree to the EULA [end-user licensing agreement] ahead of getting to use their copy of the OS, and the agreement clearly stipulates that the platform will send information to Microsoft via the Internet. If they don't agree with the EULA, they can't run the product."

Nevertheless, both Microsoft and Google do have to consider pressure from users, companies, partners, advocacy groups and governments with regard to privacy. It's unlikely, then, that either company will make the kind of rash move that would lead to a privacy nightmare for users.

But users and IT pros would be well served to start paying more attention to those terms-of-service agreements and privacy policies that most of us gleefully skip through. Anything could happen, and users should know what's happening and be as prepared as possible for just about any eventuality.

"How many people do you think agreed to the Windows 7 EULA without reading the privacy portion?" asks Oiaga. "I'm willing to bet that was the vast majority. Truth be told, customers everywhere should exercise more control over the way their personal data ends up on the Internet."

The mystery of data collection is what's so troublesome, but it doesn't have to be. The fact is that we volunteer data on the Internet all the time. Hundreds of millions of people have freely given their personal information -- much of it intimate -- to Facebook, which has itself struggled with privacy issues.

But when users don't know for sure -- or aren't willing to spend hours reading about -- what vendors and content providers know about them, they get nervous. And, in general, that's the case with software vendors, particularly Microsoft. Vendors really should make their privacy policies clearer and more readily available. They should list in a simple format the information they collect on users.

Still, the primary message in this story is that Microsoft is not out to get us. It's not collecting secret dossiers containing our most personal information or gathering data on us with the intent of selling it and ruining our lives. (For that matter, Google is not doing that, either, despite the considerable fear lots of people seem to have of the search giant.)

No, Microsoft is mainly collecting innocuous info that, in the long run, helps it make better products. Microsoft is experienced at gathering user data properly and has learned in the last 30 years or so how to collect user information without invading privacy -- for the most part. Experts say that it's one of the industry's most conscientious companies with regard to data collection.

Google, on the other hand, is a different animal. It's young. It experiments. It takes risks that a company like Microsoft wouldn't take. Google will admit to its mistakes. The company issued public mea culpas after its Buzz fiasco and after the discovery that Street View trucks had collected users' MAC addresses -- a frankly egregious breach of privacy.

Google isn't evil, but it does make mistakes regarding privacy, and those mistakes are amplified given its ubiquity and power to collect information. The company needs to be more careful, especially with Street View and some of its more ambitious initiatives.

As we've pointed out in these pages, Microsoft has a lot of data on you, but has strong policies to keep the information in silos and anonymous. The key to protecting privacy is to be vigilant but not paranoid -- to keep vendors honest but also to let them do their work. That, it seems, is a wise course of action for any consumer regarding just about any kind of product.
