Microsoft Changes Privacy Platform Name to SmartNoise
Microsoft Research has changed the name of its "differential privacy" platform from "WhiteNoise" to "SmartNoise," according to a Wednesday announcement.
Differential privacy is a term that comes from cryptography. It refers to adding mathematically calibrated statistical noise to data or query results so that individuals can't be identified. Microsoft's newly renamed SmartNoise platform for implementing differential privacy on data queries can be found at this landing page.
Microsoft built its SmartNoise platform in collaboration with researchers at Harvard's Institute for Quantitative Social Science and School of Engineering and Applied Sciences. Microsoft worked with the Harvard-led OpenDP Initiative, which developed open source tools to encourage data sharing while protecting privacy. The platform works by introducing distortion (sometimes called "white noise") into returned query results, with the aim of protecting the privacy of individuals.
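As a rough illustration of what "introducing distortion" into a query result can look like, here is a minimal Python sketch of the Laplace mechanism, a textbook way to add noise to a count. It is not SmartNoise's API. The function names, the sample data and the epsilon value are all assumptions made up for this example.

```python
import random


def laplace_noise(scale, rng):
    """Draw one sample from a Laplace(0, scale) distribution."""
    # The difference of two independent exponential draws with the same
    # rate is Laplace-distributed with scale 1 / rate.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_count(records, predicate, epsilon, rng):
    """Return a count with Laplace noise added (hypothetical helper)."""
    true_count = sum(1 for r in records if predicate(r))
    # Adding or removing one person changes a count by at most 1, so the
    # sensitivity is 1 and the noise scale is 1 / epsilon.
    return true_count + laplace_noise(1.0 / epsilon, rng)


rng = random.Random(0)
ages = [23, 35, 41, 29, 62, 57, 38, 44]  # made-up data
answer = noisy_count(ages, lambda age: age >= 40, epsilon=0.5, rng=rng)
print(answer)  # the true count is 4; the printed value is 4 plus random noise
```

A smaller epsilon means more noise and stronger privacy, while a larger epsilon means answers closer to the truth.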
In May, Microsoft introduced its WhiteNoise toolkit, developed with the help of the Harvard OpenDP Initiative researchers, and published it on GitHub. In June, an OpenDP blog post indicated that the Harvard group and Microsoft were "actively in the process of finding a new name for the differential privacy system that we have been developing together." The idea was to avoid prejudices sparked by certain words, such as "whitelist" (good) versus "blacklist" (bad), the Harvard post explained.
SmartNoise was the end result of that renaming effort.
Microsoft's Other Efforts
Microsoft collects a lot of data from its solutions, which apparently sparked its differential privacy efforts. The announcement recounted privacy efforts associated with the Windows operating system, as well as the Workplace Analytics service.
Microsoft added noise to its Windows "telemetry" data collection process, the announcement explained. Telemetry has lately been something of a bad word at Microsoft. About three years ago, a Dutch privacy organization found that Windows 10 telemetry collection infringed Dutch privacy laws.
Microsoft is also using differential privacy protections in its Insights for People Manager, a feature in the Workplace Analytics service, as explained in this document. The Workplace Analytics service is "a collaboration analysis tool that can be used to analyze behavior and network patterns," per a Microsoft FAQ document description. Typically, Human Resources departments use Workplace Analytics to track worker productivity or proclivities, but Microsoft claims that no one gets the raw data, and that it's not possible to map the collected information back to individuals, per the FAQ.
One approach Microsoft is considering to avoid providing access to raw data is to use so-called "synthetic data." The announcement explained that "the idea behind synthetic data is that it preserves all the key attributes of a dataset but doesn't contain any actual private data." However, synthetic data applies only "if we know the specific task or question the algorithm needs to understand."
Microsoft sees the use of synthetic data as promising because "anyone could access and use it without privacy ramifications," but details on how to implement it are still getting ironed out.
Differential Privacy for Everyone
The differential privacy approach used by Microsoft is explained in a rather upbeat document called "Differential Privacy for Everyone" (PDF download). In essence, an algorithm is used to introduce distortion in a query response. The "decision" to add that distortion happens automatically via a "privacy guard" solution.
Here's the explanation:
Roughly speaking, DP works by inserting an intermediary piece of software between the analyst and the database. The analyst never gets to access or actually see the contents of the database; instead the intermediary acts as a privacy-protecting screen or filter, effectively serving as a privacy guard. The guard takes the questions (queries) that the analyst wishes to ask of the database and evaluates the combined privacy implications of that question and those that have preceded it.
It's this privacy guard that will introduce distortion in a returned query result, based on the nature of the questions asked. This elaborate process is used because researchers found that it's possible to take pieces of information, such as "birth date, zip code and gender," from one database and compare them with another database to identify people.
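The guard idea described above can be sketched in a few lines of Python. This is a toy model, not the design in Microsoft's document. It assumes the guard tracks a fixed "privacy budget" (a total epsilon) and charges each query against it, which is one common way to account for the combined effect of successive questions. The class, method and exception names are hypothetical.

```python
import random


class BudgetExhausted(Exception):
    """Raised when a query would exceed the remaining privacy budget."""


class PrivacyGuard:
    """Hypothetical intermediary between an analyst and a dataset."""

    def __init__(self, records, total_epsilon, seed=None):
        self._records = list(records)   # the analyst never sees these
        self._remaining = total_epsilon
        self._rng = random.Random(seed)

    @property
    def remaining_budget(self):
        return self._remaining

    def count(self, predicate, epsilon):
        """Answer a count query with Laplace noise, charging epsilon."""
        if epsilon <= 0:
            raise ValueError("epsilon must be positive")
        if epsilon > self._remaining:
            raise BudgetExhausted("query refused: privacy budget spent")
        self._remaining -= epsilon
        true_count = sum(1 for r in self._records if predicate(r))
        rate = epsilon  # noise scale is 1 / epsilon, so the rate is epsilon
        noise = self._rng.expovariate(rate) - self._rng.expovariate(rate)
        return true_count + noise


guard = PrivacyGuard([23, 35, 41, 29, 62], total_epsilon=1.0, seed=7)
print(guard.count(lambda age: age > 30, epsilon=0.5))
print(guard.count(lambda age: age > 60, epsilon=0.5))
try:
    guard.count(lambda age: age > 25, epsilon=0.5)
except BudgetExhausted as err:
    print(err)
```

In this sketch, the first two queries are answered with noise and the third is refused because the earlier questions have used up the budget.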
The "Differential Privacy for Everyone" document offered an example of the kind of distortion introduced by the privacy guard. For instance, a query counting people might return a fraction rather than a whole number. Of course, a researcher who gets back a fractional result close to one could infer that it reflects imposed distortion, and could then guess that the true result was "1," or one person.
So the example doesn't seem like it's much of a safeguard.
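How often a rounding guess of that kind would succeed depends on how much noise the guard adds, which the example doesn't specify. The Python sketch below simulates the guess under assumed conditions: Laplace noise, a true count of one, and three made-up epsilon values. None of these settings come from Microsoft's document.

```python
import random


def laplace_noise(scale, rng):
    """Draw one Laplace(0, scale) sample as a difference of exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def rounding_hit_rate(true_count, epsilon, trials, rng):
    """Fraction of trials in which rounding the noisy count recovers it."""
    hits = 0
    for _ in range(trials):
        noisy = true_count + laplace_noise(1.0 / epsilon, rng)
        if round(noisy) == true_count:
            hits += 1
    return hits / trials


rng = random.Random(42)
for epsilon in (0.1, 1.0, 10.0):  # assumed values, small to large
    rate = rounding_hit_rate(1, epsilon, trials=10_000, rng=rng)
    print(f"epsilon={epsilon}: rounding recovers the count in {rate:.0%} of trials")
```

Under these assumptions, the chance that rounding lands on the true count is 1 - exp(-epsilon / 2), so the guess is reliable only when the noise is light.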
Kurt Mackie is senior news producer for 1105 Media's Converge360 group.