The Schwartz Report


Microsoft and Google Are Taking Different Cloud-Based AI Paths

Artificial intelligence is the hottest term in IT these days as players of all sizes talk up their latest AI projects. Google this week showed off its latest advances in AI by launching Google Home, a device similar to the Amazon Echo that taps the Google search engine and has its own personal assistant. The company also jumped back into the smartphone market with its new Pixel. The slick new Android phone is the company's first with the personal assistant built in. Google's latest AI-based wares come on the heels of last week's Ignite Conference in Atlanta, where Microsoft CEO Satya Nadella talked up how the company has deployed field-programmable gate arrays (FPGAs) in every node of its Azure cloud to enable it to process the tasks of a supercomputer, accelerating its machine learning, AI and Intelligent Bots framework.

Just as Microsoft is building out Azure to power the AI-based capabilities it aims to deliver, the Google Cloud Platform will do the same for the search giant. Google has the largest bench of AI scientists, with Microsoft and China's Baidu, which runs its own search engine and cloud, close behind, said Karl Freund, a senior analyst at Moor Insights & Strategy. Microsoft last week said it has formed an AI group staffed with 5,000 engineers.

Freund explained in a blog post published by Forbes that Microsoft's stealth deployment of field-programmable gate arrays in Azure over the past few years is a big deal, and that other large cloud providers looking to boost the machine learning capabilities of their platforms will likely consider the same approach, if they haven't started already.

Microsoft Research networking expert Doug Burger, who joined Nadella on stage during the Ignite keynote, revealed the deployment of FPGAs and GPUs in every Azure node, providing what he described as "ExaScale" throughput, meaning it can run a billion billion operations per second. That means Azure has "10 times the AI capability of the world's largest existing supercomputer," Burger claimed, noting that "a single [FPGA] board turbocharges the server, allowing it to recognize the images significantly faster." From a network throughput standpoint, Azure can now support network speeds of up to 25Gbps, faster than anyone else has claimed to date, he said.
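For a sense of scale, the "exa" prefix denotes 10^18, so an exascale system performs on the order of a billion billion operations per second. The short Python sketch below works through that arithmetic and estimates the ideal time to move a payload over a 25Gbps link, next to the 10Gbps and 20Gbps figures cited for AWS later in this post. The 1 GB payload and the function name are assumptions chosen for illustration, and the estimate ignores protocol overhead.

```python
# Back-of-envelope arithmetic for the figures quoted in this post.
EXA = 10**18       # SI prefix "exa"
BILLION = 10**9


def ideal_transfer_seconds(payload_bytes: int, link_gbps: float) -> float:
    """Ideal time to move payload_bytes over a link rated at link_gbps.

    Ignores protocol overhead, congestion and latency (an assumption).
    """
    bits = payload_bytes * 8
    return bits / (link_gbps * 1e9)


if __name__ == "__main__":
    # An exascale system runs a billion billion operations per second.
    print(f"exa / billion = {EXA // BILLION:,}")

    payload = 10**9  # hypothetical 1 GB payload
    for gbps in (10, 20, 25):
        seconds = ideal_transfer_seconds(payload, gbps)
        print(f"{gbps} Gbps: {seconds:.2f} s")
```

These are idealized numbers; real-world throughput depends on the workload and the network stack.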

Freund said in an interview that he was aware of Microsoft's intense interest in FPGAs five years ago, when Microsoft Research quietly described Project Catapult, outlining a five-year proposal for deploying the accelerators throughout Azure. The company first disclosed its work with FPGAs two years ago, when Microsoft Research published a research paper describing its Project Catapult deployment of the fabric on 1,632 servers to accelerate the Bing search engine.

Still, it was a surprise that Microsoft actually moved forward with the deployment, Freund said. Freund also emphasized how Microsoft's choice of deploying FPGAs contrasts with how Google is building AI into its cloud using non-programmable ASICs. Google's fixed-function chip, the Tensor Processing Unit (TPU), is designed around TensorFlow, the machine learning library that expresses complex mathematical calculations as graphs. Google developed TensorFlow and contributed it to the open source community. Google revealed back in May that it had started running the TPUs in its cloud more than a year ago.

The key difference between Google's and Microsoft's approaches to powering their respective clouds with AI-based computational and network power is that FPGAs are programmable and Google's TPUs, because they're ASIC-based, are not. "Microsoft will be able to react more readily. They can reprogram their FPGAs once a month because they're field-programmable, meaning they can change the gates without replacing the chip, whereas you can't reprogram a TPU -- you can't change the silicon," he said. Consequently, to adopt a new design Google will have to swap out the processors in every node of its cloud, he explained.

The advantage Google has over Microsoft is that its TPUs are substantially faster -- potentially ten times faster -- than today's FPGAs, Freund said. "They're going to get more throughput," he said. "If you're the scale of Google, which is a scale few of us can even comprehend, you need a lot of throughput. You need literally hundreds or thousands or even millions of simultaneous transactions accessing these trained neural networks. So they have a total throughput performance advantage versus anyone using an FPGA. The problem is if a new algorithm comes along that allows them to double the performance, they have to change the silicon and they're going to be, you could argue, late to adopt those advances."

So who has the advantage: Microsoft, with its ability to easily reprogram its FPGAs, or Google, using its faster TPUs? "Time will tell who has the right strategy but my intuition says they are both right and there is going to be plenty of room for both approaches, even within a given datacenter," Freund said.

And speaking of the datacenter, while Microsoft didn't say so outright, officials and partners acknowledged the potential to deploy FPGAs in Azure Stack hardware at some point after Azure Stack's release, scheduled for next summer. Indeed, many organizations with large SANs already use FPGAs to boost connectivity, and modern network infrastructure has them as well.

As for Amazon Web Services, the technology it uses for its EC2 cloud is a well-guarded secret. Back in June, AWS launched its Elastic Network Adapter (ENA), boosting its network speed from 10Gbps to 20Gbps. While Amazon isn't revealing the underlying hardware model behind its ENAs, Freund said it's reasonable to presume the company's 2015 acquisition of Israeli chip maker Annapurna Labs is playing a role in boosting EC2. Annapurna was said to be developing programmable network adapters based on an ARM system-on-a-chip, he said.

Baidu has already taken a huge jump into deploying FPGAs as well, Freund said, and Microsoft's push lends further credibility to Intel's $16.7 billion acquisition of FPGA leader Altera last year, when Intel boldly predicted that 30 percent of all servers will have FPGAs by 2020. "Intel certainly saw the writing on the wall," he said. "They realized this would be a threat to their business if they didn't have a place in it."

Posted by Jeffrey Schwartz on 10/07/2016 at 1:30 PM


