Posey's Tips & Tricks
Microsoft 'Seeing AI': Imagining a New Use for Computer Vision
A Microsoft Research project is expanding the capabilities of computer vision systems to help visually impaired users navigate everyday tasks.
When I think of computer vision, my mind naturally gravitates to the Xbox Kinect. For those who aren't familiar with the Kinect, it's a special camera that tracks your motion, allowing you to play a game without using a traditional controller. For example, if you jumped in real life, your character on the screen would also jump.
Using the Kinect was a fun way to get a bit of exercise, but it definitely wasn't the perfect game controller. It sometimes lost track of your position, causing your on-screen character to behave erratically.
Thankfully, computer vision has matured considerably since the days of the Kinect. Today, it is used in driverless cars and a variety of other applications. One of the drones I fly uses computer vision for obstacle avoidance and object tracking. Similarly, the Microsoft HoloLens uses a form of computer vision to create a 3-D spatial map of the area in which the device is being used.
The point is that computer vision finally seems to be coming of age, and there are plenty of new and creative uses for the technology. Even so, I recently came across a use for computer vision that was completely new to me.
Late last year, I ordered a current-generation iPad Pro to use in an underwater cartography project I am working on (it's a long story). Since I don't normally use iPads, I needed to spend some time familiarizing myself with the device. That was when I discovered that the iPad Pro has an integrated LiDAR scanner. LiDAR is another technology used for computer vision; it uses lasers to measure distances and spatially map the surrounding area.
As I was experimenting with the device's LiDAR scanner, I stumbled onto a Microsoft Research project called "Seeing AI." The project seeks to use computer vision systems like those integrated into the iPad Pro (including the device's standard cameras) as a tool to help people with disabilities.
In some ways, the idea of using computer vision to help the visually impaired isn't all that new. Several years ago, I heard about someone developing an app that would read text aloud to the user. The user only needed to point their phone's camera at the text, and the app would recognize the text and then speak it.
Microsoft's Seeing AI project can also speak text that appears in front of a device's camera, but it does a lot more. For example, it can scan a product's barcode and tell the user what the product is. The app can also use facial recognition to tell the user which of their friends are in the room, and even how those friends appear to be feeling.
The app also performs currency recognition. If a visually impaired person is paying for a purchase with cash, the app can identify the denomination of each bill being exchanged, helping to ensure that the person isn't shortchanged.
The app includes several other features. I won't go into all of them, but one of the more useful ones is the ability to describe the user's surroundings. Oh, and it also has the ability to read handwritten text.
I haven't really had a chance to experiment with Seeing AI, but the demos I've seen have been truly impressive. Even so, I can't help but think back to the first app I mentioned, the one that could read text aloud. When I first heard about it, I remember thinking that it had potential, but that its user interface might make it difficult for someone who is visually impaired to use without assistance. I thought it might be a better fit for someone who has difficulty reading. To the best of my knowledge, Microsoft's Seeing AI app suffers from a similar limitation: it requires the user to be able to navigate an app on a smartphone or a similar device.
Still, I think the Seeing AI app holds enormous potential, and there is probably a way to make it usable without having to navigate a smartphone display. For example, Microsoft could create a HoloLens version of the app, which would allow someone who is visually impaired to navigate it using voice commands.
Of course, the HoloLens is pricey and probably wouldn't be the most comfortable thing to wear all day. However, Microsoft could create a HoloLens-like device that omits the visual display. That would significantly lower both the cost and weight of the device, while also presumably improving battery life. That approach might allow someone to use the Seeing AI app without depending on a smartphone or walking around all day wearing a HoloLens.
Brien Posey is a 19-time Microsoft MVP with decades of IT experience. As a freelance writer, Posey has written thousands of articles and contributed to several dozen books on a wide variety of IT topics. Prior to going freelance, Posey was a CIO for a national chain of hospitals and health care facilities. He has also served as a network administrator for some of the country's largest insurance companies and for the Department of Defense at Fort Knox. In addition to his continued work in IT, Posey has spent the last several years actively training as a commercial scientist-astronaut candidate in preparation to fly on a mission to study polar mesospheric clouds from space. You can follow his spaceflight training on his Web site.