While achievements in the field of computer vision are often so industry-specific that progress is hard to see, the developments in the last few weeks have been unequivocally significant: Facebook launched “Moments,” which recognizes which of your friends are in the photos you take; Google released a new photo app with automatic tagging abilities that let a user search for words like “sunset,” “dog,” or the color “blue,” without ever manually tagging a picture; GumGum, a company that uses computer vision to identify the content of an image and then place ads to match it, raised $26 million in venture capital; and the Merlin Bird Photo ID tool, which allows users to identify hundreds of species of birds by uploading an image, finally launched, triggering excited tweets from ornithologists nationwide.

But what exactly is computer vision? Simply stated, computer vision is a type of artificial intelligence (AI) in which researchers teach computers to see the way humans do: with context and analysis, and with an understanding of 3-D geometry, relationships, emotions, and actions, capabilities machines have never had before.

While a 3-year-old can describe, with great accuracy, what’s going on in a picture, our most powerful computers often fall short. For instance, computers have a hard time identifying distorted faces, and self-driving cars are hampered by an inability to distinguish a crumpled paper bag on the road from a rock, or at least they were until recently, as Stanford’s Fei-Fei Li explained in a TED talk.

Although computers tend to fall short (for now!), visual processing largely determines how we, as human beings, experience the world.

Visual processing demands a full 20 percent of our brains, and research shows we react to and recall visual information best: simply hearing something leads to just 10 percent recall, but pairing that information with an image increases recall by 55 percent.

So it shouldn’t be surprising that a tweet with an image gets an 18 percent higher click-through rate than an imageless post on social media, or that Facebook announced in 2013 that images account for over half the content in an average newsfeed. Or that, globally, video posts have increased by 75 percent since last year.


The rise of visual data on a web where brands are crying for attention comes as no surprise, but there’s a problem:

“There’s this treasure trove of data these companies are sitting on in the form of the visual web. That data for the most part is uncategorized. It’s a black box.”

So explained Justin Fuisz, CEO of Fuisz Media, a company that uses image recognition to attach branded calls to action to objects in videos, in a 2014 interview with AdAge.

Traditionally, companies have derived information about images by tapping metadata, captions, and associated article text. But the lion’s share of photo and video content is not properly captioned, and even if it were, the information might still not be very helpful, according to GumGum CEO Ophir Tanz:

“85% of the time, textual data does not support the actual brands that are showing up in an image.”

Accordingly, the mandate for computers is to get inside the object, the image, the video, in order to understand it and communicate about it as well as, or even better than, a person would.


Expect the next 24 months to bring massive developments in the computer vision field, many of which may directly or indirectly affect you.


Computer vision gives us access to a data gold mine that will change marketing forever by…

1. Widening the pool of measurable resources

People stream videos in many languages, so text searches against captions are of limited value. Dextro analyzes every public video on Periscope and Meerkat and categorizes the content into trending themes. Its tools let users query images and video and deliver results that can be analyzed, visualized, and used just like any other metrics for websites or mobile apps.
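To make the idea concrete, here is a toy sketch of what querying auto-categorized video frames by concept might look like. The frame records, tags, and confidence scores below are invented for illustration; a real system like Dextro’s would generate them by running trained visual classifiers over each video.

```python
# Each frame record carries the tags a vision model assigned, with
# confidence scores. These values are made up for illustration.
frames = [
    {"video": "stream_a", "time": 12.0, "tags": {"sunset": 0.91, "beach": 0.80}},
    {"video": "stream_a", "time": 47.5, "tags": {"dog": 0.88}},
    {"video": "stream_b", "time": 3.2,  "tags": {"sunset": 0.67, "city": 0.75}},
]

def query(frames, tag, threshold=0.7):
    """Return (video, timestamp) pairs where `tag` was detected confidently."""
    return [(f["video"], f["time"])
            for f in frames
            if f["tags"].get(tag, 0.0) >= threshold]

print(query(frames, "sunset"))  # only the high-confidence match survives
```

Once video content is reduced to structured tags like this, it can be filtered, counted, and charted exactly like any other analytics data.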

2. Alerting business owners to how often and in what context their brands show up

1.8 billion photos are shared on social media every day, and that number is still growing. But because of the “black box” effect, much of that data is inaccessible to marketers. Ditto Labs is trying to change that with proprietary software that looks inside photos and identifies brands’ logos.

Marketers can follow users who post images of their product, or they can serve ads to those same customers. An entirely new level of customer insight may be possible by looking at customers who post photos of a brand but never mention it by name.
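A minimal sketch of that “silent mention” idea: surfacing posts whose image contains a brand’s logo (according to a vision system’s output) but whose caption never names the brand. The post data and the “AcmeCola” brand are invented for illustration; the logo tags stand in for what software like Ditto Labs’ would detect.

```python
# Posts pair user-written captions with logo tags a vision system
# attached after inspecting the image. All data here is hypothetical.
posts = [
    {"caption": "Perfect afternoon at the lake", "logo_tags": ["AcmeCola"]},
    {"caption": "Loving my new AcmeCola!",       "logo_tags": ["AcmeCola"]},
    {"caption": "City lights",                   "logo_tags": []},
]

def silent_brand_posts(posts, brand):
    """Posts that show the brand visually without mentioning it in text."""
    return [p["caption"] for p in posts
            if brand in p["logo_tags"]
            and brand.lower() not in p["caption"].lower()]

print(silent_brand_posts(posts, "AcmeCola"))  # ['Perfect afternoon at the lake']
```

Text-only monitoring would miss the first post entirely; it is the image, not the caption, that reveals the brand.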

3. Facilitating the development of real-time focus groups

If you’re thinking that advances in computer vision are limited to the screen, think again. The most radical changes will happen IRL. Affectiva, which has already secured $20 million in funding, makes one of several new pieces of software that can read your emotions in real time by tracking your facial expressions. Chief science officer Rana el Kaliouby isn’t worried that it will freak people out. Recalling the “Intel Inside” motto, el Kaliouby imagines “Affectiva-embedded” inscribed on all our belongings. “It’ll sit on your phone, in your car, in your fridge. It will sense your emotions and adapt seamlessly without being in your face,” she told Wired in an interview last month.

Although that might seem outlandish, it might not stop there: emotionally intelligent software coupled with spreading surveillance technology could lead to a variety of Big Brotheresque scenarios. For instance, a video billboard that records viewers’ reactions in real time could push emotional analytics to the content creator, who could then tweak the video to evoke the desired reaction.

While the lack of an opt-in makes that an extreme scenario, it may not be too far off. According to a promotional Affectiva video:

In the TV and digital advertising world, our emotion data has [already] been used to identify flops and predict at what point viewers will skim. In the movie world, our emotion data has [already] been used to identify the most engaging beats to optimize movie trailers and predict box office earnings.

Like all advances in technology, breakthroughs in computer vision will expand our capabilities beyond what we believed was possible. And, like all developments that frighten us, they’re going to be thoroughly scrutinized. That’s good news, because, as the inimitable psychologist and behaviorist B.F. Skinner told us, “the real problem is not whether machines think, but whether men do.”