Skip to main content
https://www.highperformancecpmgate.com/rgeesizw1?key=a9d7b2ab045c91688419e8e18a006621

Computer vision inches towards ‘common sense’ with Facebook’s latest research

Machine learning is capable of doing all sorts of things as long as you have the data to teach it how. That’s not always easy, and researchers are always looking for a way to add a bit of “common sense” to AI so you don’t have to show it 500 pictures of a cat before it gets it. Facebook’s newest research takes a big step towards reducing the data bottleneck.

The company’s formidable AI research division has been working on how to advance and scale things like advanced computer vision algorithms for years now, and has made steady progress, generally shared with the rest of the research community. One interesting development Facebook has pursued in particular is what’s called “semi-supervised learning.”

Generally when you think of training an AI, you think of something like the aforementioned 500 pictures of cats — images that have been selected and labeled (which can mean outlining the cat, putting a box around the cat, or just saying there’s a cat in there somewhere) so that the machine learning system can put together an algorithm to automate the process of cat recognition. Naturally if you want to do dogs or horses, you need 500 dog pictures, 500 horse pictures, etc — it scales linearly, which is a word you never want to see in tech.

Semi-supervised learning, related to “unsupervised” learning, involves figuring out important parts of a dataset without any labeled data at all. It doesn’t just go wild, there’s still structure; for instance, imagine you give the system a thousand sentences to study, then showed it ten more that have several of the words missing. The system could probably do a decent job filling in the blanks just based on what it’s seen in the previous thousand. But that’s not so easy to do with images and video — they aren’t as straightforward or predictable.

But Facebook researchers have shown that while it may not be easy, it’s possible and in fact very effective. The DINO system (which stands rather unconvincingly for “DIstillation of knowledge with NO labels”) is capable of learning to find objects of interest in videos of people, animals, and objects quite well without any labeled data whatsoever.

Animation showing four videos and the AI interpretation of the objects in them.

Image Credits: Facebook

It does this by considering the video not as a sequence of images to be analyzed one by one in order, but as an complex, interrelated set,like the difference between “a series of words” and “a sentence.” By attending to the middle and the end of the video as well as the beginning, the agent can get a sense of things like “an object with this general shape goes from left to right.” That information feeds into other knowledge, like when an object on the right overlaps with the first one, the system knows they’re not the same thing, just touching in those frames. And that knowledge in turn can be applied to other situations. In other words, it develops a basic sense of visual meaning, and does so with remarkably little training on new objects.

This results in a computer vision system that’s not only effective — it performs well compared with traditionally trained systems — but more relatable and explainable. For instance, while an AI that has been trained with 500 dog pictures and 500 cat pictures will recognize both, it won’t really have any idea that they’re similar in any way. But DINO — although it couldn’t be specific — gets that they’re similar visually to one another, more so anyway than they are to cars, and that metadata and context is visible in its memory. Dogs and cats are “closer” in its sort of digital cognitive space than dogs and mountains. You can see those concepts as little blobs here — see how those of a type stick together:

Animated diagram showing how concepts in the machine learning model stay close together.

Image Credits: Facebook

This has its own benefits, of a technical sort we won’t get into here. If you’re curious, there’s more detail in the papers linked in Facebook’s blog post.

There’s also an adjacent research project, a training method called PAWS, which further reduces the need for labeled data. PAWS combines some of the ideas of semi-supervised learning with the more traditional supervised method, essentially giving the training a boost by letting it learn from both the labeled and unlabeled data.

Facebook of course needs good and fast image analysis for its many user-facing (and secret) image-related products, but these general advances to the computer vision world will no doubt be welcomed by the developer community for other purposes.

Comments

Popular posts from this blog

Uber co-founder Garrett Camp steps back from board director role

Uber co-founder Garrett Camp is relinquishing his role as a board director and switching to board observer — where he says he’ll focus on product strategy for the ride hailing giant. Camp made the announcement in a short Medium post in which he writes of his decade at Uber: “I’ve learned a lot, and realized that I’m most helpful when focused on product strategy & design, and this is where I’d like to focus going forward.” “I will continue to work with Dara [Khosrowshahi, Uber CEO] and the product and technology leadership teams to brainstorm new ideas, iterate on plans and designs, and continue to innovate at scale,” he adds. “We have a strong and diverse team in place, and I’m confident everyone will navigate well during these turbulent times.” The Canadian billionaire entrepreneur signs off by saying he’s looking forward to helping Uber “brainstorm the next big idea”. Camp hasn’t been short of ideas over his career in tech. He’s the co-founder of the web 2.0 recommendatio...

Drone crash near kids leads Swiss Post and Matternet to suspend autonomous deliveries

A serious crash by a delivery drone in Switzerland have grounded the fleet and put a partnership on ice. Within a stone’s throw of a school, the incident raised grim possibilities for the possibilities of catastrophic failure of payload-bearing autonomous aerial vehicles. The drones were operated by Matternet as part of a partnership with the Swiss Post (i.e. the postal service), which was using the craft to dispatch lab samples from one medical center for priority cases. As far as potential applications of drone delivery, it’s a home run — but twice now the craft have crashed, first with a soft landing and the second time a very hard one. The first incident, in January, was the result of a GPS hardware error; the drone entered a planned failback state and deployed its emergency parachute, falling slowly to the ground. Measures were taken to improve the GPS systems. The second failure in May, however, led to the drone attempting to deploy its parachute again, only to sever the line...

How the world’s largest cannabis dispensary avoids social media restrictions

Planet 13 is the world’s largest cannabis dispensary. Located in Las Vegas, blocks off the Strip, the facility is the size of a small Walmart. By design, it’s hard to miss. Planet 13 is upending the dispensary model. It’s big, loud and visitors are encouraged to photograph everything. As part of the cannabis industry, Planet 13 is heavily restricted on the type of content it can publish on Instagram, Facebook and other social media platforms. It’s not allowed to post pictures of buds or vapes on some sites. It can’t talk about pricing or product selection on others.   View this post on Instagram   A post shared by Morgan Celeste SF Blogger (@bayareabeautyblogger) on Jan 25, 2020 at 7:54pm PST Instead, Planet 13 encourages its thousands of visitors to take photos and videos. Starting with the entrance, the facility is full of surprises tailored for the ‘gram. As a business, Planet 13’s social media content is heavily restricted a...