Back in 2016, people took part in the “Mannequin Challenge,” a viral trend in which groups of people held still in frozen poses while one participant filmed them with a moving camera. Numerous videos of the challenge were posted on YouTube, turning it into a phenomenon. Now, a team from Google AI is using 2,000 of those videos to train a neural network to understand 3D scenes — in other words, to better predict depth in video.
Humans naturally interpret 2D video as a 3D scene, but machines have to be taught how. The team is honing the neural network’s ability to reconstruct the depth and arrangement of freely moving objects, a skill that could help robots maneuver in unfamiliar surroundings and make navigation easier for self-driving cars.
For this training, the researchers converted 2,000 of the videos into 2D images paired with high-resolution depth data and used them to train the neural network. The trained network was then able to predict the depth of moving objects in a video with much higher accuracy than previous state-of-the-art methods. The team now plans to share the data with the broader scientific community.
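For readers curious what such a pipeline looks like in practice, the core idea is per-pixel regression: a network takes an RGB frame and outputs a depth value for every pixel, trained against recovered ground-truth depth maps. The sketch below is a deliberately tiny, hypothetical PyTorch version for illustration only; it is not the team’s actual architecture, and the model, layer sizes, and loss function are all assumptions.

```python
# Illustrative sketch only -- NOT Google's actual model. Assumes RGB frames
# paired with ground-truth depth maps, like those derived from the videos.
import torch
import torch.nn as nn

class TinyDepthNet(nn.Module):
    """Toy encoder-decoder mapping an RGB image to a per-pixel depth map."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 1, 4, stride=2, padding=1),
        )

    def forward(self, rgb):
        return self.decoder(self.encoder(rgb))

model = TinyDepthNet()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
loss_fn = nn.L1Loss()  # a common choice for depth regression

# One training step on a dummy batch of 8 frames (128x128 pixels).
frames = torch.rand(8, 3, 128, 128)    # stand-in for video frames
gt_depth = torch.rand(8, 1, 128, 128)  # stand-in for recovered depth maps

loss = loss_fn(model(frames), gt_depth)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```

In the real system, the depth targets themselves come from the videos: because the people in each clip are frozen while the camera moves, multi-frame geometry can recover depth that then serves as supervision.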
Meanwhile, the researchers’ use of people’s data may raise privacy concerns. As MIT Technology Review pointed out: “This data-scraping practice is neither obviously good nor bad but calls into question the norms around consent in the industry. As data becomes increasingly commoditized and monetized, technologists should think about whether the way they’re using someone’s data aligns with the spirit of why it was originally generated and shared.”