Pokémon Go's 30 billion photos are teaching delivery robots how to find their way

Author: Will Douglas Heaven

Translation: Deep潮 TechFlow

Deep潮 Guide: Niantic has turned the 30 billion city photos taken by Pokémon Go players into a new business. Its AI subsidiary, Niantic Spatial, trained a visual positioning system using this data, achieving centimeter-level accuracy—far surpassing GPS performance in urban canyons. The first major client is delivery robot company Coco Robotics. From catching Pikachu to delivering pizza, this may be one of the most unexpected commercial uses of crowdsourced data.

Full text below:

Pokémon Go was the world’s first blockbuster AR game. Released in 2016 by Niantic, a company spun out of Google the year before, the game paired the Pokémon IP with augmented-reality gameplay and quickly swept the globe. From Chicago to Oslo to Enoshima, players flooded the streets hoping to catch a Jigglypuff, a Squirtle, or (if they were lucky) a rare Galarian Zapdos hovering just out of reach in the real world.

In practice, this meant millions of people holding up their phones and photographing countless buildings. “Five hundred million people installed this app within 60 days,” says Brian McClendon, CTO of Niantic Spatial, an AI company spun off from Niantic last May. According to game company Scopely (which acquired Pokémon Go from Niantic at the same time), the game still had more than 100 million active players in 2024, eight years after launch.

Now, Niantic Spatial is leveraging this unparalleled crowdsourced dataset: photos of city landmarks taken on hundreds of millions of Pokémon Go players’ phones, each carrying a highly precise location tag. The goal is to build a world model, a hot area of AI aimed at grounding the intelligence of large language models (LLMs) in the physical environment.

The company’s latest product is a model that can pinpoint your location on a map to within centimeters from just a few snapshots of a building or other landmark. The aim is to help robots navigate precisely in areas where GPS is unreliable.

As a major validation of this technology, Niantic Spatial has just partnered with Coco Robotics. Coco is a startup deploying last-mile delivery robots in several US and European cities. “Everyone thinks AR is the future, AR glasses are coming,” McClendon says, “but robots are the ones leading the way now.”

From Pikachu to Pizza Delivery

Coco Robotics has deployed about 1,000 suitcase-sized robots in Los Angeles, Chicago, Jersey City, Miami, and Helsinki, capable of carrying up to 8 large pizzas or 4 grocery bags. CEO Zach Rash says these robots have completed over 500,000 deliveries so far, traveling millions of miles in various weather conditions.

But to compete with human couriers, Coco’s robots (traveling at about 5 miles per hour on sidewalks) must be reliable enough. “Our best approach is to arrive exactly when we’re supposed to,” Rash says. That means not getting lost.

Coco’s problem is that it can’t rely on GPS. In cities, satellite signals bounce off buildings and interfere with one another, degrading GPS accuracy. “We do deliveries in dense areas with skyscrapers, underground tunnels, and overpasses, where GPS basically doesn’t work,” Rash explains.

“Urban canyons are where GPS performs the worst worldwide,” McClendon says. “You see that blue dot on your phone, and it often drifts 50 meters, placing you in another block, another direction, or across the street.” That’s the problem Niantic Spatial aims to solve.

Over the past few years, Niantic Spatial has been organizing data generated by Pokémon Go and Ingress (Niantic’s previous AR game released in 2013) players to build a visual positioning system—using what you see to determine where you are. “Making Pikachu run around the streets in reality and helping Coco’s robots navigate safely and precisely are essentially the same problem,” says John Hanke, CEO of Niantic Spatial.

“Visual positioning isn’t a new technology,” says Konrad Wenzel of Esri, a GIS and geospatial-analysis company, “but it’s clear that the more cameras you have out in the world, the better it works.”

Niantic Spatial trained its model with 30 billion images taken in urban environments. These images are especially dense around “hotspots”—important locations in Niantic games that encourage players to visit, like Pokémon gyms. “We have over 1 million locations worldwide where we can pinpoint your position,” McClendon says, “with centimeter-level accuracy. We know exactly where you are and which direction you’re facing.”

As a result, for each of those million-plus locations, Niantic Spatial has thousands of photos taken from roughly the same spot but from different angles, at different times, and in different weather. Each photo is tagged with detailed metadata: the phone’s precise position in space, its orientation and tilt, whether it was moving, and at what speed and in which direction.
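The per-photo metadata described above can be pictured as a simple record. This is a hypothetical sketch; the field names and units are illustrative, not Niantic’s actual schema.

```python
from dataclasses import dataclass

# Illustrative record mirroring the metadata the article describes for
# each crowdsourced photo. Fields and units are assumptions.
@dataclass
class GeoTaggedPhoto:
    lat: float          # latitude of the phone, degrees
    lon: float          # longitude of the phone, degrees
    altitude_m: float   # height above sea level, meters
    heading_deg: float  # compass direction the camera faced
    pitch_deg: float    # camera tilt (phone posture)
    moving: bool        # whether the phone was in motion
    speed_mps: float    # speed if moving, meters per second
    timestamp: float    # Unix time the photo was taken

photo = GeoTaggedPhoto(34.0522, -118.2437, 89.0, 270.0, 5.0, False, 0.0, 1700000000.0)
print(photo.heading_deg)  # 270.0
```

Thousands of such records per location, spread across angles, times, and weather, are what make the training signal dense enough for centimeter-level prediction.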

The company trained its model on this dataset, enabling it to predict a precise location from what it “sees” in the environment, even in areas outside these hotspots where images and data are sparse.
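The core retrieval idea behind visual positioning can be sketched in a few lines: embed the query image, find the most visually similar geo-tagged reference, and report its location. This is a toy illustration only; real systems use learned image descriptors and refine the match into a full 3D pose, and the vectors and coordinates below are made up.

```python
import math

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

# Toy reference database: (image embedding, (lat, lon)) pairs.
references = [
    ([0.9, 0.1, 0.2], (34.0522, -118.2437)),  # storefront A
    ([0.1, 0.8, 0.3], (34.0551, -118.2501)),  # fountain B
    ([0.2, 0.2, 0.9], (34.0489, -118.2390)),  # mural C
]

def localize(query_embedding):
    """Return the location of the most visually similar reference image."""
    best = max(references, key=lambda ref: cosine(query_embedding, ref[0]))
    return best[1]

print(localize([0.15, 0.75, 0.35]))  # most similar to fountain B
```

Generalizing beyond the hotspots, as the article describes, is where the learned model goes beyond this kind of pure lookup.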

Coco’s robots (each equipped with four cameras) now use this model alongside GPS to work out where they are and where they are going. The cameras are mounted at hip height and face in all directions, a viewpoint rather different from a Pokémon Go player’s phone, but Rash says adapting the data wasn’t complicated.
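Using the model alongside GPS suggests a simple fallback policy: trust GPS while its estimated error is small, and switch to a visual-positioning fix when it drifts. The threshold and function below are hypothetical, not Coco’s actual logic.

```python
# Assumed threshold: beyond this estimated error, urban-canyon drift is
# considered too severe to trust the GPS fix.
GPS_ERROR_LIMIT_M = 10.0

def pick_position(gps_fix, gps_error_m, vps_fix):
    """Choose between a GPS fix and a visual-positioning (VPS) fix."""
    if gps_error_m <= GPS_ERROR_LIMIT_M:
        return gps_fix
    return vps_fix

# GPS reporting 50 m of drift: fall back to the camera-based fix.
print(pick_position((34.0500, -118.2400), 50.0, (34.0502, -118.2399)))
```

A production system would blend the two estimates continuously (e.g. in a Kalman filter) rather than switching hard, but the hierarchy is the same: cameras take over where satellites fail.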

Competitors are also using visual positioning systems. For example, Starship Technologies, a delivery robot company founded in Estonia in 2014, claims its robots build 3D maps of the surroundings using sensors, marking building edges and streetlights.

But Rash bets Niantic Spatial’s technology will give Coco an edge. He believes it can help robots stop precisely outside restaurants to pick up orders without blocking pedestrians, and park right at the customer’s doorstep instead of a few steps away—something that has happened frequently in the past.

The Cambrian Explosion of Robots

When Niantic Spatial started developing visual positioning, the goal was for augmented reality, Hanke says. “If you wear AR glasses and want the virtual world to lock onto your view, you need some way to do that. But now we’re witnessing a Cambrian explosion in robotics.”

Some robots need to share space with humans, in places like construction sites and on sidewalks. “If robots are to fit into these environments without disturbing people, they need a spatial understanding similar to ours,” Hanke explains. “When a robot gets pushed or bumped, we can help it find its way back accurately.”

The partnership with Coco Robotics is just the beginning. Hanke says Niantic Spatial is building the first components of what he calls a “Living Map”: a highly detailed virtual simulation of the world that evolves along with the real one. As robots from Coco and other companies operate worldwide, they will feed in new data that makes this digital replica of the world ever more detailed.

In Hanke and McClendon’s view, maps are not only becoming more detailed; they are also increasingly being used by machines, and that changes what a map is for. Maps have long helped humans locate themselves, evolving from 2D to 3D to 4D (think digital twins and real-time simulations), but the fundamental principle has stayed the same: points on the map correspond to points in space and time.

But maps designed for machines may need to become more like guidebooks, filled with information humans take for granted. Niantic Spatial and Esri aim to add descriptions to maps, telling machines what they are looking at, with attributes attached to each object. “The task of this era is to build useful world descriptions for machines,” Hanke says. “The data we have is a great starting point for understanding how the connections in the world operate.”

World models are a hot topic right now, and Niantic Spatial knows it. LLMs seem to know everything, yet they lack common sense when it comes to interpreting and interacting with everyday environments. World models aim to fix that. Some companies, such as Google DeepMind and World Labs, are developing models that can generate virtual fantasy worlds on the fly and use them as training grounds for AI agents.

Niantic Spatial says it is approaching the problem from a different angle. Make the map detailed enough, McClendon argues, and eventually you capture everything: “We’re not there yet, but we’re aiming for it. I’m very focused on trying to rebuild the real world right now.”
