Part 1: Senseless Machines

“Is this a tool we've built or a creature we've built?”

  • Sam Altman, CEO of OpenAI

In spite of declining birth rates, civilization will grow by two orders of magnitude this century, surpassing 100 billion thinking, creative, and productive inhabitants, human and machine alike, within the next two decades.

Tens of billions of connected robots, machines, and sensors will work together to build and maintain the edifice of civilization, intelligently allocating and moving resources to where they are needed.

We don’t have to believe in AGI or machine sentience to recognize that the workforce of civilization is at the cusp of a truly radical productivity growth spurt thanks to robotics and AI copilots. For better or worse, the energy output of civilization will be multiplied, consumed, and directed by machine intelligence at an ever-increasing rate.

But before general robots can enter our workforce and our lives, they need to learn how to dynamically perceive, respond to and interact with our physical world. To do this, it is helpful to think of robots as needing six distinct software capabilities:

1) Locomotion: The ability to move around the world, either with legs or with wheels.

2) Manipulation: The ability to move and manipulate objects in the world.

3) Spatio-semantic perception: The ability to distinguish between different kinds of things and to judge how far away they are.

4) Mapping: The ability to remember or know where things are, even when they are outside their immediate field of view.

5) Positioning: The ability to understand where they are in relation to the map, especially in GPS-denied indoor environments.

6) Applications: Weaving these capabilities together into task-oriented action.
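As a thought experiment, the six capabilities can be sketched as a software interface: the first five are primitives, and an application is whatever weaves them together into a task. Everything below is an illustrative sketch in Python, not an actual Auki API; all names and signatures are assumptions.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass
class Pose:
    """A position and heading in the robot's map frame (illustrative)."""
    x: float
    y: float
    heading: float


class Robot(ABC):
    """Hypothetical sketch of the six capability layers."""

    @abstractmethod
    def move_to(self, target: Pose) -> None:
        """1) Locomotion: move through the world, by legs or wheels."""

    @abstractmethod
    def grasp(self, object_id: str) -> bool:
        """2) Manipulation: move and manipulate objects in the world."""

    @abstractmethod
    def perceive(self) -> dict[str, float]:
        """3) Spatio-semantic perception: visible labels and their distances."""

    @abstractmethod
    def map_lookup(self, object_id: str) -> Pose:
        """4) Mapping: where things are outside the immediate field of view."""

    @abstractmethod
    def localize(self) -> Pose:
        """5) Positioning: where the robot is in relation to the map."""

    def fetch(self, object_id: str) -> bool:
        """6) Application: weave the capabilities into task-oriented action."""
        target = self.map_lookup(object_id)  # mapping
        self.move_to(target)                 # locomotion
        return self.grasp(object_id)         # manipulation
```

Note that only the application layer has a default body: it is pure orchestration, which is why it can exist before locomotion and manipulation are solved.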

In the 1980s, roboticist Hans Moravec observed that it is much more difficult to teach machines things that are intuitively simple for humans, like fine motor skills and perception, than to teach them complex reasoning tasks that are difficult for humans.

In a nutshell, it is easier to teach machines to understand chess than it is to teach them to crack an egg.

Almost half a century later, locomotion, manipulation and spatio-semantic perception remain incredibly challenging and compute-intensive tasks. While commercial humanoid robots are now taking their literal first steps into the world, there is still much work to be done before these robots can be put to actual work. Without the full stack of six capabilities, robots remain fundamentally incapable of deploying artificial intelligence into the physical world.

It is interesting to note, however, that the order in which you tackle these problems matters. A machine that can walk without perceiving the world is useless, but a machine that can perceive the world is immediately helpful even if it cannot walk or manipulate the world.

Spatial computing is the art of teaching digital things to understand the physical world. Auki has realized that by focusing on perception, mapping and positioning first, we can get a massive head start in the robotics race, deploying physical AI in form factors that don't require locomotion and manipulation to be solved.

Through the form factor of glasses, and even spatially aware smartphones, we can deliver value-generating AI copilots for the physical world today. If you embrace the realization that an iPhone or a pair of smart glasses is a robot without arms and legs, you can immediately start shipping the perception stack of the future while delivering real-world value today.

Importantly, we believe that the spatial computing stack (perception, mapping and positioning) needs to be collaborative, so that multiple robots and devices can have a shared understanding of the physical spaces they operate in. If your glasses detect an obstacle on a route, they should be able to communicate this to your robot so it avoids that path.
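As an illustration, a shared understanding of space can be as simple as a common obstacle set that any device can write to and any robot can read from. The sketch below is a deliberately minimal, hypothetical model, not Auki's actual protocol; a real system would synchronize this state over a network rather than in local memory.

```python
from dataclasses import dataclass, field


@dataclass
class SharedMap:
    """Minimal shared spatial model: obstacle cells on a grid (illustrative)."""
    obstacles: set[tuple[int, int]] = field(default_factory=set)

    def report_obstacle(self, cell: tuple[int, int]) -> None:
        """Any device (glasses, phone, robot) can report what it perceives."""
        self.obstacles.add(cell)

    def is_blocked(self, cell: tuple[int, int]) -> bool:
        """Any other device can query the shared understanding."""
        return cell in self.obstacles


shared = SharedMap()
shared.report_obstacle((3, 4))        # the glasses detect an obstacle on a route

route = [(3, 3), (3, 4), (3, 5)]      # the robot's planned path
if any(shared.is_blocked(c) for c in route):
    route = [(2, 3), (2, 4), (2, 5)]  # the robot avoids that path
```

The point of the sketch is the separation of roles: the reporting device never needs locomotion or manipulation, yet its perception immediately benefits the robot that does.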

So what will it take to make robots understand the world around them through collaborative perception?

In 2014, Naval Ravikant sagely predicted that the internet would need a fifth protocol, beyond the four foundational protocols that make up the internet as we know it today, to programmatically manage the allocation of energy and resources between machines.

1) Link Layer: The physical hardware connections over things like Ethernet and Wi-Fi that enable devices to send and receive data over a network.

2) Internet Layer: Routes packets of data to their destination over multiple interconnected networks.

3) Transport Layer: Ensures the reliable and orderly delivery of data packets, managing flow control, error checking, and data segmentation.

4) Application Layer: Protocols like HTTP, SMTP, and FTP that allow applications to interface with the internet.

The fifth protocol would allow machines to exchange value with each other at the speed of machine thought. Robots or other physical AI agents wishing to negotiate and allocate the use of scarce resources need a universal protocol to express, store, and transfer value between each other.

Imagining our cities full of self-driving and coordinating cars and robots, Naval pictured them negotiating lane merges and overtaking on some kind of communication channel. The road is a scarce resource, after all, and so is time, and the human or physical AI agents employing the car or robot have different economic preferences.

Some form of programmable representation of value that can move at the speed of machines and transact in fragments of cents in fractions of a second seems to be a necessary condition for the smart cities of the future. At the time, Naval thought cryptocurrency represented a potential candidate for the fifth layer.

5) Trade Layer: A way to express, store, and transfer value and ownership between machines.
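To make the idea concrete, here is a toy sketch of a trade-layer exchange in which a hurried agent buys a lane-merge slot from a relaxed one. The agent names, balances, and amounts are invented for illustration, and no real protocol or cryptocurrency is implied; the point is only that value must be divisible into fragments of cents and settle in fractions of a second.

```python
from dataclasses import dataclass


@dataclass
class Agent:
    """A physical AI agent with a balance on a hypothetical trade layer."""
    name: str
    balance_microcents: int  # value divisible into fragments of cents

    def pay(self, other: "Agent", amount: int) -> None:
        """Transfer value at machine speed; no human in the loop."""
        if amount > self.balance_microcents:
            raise ValueError("insufficient balance")
        self.balance_microcents -= amount
        other.balance_microcents += amount


hurried = Agent("delivery-bot", balance_microcents=10_000)
relaxed = Agent("patrol-bot", balance_microcents=0)

# The road and the merge slot are scarce; the hurried agent values them
# more right now, so it compensates the relaxed agent for yielding.
bid = 250  # micro-cents for one lane-merge slot (an invented figure)
hurried.pay(relaxed, bid)
```

The economic preferences of the humans or AI agents employing each vehicle are expressed through the size of the bid, which is exactly the negotiation Naval imagined happening on a shared communication channel.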

But Naval missed the need for yet another protocol, one of even greater importance to machines. How could these machines reason about the road, and the world in general, without a shared sense and understanding of our physical space? Machines that only consume the internet cannot reason about the physical world.

Today, our digital devices lack a critical sense that humans take for granted. The ultimate incarnation of AI will require a sense of space: machine proprioception, made available to the AI through a decentralized nervous system and universal spatial computing protocol.
