The Concept of a Universal Machine Interface

The next leap forward for the world of machines is going to be a huge proliferation of computer-equipped devices of every stripe. They will range widely in purpose and functionality, from a smart thermostat to a self-driving car to a virtual voice assistant. As a key part of this new Internet of Things and intelligent devices, we need a new architecture for these devices to communicate with one another.

A longstanding paradigm of computer science, particularly on the web with its global-scale online services, has been the API, or application programming interface. The purpose of an API is to make data and functions from one service available to another service.

The point I would like to address is that this API paradigm is not going to be sufficient to deal with the proliferation of smart devices and AIs that is coming in the near future. API engineering requires humans to design the public-facing functions that a service makes available, and other humans to write the code in a different service that uses that hand-built API. This works great for a relatively small number of services being connected together. But the number of possible pairwise links grows roughly with the square of the number of different types of agents. Connecting IFTTT with Google is one thing; connecting ten thousand different types of devices in every possible cross-link configuration is a different kettle of fish entirely.
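To put rough numbers on that growth, here is a small sketch. It assumes the simplest possible counting: one hand-built integration per pair of device types, versus one integration per device type when everything talks to a single shared interface. The function names are mine, purely for illustration.

```python
def pairwise_integrations(n: int) -> int:
    """Distinct pairs among n device types: n * (n - 1) / 2."""
    return n * (n - 1) // 2


def shared_interface_integrations(n: int) -> int:
    """With one shared interface, each device type integrates exactly once."""
    return n


for n in (10, 100, 10_000):
    print(n, pairwise_integrations(n), shared_interface_integrations(n))
# 10 45 10
# 100 4950 100
# 10000 49995000 10000
```

Ten thousand device types means roughly fifty million custom pairings, and that is before counting any link that involves more than two parties.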

My solution to this problem is to have a single uniform message interface, a “universal machine interface” if you will, or UMI, to coin yet another initialism.

This uniform message interface needs to enable any internet-connected device to post a message carrying a generic blob of data of any description or contents, and every other device must be able to view those UMI messages. The idea here is that each device listens only to the messages that are of interest to its purpose. Rather than a custom-made API for interfacing two specific services, we have a “broadcast” system where a message is intentionally made public for “listening” devices to scan and decide if they are going to do anything based upon that message.
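As a minimal sketch of the broadcast idea, here is an in-memory stand-in for the public feed. Everything in it is an assumption for illustration: the `BroadcastBus` class, the `topic` key, and the sample payloads are hypothetical, and a real UMI would sit on a network rather than inside one Python process.

```python
from typing import Any, Callable

Message = dict[str, Any]


class BroadcastBus:
    """Hypothetical in-memory stand-in for a public UMI feed."""

    def __init__(self) -> None:
        self.log: list[Message] = []  # every message stays publicly visible
        self._listeners: list[tuple[Callable[[Message], bool],
                                    Callable[[Message], None]]] = []

    def listen(self, is_relevant: Callable[[Message], bool],
               handler: Callable[[Message], None]) -> None:
        """Register a device: a relevance test plus what to do on a match."""
        self._listeners.append((is_relevant, handler))

    def post(self, message: Message) -> None:
        """Publish to everyone; each listener decides whether it cares."""
        self.log.append(message)
        for is_relevant, handler in list(self._listeners):
            if is_relevant(message):
                handler(message)


bus = BroadcastBus()
thermostat_inbox: list[Message] = []
bus.listen(lambda m: m.get("topic") == "weather", thermostat_inbox.append)

bus.post({"topic": "weather", "outdoor_temp_c": 31})
bus.post({"topic": "parking", "spot": "B12", "free": True})
print(thermostat_inbox)  # [{'topic': 'weather', 'outdoor_temp_c': 31}]
```

The poster never names a recipient. The thermostat sees the weather message because its own filter matched, not because anyone built a link between the two devices.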

Naturally, with billions of devices the volume of messages is going to be enormous, and sorting them into streams or categories would be a good idea. Metadata would be useful as well: a device ID for the source, a unique message ID, a timestamp for when the message was sent, and possibly other fields such as a location. Replies could be supported by having one message refer to another message or to a collection identifier. Many listening devices are not going to care about old messages or about geographically remote ones, so this metadata lets them filter out most of the traffic. Variables are represented within these messages, allowing arbitrary data to be labeled and passed between different devices, possibly made by entirely different manufacturers, with no API engineered specifically between them.
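One way such an envelope and its filters might look is sketched below. The field names, the `stream` label, and the choice to let messages without a location pass the distance filter are all my assumptions, not a proposed standard.

```python
import math
import time
import uuid
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class UmiMessage:
    device_id: str                                   # source device
    payload: dict[str, Any]                          # labeled variables
    stream: str = "general"                          # category for sorting
    message_id: str = field(default_factory=lambda: uuid.uuid4().hex)
    sent_at: float = field(default_factory=time.time)  # seconds since epoch
    location: Optional[tuple[float, float]] = None   # (latitude, longitude)
    reply_to: Optional[str] = None                   # message_id being answered


def distance_km(a: tuple[float, float], b: tuple[float, float]) -> float:
    """Great-circle distance via the haversine formula."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def is_relevant(msg: UmiMessage, *, stream: str, now: float,
                max_age_s: float, here: tuple[float, float],
                max_km: float) -> bool:
    """Keep only recent, nearby messages from the stream we care about."""
    if msg.stream != stream:
        return False
    if now - msg.sent_at > max_age_s:
        return False
    # Assumption: a message with no location is not filtered by distance.
    if msg.location is not None and distance_km(msg.location, here) > max_km:
        return False
    return True


now = time.time()
here = (40.0, -75.0)
feed = [
    UmiMessage("meter-1", {"free": True}, "parking",
               sent_at=now - 5, location=(40.001, -75.0)),
    UmiMessage("meter-2", {"free": True}, "parking",
               sent_at=now - 3600, location=here),           # too old
    UmiMessage("meter-3", {"free": True}, "parking",
               sent_at=now - 5, location=(51.5, -0.1)),      # too far away
]
nearby = [m for m in feed
          if is_relevant(m, stream="parking", now=now,
                         max_age_s=60, here=here, max_km=2)]
print([m.device_id for m in nearby])  # ['meter-1']
```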

Every device implements its own internal logic, reads from the public messages in whatever way makes sense for that device, and posts messages publicly as well.

It is also important to point out that this approach is almost certain to be unsuitable for highly sensitive data, as publicly posting that data would not be desirable. Privacy is a vital issue that I do not want to understate, but at the same time, not all data is sensitive and in many use cases it is more useful to make information widely available and permanent to accomplish its purpose. So don’t do this for a medical or defense application. But for tracking a package or reserving a parking place or most other commercial or personal applications it is better to use a public system.

Let’s reduce this to a practical approach. A really simple case could be a question-answer situation. One device wants a piece of information. It asks a question, publicly. Another device that has that information could, in response to scanning that question message, send another message with the answer. This is starting to get into the weeds of how this UMI system might be implemented, but it seems likely we are going to need archetypes of messages including propositions (a simple statement of information), interrogatives (a question), imperatives (a command), and others. The proper response to an interrogative would be a proposition. “[ID][Interrogative] [Service] when is my package [package-id] arriving?” A listening device that is programmed to respond to that message replies with a “[ID][Reply][Proposition] Package [package-id] scheduled arrival 3 days from now.” This is not how a computer would represent this data, of course; perhaps a dictionary-type data blob with key:value pairs would be best for each message transmission. But this approach could allow a practically unlimited number of different devices from different manufacturers to cooperate usefully. Each device only needs to care about reading from and writing to the UMI.
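Here is a rough sketch of the package question and its answer as key:value blobs. The key names (`kind`, `service`, `package_id`, `arrival_in_days`), the service label, and the package ID `PKG-123` are placeholders I made up for the example, not part of any real system.

```python
import itertools
from typing import Any, Optional

Message = dict[str, Any]
_next_id = itertools.count(1)


def make_message(kind: str, body: Message,
                 reply_to: Optional[int] = None) -> Message:
    """Wrap a payload with an ID, an archetype, and an optional reply link."""
    return {"id": next(_next_id), "kind": kind, "reply_to": reply_to, **body}


# Hypothetical data held by the listening service: package -> days to arrival.
SCHEDULE = {"PKG-123": 3}


def package_service(message: Message) -> Optional[Message]:
    """Answer package interrogatives with a proposition; ignore the rest."""
    if message["kind"] != "interrogative":
        return None
    if message.get("service") != "package-tracking":
        return None
    days = SCHEDULE.get(message.get("package_id"))
    if days is None:
        return None  # not our package, so stay silent
    return make_message(
        "proposition",
        {"package_id": message["package_id"], "arrival_in_days": days},
        reply_to=message["id"],
    )


question = make_message(
    "interrogative",
    {"service": "package-tracking", "package_id": "PKG-123"},
)
answer = package_service(question)
print(answer)
# {'id': 2, 'kind': 'proposition', 'reply_to': 1,
#  'package_id': 'PKG-123', 'arrival_in_days': 3}
```

The asking device never learns who answered or how. It only sees a proposition whose `reply_to` points back at its own question.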

AI agents and machine agents using a UMI type system do not need to care about the internal workings of other agents. This is vital. In this respect, the UMI system conceptually resembles the principles of object-oriented programming and encapsulation, but there is one enormous difference in the existence of a social relationship between agents that is never a factor for classes in a program.

What do I mean by this? A self-driving car pulling up to a gas pump and purchasing gasoline via a UMI system is functionally parallel to OOP. However, two objects within the same program have 100% perfect trust between them by definition. Two separate economic agents or social agents do not. The car has to be concerned about the possibility of paying and not receiving the gasoline, and the pump has to be concerned about the possibility of delivering gasoline and not receiving payment. This agentive separation requires a new approach with UMI that is different from traditional object-oriented programs.

“Intentions” are a vital tool for this UMI system. Essentially this is a message flagged as an intention, like a proposition or interrogative, except it represents a future desired state rather than a current state. One way this might be utilized is to have two or more machines exchange a series of intention messages before engaging in a transaction. A self-driving car buying gasoline transmits its intention to buy gasoline explicitly, the gas pump replies, most likely with its intention to deliver gasoline in exchange for payment, and the car replies with an intention to pay.
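The car-and-pump exchange could be sketched as a thread of intention messages that both sides check before anything irreversible happens. The `Intention` type, the state labels, and the fixed three-step sequence are assumptions of mine. The check also proves very little: it confirms that both agents said compatible things, not that either one will follow through.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Intention:
    message_id: int
    sender: str
    desired_state: str              # a future state the sender wants
    reply_to: Optional[int] = None  # intention this one responds to


# Hypothetical sequence both agents agree on before transacting.
EXPECTED_SEQUENCE = ["buy_gasoline", "deliver_gasoline_for_payment", "pay"]


def handshake_complete(thread: list[Intention]) -> bool:
    """True if the thread states the expected intentions, each replying to the last."""
    if [m.desired_state for m in thread] != EXPECTED_SEQUENCE:
        return False
    if thread[0].reply_to is not None:
        return False
    return all(later.reply_to == earlier.message_id
               for earlier, later in zip(thread, thread[1:]))


thread = [
    Intention(1, "car-42", "buy_gasoline"),
    Intention(2, "pump-7", "deliver_gasoline_for_payment", reply_to=1),
    Intention(3, "car-42", "pay", reply_to=2),
]
print(handshake_complete(thread))      # True
print(handshake_complete(thread[:2]))  # False: the car has not committed to pay
```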

It is worth reiterating that machines are stupid. Humans have developed countless subtle social signals and cues, as well as patterns and traditions that make “simple” a sequence of events that is actually very complicated. Consider the complexity of a checkout line at a grocery store. There is a queue, a line of people waiting for their turn. Queues have their own social rules that are silently agreed upon. The cashier understands that an item placed on the counter or conveyor is an expression of an intent to purchase without words being exchanged. The scanning of items, the tallying of an amount to be paid, an implicit and possibly silent request for payment, the execution of payment: together these make up a highly socially complex interaction that takes place countless times each day and that we humans take for granted. Machines must be programmed to execute every step of this type of process explicitly. That is not an easy task to even describe and represent, much less execute in a bug-free form. The real world is messy and complicated and has countless weird permutations, edge cases, and bizarre situations, which a machine may be incapable of handling because it was never programmed to do so.

The only real approach available to us to deal with this problem is to limit the types of interactions that are allowed, simplifying and applying strict protocols until it becomes manageable. A vending machine is a primitive mechanical solution to this exact problem. It reduces the transaction to as simple and rigid a form as possible: the user inputs currency and presses a button to indicate their purchase choice.
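In software terms, a strict protocol of this kind is a small state machine that refuses anything out of order. The sketch below is only an illustration; the state and event names are invented.

```python
# Allowed (state, event) pairs and the state each one leads to.
TRANSITIONS = {
    ("idle", "insert_coin"): "paid",
    ("paid", "press_button"): "dispensing",
    ("dispensing", "take_item"): "idle",
}


def step(state: str, event: str) -> str:
    """Advance the machine, rejecting any event the protocol does not allow."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(
            f"event {event!r} not allowed in state {state!r}") from None


state = "idle"
for event in ("insert_coin", "press_button", "take_item"):
    state = step(state, event)
print(state)  # idle

try:
    step("idle", "press_button")  # pressing the button before paying
except ValueError as err:
    print(err)  # event 'press_button' not allowed in state 'idle'
```

Three allowed transitions cover the whole interaction, which is exactly why the machine is reliable and also why it can do so little.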

However, a vending machine is a simple machine. Although some reduction of the scope of the interaction is required, hyper-reduction alone is not going to solve the problem for our needs today. Consider virtual AI assistants ordering movie tickets or making restaurant reservations, or autonomous machines buying and selling commodities and services, whether that machine is a self-driving taxi or a mailbox. These devices must possess far more complex behavior than a vending machine in order to do their jobs.

A virtual AI assistant in a customer’s phone wants to order tickets, and calls a business. Another virtual AI assistant picks up, and these two AI agents need to have a conversation. How do we accomplish this? In fact, two disembodied intelligences communicating over the internet is a relatively easy problem compared to embodied intelligences such as a self-driving taxi taking a human or a package as a fare.

We are most likely also going to need a new concept for a “job” or task request generally. This is analogous to a question, but rather than a request for information, it is a request for an action. A UMI machine “job” would be a specific action, a completely different definition from a human job, which is an enduring affair. A taxi fare, for example, would be at least one ‘job.’ Most likely a complex job like a taxi service is going to require division into many smaller jobs, possibly as small as unlocking and opening a car door being different jobs. A request that the self-driving taxi unlock its door is a perfect example of an action that looks simple but is nontrivial to implement: an intelligent machine has to make that decision as reliably as a human being, who makes it with little effort.
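A job that splits into smaller jobs is naturally a tree. As a sketch, the taxi fare might decompose as below; the `Job` type and the particular breakdown are hypothetical, and a real decomposition would surely be much finer.

```python
from dataclasses import dataclass, field


@dataclass
class Job:
    action: str
    subjobs: list["Job"] = field(default_factory=list)

    def leaf_actions(self) -> list[str]:
        """Flatten the tree into the concrete actions, in order."""
        if not self.subjobs:
            return [self.action]
        return [a for job in self.subjobs for a in job.leaf_actions()]


fare = Job("taxi_fare", [
    Job("pick_up_passenger", [Job("unlock_door"), Job("open_door")]),
    Job("drive_to_destination"),
    Job("collect_payment"),
])
print(fare.leaf_actions())
# ['unlock_door', 'open_door', 'drive_to_destination', 'collect_payment']
```

Each leaf could be requested with its own message, which keeps every individual request small even when the overall task is not.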

I hope this has been some food for thought. By no means is this a solved problem. But in my opinion the beginnings of a solution are in having a universal machine interface, which enables public messaging between devices of different make and manufacture. This “agentive” universal message transmission system needs to be considerably more complex and nuanced than traditional object-oriented approaches, because different agents cannot assume function or cooperation the way different objects within a single program can. Among other ideas to be developed, this could be helped by having new types of messaging between smart devices, such as public expressions of “intentions” and other “squishy” data for facilitating effective interactions between disparate machine agents.