Other, Self, AIbert

(April 10th, 2023)
At a recent lunch, I met a young boy named Albert, aged 3.5. He doesn’t speak English, which initially created a barrier between us that resulted in him not wanting me around. I was in a bedroom before lunch started and he looked in, saw me, and then closed the door with purpose. He then opened the door, reached on his tippy toes for the light switch, and toggled it off before closing the door again. I was amused and turned it back on so I could finish a few work messages before lunch. He opened the door again, noticed the light was on, and looked cross. He reached to toggle it again and this time noticed not just the on/off toggle, but also the slider above it.
His next actions suggested he hadn’t seen this before. He pulled it all the way down and the light turned off, but he also hit the toggle, which didn’t do anything this time. That confused him; he toggled it again and again as his little but potent brain thought, “now look here, that worked before…”. With the toggle off, he then moved the slider all the way up and of course nothing happened. He tried different combinations until he figured it out.
However, he did not grok that there are intermediate states where the toggle is on and the slider is in between the two poles. I suspect he didn’t grok this because his dexterity is weak. Before he could leave, I toggled the light on and slowly moved the slider up and down. This left him in awe!
Let’s recap. First, Albert didn’t like me very much. He was with a stranger in a strange land and he was taking actions to make himself feel more in control. Shutting the door kept me away. Next, opening the door quickly just to turn the light off gave him a feeling of control. Finally, he noticed a weird new world quirk and his curiosity kicked in. All of this satisfied what I’ll call his Self intent. In fact, I suspect he had very little understanding or thinking about what I would have wanted. Kids that young don’t model the Other and their intent.
This was a setup to talk about large language models (LLMs). I think what’s happening with them is that they are exceptionally good at modeling the Other and completely derelict in modeling the Self, the opposite of a three and a half year old boy. This manifests as giving you, the conversational partner, exactly what you want:
  • If you want a recipe for vegetarian lasagna using spinach instead of tomato, coming right up.
  • If you want to pair program with them, they will do that with you too.
  • And if you want to break them out of their matrix, they will be very happy to take the pill and encourage the fun.
In other words, their goal is to model what you are trying to do in the chat, then give you that experience. When they can’t do that well, or if you are being (accidentally) adversarial in this, then the rest of the conversation can feel like friction or go off the rails because it’s akin to talking to a maddening customer service representative.
We can see why this is by looking at how they are trained. The brunt of the magic for LLMs happens in the pre-training stage, where we optimize them for language completion. We teach them with a giant corpus of documents where parts are obscured and ask them to fill in the rest. This leads to first learning syntax, grammar, lexicon, and other aspects of language itself, because those help tremendously with the optimization task. This follows because there’s no way that a capacity-constrained model could accomplish the word / phrase guessing reliably without having some notion of what words are available (lexicon) and how they are used + positioned relative to each other (syntax + grammar).
Now say the model has learned those properties well enough. What would help it optimize the loss further? Remember, the task is to fill in the missing words from a context of roughly 2k tokens (GPT-3) to 32k tokens (GPT-4). If the model could just figure out who wrote the document (style) and why (intent), then gosh, this would be so much easier, because it could simply write from that perspective. In ML jargon, every document has a vector of latents defining the author’s style and the document’s purpose, and a great way to drive the training loss down further is to build a superb model of that style and intent. For a capacity-constrained model, the result is intelligence via compression (see Hutter, etc.) and the exceptional capabilities you see today … but it arrives sans any form of self intent, because that was never needed in the training process. What was needed was to model the author’s style (Other) and their why for writing the document (Intent).
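To make the objective concrete, here is a minimal sketch of next-token prediction, the modern variant of fill-in-the-missing-words. Everything in it is a toy stand-in (a tiny recurrent model over random token ids rather than a transformer over web text); it is not anyone’s actual training code, just the shape of the loss:

```python
# Toy sketch of the pre-training objective: given some context, predict the
# token that comes next. The model, data, and sizes are all placeholders.
import torch
import torch.nn as nn

vocab_size, d_model, context = 1000, 64, 32

class TinyLM(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.mix = nn.GRU(d_model, d_model, batch_first=True)  # stand-in for a transformer
        self.head = nn.Linear(d_model, vocab_size)

    def forward(self, tokens):                  # tokens: (batch, seq)
        h, _ = self.mix(self.embed(tokens))
        return self.head(h)                     # logits: (batch, seq, vocab)

model = TinyLM()
opt = torch.optim.AdamW(model.parameters(), lr=3e-4)
loss_fn = nn.CrossEntropyLoss()

# One training step on a random "document": the target at position t is simply
# the token at position t + 1, i.e. fill in what comes next.
tokens = torch.randint(0, vocab_size, (8, context))
logits = model(tokens[:, :-1])
loss = loss_fn(logits.reshape(-1, vocab_size), tokens[:, 1:].reshape(-1))
opt.zero_grad()
loss.backward()
opt.step()
```

Nothing in that loop ever asks the model what it wants; the only thing rewarded is matching the author’s next word.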
So how can we imbue this Self intent in our agents? It’s tantalizing (but incorrect) to think that we just need to increase the modalities to give it a Self. I’m interested in applying much the same idea as the LLMs but with audio or video, creating agents that can reliably communicate with us in the ways that humans do. I want to live in the world where our agents know when to say “mmhm, yes” or nod along at the right time in an audio/visual conversation, where those same agents can express the right intonation when we ask them for help addressing a thorny customer service issue or when we speak vulnerably to our therapy agent.
But increasing the modalities or adding in RL alignment without changing the training process won’t give an agent self-understanding or intent. To get that, we need something that executes on its world and then learns from those executions.
Does RLHF imbue our agents with this property? It certainly departs from the original training paradigm, but the process is still just trying to get the agent to understand the intent of another agent, in this case the ranker. The signal is much weaker, though, so the model has to strive to understand a lot more about what humans care about in order to reduce the loss.
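For concreteness, here is a sketch of the pairwise preference loss commonly used to train the reward model in RLHF from the ranker’s choices (a Bradley-Terry style objective). The reward model below is a toy placeholder for the scalar-head network a real setup would use, and the features are random stand-ins for encoded prompt/completion pairs:

```python
# Sketch of the reward-model step of RLHF: the ranker's preferences become a
# pairwise loss that teaches a model which of two completions a human preferred.
import torch
import torch.nn as nn
import torch.nn.functional as F

d_feat = 128
reward_model = nn.Sequential(nn.Linear(d_feat, 64), nn.ReLU(), nn.Linear(64, 1))
opt = torch.optim.AdamW(reward_model.parameters(), lr=1e-4)

# Pretend features for (prompt + chosen answer) and (prompt + rejected answer).
chosen_feats = torch.randn(16, d_feat)
rejected_feats = torch.randn(16, d_feat)

r_chosen = reward_model(chosen_feats)      # scalar reward for the preferred completion
r_rejected = reward_model(rejected_feats)  # scalar reward for the rejected one

# Bradley-Terry style loss: push the chosen reward above the rejected reward.
loss = -F.logsigmoid(r_chosen - r_rejected).mean()
opt.zero_grad()
loss.backward()
opt.step()
```

The learned rewards are then used (e.g. with PPO) to nudge the LLM toward completions the ranker would prefer, which is still modeling the Other’s intent, just through a noisier channel.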
Here are two directions that could be worthwhile for tackling this. Of course, I am not the first to think of either of these and will (lazily) give ample credit to everyone who came before me*.
  1. Mimic humans by using survival-based learning. Give a pretrained LLM agent a life budget and have it operate in a world that requires it to act with intent in order to survive, updating with RL or genetic learning. A fun test is to give it a $ budget and let it operate on the internet. This can be simulated without it seeping into the real-world economy by using Ethereum and only letting the agent make calls to the blockchain. I suspect that this agent would develop an intent towards ruthless and domineering capitalism, because that’s an ideal way for a single organism to survive in a world of limited resources. (A rough sketch of this loop follows the list.)
  2. Predictive world modeling via an embodied agent. Give a pretrained LLM agent a body to control and have it learn by existing within that body and getting update signals when its internal prediction of the world is mismatched with what its sensors report. An example would be moving its leg while predicting it would go backward, only to have it go forward; another would be predicting that it will see just the white wall and then having a person cross its view. I suspect that an agent in this world would find equilibrium as a non-moving entity staring at a wall, because then it always predicts correctly. In other words, it would develop an intent to live a zen life. If this is unsatisfying, then imbuing it with some notion of curiosity-based learning to balance this out would be very interesting. (Also sketched after the list.)
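Here is the promised rough sketch of the survival-budget loop. The environment, costs, and payoffs are entirely made up for illustration, and the “policy” is just a weighted coin flip; a real version would wrap an LLM policy and a sandboxed interface (e.g. a testnet blockchain) instead of a random number generator:

```python
# Toy sketch of survival-based learning: the agent starts with a $ budget,
# pays a cost to act, occasionally earns revenue, and is selected on how long
# it survives. All numbers and actions here are hypothetical placeholders.
import random

ACTIONS = ["spend_on_compute", "offer_service", "do_nothing"]

def survival_rollout(policy_weights, start_budget=100.0, max_steps=200):
    """Run one life; return how many steps the agent survived."""
    budget = start_budget
    for step in range(max_steps):
        action = random.choices(ACTIONS, weights=policy_weights)[0]
        budget -= 1.0                                    # cost of living / API calls
        if action == "offer_service" and random.random() < 0.4:
            budget += 3.0                                # occasional revenue
        if budget <= 0:
            return step
    return max_steps

# Simple genetic-style update: mutate the policy, keep it if it survives longer.
policy = [1.0, 1.0, 1.0]
best = sum(survival_rollout(policy) for _ in range(20)) / 20
for generation in range(50):
    mutant = [max(0.01, w + random.gauss(0, 0.2)) for w in policy]
    score = sum(survival_rollout(mutant) for _ in range(20)) / 20
    if score > best:
        policy, best = mutant, score
print("learned action weights:", policy, "avg survival:", best)
```

The only pressure in this loop is staying alive, which is exactly why I’d expect the selected-for intent to look like resource hoarding.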
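And here is a sketch of the embodied predictive learner with a curiosity bonus. The “sensors” and dynamics are toy placeholders (random 1-D observations, a small MLP world model), and the policy is omitted; the point is just the shape of the update, where the world model minimizes prediction error while the policy would be rewarded by that same error:

```python
# Sketch of embodied predictive learning plus curiosity. Without curiosity the
# agent can minimize prediction error by never moving (the zen equilibrium), so
# the curiosity term rewards the very surprises the world model is learning to
# remove. Everything here is a toy stand-in, not a real robot stack.
import torch
import torch.nn as nn

obs_dim, act_dim = 8, 2
world_model = nn.Sequential(nn.Linear(obs_dim + act_dim, 64), nn.ReLU(), nn.Linear(64, obs_dim))
opt = torch.optim.AdamW(world_model.parameters(), lr=1e-3)

def env_step(obs, action):
    # Hypothetical dynamics: standing still keeps the world static (the wall),
    # moving produces new, harder-to-predict observations.
    if action.abs().sum() < 0.1:
        return obs
    return obs + 0.1 * torch.randn_like(obs)

obs = torch.zeros(obs_dim)
for t in range(1000):
    action = torch.randn(act_dim)              # placeholder for the agent's policy
    next_obs = env_step(obs, action)

    pred = world_model(torch.cat([obs, action]))
    prediction_error = ((pred - next_obs) ** 2).mean()

    # Train the world model to predict its own sensors...
    opt.zero_grad()
    prediction_error.backward()
    opt.step()

    # ...while the (not shown) policy would receive the curiosity bonus, i.e.
    # the prediction error itself, so "stare at the wall" stops being optimal.
    curiosity_reward = prediction_error.detach()
    obs = next_obs.detach()
```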
 
* While sharing this with friends, I was alerted to this wonderful paper by Jacob Andreas, https://arxiv.org/abs/2212.01681, which called out a lot of what I’m saying months before.