The Alignment Problem

Aligning Ourselves

If we want to meaningfully join in the conversation about AI, we’ll have to learn to face ourselves first.

Review by Joshua Rio-Ross

In South Park’s early March episode “Deep Learning,” middle schoolers use ChatGPT to manage their love lives and do their homework. Their teacher, bored by the sudden verbosity of the students’ papers, decides to use ChatGPT to grade them. Conflicts ensue about the integrity of work and the essential humanness of love, along with some necessary histrionics involving the government stepping in to set it all straight. South Park, as always, has its finger on the cultural pulse. Schools at all levels in the U.S. have had to confront the possibility (certainty) that students are outsourcing their work to whatever ChatGPT is. As someone who left teaching in higher education, I don’t envy the investigations into a new kind of plagiarism. But as someone who left higher education to work in machine learning, I am as intrigued and disrupted as anyone else. All industries will be. After all, the real twist of South Park’s episode comes in the credits, where they list ChatGPT as a co-writer.

By now, anyone who reads an article on the internet has heard of ChatGPT, if not used it directly. By its own admission, ChatGPT is “an advanced AI language model developed by OpenAI, designed to generate human-like text.” You can ask it to translate your Python code into Rust or turn your rambling nonsense into an outline, to act as a CEO for a company or concoct cocktail recipes for each of your sisters’ enneagrams. It’s powerful. To someone working in tech, it’s seductively cool and useful.

The electricity and fear are all over social media. New aphorisms have taken hold, like “AI won’t replace you. Someone using AI will.” Prognostications about the future of work and leisure and language abound. Professionals offer genuine concerns about the authenticity of voice and the veracity of what we read. And, of course, there’s the groundswell question of whether we’re going to lose control of AI if we continue advancing at this rate. How do we ensure AI will act in accordance with our human values?

This problem is a longstanding one. It’s called “the alignment problem,” and it runs through the history of machine learning, from its humble beginnings in simple linear prediction to loftier ambitions of making artificial intelligence capable of meeting or exceeding humans’ abilities in any intellectual domain.

In late 2020, Brian Christian released The Alignment Problem: Machine Learning and Human Values. Christian’s flagship skills are painstaking research, curated storytelling, and the distillation of technical computer science concepts into ideas the curious reader can meaningfully ponder. For a decade, his books have explored computer science as a human discipline, as a collection of skills and methods that were designed to address human problems and whose application always has ramifications for how we understand ourselves.

The Alignment Problem does this in a new, narrative way. For Christian, the alignment problem has been the driving question behind machine learning since the early 20th century. He introduces the core components almost unnoticeably, since the components–samples, labels, reward functions, optimization algorithms–are all discussed through historical examples that function as pericopes drawn out of a grander story. The effect is the feeling of an active, human project that we’re already a part of by virtue of being human–the feeling that our humanity precedes and is integral to the problems of machine learning.

Every machine learning question raised has analogous and related human questions, whether in learning theory or epistemology or otherwise. By the end of the book, the reader is equipped to enter the arena of AI ethics, not because they become technical experts, but because they have a human story for AI and can locate the alignment problem. It’s not in an arcane field of zeros and ones, but precisely in the thicket of human questions that characterize our lives.

Take, for example, one problem Christian introduces in the first movement of his book, called “Prophecy.” The computer-vision technology used to label our photos or unlock our phones has a history of not accurately identifying minority groups (relative to the US population) or failing to identify them as human at all. Christian’s investigation reveals that this is a classic problem of poor sampling techniques. People with darker skin tones were (and commonly still are) underrepresented in the training data used for these algorithms. One facet of the alignment problem, then, is the question How do we prevent AI from acting with and perpetuating social bias and inequity? As AI ethics hits mainstream conversation, questions like this one are likely near the top of the agenda. But Christian doesn’t let us miss that this question is intimately tied up in both a machine-learning question (What can go wrong when my data is poorly sampled?) and, crucially, an unsettling human question (Do my limited experiences inhibit my judgment?). An effective approach to AI ethics demands that we parse such questions into both their machine-learning and their human forms.
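To make the sampling point concrete, here is a minimal, purely synthetic sketch (my own, not an example from Christian’s book): a simple classifier is trained on data in which one invented group vastly outnumbers another, then scored on each group separately. The groups, features, and model are placeholders for illustration only.

```python
# A synthetic illustration of poor sampling, not a real vision system:
# group B is scarce in the training data, so the model learns a decision
# rule tuned to group A and stumbles on group B.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

def make_group(n, center):
    """Draw n samples whose features cluster around `center`; the label
    depends on the first feature relative to that group's own center."""
    X = rng.normal(loc=center, scale=1.0, size=(n, 2))
    y = (X[:, 0] + 0.5 * rng.normal(size=n) > center[0]).astype(int)
    return X, y

# Skewed training set: 5,000 examples of group A, only 100 of group B.
Xa, ya = make_group(5000, center=np.array([0.0, 0.0]))
Xb, yb = make_group(100, center=np.array([3.0, 3.0]))
model = LogisticRegression().fit(np.vstack([Xa, Xb]), np.concatenate([ya, yb]))

# Balanced evaluation: the same number of fresh examples from each group.
Xa_test, ya_test = make_group(2000, center=np.array([0.0, 0.0]))
Xb_test, yb_test = make_group(2000, center=np.array([3.0, 3.0]))
print("accuracy on group A:", model.score(Xa_test, ya_test))
print("accuracy on group B:", model.score(Xb_test, yb_test))
```

On data skewed like this, the model’s accuracy on the underrepresented group typically lags far behind its accuracy on the majority group; it has simply seen too few examples to learn the minority group’s pattern.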

Christian’s second and third movements, “Agency” and “Normativity,” show how our relationship with the machines we train has evolved with our goals for machine intelligence. “Agency” traces how we began building our likeness into the machines we teach. We simulated human characteristics like curiosity, intuition, and expectation in machine learning in order to generalize machine intelligence, eventually enabling these models to beat us in chess and to explore virtual worlds. “Normativity” then shows how our dynamic with machines has changed as we’ve introduced more complex tasks. If we want these machines to learn tasks too complex for us to fully articulate–like driving a car–then we need them to learn to imitate our actions or anticipate our needs. As we know from children, animals, and new hires, imitation without correct inference of what exactly we’re trying to accomplish can be disastrous. Likewise, once a machine is granted license to interpret, misunderstandings and unexpected possibilities emerge. Controlling these possibilities requires new kinds of training, training that entails intimate interaction with machines. It’s sharing control of the wheel (literally) until it’s hard to tell who’s driving. It’s completing each other’s sentences.
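As a toy illustration of that last point, imitation without inference of the goal, here is a sketch of my own (not an example from the book), assuming a one-dimensional lane-keeping task: an expert who knows the objective keeps a drifting car centered, while a clone merely replays the nearest action it has ever watched the expert take.

```python
# A toy illustration (mine, not the book's) of imitation without inference:
# the expert knows the goal (keep the car centered); the clone only knows
# the actions it has watched the expert take near the center of the lane.
import numpy as np

GAIN = 1.1    # left alone, the car's offset from center grows 10% per step
STEPS = 30

def step(offset, action):
    return GAIN * offset + action

def expert_action(offset):
    # The expert understands the objective: steer so the next offset is zero.
    return -GAIN * offset

# Demonstrations: the expert always starts close to the center, so every
# state it visits, and every action it takes, is small.
rng = np.random.default_rng(0)
demo_states = rng.uniform(-0.2, 0.2, size=200)
demo_actions = expert_action(demo_states)

def cloned_action(offset):
    # Pure imitation: replay the action from the most similar state ever seen.
    return demo_actions[np.argmin(np.abs(demo_states - offset))]

def rollout(policy, start_offset):
    offset = start_offset
    for _ in range(STEPS):
        offset = step(offset, policy(offset))
    return offset

# Drop both policies into a state the expert never demonstrated.
print("expert ends at offset", round(rollout(expert_action, 3.0), 3))
print("clone ends at offset", round(rollout(cloned_action, 3.0), 3))
```

The expert recovers instantly because it understands what the actions are for; the clone, having only ever seen the small corrections made near the center of the lane, keeps replaying them from far away and drifts even further off.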

Here the imagination runs wild. Most of our collective imagination about machine learning and AI has been formed by (often apocalyptic) sci-fi stories: 2001: A Space Odyssey, Total Recall, Minority Report, Her, Ex Machina, The Terminator series, etc. Each of these offers an answer to the question: What could go wrong with AI? But for any answer to that question, a follow-up question is warranted: How do we prevent that?

How, in this context, is inhospitably technical. Tempting as it might be to entrust these questions to our legislative body, that would be asking the same group of people who struggled to understand that Facebook made money by selling ads to now understand neural networks. And most of us don’t feel any better equipped to take on the math and magic behind AI. But we can’t avoid the alignment problem that Brian Christian took the time to investigate in painstaking, compelling narrative detail. We can’t avoid it because ChatGPT (with the help of South Park) has forced the alignment problem back into public conversation.

ChatGPT isn’t alone, either. Google has since released its competitor, Bard. Facebook released Llama. (Yes, “Llama”.) A stampede of these large language models (LLMs) is emerging as if from some recently beached ark. And the open-source community is now able to apply, adapt, and connect these LLMs to other applications. We can train them for particular use cases–to answer questions about our company’s training documentation or to summarize a Slack conversation. AI is moving with dizzying velocity. But the field is split on how to proceed, especially with the alignment problem looming and unresolved.
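For a flavor of what connecting an LLM to another application can look like, here is a deliberately hypothetical sketch: a question about internal documentation is answered by retrieving the most relevant passage and folding it into a prompt. The call_llm function, the documents, and the retrieval rule are all stand-ins, not any particular vendor’s API.

```python
# A hypothetical sketch of "connecting an LLM to another application":
# answer a question about internal documentation by retrieving the most
# relevant passage and folding it into a prompt.
def call_llm(prompt: str) -> str:
    # Placeholder: in practice this would call a hosted or local language model.
    return f"[model response to a {len(prompt)}-character prompt]"

DOCS = {
    "onboarding": "New hires request laptop access through the IT portal.",
    "expenses": "Expense reports are due by the 5th of the following month.",
}

def retrieve(question: str) -> str:
    # Naive retrieval: pick the document sharing the most words with the question.
    words = set(question.lower().split())
    return max(DOCS.values(), key=lambda doc: len(words & set(doc.lower().split())))

def answer(question: str) -> str:
    context = retrieve(question)
    prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
    return call_llm(prompt)

print(answer("When are expense reports due?"))
```

Real systems swap in an actual model call and a proper retrieval index, but the shape of the plumbing is roughly this.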

Some say we should cut the cord entirely. Others want a pause. Like a shout to get off the ride right before the roller coaster drops, some of tech’s most (in)famous minds (e.g. Elon Musk, Andrew Yang) signed an open letter titled “Pause Giant AI Experiments,” calling for a six-month moratorium on training any model more powerful than GPT-4, the latest model behind ChatGPT. They want “a stepping back from the dangerous race to ever-larger unpredictable black-box models with emergent capabilities,” for the sake of accuracy, safety, and transparency. The letter’s concern is reasonable, if jargon-laden. Yet Andrew Ng, whom Christian frequently references for his AI research, believes the safest and most responsible route is more research, including more research into the questions of AI ethics, both technical and non-technical, machine and human.

The AI conversation being forced on us feels at once immanent and transcendent, important and unapproachable. Jargon stands as a barrier, but the greater obstacle is that we lack a story to contextualize the discussion. All ethical conversation begins with a story, even if the conversation primarily becomes What is the story? That is itself ethical work. And that is Brian Christian’s work in The Alignment Problem. He invites the technical and non-technical reader into the conversation to ask What next?, but only after we have established how we got here and prioritized the human elements at play. Some of our oldest problems have come to bear in our newest creations. We’ll need to examine our humanity anew to solve them.

Joshua Rio-Ross studied Philosophical Theology at Yale Divinity School. He is a data engineer in Nashville, Tennessee, where he lives with his poet wife and two pups who are a philosopher and a drama queen, respectively. He plays fantasy football using data analytics; he swing dances with his wife; he and his dogs just are.

The Alignment Problem was published by W.W. Norton on October 6, 2020. You can purchase a copy on the publisher’s website.