Fluent Hallucinations: Training a Tagline Generator
Yasmin set out to examine the qualities of various text-generation models by creating a tagline generator to write a resonant brand statement for AIxDESIGN. They considered the shift from older models like RNNs to transformer models as a transition from craft to synthesis while seeking a balance between believability and imagination in the generated text. The post highlights the potential of AI-generated text and sparks discussion about human-like imitation versus the distinct creativity of machine-generated content.
Introduction
In 2022, I joined the AIxDesign team as the AI Tooling project lead, a role all about experimenting with pre-made and custom machine learning models to aid the workflow of AIxDesign. We wanted to figure out how AI tools and tricks could be incorporated into the events, workshops and other activities produced that year. As a community of practitioners, we wanted to lead by example and implement new technologies in our daily work, rather than just experimenting with them in a learning environment. We set out to make a tagline generator because we wanted a slogan for the AIxDesign website and other communication materials, but we could not agree on any. Why not harness the power of text models to generate a slew of taglines that we could use?
Easier said than done. Yes, I could have simply put "AIxDesign" and a short description of the company and its values into a pre-made text generator and called it a day. But I wanted to get my hands dirty with some code and collaborate with AI tools to get some actual creativity out of it.
My experiments with so-called Artificial Intelligence (AI, or what artist Memo Akten would call ASS, an Automated Software System) go deeper than making something cool, being impressive, or just stating that I used "AI" to create something. I wanted to research why certain elements of AI are attractive to us, and how the underlying mechanisms and social formations influence these processes and each other. AI is a "hyper-object": conceptualised by philosopher Timothy Morton, a hyper-object describes a phenomenon so complex that it cannot be grasped or understood completely on a human level. Examples include climate change, oil spills and capitalism. (Morton, Timothy. Hyperobjects: Philosophy and Ecology after the End of the World. University of Minnesota Press, 2013.) AI is both a structure of algorithmic thinking that is becoming more and more incomprehensible to humans and one that values prediction and optimisation.
A notable example of artists working with these systems is the poetry of Allison Parrish, who writes with computer-generated text produced by RNNs and other models: "'Compasses' is a chapbook recently published in Andreas Bülhoff's sync series. It consists of poems produced with the help of a machine learning model I designed as the next step in my exploration of phonetic similarity." – Allison Parrish
One of the biggest language models is GPT-3, released by OpenAI in 2020. With 175 billion parameters, the learned values that determine how the model behaves, GPT-3 is capable of generating very plausible and complex text. My introduction to language models was through an older type of model, the Recurrent Neural Network (RNN). These models work well with sequential data like text. I've had experience training a character-level RNN (charRNN), which predicts the next character in a sequence.
Another type of RNN is the Long Short-Term Memory (LSTM) network, which is better at remembering information and context over longer stretches of a sequence. In my art project postHuman Koan, I compiled a selection of Zen koans, which are short stories, questions or statements used in Zen Buddhism. These were fed into a char-rnn model in hopes of creating machine-mediated poetry.
What amazed me when first working with these models is how fast a model learns the structure of words and spaces. As it trains further, consuming more data, you can see how the model begins to make sense of the textual world you provided. It essentially learns a language from scratch. Even though I knew the model was just trying to predict the probability that a "c" would come after an "a", it felt like I was discovering magic. My first experience with language models was mind-blowing. If you're interested in reading more about these older language models, Andrej Karpathy wrote a great blog post on "The Unreasonable Effectiveness of Recurrent Neural Networks".
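To make that "probability of the next character" idea concrete, here is a toy JavaScript sketch of my own (not part of any of these models, and far simpler than an RNN): it just counts which characters tend to follow which in a sample text, which is the most basic version of the question a charRNN learns to answer.

```javascript
// Toy next-character statistics: count which character follows which.
// This is a plain bigram count, not an RNN, but it asks the same question.
function nextCharCounts(text) {
  const counts = {};
  for (let i = 0; i < text.length - 1; i++) {
    const current = text[i];
    const next = text[i + 1];
    counts[current] = counts[current] || {};
    counts[current][next] = (counts[current][next] || 0) + 1;
  }
  return counts;
}

const counts = nextCharCounts("ai and design, ai and art, ai and activism");
console.log(counts["a"]); // how often 'i', 'n', 'r', 'c' follow an 'a' in this sample
```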
The difference between RNNs and the newer transformer models (such as GPT-2 and GPT-3) is that transformers use several smart techniques for natural language processing (NLP). Dale Markowitz, an applied AI engineer at Google, has a great article breaking down transformers for a high-level understanding; I highly recommend reading it or watching her video. She lists the following:
- Positional encodings: "store information about word order in the data itself, not in the neural network structure". Each word is given a number marking its position, so GPT-3 learns the importance of word order from the data itself.
- Attention: the model pays attention to specific words in the input sentence when making a decision about the output sentence; it learns which words are interdependent.
- Self-attention: the model can understand a word in the context of the sentence around it (a toy sketch of the attention idea follows this list).
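To give a rough feel for what "paying attention" means here, below is a toy JavaScript sketch of my own (real transformers learn separate query, key and value projections and operate on far larger vectors; the tiny "embeddings" below are made up): one word is scored against every word in the sentence, and the scores are softmaxed into weights that sum to 1.

```javascript
// Toy attention: score one word against every word in a sentence,
// then softmax the scores into weights that sum to 1.
function dot(a, b) {
  return a.reduce((sum, value, i) => sum + value * b[i], 0);
}

function softmax(scores) {
  const exps = scores.map(s => Math.exp(s));
  const total = exps.reduce((a, b) => a + b, 0);
  return exps.map(e => e / total);
}

// Made-up 3-dimensional "embeddings" for three words.
const embeddings = {
  ai:     [0.9, 0.4, 0.1],
  meets:  [0.1, 0.0, 0.2],
  design: [0.8, 0.5, 0.2],
};

// How much should "design" attend to each word in "ai meets design"?
const words = ["ai", "meets", "design"];
const scores = words.map(w => dot(embeddings.design, embeddings[w]));
const weights = softmax(scores);
console.log(words, weights); // "ai" ends up with far more weight than "meets"
```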
𧠠Training an RNN with AIxDesign Text
Let's start with what I know. To create an initial dataset based entirely on AIxDesign brand copy, I manually scraped 8 pages of writing (only 23kb) from the website, Eventbrite caption text and our brand page on Notion. I trained the first iteration of an RNN model on this tiny corpus using the ml5.js implementation. ml5.js is a beginner-friendly library of machine learning APIs that can be easily accessed through the browser, and its models can be used in p5.js, the creative coding library I'm most comfortable with. Here is the outcome:
You can play with this model and generate text yourself here.
The output of this implementation is basically nonsensical word mush. If you lower the "temperature", which controls how much randomness the model uses when picking the next character, you see more of the words that actually appear in the dataset: common words like "and" that repeat a lot but don't carry much meaning on their own. Raising the temperature makes the output complete gibberish, because the model doesn't have enough data to learn from. I knew that 23kb was insufficient; you need at least 2mb of text for an RNN to produce something coherent. (GPT-3 was trained on 40 terabytes of text!)
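For reference, this is roughly how that looks in a p5.js sketch using the older ml5.js charRNN API (v0.x); the model path below is a placeholder for wherever the trained model files end up, so treat it as a sketch rather than a drop-in snippet.

```javascript
// Load a trained charRNN model and generate text from a seed phrase.
// The model path is a placeholder; temperature controls the randomness.
let charRNN;

function setup() {
  noCanvas();
  charRNN = ml5.charRNN("./models/aixdesign/", modelLoaded);
}

function modelLoaded() {
  charRNN.generate(
    { seed: "AIxDesign is", length: 200, temperature: 0.5 },
    (err, result) => {
      if (err) return console.error(err);
      console.log(result.sample); // the generated text
    }
  );
}
```

Nudging the temperature value down towards 0.1 or up towards 1.0 is what produces the shift from repetitive common words to complete gibberish described above.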
If all of AIxDesign's written material only yields 23kb, how can one expand this corpus? I wanted to buff up the dataset by incorporating texts by scholars whose work is important to AIxDesign: I wanted to borrow the voices of Donna Haraway's Cyborg Manifesto, Kate Crawford's Atlas of AI, and Catherine D'Ignazio and Lauren F. Klein's Data Feminism. Is it as simple as combining a bunch of books with all of AIxDesign's writing? This is what I wanted to find out. However, soon after I trained this model in early 2022, updates to certain code dependencies made it impossible to re-train the model using the same repository. Since ml5.js is an open-source project run by volunteers, it wasn't getting updated much, and Google Colab, a cloud computing service that lets you train models on powerful GPUs, no longer supported the package (TensorFlow v1) that the training repository was built on.
I soon came to realize that the code was outdated and I could no longer force Colab to use the older versions of the packages. My beautifully strange text generators had been superseded by transformer models.
There is a balance between believability and imagination when using machine learning to generate text. At least, as an artist, that's how I see it. It's nice for synthetic text to have noise, to not be completely clean and passable as written by a human. Transformer models are much better at staying coherent and remembering context, creating texts that communicate almost nothing about the writer. Is our collective aim to make machine-generated text completely believable? To "imitate" a human?
Eryk Salvaggio, AI artist, writer and author of Cybernetic Forests, calls the models that were popular before the boom of transformer and diffusion image models a sort of craft. Understanding these older models is like "developing film to understand a digital camera". (Salvaggio, Eryk. "Notes From A Lost Future of AI Art", Substack, Nov 13, 2022.) After many frustrating nights spent trying to find ways to use deprecated versions of code packages, it took a while to let go and admit that I could not make the older models work. I kept on trying, even though I knew that RNNs were not going to make the most believable text. This boutique set of older neural networks feels like what oil painting is to Photoshop, and what Photoshop is beginning to feel like compared to AI-generated images.
While contemplating the difference between synthesis and craft, I moved on to a more recent text generator: GPT-2. Open-source developer Max Woolf made aitextgen, a framework that makes it more convenient to fine-tune this transformer model. I spent two days fiddling around with code, but I had so many problems with the repository that it became obvious I would not make progress within a reasonable amount of time.
Using pre-made tools
I moved on to exploring apps other people made, such as BrandImage.io, a model made by Jonathan Boigne. You can read about how he developed this experiment here.
This fine-tuned GPT-2 model works fairly well, sometimes generating pretty good slogans. The input field has a character limit, so I reworded AIxDesign's branding descriptions into different short sentences. Here are my inputs:
- Community-led platform exploring creative purposes of AI/ML
- Outside of chasing productivity & optimization, what goals? (this one outputted the least interesting ones)
- Outside of big tech, we want to expand the typology of AI
An interesting quirk is that a lot of the outputs use "AIx", splitting the organisation's name in two. Below is a selection of my favourites.
- For those who make a difference, you'll need AIx
- The new AIx
- AIxDesign. Better things happen
- The tools to do all of this
- The smarts are in store
- We want you to look like a person
- If you're on board, we don't take you on board
I moved on to copy.ai, a paid service where I used a free trial. At first, I used a custom model.
I described what I wanted and tried to be as specific as possible. Even when I specified that the slogan should be between 5 and 20 words, it first spewed out a paraphrased version of my initial description and only later generated versions of this slogan:
AIxDesign: AI for everyone
A bit boring, right? When I found their pre-made motto generator, it created some of the best and funniest results.
Here's an unedited list of everything it generated for me:
- Artificial Intelligence meets Design.
- Explore. Research. Hack. Program. Art. Activism.
- Put the "AI" back in "Artificial Intelligence"
- Art, Innovation and Knowledge Exchange.
- Imagine what you could do with a pizza, a couple of quarts of beer, and the universe's largest dataset.
- The Machine Intelligence that Creates Design
- part human, part bot, all legs
- We are not robots
- Are we human or are we design?
- Glitch art for
- The future is yours, AIxDesign. We are here to act as your collaborators in this process.
- The future of AI is in the hands of the creators
- Expanding the boundaries of AI.
- Promote, explore and celebrate
- Artist Inxdesign
- Creative. Activist.
- Avant! It's a-me-a!
- Committed to the eradication of bullshit
- Wax on. Wax off. It's our job to see you through.
- learning by doing.
- Inquiring minds want to know
- Evil Machines
The main question of this experiment was how to train a custom-made text generator that could take on the personalities and processes of thinkers, philosophers and writers, aiming not for direct imitation but for a quirky piece of synthetic thought. Older text models feel like a thing of the past, but nothing has replaced their quirkiness. Why do I feel it's important to have multiple voices in AI, or in machine-generated text? Why not something generic? Here are a couple of examples that show the potential of AI to create text models with multiple voices.
@kimkierkegaard tweets the merged voice of Kim Kardashian and the existentialist philosopher Søren Kierkegaard
Memo Akten made an "interactive text conductor" that can compose texts from multiple RNN models
Social Force of AI
It brings me back to the idea of AI as a homogenising force that flattens the diversity of language, reducing "outliers" and generating a uniformity of both written and visual language. Text-to-image models, for all their apparent variety, converge on a certain aesthetic and reproduce narrow conceptions of gender and race. (Heikkilä, Melissa. "How it feels to be sexually objectified by an AI", MIT Technology Review, Dec 13, 2022.) Since machine learning uses statistical prediction to produce the most probable next word or answer to a prompt, I think there is a danger that machine-generated text will lose our voice: the quirks and nuances of how a human, or a particular corpus of human text, is written. That's why projects like Unsupervised Pleasures are so important. The independent makers behind it, Emily Martinez and Sarah Ciston, train and fine-tune language models with custom data to "infect larger AI systems with a plurality of stories as a strategy of resistance". Machine learning, in its industry-based conception, is about averaging, smoothing out the "exceptions". However, maybe the entire point of large language models like GPT-3 is their general nature. For copywriting in design, marketing and business, homogenous brandspeak is the aim. Bringing a voice to GPT-3, in a way, breaks its purpose. This kind of adjustment of the model is not a natural learning curve for a beginner who wants to learn the tool; it is, in essence, hacking.
Conclusion
At AIxDesign, we did not need a generalised slogan generator; we actually wanted to make something specific that embodied our values. The generality of this huge text model means we get text that reads well on the surface and could work for many purposes, but lacks specificity.
Through this experiment I realised how it's getting more challenging for independent makers and researchers to grapple with AI developments as they become more complex to understand and more expensive to train. RNNs could be fed flat text, in the sense that they absorb plain text sequentially, character by character. GPT-3 uses more complex techniques that require breaking the words in a sentence down into tokens (unique numbers).
Screenshot from OpenAI tokenizer that shows how a sentence is split.
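You can try the same thing outside the browser demo; below is a minimal sketch assuming the open-source gpt-3-encoder npm package, which reimplements the GPT-2/GPT-3 byte-pair-encoding tokenizer (the example sentence is my own).

```javascript
// Minimal sketch: turn a sentence into GPT-style tokens and back again.
// Assumes the open-source "gpt-3-encoder" npm package is installed.
const { encode, decode } = require("gpt-3-encoder");

const sentence = "AIxDesign explores creative and critical uses of AI";
const tokens = encode(sentence);        // an array of integer token ids
console.log(tokens);
console.log(`${tokens.length} tokens`); // unusual words get split into sub-word pieces
console.log(decode(tokens));            // decoding the ids returns the original sentence
```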
Understanding text models on a practical level, and not just a conceptual one, is so challenging because of the current "black box" paradigm in AI research (Sommerer, Lucia. "From Black Box To Algorithmic Veil", Better Images of AI, Feb 01, 2022), which obscures how a model came to a particular statistical prediction of which word should follow next.
As I finish writing this, ChatGPT has been explored by millions of users. This GPT-3-powered chatbot uses reinforcement learning from human feedback, wrapped in a text-conversation interface, to generate plausible text on a huge range of topics. Hundreds of products incorporating AI are being built today on this technology. Outputs of ChatGPT have been stripped of the discrimination and prejudice present in its datasets, replaced with disclaimers. The voice of language models used to be the system itself; now what is it? A corporate marketer? Transformer language models have proven capable of generating very fluent text, but I wonder about a future where machine-generated text with the personality of a "generic human" outweighs the amount of human-written text, and how that will affect our imagination and our world.
I end with a short list of the pros and cons of current large language models:
Pros:
- Very realistic, probable outputs. Fluent. Never any grammar or spelling mistakes
- Well structured, complex responses that do a good job (most of the time) of breaking down questions
- Responses to the same prompts improve over time
- Lots of current examples to see what these models can do
Cons:
- Generic, faceless tone (which is probably a pro for marketing and business copy)
- Sometimes outputs false information
- Logical fallacies
- You still need to proofread AI generated text and crosscheck its facts
- Furthers English as a global language and contributes to less use of nonstandard local dialects
The title of this article is borrowed from Gary Marcus, researcher and deep learning critic, and his Substack post "What To Expect When You're Expecting … GPT-4". My writing on this is also inspired by Jerrold McGrath's newsletter, "Local Disturbances".