Generative Playgrounds from Sketches through Image Segmentation

Yasmin ran an experiment on creating a custom COCO dataset of annotated playground images. The aim was to generate interesting, algorithmically generated backgrounds for the Gather space we use during the AI Playground program, using image segmentation, GANs, and the SPADE COCO dataset. The goal was to explore the annotation of images, the creative applications of image segmentation, and the ethical considerations related to dataset preparation.

🪄 Setup of the experiment - What is this about?

This experiment started from a conversation with Computational Mama (aka Ambika Joshi) for AI Playground. If you don’t know what AI Playground is, it’s a set of public online events we held on Gathertown, inviting artists and researchers who were using AI in their practice. Ambika wanted a way to make the Gathertown event background more interesting, ideally algorithmically generated.

My idea was that after each event, attendees could collectively draw a sketch with different parts of a playground, which would be fed as input into a Generative Adversarial Network (GAN) to generate an image to be used for the next event.

At first, I wanted to create a pix2pix model like Edges2Pikachu, where you draw a sketch and the model turns it into an image of a Pikachu.

Demo of an image-to-image model, play with them here:

However, a playground is a far more complex scene than a character, so I decided that image segmentation and the SPADE COCO dataset would be a better fit, similar to the model you can play with on RunwayML here.

Github repo for image segmentation model:

Another reason why I did this experiment is that I haven’t come across many custom image segmentation datasets (let me know if you have!). The custom dataset below comes from a Paperspace tutorial, so I feel like this experiment can add to this developing area of image synthesis.

‘Training GauGAN on Custom Datasets | Paperspace Blog’

This experiment is about the process of creating a custom COCO dataset of annotated playground images that can then be used to make a GauGAN, similar to the one developed by NVIDIA below. They even have a live demo you can play with.

Output from NVIDIA’s GauGAN2. Left is my sketch input with different colours, right is the output

Research questions:

_How can I annotate images and what’s the best method/tool?

_What are the possible creative applications of image segmentation?

_What are some of the ethical questions of dataset preparation and how can we work to resolve them?

🎨 Output + Resources - What did you make?

Github Files

I have uploaded all the code I’ve used and referenced where I got them from. You can check out the repository for this experiment here.

For reference, here’s the main github repo for training an image segmentation model. I didn’t successfully prep the dataset so I didn’t get to the point of training the model. Github for training image segmentation here.

Instagram guides

A Guide to Generative Image Models

Infographic on Instagram about different image models. Text-to-image diffusion models blew up in 2022, but they follow an earlier wave of generative models built on a machine learning architecture called Generative Adversarial Networks (GANs).

Rundown on Image Segmentation

Infographic on Instagram about a type of computer vision method called ‘image segmentation’ that builds on object detection. It produces pixel-perfect masks of objects in a complex scene.

🎥 Videos of process

A talk about the research stages, looking at different models and GitHub repos. I talk about semantic image synthesis and NVIDIA’s GauGAN2.

I talk about synthetic datasets and how I use scripting in Blender to rotate objects in a scene and potentially make as many images as I want.

💭 Reflections - What did you learn?

At the beginning of the experiment, I was really determined to make a custom image segmentation model. Not only was it something that I hadn’t seen much or done myself, I also thought it would be a worthy challenge. I realised later on that it was probably too big a challenge to accomplish within my time and knowledge constraints.

The multiple modalities, or the different kinds of input a user can provide, are so exciting to me. You can not only write a text input describing a landscape, but also draw a sketch or paint a colourful segmentation map. Combining all of these could really increase our creative potential. To me, the next frontier of AI is creating multi-sensory systems that are capable of processing more of the world.

Challenge 1: Dataset diversity

Challenge 2: Dataset preparation

Synthetic datasets reduce the risk of labelling mistakes, because the masks for each category are pixel-perfect, unlike masks drawn by hand. I still have a lot more to learn in this area, but this is an exciting part of AI because it means you don’t have to scrape images online that include real people’s sensitive pictures, or offshore image annotation to cheap labour.
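
To give a feel for the Blender scripting mentioned in the videos, here is a minimal sketch of rotating one object and saving a render at each angle. It is not the script from my repo: the function names, the object name "Slide" and the output folder are hypothetical, and it only covers rotating and rendering, not exporting the matching segmentation masks.

```python
import math


def rotation_angles(n_views):
    """Return n_views evenly spaced angles (in radians) covering a full turn."""
    return [2 * math.pi * i / n_views for i in range(n_views)]


def render_rotations(object_name, n_views, out_dir):
    """Rotate one object about its Z axis and save one render per angle.

    Only works inside Blender, because bpy ships with Blender's Python.
    """
    import bpy  # imported here so rotation_angles also runs outside Blender

    obj = bpy.data.objects[object_name]
    scene = bpy.context.scene
    for i, angle in enumerate(rotation_angles(n_views)):
        obj.rotation_euler[2] = angle  # index 2 is the Z axis
        scene.render.filepath = f"{out_dir}/{object_name}_{i:03d}.png"
        bpy.ops.render.render(write_still=True)


# Example call inside Blender (object name and folder are hypothetical):
# render_rotations("Slide", 36, "//renders")
```

Raising n_views gives more images of the same object from new angles, which is what makes a synthetic dataset cheap to grow.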

Comments on the research questions

  1. How can I annotate images and what’s the best method/tool?
    • labelme worked for me BUT I had issues converting the annotations into COCO format. It appears that the annotated files could still be used for other image recognition purposes though. See my reflections on this here: July: Using labelme to annotate
  2. What are some of the ethical questions of dataset preparation and how can we work to resolve them?
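
On the labelme-to-COCO conversion from point 1, this is a rough sketch of what the step involves, not the code I ran. It assumes labelme's usual JSON keys (imagePath, imageHeight, imageWidth, shapes) and polygon annotations only. The function names and the example file are hypothetical, and I haven't checked the output against what the training repo expects.

```python
import json


def polygon_area(points):
    """Shoelace formula for the area of a simple polygon."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(points, points[1:] + points[:1]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def labelme_to_coco(labelme_docs, category_names):
    """Convert parsed labelme JSON dicts into one COCO-style dict."""
    cat_ids = {name: i + 1 for i, name in enumerate(category_names)}
    coco = {
        "images": [],
        "annotations": [],
        "categories": [{"id": i, "name": n} for n, i in cat_ids.items()],
    }
    for image_id, doc in enumerate(labelme_docs, start=1):
        coco["images"].append({
            "id": image_id,
            "file_name": doc["imagePath"],
            "height": doc["imageHeight"],
            "width": doc["imageWidth"],
        })
        for shape in doc["shapes"]:
            if shape.get("shape_type", "polygon") != "polygon":
                continue  # this sketch handles polygons only
            pts = [tuple(p) for p in shape["points"]]
            xs = [x for x, _ in pts]
            ys = [y for _, y in pts]
            coco["annotations"].append({
                "id": len(coco["annotations"]) + 1,
                "image_id": image_id,
                "category_id": cat_ids[shape["label"]],
                # COCO stores each polygon as a flat [x1, y1, x2, y2, ...] list
                "segmentation": [[c for p in pts for c in p]],
                # COCO bounding boxes are [x, y, width, height]
                "bbox": [min(xs), min(ys), max(xs) - min(xs), max(ys) - min(ys)],
                "area": polygon_area(pts),
                "iscrowd": 0,
            })
    return coco


# Hypothetical annotation of one rectangular "slide" region
example = {
    "imagePath": "playground_001.jpg",
    "imageHeight": 100,
    "imageWidth": 200,
    "shapes": [{"label": "slide", "shape_type": "polygon",
                "points": [[10, 10], [50, 10], [50, 30], [10, 30]]}],
}
coco = labelme_to_coco([example], ["slide", "swing"])
print(json.dumps(coco["annotations"][0], indent=2))
```

In practice you would load each labelme file with json.load and pass the list of dicts in. Labels that are missing from category_names raise a KeyError, which is a useful early warning about typos in the annotations.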

As this experiment comes to an end, I’m realising some of the mistakes I made that I’ll try to avoid or limit next time.

  1. Start simple. And I mean REALLY simple
    • In the setup of the experiment, I noted that I hadn’t seen much image segmentation research by artists/designers doing it themselves. Maybe that was for a reason! What followed was many months of trying to do something complex that was a big stretch from my current knowledge.
    • Instead, I could have identified this gap and made simpler mini experiments that could have led to the ‘main’ experiment of making a GauGAN. Mini experiments could be:
      • Generating images with a pre-made segmentation dataset, like COCO or ADE20K
      • Using the Runway ML model in a creative way, as an API maybe?
      • Creating an object detection model of a playground (simpler than image segmentation)
  2. Ask for help as soon as possible, so your friends and peers can check you
    • I went ahead with this project very individually and didn’t involve people or ask for help until much later on. I could have jumped to trying a synthetic dataset faster, but also could have just been told ‘Yasmin, this is a bit difficult’ and simplified this much more. By the time I realised I should make this experiment much simpler, it was time to wrap up.
    • It was tricky to involve people because it can be a difficult process to onboard someone. I had conversations with friends who were eager to help but didn’t have specific ML knowledge, or I talked to people who had loads of knowledge but little time to spare. Next time, it would make more sense to have a partner on the project for accountability and also reality checks. Pairing with someone with expertise from the beginning could be really helpful.

Hope you enjoyed the read! If you’re someone who feels they could contribute to another iteration of this experiment and help solve some of the issues I’ve identified (or missed), please email me 💌 You can also Slack me on the AIxDesign slack.