AI & the [Human] Body

In this guide, Cailean Finn introduces human pose detection and recognition, an application of AI and ML that until recently was inaccessible to 'outsiders'. Cailean explains what the technology is, traces its history, and shows examples of how it can be used for creative purposes.


01_Introduction to Human Pose Recognition

→ Human body language is an intrinsic component of our lived experience. Through our movements we engage in nonverbal communication, expressing ideas and emotions instinctively rather than consciously. Our perceptions of others are in turn heavily influenced by their body language, which communicates a wealth of information to the world. This flow of information does not cease when you stop speaking; even when you are silent, you are still communicating.

→ In the digital age, this complex language has unfortunately faded into the background. This became even more evident during the pandemic, as we drowned in copious Zoom meetings, a domain where our level of communication is severely limited. So, how can we reclaim body language as a tool for communication in the digital age?

In this guide, I hope to provide a brief historical and technical overview of the many artificial intelligence and machine learning tools for Human Pose Recognition (HPR) that are currently available and in development.

→ Human Pose Recognition is a branch of Computer Vision research: essentially, a technique that allows us to accurately detect and predict or estimate the pose of a person. This is achieved by identifying and classifying the coordinates of the joints of a human body, such as the wrists, shoulders, knees, arms (…), commonly known as landmarks.
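To make the idea of landmarks concrete, here is a minimal sketch in Python (the tools discussed in this guide are mostly JavaScript-based, but the data structure is the same everywhere): a detected pose is essentially a list of named keypoints, each with image coordinates and a confidence score, from which higher-level features such as joint angles can be derived. The coordinate values below are invented sample data, not output from a real model.

```python
import math

# The 17 landmark names of the COCO keypoint convention, shared by
# models such as PoseNet and MoveNet.
COCO_KEYPOINTS = [
    "nose", "left_eye", "right_eye", "left_ear", "right_ear",
    "left_shoulder", "right_shoulder", "left_elbow", "right_elbow",
    "left_wrist", "right_wrist", "left_hip", "right_hip",
    "left_knee", "right_knee", "left_ankle", "right_ankle",
]

# A detected pose is a mapping from landmark name to (x, y, confidence).
# These values are invented for illustration.
pose = {
    "left_shoulder": (220, 180, 0.98),
    "left_elbow":    (240, 260, 0.95),
    "left_wrist":    (310, 270, 0.91),
}

def joint_angle(a, b, c):
    """Angle at joint b (in degrees), formed by the segments b->a and b->c."""
    v1 = (a[0] - b[0], a[1] - b[1])
    v2 = (c[0] - b[0], c[1] - b[1])
    dot = v1[0] * v2[0] + v1[1] * v2[1]
    norm = math.hypot(*v1) * math.hypot(*v2)
    return math.degrees(math.acos(dot / norm))

# Derive the elbow angle from three landmarks' (x, y) coordinates.
elbow = joint_angle(pose["left_shoulder"][:2],
                    pose["left_elbow"][:2],
                    pose["left_wrist"][:2])
print(f"left elbow angle: {elbow:.1f} degrees")
```

Everything an application does with a pose model, whether triggering an interaction or matching a yoga posture, is ultimately built from simple geometry like this over the landmark list.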

→ More accurate representations of our physical body enable us to create more natural and complex interactions with different virtual environments.

Authors Ginés Hidalgo (left) and Hanbyul Joo (right) in front of the CMU Panoptic Studio, OpenPose

→ In the past, there were many technical barriers preventing artists, designers and creative practitioners from utilising and experimenting with Human Pose Recognition tools. As this technology becomes more accessible through the development of tools like OpenPose and MoveNet, it presents us with the opportunity to explore new modalities of bodily interaction. With this increase in accessibility and speed, human pose recognition is becoming more ubiquitous across numerous ecologies, and we must begin to critically observe how this information could be used when our bodily movements are mediated digitally.

How can we use Human Pose Recognition to translate our intimate bodily movements in a digital environment? What elements do we lose during that process?

Ultimately, the aim of this guide is to provide a foundation for further exploration and experimentation of Human Pose Recognition.

Key Terminology

CV → Computer Vision

HPR → Human Pose Recognition

HPE → Human Pose Estimation

CNN → Convolutional Neural Network

Landmarks → A set of defined coordinates that represent the different joints in the human body. The number of joints mapped varies from model to model, and the relative positions of landmarks can be used to distinguish one pose from another.

CVPR → Computer Vision and Pattern Recognition Conference: An annual conference on computer vision and pattern recognition, which is regarded as the most important conference in its field.

IMU → Inertial Measurement Unit: An electronic device that measures and records a body's specific force, angular velocity and sometimes the orientation of the body.

COCO → Common Objects in Context: A large-scale object detection, segmentation, and captioning dataset.

OpenPose → An open-source real-time multi-person system to detect not only human body joints but also hand, face and foot keypoints.

PoseNet → A machine learning model that allows for real-time Human Pose Estimation. A TensorFlow.js implementation enables the model to run real-time human pose estimation in the browser, and this implementation has been integrated into the ml5.js library, which makes machine learning for the web more accessible and approachable!

02_History of Human Pose Recognition

This section presents an incomplete list of the many developments made in human pose recognition, as well as some early ideas surrounding mapping and representing bodily movements.

→ Historically, systems for translating the semiotics of our bodily movements into another language were created long before the advent of computer vision, human pose recognition, or computing in general.

→ Movement scripts are one instance of a system developed to transcribe this visual-kinesthetic language, and were widely used across Europe in the 15th century. Many were invented to record a unique movement system, such as an idiom for dancing or a gestural system. This was seen as a technological breakthrough at the time, as no existing tool had been created for such a purpose.

→ Movement notation never became an integral part of any dance study or practice. Technological advancements in video, especially video recording in the 1970s, greatly overshadowed movement notation. However, these early movement systems reflect many of the same goals and motives as human pose recognition research: striving to create ever more accurate mathematical and graphical representations of movement itself.

→ In computer vision, Human Pose Estimation has been studied for decades. However, most methods prior to 2012/2013 had limitations around adaptability and speed, or hardware requirements extending beyond a single RGB camera or monocular view. Over that period many algorithms had their moment in the spotlight, such as pictorial structures, but in recent years human pose recognition has seen major developments with the advent of larger and more complex datasets (COCO, CMU Motion Capture Dataset) and new machine-learning algorithms, enabling machines to establish a greater understanding of human body language through pose detection and pose tracking.

Learn how to dance La Macarena with SubZero, at 9gag.com

→ The importance and influence of this technology cannot be overstated, as we now have the capability to extract more information from a single image than ever before. At present, human pose estimation is used across a range of consumer and scientific domains, such as robotics, surveillance, gaming, and sports. It presents a new technique and perspective for viewing and studying body language, and for utilising it as a tool (hopefully for good) to create more natural computer interfaces that embrace a more visual/kinaesthetic form of communication.

→ The timeline below presents findings from the Computer Vision and Pattern Recognition (CVPR) conference. CVPR is an annual conference held to discuss and showcase the latest developments across a wide range of topics, such as object detection, object segmentation, 3D reconstruction and human pose estimation, and is hailed as the most important conference in the field of computer vision.

I have attempted to include projects and papers that reflect key stages of development in human pose estimation over the past 10 years. The timeline is incomplete and its content at times quite technical, but most papers are accompanied by a video presentation that is visually fun to watch, providing a glimpse of what the future might look like!

Timeline

(2013)

Unconstrained Monocular 3D Pose Estimation by Action Detection and Regression Forest

A Stereo Camera Based Full Body Human Motion Capture System Using a Partitioned Particle Filter

(2014)

3D Pose from Motion for Cross-view Action Recognition

→ A Layered Model of Human Body and Garment Deformation

(2015)

Pose-Conditioned Joint Angle Limits for 3D Human Pose Reconstruction

The Stitched Puppet: A Graphical Model of 3D Human Shape and Pose

Simultaneous Pose and Non-Rigid Shape with Particle Dynamics

(2016)

DeepHand: Robust Hand Pose Estimation by Completing a Matrix Imputed with Deep Features

SMPLify: 3D Human Pose and Shape from a Single Image

→ DeepCut + CMU Motion Capture Dataset (http://mocap.cs.cmu.edu/)

End-to-End Learning of Deformable Mixture of Parts and Deep CNN for Human Pose Estimation

OpenPose

(2017)

Realtime Multi-Person 2D Human Pose Estimation using Part Affinity Fields, CVPR 2017 Oral

Estimating body shape under clothing

A simple yet effective baseline for 3d human pose estimation

Monocular 3D Human Pose Estimation In The Wild Using Improved CNN Supervision

(2018)

DensePose: Dense Human Pose Estimation In The Wild

OpenPose v2 (2018)

Using a Single RGB Frame for Real Time 3D Hand Pose Estimation in the Wild

(2019)

MonoPerfCap: Human Performance Capture from Monocular Video

CVPR 2019 Oral Session 3-2B: Face & Body

BodyFusion: Real-time Capture of Human Motion and Surface Geometry Using a Single Depth Camera

Fast and Robust Multi-Person 3D Pose Estimation from Multiple Views

→ 5 2D cameras + SMPLify

LiveCap (Habermann et al.)

DeepHuman (Zheng et al.)

DeepCap

SMPL-X

(2020)

Object-Occluded Human Shape and Pose Estimation from a Single Color Image

Contact and Human Dynamics from Monocular Video

ExPose: Monocular Expressive Body Regression through Body-Driven Attention

VIBE: Video Inference for Human Body Pose and Shape Estimation

(2021)

AGORA human pose and shape dataset

FrankMocap: A Strong and Easy-to-use Single View 3D Hand+Body Pose Estimator

HybrIK - A Hybrid Analytical-Neural IK Solution for 3D Human Pose and Shape Estimation

SimPoE: Simulated Character Control for 3D Human Pose Estimation

TUCH: On Self-Contact and Human Pose ( interesting to show issues w/ contact )

POSA: Populating 3D Scenes by Learning Human-Scene Interaction ( contact + proximity )

Human POSEitioning System ( uses IMUs )

img2pose: Face Alignment and Detection via 6DoF, Face Pose Estimation

PIFuHD

PaMIR

ARCH++

(2022)

GLAMR: Global Occlusion-Aware Human Mesh Recovery with Dynamic Cameras

OSSO: Obtaining Skeletal Shape from Outside

BEV: Monocular Regression of Multiple 3D People in Depth

ICON: Implicit Clothed humans Obtained from Normals

Neural Head Avatars from Monocular RGB Videos

03_Re/defining Creativity

So now, how can we use such tools to translate bodily movements into something meaningful?

Through these developments, human pose recognition has now reached a point where it is widely accessible and commercially viable. State-of-the-art models such as OpenPose and PoseNet have enabled and inspired more developers and makers to experiment with pose detection and apply it in their own unique projects.

This field is still in its infancy and has yet to be fully explored by creative practitioners, designers, and artists. However, we should be excited about how its many possibilities could re-shape the way we interact with digital technologies. As I stated in the introduction, body language is an intrinsic component of how we communicate and share knowledge with each other as humans, and for the most part what we output to the world is done subconsciously. As human pose recognition technologies become more and more ubiquitous, they may shine a spotlight on this forgotten language in the wide digital landscape, and allow us to critically observe and subsequently reconfigure our approach to this (in)visible language that we all know.

“The human mind ‘knows’ body language from a kind of primordial memory. We seem to be capable of reading different meanings in different expressions and postures by the second, translating it into emotions based on our personal and cultural experiences when interacting with others. Teaching this complex and often subconscious ‘body knowledge’ to an AI is a different story.”

Coralie Vogelaar

NOTABLE CREATORS


Fingerp(ai)nt With Words (2022), by Computational Mama

Computational Mama, a.k.a. Ambika, recently gave a beginner-friendly workshop that introduced participants to using HandPose in p5.js. Computational Mama has been learning and experimenting with creative coding since 2017. As a creative technologist, her work explores coding as a form of self-care and learning. She is a regular live streamer on Twitch, where she teaches the basics of creative computation and new approaches to computational thinking.


Inhabiting the Virtual City (2022)

Created in collaboration with Gramazio Kohler Research, GAMMA is implemented in Nvidia Omniverse and lets 200 virtual humans inhabit a 600-meter-high digital vertical city. The entire system runs on the fly for about 8 hours per day: virtual humans of various identities are continuously placed at random locations and move spontaneously through the scene. Curated by the Norman Foster Foundation and others, the work is currently exhibited at the Guggenheim Museum, Bilbao, and the research also made the front-page news of ETH Zurich. I was truly blown away by this work. It's amazing to see what is possible when state-of-the-art technology is infused with a creative purpose. The work doesn't deal directly with pose estimation, but I am certain it can be found somewhere down the pipeline of this general model. It is a great example of how developments in Human Pose Estimation can aid other tasks that deal with bodily movement!


Move Mirror (2018), by Jane Friedhoff and Irene Alvarado, Creative Technologists, Google Creative Lab

Move Mirror is a fun experiment that lets you explore images in your browser just by moving around. It creates a unique, flipbook-like experience that follows your moves and reflects them with images of all kinds of human movement, from sports and dance to martial arts, acting, and beyond. Sadly, it is not featured as a web application on the experiments.withgoogle website. However, the code is still open-source and available on their website! There is also a nice TensorFlow blog post about the project's development, providing a deeper dive into PoseNet.
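The matching idea behind Move Mirror can be sketched in a few lines of plain Python: normalize each pose's keypoints for position and scale, then compare two poses with cosine similarity. This is a simplified illustration with invented three-point "poses"; the real project additionally weights keypoints by their confidence scores and uses a search structure to match against thousands of images quickly.

```python
import math

def normalize(keypoints):
    """Center the pose and scale it to unit length, so two poses can be
    compared independent of where and how large they appear in the frame."""
    cx = sum(x for x, y in keypoints) / len(keypoints)
    cy = sum(y for x, y in keypoints) / len(keypoints)
    centered = [(x - cx, y - cy) for x, y in keypoints]
    norm = math.sqrt(sum(x * x + y * y for x, y in centered))
    return [(x / norm, y / norm) for x, y in centered]

def cosine_similarity(pose_a, pose_b):
    """1.0 means the normalized poses are identical in shape."""
    a = [c for pt in normalize(pose_a) for c in pt]
    b = [c for pt in normalize(pose_b) for c in pt]
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# Toy "poses" of three keypoints each: the second is the first one,
# shifted across the frame and doubled in size, so after normalization
# the shapes should match almost perfectly.
pose_a = [(100, 100), (150, 200), (200, 100)]
pose_b = [(310, 320), (410, 520), (510, 320)]
print(cosine_similarity(pose_a, pose_b))  # ~1.0
```

Once poses are reduced to normalized vectors like this, "find the photo whose pose best matches mine" becomes an ordinary nearest-neighbour search.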


ML5 Pose Yoga (2022), by Nadia Piet

"For my thesis, I’m exploring the potential for human pose estimation through machine learning to support embodied learning systems in the context of yoga studies. In simpler terms: I’m training a computer to give real-time contextual information & audiovisual feedback as you practice. For this, I’m looking to better understand the current (learning) experiences of yoga practitioners & teachers to determine the specific features I’ll go on to build for my prototype 👩🏼‍💻"


Future Dance of Nostalgia (2022), by Kexin Hao

Kexin Hao is a visual artist and designer born in Beijing and based in The Netherlands. Her practice is a marriage of graphic design and performance art. In her recent works, Kexin investigates the themes of body, rituals, health, archive, and collective memory.

Future Dance of Nostalgia is a dancing game that invites the audience to perform choreography that extracts and abstracts movements found in pre-industrial heavy physical labour and work songs. Motion tracking technology allows body movements to be quantified, measured, and evaluated. Historical archives of work songs provide the inspiration for the music, rendering old tales and melodies into clubbing beats that lead the dance.


Body, Movement, Language: A.I. Sketches with Bill T. Jones (2020), by Maya Man at Google Creative Lab

Maya Man is an artist whose work considers the computer screen a space for intimacy and performance, focusing on the phenomenon of translating our offline selves into online content. She has exhibited internationally at spaces including SOOT Tokyo, Vellum Los Angeles, Power Station of Art Shanghai, Times Square, and Feral File. Her work has been featured in Art in America, Forbes, Refinery29, Dance Magazine, and more. Maya holds Bachelor of Arts degrees in Computer Science and Media Studies from Pomona College, and is currently pursuing an MFA in Media Art at UCLA in Los Angeles, California. Maya joined our AI Playground to give a talk titled Navig(AI)ting Self and Body on The Internet, in which she discusses this project and the overall exploration of digitizing movement in her practice. Watch the talk here


Editation on Violence (2022), by Derrick Schultz

The work matches poses from a video source to a large dataset of labelled images from various films ( I think ). I wish I could find out more about the processes/techniques Derrick used, but there is little documentation about this particular project online besides this video. Utilizing machine learning and other computational techniques, Derrick Schultz's work explores multisensory perception, generative abstraction, and computational filmmaking. In addition to creating his own work, Derrick teaches machine learning to artists, designers, and image makers; his Artificial Images courses combine small-group personal instruction with a digital community from across the world.


Choreographic Camouflage by Liam Young

“The film 'Choreographic Camouflage' by speculative architect and director Liam Young explores the ways in which technology can hack surveillance and tracking systems that monitor our daily lives. In Choreographic Camouflage, Young teams up with choreographer Jacob Jonas for a dance performance and film that presents a new vocabulary of movement that has been designed to disguise the proportions of their body from the skeleton detection algorithms used by modern city’s surveillance networks to track and identify individuals.” - STRP


04_DIY: Proceed with Caution

When talking about anything AI x ML related, we must be very careful in how we plan and develop such systems and tools. We should also consider their wider influence and impact on various ecologies that might extend outside the more typical use cases for such technologies. In the context of Human Pose Estimation, our approach should be no different. We should be critical of the techniques researchers and big tech companies adopt when working with artificial intelligence; right down to how they curate their datasets, what hardware and power they require, and how these technologies are applied "in the wild".

Besides the overarching problems that plague the AI pipeline, such as labour and biases, I'm unsure what potential issues we may face from human pose estimation technologies, so my observations may be speculative at times. However, I also hope to touch upon the limitations of current human pose estimation technologies, alongside the problems researchers face moving forward.

→ Datasets & Cognitive Sweatshops

A dataset is one of the most important elements behind most of the models and tools we see and use today. Curating a large dataset containing a diverse set of information, such as images and text, or in our case a wide set of poses from different perspectives and body shapes, allows a more accurate representation and estimation to be achieved. Paired with highly detailed and complex image labelling, this really assists an algorithm in its quest to achieve its desired output.

“Again, the myth of AI as affordable and efficient depends on layers of exploitation, including the extraction of mass unpaid labor to fine-tune the AI systems of the richest companies on earth” -

Kate Crawford

For example, OpenPose was trained on the Common Objects in Context (COCO) dataset created by Microsoft, which contains over 250,000 people with their keypoints labelled. There are even 3D motion capture datasets, such as the CMU Motion Capture Dataset, which have helped provide even more accurate labelling of motion.
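To give a sense of what that labelling work produces: in the COCO keypoints format, each annotated person is stored as a flat list of 17 (x, y, visibility) triplets, every one placed or verified by a human worker. Here is a minimal sketch of reading such an annotation in Python (the coordinate values are invented, not taken from the real dataset):

```python
# A COCO person-keypoints annotation stores the 17 keypoints as (x, y, v)
# triplets in one flat list, where v is a visibility flag:
# 0 = not labelled, 1 = labelled but occluded, 2 = labelled and visible.
annotation = {
    "category_id": 1,  # the "person" category
    "num_keypoints": 2,
    # 15 unlabelled keypoints followed by two labelled ones (invented values)
    "keypoints": [0, 0, 0] * 15 + [142, 309, 2, 177, 320, 1],
}

def visible_points(ann):
    """Return the (x, y) pairs of keypoints that were actually labelled."""
    kp = ann["keypoints"]
    triplets = [(kp[i], kp[i + 1], kp[i + 2]) for i in range(0, len(kp), 3)]
    return [(x, y) for x, y, v in triplets if v > 0]

print(visible_points(annotation))  # [(142, 309), (177, 320)]
```

Multiply those 51 hand-placed numbers by 250,000 annotated people and the scale of the crowdsourced labour behind a single pose model becomes tangible.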

Even though these large-scale datasets exist, they require a vast amount of labour to produce, and at times this is not achieved ethically. In the case of COCO, one of the most popular large-scale labelled image datasets, the large corpus of images was annotated through "crowdsourcing" tasks, using workers from Amazon's Mechanical Turk (AMT) platform. Workers are primarily tasked with tagging images for computer vision systems, to test whether an algorithm is producing accurate results. In the majority of cases, these workers are underpaid, their contribution lost in the long AI pipeline; paying them fairly would only increase costs and reduce efficiency for the large companies and corporations driven by capitalistic greed. This is one of the less recognised facts about AI and its infrastructure, and I can only hope it changes in the future.

If you are curious about the images used in the COCO dataset, Roboflow has developed a web application that allows users to easily browse through its corpus. I recently made a post on Instagram showcasing some of the unusual images that I stumbled upon!

Body language in the wild

As I mentioned earlier, human pose estimation is a major branch of computer vision research. One of the biggest industries that can take advantage of new developments in computer vision is surveillance. Privacy is practically an illusion at this point, with every corner of our urban landscapes infested with cameras that collect, mine and process information about individuals and their interactions with their environment, for opaque agendas.

Human pose estimation is unique compared to many other machine learning algorithms applied in surveillance systems: what information can be extracted from our body language and movements, and how could that be used to target and harm an individual?

However, some researchers are attempting to cast human pose estimation in a new light when integrating it into such systems, stating that it could be used as a method to preserve privacy: prioritizing the reconstructed skeletal structure over any facial features, preventing ethnic bias and reducing complexity.
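That privacy argument can be pictured with a small sketch: a skeleton-first pipeline would keep only the line segments connecting detected landmarks and discard the pixels themselves. The snippet below is a toy illustration in Python, using a few of the limb connections from the standard COCO skeleton; the detection result is invented.

```python
# A few of the limb connections from the standard COCO skeleton.
SKELETON = [
    ("left_shoulder", "left_elbow"),
    ("left_elbow", "left_wrist"),
    ("left_shoulder", "left_hip"),
]

def skeleton_segments(landmarks):
    """Reduce a detected pose to drawable line segments. Everything else
    in the frame (faces, clothing, background) is discarded, keeping only
    connections where both endpoints were detected."""
    return [
        (landmarks[a], landmarks[b])
        for a, b in SKELETON
        if a in landmarks and b in landmarks
    ]

landmarks = {  # invented detection result for one person (left_hip missed)
    "left_shoulder": (400, 210),
    "left_elbow": (420, 300),
    "left_wrist": (495, 310),
}
print(skeleton_segments(landmarks))
```

Whether storing only this skeletal trace genuinely protects identity (gait and proportions can still be distinctive, as Choreographic Camouflage below demonstrates) is exactly the open question the researchers are debating.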

Anonymized results of a central station scene. Re-identification shows stable track results in the foreground; background tracks appear more unstable due to smaller boxes and increased occlusion, and temporal consistency partially stabilized them. From "Where Are We with Human Pose Estimation in Real-World Surveillance?" by Mickael Cormier, Aris Clepe, Andreas Specker, Jürgen Beyerer, Fraunhofer IOSB, Karlsruhe, Germany. Image Source

Future Obstacles

Some questions to ask and things to consider in relation to the use and development of HPR:

→ Datasets

→ What datasets are used in the creation of prolific models such as OpenPose?

→ It is possible to use tools for exploring the contents of an image dataset ( COCO ), and it’s worth doing so

→ What are the logistics of the curation of dataset used to train HPR models?

→ Issues with contemporary HPR models

→ Body diversity, Age Diversity - Issues with recognizing different age groups

→ Occlusion - Challenging problem in CV/HPR

→ Most models don’t explicitly model depth

→ State-of-the-art models fail to consider contact, both with others and with the self

→ Privacy

→ How HPR is used as a mechanism to extract data in surveillance systems

→ What data can be extracted from our body language? How can that data be used unethically? How is it being used now?

→ Its invisible nature and how we might struggle to realise its implications

NOTABLE CREATORS, ARTISTS & RESEARCHERS


Infinite Posture Dataset (2020), by Coralie Vogelaar

Coralie Vogelaar is an interdisciplinary artist who combines scientific disciplines, such as behavioral studies, with artistic imagination. Vogelaar investigates the relationship between human and machine by applying machine logic to the human body. She moves by endlessly morphing to the rhythm of the device, strapped in the frame of the screen, following or giving instructions; part human, part machine. Her movements, caught within a tight motion-capture-like suit that deconstructs her body parts, speak of complex and conflicting emotions, but her face, from which we usually read how someone is feeling, is hidden. But is the machine observing her deconstructed and re-sequenced postures actually capable of recognizing what the body is communicating? Are we?


Dundas Square Surveillance Etude (2021), by David Rokeby

What computer vision techniques are not used in this work?! Using background/foreground separation, motion/stillness separation, and multi-scalar edges, Rokeby combines these individual algorithms to show what information can be extracted from any environment, a combination he calls Temporal Depth of Field. David Rokeby is an installation artist based in Toronto, Canada, who has been creating and exhibiting since 1982. For the first part of his career he focussed on interactive pieces that directly engage the human body, or that involve artificial perception systems. He describes this piece as a study for a longer work he is planning, which parses the activities in a large public space, captured in 4K video, in various ways using computer vision techniques.


Living Archive (2019), by Ben Cullen Williams

Ben Cullen Williams is a London-based artist whose practice consists of sculptures, installations, photography and video. In his work, Williams explores humankind's relationship to the world in a rapidly changing environment, focusing on the intersection between space, technology and landscape. Living Archive is an experiment between Studio Wayne McGregor and Google Arts and Culture: a tool for choreography, powered by machine learning, that generates original movement inspired by Wayne's 25-year archive. Alongside the Living Archive project, Williams created a suspended video installation for a live performance titled Living Archive: An AI Performance Experiment. Williams collaborated with Google Arts and Culture to create abstract visualizations of AI-generated choreography, exploring ideas of dance as code and vice versa. Without privileging his own human lens, Williams attempted not to make a distinction between what would traditionally be considered 'ugly' and 'beautiful'. Through this, the work raises questions about the new aesthetics that AI presents to us, which don't necessarily conform to our values, guiding us to challenge the way we look at the world.


The Follower (2022), by Dries Depoorter

In this work, Depoorter uses open cameras and AI to find out how an Instagram photo was taken. It is a great example of the capabilities and dangers of computer vision algorithms when applied to surveillance. It's also very entertaining to watch the not-so-glamorous and imperfect processes that influencers go through when capturing a snapshot of their seemingly flawless reality. Dries Depoorter is a Belgian artist who deals with themes such as privacy, artificial intelligence, surveillance and social media. Depoorter creates interactive installations, apps, and games.


05_Closing remarks

While Human Pose Recognition might be less explored by creatives compared to image-generating and text-generating AIs, it is clear that an in-depth artistic and creative exploration of this technology is happening at the moment. Creators looking into HPR are probing its potential and producing experiments that benefit both art and science. They are also building on a rich tradition, long present in media art and net art, of exploring technology to understand what it means to live in a physical body. Using reflection and artistic sensitivity, makers working with HPR address biases and harms done to humans who exist within the global technological stack, and call for better ecologies of human-machine coexistence.

Looking at how this technology is advancing and becoming more accessible to independent makers, it is clear that there is a lot more to come. We look forward to seeing more experimentation and more shared resources, and we hope this guide can contribute to the pool of knowledge.

💜

AI PLAYGROUND S01 / BODY

This guide is a part of our community program AI Playground / Body. AI Playground is an event series and a collection of Guides, structured under four topics: Image, Text, Body and Sound. As part of the program we hosted two events:

Artist Talk: Navigating the Self + Body on the Internet | Artist talk w/ Maya Man

Workshop: Learn to Fingerp(AI)nt with Words | Workshop w/ Computational Mama
