GENEA Challenge 2022

Generation and Evaluation of Non-verbal Behaviour for Embodied Agents

Official ICMI 2022 Grand Challenge – Start date May 16 (but register now!)

The GENEA Challenge 2022 on speech-driven gesture generation aims to bring together researchers who use different methods for non-verbal-behaviour generation and evaluation, and hopes to stimulate discussion on how to improve both the generation methods and the evaluation of the results.

The results of the challenge will be presented at the 3rd GENEA workshop at ACM ICMI 2022, with accepted challenge papers published in the main ICMI proceedings.

This is the second installment of the GENEA Challenge. You can read more about the previous GENEA Challenge here.

This challenge is supported by the Wallenberg AI, Autonomous Systems and Software Program (WASP), funded by the Knut and Alice Wallenberg Foundation, with an in-kind contribution from the Electronic Arts (EA) R&D department, SEED.



Important dates

April 4, 2022: Registration opens
May 16, 2022: Training dataset released to challenge participants
June 20, 2022: Test inputs released to participants
June 27, 2022: Deadline for participants to submit generated motion
July 15, 2022: Release of crowdsourced evaluation results to participants
July 22, 2022: Deadline for participants to submit system-description papers
August 10, 2022: Paper notification
August 17, 2022: Deadline for camera-ready papers
November 7 or 11, 2022: Challenge presentations at ICMI 2022 (Hybrid)

Call for participation

The state of the art in co-speech gesture generation is difficult to assess since every research group tends to use its own data, embodiment, and evaluation methodology. To better understand and compare methods for gesture generation and evaluation, we are continuing the GENEA (Generation and Evaluation of Non-verbal Behaviour for Embodied Agents) Challenge, wherein different gesture-generation approaches are evaluated side by side in a large user study. This 2022 challenge is a Multimodal Grand Challenge for ICMI 2022 and is a follow-up to the first edition of the GENEA Challenge, arranged in 2020.

We invite researchers in academia and industry working on any form of corpus-based generation of gesticulation and non-verbal behaviour to submit entries to the challenge, whether their method is driven by rules or by machine learning. Participants are provided with a large, common dataset of speech (audio plus aligned text transcriptions) and 3D motion to develop their systems, and then use these systems to generate motion for given test inputs. The generated motion clips are rendered onto a common virtual agent and evaluated for aspects such as motion quality and appropriateness in a large-scale crowdsourced user study.

The results of the challenge will be presented in hybrid format at the 3rd GENEA Workshop at ICMI 2022, together with individual papers describing each participating system. All accepted challenge papers will be published in the main ACM ICMI 2022 proceedings.



Rules for the GENEA Challenge 2022

Only register for this challenge if you actually intend to submit an entry to the challenge and to comply with all its rules.

Goal of the challenge

The GENEA Challenge seeks to advance scientific knowledge relating to automatic gesture generation, by means of open science and a large-scale, joint subjective evaluation. It is therefore a challenge, not a competition: the point is not to find who does best, but what works best. The rules of the challenge have been designed with the intent of best furthering this goal.

Database access

The gesture database is currently only available to registered participants in the challenge. Access to the data and participation in the challenge may also require completing and agreeing to a data licence agreement.
Download passwords will be issued after your registration is accepted and you have completed any required licences. Do not share the data or passwords with non-challenge participants.

Materials provided

All participants who have signed the licence will be given access to the following materials:
  • 3D full-body motion-capture clips of a speaking and gesticulating person, in BVH format.
  • Aligned audio waveforms of the speech associated with the motion-capture clips, in WAV format.
  • Text transcripts of each audio file with word-level timing information, in CSV and JSON format.
  • A label coding the identity of the person gesticulating in each recording.
  • Code and scripts for replicating the training of the previously-published baseline systems to be included in the challenge evaluation.
  • A pipeline for visualising their system output as videos of a gesticulating avatar, the same as will be used to render video stimuli for the challenge evaluation.
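To illustrate how word-level timing information can be lined up with motion frames, here is a minimal Python sketch. It rests on assumptions that are not confirmed by the challenge materials: the transcript is taken to be a header-less CSV with one word per row in the order (start seconds, end seconds, word), and the frame rate of 30 frames per second is a placeholder. The names `TimedWord`, `read_transcript`, and `word_to_frames` are hypothetical helpers, not part of any provided code.

```python
import csv
import io
import math
from typing import List, NamedTuple


class TimedWord(NamedTuple):
    start: float  # word onset in seconds
    end: float    # word offset in seconds
    text: str


def read_transcript(handle) -> List[TimedWord]:
    """Read rows of (start, end, word). This column layout is an assumption."""
    words = []
    for row in csv.reader(handle):
        if len(row) < 3:
            continue  # skip blank or incomplete rows
        words.append(TimedWord(float(row[0]), float(row[1]), row[2].strip()))
    return words


def word_to_frames(word: TimedWord, fps: float) -> range:
    """Motion frame indices overlapped by a word at the given frame rate."""
    first = int(word.start * fps)
    last = max(first + 1, math.ceil(word.end * fps))  # cover at least one frame
    return range(first, last)


# Toy transcript standing in for a real file; the values are made up.
demo = io.StringIO("0.5,1.0,hello\n1.0,1.5,there\n")
for word in read_transcript(demo):
    print(word.text, word_to_frames(word, fps=30.0))
```

With a real transcript, `demo` would be replaced by an opened file, and the frame rate should be read from the corresponding BVH file rather than hard-coded.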

The above motion, audio, and transcriptions have been partitioned into an official training set and an official validation set. Please respect this split, and do not train on validation data when developing your system. (You may only train on the validation data when creating your final submission.) The official validation set was created using the same process as the held-out test set for the challenge, and has similar duration and other characteristics. It is therefore your best guide to what the final, held-out test set will look like.

If the full data release to participants is delayed, dummy data files illustrating the folder structure, filenames, and data formats will be made available. This allows participants to set up their data-processing pipelines in advance of the full data release.

Approximately one week before the deadline to submit generated motion stimuli, participants will also be given access to:

  • Held-out audio waveforms from the same source as the training audio, in WAV format.
  • Text transcripts of each held-out audio file with word-level timing information, in CSV and JSON format.
  • A label coding the identity of the person gesticulating in each recording.

The task of the challenge is to use one’s system to generate convincing gesture motion for this held-out speech, and submit that motion for evaluation. For this reason, we will not provide motion data with the held-out speech. Note that not all of the synthetic motion output submitted to the challenge may be included in the final evaluation.

If, for some reason, you have or gain access to the held-out motion data, we rely on your honesty in not looking at that material or letting it influence your challenge submission.


Limits on participation

Each participating team may submit only one system for evaluation. Teams can consist of one or more persons, from zero or more academic institutions and/or commercial entities.

Participants involved in joint projects or consortia who wish to submit multiple systems (e.g., an individual entry and a joint system) should contact the organisers in advance and receive approval first. We will try to accommodate all reasonable requests, provided the evaluation remains manageable. If the number of participating teams is small (e.g., fewer than five), the organisers may decide to permit multiple entries per team.


Use of external data

  • “External data” is defined as data, of any type, that is not part of the provided database. This includes, for example, raw recordings, structured databases, and pre-trained systems such as word vectors.
  • For this year's challenge, only open external data – data that is available to the public free of charge (possibly after signing a licence) – may be used.
  • All external data used in your system must be explicitly acknowledged by providing a citation and/or link in the paper accompanying your submission.
  • You are allowed to use external data in any way you wish, subject to any exclusions or limitations given in these rules.

For data pertaining to text and audio, any external data may be used, as long as they satisfy the criteria above. There is no limitation on the amount of such data you may use.

For motion data (whether 2D, 3D, or video), no external motion data is permitted. The reason for this data restriction is that other behaviour-generation challenges have found that system performance often is limited by the amount of training data that can be ingested, which is not an interesting scientific conclusion to replicate.

Your system must make use of the provided motion data, but you may exclude parts of that data if you wish. Use of the provided audio or text transcripts is entirely optional, as is the use of external text and audio data.

Please keep in mind that the point of the challenge is to gain better insight into the synthesis and perception of motion and gestures, not to see who has the best data and resources. Consequently, participants are strongly encouraged to share processed material they are using in their entries with other participants and with the organisers. Example data that may be valuable to share include: improved transcripts and alignments; motions from permitted external databases converted to the challenge format and retargeted to the challenge skeleton; denoised and reconstructed motion data; sub-selected data; bug fixes to baseline systems; etc.

If you are in any doubt about how to apply these rules, please contact the organisers for clarification!


Synthesising test motion

Synthetic gesture motion must be submitted in the same format as that used by the challenge gesture database (BVH, same skeleton, frame rate, etc.). The organisers take no responsibility for any effects that may occur when processing motion that was not submitted in the correct format.
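As a rough self-check before submission, the Python sketch below compares the joint list and frame time of a generated BVH file against a reference file from the database. It is not an official validator. It relies only on the standard BVH text layout (`ROOT` and `JOINT` lines in the hierarchy, followed by `Frames:` and `Frame Time:` lines in the motion section). The helper names and the two-joint toy skeleton are hypothetical, and a fuller check would also compare offsets and channel order.

```python
import re
from typing import List, NamedTuple


class BvhSummary(NamedTuple):
    joints: List[str]   # joint names in hierarchy order
    num_frames: int
    frame_time: float   # seconds per frame


def summarise_bvh(text: str) -> BvhSummary:
    """Extract joint names, frame count, and frame time from BVH text."""
    joints = re.findall(r"^\s*(?:ROOT|JOINT)\s+(\S+)", text, flags=re.MULTILINE)
    frames = re.search(r"^\s*Frames:\s*(\d+)", text, flags=re.MULTILINE)
    frame_time = re.search(r"^\s*Frame Time:\s*([0-9.eE+-]+)", text, flags=re.MULTILINE)
    if frames is None or frame_time is None:
        raise ValueError("missing 'Frames:' or 'Frame Time:' line")
    return BvhSummary(joints, int(frames.group(1)), float(frame_time.group(1)))


def format_problems(reference: BvhSummary, candidate: BvhSummary,
                    tol: float = 1e-6) -> List[str]:
    """List the mismatches between a candidate file and a reference file."""
    problems = []
    if candidate.joints != reference.joints:
        problems.append("joint names or order differ from the reference skeleton")
    if abs(candidate.frame_time - reference.frame_time) > tol:
        problems.append("frame time differs from the reference")
    return problems


# Toy file with a made-up two-joint skeleton, for illustration only.
EXAMPLE = """HIERARCHY
ROOT Hips
{
  OFFSET 0.0 0.0 0.0
  CHANNELS 6 Xposition Yposition Zposition Zrotation Xrotation Yrotation
  JOINT Spine
  {
    OFFSET 0.0 10.0 0.0
    CHANNELS 3 Zrotation Xrotation Yrotation
    End Site
    {
      OFFSET 0.0 10.0 0.0
    }
  }
}
MOTION
Frames: 2
Frame Time: 0.033333
0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0
"""

reference = summarise_bvh(EXAMPLE)
print(format_problems(reference, reference))
```

In practice, the reference summary would come from one of the provided training clips and the candidate summary from a generated clip, both read from disk.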

Manually tweaking the output motion is not allowed, since the idea is to evaluate how systems would perform in an unattended setting.


Retention and distribution of submitted stimuli

Any stimuli that you submit for evaluation will be retained by the organisers for future use. The evaluated stimuli and any associated user ratings and comments will also be made publicly available for non-commercial purposes, labelled by the corresponding anonymised system label.


Evaluation

The GENEA Challenge centres on subjective human perception, not objective metrics. A large-scale formal evaluation by means of several user studies will be conducted to jointly evaluate and compare the submitted co-speech gestures. These user studies will be carried out online using crowdsourced raters who speak and comprehend the language featured in the database.

The evaluation of the submitted gesture motion will likely consider aspects such as:
  • its perceived human-likeness, without accounting for the speech;
  • its appropriateness for the associated held-out speech, in terms of, e.g., timing, semantic content, or both; and
  • its appropriateness for the individual gesticulation style of the indicated test speaker in each segment.

The results of the evaluation, including a statistical analysis, will be made public, albeit with the identity of participating systems anonymised. Participating teams will be informed of the results and which system is theirs, so that they can draw conclusions and describe what they learned in papers describing their submissions.


Paper

  • Each participant must submit a paper (using the template specified) describing their challenge entry for double-blind review. This submission and its reviews will be permanently available on OpenReview.
  • Each participant is also expected to complete a form giving the general technical specification of their system, to facilitate easy cross-system comparisons.
  • At least one author of each paper must register and present the work at the conference workshop associated with the challenge if the paper is accepted, or if the organisers otherwise ask them to do so.
  • Paper submission, copyright, publishing, and presentation are subject to ACM rules and the rules of the parent conference at which the GENEA Challenge is hosted. Do not enter the challenge if you are unable to comply with these rules.

Use of results

This gesture-generation challenge is a scientific exercise. You may use the results only for the purpose of scientific research. Specifically, you may not use the results (e.g., your team’s ranking in the evaluations) for any commercial purposes, including but not limited to advertising products or services.


Please contact the organisers at genea-contact@googlegroups.com if you have any questions about these rules.

Registration

Challenge registration is open!

Please make sure to read and agree to the challenge rules before registering. Do not register for the challenge if you do not intend to comply with these rules.

Once you have read the rules, please use this sign-up form to register your team.




Organising committee

The main contact address of the workshop is: genea-contact@googlegroups.com.

Workshop organisers

Pieter Wolfert
IDLab, Ghent University - imec
Belgium

Taras Kucherenko
Electronic Arts (EA)
Sweden

Youngwoo Yoon
ETRI
South Korea

Carla Viegas
Carnegie Mellon University
United States of America

Gustav Eje Henter
KTH Royal Institute of Technology
Sweden