Official ICMI 2022 Grand Challenge – Start date May 16 (but register now!)
The GENEA Challenge 2022 on speech-driven gesture generation aims to bring together researchers who use different methods for non-verbal-behaviour generation and evaluation, and hopes to stimulate discussion on how to improve both the generation methods and the evaluation of the results.
This is the second installment of the GENEA Challenge. You can read more about the previous GENEA Challenge here.
This challenge is supported by the Wallenberg AI, Autonomous Systems and Software Program (WASP), funded by the Knut and Alice Wallenberg Foundation.
The state of the art in co-speech gesture generation is difficult to assess since every research group tends to use its own data, embodiment, and evaluation methodology. To better understand and compare methods for gesture generation and evaluation, we are continuing the GENEA (Generation and Evaluation of Non-verbal Behaviour for Embodied Agents) Challenge, wherein different gesture-generation approaches are evaluated side by side in a large user study. This 2022 challenge is a Multimodal Grand Challenge for ICMI 2022 and is a follow-up to the first edition of the GENEA Challenge, arranged in 2020.
We invite researchers in academia and industry working on any form of corpus-based generation of gesticulation and non-verbal behaviour to submit entries to the challenge, whether their method is driven by rule or machine learning. Participants are provided a large, common dataset of speech (audio+aligned text transcriptions) and 3D motion to develop their systems, and then use these systems to generate motion on given test inputs. The generated motion clips are rendered onto a common virtual agent and evaluated for aspects such as motion quality and appropriateness in a large-scale crowdsourced user study.
The results of the challenge will be presented in hybrid format at the 3rd GENEA Workshop at ICMI 2022, together with individual papers describing each participating system. All accepted challenge papers will be published in the main ACM ICMI 2022 proceedings.
The above motion, audio, and transcriptions have been partitioned into an official training set and an official validation set. Please respect this split, and do not train on validation data when developing your system. (You may only train on the validation data when creating your final submission.) The official validation set was created using the same process as the held-out test set for the challenge, and has similar duration and other characteristics. It is therefore your best guide to what the final, held-out test set will look like.
If the full data release to participants is delayed, dummy data files illustrating the folder structure, filenames, and data formats will be made available. This allows participants to set up their data-processing pipelines in advance of the full data release.
Approximately one week before the deadline to submit generated motion stimuli, participants will also be given access to:
The task of the challenge is to use one’s system to generate convincing gesture motion for this held-out speech, and submit that motion for evaluation. For this reason, we will not provide motion data with the held-out speech. Note that not all of the synthetic motion output submitted to the challenge may be included in the final evaluation.
If, for some reason, you have or gain access to the held-out motion data, we rely on your honesty in not looking at that material or letting it influence your challenge submission.
Each participating team may submit only one system for evaluation. Teams can consist of one or more persons, from zero or more academic institutions and/or commercial entities.
Participants involved in joint projects or consortia who wish to submit multiple systems (e.g., an individual entry and a joint system) should contact the organisers in advance and receive approval first. We will try to accommodate all reasonable requests, provided the evaluation remains manageable. If the number of participating teams is small (e.g., fewer than five), the organisers may decide to permit multiple entries per team.
For data pertaining to text and audio, any external data may be used, as long as they satisfy the criteria above. There is no limitation on the amount of such data you may use.
For motion data (whether 2D, 3D, or video), no external motion data is permitted. The reason for this data restriction is that other behaviour-generation challenges have found that system performance is often limited by the amount of training data that can be ingested, which is not an interesting scientific conclusion to replicate.
Your system must make use of the provided motion data, but you may exclude parts of that data if you wish. Use of the provided audio or text transcripts is entirely optional, as is the use of external text and audio data.
Please keep in mind that the point of the challenge is to gain better insight into the synthesis and perception of motion and gestures, not to see who has the best data and resources. Consequently, participants are strongly encouraged to share processed material they are using in their entries with other participants and with the organisers. Example data that may be valuable to share include: improved transcripts and alignments; motions from permitted external databases converted to the challenge format and retargeted to the challenge skeleton; denoised and reconstructed motion data; sub-selected data; bug fixes to baseline systems; etc.
If you are in any doubt about how to apply these rules, please contact the organisers for clarification!
Synthetic gesture motion must be submitted in the same format as that used by the challenge gesture database (BVH, same skeleton, frame rate, etc.). The organisers take no responsibility for any effects that may occur when processing motion that was not submitted in the correct format.
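As an illustration of the format requirement above, the following is a minimal sketch of a pre-submission check that compares a generated BVH file against a reference file from the challenge database. It is not an official validation tool. The function names, the tolerance, and the tiny two-joint example skeleton are assumptions made for illustration only; the actual challenge skeleton and frame rate are defined by the provided data.

```python
import re

# Hypothetical two-joint BVH file used only to exercise the check below.
# It is NOT the challenge skeleton.
EXAMPLE_BVH = """HIERARCHY
ROOT Hips
{
  OFFSET 0.0 0.0 0.0
  CHANNELS 6 Xposition Yposition Zposition Zrotation Xrotation Yrotation
  JOINT Spine
  {
    OFFSET 0.0 10.0 0.0
    CHANNELS 3 Zrotation Xrotation Yrotation
    End Site
    {
      OFFSET 0.0 5.0 0.0
    }
  }
}
MOTION
Frames: 2
Frame Time: 0.033333
0 0 0 0 0 0 0 0 0
0 1 0 0 0 0 0 0 0
"""


def parse_bvh_header(text):
    """Extract joint names, total channel count, frame count, and frame time."""
    m = re.MULTILINE
    joints = re.findall(r"^\s*(?:ROOT|JOINT)\s+(\S+)", text, flags=m)
    channels = sum(int(n) for n in re.findall(r"^\s*CHANNELS\s+(\d+)", text, flags=m))
    frames = int(re.search(r"^\s*Frames:\s*(\d+)", text, flags=m).group(1))
    frame_time = float(
        re.search(r"^\s*Frame Time:\s*([0-9.eE+-]+)", text, flags=m).group(1)
    )
    return {"joints": joints, "channels": channels,
            "frames": frames, "frame_time": frame_time}


def format_mismatches(reference_text, submission_text, tol=1e-6):
    """Return a list of human-readable differences; an empty list means no issue found."""
    ref = parse_bvh_header(reference_text)
    sub = parse_bvh_header(submission_text)
    problems = []
    if ref["joints"] != sub["joints"]:
        problems.append("joint names or order differ")
    if ref["channels"] != sub["channels"]:
        problems.append("channel count differs")
    if abs(ref["frame_time"] - sub["frame_time"]) > tol:
        problems.append("frame time differs")
    # Motion rows are the non-empty lines that follow the "Frame Time:" line.
    rows = [line for line in
            submission_text.split("Frame Time:", 1)[1].splitlines()[1:]
            if line.strip()]
    if len(rows) != sub["frames"]:
        problems.append("declared frame count does not match motion rows")
    if any(len(row.split()) != sub["channels"] for row in rows):
        problems.append("a motion row has the wrong number of values")
    return problems
```

In practice one would read the reference text from a file in the provided dataset and the submission text from the generated output, then refuse to package the submission if the returned list is non-empty. The check covers only the header and row shape; it does not compare joint offsets or channel ordering.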
Manually tweaking the output motion is not allowed, since the idea is to evaluate how systems would perform in an unattended setting.
Any stimuli that you submit for evaluation will be retained by the organisers for future use. The evaluated stimuli and any associated user ratings and comments will also be made publicly available for non-commercial purposes, labelled by the corresponding anonymised system label.
The GENEA Challenge centres on subjective human perception, not objective metrics. A large-scale formal evaluation by means of several user studies will be conducted to jointly evaluate and compare the submitted co-speech gestures. These user studies will be carried out online using crowdsourced raters who speak and comprehend the language featured in the database. The evaluation of the submitted gesture motion will likely consider aspects such as:
The results of the evaluation, including a statistical analysis, will be made public, albeit with the identity of participating systems anonymised. Participating teams will be informed of the results and which system is theirs, so that they can draw conclusions and describe what they learned in papers describing their submissions.
This gesture-generation challenge is a scientific exercise. You may use the results only for the purpose of scientific research. Specifically, you may not use the results (e.g., your team’s ranking in the evaluations) for any commercial purposes, including but not limited to advertising products or services.
The main contact address of the workshop is: firstname.lastname@example.org.