Call for participation
The state of the art in co-speech gesture generation is difficult to assess, since every research group
tends to use their own data, embodiment, and evaluation methodology. To better understand and compare
methods for gesture generation and evaluation, we are launching a new challenge – the GENEA (Generation and
Evaluation of Non-verbal Behaviour for Embodied Agents) Challenge 2020 – wherein different
gesture-generation approaches are evaluated side by side in a large user study. The results of the challenge
will be presented at the GENEA workshop.
We invite researchers in academia and industry working on any form of corpus-based nonverbal behaviour
generation and gesticulation to submit entries to the challenge, whether their method is driven by rules or
machine learning. Participants are provided a large, common dataset of speech (audio+aligned text) and 3D
motion to develop their systems, and then use these systems to generate motion on given test inputs. The
generated motion clips are rendered onto a common virtual agent and evaluated for motion quality and
appropriateness in a crowdsourced user study. Additional details are provided below and in the rules of
participation.
Registration
The registration deadline has passed.
Please make sure to read and agree to the challenge rules before registering. Do not register for the
challenge if you do not intend to comply with the rules.
Data
The challenge data is only available to registered participants. Access to the data requires completing a
license agreement, which will be distributed to participants via e-mail after registering for
participation.
Access to the test inputs will also be provided via e-mail. These materials additionally include a
questionnaire to be completed and included with your submission.
Paper submission and timeline
Challenge participants are invited to describe their systems and findings in a paper to be presented at the
workshop. Accepted papers will be published in proceedings on the workshop website and on Zenodo; however,
they are considered non-archival and can thus be published at other venues.
Papers describing challenge systems use the same template as regular workshop papers, and are subject to the
same length and size restrictions, except that:
- The deadlines are different, to allow time for evaluation before completing the paper. Please refer to
the important dates for the timeline that applies to challenge participants.
- Submissions use camera-ready formatting and author names are not blinded for review.
Please contact the organisers at genea-contact@googlegroups.com if you have any questions.
Rules for the GENEA Challenge 2020
Only register for this challenge if you actually intend to submit an entry to the challenge and to comply
with all its rules.
Database access
The gesture database is currently only available to registered participants in the challenge. Access to the
data also requires completing and agreeing to the data license agreement.
Download passwords will be issued after your registration is accepted and you have completed the required
licenses.
Materials provided
All participants who have signed the license will be given access to the following materials:
- 3D full-body motion-capture clips of a speaking and gesticulating person, in BVH format.
- Aligned audio waveforms of the speech associated with the motion-capture clips, in WAV format.
- Text transcripts of each audio file with word-level timing information, in JSON format.
- Code and scripts for replicating the training of the previously-published baseline systems to be
included in the challenge evaluation (once available).
- A pipeline for visualising their system output as videos of a gesticulating avatar, the same as will
be used to render videos for the challenge evaluation.
Prior to the full data release to participants, dummy data files illustrating the folder structure,
filenames, and data formats will be available. This allows participants to set up their
data-processing pipelines in advance.
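As one possible use of the dummy files, the sketch below checks that every recording comes with all three modalities. It rests on assumptions that are not part of the rules: that the motion, audio, and transcript files of one recording share a base filename, and that they use the extensions `.bvh`, `.wav`, and `.json`. The actual layout is defined by the released data, and the function name `find_incomplete_recordings` is purely illustrative.

```python
from pathlib import PurePath

# Assumed extensions for the three modalities (motion, audio, transcript).
REQUIRED_EXTENSIONS = frozenset({".bvh", ".wav", ".json"})


def find_incomplete_recordings(filenames):
    """Map each recording name to the sorted list of extensions it lacks.

    Recordings are identified by base filename, which is an assumption of
    this sketch and not a rule of the challenge. Files with other
    extensions are ignored.
    """
    found = {}
    for name in filenames:
        path = PurePath(name)
        ext = path.suffix.lower()
        if ext in REQUIRED_EXTENSIONS:
            found.setdefault(path.stem, set()).add(ext)
    return {
        stem: sorted(REQUIRED_EXTENSIONS - exts)
        for stem, exts in found.items()
        if exts != REQUIRED_EXTENSIONS
    }
```

The list of filenames could come from walking the data folder; an empty result means that no recording is missing a modality under the assumptions above.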
Approximately one week before the deadline to submit generated motion stimuli, participants will also be
given access to:
- Held-out audio waveforms from the same source as the training audio, in WAV format.
- Text transcripts of each held-out audio file with word-level timing information, in TXT format.
- A number coding the identity of the person gesticulating in each recording (if the data contains
motion from multiple persons).
The task of the challenge is to use one’s system to generate convincing gesture motion for this held-out
speech. Not all of the synthetic motion output may be included in the final evaluation.
While we aim for participants to be able to retain and keep using the challenge data for future
research, the extent to which this is possible (if at all) is governed by the license agreement that
participants sign.
If, for some reason, you have or gain access to the held-out motion data, we rely on your honesty in not
looking at that material or letting it influence your challenge submission.
Limits on participation
Each participating team may submit only one system for evaluation. Teams can consist of one or more
persons, from zero or more academic institutions and/or commercial entities.
Participants involved in joint projects or consortia who wish to submit multiple systems (e.g., an
individual entry and a joint system) should contact the organisers in advance and receive approval first. We
will try to accommodate all reasonable requests, provided the evaluation remains manageable. If the number
of participating teams is small (e.g., fewer than five), the organisers may decide to permit multiple entries
per team.
Use of external data
“External data” is defined as data, of any type, that is not part of the provided database. This includes,
for example, raw recordings, structured databases, and pre-trained systems such as word vectors.
For this year's challenge, only open external data – data that is available to the public free of charge
(possibly after signing a license) – may be used.
All external data used in your system must be explicitly listed by providing a citation and/or link in the
paper accompanying your submission.
You are allowed to use external data in any way you wish, subject to any exclusions or limitations given in
these rules.
For data pertaining to text and speech, any external data may be used, as long as they satisfy the criteria
above. There is no limitation on the amount of such data you may use.
For motion data (whether 2D, 3D, or video), only external data from very specific databases may be used in
creating your challenge entries. These resources are linked and listed below:
The reason for this data restriction is that other behaviour-generation challenges have found that system
performance is often limited by the amount of training data that can be ingested, which is not an
interesting scientific conclusion to replicate.
Your system must make use of the provided motion data, but you may exclude parts of that data if you wish.
Use of the provided audio or text transcripts is entirely optional. The same applies to
the use of external data.
Please keep in mind that the point of the challenge is to gain better insight into the synthesis and
perception of motion and gestures, not to see who has the best data and resources. Consequently,
participants are strongly encouraged to share processed material they are using in their entries with other
participants and with the organisers. Example data that may be valuable to share include: improved
transcripts and alignments; motions from permitted external databases converted to the challenge format and
retargeted to the challenge skeleton; denoised and reconstructed motion data; sub-selected data; bug fixes
to baseline systems; etc.
If you are in any doubt about how to apply these rules, please contact the organisers for
clarification!
Synthesising test motion
Synthetic gesture motion must be submitted at 20 fps in a format otherwise identical to that used by the
challenge gesture database (BVH, same skeleton, etc.).
The organisers take no responsibility for any effects that may occur when processing motion that was not
submitted in the correct format.
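As a pre-submission sanity check, the frame rate of a generated file can be read from the `Frame Time:` line that the BVH format places in its MOTION section. The sketch below is an illustration only and not an official validation tool: the function names are our own, and it checks the declared frame rate only, not the skeleton or channel layout.

```python
import math
import re

TARGET_FPS = 20.0  # frame rate required for submitted motion


def bvh_frame_rate(bvh_text):
    """Return the frame rate (fps) declared in the MOTION section of a BVH file."""
    match = re.search(
        r"^\s*Frame Time:\s*([0-9.eE+-]+)\s*$", bvh_text, re.MULTILINE
    )
    if match is None:
        raise ValueError("no 'Frame Time:' line found; is this a BVH file?")
    frame_time = float(match.group(1))  # seconds per frame
    if frame_time <= 0:
        raise ValueError("frame time must be positive")
    return 1.0 / frame_time


def has_target_frame_rate(bvh_text, target_fps=TARGET_FPS, rel_tol=1e-3):
    """Check the declared frame rate against the target, allowing for rounding."""
    return math.isclose(bvh_frame_rate(bvh_text), target_fps, rel_tol=rel_tol)
```

A file would be checked by reading it as text and passing the contents to `has_target_frame_rate`. The relative tolerance is an arbitrary choice that absorbs rounding in the written frame time.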
To prevent optimising for the specific evaluation used in the challenge, the exact nature of the test set
will not be revealed in advance. Manually tweaking the output motion is not allowed, since the idea is to
evaluate how systems would perform in an unattended setting.
Retention and distribution of submitted stimuli
Any stimuli that you submit for evaluation will be retained by the organisers for future use. The evaluated
stimuli and any associated user ratings and comments will also be made publicly available for non-commercial
purposes, labelled by the corresponding anonymised system label.
Evaluation
A large formal evaluation by means of a user study will be conducted to jointly evaluate and compare the
submitted co-speech gestures. This user study will be carried out online using crowdsourced raters who speak
and comprehend the language featured in the database.
The evaluation will likely consider aspects such as the human-likeness of the generated gesture motion, its
appropriateness (in terms of timing, semantic content, or both) for the associated held-out speech, and its
appropriateness for the individual gesticulation style of the test speaker or speakers.
Aside from stimuli based on motion submitted by challenge participants, the evaluation will also incorporate
motion generated from a handful of baseline approaches based on public code and shared with challenge
participants; stimuli based on natural speech and motion; and checks on raters’ attention.
The results of the evaluation, including a statistical analysis, will be made public, albeit with the
identity of participating systems anonymised. Participating teams will be informed of the results and which
system is theirs, so that they can draw conclusions and describe what they learned in papers describing
their submissions.
Paper
- Each participant must submit a paper (using the template specified) describing their entry for
review.
- If you are unable to comply with this requirement, do not enter the challenge!
- Papers should describe the system, as well as:
- external data used, if any (e.g., speech and text corpora, word embeddings, etc.);
- any other existing tools, software and models used;
- any manual interventions such as additional data annotation;
- participants’ scientific and engineering takeaway messages from their participation.
- In addition, describing and analysing the results of other evaluations performed, including formal
and informal tests (e.g., ablations) as part of the system development, is also strongly encouraged.
- Although submitted systems will be anonymised in the challenge results published by the organisers,
participants are encouraged to report which anonymised label is associated with their system in their
paper and any other publications based on their challenge submission.
- Each participant is also expected to complete a form giving the general technical specification of
their system, to facilitate easy cross-system comparisons. (For example: Is the motion based on playback
such as motion graphs, continuous motion generation, or a hybrid approach? Is the output deterministic
or stochastic? Does the system use text input, audio input, or both? Does it make use of external motion
data? What computational resources were required to create/train the system? etc.)
- One of the authors of each accepted paper must register and present the paper at the
workshop associated with the challenge.
- If you are unable to comply with this requirement, do not enter the challenge!
- There is no penalty for dropping out of the challenge prior to the start of the evaluation, other than
that the data license might restrict your future use of the challenge data. However, teams whose stimuli
are included in the evaluation are required to submit a paper on their system and present it at the
workshop.
Use of results
This gesture generation challenge is a scientific exercise. You may use the results only for the purpose of
scientific research. Specifically, you may not use the results (e.g., your team’s ranking in the
evaluations) for any commercial purposes, including but not limited to advertising products or services.
How are these rules enforced?
This is a challenge designed to advance scientific knowledge, not a competition. The point is
not to find who does best, but what works best. Therefore, we depend on your honesty in preparing your
entry.