GENEA workshop 2020

Generation and Evaluation of Non-verbal Behaviour for Embodied Agents

The GENEA (Generation and Evaluation of Non-verbal Behaviour for Embodied Agents) Workshop 2020 aims to bring together researchers working on the generation and evaluation of nonverbal behaviour for social robots, virtual agents, or the like. We invite all interested researchers to submit a paper related to their work in the area and to participate in the workshop.

Attached to the workshop is a challenge to assess and advance the state of the art in co-speech gesture generation, which is detailed here.

The workshop will take place online on Sunday, October 18, 2020, starting at 11:00 GMT, as an official workshop of ACM IVA’20. Please see the workshop programme here.

We thank Vicon for offering to sponsor our data collection, which unfortunately did not happen due to the pandemic lockdown.

Please follow this link to learn more about the GENEA Initiative.

Important dates

Timeline for regular workshop submissions:

~~14th Sept~~	~~Deadline for participants to submit workshop papers~~
16th Sept	Extended Deadline for participants to submit workshop papers
2nd Oct	Notification of acceptance for workshop papers
11th Oct	Deadline for camera-ready papers
18th Oct	Workshop date

Timeline for challenge participants:

23rd June	Deadline for registration for the challenge
1st July	Challenge dataset released to participants
7th Aug	Test inputs released to participants
15th Aug	Deadline for participants to submit generated motion
21st Aug	Crowdsourced evaluation begins
5th Sept	Crowdsourced evaluation closes
9th Sept	Evaluation results released to participants
19th Sept	Deadline for participants to submit system-presentation papers
1st Oct	Notification of acceptance for challenge papers
11th Oct	Deadline for camera-ready papers
18th Oct	Workshop date

Call for papers

Overview

Generating nonverbal behaviour, such as gesticulation and facial expressions, is of great importance for natural interaction with embodied agents such as virtual agents and social robots. At present, behaviour generation is typically powered by rule-based systems, data-driven approaches, and their hybrids. For evaluation, both objective and subjective methods exist, but their application and validity are frequently a point of contention in peer review.

This workshop asks “What will be the behaviour-generation methods of the future? And how can we evaluate these as meaningfully and usefully as possible?” The aim is to bring together researchers working on the generation and evaluation of nonverbal behaviour for embodied agents to discuss the future of these fields. To kickstart these discussions, we invite all interested researchers to submit a paper related to their work in the area for presentation at the workshop.

Attached to this workshop is a challenge to assess and advance the state of the art in co-speech gesture generation, for which a separate call for participation will be released.
The GENEA Workshop is not an archival venue, and papers submitted to the workshop can thus be published at other venues.

Paper topics include (but are not limited to) the following

Co-speech gesture generation
Nonverbal feedback
Interactive nonverbal behaviour generation
Evaluation methods for generated nonverbal behaviour
Objective evaluation metrics for nonverbal behaviour
Guidelines for nonverbal behaviours in human-agent interaction

Author instructions

Please format your workshop submissions for double-blind review according to the conference template: genea_template.zip
The same rules and guidelines regarding double-blind reviewing as for the main IVA conference apply. Submissions should represent original, unpublished work or extensions thereof.

Papers are limited to 5 pages, not counting references and full-page figures. The maximum file size is 20 MB. Anonymous video sharing through FigShare is strongly encouraged. Submissions should be made in PDF format through OpenReview:
https://openreview.net/group?id=ACM.org/IVA/2020/Workshop/GENEA

Please contact the organisers at genea-contact@googlegroups.com if you have any questions.

Gesture generation challenge

Call for participation

The state of the art in co-speech gesture generation is difficult to assess, since every research group tends to use their own data, embodiment, and evaluation methodology. To better understand and compare methods for gesture generation and evaluation, we are launching a new challenge – the GENEA (Generation and Evaluation of Non-verbal Behaviour for Embodied Agents) Challenge 2020 – wherein different gesture-generation approaches are evaluated side by side in a large user study. The results of the challenge will be presented at the GENEA workshop.

We invite researchers in academia and industry working on any form of corpus-based nonverbal behaviour generation and gesticulation to submit entries to the challenge, whether their method is driven by rules or by machine learning. Participants are provided with a large, common dataset of speech (audio+aligned text) and 3D motion to develop their systems, and then use these systems to generate motion on given test inputs. The generated motion clips are rendered onto a common virtual agent and evaluated for motion quality and appropriateness in a crowdsourced user study. Additional details are provided below and in the rules of participation.

Registration

The registration deadline has passed.
Please make sure to read and agree to the challenge rules before registering. Do not register for the challenge if you do not intend to comply with the rules.

Data

The challenge data is only available to registered participants. Access to the data requires completing a license agreement, which will be distributed to participants via e-mail after registering for participation.

Access to the test inputs will also be provided via e-mail. These materials additionally include a questionnaire to be completed and included with your submission.

Paper submission and timeline

Challenge participants are invited to describe their systems and findings in a paper to be presented at the workshop. Accepted papers will be published in proceedings on the workshop website and on Zenodo. However, they are considered non-archival and can thus be published at other venues.

Papers describing challenge systems use the same template as regular workshop papers, and are subject to the same length and size restrictions, except that:

The deadlines are different, to allow time for evaluation before completing the paper. Please refer to the important dates for the timeline that applies to challenge participants.
Submissions use camera-ready formatting and author names are not blinded for review.

Please contact the organisers at genea-contact@googlegroups.com if you have any questions.

Rules for the GENEA Challenge 2020

Only register for this challenge if you actually intend to submit an entry to the challenge and to comply with all its rules.

Database access

The gesture database is currently only available to registered participants in the challenge. Access to the data also requires completing and agreeing to the data license agreement.
Download passwords will be issued after your registration is accepted and you have completed the required licenses.

Materials provided

All participants who have signed the license will be given access to the following materials:

3D full-body motion-capture clips of a speaking and gesticulating person, in BVH format.
Aligned audio waveforms of the speech associated with the motion-capture clips, in WAV format.
Text transcripts of each audio file with word-level timing information, in JSON format.
Code and scripts for replicating the training of the previously-published baseline systems to be included in the challenge evaluation (once available).
A pipeline for visualising their system output as videos of a gesticulating avatar, the same as will be used to render videos for the challenge evaluation.

Prior to the full data release to participants, dummy data files illustrating the folder structure, filenames, and data formats will be available. This allows participants to set up their data-processing pipelines in advance.
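As an illustration of how the dummy files might be used to test a data-processing pipeline, the following Python sketch checks that every recording has all three modalities listed above (BVH motion, WAV audio, JSON transcript). It assumes that files belonging to the same recording share a filename stem, which is a guess rather than something these rules state; the function names are hypothetical, and the actual folder structure should be taken from the released files.

```python
from collections import defaultdict
from pathlib import Path

# File types listed under "Materials provided". That one recording uses the
# same filename stem for all three is an ASSUMPTION made for this sketch.
EXPECTED = {".bvh", ".wav", ".json"}


def group_by_stem(root):
    """Map each recording name (file stem) to the extensions found for it."""
    groups = defaultdict(set)
    for path in Path(root).rglob("*"):
        suffix = path.suffix.lower()
        if path.is_file() and suffix in EXPECTED:
            groups[path.stem].add(suffix)
    return dict(groups)


def incomplete_recordings(root):
    """Return {stem: missing extensions} for recordings lacking a modality."""
    return {
        stem: EXPECTED - found
        for stem, found in group_by_stem(root).items()
        if found != EXPECTED
    }
```

Calling incomplete_recordings on the unpacked data folder returns an empty dictionary when every recording is complete under the stated assumption.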

Approximately one week before the deadline to submit generated motion stimuli, participants will also be given access to:

Held-out audio waveforms from the same source as the training audio, in WAV format.
Text transcripts of each held-out audio file with word-level timing information, in TXT format.
A number coding the identity of the person gesticulating in each recording (if the data contains motion from multiple persons).

The task of the challenge is to use one’s system to generate convincing gesture motion for this held-out speech. Not all of the synthetic motion output may be included in the final evaluation.
While we endeavour for participants to be able to retain and keep using the challenge data for future research, the extent to which this is possible (if at all) is governed by the license agreement that participants sign.
If, for some reason, you have or gain access to the held-out motion data, we rely on your honesty in not looking at that material or letting it influence your challenge submission.

Limits on participation

Each participating team may submit only one system for evaluation. Teams can consist of one or more persons, from zero or more academic institutions and/or commercial entities.

Participants involved in joint projects or consortia who wish to submit multiple systems (e.g., an individual entry and a joint system) should contact the organisers in advance and receive approval first. We will try to accommodate all reasonable requests, provided the evaluation remains manageable. If the number of participating teams is small (e.g., fewer than five), the organisers may decide to permit multiple entries per team.

Use of external data

“External data” is defined as data, of any type, that is not part of the provided database. This includes, for example, raw recordings, structured databases, and pre-trained systems such as word vectors. For this year's challenge, only open external data – data that is available to the public free of charge (possibly after signing a license) – may be used. All external data used in your system must be explicitly listed by providing a citation and/or link in the paper accompanying your submission. You are allowed to use external data in any way you wish, subject to any exclusions or limitations given in these rules. For data pertaining to text and speech, any external data may be used, as long as they satisfy the criteria above. There is no limitation on the amount of such data you may use.

For motion data (whether 2D, 3D, or video), only external data from very specific databases may be used in creating your challenge entries. These resources are linked and listed below:

CMU Motion Capture Database: http://mocap.cs.cmu.edu/
Motion Capture Database HDM05: http://resources.mpi-inf.mpg.de/HDM05/
CMU Panoptic Studio dataset: http://domedb.perception.cs.cmu.edu/
Talking With Hands 16.2M: https://github.com/facebookresearch/TalkingWithHands32M Important: Motion data only – no speech/text/audio!

The reason for this data restriction is that other behaviour-generation challenges have found that system performance often is limited by the amount of training data that can be ingested, which is not an interesting scientific conclusion to replicate.

Your system must make use of the provided motion data, but you may exclude parts of that data if you wish. Use of the provided audio or text transcripts is entirely optional. The same applies to the use of external data.

Please keep in mind that the point of the challenge is to gain better insight into the synthesis and perception of motion and gestures, not to see who has the best data and resources. Consequently, participants are strongly encouraged to share processed material they are using in their entries with other participants and with the organisers. Example data that may be valuable to share include: improved transcripts and alignments; motions from permitted external databases converted to the challenge format and retargeted to the challenge skeleton; denoised and reconstructed motion data; sub-selected data; bug fixes to baseline systems; etc.

If you are in any doubt about how to apply these rules, please contact the organisers for clarification!

Synthesising test motion

Synthetic gesture motion must be submitted at 20 fps in a format otherwise identical to that used by the challenge gesture database (BVH, same skeleton, etc.). The organisers take no responsibility for any effects that may occur when processing motion that was not submitted in the correct format.
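Before submitting, it may help to confirm that generated files really are at 20 fps. The Python sketch below reads the standard "Frames:" and "Frame Time:" lines from the MOTION section of a BVH file and checks that the frame time is 0.05 s. It is only an illustrative check with hypothetical function names, not a tool provided by the organisers, and it does not verify the skeleton or channel layout.

```python
def bvh_frame_info(lines):
    """Return (frame_count, frame_time_seconds) from a BVH MOTION header.

    `lines` is any iterable of text lines, e.g. an open file object.
    """
    frames = None
    frame_time = None
    for line in lines:
        text = line.strip()
        if text.startswith("Frames:"):
            frames = int(text.split(":", 1)[1])
        elif text.startswith("Frame Time:"):
            frame_time = float(text.split(":", 1)[1])
        if frames is not None and frame_time is not None:
            return frames, frame_time
    raise ValueError("no complete MOTION header found")


def is_20_fps(frame_time, tol=1e-6):
    """True if the frame time corresponds to 20 frames per second (0.05 s)."""
    return abs(frame_time - 0.05) < tol
```

A typical use would be to open each output file, pass the file object to bvh_frame_info, and reject any file for which is_20_fps returns False. The tolerance value is an arbitrary choice for this sketch.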

To prevent optimising for the specific evaluation used in the challenge, the exact nature of the test set will not be revealed in advance. Manually tweaking the output motion is not allowed, since the idea is to evaluate how systems would perform in an unattended setting.

Retention and distribution of submitted stimuli

Any stimuli that you submit for evaluation will be retained by the organisers for future use. The evaluated stimuli and any associated user ratings and comments will also be made publicly available for non-commercial purposes, labelled by the corresponding anonymised system label.

Evaluation

A large formal evaluation by means of a user study will be conducted to jointly evaluate and compare the submitted co-speech gestures. This user study will be carried out online using crowdsourced raters who speak and comprehend the language featured in the database.

The evaluation will likely consider aspects such as the human-likeness of the generated gesture motion, its appropriateness (in terms of timing, semantic content, or both) for the associated held-out speech, and its appropriateness for the individual gesticulation style of the test speaker or speakers.

Aside from stimuli based on motion submitted by challenge participants, the evaluation will also incorporate motion generated from a handful of baseline approaches based on public code and shared with challenge participants; stimuli based on natural speech and motion; and checks on raters’ attention.

The results of the evaluation, including a statistical analysis, will be made public, albeit with the identity of participating systems anonymised. Participating teams will be informed of the results and which system is theirs, so that they can draw conclusions and describe what they learned in papers describing their submissions.

Paper

Each participant must submit a paper (using the template specified) describing their entry for review.
- If you are unable to comply with this requirement, do not enter the challenge!
Papers should describe the system, as well as:
- external data used, if any (e.g., speech and text corpora, word embeddings, etc.);
- any other existing tools, software and models used;
- any manual interventions such as additional data annotation;
- participants’ scientific and engineering takeaway messages from their participation.
- In addition, describing and analysing the results of other evaluations performed, including formal and informal tests (e.g., ablations) as part of the system development, is also strongly encouraged.
Although submitted systems will be anonymised in the challenge results published by the organisers, participants are encouraged to report which anonymised label is associated with their system in their paper and any other publications based on their challenge submission.
Each participant is also expected to complete a form giving the general technical specification of their system, to facilitate easy cross-system comparisons. (For example: Is the motion based on playback such as motion graphs, continuous motion generation, or a hybrid approach? Is the output deterministic or stochastic? Does the system use text input, audio input, or both? Does it make use of external motion data? What computational resources were required to create/train the system? etc.)
One of the authors of each accepted paper must register and present the paper at the workshop associated with the challenge.
- If you are unable to comply with this requirement, do not enter the challenge!
There is no penalty for dropping out of the challenge prior to the start of the evaluation, other than that the data license might restrict your future use of the challenge data. However, teams whose stimuli are included in the evaluation are required to submit a paper on their system and present it at the workshop.

Use of results

This gesture generation challenge is a scientific exercise. You may use the results only for the purpose of scientific research. Specifically, you may not use the results (e.g., your team’s ranking in the evaluations) for any commercial purposes, including but not limited to advertising products or services.

How are these rules enforced?

This is a challenge, which is designed to advance scientific knowledge, and not a competition. The point is not to find who does best, but what works best. Therefore, we depend on your honesty in preparing your entry.

Workshop programme:

11:00 GMT	Opening statement
11:10 GMT	Live keynote presentation by Stefan Kopp (40 min) + Q&A (10 min)
12:00 GMT	"Coffee" break - on Gather!
12:20 GMT	Introduction to the GENEA Challenge (15 min) + Q&A (5 min)
12:40 GMT	Challenge system presentations (10 min each) + Q&A (3 min): Nectec, FineMotion, StyleGestures, Edinburgh CGVU, AlltheSmooth
13:45 GMT	Panel discussion
14:05 GMT	Break! (Gather is on for those who want)
14:20 GMT	Workshop paper presentations: James Pustejovsky (10 min) + Q&A (5 min)
14:35 GMT	Open discussion on the future of the field
14:55 GMT	Closing statement
15:00 GMT	End of the workshop; Gather remains open

Workshop recording policy

The GENEA Workshop 2020 will be virtual and take place on Zoom and Gather. It will consist of a number of pre-recorded video presentations, along with informal discussions and networking in various constellations. Although the video presentations will be posted and preserved online on our YouTube channel, no other parts of the workshop will be recorded. If you want to take part in the discussions or just listen in, please join us online on the date!

Invited speakers

Stefan Kopp

Leaving adolescence? — Lessons, challenges, and perspectives in multimodal behavior generation after 25 years of research

The field of multimodal behavior generation for embodied agents is now more than 25 years old. Starting with animating the conversational behavior of virtual agents, the field has matured by exploring a variety of modalities, technologies, and systems. We have seen ambitious attempts, successes and failures, as well as technological revolutions and forking paths. Yet, the field still does not have a consolidated, common understanding of what we can generate (or not yet), how we can do it best technically, or how humans respond to our agents and what we should thus optimize for. I will discuss my own and others' work on building multimodal conversational agents (virtual or robot) to point out lessons, challenges, and perspectives for the future development of our field. I will argue that, much like at the end of adolescence, the field needs to learn how to consolidate its experiences, reconcile different approaches and research motivations, identify its criteria of success, and interact with the outside world.

Organising committee

The main contact address of the workshop is: genea-contact@googlegroups.com.

Workshop organisers

Taras Kucherenko, KTH Royal Institute of Technology, Sweden
Gustav Eje Henter, KTH Royal Institute of Technology, Sweden
Pieter Wolfert, IDLab, Ghent University - imec, Belgium
Youngwoo Yoon, ETRI & KAIST, South Korea
Patrik Jonell, KTH Royal Institute of Technology, Sweden

Data and proceedings

Workshop materials

The GENEA Workshop is not an archival venue. That said, most of the material from the workshop has been made permanently available online in a number of locations:

Conference proceedings are available in the GENEA 2020 Zenodo community. This repository also holds most other challenge materials, such as submitted motion, video stimuli, their subjective ratings, and analysis code.
Challenge training and test data can be found in the Trinity Speech-Gesture Dataset repository owing to licensing agreements.
Code for visualising gesture motion used to generate the video stimuli in the challenge is provided on GitHub.
Presentations from the workshop are available on the GENEA Workshop YouTube channel.
Code for computing the numerical evaluation metrics used in the challenge is also on GitHub.
A paper published at the 26th Annual Conference on Intelligent User Interfaces (IUI) describing the challenge: A Large, Crowdsourced Evaluation of Gesture Generation Systems on Common Data: The GENEA Challenge 2020