AIHUB.ML - Competition

VLSP 2023 ASR/SER Challenge

Organized by vlsp_asr_organizer

First phase

Public Test

Sept. 20, 2023, midnight UTC

End

Competition Ends

Nov. 17, 2023, 5 a.m. UTC

Our website

https://vlsp.org.vn/vlsp2023/eval/asr

Important dates

Aug 14, 2023: Registration opens
Sept 04, 2023: Registration closes
Sept 06, 2023: Dataset building starts
Sept 17, 2023: Dataset building ends
Sept 20, 2023: Training data and public test data released
Nov 17, 2023: Private test set released
Nov 17, 2023: Test result submission
Nov 26, 2023: Technical report submission
Dec 15-16, 2023: Result announcement - Workshop days

General Description

The challenge focuses on developing a full automatic speech recognition (ASR) and speech emotion recognition (SER) pipeline from scratch under limited training data conditions.
The two emotion categories are “neutral” and “negative”. The organizers will provide four training datasets (released by TLU and NamiTech).

Data overview:

Dataset	Amount (hours)	Text Label	Emotion Label
Dataset 1	200	No	No
Dataset 2	60	Yes	No
Dataset 3	5	No	Yes
Dataset 4	40	No	Yes (Low quality)

Evaluation metrics

For each given utterance, two outputs will be submitted.

Text sequence (ASR output)
Emotion label (Emotion output)

The quality of the models will be evaluated by the Syllable Error Rate (SyER_ASR) and Emotion Recognition Accuracy (ACC_SER) metrics.

SyER_ASR = (S+D+I)/N, where

S is the number of substitutions,
D is the number of deletions,
I is the number of insertions,
C is the number of correct syllables,
N is the number of syllables in the reference (N=S+D+C)
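The definitions above can be illustrated with a minimal sketch. It assumes that syllables are the whitespace-separated tokens of each transcript and that counts are pooled over all utterances before dividing; the page does not state how the official scorer tokenizes or aggregates, and the function names here are hypothetical.

```python
def count_syllable_errors(reference, hypothesis):
    """Return (S, D, I, N) for one utterance via a minimum edit-distance alignment."""
    ref = reference.split()  # assumption: one whitespace-separated token = one syllable
    hyp = hypothesis.split()
    # Each cell holds (total_edits, S, D, I) for ref[:i] against hyp[:j].
    prev = [(j, 0, 0, j) for j in range(len(hyp) + 1)]
    for i in range(1, len(ref) + 1):
        cur = [(i, 0, i, 0)]
        for j in range(1, len(hyp) + 1):
            edits, s, d, ins = prev[j - 1]
            if ref[i - 1] == hyp[j - 1]:
                best = (edits, s, d, ins)          # correct syllable
            else:
                best = (edits + 1, s + 1, d, ins)  # substitution
            edits, s, d, ins = prev[j]
            if edits + 1 < best[0]:
                best = (edits + 1, s, d + 1, ins)  # deletion
            edits, s, d, ins = cur[j - 1]
            if edits + 1 < best[0]:
                best = (edits + 1, s, d, ins + 1)  # insertion
            cur.append(best)
        prev = cur
    _, s, d, ins = prev[-1]
    return s, d, ins, len(ref)


def syllable_error_rate(pairs):
    """Compute (S+D+I)/N over an iterable of (reference, hypothesis) pairs."""
    total_errors = 0
    total_syllables = 0
    for reference, hypothesis in pairs:
        s, d, ins, n = count_syllable_errors(reference, hypothesis)
        total_errors += s + d + ins
        total_syllables += n
    if total_syllables == 0:
        raise ValueError("reference contains no syllables")
    return total_errors / total_syllables
```

When several alignments have the same minimum cost, the split between S, D, and I may differ from another scorer's, but their sum, and therefore the error rate, is the same.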

ACC_SER = (NEU_Corr/NEU + NEG_Corr/NEG)/2, where

NEU_Corr is the number of neutral utterances classified correctly,
NEU is the total number of neutral utterances,
NEG_Corr is the number of negative utterances classified correctly,
NEG is the total number of negative utterances.
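As a rough illustration, ACC_SER is the average of the per-class accuracies, so each emotion class contributes equally regardless of how many utterances it has. The sketch below assumes labels are the strings "neutral" and "negative"; the function name is hypothetical and this is not the organizers' scoring code.

```python
def emotion_accuracy(true_labels, predicted_labels):
    """Compute (NEU_Corr/NEU + NEG_Corr/NEG) / 2 from parallel label lists."""
    neu = neu_corr = neg = neg_corr = 0
    for truth, predicted in zip(true_labels, predicted_labels):
        if truth == "neutral":
            neu += 1
            neu_corr += int(predicted == "neutral")
        elif truth == "negative":
            neg += 1
            neg_corr += int(predicted == "negative")
    if neu == 0 or neg == 0:
        raise ValueError("both classes must appear in the reference labels")
    return (neu_corr / neu + neg_corr / neg) / 2
```

For example, with 3 of 4 neutral utterances and 1 of 2 negative utterances correct, the result is (0.75 + 0.5) / 2 = 0.625, whereas plain accuracy over all 6 utterances would be 4/6.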

Score = 0.7*(1-SyER_ASR) + 0.3*ACC_SER
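A short worked example of the final score, using illustrative values only (a SyER_ASR of 0.2 and an ACC_SER of 0.625 are made up for the arithmetic; the function name is hypothetical):

```python
def final_score(syer_asr, acc_ser):
    """Combine the two metrics as Score = 0.7*(1 - SyER_ASR) + 0.3*ACC_SER."""
    return 0.7 * (1.0 - syer_asr) + 0.3 * acc_ser


# 0.7 * 0.8 + 0.3 * 0.625 = 0.56 + 0.1875, i.e. about 0.7475
example = final_score(0.2, 0.625)
```

Because SyER_ASR counts insertions, it can exceed 1, in which case the ASR term becomes negative under this formula.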

General rules

Right to cancel, modify, or disqualify. The Competition Organizer reserves the right at its sole discretion to terminate, modify, or suspend the competition.
By submitting results to this competition, you consent to the public release of your scores at the Competition workshop and in the associated proceedings, at the task organizers' discretion. Scores may include, but are not limited to, automatic and manual quantitative judgments, qualitative judgments, and such other metrics as the task organizers see fit. You accept that the ultimate decision of metric choice and score value is that of the task organizers.
By joining the competition, you accept the terms and conditions of the VLSP 2023 - ASR/SER Agreement form, which has been sent to your email. Note that your participation rights will be revoked if you do not sign and return the form to us before the deadline.
By joining the competition, you affirm and acknowledge that you will comply with applicable laws and regulations, that you will not infringe upon any copyright, intellectual property, or patent of another party in the software you develop in the course of the competition, and that you will not breach any applicable laws and regulations related to export control and data privacy and protection.
Prizes are subject to the Competition Organizer’s review and verification of the entrant’s eligibility and compliance with these rules as well as the compliance of the winning submissions with the submission requirements.

Eligibility

Each participant must create a CodaLab account to submit their solution for the competition. Only one account per user is allowed.
The competition is public, but the Competition Organizer may elect to disallow participation according to its own considerations.
The Competition Organizer reserves the right to disqualify any entrant from the competition if, in the Competition Organizer’s sole discretion, it reasonably believes that the entrant has attempted to undermine the legitimate operation of the competition through cheating, deception, or other unfair playing practices.

Team

Participants are allowed to form teams. A team may have at most 5 members.
You may not participate in more than one team. Each team member must be a single individual operating a separate CodaLab account.

Submission

Maximum number of submissions in each phase:
- Phase 1 - Public Test: 1 submission / day / team
- Phase 2 - Private Test: 2 submissions in total / team
Submissions are void if they are in whole or part illegible, incomplete, damaged, altered, counterfeit, obtained through fraudulent means, or late. The Competition Organizer reserves the right, in its sole discretion, to disqualify any entrant who makes a submission that does not adhere to all requirements.

Data

By downloading or by accessing the data provided by the Competition Organizer in any manner you agree to the following terms:

You will not distribute the data except for non-commercial, academic research purposes.
You will not distribute, copy, reproduce, disclose, assign, sublicense, embed, host, transfer, sell, trade, or resell any portion of the data provided by the Competition Organizer to any third party for any purpose.
The data must not be used for providing surveillance, analyses or research that isolates a group of individuals or any single individual for any unlawful or discriminatory purpose.
You accept full responsibility for your use of the data and shall defend and indemnify the Competition Organizer against any and all claims arising from your use of the data.

Submission Guidelines

Submission Format

The result to upload to AI Hub is a zip file containing a single file, "results.tsv", that conforms to the specifications below.

Submissions must be UTF-8 encoded and lower-case, with one line per utterance (audio file) in the following format:

utterance_name<TAB>emotion_label<TAB>recognized_text_sequence

For example

0001.wav neutral chào mừng các bạn đã tham dự cuộc thi
0002.wav negative tôi sẽ kiện ra toà
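One possible way to produce such a file is sketched below. It assumes predictions are available as (utterance_name, emotion_label, text) tuples; the function names and the default archive name "submission.zip" are hypothetical, and whether the platform expects a trailing newline is not stated on this page.

```python
import zipfile


def build_results_tsv(predictions):
    """Render (utterance_name, emotion_label, text) tuples as tab-separated lines."""
    lines = []
    for name, label, text in predictions:
        # The text is lower-cased here because the format requires lower-case output.
        lines.append("\t".join([name, label, text.lower()]))
    return "\n".join(lines) + "\n"


def write_submission(predictions, destination="submission.zip"):
    """Write a zip archive that contains only results.tsv, encoded as UTF-8.

    `destination` may be a file path or a binary file-like object.
    """
    content = build_results_tsv(predictions).encode("utf-8")
    with zipfile.ZipFile(destination, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("results.tsv", content)
```

Calling write_submission([("0001.wav", "neutral", "chào mừng các bạn đã tham dự cuộc thi")]) would create the archive in the current directory.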

Output Conventions

Because some input speech can be transcribed in more than one way, the following rules apply to reduce ambiguity:

Numbers, dates, etc. must be transcribed in words as they are spoken, not in digits.
Common acronyms, such as nato and fifa, are written as one word, without any special markers between the letters. This applies no matter whether they are spoken as one word or spelled out as a letter sequence. All other spelled-out letter sequences are written as individual letters with spaces in between.
English words and foreign names of people and places, such as youtube and facebook, are written in their original spelling, not as a Vietnamese phonetic transcription.
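Some of these conventions can be checked mechanically before uploading. The sketch below is a hypothetical pre-submission check, not an official validator: it only tests the field count, the emotion label, lower-casing, and the absence of digits, and it cannot verify the acronym or foreign-word rules.

```python
ALLOWED_LABELS = {"neutral", "negative"}


def check_result_line(line):
    """Return a list of problems found in one line of results.tsv (empty if none)."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 3:
        return [f"expected 3 tab-separated fields, found {len(fields)}"]
    _name, label, text = fields
    problems = []
    if label not in ALLOWED_LABELS:
        problems.append(f"unknown emotion label: {label!r}")
    if text != text.lower():
        problems.append("text is not lower-case")
    if any(ch.isdigit() for ch in text):
        problems.append("text contains digits; numbers should be written in words")
    return problems
```

A line that passes returns an empty list, so a whole file can be screened by collecting the non-empty results together with their line numbers.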

Public Test

Start: Sept. 20, 2023, midnight

Private Test

Start: Nov. 17, 2023, 2 a.m.

Competition Ends

Nov. 17, 2023, 5 a.m.
