Automatic Speech Recognition Challenge - Overview
Welcome to the Automatic Speech Recognition (ASR) Challenge! This competition focuses on developing and evaluating speech recognition systems that can accurately transcribe Vietnamese speech into text.
Automatic Speech Recognition is a technology that converts spoken language into written text. It has numerous applications including voice assistants, transcription services, accessibility tools, and more.
In this challenge, participants will develop systems that transcribe Vietnamese speech recordings into accurate text transcriptions.
Systems will be evaluated using the Word Error Rate (WER) metric, which measures the minimum number of word edits (insertions, deletions, and substitutions) required to transform the system's output into the reference transcript, divided by the number of words in the reference. Lower WER values indicate better performance.
For questions or support, please use the competition forum or contact the organizers at [email protected], CC'ing [email protected] and [email protected].
Automatic Speech Recognition Challenge - Evaluation
The primary evaluation metric for this challenge is the Word Error Rate (WER), expressed as a percentage (%). WER measures how accurately your ASR system transcribes speech into text.
Word Error Rate is calculated as:

WER = (S + D + I) / N

Where:
- S is the number of substituted words
- D is the number of deleted words
- I is the number of inserted words
- N is the number of words in the reference transcript
Lower WER values indicate better performance. A perfect system would have a WER of 0%, meaning the transcription exactly matches the reference.
We use the implementation from jiwer to calculate WER: https://github.com/jitsi/jiwer
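For reference, a minimal WER check with jiwer looks like the sketch below (the organizers' exact evaluation script may differ):

```python
import jiwer

reference = "xin chào việt nam"
hypothesis = "xin chào việt nam hôm nay"

# jiwer.wer returns the word error rate as a fraction (0.5 == 50%).
error_rate = jiwer.wer(reference, hypothesis)
print(f"WER: {error_rate:.2%}")  # WER: 50.00%
```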
Submission Format
Your submission should be a plain-text file named transcripts.txt containing one transcription per audio file. Each line should contain only the transcript, in the same order as the provided test list.
For example, if the first line of the test list is "audio1.wav", then the first line in your submitted prediction file should be the prediction for that audio.
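A minimal script to produce transcripts.txt in the correct order might look like the following sketch (the test-list file name "test_list.txt" and the transcribe() helper are hypothetical placeholders, not official names):

```python
def transcribe(audio_path: str) -> str:
    # Hypothetical placeholder: replace with your ASR system's inference call.
    return ""

# Read the provided test list (assumed file name), one audio path per line.
with open("test_list.txt", encoding="utf-8") as f:
    audio_files = [line.strip() for line in f if line.strip()]

# Write one transcript per line, in the same order as the test list.
with open("transcripts.txt", "w", encoding="utf-8") as out:
    for audio_path in audio_files:
        out.write(transcribe(audio_path) + "\n")
```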
Before calculating WER, a set of text normalization steps is applied to both the reference and hypothesis transcriptions.
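As a rough illustration only, a typical jiwer normalization pipeline might look like the sketch below; the specific transforms here (lowercasing, punctuation removal, whitespace collapsing) are assumptions, not the official configuration:

```python
import jiwer

# Assumed, typical normalization steps, not the official list.
normalize = jiwer.Compose([
    jiwer.ToLowerCase(),
    jiwer.RemovePunctuation(),
    jiwer.RemoveMultipleSpaces(),
    jiwer.Strip(),
])

print(normalize("Xin chào,   Việt Nam!"))  # xin chào việt nam
```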
The evaluation process works as follows:
1. Your submitted transcriptions are compared with the ground-truth transcriptions.
2. For each audio file, the WER is calculated between your transcription and the reference.
3. The final score is the average WER across all audio files (see the sketch after this list).
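Put together, the scoring loop might look like this sketch (the reference file name "references.txt" is an assumption, and the official scoring script may differ, for instance in how it normalizes text):

```python
import jiwer

def load_lines(path: str) -> list[str]:
    with open(path, encoding="utf-8") as f:
        return [line.strip() for line in f]

references = load_lines("references.txt")   # assumed file name (held by organizers)
hypotheses = load_lines("transcripts.txt")  # your submitted predictions

# Per-file WER, then averaged across all audio files, as described above.
scores = [jiwer.wer(ref, hyp) for ref, hyp in zip(references, hypotheses)]
final_score = sum(scores) / len(scores)
print(f"Final score (mean WER): {final_score:.2%}")
```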
For example, if the reference transcription is:
xin chào việt nam
And your system's transcription is:
xin chào việt nam hôm nay
The WER calculation would be:

WER = (S + D + I) / N = (0 + 0 + 2) / 4 = 0.5 = 50%

since the hypothesis adds two words ("hôm" and "nay") to a four-word reference, with no substitutions or deletions.
Participants will be ranked based on their WER score, with lower values being better. In case of ties, earlier submissions will be ranked higher.
Automatic Speech Recognition Challenge - Terms and Conditions
The challenge organizers reserve the right to modify these terms and conditions at any time. Participants will be notified of any changes. The decisions of the challenge organizers regarding any aspect of the competition are final.
For questions or clarifications regarding these terms, please contact the challenge organizers at [email protected], CC'ing [email protected] and [email protected].
Start: April 1, 2025, midnight
Start: Sept. 1, 2025, midnight
End: Dec. 31, 2025, midnight