VLSP 2025 CHALLENGE ON MEDICAL MACHINE TRANSLATION WITH LIMITED PARAMETERS AND RESOURCES USING PRE-TRAINED MODELS
Data Format
Input format: For the parallel data, the training, development and public test sets will be provided as UTF-8 plaintext, 1-to-1 sentence aligned, one "sentence" per line. Note that a "sentence" here is not necessarily a linguistic sentence but may be a phrase. For the monolingual corpora, we provide UTF-8 plaintext, one "sentence" per line, in the form you see when you download them.
Output format: UTF-8, precomposed Unicode plaintext, one sentence per line. Participants may choose appropriate casing methods in the preprocessing steps: word segmentation, truecasing, lowercasing, or leaving the text as is. You may want to use the tools available in the Moses git repository.
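As a rough illustration of the one-sentence-per-line convention and the precomposed (NFC) Unicode requirement, a minimal sketch for reading a pair of aligned files could look like the following (the file names used in the example are hypothetical):

```python
import unicodedata


def read_aligned(src_path: str, tgt_path: str):
    """Read two 1-to-1 sentence-aligned plaintext files and yield
    NFC-normalized (source, target) line pairs."""
    with open(src_path, encoding="utf-8") as src, \
         open(tgt_path, encoding="utf-8") as tgt:
        for s, t in zip(src, tgt):
            # NFC = precomposed Unicode, matching the required output format
            yield (unicodedata.normalize("NFC", s.strip()),
                   unicodedata.normalize("NFC", t.strip()))
```

Normalizing both sides up front avoids mismatches between composed and decomposed accented characters, which are common in Vietnamese text.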
Private test format
Link: Private test set
The private test set contains two files: en.csv and vi.csv. These two files are not translations of each other. The ground truth, containing the gold translations of both files, is hidden.
Baseline example
An example of what the result file should look like is provided in the same folder as the private test set, named results.csv. It was created by prompting Qwen3-0.6B to translate.
Evaluation
Final system ranking will be determined by human evaluation.
Participants are allowed to submit only constrained systems.
A constrained system is defined as a system trained exclusively on the data provided by the organizers.
Only constrained submissions will be officially evaluated and ranked.
AIhub will display SacreBLEU scores for submitted systems.
These scores are displayed for reference and also contribute to the overall evaluation.
The final official ranking may differ from the AIhub leaderboard because it also incorporates human evaluation.
Team ranking will be based on a combination of:
Automatic evaluation (SacreBLEU)
Human evaluation
When submitting your system, each team must provide the following metadata through the AIhub submission form (or organizer form):
Team name
Method name
Method description (short summary + prompt used for inference)
Project URL → link to a Google Drive folder containing your final submission package (see Section 2).
The Project URL must point to a Google Drive folder that contains your full system for final evaluation.
This folder must include:
- Your packaged system, which must be:
  - Self-contained: includes all models and dependencies.
  - Offline: no calls to online services or APIs.
- A run script that:
  - Sends an input text file to your Docker service.
  - Produces the corresponding translations in an output text file.
  - Accepts three arguments: the host:port of the Docker service, the input file path, and the output file path.
- A description of your method (training data, approach, prompts).
- A short overview of your system.
- Compress everything into a .zip before uploading to Google Drive.
- Provide an MD5 checksum for integrity verification.
- Ensure the Google Drive link has download permission enabled.
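A minimal sketch of the packaging step, building the archive and computing its MD5 checksum with the standard library (the folder and archive names here are hypothetical):

```python
import hashlib
import zipfile
from pathlib import Path


def zip_folder(folder: str, zip_path: str) -> str:
    """Zip every file under `folder` and return the archive's MD5 checksum."""
    root = Path(folder)
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(root.rglob("*")):
            if path.is_file():
                # Store paths relative to the folder root
                zf.write(path, path.relative_to(root))
    return hashlib.md5(Path(zip_path).read_bytes()).hexdigest()
```

Running `zip_folder("submission", "submission.zip")` produces the archive and the checksum string to report alongside the Google Drive link.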
- Report the total runtime.
- Report the translation speed: sentences per second and words per second.
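The run-script interface described above might be sketched as follows. Note that the `/translate` endpoint and the plain-text request body are assumptions for illustration, since the organizers do not specify the Docker service's API:

```python
import sys
import urllib.request


def parse_host_port(arg: str):
    """Split a 'host:port' argument into (host, port-as-int)."""
    host, _, port = arg.rpartition(":")
    return host, int(port)


def main(host_port: str, input_path: str, output_path: str) -> None:
    host, port = parse_host_port(host_port)
    with open(input_path, encoding="utf-8") as f:
        text = f.read()
    # Hypothetical endpoint: POST the raw input file, receive translations back
    req = urllib.request.Request(
        f"http://{host}:{port}/translate",
        data=text.encode("utf-8"),
        headers={"Content-Type": "text/plain; charset=utf-8"},
    )
    with urllib.request.urlopen(req) as resp:
        translations = resp.read().decode("utf-8")
    with open(output_path, "w", encoding="utf-8") as f:
        f.write(translations)


if __name__ == "__main__":
    # Usage: python run.py host:port input.txt output.txt
    main(sys.argv[1], sys.argv[2], sys.argv[3])
```

The three positional arguments match the required interface: the Docker service address, the input file path, and the output file path.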
✅ Multiple submissions are allowed, but only the last submission will be used for evaluation.
.zip Submission (Automatic BLEU Scoring)
In addition to the Project URL, each team must upload a .zip file to AIhub. This .zip is used only for automatic SacreBLEU evaluation and leaderboard display.
Submit one .zip file only. Inside, include exactly one file named results.csv. Do not include extra files.
results.csv format
Two columns only:
English → Translation of the original Vietnamese sentences.
Vietnamese → Translation of the original English sentences.
Column names must be exactly English and Vietnamese (case-sensitive, no extra spaces).
Row order must match the original input files exactly.
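A minimal sketch of writing results.csv with the standard csv module, assuming the column names English and Vietnamese as described above and an equal number of lines in the two test files:

```python
import csv


def write_results(en_translations, vi_translations, path="results.csv"):
    """Write the two required columns; row i holds the i-th translation
    of each original test file, preserving the original row order."""
    assert len(en_translations) == len(vi_translations), "row counts must match"
    with open(path, "w", encoding="utf-8", newline="") as f:
        writer = csv.writer(f)
        # Exact, case-sensitive header names expected by the scorer
        writer.writerow(["English", "Vietnamese"])
        for en, vi in zip(en_translations, vi_translations):
            writer.writerow([en, vi])
```

Writing the header programmatically like this avoids the stray spaces or casing mistakes that would make the evaluation script fail.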
⚠️ Any deviation (wrong column names, extra files, row mismatch) will cause the evaluation script to fail with a KeyError.
AIhub SacreBLEU scores (computed from results.csv) will be displayed on the leaderboard.
Final ranking will be determined by:
Human evaluation
Automatic scoring (SacreBLEU)
Only constrained systems (trained solely on the data provided by organizers) are eligible for final ranking.
Start: July 17, 2025, midnight
Start: Aug. 17, 2025, midnight
Aug. 25, 2025, midnight