VLSP 2025 – viTempQA Duration Challenge: Vietnamese Temporal Question Answering
Important Dates
- June 23, 2025: Registration open
- July 6, 2025: Training data release
- July 15, 2025: Public test release
- August 20, 2025: System submission deadline
- August 30, 2025: Private test results release
- September 10, 2025: Technical report submission
- September 27, 2025: Notification of acceptance
- October 3, 2025: Camera-ready deadline
- October 29-30, 2025: Conference dates
Task Description
Objective: Build a system to answer temporal questions in Vietnamese across two sub-tasks: Date Arithmetic (date-arith) and Duration Question Answering (durationQA). The system must extract and reason about temporal information to provide accurate answers related to dates, durations, and temporal relationships.
- Sub-Task 1: Date Arithmetic (date-arith)
Description: Compute the date obtained by applying a temporal offset (e.g., "1 year and 2 months before") to a reference date, producing the resulting month and year.
- Sub-Task 2: Duration Question Answering (durationQA)
Description: Answer questions about the duration of events or actions based on a given context. The system must extract duration-related information from text and use real-world knowledge to evaluate answer options, determining how long an event or action lasts.
Focus: Identify explicit or implied durations in the context (e.g., "6 years") and apply real-world reasoning to classify options as correct ("yes") or incorrect ("no") based on factual accuracy.
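As a toy illustration of this kind of real-world duration reasoning (not the official baseline; `UNIT_MINUTES`, `to_minutes`, and `label_options` are hypothetical names), one could normalize each option to minutes and accept those falling inside a plausible range for the event:

```python
import re

# Minutes per Vietnamese time unit (assumed vocabulary, not exhaustive;
# month and year values are approximations).
UNIT_MINUTES = {
    "phút": 1,               # minute
    "giờ": 60,               # hour
    "ngày": 60 * 24,         # day
    "tuần": 60 * 24 * 7,     # week
    "tháng": 60 * 24 * 30,   # month (approx.)
    "năm": 60 * 24 * 365,    # year (approx.)
}

def to_minutes(duration: str) -> float:
    """Parse a duration option like '30 phút' or '2 giờ' into minutes."""
    match = re.match(r"\s*(\d+(?:[.,]\d+)?)\s*(\S+)", duration)
    if not match:
        raise ValueError(f"unparseable duration: {duration!r}")
    value = float(match.group(1).replace(",", "."))
    unit = match.group(2).lower()
    return value * UNIT_MINUTES[unit]

def label_options(options, plausible_min, plausible_max):
    """Label an option 'yes' iff its length falls in the plausible range
    (in minutes) inferred for the event in the context."""
    return ["yes" if plausible_min <= to_minutes(o) <= plausible_max else "no"
            for o in options]
```

For a bicycle-repair context, a plausible range of roughly 10 minutes to 4 hours would label `["30 phút", "1 tháng", "10 phút", "2 giờ"]` as `["yes", "no", "yes", "yes"]`.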
Evaluation
System performance will be evaluated using a range of standard metrics, including Accuracy, Exact Match, Precision, Recall, and F1-score:
Evaluation Metrics
- Accuracy: Used for Sub-Task 1 (Date Arithmetic). The fraction of questions for which the predicted answer exactly matches the ground-truth answer.
- Exact Match: Used for Sub-Task 2 (DurationQA). It evaluates whether the predicted label sequence matches the ground-truth label sequence exactly.
- Precision: Ratio of correctly predicted "yes" answers to total "yes" predictions made by the system.
- Recall: Ratio of correctly predicted "yes" answers to total actual "yes" answers in the ground truth.
- F1-score: Harmonic mean of Precision and Recall, summarizing overall performance.
Evaluation is performed separately for each sub-task. The final evaluation report includes individual scores as well as aggregate performance across all tasks.
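A per-question scorer consistent with the definitions above could look like the following sketch (`duration_qa_scores` is a hypothetical name; the official scorer may differ):

```python
def duration_qa_scores(pred, gold):
    """Exact Match plus Precision/Recall/F1 over 'yes' labels
    for one DurationQA question."""
    assert len(pred) == len(gold)
    exact_match = float(pred == gold)
    pred_yes = sum(p == "yes" for p in pred)          # predicted positives
    gold_yes = sum(g == "yes" for g in gold)          # actual positives
    true_yes = sum(p == g == "yes" for p, g in zip(pred, gold))
    precision = true_yes / pred_yes if pred_yes else 0.0
    recall = true_yes / gold_yes if gold_yes else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return exact_match, precision, recall, f1
```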
Example for Sub-Task 1: Date Arithmetic
Input:
{
  "question": "Thời gian 1 năm và 2 tháng trước tháng 6, 1297 là khi nào?",
  "context": "",
  "answer": ["Tháng 4, 1296"]
}
System Prediction:
["Tháng 4, 1296"]
- Accuracy: The prediction matches the ground-truth exactly.
→ Accuracy = 1.0
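The calendar arithmetic behind this example ("1 year and 2 months before June 1297") can be sketched with simple month indexing; `shift_month` is an illustrative helper, not part of any released tooling:

```python
def shift_month(year: int, month: int, delta_months: int):
    """Shift (year, month) by delta_months (negative = earlier)."""
    index = year * 12 + (month - 1) + delta_months
    return index // 12, index % 12 + 1

# 1 year and 2 months = 14 months before June 1297
year, month = shift_month(1297, 6, -(12 + 2))
# (year, month) == (1296, 4), i.e. "Tháng 4, 1296"
```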
Example for Sub-Task 2: Duration Question Answering
Input:
{
  "context": "Tôi đang sửa chữa chiếc xe đạp bị hỏng.",
  "options": ["30 phút", "1 tháng", "10 phút", "2 giờ"],
  "qid": 54,
  "question": "Mất thời gian bao lâu để sửa chữa chiếc xe đạp?",
  "labels": ["yes", "no", "yes", "yes"]
}
System Prediction:
["yes", "no", "no", "yes"]
Metric Calculation:
- Exact Match: System prediction ≠ ground truth.
→ Exact Match = 0.0
- Precision: 2 correct "yes" predictions out of 2 total "yes" predictions.
→ Precision = 2 / 2 = 1.0
- Recall: 2 correct "yes" predictions out of 3 actual "yes" in ground truth.
→ Recall = 2 / 3 ≈ 0.6667
- F1-score: Harmonic mean of precision and recall.
→ F1 = 2 × (1.0 × 2/3) / (1.0 + 2/3) = 0.8
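The calculation above can be reproduced in a few lines of Python:

```python
gold = ["yes", "no", "yes", "yes"]  # ground-truth labels
pred = ["yes", "no", "no", "yes"]   # system prediction

exact_match = float(pred == gold)                      # 0.0
tp = sum(p == g == "yes" for p, g in zip(pred, gold))  # 2 correct "yes"
precision = tp / pred.count("yes")                     # 2 / 2 = 1.0
recall = tp / gold.count("yes")                        # 2 / 3
f1 = 2 * precision * recall / (precision + recall)     # 0.8
```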