VLSP 2025 – viTempQA Date-Arith Challenge: Vietnamese Temporal Question Answering
Important dates
- June 23, 2025: Registration open
- July 6, 2025: Training data release
- July 15, 2025: Public test release
- August 20, 2025: System submission deadline
- August 30, 2025: Private test results release
- September 10, 2025: Technical report submission
- September 27, 2025: Notification of acceptance
- October 3, 2025: Camera-ready deadline
- October 29-30, 2025: Conference dates
Task Description
Objective: Build a system to answer temporal questions in Vietnamese across two sub-tasks: Date Arithmetic (date-arith) and Duration Question Answering (durationQA). The system must extract and reason about temporal information to produce accurate answers concerning dates, durations, and temporal relationships.
- Sub-Task 1: Date Arithmetic (date-arith)
Description: The date-arith sub-task focuses on handling questions related to date calculations, such as adding or subtracting time intervals from a given date. This involves understanding and manipulating time expressions to compute answers based on the provided context.
Focus: Parse and manipulate temporal expressions to compute new dates.
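To illustrate the kind of date manipulation this sub-task requires, here is a minimal Python sketch (the function name and month-index convention are our own, not part of the task definition) that shifts a (year, month) pair backwards by a given offset:

```python
def shift_month(year, month, years_back, months_back):
    """Shift a (year, month) pair backwards by the given year/month offsets.

    Months are 1-based (January = 1). The pair is converted to a single
    month count, shifted, and converted back.
    """
    total = year * 12 + (month - 1) - (years_back * 12 + months_back)
    return total // 12, total % 12 + 1
```

For instance, `shift_month(1297, 6, 1, 2)` returns `(1296, 4)`, i.e. April 1296 (Tháng 4, 1296).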
Evaluation
System performance will be evaluated using a range of standard metrics, including Accuracy, Exact Match, Precision, Recall, and F1-score:
Evaluation Metrics
- Accuracy: Used for Sub-Task 1 (Date Arithmetic). It is the percentage of system answers that exactly match the ground-truth answers.
Evaluation is performed separately for each sub-task. The final evaluation report includes individual scores as well as aggregate performance across all tasks.
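The exact-match accuracy described above can be sketched as follows (the function and argument names are illustrative; the official scorer may additionally normalize whitespace or casing before comparison):

```python
def accuracy(predictions, gold_answers):
    """Fraction of predictions that exactly match one of the gold answers.

    predictions  : list of predicted answer strings, one per question
    gold_answers : list of lists of acceptable ground-truth strings
    """
    correct = sum(1 for pred, answers in zip(predictions, gold_answers)
                  if pred in answers)
    return correct / len(gold_answers)
```

A prediction counts as correct only if it matches a ground-truth answer string exactly.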
Example for Sub-Task 1: Date Arithmetic
Input:
{
  "question": "Thời gian 1 năm và 2 tháng trước tháng 6, 1297 là khi nào?",
  "context": "",
  "answer": ["Tháng 4, 1296"]
}
(The question asks: "When is the time 1 year and 2 months before June 1297?"; the gold answer is "April 1296".)
System Prediction:
["Tháng 4, 1296"]
- Accuracy: The prediction matches the ground-truth exactly.
→ Accuracy = 1.0
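The worked example above can be reproduced end to end with a toy solver. The regular expressions below assume only the single question pattern shown ("X năm và Y tháng trước tháng M, YYYY") and are purely illustrative, not a description of the actual dataset's phrasing:

```python
import re

def answer_date_arith(question):
    """Toy solver for questions of the form
    'X năm và Y tháng trước tháng M, YYYY' (hypothetical pattern).

    Extracts the year/month offsets and the base date, then shifts the
    base date backwards by the combined offset.
    """
    years = int(m.group(1)) if (m := re.search(r"(\d+) năm", question)) else 0
    months = int(m.group(1)) if (m := re.search(r"(\d+) tháng", question)) else 0
    base = re.search(r"tháng (\d+), (\d+)", question)
    month, year = int(base.group(1)), int(base.group(2))
    # Convert to an absolute month count, shift, and convert back.
    total = year * 12 + (month - 1) - (years * 12 + months)
    return f"Tháng {total % 12 + 1}, {total // 12}"
```

On the example question this yields "Tháng 4, 1296", matching the ground truth exactly, so Accuracy = 1.0.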
References
- Chu, Zheng, et al. "Timebench: A comprehensive evaluation of temporal reasoning abilities in large language models." arXiv preprint arXiv:2311.17667 (2023).
- Tan, Qingyu, Hwee Tou Ng, and Lidong Bing. "Towards benchmarking and improving the temporal reasoning capability of large language models." arXiv preprint arXiv:2306.08952 (2023).
- Virgo, Felix, Fei Cheng, and Sadao Kurohashi. "Improving event duration question answering by leveraging existing temporal information extraction data." Proceedings of the Thirteenth Language Resources and Evaluation Conference. 2022.