https://vlsp.org.vn/vlsp2022/eval/abmusu
Note: All deadlines are 11:59 PM UTC+00:00 (i.e., 6:59 AM the next day in Indochina Time (ICT), UTC+07:00).
Introduction
In the era of information explosion, mining data effectively has huge potential but is a difficult problem that takes time, money, and labor. Multi-document summarization is a natural language processing task that helps address this problem. Given a set of documents as input, a summarization system aims to select or generate the important information to create a brief summary of those documents [1]. It is a complex problem that has attracted attention from the research community, and several competitions have been launched in recent years to support research and development in this field for English, such as DocEng 2019 [2] and BioNLP-MEDIQA 2021 [3].
Based on output characteristics, there are two major approaches to automatic summarization: extractive and abstractive. Extractive summarization selects the most important sentences (or sections) from the documents, while abstractive summarization rewrites a new summary based on the important information in the originals [4]. Since the early 1950s, various methods have been proposed for extractive summarization, ranging from frequency-based methods [5] to machine learning-based methods [6]. Extractive methods are fast and simple, but their summaries are far from manually created ones, a shortcoming that the abstractive approach can remedy [7]. In the multi-document setting, extractive approaches also show significant disadvantages in arranging and combining information from several documents. In recent years, sequence-to-sequence (seq2seq) learning has made abstractive summarization feasible [8]: encoder-decoder models such as PEGASUS [9], BART [10], and T5 [11] achieve promising results for abstractive multi-document summarization. Studies on this problem for Vietnamese text are still in their early phases. This shared task is therefore proposed to promote research on abstractive multi-document summarization for Vietnamese.
The goal of the Vietnamese abstractive multi-document summarization task (AbMusu) is to develop summarization systems that automatically create an abstractive summary for a set of documents on the same topic. The input is multiple news documents on the same topic, and the output is a single abstractive summary covering them. The task focuses on Vietnamese news summarization; a minimal baseline sketch follows.
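As an illustration (not a prescribed baseline), the sketch below generates an abstractive summary for one topic cluster with a pretrained encoder-decoder model of the kind cited above, via the Hugging Face transformers library. The checkpoint name is a placeholder assumption; participants would substitute any suitable Vietnamese seq2seq model.

```python
# A minimal sketch of abstractive multi-document summarization with a
# pretrained encoder-decoder (PEGASUS/BART/T5-style). The checkpoint name
# below is a placeholder, not a model prescribed by the task organizers.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

MODEL_NAME = "VietAI/vit5-base"  # placeholder; replace with any Vietnamese seq2seq checkpoint

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSeq2SeqLM.from_pretrained(MODEL_NAME)

def summarize_cluster(documents: list[str], max_input_tokens: int = 1024) -> str:
    """Concatenate the documents of one topic cluster and generate one summary."""
    # Naive multi-document handling: join the documents and truncate to the
    # encoder's input limit. Real systems typically rank or filter content first.
    text = " ".join(documents)
    inputs = tokenizer(text, truncation=True, max_length=max_input_tokens,
                       return_tensors="pt")
    summary_ids = model.generate(**inputs, max_length=256, num_beams=4,
                                 early_stopping=True)
    return tokenizer.decode(summary_ids[0], skip_special_tokens=True)
```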
To contact us, mail to: [email protected]
The official evaluation measure is ROUGE-2, and ROUGE-2 F1 is the main score for ranking. ROUGE-2 Recall (R), Precision (P), and F1 between the predicted summary and the reference summary are computed as follows [12]:
ROUGE-2 P = |matched bigrams| / |bigrams in the predicted summary|
ROUGE-2 R = |matched bigrams| / |bigrams in the reference summary|
ROUGE-2 F1 = (2 × ROUGE-2 P × ROUGE-2 R) / (ROUGE-2 P + ROUGE-2 R)
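For intuition, the snippet below implements these three formulas directly by counting bigram overlap (ROUGE-2 operates on bigrams). It is a simplified sketch using whitespace tokenization, not the official evaluator.

```python
# Direct implementation of the ROUGE-2 formulas above: clipped bigram
# overlap between a predicted and a reference summary.
from collections import Counter

def bigrams(tokens: list[str]) -> Counter:
    return Counter(zip(tokens, tokens[1:]))

def rouge2(prediction: str, reference: str) -> dict[str, float]:
    pred, ref = bigrams(prediction.split()), bigrams(reference.split())
    matched = sum((pred & ref).values())  # bigrams counted at most min(pred, ref) times
    p = matched / max(sum(pred.values()), 1)
    r = matched / max(sum(ref.values()), 1)
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return {"P": p, "R": r, "F1": f1}

print(rouge2("hệ thống tóm tắt văn bản tự động",
             "hệ thống tóm tắt tự động văn bản"))
```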
ROUGE-2 will be used as the main metric to rank the participating teams [13], but we will also report additional metrics suited to the task, such as ROUGE-1 and ROUGE-L. BERTScore may be provided for the top submissions after the private phase.
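Participants who want to estimate their scores locally can use an off-the-shelf implementation; the snippet below uses the open-source rouge_score package as one such option. Its tokenization and settings are assumptions here and may differ from the official evaluation pipeline, so local numbers should be treated as approximate.

```python
# Approximate local scoring with the open-source rouge_score package
# (pip install rouge-score). The configuration here is an assumption; the
# official evaluator may tokenize Vietnamese text differently.
from rouge_score import rouge_scorer

scorer = rouge_scorer.RougeScorer(["rouge1", "rouge2", "rougeL"], use_stemmer=False)
scores = scorer.score(target="reference summary text",
                      prediction="predicted summary text")
print(scores["rouge2"].fmeasure)  # the main ranking score (ROUGE-2 F1)
```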
Right to cancel, modify, or disqualify. The Competition Organizer reserves the right at its sole discretion to terminate, modify, or suspend the competition.
By submitting results to this competition, you consent to the public release of your scores at the Competition workshop and in the associated proceedings, at the task organizers' discretion. Scores may include, but are not limited to, automatic and manual quantitative judgments, qualitative judgments, and such other metrics as the task organizers see fit. You accept that the ultimate decision on metric choice and score values rests with the task organizers.
By joining the competition, you affirm and acknowledge that you will comply with applicable laws and regulations, that the software you develop in the course of the competition will not infringe any copyright, intellectual property, or patent of another party, and that you will not breach any applicable laws and regulations related to export control and data privacy and protection.
Prizes are subject to the Competition Organizer’s review and verification of the entrant’s eligibility and compliance with these rules as well as the compliance of the winning submissions with the submission requirements.
Participants grant the Competition Organizer the right to use their winning submissions, together with the source code and data created for and used to generate those submissions, for any purpose whatsoever and without further approval.
Each participant must create an AIHub account to submit a solution to the competition. Only one account per user is allowed.
The competition is public, but the Competition Organizer may elect to disallow participation according to its own considerations.
The Competition Organizer reserves the right to disqualify any entrant from the competition if, in the Competition Organizer’s sole discretion, it reasonably believes that the entrant has attempted to undermine the legitimate operation of the competition through cheating, deception, or other unfair playing practices.
Participants are allowed to form teams.
You may not participate in more than one team. Each team member must be a single individual operating a separate AIHub account.
[1] Ježek K, Steinberger J. Automatic text summarization (the state of the art 2007 and new challenges). In Proceedings of Znalosti 2008 Feb (pp. 1-12).
[2] Lins RD, Mello RF, Simske S. DocEng'19 Competition on Extractive Text Summarization. In Proceedings of the ACM Symposium on Document Engineering 2019 2019 Sep 23 (pp. 1-2).
[3] Abacha AB, M’rabet Y, Zhang Y, Shivade C, Langlotz C, Demner-Fushman D. Overview of the MEDIQA 2021 shared task on summarization in the medical domain. In Proceedings of the 20th Workshop on Biomedical Language Processing 2021 Jun (pp. 74-85).
[4] Allahyari M, Pouriyeh S, Assefi M, Safaei S, Trippe ED, Gutierrez JB, Kochut K. Text summarization techniques: a brief survey. arXiv preprint arXiv:1707.02268. 2017 Jul 7.
[5] Khan R, Qian Y, Naeem S. Extractive based text summarization using k-means and tf-idf. International Journal of Information Engineering and Electronic Business. 2019 May 1;11(3):33.
[6] Gambhir M, Gupta V. Recent automatic text summarization techniques: a survey. Artificial Intelligence Review. 2017 Jan;47(1):1-66.
[7] El-Kassas WS, Salama CR, Rafea AA, Mohamed HK. Automatic text summarization: A comprehensive survey. Expert Systems with Applications. 2021 Mar 1;165:113679.
[8] Hou L, Hu P, Bei C. Abstractive document summarization via neural model with joint attention. In National CCF Conference on Natural Language Processing and Chinese Computing 2017 Nov (pp. 329-338). Springer, Cham.
[9] Zhang J, Zhao Y, Saleh M, Liu P. PEGASUS: Pre-training with extracted gap-sentences for abstractive summarization. In International Conference on Machine Learning 2020 (pp. 11328-11339). PMLR.
[10] Lewis M, Liu Y, Goyal N, Ghazvininejad M, Mohamed A, Levy O, Stoyanov V, Zettlemoyer L. BART: Denoising sequence-to-sequence pre-training for natural language generation, translation, and comprehension. arXiv preprint arXiv:1910.13461. 2019 Oct 29.
[11] Raffel C, Shazeer N, Roberts A, Lee K, Narang S, Matena M, Zhou Y, Li W, Liu PJ. Exploring the limits of transfer learning with a unified text-to-text transformer. J. Mach. Learn. Res. 2020 Jun;21(140):1-67.
[12] Lin CY. ROUGE: A package for automatic evaluation of summaries. In Text Summarization Branches Out 2004 Jul (pp. 74-81).
[13] Bhandari M, Gour P, Ashfaq A, Liu P, Neubig G. Re-evaluating evaluation in text summarization. In Proceedings of EMNLP 2020.
Start: Nov. 1, 2022, midnight
Description: This leaderboard is calculated with approximately 50% of the testing data. The final results will be based on the other 50%, so the final standings may differ.
Start: Nov. 5, 2022, midnight
Description: The private leaderboard is calculated with the remaining approximately 50% of the testing data. This leaderboard will reflect the final standings after the shared task is completed.
Start: Nov. 11, 2022, midnight
Description: This leaderboard is calculated with approximately 50% of the testing data, the SAME split as the public test.
Start: Nov. 26, 2022, midnight
Description: This leaderboard is calculated with the other approximately 50% of the testing data, the SAME split as the private test.