Course web page, WS 2024–2025, SfS, University of Tübingen

This is the course page for the seminar course Challenges in Computational Linguistics at the Department of Linguistics, University of Tübingen.

Computational Linguistics and related fields have a well-established tradition of “shared tasks” or “challenges”, where the participants try to solve a current problem in the field using a common data set and a well-defined metric of success. Participation in these tasks is fun and highly educational, as it requires the participants to put all their knowledge into practice and to learn and apply new methods to the task at hand. The comparison of the participating systems at the end of the shared task is also a valuable learning experience, both for the participating individuals and for the field as a whole.

This course takes its title literally. The students taking the course are required to participate in a shared task in the field and solve it as best they can. The requirements of the course include developing a system to solve the problem defined by the shared task, submitting the results, and writing a paper describing the system.

Requirements

The course requires good programming skills, a working knowledge of machine learning and NLP, and strong (self) motivation. This typically means a highly motivated master’s or advanced bachelor’s student in computational linguistics or a related department (e.g., computer science, artificial intelligence, cognitive science). If you are unsure whether this course is for you, please contact the instructor.

Shared tasks

In principle, any shared task related to computational linguistics is acceptable. Here are a few pointers (some point to earlier events):

Other workshops at ACL, EMNLP, EACL, NAACL, and COLING often include relevant shared tasks (this year’s workshop schedule is not yet known).

This is only a small sample. Some of the shared tasks for the upcoming year have not yet been announced. You are recommended to check the earlier instances of these tasks and to keep an eye on the workshop pages.

Another interesting event, similar to the shared tasks above but with a different approach, is the ML Reproducibility Challenge. You are also welcome to participate in this event (you may want to look at last year’s challenge website to see example reports).

For the purposes of the class, we prefer a shared task where you finalize your work with a system description paper. If all else fails, or if you have a strong preference, a CL-related Kaggle competition may also be an option (you are still required to write a system description paper).

Earlier instances of the course

The following are pointers to the papers that earlier participants have published, along with the shared task each paper addresses.

Paper: Shared task
Stuhlinger & Winkler: Multidomain, Multimodel and Multilingual Machine-Generated Text Detection
Pickard & Do (2024): Multidomain, Multimodel and Multilingual Machine-Generated Text Detection
Cao, Kilic & Will: Multidomain, Multimodel and Multilingual Machine-Generated Text Detection
Smilga & Alabiad (2024): Safe Biomedical Natural Language Inference for Clinical Trials
Zhunis & Chuang (2024): Numeral-Aware Language Understanding and Generation
Rösener, Wei & Vandici (2024): SHROOM, a Shared-task on Hallucinations and Related Observable Overgeneration Mistakes
Grötzinger, Heuschkel and Drews (2023): Learning With Disagreements
Höfer and Mottahedin (2023): Multilingual Complex Named Entity Recognition
Shmalts (2023): Clickbait Spoiling
Sentiment Analysis for African Languages
Baumann and Deisenhofer (2023): Category and Framing Prediction in Online News
Lundberg, Sánchez Viñuela and Biales (2022): Dialogue summarization
Merzhevich, Gbadegoye, Girrbach, Li and Soh-Eun Shim (2022): Morphological inflection generation
Girrbach (2022): Morpheme segmentation
Girrbach and Li (2022): Word Segmentation and Morphological Parsing for Sanskrit
Gu, Meisinger and Dick (2022): Misogyny Detection
Jobanputra and Martín Rodríguez (2022): Multilingual news article similarity
Hantsch and Chkroun (2022): Intended Sarcasm Detection
Vetter, Segiet and Lennermann (2022): Presupposed Taxonomies
Brivio and Çöltekin (2022): ML reproducibility challenge
Glocker and Markianos Wright (2020): Emphasis selection
Kaparina and Soboleva (2020): Extracting term-definition pairs
Karnysheva and Schwarz (2020): Detecting semantic change
Kannan and Santhi Ponnusamy (2020): Extracting term-definition pairs
Bear, Hoefels and Manolescu (2020): Sentiment Analysis of Code-Mixed Tweets
Collins, Grathwohl and Ahmed (2020): Commonsense Validation and Explanation
Ammer and Grüner (2020): Assessing the Funniness of Edited News Headlines
Dick, Weirich and Kutkina (2020): Assessing the Funniness of Edited News Headlines
Luo and Tang (2020): Assessing the Funniness of Edited News Headlines
Blaschke, Korniyenko and Tureski (2020): Detection of Propaganda Techniques
Bansal, Nagel and Soloveva (2019): OffensEval 2019
Kannan and Stein: OffensEval 2019
Pütz and Glocker (2019): Cross-lingual Semantic Parsing with UCCA
Juhász, Linnenschmidt and Roys (2019): Fact Checking in Community Question Answering Forums
Manolescu, Löfflad, Mohamed Saber and Moradipour Tari (2019): Shared Task on Multilingual Detection of Hate
Luo, Baranova and Biegert (2019): Math Question Answering
Wu, DeMattos, So, Chen and Çöltekin (2019): VarDial Evaluation Campaign

 

Contact