Datasets

We have published the following linguistic datasets to date. Please refer to the associated publications for details.

Name          Empirical Domain             Sample Languages  Data Source          Publication
MultiCoS      connectives                  24 languages      Elicitation          LREC 2026
MECORE-EN     clause-embedding predicates  English           Web-crawled corpora  SCiL 2025
MODALS        modal auxiliaries            24 languages      Elicitation          Linguistic Variation 2024
MECORE-XLing  clause-embedding predicates  14 languages      Elicitation          SigTyp 2023