COSMIANU is an Italian corpus of social media texts annotated with Nominal Utterances.

In particular, COSMIANU consists of semi-synchronous forms of computer-mediated communication, i.e. blogs, forums, newsgroups, and social networks (66,011 tokens in total), taken from the Web2Corpus IT, a balanced corpus of over one million words. These texts consist of discussions between users on a wide range of topics (from politics to popular singers). Thus, in most cases, users interact with each other, creating a dialogic environment rich in verbal crossfire and quotations.

Nominal utterances (NUs) appearing in the corpus have been marked and classified into the following categories:

  • Simple
  • Verbal coordinate
  • Non-verbal coordinate
  • Verbal subordinate
  • Ellipsis
  • Metadata

Distribution License

COSMIANU is licensed under a Creative Commons Attribution 4.0 International License.

If you use COSMIANU, please cite the following paper:

  • Gloria Comandini, Manuela Speranza, and Bernardo Magnini. Effective Communication without Verbs? Sure! Identification of Nominal Utterances in Italian Social Media Texts. In Proceedings of the Fifth Italian Conference on Computational Linguistics (CLiC-it 2018), Turin, Italy, 10-12 December 2018, to appear.

The corpus will be available soon.
