COSMIANU - Corpus Of Social Media Italian Annotated with Nominal Utterances

COSMIANU is an Italian corpus of social media texts annotated manually with different types of Nominal Utterances (NUs).

In particular, COSMIANU consists of semi-synchronous forms of computer-mediated communication, i.e. blogs, forums, newsgroups, and social networks (for a total of 66,013 tokens), taken from the Web2Corpus IT, a balanced corpus of over one million words. These texts consist of discussions between users across a wide range of topics (from politics to popular singers). Thus, in most cases, users interact with each other, creating a dialogic environment rich in verbal crossfire and quotes.

Nominal utterances (NUs) appearing in the corpus have been annotated and further marked with the following attributes:

  • Verbal coordinate
  • Non-verbal coordinate
  • Verbal subordinate
  • Ellipsis
  • Metadata

Distribution License

COSMIANU is licensed under a Creative Commons Attribution 4.0 International License.

If you use COSMIANU, please cite the following paper:

To obtain the corpus, please fill in the request form below with your details (they will be stored in a database at FBK):

Request form for download

