CLARIN to IMDI, DATE: 2009-08-13
Croatian Language Corpus
Croatian Language Corpus
This is a corpus of diachronic and synchronic Croatian, including dialectal variants. Its annotation is being expanded to a multi-tier scheme covering phonological, morphological, syntactic and semantic properties. A frequency profile of tokens is available and is being extended to frequency profiles at all linguistic levels, as well as to n-gram and multi-gram models.
ISO639-3:hrv
Croatian
ISO639-3:eng
English
ISO639-3:hrv
Croatian
Unknown
Unknown
Unknown
Croatia
Written Corpus
XML TEI
2005
Croatian Language Repository
Croatian Language Repository
Institute of Croatian Language and Linguistics
Collection: 100 million tokens (under continued development)
729
1469
Institute of Croatian Language and Linguistics
Zagreb, Croatia
Scanned text in TIFF and PDF. Raw text annotations are entirely in XML TEI P5, using multi-tier annotation at the phonemic, morphological, syntactic and semantic levels.
High-quality, multi-checked output of the digitization and annotation process
Croatian Language Repository