CLARIN to IMDI, DATE: 2009-08-13

Croatian Language Corpus

This is a corpus of diachronic and synchronic Croatian, including dialectal variants. Its annotation is being expanded to a multi-tier scheme covering phonological, morphological, syntactic and semantic properties. A frequency profile of tokens is available and is being expanded to frequency profiles at all linguistic levels, as well as n-gram and multi-gram models.

ISO639-3:hrv Croatian
ISO639-3:eng English
ISO639-3:hrv Croatian
Unknown Unknown Unknown
Croatia
Written Corpus
XML TEI
2005
Croatian Language Repository
Institute of Croatian Language and Linguistics
Collection: 100 million tokens (under continued development)
729 1469
Institute of Croatian Language and Linguistics
Zagreb, Croatia

Scanned text in TIFF and PDF, raw text. Annotations are pure XML TEI P5, using multi-tier annotation for phonemic, morphological, syntactic and semantic annotation. High-quality output from a multi-checked digitization and annotation process.

Croatian Language Repository