CLARIN to IMDI, DATE: 2009-08-13
Wortschatz
Wortschatz
Collected from newspaper texts, web crawling, etc.: words (+frequency), co-occurrences (+graph), left/right neighbours, example sentences
ISO639-3:afr
Afrikaans
Unknown
Unknown
Unknown
ISO639-3:sqi
Albanian
Unknown
Unknown
Unknown
ISO639-3:bul
Bulgarian
Unknown
Unknown
Unknown
ISO639-3:cat
Catalan
Unknown
Unknown
Unknown
ISO639-3:zho
Chinese
Unknown
Unknown
Unknown
ISO639-3:hrv
Croatian
Unknown
Unknown
Unknown
ISO639-3:ces
Czech
Unknown
Unknown
Unknown
ISO639-3:dan
Danish
Unknown
Unknown
Unknown
ISO639-3:nld
Dutch
Unknown
Unknown
Unknown
ISO639-3:eng
English
Unknown
Unknown
Unknown
ISO639-3:epo
Esperanto
Unknown
Unknown
Unknown
ISO639-3:est
Estonian
Unknown
Unknown
Unknown
ISO639-3:fin
Finnish
Unknown
Unknown
Unknown
ISO639-3:fra
French
Unknown
Unknown
Unknown
ISO639-3:deu
German
Unknown
Unknown
Unknown
ISO639-3:hun
Hungarian
Unknown
Unknown
Unknown
ISO639-3:isl
Icelandic
Unknown
Unknown
Unknown
ISO639-3:ind
Indonesian
Unknown
Unknown
Unknown
ISO639-3:ita
Italian
Unknown
Unknown
Unknown
ISO639-3:jpn
Japanese
Unknown
Unknown
Unknown
ISO639-3:kor
Korean
Unknown
Unknown
Unknown
ISO639-3:lat
Latin
Unknown
Unknown
Unknown
ISO639-3:lav
Latvian
Unknown
Unknown
Unknown
ISO639-3:lit
Lithuanian
Unknown
Unknown
Unknown
Unknown
Malay
Unknown
Unknown
Unknown
ISO639-3:nor
Norwegian
Unknown
Unknown
Unknown
Unknown
Occitan
Unknown
Unknown
Unknown
ISO639-3:ron
Romanian
Unknown
Unknown
Unknown
ISO639-3:rus
Russian
Unknown
Unknown
Unknown
ISO639-3:slk
Slovak
Unknown
Unknown
Unknown
ISO639-3:slv
Slovenian
Unknown
Unknown
Unknown
ISO639-3:spa
Spanish
Unknown
Unknown
Unknown
ISO639-3:sun
Sundanese
Unknown
Unknown
Unknown
ISO639-3:swe
Swedish
Unknown
Unknown
Unknown
ISO639-3:tgl
Tagalog
Unknown
Unknown
Unknown
ISO639-3:tur
Turkish
Unknown
Unknown
Unknown
ISO639-3:vie
Vietnamese
Unknown
Unknown
Unknown
ISO639-3:cym
Welsh
Unknown
Unknown
Unknown
Germany
Written Corpus
MySQL databases (MyISAM) and plain text; directly accessible (web interface; partly web services)
1993
University of Leipzig • Department of Computer Science • NLP Group
available on the internet
Prof. Dr. Gerhard Heyer, apl. Prof. Dr. Uwe Quasthoff; wort@informatik.uni-leipzig.de
http://corpora.informatik.uni-leipzig.de/
Quasthoff, U. and Richter, M.: /Projekt Deutscher Wortschatz/. 2005 (http://www.asv.informatik.uni-leipzig.de/publications/87); Quasthoff, U., Richter, M., and Biemann, C.: /Corpus Portal for Search in Monolingual Corpora/. In: /Proceedings of the LREC 2006/, Genoa, Italy, 2006 (http://www.asv.informatik.uni-leipzig.de/publications/bibtex/32); Biemann, C., Heyer, G., Quasthoff, U., and Richter, M.: /The Leipzig Corpora Collection - Monolingual corpora of standard size/. In: /Proceedings of Corpus Linguistics 2007/, Birmingham, UK, 2007 (http://www.asv.informatik.uni-leipzig.de/publications/bibtex/53)
1017
2068
Sorbian