####################################################
THIS CODE IS OBSOLETE!!
It has been integrated and further developed within:
  https://github.com/vronk/corpus_shell
and
  https://github.com/vronk/SADE
####################################################

== CLARIN MDRepository ==

Steps to set up and run the repository

0. prerequisites + install
   a) be sure to use java-jdk 1.6 (we experienced strange java-errors with 1.5)
   b) install: http://exist-db.org/quickstart.html#sect2
        java -jar eXist-{version}.jar -p {install-dir}
   c) set admin pwd
   d) you may want to add memory to the JVM under
        bin/functions.d/eXist-settings.sh#set_java_options()
   e) you may also want to grow the cache in conf.xml

   clarin {PASSWORT}

3. create a collection for caching, eg: /db/cache
   This has to correspond to the entry in cmd-model.xqm:
        declare variable $cmd-model:commonFreqsPath as xs:string := "/db/cache";
   If you change something, you have to clear the cache collection manually.
   Queries on the queryModel and getCollections interfaces are cached.
   The key is:
     for getCollections: collection{maxdepth}-{hash({collection-handle})}
     for queryModel:     values{maxdepth}-{hash({simple xpath from q-param})}

4. define indices
   copy cmdi-mirror.xconf into /db/system/config/db/cmdi-mirror

5. add data to /db/cmdi-mirror
   The file-system structure will be reflected in the "collection" structure
   within exist; however, this is irrelevant for the MDRepository methods.
   Those rely on the linking via handles in MdSelfLink/ResourceRef and elements
   of the MDRecords.
   The handles in are redundant (necessary for faster collection-constraint
   search) and can be derived from the ResourceRef/MdSelfLink link. This can be
   done before storing the data in the repository, or after the import directly
   in the repository (XUpdate scripts for this will be available soon).

   The top-level collection record is by convention called colleciton_root.cmdi
   and is marked with: root
   (So every dataset (olac, lrt, imdi) has one such MDRecord.)

6.
   Depending on your server setup (port), you should be able to run your first
   queries at URLs like:

     http://localhost:8680/exist/rest/db/clarin/cmd-model.xql?q=Components
       (queryModel is the default operation)
     http://localhost:8680/exist/rest/db/clarin/cmd-model.xql?operation=getCollections&collection=

   These queries may take some time when run for the first time, so be patient.
   Avoid starting them multiple times. You can check in the cache collection
   whether the results are ready.

== test suite ==

THIS IS CURRENTLY BEING DEVELOPED! NOT SAFELY USABLE YET!

own build-file: build-tests.xml, based on exist's performance.xml

The sub-build-file imports the main exist build-file. This causes problems with
basedir for the imported build-files. The simplest solution I could find is to
set basedir as a property on the command line:

     ant -f build-tests.xml -Dbasedir=C:/apps/exist benchmark

The other options are to be set in build-tests.properties!

The actual queries for testing/benchmarking are written in cmd-test.xml
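
As a companion to the first queries in step 6, here is a minimal Python sketch that assembles the two example URLs. The base URL and the parameter names (q, operation, collection) are copied from the examples in step 6. The helper names build_query_url, query_model_url and get_collections_url are made up for illustration and are not part of the repository. Adjust host and port to your own setup.

```python
from urllib.parse import urlencode

# Assumed default, copied from the example URLs in step 6; adjust host/port as needed.
BASE_URL = "http://localhost:8680/exist/rest/db/clarin/cmd-model.xql"


def build_query_url(base_url=BASE_URL, **params):
    """Return base_url with params appended as a URL-encoded query string."""
    query = urlencode(params)
    return f"{base_url}?{query}" if query else base_url


def query_model_url(q, base_url=BASE_URL):
    # queryModel is the default operation, so only the q parameter is passed.
    return build_query_url(base_url, q=q)


def get_collections_url(collection="", base_url=BASE_URL):
    # An empty collection value mirrors the second example URL in step 6.
    return build_query_url(base_url, operation="getCollections", collection=collection)


if __name__ == "__main__":
    print(query_model_url("Components"))
    print(get_collections_url())
```

To actually send a request, the resulting URL can be passed to urllib.request.urlopen. Since first-time queries may take a while, a generous timeout is advisable, and the request should not be re-sent while the first one is still running.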