ISOcat in daily life
possible uses of a large repository of widely used linguistic concepts
LREC 2012 tutorial

This ISOcat tutorial will take place on the morning (9:00 - 13:00) of Monday 21 May 2012 at LREC 2012 in Istanbul.

News

UPDATE: The presentations and exercises/answers are now linked from the tutorial agenda. Thanks to all participants for a tutorial with lively discussions!

Reminder: Register before Sunday 20 May at www.isocat.org (you’ll need your credentials to log in to the ISOcat “testbed”)

Location: Dolmabahçe A, main building first floor, Istanbul Lütfi Kırdar Convention & Exhibition Centre (ICEC), Harbiye 34267, Istanbul, Turkey

Introduction

Language resources are a very valuable asset, whether they come with metadata alone or with one or several types of annotation (PoS, syntax, semantics, ...). They are valuable not only now, as the basis for new scientific publications, but also in the future, when new research might need to reassess previous findings. Relating annotation or metadata schemes requires information on these schemes, preferably in relation to each other. This need arises, for example, when comparing two syntactically annotated corpora, when adding a layer of annotation that makes use of a previous one, or when comparing various instantiations of the 'same' annotation scheme (which may have been adapted over time). ISOcat can be used for such a task, alone or in combination with RELcat and SCHEMAcat. They enable you to specify relations between (parts of) schemes.

ISOcat is a linguistic concept database developed by ISO TC 37 to provide reference semantics for annotation schemata. Both features/attributes and values are covered, and all of them are designated as Data Categories (DCs) in the context of the repository. Some of the DC specifications in ISOcat will be standardized, meaning that their use is promoted. These items will mainly be descriptions that are general enough to be used by as many users as possible. In some cases, however, the theoretical background of an annotation scheme presupposes different definitions from those available, which necessitates the creation of new, perhaps closely related DCs. New DCs should be related to each other and to the existing definitions. This can be achieved using RELcat and SCHEMAcat, two other registries designed to express relations between specified DCs, which are referenced using the persistent identifiers of the respective ISOcat DCs.

Aims

This tutorial will teach how to deal with DCs in ISOcat. It will provide hands-on experience with the ISOcat web interface (viewer and editor), and show how to find existing DCs, how and when to create new ones, how to work with DC Selections (DCSs; coherent, defined sets of DCs), etc.

Instructional sessions will alternate with practical ones as well as with reports from experienced users, who will explain why they make use of ISOcat and what its benefits are. After the tutorial the participants should be able to decide when an existing DC can be reused, when a new one is to be defined and how this new DC relates to the existing ones. They will also be able to construct new DCSs as well as new DCs. During the tutorial the participants will make use of the ISOcat "testbed", which will enable them to experiment without doing any harm to the "real" ISOcat.

The experienced users giving reports are:

  • Matej Durco, ICLTT Austrian Academy of Sciences, CMDI expert [on relating metadata]
  • Irina Nevskaya, U.Frankfurt & F.U.Berlin, expert ISOcat-user (RELISH, MDF, GOLD) [on adding schemes into ISOcat]
  • Franca Wesseling, Meertens Instituut, Royal Netherlands Academy of Arts and Sciences, expert ISOcat-user (EdiSyn project) [on using ISOcat to relate corpora]
  • Sue Ellen Wright, Kent State University, convenor ISO 12620:2009 and chair DCR Board [on ISOcat itself]

Agenda

9:00 – 9:20 Welcome and introduction to ISOcat Sue Ellen/Menzo
9:20 – 9:30 Prologue: the CGN showcase Ineke
9:30 – 10:00 Hands-on: ISOcat basics (exercises and answers) all
10:00 – 10:30 Hands-on: creating Data Category selections all
10:30 – 11:00 coffee
11:00 – 11:10 Data Category specifications Menzo
11:10 – 11:40 Hands-on: creating Data Category specifications all
11:40 – 11:55 Known problems (do and don’t) Ineke
11:55 – 12:05 Epilogue: the CGN showcase Ineke
12:05 – 12:20 Connecting Corpora and ISOcat Franca
12:20 – 12:30 Beyond ISOcat Menzo
12:30 – 12:45 The RELISH MDF and GOLD crosswalk Irina
12:45 – 13:00 Semantic mapping in CMDI Matej

Prerequisites for the hands-on sessions:

  1. Bring your own laptop (we’ll use the LREC internet connection or a special purpose local wireless network)
  2. Register before Sunday 20 May at www.isocat.org (you’ll need your credentials to log in to the ISOcat “testbed”)

Registration

Register at LREC 2012 and indicate participation in the ISOcat tutorial

Contact

For any additional information feel free to contact the organizers:

  1. Ineke Schuurman (KULeuven & Utrecht University) ISOcat content coordinator CLARIN-NL
  2. Menzo Windhouwer (Max Planck Institute for Psycholinguistics, Nijmegen) developer of ISOcat and RELcat