Thematic Domain Groups

 

ISO TC 37 Terminology and Other Language and Content Resources

Group

Profile name

Chair

TDG 1

Metadata

Peter Wittenburg

Description: TDG 1 for Metadata manages a profile that includes the set of data categories that can be used to describe language data resources, web services and applications with keyword-type metadata. It supports all linguistic data types that are required in the emerging eScience scenario, such as speech and multimedia recordings, written resources, annotations, lexicons, data category registries and ontologies, schemas, tools of different sorts and many others. The data categories included are partly taken from existing sets that have proven successful in the past. Because the field is changing rapidly, new data categories have been added, and will continue to be added, to offer the necessary semantic scope. In addition, the requirements of sub-disciplines will be accommodated. Users will be able to make their own Data Category Selections to describe their resources appropriately. The intention of these descriptions is to foster visibility and re-usability.

TDG 2

Morphosyntax

Gil Francopoulo

Description: TDG 2 for Morphosyntax manages a profile that includes the set of data categories used for morphosyntactical markup in language resources, as well as one or more Data Category Selections used in this area. Morphosyntax involves those elements and patterns of morphology (word formation) that reflect syntactical or grammatical functions, such as inflections and other paradigmatic elements whereby word forms change in adaptation to their usage.

For example, in German the infinitive-forming particle zu is inserted between some verb prefixes and the root form of the verb instead of simply preceding the verb in the sentence, which is the more usual position:
übersetzen (inseparable prefix), meaning to translate: I read the text in order to translate it. Ich habe den Text gelesen, um ihn zu übersetzen.
übersetzen (separable prefix), meaning to transport to the opposite side of something: I came to transfer the boat to the opposite side of the river. Ich kam, um das Boot auf die andere Seite des Flusses überzusetzen.
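
The placement of zu in the two examples above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the function name zu_infinitive and its separable_prefix parameter are hypothetical, and the caller is expected to know whether the prefix is separable, since the spelling alone (as with übersetzen) does not decide it.

```python
def zu_infinitive(verb: str, separable_prefix: str = "") -> str:
    """Return the zu-infinitive of a German verb (illustrative sketch).

    separable_prefix is the verb's separable prefix, or "" when the verb
    has no prefix or its prefix is inseparable. Deciding separability is
    left to the caller; it cannot be read off the written form.
    """
    if separable_prefix:
        if not verb.startswith(separable_prefix):
            raise ValueError(f"{verb!r} does not start with {separable_prefix!r}")
        # Separable prefix: zu goes between the prefix and the root form.
        root = verb[len(separable_prefix):]
        return f"{separable_prefix}zu{root}"
    # Otherwise zu simply precedes the verb as a separate word.
    return f"zu {verb}"


print(zu_infinitive("übersetzen"))          # to translate: zu übersetzen
print(zu_infinitive("übersetzen", "über"))  # to ferry across: überzusetzen
```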

TDG 3

Semantic Content Representation

Harry Bunt

Description: TDG 3 for Semantic Content Representation manages a profile that includes the set of data categories that can be used for semantic markup in language resources, as well as several Data Category Selections used in this area. The markup of all types of language use is supported, including written text, multimedia messages and resources, and recordings of spoken and multimodal interaction. Not only individual data categories will be defined, but also principles for structuring comprehensive sets of semantic data categories, for example in the form of multidimensional annotation schemas; for the addition of elements or structures to such sets; and for selecting coherent subsets.

  Activity 1

Discourse Relations

Koiti Hasida

Description: TDG 3, Activity 1 manages a profile that includes a set of reference categories designed to qualify semantic and pragmatic relations among text segments representing events and states (and their classes), in particular from the point of view of argumentation or rhetorical structures.

  Activity 2

Dialogue Acts

Harry Bunt

Description: TDG 3, Activity 2 focuses on the identification and definition of reference data categories for characterizing the communicative functions of utterances in spoken, written, or multimodal dialogue involving two or more participants.

  Activity 3

Referential Structures and Links

Laurent Romary

Description: This activity focuses on the identification of data categories needed for the qualification of referential markables in written or spoken discourse, as well as those that may be used in defining referential links between such markables.

  Activity 4

Logico-semantic Relations

Scott Farrar

Description: TDG 3, Activity 4 will collect canonical logico-semantic relations. A useful relation is defined as one that can be used to describe the relations in a semantic system of a particular language. Relations include discourse-level, mereological and temporal relations, e.g., 'elaboration', 'part', and 'before', respectively. In particular, the TDG is interested in relations that are common to many different languages. Starting points will be data category concepts related to ontologies such as the Generalized Upper Model and FrameNet. These relations should be useful in computational implementations, e.g., new ontologies and reasoning systems.

  Activity 5

Temporal Entities and Relations

Kiyong Lee

Description: TDG 3, Activity 5, the temporal semantic content profile in the DCR, includes a global set of all data categories known to be used in the management of temporally-oriented language resources that support logical and ordinary inferences related to time and events in ordinary discourse or specialized disciplines. In particular, it includes a set of data categories specifying temporal entities such as instants, intervals, and durations or periods, and temporal relations such as precedence or overlap. It also deals with grammatical categories such as /tense/, /aspect/, /modality/, and /mood/ and their associated attribute-values that reflect on various semantic features of time and events in written text or spoken discourse. The application of these categories ranges from the basic annotation task of constructing useful language resources to the high-level design, development, and evaluation of interoperable efficient information retrieval and question-answering systems or inference engines for computational purposes.
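
As a rough illustration of the temporal entities and relations mentioned above, the following Python sketch models intervals (with instants as zero-length intervals) and the relations of precedence and overlap. The class and function names are hypothetical and are not taken from the DCR; treating an interval that ends exactly where another begins as preceding it, rather than overlapping it, is likewise an assumption made for this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """A temporal interval; an instant is an interval with start == end."""
    start: float
    end: float

    def __post_init__(self):
        if self.end < self.start:
            raise ValueError("end must not precede start")

    @property
    def duration(self) -> float:
        return self.end - self.start


def precedes(a: Interval, b: Interval) -> bool:
    """True if a ends no later than b starts (assumed convention)."""
    return a.end <= b.start


def overlaps(a: Interval, b: Interval) -> bool:
    """True if a and b share some stretch of time."""
    return a.start < b.end and b.start < a.end


breakfast = Interval(8, 9)
meeting = Interval(10, 12)
call = Interval(11, 13)

print(precedes(breakfast, meeting))  # breakfast ends before the meeting starts
print(overlaps(meeting, call))       # the call starts during the meeting
print(meeting.duration)
```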

  Activity 6

Semantic Roles and Argument Structures

Thierry Declerck

Description: TDG 3, Activity 6, for Semantic Roles and Argument Structures manages a profile for a set of data categories related to semantic roles that some syntactic units (arguments) play with respect to the relevant function words contained in an utterance. Since there has been hardly any consensus on these topics in the linguistic community, this profile will most likely contain a relatively long list of concepts, with very detailed definitions that will probably underscore minimal divergence in the use of the basic concepts.

TDG 4

Syntax

Thierry Declerck

Description: TDG 4 for Syntax manages a profile including a global set of data categories that represent points of reference for particular tagsets used in the syntactic annotation of various languages. TDG 4 is associated with the SynAF standardization proposal (ISO CD 24615), which deals with the description of a metamodel for syntactic annotation involving elementary linguistic (in fact syntactic) abstractions. This TDG aims to establish consensus regarding definitions assigned to a set of data categories for constituency and dependency annotation, along with more generic data categories that underlie both. These represent the primary classes of categories identified in SynAF:

· Basic generic data categories common to all kinds of syntactic annotation (concepts such as /tagging/, /parsing/, or /syntacticFeature/)

· Constituency-related data categories (concepts such as /chunk/, /phrase/, or /clause/, etc.)

· Dependency-related data categories (concepts such as /modifier/, /complementizer/, or /subject/, etc.)

TDG 6

Language Resource Ontology

Koiti Hasida

Description: TDG 6 for Language Resource Ontology supports the 'ontologization' (reformulation in ontological terms) of standards developed in TC 37, primarily from SC4. This involves the conversion of mostly XML-based annotations to RDF-based descriptions and of annotation schemas to ontologies for the sake of interoperability with each other and with external specifications and ontologies, reflecting the fuller formalization of those standards. This activity has the potential to create a semantic extension of the DCR.
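
The conversion from XML-based annotations to RDF-based descriptions mentioned above can be pictured with a minimal Python sketch. Everything specific here is assumed for illustration only: the example.org namespace, the token element with its id, pos and form attributes, and the function name xml_to_ntriples do not come from any TC 37 standard. Each XML attribute is simply turned into one triple in N-Triples syntax.

```python
import xml.etree.ElementTree as ET
from typing import List

# Hypothetical namespace used only for this illustration.
BASE = "http://example.org/annotation/"


def escape_literal(value: str) -> str:
    """Escape backslashes and double quotes for an N-Triples literal."""
    return value.replace("\\", "\\\\").replace('"', '\\"')


def xml_to_ntriples(xml_text: str) -> List[str]:
    """Turn each attribute of each <token> element into one RDF triple."""
    root = ET.fromstring(xml_text)
    triples = []
    for token in root.iter("token"):
        subject = f"<{BASE}{token.attrib['id']}>"
        for name, value in token.attrib.items():
            if name == "id":
                continue  # the id becomes the subject, not a property
            triples.append(
                f'{subject} <{BASE}{name}> "{escape_literal(value)}" .'
            )
    return triples


sample = '<annotation><token id="t1" pos="noun" form="Boot"/></annotation>'
for triple in xml_to_ntriples(sample):
    print(triple)
```

A real conversion would map attribute names to agreed data category identifiers rather than to an ad hoc namespace; the sketch only shows the change of shape from nested markup to subject, predicate and object statements.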

TDG 7

Lexicography

André Le Meur

Description: TDG 7 for Lexicography elaborates data categories and maintains the profile for lexicographical applications, in particular, for print dictionaries, as well as one or more Data Category Selections used in this area.

TDG 8

Language Codes

Håvard Hjulstad

Description: TDG 8 for Language Codes facilitates the inclusion of language codes from ISO 639 (all parts) in the framework of the DCR. These codes [will] reside in the ISO Concept Database and are maintained according to ISO Standards as Databases procedures (ISO Supplement to the ISO/IEC Directives, Annex ST). These codes also involve interaction with ISO 3166 country codes and ISO 15924 script codes.

TDG 9

Terminology

Sue Ellen Wright

Description: TDG 9 for Terminology manages a profile that includes a global set of all data categories known to be used in concept-oriented terminology databases designed to support written text and spoken discourse in specialized disciplines. The profile includes data categories for monolingual subject-field glossaries, concise translation-oriented terminology management, localization resources, language planning and standardization, and the representation of concept systems, among other objectives. Designers of individual applications would normally collect their own Data Category Selections as manageable subsets of this extensive inventory of items. The profile includes term-related data categories such as those for points of grammar and register, usage-related items like /context/, and concept-related and administrative data categories. Terminology in this sense differs from controlled vocabularies such as thesauri and classification systems commonly used for document and object retrieval, although there are similarities in naming labels and writing definitions.

The profile is likely to include a number of Data Category Selections, for instance those associated with the following:

· The TBX family of TermBase eXchange formats (including TBX-Basic) compliant with ISO 30042:2008

· The ISO/IEC format for terminological definitions used in ISO/IEC standards (ISO 10241)

· Translation-oriented terminology management as reflected in ISO 12616:2002

TDG 11

Multilingual Information Management

Samuel Cruz-Lara

Description: TDG 11 for Multilingual Information Management elaborates data categories and maintains the profile relevant for the description of multilingual information in translation memories, localization files or multimedia applications such as subtitling, karaoke or interactive TV. In particular, this profile should be used as a basis for referencing data categories relevant for standards such as TMX (LISA/OSCAR), XLIFF (OASIS) or SMILText (W3C), when it is necessary to elicit the interoperability conditions between them. It also provides generic data categories for the representation and qualification of multilingual information and as such is the basis for any implementation of the MLIF (Multilingual Information Framework, ISO CD 24616) standard. This profile comprises, among others, data categories relevant for source and target text categorization, multilingual tool identification, inline elements for presentational markup, synchronization data categories and linguistic segment categorization. This TDG will also maintain one or more standard Data Category Selections associated with these applications.

TDG 12

Lexical Resources

Nicoletta Calzolari

Description: TDG 12 for Lexical Resources manages a profile that includes the set of data categories that can be used to describe lexical resources of varying complexities, including all the different layers of linguistic description (phonology, morphology, syntax, semantics) needed for various language technology applications, and including phenomena such as multi-word expressions. Both written and spoken lexicons are considered. The TDG must rely on work done in previous initiatives involving many groups around the world in order to ensure a truly consensual approach. The profile must be used in connection with the Lexical Markup Framework (LMF), which is being used in a number of projects for many languages, including some Asian languages.

Most of the data categories will have been defined in other TDGs, such as morphosyntax, syntax, and lexical semantics. This means that work in this TDG should not repeat what has already been done. Instead, in close connection with these other TDGs, it should simply incorporate data categories that come from other profiles and that are appropriate for encoding lexicons, check their consistency, and add only those items that are specific to lexicon encoding rather than text annotation.

TDG 13

Lexical Semantics

Monica Monachini

Description: TDG 13 for Lexical Semantics manages a profile that includes the set of data categories used for the representation of lexical semantic information in NLP lexicons. To allow coordination and interoperability, TDG 13 must establish synergies with other TDGs (the Syntactic, Semantic, Semantic Role and Argument Structure, and Ontology profiles) and, particularly, with TDG 12, the Lexical Resource profile. The Lexical Markup Framework remains the privileged interface of TDG 13: its specific data categories are intended to be used in combination with structural elements of the LMF (ISO 24613:2008). Lexical semantic data categories will supplement the structural lexical objects of the abstract LMF metamodel, thus becoming part of its definition and constituting the vocabulary used to express lexico-semantic information. This will allow lexicographers to implement concrete LMF lexicons and will help the LMF model to gain operability and usability. The activity in TDG 13 also aims at investigating and defining the constraints governing the relationships of these data categories with the metamodel and its extensions, mainly semantic and multilingual extensions.

Typical objects of investigation in TDG 13 are the data categories for the lexical objects SenseRelation, Synset Relation and Predicate Relation, together with their definitions. These data categories are gathered by surveying best practices in semantic lexicon building: relations used in the framework of the Extended Qualia Structure to relate different senses (is_part_of; used_for; created_by, corresponding to ISOcat /isPartOf/; /usedFor/; /createdBy/); relations that link synsets in lexicons of the WordNet family (intra-WordNet relations, which link synsets of the same WordNet, and inter-WordNet relations, which link WordNets in a multilingual fashion: has_synonym; has_eq_hyperonym, corresponding to ISOcat /hasSynonym/; /hasEqHyperonym/); and relations between Semantic Predicates (or Frames in a Frame Semantic environment).

Domain information data categories fall within the realm of TDG 13 as well, since they are used to represent the domain of use of a word meaning: medicine, biology, informatics, engineering. There are points of contact between the activities in TDG 13 and in TDG 3, Activity 6, concerning the Semantic Roles used to specify a Semantic Argument with an indication of its deep function: agent, patient. There is also a relationship with TDG 6, concerning the ontological nodes that are used to fill the MonolingualExternalRef and MultilingualExternalRef objects, whose specific purpose is to align a meaning in the lexicon with a concept in a (shared) ontology. Ontological classes are surveyed in TDG 13 as possible descriptors to be assigned to a semantic predicate’s arguments in order to impose selectional restrictions, as happens in some well-known lexicon practices. This makes it possible to predict the possible fillers of a semantic role relation among those senses labelled with the same node: human, animal, food.

A mailing list dedicated to TDG 13 activity is available; at present it has 13 subscribers drawn from experts in the field.

This profile is likely to include a number of Data Category Selections used in different LMF-compliant lexicon instantiations: the BioLexicon, NEDO Lexicon, the KYOTO WordNet-LMF grid.