LIME: A Linguistic Metadata vocabulary

Metadata is increasingly recognized as fundamental to the Linked Open Data world: it is what makes possible a web of networked information sources that are not only readable and understandable, but also discoverable and exploitable by machines according to well-defined plans and strategies.

While RDF and the other W3C modeling vocabularies offer a language for expressing knowledge, and SPARQL provides a means to access and query it, machines still need to know how to find that knowledge and, more importantly, to understand when they need it. Metadata plays a critical role in satisfying these needs: it characterizes the salient aspects of an ontology, or of a dataset in general, even before it is accessed, thus providing a sort of “advertisement” of its existence and a how-to for its use.

In the context of the OntoLex W3C community group – which realized a model for interfacing Ontologies and Lexicons – we have developed a metadata vocabulary, LIME (LInguistic MEtadata), that describes the lexical asset of ontologies and RDF datasets.

The OntoLex specification defines an evolved lexicon model for ontologies, but the LIME vocabulary is suitable for describing the asset of any kind of lexicalization. The description of this asset consists of qualitative and quantitative information about the lexical realization of the described dataset. Relevant metadata includes the list of natural languages used to lexicalize the dataset, the lexical models adopted to provide the lexicalizations (e.g. rdfs:label, the SKOS or SKOS-XL labeling properties, or OntoLex itself) and statistics about the coverage of the dataset elements by the lexical entries for each given language.
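To make the quantitative side of this description concrete, here is a minimal sketch of how per-language coverage statistics could be computed. Everything in it is an assumption made for illustration: the toy dataset, the function name lexical_coverage and the dictionary keys are hypothetical, and the keys are only loosely modeled on the kind of statistics described above, not taken from the LIME specification.

```python
from collections import defaultdict

# Hypothetical toy dataset: concept identifier -> list of (label, language tag).
LABELS = {
    "ex:cat": [("cat", "en"), ("gatto", "it"), ("micio", "it")],
    "ex:dog": [("dog", "en")],
    "ex:bird": [],  # a concept with no label in any language
}


def lexical_coverage(labels):
    """Return per-language coverage statistics for a labelled dataset.

    The keys of the returned dictionaries are illustrative names,
    not terms of the LIME vocabulary.
    """
    total = len(labels)
    lexicalizations = defaultdict(int)  # language -> number of labels
    covered = defaultdict(set)          # language -> concepts with >= 1 label
    for concept, pairs in labels.items():
        for _text, lang in pairs:
            lexicalizations[lang] += 1
            covered[lang].add(concept)
    return {
        lang: {
            # concepts that have at least one label in this language
            "references": len(covered[lang]),
            # total number of labels in this language
            "lexicalizations": lexicalizations[lang],
            # share of all concepts that are lexicalized in this language
            "percentage": len(covered[lang]) / total,
            # labels in this language per concept of the dataset
            "avg_lexicalizations": lexicalizations[lang] / total,
        }
        for lang in sorted(lexicalizations)
    }


stats = lexical_coverage(LABELS)
for lang, values in stats.items():
    print(lang, values)
```

In this sketch both averages are taken over all concepts of the dataset, including unlabelled ones; whether that matches the definitions used by a given metadata vocabulary should be checked against its specification.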

As providing a proper lexical characterization of the described resources has traditionally been one of the main concerns of thesaurus development, we encourage the adoption of the LIME vocabulary for thesauri, to augment their visibility, improve their accessibility and qualify their lexical characterization. LIME builds on top of, and complements, existing metadata vocabularies such as VoID, Dublin Core, the PROV ontology or DCAT.

The specification of the LIME vocabulary is currently available in a dedicated section of the OntoLex model specification on the OntoLex wiki. The ART Team at the University of Rome “Tor Vergata” is also developing automatic metadata generators that will be embedded in the Semantic Turkey RDF management system and in the collaborative thesaurus editing platform VocBench, developed together with FAO.

More services are coming in the near future. They will exploit the lexical metadata exposed by existing datasets and by lexical resources ported to RDF, with the aim of supporting the work of thesaurus editors through better interconnectivity with the LOD, semi-automated alignment capabilities and support for the multilingual evolution of their thesauri.

So, what are you waiting for? Squeeze some LIME on your thesaurus!

Armando Stellato and Manuel Fiorelli

ART Research Group, University of Rome, Tor Vergata

P.S.

A paper about LIME was presented at the Research Track of the 12th Extended Semantic Web Conference (ESWC), 31 May – 4 June 2015, Portorož, Slovenia.

Manuel Fiorelli, Armando Stellato, John P. McCrae, Philipp Cimiano and Maria Teresa Pazienza. LIME: The Metadata Module for OntoLex. In: The Semantic Web. Latest Advances and New Domains (Fabien Gandon, Marta Sabou, Harald Sack, Claudia d’Amato, Philippe Cudré-Mauroux and Antoine Zimmermann, eds.), Lecture Notes in Computer Science, vol. 9088, pp. 321-336. Springer International Publishing, 2015. doi:10.1007/978-3-319-18818-8_20