DBlexipedia: multilingual lexicalizations for properties of the DBpedia ontology
DBlexipedia provides a multilingual lexicon for the DBpedia ontology, built with existing automatic methods for lexicon induction. It aims to provide a hub for the lexical Semantic Web: an ecosystem in which ontology lexica are published, linked, and reused across applications.
“We continue to be faced with the dual challenges of a somewhat nebulous understanding of the manifestation of a Semantic Web, and the consequences of not accepting its inevitability” (OpenLink software blog: Semantic-search-engine-optimization).
The Semantic Web (or Web of Data) “refers to W3C’s vision of the Web of linked data. Semantic Web technologies enable people to create data stores on the Web, build vocabularies, and write rules for handling data. Linked data are empowered by technologies such as RDF, SPARQL, JSON-LD, OWL, and SKOS” (W3C).
The aforementioned Semantic Web formalisms provide crucial advantages for the publication of linguistic resources (lexical-semantic resources, annotated corpora, metadata repositories and typological databases), and for linking them (using RDF triples) with other resources available in the Linked Open Data (LOD) Cloud.
A prominent example of such a resource on the Semantic Web is DBpedia, one of the major free datasets in the Web of Data. It provides a public data infrastructure for a large, multilingual, semantic knowledge base extracted from Wikipedia.
DBpedia makes the content of Wikipedia available in RDF and also links it to other datasets on the Web, e.g., Geonames. This gives better access to the data and, by delivering semantically enriched free and open data, a better user experience. The lexical data contained in DBpedia are crucial for expanding the Linguistic Linked Open Data (LLOD) cloud, the sub-cloud of the LOD cloud dedicated to linguistic resources.
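Because DBpedia publishes its RDF through a public SPARQL endpoint, its data can be queried over plain HTTP. The sketch below builds a GET request URL for the endpoint and extracts values from the standard SPARQL 1.1 JSON results format. It assumes the endpoint at https://dbpedia.org/sparql and the `format` request parameter (a convention of the OpenLink Virtuoso server that hosts DBpedia); the query itself is only an illustration.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Public DBpedia SPARQL endpoint (served by OpenLink Virtuoso).
DBPEDIA_SPARQL = "https://dbpedia.org/sparql"

# Illustrative query: the birth place(s) DBpedia records for Albert Einstein.
QUERY = """PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dbr: <http://dbpedia.org/resource/>
SELECT ?place WHERE { dbr:Albert_Einstein dbo:birthPlace ?place }"""


def build_query_url(sparql: str, endpoint: str = DBPEDIA_SPARQL) -> str:
    """Encode a SPARQL query as an HTTP GET URL that requests JSON results."""
    params = {
        "query": sparql,
        # Virtuoso reads the desired result MIME type from 'format'.
        "format": "application/sparql-results+json",
    }
    return f"{endpoint}?{urlencode(params)}"


def values_for(results: dict, var: str) -> list:
    """Collect the values bound to `var` in a SPARQL 1.1 JSON results document."""
    return [b[var]["value"] for b in results["results"]["bindings"] if var in b]


def run_query(sparql: str) -> dict:
    """Send the query over the network and parse the JSON response."""
    with urlopen(build_query_url(sparql), timeout=30) as response:
        return json.load(response)
```

With network access, `values_for(run_query(QUERY), "place")` returns the URIs of the matching DBpedia resources. Keeping URL construction separate from the network call makes the request easy to inspect or log before it is sent.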
Exploiting DBpedia in natural language processing (NLP) applications is challenging, because such applications need to know how ontology elements are verbalized in natural language: for instance, that the property dbo:spouse can be expressed in English as "married to" or "wife of".
To provide such knowledge at the required scale, and thereby facilitate the use of DBpedia in different applications, a lexicon named DBlexipedia was constructed (originally for the DBpedia 2014 ontology) using existing automatic methods for lexicon induction. DBlexipedia is considered to be the first multilingual, automatically generated lexicon for DBpedia.
Just as DBpedia provides a hub for Semantic Web datasets, DBlexipedia aims to provide a hub (or nucleus) for a multilingual lexical Semantic Web: an ecosystem in which ontology lexica are published, linked, and reused across applications.
DBlexipedia provides English, German and Spanish lexicalizations for over 600 properties of the DBpedia ontology, and is a resource that can support NLP-based applications over the Semantic Web.
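To make the role of such lexicalizations concrete, the sketch below performs a lookup that a question-answering component might need: finding the DBpedia properties whose verbalizations occur in a question, in English, German or Spanish. The lexicon is a small hand-written, hypothetical stand-in (its phrases are illustrative and not taken from DBlexipedia), and plain whole-word phrase matching replaces the more robust linguistic matching a real system would use.

```python
import re

SPOUSE = "http://dbpedia.org/ontology/spouse"
BIRTH_PLACE = "http://dbpedia.org/ontology/birthPlace"

# Hypothetical toy lexicon: language -> verbalization -> DBpedia property.
# The phrases are illustrative only, not entries taken from DBlexipedia.
TOY_LEXICON = {
    "en": {"married to": SPOUSE, "wife of": SPOUSE, "born in": BIRTH_PLACE},
    "de": {"ehefrau von": SPOUSE, "geburtsort von": BIRTH_PLACE},
    "es": {"esposa de": SPOUSE, "lugar de nacimiento de": BIRTH_PLACE},
}


def candidate_properties(text: str, lang: str, lexicon: dict = TOY_LEXICON) -> set:
    """Return the properties whose verbalizations occur in `text` as whole words."""
    # Lowercase, drop punctuation (e.g. '?' and Spanish '¿'), collapse whitespace.
    normalized = " ".join(re.findall(r"\w+", text.lower()))
    padded = f" {normalized} "
    found = set()
    for phrase, prop in lexicon.get(lang, {}).items():
        # Surrounding spaces stop "born in" from matching inside "reborn internet".
        if f" {phrase} " in padded:
            found.add(prop)
    return found
```

For example, `candidate_properties("Who is Barack Obama married to?", "en")` yields only the dbo:spouse URI. Fixed phrases break easily, though: in the German question "Mit wem ist Barack Obama verheiratet?" the relevant words are split apart, which is one reason lexicon models such as lemon can describe syntactic behaviour and not only fixed strings.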
The lexicalizations published by DBlexipedia were acquired automatically with M-ATOLL, a framework for the lexicalization of ontologies in multiple languages that extracts valid lexicalizations (in the lemon format) for a given ontology with respect to a text corpus.
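In lemon, a lexicalization is modelled as a lexical entry whose canonical form carries the written representation (with a language tag) and whose sense refers to the ontology element being verbalized. The sketch below serializes such a minimal entry as Turtle using only string formatting. It assumes the original lemon namespace (http://lemon-model.net/lemon#) and uses a hypothetical entry URI under http://example.org/; entries in a full lexicon can also carry part-of-speech and syntactic information, which this sketch omits.

```python
# Assumed namespace of the original lemon model.
LEMON = "http://lemon-model.net/lemon#"


def lemon_entry(entry_uri: str, written_rep: str, lang: str, reference: str) -> str:
    """Serialize a minimal lemon lexical entry (canonical form + sense) as Turtle."""
    # Escape backslashes first, then double quotes, so the literal stays valid.
    literal = written_rep.replace("\\", "\\\\").replace('"', '\\"')
    return (
        f"@prefix lemon: <{LEMON}> .\n\n"
        f"<{entry_uri}> a lemon:LexicalEntry ;\n"
        f"    lemon:canonicalForm [ lemon:writtenRep \"{literal}\"@{lang} ] ;\n"
        f"    lemon:sense [ lemon:reference <{reference}> ] .\n"
    )
```

Calling `lemon_entry("http://example.org/lexicon/married_to", "married to", "en", "http://dbpedia.org/ontology/spouse")` produces an entry linking the English phrase "married to" to dbo:spouse. A production pipeline would more likely use an RDF library to build and validate such graphs instead of formatting strings by hand.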
Both DBlexipedia and M-ATOLL are developed by the Semantic Computing Group at CITEC, Bielefeld University.
To publish its data, DBlexipedia uses the YUZU framework, a micro-framework for publishing linked data for a variety of purposes.
Statistics about the data published by DBlexipedia are also available.
DBlexipedia has been released into the public domain as a Creative Commons Zero (CC0) resource: you can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission.
A complete download of this resource is also available.
For further questions, please contact the main developer, Sebastian Walter, a research associate in Prof. Dr. Philipp Cimiano's Semantic Computing Group at CITEC, Bielefeld University.
You might also be interested in:
- Open Knowledge Foundation’s Working Group on Open Data in Linguistics
- Ontotext (a complete set of semantic technologies enabling better content management, knowledge discovery and semantic search)
- Download DBpedia (2016-04) datasets in a variety of RDF-document formats
- OpenLink Software (cross-platform product portfolio addressing data access, integration, and management technology)
- Interchanging lexical resources on the Semantic Web
- W3C Open (standards) Web Platform
- A Web of People and Machines: W3C Semantic Web Standards