Automatic multi-label subject indexing in a multilingual environment

Abstract

This paper presents an approach to automatic subject indexing of full-text documents with multiple labels, based on binary support vector machines (SVMs). The aim was to test the applicability of SVMs on a real-world dataset. We have also explored the feasibility of incorporating multilingual background knowledge, as represented in thesauri or ontologies, into our text document representation for indexing purposes. The test set for our evaluations was compiled from an extensive document base maintained by the Food and Agriculture Organization (FAO) of the United Nations (UN). Empirical results show that SVMs are a good method for automatic multi-label classification of documents in multiple languages.
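
The multi-label setup described in the abstract, one binary SVM per subject label, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the toy documents, the label names (`CROPS`, `WATER`, `FORESTRY`), the binary bag-of-words features, and the simple hinge-loss subgradient trainer, which stands in for whatever SVM implementation the authors used.

```python
# Illustrative sketch only: one linear SVM per label (binary relevance).
# The data, label names, and training routine are hypothetical.

def tokenize(text):
    return text.lower().split()

def vectorize(text, vocab):
    # Binary bag-of-words vector over a fixed vocabulary.
    tokens = set(tokenize(text))
    return [1.0 if term in tokens else 0.0 for term in vocab]

def train_binary_svm(X, y, epochs=200, lr=0.1, lam=0.01):
    # Linear SVM trained by subgradient descent on the L2-regularized
    # hinge loss; y holds +1 / -1 targets.
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            score = sum(wj * xj for wj, xj in zip(w, xi)) + b
            w = [wj * (1.0 - lr * lam) for wj in w]  # L2 shrinkage
            if yi * score < 1.0:  # inside the margin or misclassified
                w = [wj + lr * yi * xj for wj, xj in zip(w, xi)]
                b += lr * yi
    return w, b

def train_multilabel(docs, label_sets):
    # Train one independent binary classifier per label.
    vocab = sorted({tok for doc in docs for tok in tokenize(doc)})
    X = [vectorize(doc, vocab) for doc in docs]
    labels = sorted(set().union(*label_sets))
    models = {}
    for label in labels:
        y = [1.0 if label in ls else -1.0 for ls in label_sets]
        models[label] = train_binary_svm(X, y)
    return vocab, models

def predict_labels(text, vocab, models):
    # A document receives every label whose classifier scores it positive.
    x = vectorize(text, vocab)
    return {
        label
        for label, (w, b) in models.items()
        if sum(wj * xj for wj, xj in zip(w, x)) + b > 0.0
    }

docs = [
    "rice wheat harvest",
    "irrigation water canal",
    "rice irrigation water",
    "forest timber",
]
label_sets = [{"CROPS"}, {"WATER"}, {"CROPS", "WATER"}, {"FORESTRY"}]

vocab, models = train_multilabel(docs, label_sets)
predictions = [predict_labels(doc, vocab, models) for doc in docs]
```

Because each label has its own classifier, a document can receive zero, one, or several labels. A multilingual variant would only change how `vectorize` builds its features, for example by mapping terms to language-independent thesaurus concepts, which is again an assumption about one possible design rather than a description of the paper's method.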

Lauser, Boris and Hotho, Andreas. Automatic multi-label subject indexing in a multilingual environment. In Proceedings of the 7th European Conference on Research and Advanced Technology for Digital Libraries (ECDL 2003), Trondheim (Norway), August 17-22, 2003. [Conference paper]