BASE – a powerful search engine for Open Access documents

Friedrich Summann

Bielefeld University Library


BASE (Bielefeld Academic Search Engine) is an OAI-PMH service provider that harvests as many relevant OAI repositories as we can identify. It currently covers more than 2,300 repositories from all around the world, holding metadata for more than 37 million documents.

Technically, BASE harvests the OAI-PMH interfaces of repositories, digital collections, and electronic journals from universities, research institutions and international organizations. Besides scientific publications, the metadata objects include research data, images, audio and video material, maps and related material. The BASE search engine thus covers all kinds of disciplines, with an emphasis on those that have a longer history of open access and electronic publishing.
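As an illustration of what harvesting an OAI-PMH interface involves, the following is a minimal sketch in Python, not BASE's actual harvester. It issues ListRecords requests for the oai_dc metadata format and follows resumption tokens until the repository reports no further pages. The function names (parse_list_records, http_get, harvest) and the record layout are assumptions made for this example; only the protocol verbs, arguments and XML namespaces come from the OAI-PMH specification.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

# XML namespaces defined by OAI-PMH 2.0 and Dublin Core.
OAI = "{http://www.openarchives.org/OAI/2.0/}"
DC = "{http://purl.org/dc/elements/1.1/}"


def parse_list_records(xml_text):
    """Return (records, resumption_token) for one ListRecords response."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iter(OAI + "record"):
        identifier = rec.findtext(f"{OAI}header/{OAI}identifier")
        titles = [el.text for el in rec.iter(DC + "title") if el.text]
        records.append({"identifier": identifier, "titles": titles})
    token_el = root.find(f".//{OAI}resumptionToken")
    # An empty or missing resumptionToken element marks the last page.
    has_token = token_el is not None and token_el.text
    token = token_el.text.strip() if has_token else None
    return records, token


def http_get(url):
    """Fetch a URL and return the body as text (hypothetical helper)."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return resp.read().decode("utf-8")


def harvest(base_url, fetch=http_get, metadata_prefix="oai_dc"):
    """Yield all records of a repository, following resumption tokens."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    while True:
        url = base_url + "?" + urllib.parse.urlencode(params)
        records, token = parse_list_records(fetch(url))
        yield from records
        if not token:
            break
        # resumptionToken is an exclusive argument: no metadataPrefix here.
        params = {"verb": "ListRecords", "resumptionToken": token}
```

The fetch parameter is injectable so that the paging logic can be exercised against canned responses. A real harvester would additionally need retries, error-response handling and tolerance for malformed XML, which this sketch leaves out.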

Since OAI-PMH interfaces vary widely in stability, protocol behavior and metadata quality, BASE has built up broad expertise in protocol handling and metadata processing over the years. We normalize and enrich the metadata in several processing steps in order to improve its quality. The BASE team shares this expertise in all kinds of OAI-PMH activities with the OA community.
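To give an idea of what a normalization step can look like, here is a small Python sketch. It is not the BASE pipeline: the field names, the tiny language table and the helper functions are all assumptions chosen for illustration. It collapses whitespace, removes duplicate values, maps free-form language values to three-letter codes and extracts a publication year from heterogeneous date strings.

```python
import re

# Hypothetical, deliberately tiny mapping; a real table would be far larger.
LANGUAGE_CODES = {
    "en": "eng", "eng": "eng", "english": "eng",
    "de": "deu", "ger": "deu", "deu": "deu", "german": "deu", "deutsch": "deu",
}


def clean_text(value):
    """Collapse runs of whitespace and trim the ends."""
    return re.sub(r"\s+", " ", value).strip()


def unique(values):
    """Drop duplicates while keeping the original order."""
    seen, result = set(), []
    for value in values:
        if value not in seen:
            seen.add(value)
            result.append(value)
    return result


def normalize_language(value):
    """Map a free-form language value to a code, or 'und' if unknown."""
    return LANGUAGE_CODES.get(value.strip().lower(), "und")


def extract_year(value):
    """Return the first plausible four-digit year (1500-2099), or None."""
    match = re.search(r"\b(1[5-9]\d{2}|20\d{2})\b", value)
    return int(match.group(1)) if match else None


def normalize_record(raw):
    """Normalize one record given as a dict of lists of strings."""
    titles = unique(clean_text(t) for t in raw.get("title", []) if t.strip())
    languages = unique(normalize_language(v) for v in raw.get("language", []))
    years = [extract_year(d) for d in raw.get("date", [])]
    years = [y for y in years if y is not None]
    return {
        "title": titles,
        "language": languages or ["und"],
        "year": min(years) if years else None,
    }
```

Taking the earliest year when several dates are present is one possible convention among others; a repository may equally mean the date of upload or of last modification.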

Since most repositories do not distinguish between open access material on the one hand and restricted material or mere references on the other, and since the difference cannot be determined automatically, an estimated 20 to 30 % of the entries in BASE have no linked full text. We intend to make this distinction more transparent and visible in the near future.

BASE tries to combine search engine functionality (including field search, truncation, sorting, search history, drill-down) with the advantages of bibliographic metadata. Some repositories deliver semantic information, especially keywords, subject headings and classification codes. We have built up a workflow that uses this information to establish a knowledge base for the categorization of BASE documents into DDC (Dewey Decimal Classification) categories. This allows automatic classification of documents that carry a sufficient amount of metadata (abstracts in particular) and supports subject-based browsing.
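The idea of deriving a knowledge base from records that already carry classification codes, and then using it to classify other documents, can be sketched as follows. This is a deliberately simple term-voting approach written in Python for illustration; it is an assumption of this example and not a description of the method BASE actually uses. The function names and the min_votes threshold are hypothetical.

```python
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lower-case the text and keep alphabetic terms of 3+ letters."""
    return re.findall(r"[a-z]{3,}", text.lower())


def build_knowledge_base(labelled):
    """Map each term to a count of the DDC classes it was seen with.

    `labelled` is an iterable of (text, ddc_class) pairs taken from
    records whose repositories already supply a DDC code.
    """
    kb = defaultdict(Counter)
    for text, ddc in labelled:
        for term in set(tokenize(text)):
            kb[term][ddc] += 1
    return kb


def classify(text, kb, min_votes=2):
    """Return the DDC class with the most term votes, or None.

    Documents with too little matching metadata stay unclassified.
    """
    votes = Counter()
    for term in set(tokenize(text)):
        votes.update(kb.get(term, {}))
    if not votes:
        return None
    ddc, count = votes.most_common(1)[0]
    return ddc if count >= min_votes else None


if __name__ == "__main__":
    training = [
        ("graph algorithms and complexity of sorting", "004"),
        ("distributed algorithms for databases", "004"),
        ("protein folding in cell biology", "570"),
        ("gene expression in cell cultures", "570"),
    ]
    kb = build_knowledge_base(training)
    print(classify("new sorting algorithms for large databases", kb))
    print(classify("medieval poetry", kb))
```

The threshold reflects the point made above that a document needs a certain amount of metadata before a class can be assigned with any confidence. A production system would also need stop-word handling, term weighting and a much larger training set.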

How to Join?

This presentation is part of a one-hour session entitled "Search Engines for Open Access Web Resources", which contains three short presentations and takes place on Thursday 25 October 2012 (4:00-5:00pm Rome time). To join this session, visit the main page for the Open Access Week @ AIMS webinars.