Nearly all repository managers aim to offer a discovery and access interface for scientific research. In a recent study, @tmire examined the indexing of repositories by Google Scholar. Google's Inclusion Guidelines for Webmasters recommend the use of EPrints, Digital Commons and DSpace. Configuration issues, such as which metadata fields are used and how the repository exposes them, strongly affect compliance with these guidelines.
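To make the point about exposed metadata fields concrete, the following is a minimal sketch in Python that checks whether an item page carries the Highwire Press style `citation_*` meta tags described in Google Scholar's inclusion guidelines. The helper name `missing_citation_tags`, the sample page, and the choice of exactly these four tags as the expected set are assumptions made for illustration; they are not taken from the study.

```python
from html.parser import HTMLParser

# Meta tag names described in Google Scholar's inclusion guidelines.
# Treating exactly this set as "expected" is an assumption for this sketch.
EXPECTED_TAGS = {
    "citation_title",
    "citation_author",
    "citation_publication_date",
    "citation_pdf_url",
}


class MetaTagCollector(HTMLParser):
    """Collects the name attribute of every <meta> tag in a page."""

    def __init__(self):
        super().__init__()
        self.names = set()

    def handle_starttag(self, tag, attrs):
        if tag == "meta":
            name = dict(attrs).get("name")
            if name:
                self.names.add(name.lower())


def missing_citation_tags(html):
    """Return the expected citation tags that the page does not expose."""
    collector = MetaTagCollector()
    collector.feed(html)
    return sorted(EXPECTED_TAGS - collector.names)


# Hypothetical item page that exposes only part of the expected metadata.
sample_page = """<html><head>
<meta name="citation_title" content="An example item">
<meta name="citation_author" content="Doe, Jane">
<meta name="DC.date" content="2013">
</head><body></body></html>"""

print(missing_citation_tags(sample_page))
# prints ['citation_pdf_url', 'citation_publication_date']
```

A check like this only confirms that the tags are present in the page markup. It says nothing about whether Google Scholar has actually crawled or indexed the item.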
Methodology and results
Google Scholar crawls repositories automatically, without any intermediary action by repository staff. The challenge is that repository managers are left in the dark, not knowing why their repository or specific items are not being crawled and indexed. The study applied the approach of Arlitsch and O'Brien and found that the average indexing ratio for 10 recent DSpace repositories was 64.8%. A second approach was used to get a sense of which items are included and which are missing: the authors selected items from five repositories and searched explicitly for their titles in Google Scholar to see whether the search returned a hit from the repository. Older items were more likely to be found than newer ones.
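As a rough illustration of the arithmetic behind an indexing ratio, the sketch below divides the number of items found in Google Scholar by the total number of items in the repository, then averages the ratios across repositories. The repository names and counts are invented for the example and are not data from the study. Averaging the per-repository ratios, as opposed to pooling all items, is also an assumption, since the summary does not say how the 64.8% figure was aggregated.

```python
def indexing_ratio(indexed_items, total_items):
    """Share of a repository's items that were found in the index."""
    if total_items <= 0:
        raise ValueError("total_items must be positive")
    return indexed_items / total_items


def average_ratio(repositories):
    """Unweighted mean of the per-repository indexing ratios.

    `repositories` maps a repository name to (indexed_items, total_items).
    """
    ratios = [indexing_ratio(indexed, total)
              for indexed, total in repositories.values()]
    return sum(ratios) / len(ratios)


# Hypothetical counts, for illustration only.
example_repositories = {
    "repo_a": (500, 1000),  # ratio 0.50
    "repo_b": (300, 400),   # ratio 0.75
}

print(f"{average_ratio(example_repositories):.1%}")
# prints 62.5%
```

An unweighted mean gives small and large repositories equal influence. A pooled ratio (total indexed items over total items) would weight large repositories more heavily and would generally give a different figure.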
The study concluded that Google Scholar indexing is still very much a black box, so improving repository coverage could be particularly challenging for repository managers. Google Scholar indexing and the associated ratios are likely to improve further for DSpace 4 repositories, because this recent release of DSpace included several enhancements explicitly requested by the Google Scholar team. The study postulated that the Google Scholar crawler 'should' find it easier to retrieve recent submissions in DSpace 4 repositories.
Read the full article here
- Arlitsch, K. and O’Brien, P. (2011) Invisible institutional repositories: addressing the low indexing ratios of IRs in Google. Library Hi Tech, Vol 30 (1): 60-81.
- Subirats-Coll, I., Malapela, T., Dister, S., Zeng, M., Goovaerts, M., Pesce, V., Jaques, Y., Anibaldi, S. and Keizer, J. (2012) Reorienting open repositories to the challenges of the Semantic Web: Experiences from FAO’s contribution to the resource processing and discovery cycle in repositories in the agricultural domain. In 6th Metadata and Semantics Research Conference (MTSR 2012), Cádiz, Spain, 28-30 November 2012.