Rangelands West project: questions

Barbara Hutchinson is responsible for the Rangelands West project and she has the following questions / requirements:

PRELIMINARY QUESTIONS:

1)    Can we accommodate metadata from multiple partners without defining a standard?

· Introduced fields from future partners, i.e. map from one standard to another?

· Advantages/Disadvantages to modified Dublin Core vs. AGRIS AP – Why follow these standards?

· (NOTE: as of yesterday, we completed a draft of a tentative metadata format that incorporates aspects of D.C. as well as others – we can share if that would be of help)

2)    How do we implement AGROVOC controlled vocabulary?

· Can we just use a subset related to “rangelands”?

· Does Drupal need to be set up as a multilingual site to support AGROVOC?  Can we use only English terms for now?

· What challenges will we face when we want to use multilingual controlled vocabulary?

3)    Do we need to be OAI-PMH compliant or some other standard for interoperability?

4)    Should we consider incorporating automatic indexing of external websites (not entering metadata?)

5)    Need to consider options given we only have two part-time programmers available to implement system.


CORE FUNCTIONALITY FOR GLOBAL RANGELANDS

1)    DRUPAL PLATFORM (MySQL)

2)    Easily add, edit, update metadata from other partners – implies manual data entry or automatic (bulk) metadata

-  Access resource (links, phone, fax, email, and/or mail?)

3)    Harvest metadata from multiple sources, i.e. FAO full-text, journal archives,  current RangelandsWest metadata, future partners, CGIAR-related range resources

· Metadata source is easily identifiable (logo, branding)

4)    Store Digital Content (Images, Maps, PDF’s, Reports) (unrelated to metadata)

5)    Store Articles related to the metadata (Use Permission)

6)    Search capability that is “better than Google”

· Relevancy – discovery, avoid false positives

· Faceted Search

· Geographic, temporal

· Author, Date

· Fast!

7)    Browseable Categories (pre-canned searches)

8)    Browse content by format or some other category tbd (pre-canned searches)

9)    Make our metadata available

· Feeds, services (offer feeds of new content, etc.)

· File exports (BibTeX, RefWorks)

10) Other wish list items (partial)

a.      Customized home pages for local content (partner state sites)

b.     Social Networking applications (users engaged in development of site/forums etc)

c.      Options for mobile users; dial-up users?

d.     Print pages

e.      Leaving site notification

f.       Help tools

 

Tags:
AgriDrupal
metadata models
controlled vocabularies

12 May 2010
Valeria Pesce

Some answers to questions regarding Rangelands

Hi Barbara,
 
below I try to give my personal answers to some of your questions.
 
I don't think the real problem in your case is the metadata standard you are going to use (I would say the "metadata model" you design for your system): if this model is flexible/granular enough, you can probably map it to the most used metadata standards (see replies to your first questions). The real challenge in my opinion will be semantics: you can harvest all the metadata, map them to your model and store all records in a coherent way, but semantic indexing will be different in most of them and setting up semantic navigation / search will be tough.
I see that Matt and Wolfgang are "lobbying" for free tags, so that they can just accept what is coming from the sources: this is a very practical solution, but then you may have to work on this free tagging vocabulary to make some sense of it (Drupal has a "taxonomy manager" that can help with this).
Since this is a common problem for any project that harvests metadata from diverse sources, we are working on some "automatic indexing" implementations (see answer to your question n. 4).

Is there any chance that the majority of your sources use NALT? (The BIG advantage of Agris is the fact that all sources use Agrovoc!)

 
PRELIMINARY QUESTIONS:

1)    Can we accommodate metadata from multiple partners without defining a standard?

In my opinion yes. Technically, it depends on the harvesting technology.
In general, the important thing in my opinion is defining a "metadata model" (independent of any serialization/vocabulary/syntax) in your system: in Drupal terms (for your team), a content type that defines a "document" or a "resource" in a very granular way, so that the single elements can be easily mapped to XML elements from any format.

As for technology, with RSS harvesting you can harvest from different sources and map their different metadata elements to the metadata elements defined in your system (in Drupal: the Feeds module). With XML, importing from different formats requires some work but John Fereira (copied) can explain how to do it with the Feeds module.
With OAI harvesting, theoretically you can harvest records in any metadata format, but we haven't seen this yet with Drupal.
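
To make the mapping idea concrete, here is a minimal Python sketch (illustrative only, not AgriDrupal or Feeds module code). The internal field names, the two source formats and the element-to-field mappings are assumptions chosen for the example; a real model would have more elements.

```python
# Illustrative sketch: all field names and mappings below are assumptions,
# not taken from any real partner feed or from AgriDrupal.

# Internal metadata model: the granular elements the system stores.
INTERNAL_FIELDS = ("title", "creators", "date", "subjects", "url", "source")
MULTI_VALUED = {"creators", "subjects"}

# One mapping per source format: external element name -> internal field.
MAPPINGS = {
    "dc": {
        "dc:title": "title",
        "dc:creator": "creators",
        "dc:date": "date",
        "dc:subject": "subjects",
        "dc:identifier": "url",
    },
    "rss": {
        "title": "title",
        "author": "creators",
        "pubDate": "date",
        "category": "subjects",
        "link": "url",
    },
}


def to_internal(record, source_format, source_name):
    """Map one harvested record onto the internal model."""
    mapping = MAPPINGS[source_format]
    internal = {f: ([] if f in MULTI_VALUED else None) for f in INTERNAL_FIELDS}
    internal["source"] = source_name  # keeps the metadata source identifiable
    for element, value in record.items():
        field = mapping.get(element)
        if field is None:
            continue  # elements with no mapping are ignored
        values = value if isinstance(value, list) else [value]
        if not values:
            continue
        if field in MULTI_VALUED:
            internal[field].extend(values)
        else:
            internal[field] = values[0]
    return internal


def to_external(internal, target_format):
    """Export an internal record by applying a mapping in reverse."""
    inverse = {field: element for element, field in MAPPINGS[target_format].items()}
    return {
        inverse[field]: value
        for field, value in internal.items()
        if field in inverse and value not in (None, [])
    }
```

With this layout, adding a new partner format means adding one more entry to MAPPINGS; the internal model itself does not change.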

·       Introduced fields from future partners, i.e. map from one standard to another?

Yes, see the "mapping" approach above: you can map the different standards used by partners to your metadata model. You can use the same mapping approach for exports in different standards.

·       Advantages/Disadvantages to modified Dublin Core vs. AGRIS AP – Why follow these standards?

Well, they are exchange formats, they are useful only if you want to exchange records with systems that use them :-) As for DC vs. Agris AP, Agris AP "conveys" more information, but in your case the main reason for using it would be to participate in the Agris database and harvest from Agris centers if this is of interest to you.

·       (NOTE: as of yesterday, we completed a draft of a tentative metadata format that incorporates aspects of D.C. as well as others – we can share if that would be of help)

Is this metadata format something you will use internally or something you want to use for exchange? For internal management, I think you should create a model that you feel comfortable with and that is likely to accommodate the essential elements of the most used standards. When it comes to imports/exports, implementing one or more standards is definitely a good idea.
 
2)    How do we implement AGROVOC controlled vocabulary?

The Drupal Agrovoc interface is already implemented in our current version of AgriDrupal. We can send the code to Matt and Wolfgang.

·       Can we just use a subset related to “rangelands”?

If you want to do this, you need to create this subset manually. IIT Kanpur did something similar for the crops they work on, but it was a scientific effort that took months... Unless there is a simple "tree" directly under "Rangelands" in Agrovoc...
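
If Agrovoc does expose such a tree, extracting the subset amounts to collecting everything reachable from the root term through "narrower term" links. A minimal Python sketch under that assumption follows; the small hierarchy is invented for illustration and does not reflect Agrovoc's actual structure.

```python
from collections import deque

# Toy "narrower term" hierarchy. These relations are invented for the
# example and are NOT copied from Agrovoc.
NARROWER = {
    "rangelands": ["grasslands", "shrublands"],
    "grasslands": ["prairies", "steppes"],
    "shrublands": [],
    "prairies": [],
    "steppes": [],
    "forests": ["rainforests"],
    "rainforests": [],
}


def subtree(root, narrower):
    """Return the root term plus all of its descendants, breadth first."""
    collected = [root]
    visited = {root}
    queue = deque([root])
    while queue:
        term = queue.popleft()
        for child in narrower.get(term, []):
            if child not in visited:  # guards against cycles or shared children
                visited.add(child)
                collected.append(child)
                queue.append(child)
    return collected


rangeland_subset = subtree("rangelands", NARROWER)
```

This only covers the easy case of a clean tree. Terms that are relevant to rangelands but sit elsewhere in the thesaurus would still have to be picked by hand.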

·       Does Drupal need to be set up as a multilingual site to support AGROVOC?  Can we use only English terms for now?

Yes, you can use only English terms for now: we use only English in our current version of AgriDrupal.

·       What challenges will we face when we want to use multilingual controlled vocabulary?

We are adapting our Drupal Agrovoc interface to a multilingual system: simply, users can select the term in their language and the term is stored in Drupal in all languages (plus the term ID). Then you can use the different terms in the different languages for navigating / searching.
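
One way to picture this storage scheme is as a term record keyed by ID with one label per language. The Python sketch below is only an illustration of that idea, not the AgriDrupal implementation; the term ID and the translations are hypothetical placeholders.

```python
from dataclasses import dataclass


@dataclass
class Term:
    term_id: str   # language-independent identifier
    labels: dict   # language code -> label in that language


class Vocabulary:
    def __init__(self, terms):
        self._by_id = {t.term_id: t for t in terms}
        # Index every label in every language so a user can pick a term
        # in their own language.
        self._by_label = {}
        for t in terms:
            for lang, label in t.labels.items():
                self._by_label[(lang, label.lower())] = t

    def lookup(self, label, lang):
        """Find a term from a label typed in the given language."""
        return self._by_label.get((lang, label.lower()))

    def label(self, term_id, lang, fallback="en"):
        """Display a stored term in the requested language."""
        labels = self._by_id[term_id].labels
        return labels.get(lang, labels.get(fallback))


# Hypothetical ID and translations, for illustration only.
vocab = Vocabulary([
    Term("c_0001", {"en": "rangelands", "fr": "parcours", "es": "pastizales"}),
])
```

Because records reference the term ID rather than a label, the same record can be searched or browsed in any language for which a label exists.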
 
3)    Do we need to be OAI-PMH compliant or some other standard for interoperability?

I have no idea how many of the sources you are planning to harvest are OAI-PMH data providers, but if they are, or if they plan to be in the future, this would be a very good reason to build an OAI harvester.

As for being an OAI data provider, this would be a plus for all those who want to harvest from you. Our current version of AgriDrupal provides a basic OAI data provider for the bibliographic records in the system, but only in DC unqualified. 
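
For readers unfamiliar with the protocol: an OAI-PMH harvester issues plain HTTP requests such as ListRecords and reads XML back. Below is a minimal Python sketch, assuming a hypothetical repository at http://example.org/oai and a hand-written sample response (no network access is performed). A real harvester would also need to follow resumption tokens and handle protocol errors.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Standard namespaces for OAI-PMH 2.0 and unqualified Dublin Core.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None):
    """Build a ListRecords request URL for a data provider."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec
    return base_url + "?" + urlencode(params)


def parse_records(xml_text):
    """Extract identifier, titles and creators from a ListRecords response."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iterfind(".//oai:record", NS):
        records.append({
            "identifier": rec.findtext("oai:header/oai:identifier", namespaces=NS),
            "titles": [e.text for e in rec.iterfind(".//dc:title", NS)],
            "creators": [e.text for e in rec.iterfind(".//dc:creator", NS)],
        })
    return records


# Hand-written sample response; identifier and values are placeholders.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Sample record</dc:title>
          <dc:creator>Example Author</dc:creator>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""

url = list_records_url("http://example.org/oai")
harvested = parse_records(SAMPLE_RESPONSE)
```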

4)    Should we consider incorporating automatic indexing of external websites (not entering metadata?)

Are you planning to index web pages? Or full text documents? We are working on an automatic indexing solution with our partners in IIT Kanpur (see the AgroTagger prototype) and in China but a good solution for Drupal might require a few months...
 
CORE FUNCTIONALITY FOR GLOBAL RANGELANDS

...

3)    Harvest metadata from multiple sources, i.e. FAO full-text, journal archives,  current RangelandsWest metadata, future partners, CGIAR-related range resources

Some harvesting/import techniques (RSS, .csv files, XML) are well supported in Drupal. The most difficult part is the semantic organization of the records you harvest.

...

6)    Search capability that is “better than Google”

Drupal default search functions are not very efficient, but using Views filters and Faceted search features you can get very good filtering / browsing interfaces.
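
The idea behind faceted filtering can be sketched independently of Drupal: count how many records carry each value of a field, then narrow the result set when a value is chosen. The Python below is a toy illustration with made-up records, not how Views or the Drupal faceted search modules are implemented.

```python
from collections import Counter


def _as_list(value):
    """Treat missing, single and multi-valued fields uniformly."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def facet_counts(records, facet):
    """Count how many records carry each value of the given facet."""
    counts = Counter()
    for record in records:
        counts.update(_as_list(record.get(facet)))
    return counts


def filter_by(records, facet, value):
    """Keep only the records that have the chosen facet value."""
    return [r for r in records if value in _as_list(r.get(facet))]


# Made-up records for illustration.
records = [
    {"title": "Report A", "format": "PDF", "region": ["Arizona", "Utah"]},
    {"title": "Map B", "format": "Map", "region": ["Arizona"]},
    {"title": "Report C", "format": "PDF"},
]

format_facet = facet_counts(records, "format")
utah_only = filter_by(records, "region", "Utah")
```

The same counts could back the "pre-canned" browse pages: each facet value is effectively a saved filter.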

7)    Browseable Categories (pre-canned searches)

Implemented by default in Drupal

8)    Browse content by format or some other category tbd (pre-canned searches)

As above