Proposal for a Drupal Workbench Module

[To be revised. The Agrovoc or generic SKOS browser functionality can be separated from the scope of this module and is covered by the SKOS browser module that will be developed for Agrovoc and for the BioTech Glossary]
 
After a first experiment with an Agrovoc search/index Drupal module based on the Agrovoc web services, we are now thinking of a more advanced Workbench module that implements similar search/index functionalities but:
  1. calls the new Workbench web services and stores RDF records;
  2. supports searches of both Agrovoc concepts and Authority Files records (for the moment, just journals, but it should be extensible);
  3. stores URIs and labels (and some additional values) in referenced nodes (see attached file) instead of taxonomy fields: this is more compatible with the RDF approach that for instance DERI uses.
This would basically add support for the integration of external authority files both for subject indexing and for journals (for the moment).
 
Below is a detailed proposal of two possible ways of implementing this new module. It's a long read, only for those really interested, but feedback on which way to go would be appreciated: if you like, you can just look at the advantages and disadvantages paragraphs.
 
If there is no strong opinion on which way to go, I would opt for the first one. One question would be whether we want to invest in such a module now, considering that in a year or so we should be able to implement these functionalities using Drupal RDF support and a foreseeable SPARQL engine for the Concept Server.
 
However, implementing the module in the way(s) suggested below is completely compatible with the Drupal RDF approach, with the big advantage that a module can be easily distributed to any Drupal user even if not an RDF/SPARQL expert, while the implementation of all functionalities using only Drupal RDF modules requires writing the SPARQL queries, defining the mapping to CCK fields, setting up a dynamic interface for the SPARQL query and finding a way to run the query and store the records at the moment of indexing a node...
So, do we all agree that we should go ahead with this module?

Proposal for a Drupal Workbench Module

This module would incorporate the following functionalities:
  • Workbench Agrovoc search/index
  • Workbench Authority Files search/index
  • Workbench Agrovoc navigator (search/browse/hierarchy) (especially for AIMS)
The Workbench Agrovoc search/index functionality will have features and an interface similar to those of the basic Agrovoc search/index module already implemented, but it will retrieve its data from the Agrovoc Concept Scheme, will be RDF-based and will support suggesting new terms.
 
The Workbench Authority Files search/index functionality will work as the one above, but will only look up records from a specific authority file.  
 
In addition, both functionalities above require a specific workflow for storing new values locally while proposing them to the Workbench, getting temporary URIs and periodically checking for the final URIs of approved values (see below, point 1.3).
 
The Workbench Agrovoc search/browse/viewer will implement functionalities similar to those now implemented in AIMS: it will allow users to create pages with specific Agrovoc views (e.g. search, browse, hierarchy).
 
The functionalities to implement are many, so I would distinguish between a version 1 that includes all essential functionalities and a version 2 with additional features. The Workbench Agrovoc search/browse/viewer could be implemented in version 2.
The proposals below focus on the first two functionalities.
 
Two proposals for the implementation (although the first one is probably to be preferred at the moment):
 
1. Implementation based on the Workbench web services
 
This implementation has a workflow very similar to that of the basic Agrovoc search/index module already developed. Only the underlying technology would change, and two things would be added: queries on the Authority Files triples and a specific workflow for storing local/suggested values.
 
When the module is enabled:
The module automatically creates a specific content type for each entity that it expects to retrieve from the Workbench (first version: Agrovoc concept and Journal). These content types will have fields that store the URI and the essential properties of the concept: the title will be the URI, and the other essential fields will be the labels in all languages enabled in the Drupal installation, the description, a “temporary” field indicating whether the concept is an approved one (final URI) or a suggested one (temporary URI), and the relations with other concepts. It remains to be decided whether all relations will be stored as generic relations or whether the types of relations will be stored too (version 1: only generic relations).
 
Relations between concepts will be implemented as node references and the only mandatory field for a concept will be its URI, so that new nodes can be added as referenced nodes on the fly, by just giving the URI, and the remaining information will be added when that concept is actually retrieved by a user to index something. The URI being the title of the node, node references among concepts will reproduce the original relations between concepts.
(This storing of RDF results into a content type structure is in line with the Drupal RDF modules developed by DERI, one of which, the RDF SPARQL Proxy module, stores results into CCK nodes by mapping the fields returned by a SPARQL query to the CCK structure: see a description of this module and this approach in http://openspring.net/sites/openspring.net/files/corl-etal-2009iswc.pdf)
In this way, concepts and journals can be referenced by any node and can be managed with Views, so that only concepts and journals that actually have labels will be displayed.
 
The module should also automatically create two Views (one for each type of entity, concept and journal) that show the URI, description and labels of ONLY the records that have labels (records that only have URIs are there only as related concepts, and are not to be considered until a user actually uses them for indexing and therefore retrieves them). These Views will be used by the node reference field to first check if a concept is already in the system before looking up Agrovoc.
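The content-type structure, the URI-only stubs and the label-only View described above can be sketched in a few lines. This is an illustration in Python rather than Drupal code, written under assumptions: `ConceptNode`, `ConceptStore`, `get_or_create`, `save` and `with_labels` are hypothetical names, and the example URIs are made up.

```python
from dataclasses import dataclass, field


@dataclass
class ConceptNode:
    """One local record; only the URI (used as node title) is mandatory."""
    uri: str
    labels: dict = field(default_factory=dict)   # language code -> label
    description: str = ""
    temporary: bool = False                      # True while the URI is a draft
    related: list = field(default_factory=list)  # URIs of related concepts


class ConceptStore:
    """In-memory stand-in for the nodes of one content type."""

    def __init__(self):
        self.nodes = {}

    def get_or_create(self, uri):
        # A stub holding only a URI is enough to be the target of a reference.
        if uri not in self.nodes:
            self.nodes[uri] = ConceptNode(uri=uri)
        return self.nodes[uri]

    def save(self, uri, labels, description="", related=()):
        # Fill in a record once a user has actually retrieved the concept.
        node = self.get_or_create(uri)
        node.labels.update(labels)
        node.description = description
        for other in related:
            self.get_or_create(other)  # related concepts are added as stubs
            if other not in node.related:
                node.related.append(other)
        return node

    def with_labels(self):
        # Counterpart of the View: hide stubs that nobody has retrieved yet.
        return [n for n in self.nodes.values() if n.labels]


store = ConceptStore()
store.save("http://example.org/concept/1", {"en": "maize"},
           related=["http://example.org/concept/2"])
visible = [n.uri for n in store.with_labels()]
```

Here the related concept exists as a node, so the reference is valid, but it stays out of `visible` until it is retrieved and given labels.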
 
When a user creates a content type where he wants to include a field for Agrovoc concepts and/or a field for journals:
The user should select a field of type “node reference”, with “reference” as the only option (not “create and reference”) and with cardinality configured as “unlimited”, and select the appropriate content type among those created by the module (Agrovoc concept or journal). The module should automatically intercept the selection of such types of node reference and include a “Search Concept Scheme” link above the node reference field.

When a user creates a new node of the above content types:
The user can either first search among local concepts (those already retrieved from the Concept Scheme and stored) through the standard node reference field, or just click on the “Search Concept Scheme” link and open the search popup.
The popup window allows the user to search existing Concept Server concepts by label, and also to locally create and suggest new concepts by calling the corresponding web service and getting a temporary URI. Concepts created in this way are listed together with the selected ones in the popup for final confirmation or removal before the user clicks on the “Import and reference” button that closes the popup.

(Version 2: the search interface will also exploit RDF to allow navigation among concepts and / or suggestion of related concepts). Selected and suggested concepts are then stored in nodes of the corresponding type (Agrovoc concept or journal) and automatically referenced in the multiple node reference field (which can only reference existing nodes, not create new ones on the fly, otherwise users could try to type new values in it without checking for valid URIs).

As per the normal functioning of node reference fields, selecting a concept that is already in the system just creates the reference to that node, so the retrieval of a concept that is already in the system does not create duplicates.

The new suggested concepts (temporary URI + proposed label(s) + optional proposed description) will be stored like the others but the “temporary” field will be checked. The module also has to implement a procedure to periodically synchronize temporary URIs of suggested concepts with the final URIs assigned if and when a concept is approved (Imma can give details regarding the appropriate web services and related workflow).
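The periodic synchronization of temporary and final URIs could look roughly like the sketch below. It assumes the web service can report approved suggestions as pairs of draft URI and final URI; the function name `apply_final_uris`, the record layout and the example URIs are hypothetical.

```python
def apply_final_uris(nodes, approved):
    """Return a copy of `nodes` with draft URIs replaced by final ones.

    nodes:    dict mapping URI -> record dict with the keys
              'temporary' (bool) and 'related' (list of URIs)
    approved: dict mapping draft URI -> final URI (assumed to come
              from the web service that reports approved suggestions)
    """
    updated = {}
    for uri, record in nodes.items():
        new_uri = uri
        if record["temporary"] and uri in approved:
            new_uri = approved[uri]
        updated[new_uri] = {
            **record,
            # Still temporary only if no final URI was found for it.
            "temporary": record["temporary"] and new_uri == uri,
            # References to a draft URI must follow the rename as well.
            "related": [approved.get(r, r) for r in record["related"]],
        }
    return updated


nodes = {
    "http://example.org/tmp/42": {"temporary": True, "related": []},
    "http://example.org/c/7": {"temporary": False,
                               "related": ["http://example.org/tmp/42"]},
}
synced = apply_final_uris(
    nodes, {"http://example.org/tmp/42": "http://example.org/c/99"})
```

In Drupal itself the references are node references, so changing the title of the suggested node would be enough. The explicit rewrite of `related` is needed here only because this sketch links records by URI.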
 
Advantages: it can be implemented immediately; it allows calling the web services for suggesting new terms in the Authority Files, which is not possible with solution 2.
 
Disadvantages: it uses the Workbench web services, so it is tied to them: only triples available through these web services can be queried and stored.
 
2. Implementation based on Drupal RDF SPARQL modules
 
The module would do the same as the implementation above but instead of calling the Workbench web services it would query the Concept Server through SPARQL queries. This implementation would make full use of the DERI RDF SPARQL Proxy module (http://drupal.org/project/rdfproxy and http://openspring.net/sites/openspring.net/files/corl-etal-2009iswc.pdf): it would define different mappings between the Concept Server schema (and potentially any other schema) and the local content types (concept, journal, but potentially many others which may store values from other RDF stores).

This requires the availability of the Concept Server behind a SPARQL engine, which I think is foreseen anyway. 
When the module is enabled:
All as in implementation 1, but this module would also automatically create the SPARQL Proxy mappings between the Concept Scheme schema and the created content types (version 2: it would also allow users to create new content types and corresponding mappings that can be used against other RDF stores).

When a user creates a content type where he wants to include a field for Agrovoc concepts and/or a field for journal (or link to any other node defined in the step above):
All as in implementation 1 (addition of the “Search Concept Scheme” link).

When a user creates a new node of the above content types:
All as in implementation 1, but the popup window should load the appropriate mapping (“proxy”) according to the content type selected for the node reference, first run the corresponding SPARQL query without storing the resulting triples, and then store only the triples that the user has selected.
The workflow for suggesting terms and synchronizing temporary / approved URIs should still be implemented through web services, unless the Concept Scheme SPARQL engine allows for insert queries. 
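The "query first, store only what was selected" step can be sketched as follows. This is an illustration only: it assumes that the Concept Server exposes labels as `skos:prefLabel` and that the endpoint supports the SPARQL 1.1 `CONTAINS` function, neither of which is confirmed here. `build_label_query` and `keep_selected` are hypothetical names, and sending the query to the endpoint is left out.

```python
def build_label_query(search, lang="en", limit=20):
    """Return a SPARQL query for concepts whose label contains `search`."""
    # Escape backslashes and quotes so the term is a valid string literal.
    term = search.replace("\\", "\\\\").replace('"', '\\"')
    return (
        "PREFIX skos: <http://www.w3.org/2004/02/skos/core#>\n"
        "SELECT ?concept ?label WHERE {\n"
        "  ?concept skos:prefLabel ?label .\n"
        f'  FILTER (lang(?label) = "{lang}" && '
        f'CONTAINS(LCASE(STR(?label)), LCASE("{term}")))\n'
        "}\n"
        f"LIMIT {int(limit)}"
    )


def keep_selected(results, chosen_uris):
    """Keep only the result rows whose concept URI the user ticked."""
    chosen = set(chosen_uris)
    return [row for row in results if row["concept"] in chosen]


query = build_label_query("maize")

# Rows as they might come back from the endpoint (made-up values).
rows = [
    {"concept": "http://example.org/c/1", "label": "maize"},
    {"concept": "http://example.org/c/2", "label": "maize flour"},
]
to_store = keep_selected(rows, ["http://example.org/c/2"])
```

Nothing is written locally until `keep_selected` has run, which mirrors the requirement that unselected results are never stored.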

Advantages: it is a more general approach: since through the RDF SPARQL Proxy module different mappings to different RDF sources (schemas) can be configured, the module could dynamically load different mappings and search and store triples from different sources (e.g., using the same module one could have a geographic indexing field in a content type that queries the geo-political ontology and stores triples from there in a referenced node).
 
Disadvantages: it cannot be implemented until the Concept Server is available behind a SPARQL engine; implementing the workflow for suggesting terms and synchronizing temporary/approved URIs can be more difficult or impossible through SPARQL queries, while there are dedicated Workbench web services for this (Imma can give more details).

28 May 2010
Imma Subirats

Authority Control Webservices - Journals

With regard to the Journals, the basic actions that a user should be able to do while describing resources in AgriDrupal are:

1.) While the user is creating a new record, he should choose the journal title from an authority list.

Description: The webservice developed so far provides the search by journal in any language and with different criteria (e.g. exact match, containing, starting with, ending with, etc.). The result of the search lists all pertinent Journals visualizing the LABEL in different languages, publisher and ISSN. The user should choose one or more values and get the URI and ISSN together with Journal's label/s.

Method: getTermExpansion(string ontologyName, string searchString, string format, string searchCriteria, string lang) + getPossibleRelationsFromConceptURI(string ontologyName, string conceptURI, String relationType) + getRelatedValuesFromConceptURIRelationURI(string ontologyName, string conceptURI, String relationURI)

http://202.73.13.50:54123/ACSWWebserviceV1Client/sampleACSWWebServiceProxy/TestClient.jsp

Numbers 19, 13 and 14
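To make the four search criteria mentioned in the description concrete, here is a small local illustration of what each one means. The matching itself is done by the web service; the function `matches`, the criterion names `exact`, `contains`, `starts_with` and `ends_with`, and the journal titles are assumptions made for this sketch and are not the values the service actually expects.

```python
def matches(label, search, criteria):
    """Case-insensitive comparison of a journal label with a search string."""
    label, search = label.lower(), search.lower()
    if criteria == "exact":
        return label == search
    if criteria == "contains":
        return search in label
    if criteria == "starts_with":
        return label.startswith(search)
    if criteria == "ends_with":
        return label.endswith(search)
    raise ValueError(f"unknown search criteria: {criteria}")


titles = ["Journal of Agricultural Science", "Agricultural Systems", "Science"]
hits = [t for t in titles if matches(t, "science", "ends_with")]
```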

 

2.) If the Journal title is not in the authority list, the user should suggest a new term

Description: The web service allows users to suggest a new term in any language. The new term is inserted in the workbench with status “suggested” and is assigned a draft URI. The term is validated through the workbench, using normal workflow procedures, at a later stage. If the suggestion is accepted, the URI becomes final and the other languages are provided. If the suggestion is rejected, the URI is marked as "rejected" in the workbench and given a reason code, so that automatic reconciliation procedures can be run to replace the rejected URI with the correct one. The draft URI is never deleted from the workbench and is never used for any other term.

Method: http://202.73.13.50:54123/ACSWWebserviceV1Client/sampleACSWWebServiceProxy/TestClient.jsp Number 34

 

3.) A reconciliation procedure should run at regular intervals to clean up data with temporary URIs. The web service allows extracting all records that have changed in a given period of time.

Method: http://202.73.13.50:54123/ACSWWebserviceV1Client/sampleACSWWebServiceProxy/TestClient.jsp Number 33
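A reconciliation run over the records changed in a given period might be sketched as below. The shape of the change records (`draft_uri`, `status`, `final_uri`, `changed`) is an assumption made for illustration and is not the documented response of the web service. In particular, `final_uri` stands for the URI to use from now on: the final URI of an accepted term, or the correct replacement for a rejected one.

```python
from datetime import date


def reconcile(local, changes, since):
    """Apply workbench decisions made on or after `since` to local records.

    local:   dict mapping draft URI -> record dict with a 'temporary' flag
    changes: list of dicts with the (assumed) keys 'draft_uri', 'status',
             'final_uri' and 'changed' (a date)
    """
    result = dict(local)
    for change in changes:
        draft = change["draft_uri"]
        if change["changed"] < since or draft not in result:
            continue  # outside the period, or not used locally
        if change["status"] in ("accepted", "rejected") and change.get("final_uri"):
            record = result.pop(draft)
            result[change["final_uri"]] = {**record, "temporary": False}
        # A rejection without a replacement URI is left for manual review.
    return result


local = {
    "http://example.org/tmp/1": {"label": "Journal A", "temporary": True},
    "http://example.org/tmp/2": {"label": "Journal B", "temporary": True},
}
changes = [
    {"draft_uri": "http://example.org/tmp/1", "status": "accepted",
     "final_uri": "http://example.org/j/10", "changed": date(2010, 5, 20)},
    {"draft_uri": "http://example.org/tmp/2", "status": "rejected",
     "final_uri": None, "changed": date(2010, 5, 21)},
]
cleaned = reconcile(local, changes, since=date(2010, 5, 1))
```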

28 May 2010
John Fereira

An alternative to rdfproxy

A former colleague of mine (although he's doing some contract work for us from Thailand) just sent out a screencast of some of the work that he's been doing with Vivo/SPARQL/RDF.  The last part of the video demonstrates an alternative to using the rdfproxy module but it's all pretty impressive.  I'm going to be talking with him to see how difficult it would be to generalize his work for *any* rdf store rather than just Vivo.  Have a look:

http://milesworthington.com/screencasts/vivodata_5-26-10.html

 

 

 

28 May 2010
Valeria Pesce

Vivo/SPARQL/RDF and DERI SPARQL Proxy module

This Vivo/SPARQL/RDF solution looks very powerful indeed! The approach looks quite similar to the DERI module, but this solution seems to automate many things.

Although I like the idea of being able to flexibly define content types and mappings.

However, these solutions look more promising for other projects like Agris 2010 because their objective is harvesting and storing.

I think for the Workbench search/index functionality we would have to use a very similar approach but make the queries dynamic, load them on demand from the search interface, show the results and above all allow the user to select / discard the results before storing them (+ automatically reference them from the current node). Not a huge thing, but I think we need a new module anyway.

If the DERI people are planning to maintain the proxy module, that seems flexible enough to allow us to build our module on their CCK/query mapping mechanism.

27 May 2010
Johannes Keizer

Magnificent proposals!

I am in principle for the 2nd alternative.  AGROVOC will be available as a SPARQL endpoint from next week on, but the structure will still change. We have to discuss the implications.  This is very relevant to my discussions with the SEMIC-EU people today.

28 May 2010
John Fereira

Agrovoc via SPARQL

This sounds really interesting.  As soon as the endpoint is available could you share it with me.  I'd like to do some testing with it using the sparqlimporter module I developed.

My only concern about the 2nd alternative is that the rdfproxy module is still "dev" code and it doesn't appear that there is a lot of activity on the project (last update was in Nov. 2009).  There are very few issues, and other than the last one about a week ago (which was actually a bug report with a recommended fix) the previous issues were from December and none of those patches have gone into a new release. 

28 May 2010
Valeria Pesce

regarding the RDF SPARQL Proxy module

Hi John,

regarding the RDF SPARQL Proxy module, the good thing could be that collaboration between our group and the authors of that module (DERI Galway) is foreseen: see here.

 

28 May 2010
John Fereira

Deri - FAO collaboration

I am getting a permission denied error when trying to access that link even though I'm logged in.

29 May 2010
Valeria Pesce

private post for another group

Sorry, that message was posted to another group as private... But it relates to AgriDrupal so perhaps Johannes will add this group?