Agrifeeds – A closer look at the Drupal modules (part 2)

Two Drupal modules were written specifically for Agrifeeds. They are briefly described here to give a better understanding of what happens behind the scenes. Both were designed for generic use, so they can work on any other Drupal website.

Automatic Tagger

Automatic Tagger was born out of the need to collect more data about the country a particular feed item is about. It provides a widget for a taxonomy term reference field. When a node is saved, the widget searches the node’s text for terms from the vocabulary specified in the settings. It works alongside the Synonyms and i18n Taxonomy modules if they are installed, but does not depend on them.

Synonyms is self-explanatory: it lets you enter synonyms for terms. This was particularly useful for Agrifeeds, as some countries have official names that are rarely used in text. For instance, a text is much more likely to contain the word Syria than Syrian Arab Republic.
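
The idea can be sketched as a reverse lookup from every name a text might contain to the term that should be used as the tag. This is illustrative Python, not the module’s own code; the SYNONYMS data and the build_lookup function are hypothetical names chosen for the example.

```python
# Hypothetical data: each official term name maps to the synonyms entered for it.
SYNONYMS = {
    "Syrian Arab Republic": ["Syria"],
    "Lao People's Democratic Republic": ["Laos", "Lao PDR"],
}


def build_lookup(synonyms):
    """Map every searchable name (official name or synonym) to its term."""
    lookup = {}
    for term, alternatives in synonyms.items():
        lookup[term] = term  # the official name always points to itself
        for alternative in alternatives:
            lookup[alternative] = term
    return lookup


lookup = build_lookup(SYNONYMS)
```

With this table, finding the word Syria in a text would lead to the tag Syrian Arab Republic, since lookup["Syria"] points back to the official term.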

i18n Taxonomy allows vocabularies to be translated. Automatic Tagger uses the node’s language to look up the matching translation of each term. If there is no translation, it uses the term in its source language. Currently the module only works with localized vocabularies, but further work on this is planned.
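
The fallback described above can be sketched in a few lines. This is illustrative Python under assumed data structures: the dictionary layout of a term and the name_for_language function are hypothetical, and do not reflect how i18n Taxonomy actually stores translations.

```python
def name_for_language(term, language):
    """Return the term name translated into `language`, if a translation
    exists; otherwise fall back to the name in the source language."""
    return term["translations"].get(language, term["name"])


# Hypothetical term: source name in English, one French translation.
germany = {"name": "Germany", "translations": {"fr": "Allemagne"}}

french_name = name_for_language(germany, "fr")   # translation is available
spanish_name = name_for_language(germany, "es")  # no translation, falls back
```

For a French node the search would look for Allemagne, while for a Spanish node it would fall back to Germany.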

All the settings can be found when editing the field in Manage Fields. The first settings to configure are the vocabulary to use and the number of tags the widget should find. If the terms to search for are only a subset of the vocabulary, the minimum and maximum depth of the terms to use can also be set.
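
To make the depth setting concrete, here is a minimal Python sketch of filtering a hierarchical vocabulary by depth. The parent map, the function names, and the choice to count top-level terms as depth 1 are all assumptions made for the example; the module may number depths differently.

```python
def term_depth(term, parents):
    """Count how many levels deep a term sits; top-level terms are depth 1."""
    depth = 1
    while parents[term] is not None:
        term = parents[term]
        depth += 1
    return depth


def terms_in_depth_range(parents, min_depth, max_depth):
    """Keep only the terms whose depth lies within the configured range."""
    return [
        term for term in parents
        if min_depth <= term_depth(term, parents) <= max_depth
    ]


# Hypothetical vocabulary: each term points to its parent (None = top level).
PARENTS = {"Africa": None, "Kenya": "Africa", "Nairobi": "Kenya", "Asia": None}

countries = terms_in_depth_range(PARENTS, 2, 2)
```

In this example, setting both the minimum and the maximum depth to 2 restricts the search to country-level terms, leaving out continents and cities.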

Additionally, if the text to search for is stored in the term’s description, or in any custom field that has been added to the vocabulary, the user can choose any combination of these as search sources. Both text fields and taxonomy term reference fields are accepted.

Other settings enable case-insensitive search, disable the widget when tags are already present, and let the user decide whether to use synonyms added with the Synonyms module.

Finally, the user can choose between two algorithms for searching for terms. The first uses the terms that are found first in the text. The second counts how many times each term appears in the text and inserts those that appear most often.
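
The difference between the two algorithms is easiest to see side by side. The following Python sketch is only an illustration of the two strategies, not the module’s implementation: the function names and the limit parameter (standing in for the configured number of tags) are hypothetical, and plain substring matching is used for brevity.

```python
def first_found(text, terms, limit):
    """Pick the terms whose first occurrence comes earliest in the text."""
    found = [term for term in terms if term in text]
    return sorted(found, key=text.find)[:limit]


def most_frequent(text, terms, limit):
    """Pick the terms that occur most often in the text."""
    found = [term for term in terms if term in text]
    # sorted() is stable, so terms with equal counts keep vocabulary order.
    return sorted(found, key=text.count, reverse=True)[:limit]


TEXT = "Kenya exports tea. Uganda and Kenya trade with Uganda; Uganda leads."
TERMS = ["Uganda", "Kenya", "Peru"]

by_position = first_found(TEXT, TERMS, 1)
by_frequency = most_frequent(TEXT, TERMS, 1)
```

With a limit of one tag, the first strategy would pick Kenya, which opens the text, while the second would pick Uganda, which is mentioned three times against two for Kenya.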

Of all the fields present in a content type, Automatic Tagger searches those provided by the Text module (i.e. the “Text”, “Long text” and “Long text and summary” fields, unless other modules define more), plus the title of the node. The weight of each field in Manage Fields determines the order in which the fields are searched.
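
As a rough sketch of this selection step, the Python below keeps only text-type fields and orders them by weight. The field dictionaries and the searchable_text function are hypothetical, and placing the title first is an assumption, since the description above does not say where the title falls in the search order.

```python
# Machine names of the field types provided by Drupal 7's Text module.
TEXT_FIELD_TYPES = {"text", "text_long", "text_with_summary"}


def searchable_text(title, fields):
    """Return the pieces of text to search, in search order."""
    text_fields = [f for f in fields if f["type"] in TEXT_FIELD_TYPES]
    ordered = sorted(text_fields, key=lambda f: f["weight"])
    # Assumption: the title is searched before the weighted fields.
    return [title] + [f["value"] for f in ordered]


# Hypothetical node fields, listed in arbitrary order.
FIELDS = [
    {"type": "text_with_summary", "weight": 5, "value": "Body text"},
    {"type": "image", "weight": 1, "value": "photo.png"},
    {"type": "text", "weight": 2, "value": "Subtitle"},
]

chunks = searchable_text("Node title", FIELDS)
```

Here the image field is ignored and the subtitle is searched before the body, because it has the lower weight. This ordering matters most for the algorithm that keeps the terms found first.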

The module checks that a term it has found is not part of a longer word (e.g. Chinatown would not be tagged China), and it does not allow duplicate tags. If the search source is a term reference field, the module searches for the referenced term’s name and, if it is found, tags the node with the original term.
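
The whole-word check and the duplicate check can be sketched with a regular expression. This is one possible approach in Python, not necessarily how the module does it; the function names and the case_insensitive flag (mirroring the case-insensitive setting) are hypothetical.

```python
import re


def contains_whole_word(text, term, case_insensitive=False):
    """True if `term` occurs in `text` without being part of a longer word."""
    flags = re.IGNORECASE if case_insensitive else 0
    # The lookarounds reject matches that touch another word character.
    pattern = r"(?<!\w)" + re.escape(term) + r"(?!\w)"
    return re.search(pattern, text, flags) is not None


def find_tags(text, terms, case_insensitive=False):
    """Collect matching terms, skipping any term that is already a tag."""
    tags = []
    for term in terms:
        if term not in tags and contains_whole_word(text, term, case_insensitive):
            tags.append(term)
    return tags


tags = find_tags("From Chinatown to Peru, via Peru.", ["China", "Peru", "Peru"])
```

In this example Chinatown does not produce the tag China, and Peru is added only once even though it appears twice in both the text and the term list.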