Controlled vocabularies are selected lists of terms and phrases (with guidelines for their use) that are used to populate metadata elements. The overriding goal in using a controlled vocabulary is to make the retrieval of resources and information through searches more efficient. Controlled vocabularies reduce ambiguity in language and help to ensure data consistency.
A thesaurus orders its words hierarchically. If you look up a particular term (for example: houses), the thesaurus will offer broader terms (for example: buildings), narrower terms (for example: cottages), or related terms, which are different but overlap in meaning (for example: palaces).
Where there are different words with the same meaning (eg houses and dwellings), a thesaurus will also tell you which is the preferred term (for example: 'dwellings, USE houses'). The thesaurus’s hierarchical structure is intended to help find a suitable subject term at the appropriate level of detail.
The thesaurus entry may be listed in the format outlined below:
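As a rough illustration of such an entry, the sketch below models the 'houses' example as a small data structure and prints it in a conventional layout. The labels BT (broader term), NT (narrower term), RT (related term) and UF (used for) are the customary thesaurus abbreviations; the class and function names are hypothetical, and a real thesaurus may lay out its entries differently.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ThesaurusEntry:
    """One preferred term and its relationships (hypothetical structure)."""
    term: str
    used_for: List[str] = field(default_factory=list)   # UF: non-preferred synonyms
    broader: List[str] = field(default_factory=list)    # BT
    narrower: List[str] = field(default_factory=list)   # NT
    related: List[str] = field(default_factory=list)    # RT


def format_entry(entry: ThesaurusEntry) -> str:
    """Render an entry as indented 'LABEL term' lines under the preferred term."""
    lines = [entry.term]
    groups = (
        ("UF", entry.used_for),
        ("BT", entry.broader),
        ("NT", entry.narrower),
        ("RT", entry.related),
    )
    for label, terms in groups:
        for t in terms:
            lines.append(f"  {label} {t}")
    return "\n".join(lines)


def preferred_term(word: str, entries: List[ThesaurusEntry]) -> Optional[str]:
    """Map a word to its preferred term ('dwellings, USE houses'), or None."""
    for entry in entries:
        if word == entry.term or word in entry.used_for:
            return entry.term
    return None


# The example terms are the ones used in the text above.
houses = ThesaurusEntry(
    term="houses",
    used_for=["dwellings"],
    broader=["buildings"],
    narrower=["cottages"],
    related=["palaces"],
)

print(format_entry(houses))
print(preferred_term("dwellings", [houses]))
```

Storing the non-preferred synonyms on the preferred term's entry keeps the 'USE' lookup to a single pass over the entries.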
Controlled vocabulary word lists
Word lists are simply that: lists of words, usually created around one resource type, to aid consistency in data entry. Word lists are not co-ordinated like subject headings or organised hierarchically like thesauri. These sorts of vocabularies are, typically, simple alphabetical lists of terms or phrases. They’re also often created locally, for particular projects or institutions.
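A minimal sketch of how a flat word list might be used to keep data entry consistent. The 'material' list and the function names are invented for illustration; a local list would contain whatever terms the project has agreed on.

```python
# Hypothetical local word list for a single field (building materials).
MATERIALS = {"brick", "stone", "thatch", "timber"}


def normalise(value):
    """Trim, lower-case and collapse internal whitespace."""
    return " ".join(value.strip().lower().split())


def check_entry(value, word_list):
    """Return the normalised term if it is on the list, otherwise None."""
    term = normalise(value)
    return term if term in word_list else None


print(check_entry("  Timber ", MATERIALS))  # accepted after normalisation
print(check_entry("wood", MATERIALS))       # not on the list, so rejected
```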
Traditionally, the main purpose of subject headings and thesaurus terms was retrieval, while classification schemes were more about putting things ‘in their place’ on a shelf, in a box, or into a category. Generally (there have always been exceptions), an item would be assigned many different subject terms, but only one classification. This makes perfect sense in a physical world, but in the digital world there is no reason why something should not have more than one ‘location’. So the distinction between classifications and subject terms is beginning to break down.
Classifications are usually hierarchical: they start off with broad subject areas and then break them down into increasingly narrow topics. In this way they resemble thesauri, but classifications are generally much more rigid in their structure. While it is entirely feasible for a thesaurus term to have more than one broader term (this is known as ‘polyhierarchy’), a classification scheme will break down its subject domain in just one way. Because of this, classifications offer a single ‘world view’, imposing a structure that is never going to satisfy every user. And, unlike thesaurus terms, classification schemes declare their structural biases openly through the numbers and codes they employ.
For example, in the Dewey Decimal Classification (Dewey) resources on Buddhism are usually classified under '294' (more precisely, at '294.3'). These digits are meaningful: the 200s are for 'Religion'; the 290s, 'Other and comparative religions' (note that most of the numbers from 200-289 are devoted to Christianity); and the 294s, 'Religions of Indic Origin'. Here the nineteenth-century Western world view upon which the Dewey classification is based becomes apparent.
The classification scheme’s use of codes or numbers is the other important feature that distinguishes it from other kinds of controlled vocabulary, which are word-based. This coding can be used to advantage in a digital context, especially if it is based on a decimal system, like Dewey. Since numbers are more efficiently 'machine-readable' than words, they are particularly useful in searching. For example, searching for all the Dewey classifications beginning with '2' would retrieve items relating to religion. They can also be used to generate hierarchical, browsable web interfaces: users might be shown the first 10 subject categories, then choose one of these to view 10 sub-categories, then one of these to look at the next level, and so on.
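The prefix search and level-by-level browsing described here can be sketched as follows. This is only an illustration: the record titles are made up, the function names are hypothetical, and the class numbers are stored as strings so that leading digits are compared as text rather than as numeric values.

```python
# Illustrative records: (title, Dewey number stored as a string).
RECORDS = [
    ("Introduction to Buddhism", "294.3"),
    ("A history of Christianity", "270"),
    ("Introduction to Islam", "297"),
    ("Garden birds", "598"),
]


def search_by_prefix(records, prefix):
    """Return the titles whose class number starts with the given digits."""
    return [title for title, number in records if number.startswith(prefix)]


def digits(number):
    """Drop the decimal point so every position is one level of the hierarchy."""
    return number.replace(".", "")


def browse(records, prefix=""):
    """List the next-level categories that actually occur beneath a prefix."""
    depth = len(prefix) + 1
    return sorted({
        digits(number)[:depth]
        for _, number in records
        if digits(number).startswith(prefix) and len(digits(number)) >= depth
    })


print(search_by_prefix(RECORDS, "2"))  # everything classed under religion
print(browse(RECORDS))                 # top-level categories in use
print(browse(RECORDS, "2"))            # sub-categories beneath '2'
print(browse(RECORDS, "29"))           # next level down
```

Each call to `browse` with a longer prefix corresponds to the user choosing one category and seeing its sub-categories.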
A note on using thesauri and subject headings
Sometimes people use thesauri to generate subject headings. The thesaurus example above could, for instance, produce the heading: 'buildings - houses - cottages'. This is not traditional indexing practice, which insists that thesaurus terms are used at the appropriate level and do not include any broader or narrower terms. However, in the age of digital retrieval it can make good sense to include those terms somewhere. If only the term 'cottages' were added to a record, a search on 'buildings' would be unlikely to retrieve it (unless the search software was quite sophisticated).
So including broader and narrower terms in the hierarchy would greatly improve the search results. Some cataloguing systems now do this automatically: if you choose a term from their thesaurus, broader and narrower terms are automatically added into the catalogue record. This kind of practice is blurring the distinction between thesauri and subject headings.
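A minimal sketch of this kind of automatic expansion, assuming the thesaurus is held as a simple mapping from each term to its broader terms. The mapping, record and function names are hypothetical, and only broader terms are added here; narrower terms could be handled the same way with a second mapping.

```python
# Hypothetical broader-term mapping built from the example hierarchy.
BROADER = {
    "cottages": ["houses"],
    "houses": ["buildings"],
}


def with_broader_terms(term, broader):
    """Return the term followed by all of its ancestors, nearest first."""
    result = [term]
    queue = [term]
    while queue:
        current = queue.pop(0)
        # A term may have several broader terms (polyhierarchy).
        for parent in broader.get(current, []):
            if parent not in result:
                result.append(parent)
                queue.append(parent)
    return result


def matches(record_terms, query):
    """A deliberately naive search: exact match on any stored subject term."""
    return query in record_terms


# The cataloguer chooses only 'cottages'; the system stores the expansion.
record_terms = with_broader_terms("cottages", BROADER)

print(record_terms)
print(matches(["cottages"], "buildings"))    # unexpanded record is missed
print(matches(record_terms, "buildings"))    # expanded record is found
```

Expanding at cataloguing time keeps the search itself simple, at the cost of storing extra terms in each record.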
Using a controlled vocabulary
There are many vocabulary sources and tools available and there are various ways in which they can be implemented, such as:
- Using an existing controlled vocabulary as it is
- Adapting or customising a vocabulary - eg deciding to use a classification or thesaurus only to a particular level of detail
- Developing your own vocabulary - not recommended, but sometimes the best solution
- Using 'uncontrolled' vocabulary - ie keywords entered by your cataloguers or, more radically, your users
In choosing a vocabulary, some other things are useful to consider, for example:
- Your users - are the terms used going to be meaningful to them?
- The nature and extent of your collection - if your collection is small, you’re unlikely to need a highly detailed vocabulary
- The skills and available time of your cataloguing staff - some of these vocabularies will require experience or training to use properly
- Your community - it makes good sense to use vocabularies that similar collections are using
- Copyright issues - you may need to check whether permission or a licence is required to use the vocabulary in the way you wish