BS 8723 Structured Vocabularies for Information Retrieval: Part 1: Definitions, Symbols and Abbreviations, and Part 2: Thesauri

Jennifer Rowley (School for Business and Regional Development, University of Wales, Bangor, UK)

Journal of Documentation

ISSN: 0022-0418

Article publication date: 1 May 2007

Citation

Rowley, J. (2007), "BS 8723 Structured Vocabularies for Information Retrieval: Part 1: Definitions, Symbols and Abbreviations, and Part 2: Thesauri", Journal of Documentation, Vol. 63 No. 3, pp. 428-431. https://doi.org/10.1108/00220410710746835

Publisher

Emerald Group Publishing Limited

Copyright © 2007, Emerald Group Publishing Limited


Reviewing a British Standard is an interesting endeavour, partly because it is a document with a very distinctive purpose, but also because its drafting has been informed by a group of experts with considerable collective expertise in the area. Accordingly, the primary purpose of this review is to alert the library and information management community to the publication of this important new standard covering vocabularies for information retrieval, and more specifically thesauri.

A clear statement of the bibliographic lineage of this standard might be a good starting point. BS 8723 seeks to replace BS 5723:1987 and BS 6723:1985, which were developed in a largely paper‐based era and were concerned only with thesauri. BS 8723 is wider in scope and recognises that today's thesauri are mostly electronic tools that support indexing and retrieval in electronic environments.

BS 8723 will be published in five parts:

  1. Part 1. Definitions, symbols and abbreviations.

  2. Part 2. Thesauri.

  3. Part 3. Vocabularies other than thesauri.

  4. Part 4. Interoperation between vocabularies.

  5. Part 5. Interoperation between vocabularies and other components of information storage and retrieval systems.

Parts 1 and 2 together correspond broadly to ISO 2788:1986, and Parts 1 and 4 together will correspond to ISO 5964:1985. It is anticipated that further work will lead to the corresponding revision of ISO 2788.

Only Parts 1 and 2 of BS 8723 have been published to date, so they are the focus of this review.

The drafting of a standard is often a relatively lengthy process, and at best proceeds in parallel with development in the practices and systems that it is designed to influence. In a rapidly changing global digital environment this has both positives and negatives. On the negative side, it is likely that no standard will be entirely up‐to‐date even when it is published, let alone in the years after publication when it may act as a platform for practice. Further, it is difficult to accommodate practices across different countries, and even to align standards such as those issued by the BSI, the ISO and the ANSI. So application of the standard in many contexts may involve an element of adaptation, which some might argue undermines the objectives of the standard in the first place. On the positive side, in a rapidly changing environment, standards have a very important role to play in offering a general codification of concepts and practices which provides a framework for further work. Furthermore, in a digital age in which the interoperability of metadata is of central concern, any standards that make a contribution toward interoperability might be seen to have an increasingly significant role. This standard seeks to balance the perpetuation of well‐established principles of thesaurus construction with adaptation to the current electronic environment. It is not so much timely as probably long overdue.

The new standard recognises that, as electronic tools, thesauri are built and maintained with the support of software and are integrated with other software such as search engines and content management systems. Whereas in the past thesauri were designed for information professionals trained in indexing and searching, today they are more likely to be used by untrained users. The scope of application of thesauri has been extended to a wider range of contexts, such as museum collections and image databases. Further, as the Internet allows simultaneous searching across resource collections that have been indexed using different vocabularies, the interoperability promoted by standards has become all the more important. The standard seeks to support the development of structured vocabularies in a range of different contexts, including post‐coordinate retrieval systems, hierarchical directories, pre‐coordinate indexes and classification systems.

Part 1 of the standard effectively defines the “language” of structured vocabularies. Section 2 offers brief definitions for a range of useful terms, such as “scope note”, “schedule”, and “search thesaurus”. Section 3 offers a table of symbols and abbreviations, and their meanings. Some of the meanings are defined in Section 2, others are defined in the table in Section 3, and still others are taken for granted. For example, nowhere is there a definition of “narrower term” or “narrower term (partitive)”. Overall, most of what is included here is what might be expected, although there must be great difficulty in deciding the scope of such a list, especially when later parts of the standard that might introduce different terms have yet to be written.

Part 2 is a much more substantial document which offers recommendations for the development and maintenance of thesauri in a wide range of different information retrieval applications. Part 2 contains the following key sections:

  • Section 6. Covers thesaurus terms, including the scope of preferred terms, the grammatical forms of terms, capitalization, punctuation and special characters, and selection of the preferred form.

  • Section 7. Covers complex concepts, such as the nature of multi‐word terms, working with complex concepts, the retention of constituent concepts, consistency in the treatment of complex concepts, and the order of words in multi‐word terms.

  • Section 8. Covers relationships between terms, such as equivalence relationships, hierarchical relationships, associative relationships, and customized relationships.

  • Section 9. Covers facet analysis.

  • Section 10. Covers presentation and layout, including single record display, alphabetical display, hierarchical display, classified display, displaying polyhierarchical relationships, and the introduction to the thesaurus.

  • Section 11. Covers thesaurus functions in electronic systems.

  • Section 12. Covers management aspects of thesaurus construction, such as planning a thesaurus, early stages of compilation, managing construction, and dissemination.

  • Section 13. Covers updating, including review procedure, the nature of changes, and the dissemination of updates.

  • Section 14. Covers the requirements of thesaurus management software, including size and character limitations, inter‐term relationships, term notes, codes and notation, node labels, data import/export, editorial navigation and support, and editorial safeguards.
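As an illustration of the three basic relationship types listed under Section 8, and of the kind of reciprocal bookkeeping that thesaurus management software might be expected to enforce, the following is a minimal Python sketch. It is not taken from the standard: the class, its method names and the sample terms are hypothetical, and the tags BT, NT, RT, USE and UF in the comments are the conventional thesaurus abbreviations, assumed here and not quoted from BS 8723.

```python
from collections import defaultdict


class Thesaurus:
    """Toy in-memory thesaurus; all names are illustrative, not from BS 8723."""

    def __init__(self):
        self.broader = defaultdict(set)    # term -> broader terms (BT)
        self.narrower = defaultdict(set)   # term -> narrower terms (NT)
        self.related = defaultdict(set)    # term -> related terms (RT)
        self.use = {}                      # non-preferred term -> preferred term (USE)
        self.used_for = defaultdict(set)   # preferred term -> non-preferred terms (UF)

    def add_hierarchy(self, broader, narrower):
        # Hierarchical relationship: record BT and its reciprocal NT together.
        self.broader[narrower].add(broader)
        self.narrower[broader].add(narrower)

    def add_related(self, a, b):
        # Associative relationship: symmetric, so store it in both directions.
        self.related[a].add(b)
        self.related[b].add(a)

    def add_equivalence(self, preferred, non_preferred):
        # Equivalence relationship: an entry term points to its preferred term.
        self.use[non_preferred] = preferred
        self.used_for[preferred].add(non_preferred)

    def resolve(self, term):
        # Map any entry term to the preferred term used for indexing.
        return self.use.get(term, term)

    def expand(self, term):
        # Preferred term plus all of its narrower terms, followed transitively.
        start = self.resolve(term)
        seen, stack = {start}, [start]
        while stack:
            for child in self.narrower[stack.pop()]:
                if child not in seen:
                    seen.add(child)
                    stack.append(child)
        return seen


t = Thesaurus()
t.add_hierarchy("vehicles", "cars")
t.add_hierarchy("cars", "sports cars")
t.add_equivalence("cars", "automobiles")
t.add_related("cars", "roads")
print(sorted(t.expand("automobiles")))  # ['cars', 'sports cars']
```

Storing each relationship together with its reciprocal in a single call is one simple way to keep a vocabulary consistent; a real tool would also need scope notes, node labels and import/export, which this sketch omits.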

Taken together, these sections range from the basics of selecting thesaurus terms and indicating the relationships between them to many of the practicalities of managing an evolving thesaurus and its application. The style is generally instructional and advisory, rather than dictatorial, as in:

Each preferred term included in a thesaurus should represent a single concept … A concept may be expressed by a single‐word term or a multi‐word term.

Misspelt entry points should be provided only where …

The style generally tends to be more assertive in the earlier sections where an attempt is being made to encourage standardisation of thesauri and their presentation. In places the style is reminiscent of AACR2, where the thesaurus creator is being offered guidance on how to apply general principles:

… a complex concept term should not be split if the following conditions apply.

Later sections on presentation and layout, management and use of thesauri in electronic systems tend more towards giving information and advice, and these sections offer useful guidance on the broader aspects of thesaurus construction and management. It is in these sections that the issues related to the design of thesauri for use in electronic systems are most in evidence, and there are welcome comments on presenting a thesaurus on screen, and on the facilities necessary for browsing and searching the thesaurus. One arguable shortcoming is the limited reference to the use of thesauri for automatic indexing.
