Sorting out the Web: Approaches to Subject Access

Brian Vickery (Oxford, UK)

Program: electronic library and information systems

ISSN: 0033-0337

Article publication date: 1 September 2002

Citation

Vickery, B. (2002), "Sorting out the Web: Approaches to Subject Access", Program: electronic library and information systems, Vol. 36 No. 3, pp. 209-209. https://doi.org/10.1108/prog.2002.36.3.209.4

Publisher: Emerald Group Publishing Limited

Copyright © 2002, MCB UP Limited


The focus of this book is subject access to World Wide Web resources. It is basically aimed at students learning how to “sort out the Web”, but will also be of interest to information professionals wishing to widen their knowledge of such subject access. The book is primarily a descriptive study of methods currently in use on the Web, with some analysis of deficiencies and problems. The successive chapters cover metadata (as a means of facilitating the introduction of subject description), classification, controlled vocabularies (subject headings and thesauri), search engines, and a brief look at procedures under development (such as machine‐aided indexing, automated text processing, text mining and visualisation).

Each chapter provides a clearly written and simple survey of the techniques involved, with a good and well‐illustrated range of examples from the Web. The book certainly fulfils its intended purpose of displaying and explaining the variety of subject access tools available (say, as of mid‐2000), though each is treated only in a brief and introductory way. The emphasis is on general subject‐access tools, such as standard library classifications, subject heading lists and thesauri, and the more general search engines, though the value of specialist tools to specialist users is acknowledged. The term “taxonomy” appears in the text but is not explored, and the Berners‐Lee vision of a “semantic Web” does not make an appearance. There are few references beyond 1999, but the author’s Web site contains later references and many links to search tools.

I think the book could have benefited from a clearer analysis of the search problem that the Web poses. Google, for example, claimed in 2002 to cover over 2,000 million Web pages, perhaps averaging 10,000 or 20,000 characters each, with apparently every word indexed (Google returned 2,500 million hits for the word “the”). By contrast, Yahoo!’s classification certainly covers fewer than a million Web sites (I counted about 75,000 for science and medicine). Clearly the two approaches work at different levels of access, so full‐text indexing and classification cannot be treated as alternatives. The real problem, as Schwartz indeed recognises and discusses, is to find new ways of reducing and concentrating the output of full‐text search engines so that browsing through that output becomes manageable.

In summary, the book is a useful and reasonably up‐to‐date introduction to the methods and problems of subject access on the Web, and can be recommended for that purpose.
