The aim of this report is to review the current state of the art in content-based image retrieval (CBIR), a technique for retrieving images on the basis of automatically-derived features such as colour, texture and shape. Our findings are based both on a review of the relevant literature and on discussions with researchers and practitioners in the field.

Content-based Image Retrieval

Executive summary

The need to find a desired image in a collection is shared by many professional groups, including journalists, design engineers and art historians. While the requirements of image users can vary considerably, it is useful to classify image queries into three levels of abstraction: primitive features such as colour or shape, logical features such as the identity of objects shown, and abstract attributes such as the significance of the scenes depicted. CBIR systems currently operate effectively only at the lowest of these levels, yet most users demand higher levels of retrieval.

Users needing to retrieve images from a collection come from a variety of domains, including crime prevention, medicine, architecture, fashion and publishing. Remarkably little has yet been published on the way such users search for and use images, though attempts are being made to categorize users’ behaviour in the hope that this will enable their needs to be better met in the future.

Current indexing practice for images relies largely on text descriptors or classification codes, supported in some cases by text retrieval packages designed or adapted specially to handle images. Again, remarkably little evidence on the effectiveness of such systems has been published. User satisfaction with such systems appears to vary considerably.

CBIR operates on a totally different principle from keyword indexing. Primitive features characterizing image content, such as colour, texture, and shape, are computed for both stored and query images, and used to identify (say) the 20 stored images most closely matching the query. Semantic features such as the type of object present in the image are harder to extract, though this remains an active research topic. Video retrieval is a topic of increasing importance – here, CBIR techniques are also used to break up long videos into individual shots, extract still keyframes summarizing the content of each shot, and search for video clips containing specified types of movement.
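The matching principle described above can be illustrated with a minimal sketch. It assumes one particular primitive feature (a coarse colour histogram) and one particular similarity measure (histogram intersection); the report does not prescribe either, and the function names, the toy image data and the bin count are all hypothetical. Images are represented simply as lists of (r, g, b) tuples so that the example needs no external libraries.

```python
from collections import Counter


def colour_histogram(pixels, bins_per_channel=4):
    """Quantise each RGB pixel into a coarse bin; return normalised bin frequencies."""
    step = 256 // bins_per_channel
    counts = Counter((r // step, g // step, b // step) for r, g, b in pixels)
    total = len(pixels)
    return {bin_id: n / total for bin_id, n in counts.items()}


def histogram_intersection(h1, h2):
    """Similarity in [0, 1]: the sum of bin-wise minima of two normalised histograms."""
    return sum(min(v, h2.get(bin_id, 0.0)) for bin_id, v in h1.items())


def build_index(collection):
    """Compute the feature once for every stored image (name -> histogram)."""
    return {name: colour_histogram(pixels) for name, pixels in collection.items()}


def top_matches(query_pixels, index, k=20):
    """Return the names of the k stored images most similar to the query."""
    q = colour_histogram(query_pixels)
    scored = [(histogram_intersection(q, h), name) for name, h in index.items()]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))  # best score first, name breaks ties
    return [name for _, name in scored[:k]]


# Hypothetical toy collection: each "image" is a flat list of RGB pixels.
collection = {
    "sunset": [(250, 120, 30)] * 60 + [(40, 20, 60)] * 40,
    "forest": [(20, 140, 40)] * 90 + [(90, 60, 30)] * 10,
    "sea": [(20, 80, 200)] * 80 + [(240, 240, 240)] * 20,
}
index = build_index(collection)

# Query image: mostly orange with some blue.
query = [(245, 110, 20)] * 60 + [(30, 90, 210)] * 40
print(top_matches(query, index, k=2))  # ['sunset', 'sea']
```

A real system would combine several features (texture and shape as well as colour) and use an index structure rather than a linear scan, but the shape of the computation is the same: extract features from stored and query images, score, and return the closest matches.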

Three commercial CBIR systems are now available: IBM’s QBIC, Virage’s VIR Image Engine, and Excalibur’s Image RetrievalWare. In addition, demonstration versions of numerous experimental systems can be viewed on the Web, including MIT’s Photobook, Columbia University’s WebSEEk, and Carnegie Mellon University’s Informedia. CBIR systems are beginning to find a foothold in the marketplace; prime application areas include crime prevention (fingerprint and face recognition), intellectual property (trademark registration), journalism and advertising (video asset management) and Web searching. Both the AltaVista and Yahoo! search engines now have CBIR facilities, courtesy of Virage and Excalibur respectively.

The effectiveness of all current CBIR systems is inherently limited by the fact that they can operate only at the primitive feature level. None of them can search effectively for, say, a photo of a dog – though some semantic queries can be handled by specifying them in terms of primitives. A beach scene, for example, can be retrieved by specifying large areas of blue at the top of the image, and yellow at the bottom. There is evidence that combining primitive image features with text keywords or hyperlinks can overcome some of these problems, though little is known about how such features can best be combined for retrieval.
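The beach-scene example can be sketched as a query over the spatial layout of colours. This is an illustration only: the colour thresholds in `is_blue` and `is_yellow`, the split of the image into top and bottom halves, and the scoring rule are all assumptions made for the example, not a method taken from the report. An image is represented as a list of rows of (r, g, b) tuples.

```python
def is_blue(pixel):
    """Crude, assumed test for a 'blue' pixel."""
    r, g, b = pixel
    return b > 150 and b > r + 40 and b > g + 20


def is_yellow(pixel):
    """Crude, assumed test for a 'yellow' pixel."""
    r, g, b = pixel
    return r > 150 and g > 150 and b < 120


def fraction(rows, predicate):
    """Fraction of the pixels in the given rows that satisfy the predicate."""
    pixels = [p for row in rows for p in row]
    if not pixels:
        return 0.0
    return sum(1 for p in pixels if predicate(p)) / len(pixels)


def beach_score(image):
    """Score in [0, 1]: high only if the top half is blue AND the bottom half is yellow."""
    mid = len(image) // 2
    return min(fraction(image[:mid], is_blue), fraction(image[mid:], is_yellow))


# Hypothetical 4x4 test images.
SKY, SAND, GRASS = (90, 160, 230), (230, 210, 100), (40, 150, 50)
beach = [[SKY] * 4] * 2 + [[SAND] * 4] * 2
meadow = [[SKY] * 4] * 2 + [[GRASS] * 4] * 2

print(beach_score(beach), beach_score(meadow))  # 1.0 0.0
```

The sketch also makes the limitation concrete: any image with blue above yellow scores highly, whether or not it shows a beach, because the query is expressed purely in terms of primitive features.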

Standards development relevant to CBIR can be grouped under three headings: image compression, query specification and metadata description. By far the most important emerging standard is MPEG-7, which will define search features of all kinds for both still image and video data.

Our conclusion is that, despite its current limitations, CBIR is a fast-developing technology with considerable potential, and one that should be exploited where appropriate.

Author: John Eakins, Margaret Graham
Publication Date: 1 October 1999