The Schema System

Abstract

The Schema System embodies a knowledge-based approach to scene interpretation. Low-level routines are applied to extract image descriptors called tokens, and these tokens are further organized by intermediate-level routines into more abstract structures that can be associated with object instances. The thousands of tokens that are extracted from an image can be grouped in a combinatorially explosive manner. Therefore, knowledge in the Schema System is not limited to the descriptions of objects; it includes information about how each object can be recognized. Object schemas control the invocation and execution of the low-level and intermediate-level routines with the goal of forming hypotheses about objects in the scene. The system described produces image interpretations based on two-dimensional reasoning, although nothing in the system organization and control strategies precludes the inclusion of three-dimensional information.

The schema framework exploits coarse-grained parallelism in a cooperative interpretation process. Schema instances run concurrently, and an object schema often has available a variety of strategies for identification, each one invoking knowledge sources to gather support for the presence of a hypothesized object. Interschema communication is carried out asynchronously through a global blackboard. In this way schema instances cooperate to identify and locate the significant objects present in the scene.
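
The cooperation described above can be illustrated with a minimal sketch. It is not the Schema System's implementation: the Blackboard class, the two schema functions, the region names, and the confidence values are all hypothetical, chosen only to show concurrent schema instances exchanging hypotheses asynchronously through a shared blackboard.

```python
import threading
from collections import defaultdict


class Blackboard:
    """Hypothetical global blackboard: a thread-safe store of hypotheses."""

    def __init__(self):
        self._cond = threading.Condition()
        # object label -> list of (region, confidence) hypotheses
        self._hypotheses = defaultdict(list)

    def post(self, label, region, confidence):
        """Publish a hypothesis and wake any schema waiting on the board."""
        with self._cond:
            self._hypotheses[label].append((region, confidence))
            self._cond.notify_all()

    def read(self, label):
        """Return a snapshot of the hypotheses posted for a label."""
        with self._cond:
            return list(self._hypotheses[label])

    def wait_for(self, label, timeout=1.0):
        """Block until a hypothesis for label appears or the timeout expires."""
        with self._cond:
            self._cond.wait_for(lambda: self._hypotheses[label], timeout=timeout)
            return list(self._hypotheses[label])


def sky_schema(board):
    # Assumed strategy: some low-level evidence already supports region R1.
    board.post("sky", "R1", 0.9)


def roof_schema(board):
    # Assumed strategy: a roof hypothesis gains support if sky has been found.
    base = 0.5
    sky = board.wait_for("sky")
    confidence = base + 0.25 if sky else base
    board.post("roof", "R2", confidence)


def interpret():
    """Run both schema instances concurrently and return the blackboard."""
    board = Blackboard()
    threads = [
        threading.Thread(target=schema, args=(board,))
        for schema in (roof_schema, sky_schema)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return board
```

Neither schema calls the other directly; the roof schema only reacts to what appears on the blackboard, so the order in which the instances start does not matter.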

This paper first discusses the design of the Schema System with regard to the issues mentioned above, and then describes in some detail how that design is put into practice. The system uses the operators and algorithms of the VISIONS system as knowledge sources (KSs), and complex strategies for controlling the low- and intermediate-level KSs have been implemented. The ISR, an intermediate token database tuned for associative and spatial queries, is used for storing and manipulating image data. The result is an integrated, knowledge-directed system composed of modular knowledge structures that produces a two-dimensional interpretation of a digitized image. Interpretations of seven images from two natural domains are presented.
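
To make the two kinds of query mentioned for the token database concrete, here is a small sketch. It does not reproduce the ISR's interface; the Token and TokenStore classes, their field names, and the sample values are assumptions made purely for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Token:
    """Hypothetical image token: a typed descriptor with a bounding box."""
    token_id: int
    kind: str                      # e.g. "line" or "region"
    bbox: tuple                    # (x_min, y_min, x_max, y_max)
    features: dict = field(default_factory=dict)


class TokenStore:
    """Hypothetical in-memory token database (linear scans, no indexing)."""

    def __init__(self):
        self._tokens = []

    def add(self, token):
        self._tokens.append(token)

    def select(self, kind=None, **feature_values):
        """Associative query: match on kind and on exact feature values."""
        return [
            t for t in self._tokens
            if (kind is None or t.kind == kind)
            and all(t.features.get(k) == v for k, v in feature_values.items())
        ]

    def within(self, window):
        """Spatial query: tokens whose bounding box overlaps the window."""
        wx0, wy0, wx1, wy1 = window
        return [
            t for t in self._tokens
            if t.bbox[0] <= wx1 and t.bbox[2] >= wx0
            and t.bbox[1] <= wy1 and t.bbox[3] >= wy0
        ]


store = TokenStore()
store.add(Token(1, "region", (0, 0, 50, 20), {"color": "blue"}))
store.add(Token(2, "region", (10, 30, 40, 60), {"color": "red"}))
store.add(Token(3, "line", (10, 30, 40, 30)))

blue_regions = store.select(kind="region", color="blue")
upper_tokens = store.within((0, 0, 100, 25))
```

A database tuned for such queries would use indexes rather than the linear scans shown here; the sketch only fixes what each query returns.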

Sorry, no version of this paper is available on-line. Interested readers are referred to the International Journal of Computer Vision, Vol. 2, pages 209-250 (1989).