Recognition efficiency issues for freehand sketches
Tevfik Metin Sezgin
MIT Artificial Intelligence Laboratory, 200 Technology Square, Cambridge MA, 02139 USA
1. Introduction

Sketch understanding has received attention as an enabling technology for natural human-computer interaction (Thomas Stahovich & Randall Davis, 2002). With the widespread availability of pen based PDAs, and more recently with the emergence of Tablet PCs, there is an increasing interest in sketch recognition. Current approaches to sketch recognition treat sketches as static images and apply structural or syntactic recognition techniques commonly used in computer vision. In this paper, we characterize sketching as an interactive, incremental process, and argue that sketch recognition algorithms should be tailored to take advantage of the properties that separate sketches from images. We report experimental results showing how the order in which strokes are drawn affects recognition speed, and propose possible approaches for achieving algorithms with better memory and speed requirements.

2. The problem

One property of sketches that has not been exploited much is that they are created in an incremental fashion. Most existing sketch recognition algorithms, on the other hand, are variants of computer vision algorithms adapted to deal with the free-hand and articulated nature of sketches, despite the fact that computer vision algorithms were developed to deal with static pictures. From the blank sheet of paper to the end of the sketching process, the sketching surface sees a number of plausible scenes formed of completed objects, even if the scene is not semantically meaningful. In many circumstances, the recognition system may be required to recognize such valid completed objects in the scene even if the sketch is not completed. One obvious scenario where object recognition is needed as the sketch is being constructed is when the designer of the sketch based interface wants the system to show its understanding by displaying iconic descriptions or neatened versions of the objects that it recognizes.¹ Another instance where objects must be recognized before a sketch is completed occurs in the case of editing. For example, in the domain of digital logic circuit sketches, if the user is sketching an RS flip-flop circuit composed of two NAND gates and wires, the editing behavior of the sketching interface may depend on its operand (e.g., when the eraser end of the stylus is used on wires, it deletes the parts that it touches; when used on a gate, it deletes the whole gate at once). Because widely used computer vision algorithms such as interpretation tree search and subgraph isomorphism were not developed with this "recognize as we go" requirement in mind, they either result in poor performance or are simply not applicable.

Model based object recognition methods in the literature search the correspondence space, the transformation space, or a combination of the two. Because sketches have a high degree of variability (e.g., non-affine scaling properties), transformation space search becomes inappropriate for recognition. We therefore focused on correspondence space methods, which try to find correspondences between image features and model features subject to some constraints. In the next section, we describe how a popular correspondence space search algorithm, namely a variant of the interpretation tree (IT) algorithm, performs for continuous sketch recognition.

¹Whether or not this is appropriate for all domains or tasks is itself an interesting research question.

3. Continuous sketch recognition with the IT algorithm

The details of how the IT algorithm works are described in (Grimson, 1989). In our experiments, we used a variant of the IT algorithm modified for continuous sketch recognition. The basic idea is to instantiate plausible partial interpretations of a given scene. The list of plausible partial interpretations is extended as new strokes are drawn. After each stroke is drawn, it is classified as a geometric primitive (line, polyline, oval, curve) using the early sketch processing toolkit described in (Tevfik Metin Sezgin, 2001). Partial interpretations are created and updated as follows. For each object type:
• If there are no partial interpretations of the given type, and if the geometric primitive derived from the latest stroke fits into a slot² without violating any constraints, create a new partial template with that particular slot assigned to the primitive.

• If there are existing partial interpretations which can be extended with the latest primitive without violating any constraints, these interpretations are cloned and …

²By a slot, we refer to a component of a particular object model. For example, a plus sign has two slots which refer to the intersecting horizontal and vertical lines. This is referred to as an object feature in the computer vision literature.

4. Experiments

In order to test how the simple recognition strategy described above performs for continuous sketch recognition, we implemented a stick figure recognizer. Because sketches are created incrementally and the drawing order for the parts of a stick figure can change, we tested how many partial interpretations we get for different drawing orders. We recorded raw strokes for a stick figure and added the strokes to the sketching surface in different drawing orders to simulate different ways in which an object can be drawn. To measure the cost of recognition for a particular ordering, we used the number of partial interpretations that were instantiated at the completion of the sketch. Fig. 1 shows the costs for different orders, sorted in ascending order.

Figure 1. Number of partial interpretations generated during recognition of stick figures drawn with different stroke orderings, sorted in ascending order. The y-axis shows the cost.

5. Results and future work

As seen in Fig. 1, costs for different orders range from 24 to 121. The reason for this difference is that, depending on how an object is drawn, the combinatorial explosion in the number of interpretations will vary. In other words, the branching factor of the corresponding IT will be different for different drawing orders. An example illustrates why this happens. For a stick figure, if we start with two touching lines, they could potentially be a pair of arms, a pair of legs, or the body along with any of the limbs. On the other hand, if we have an oval touching a line, these strokes can only be the head and the body. So in one case there are multiple ways in which two strokes can be labeled, and in the other case the labeling is unique.

In the computer vision literature, this kind of combinatorial explosion in search is controlled by actively searching for highly constraining feature sets to initiate the recognition (the concept of key or anchor components). However, in continuous sketch recognition, there is no guarantee that the key components will be sketched first. We believe that a more appropriate recognition strategy would be to delay the labeling of strokes until the key components of a particular object are drawn. This is indeed a hard problem because it requires knowing the current state of a drawing. In this case, determining the exact state would be as costly as enumerating all plausible interpretations, so we propose to use an estimate of the state. The idea is to use an estimate of the current state, along with knowledge of the search space (learned offline from object descriptions), to guide the search and minimize combinatorial explosion. This approach is closely related to decision theoretic and non-myopic approaches to search, and to the literature on (PO)MDPs and planning. We are currently investigating this proposed framework as a more CPU and memory efficient approach to sketch recognition.

Acknowledgements

I would like to thank my thesis advisor Prof. Randall Davis

References

Grimson, W. E. L. (1989). The combinatorics of heuristic search termination. AI Lab Memo 1111.

Tevfik Metin Sezgin, Thomas Stahovich, R. D. (2001). Sketch based interfaces: early processing for sketch understanding. Proceedings of PUI-2001, November 2001.

Thomas Stahovich, J. L., & Randall Davis, C. (2002). A framework for multi-domain sketch recognition. AAAI Spring Symposium: Sketch Understanding, March 25-27,
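The incremental update of Section 3 and the ordering experiment of Section 4 can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and not the implementation used in the paper: the slot names, the "touch" constraints, the stroke identifiers, and the functions `fits` and `recognize` are all hypothetical. Because the second update rule is truncated in the text, the sketch assumes that cloned interpretations are extended with the new primitive while the originals are kept. The cost is taken to be the total number of partial interpretations instantiated.

```python
# Hypothetical stick-figure model: slot name -> required primitive type.
SLOTS = {"head": "oval", "body": "line", "l_arm": "line",
         "r_arm": "line", "l_leg": "line", "r_leg": "line"}

# Assumed constraints: these slot pairs must touch, all other pairs must not.
MODEL_TOUCH = {frozenset(pair) for pair in [
    ("head", "body"), ("l_arm", "body"), ("r_arm", "body"),
    ("l_arm", "r_arm"), ("l_leg", "body"), ("r_leg", "body"),
    ("l_leg", "r_leg")]}

# Toy scene standing in for recorded strokes: stroke id -> primitive type,
# plus the pairs of strokes that touch on the sketching surface.
STROKES = {"s_head": "oval", "s_body": "line", "s_la": "line",
           "s_ra": "line", "s_ll": "line", "s_rl": "line"}
SCENE_TOUCH = {frozenset(pair) for pair in [
    ("s_head", "s_body"), ("s_la", "s_body"), ("s_ra", "s_body"),
    ("s_la", "s_ra"), ("s_ll", "s_body"), ("s_rl", "s_body"),
    ("s_ll", "s_rl")]}


def fits(interp, slot, stroke):
    """Return True if `stroke` can fill `slot` without violating constraints."""
    if slot in interp or SLOTS[slot] != STROKES[stroke]:
        return False
    # The touch relation between slots must match the one between strokes.
    return all(
        (frozenset((slot, other)) in MODEL_TOUCH)
        == (frozenset((stroke, assigned)) in SCENE_TOUCH)
        for other, assigned in interp.items()
    )


def recognize(order):
    """Add strokes one at a time; return (complete interpretations, cost)."""
    partial = []  # each interpretation maps slot -> stroke id
    for stroke in order:
        if not partial:
            # No interpretation of this object type yet: create new ones.
            new = [{slot: stroke} for slot in SLOTS if fits({}, slot, stroke)]
        else:
            # Clone every interpretation the new primitive can extend.
            new = [{**interp, slot: stroke}
                   for interp in partial for slot in SLOTS
                   if fits(interp, slot, stroke)]
        partial.extend(new)
    complete = [i for i in partial if len(i) == len(SLOTS)]
    return complete, len(partial)


head_first = ["s_head", "s_body", "s_la", "s_ra", "s_ll", "s_rl"]
arms_first = ["s_la", "s_ra", "s_body", "s_head", "s_ll", "s_rl"]
for name, order in [("head first", head_first), ("arms first", arms_first)]:
    complete, cost = recognize(order)
    print(name, "cost:", cost, "complete interpretations:", len(complete))
```

In this toy model, an ordering that starts with the oval head instantiates fewer partial interpretations than one that starts with an arm, because the oval fits only the head slot while a line fits any of the five line slots. The numbers it prints are not comparable to the costs reported in Fig. 1, since the constraints and the scene are invented.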