Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics
Interpreting Comparative Constructions in Biomedical Text
Marcelo Fiszman,1 Dina Demner-Fushman,2 Francois M. Lang,2 Philip Goetz,2 Thomas C. Rindflesch2
1University of Tennessee – GSM, Knoxville, TN 37920
2Lister Hill National Center for Biomedical Communications
National Library of Medicine, Bethesda, MD 20894
{ddemner|goetzp|flang|trindflesch}@mail.nih.gov
BioNLP 2007: Biological, translational, and clinical language processing, pages 137–144, Prague, June 2007. © 2007 Association for Computational Linguistics

Abstract
[...] underspecified semantic interpretation to [...] structures that are prevalent in the research literature reporting on clinical trials for [...] which constructs predications based on the Unified Medical Language System. Results of a preliminary evaluation were recall of [...] 81%. We discuss the generalization of the [...] therapeutic and diagnostic procedures. The available structures in computable format [...]

Introduction
As natural language processing (NLP) is increasingly able to support advanced information management techniques for research in medicine and biology, it is being incrementally improved to provide extended coverage and more accurate results. In this paper, we discuss the extension of an existing semantic interpretation system to address comparative structures. These structures provide a way of explicating the characteristics of one entity in terms of a second, thereby enhancing the description of the first. This phenomenon is important in clinical research literature reporting the results of clinical trials.

In the abstracts of these reports, a treatment for some disease is typically discussed using two types of comparative structures. The first announces that the (primary) therapy focused on in the study will be compared to some other (secondary) therapy. A [...]

An outcome statement (2) often appears near the end of the abstract, asserting results in terms of the relative merits of the primary therapy compared to [...]

The processing of comparative expressions such as (1) and (2) was incorporated into an existing system, SemRep [Rindflesch and Fiszman, 2003; Rindflesch et al., 2005], which constructs semantic predications by mapping assertions in biomedical text to the Unified Medical Language System® (UMLS)® [Humphreys et al., 1998].

2 Background
Comparative structures in English
The range of comparative expressions in English is extensive and complex. Several linguistic studies have investigated their characteristics, with differing assumptions about syntax and semantics (for example [Ryan, 1981; Rayner and Banks, 1990; Staab and Hahn, 1997; Huddleston and Pullum, 2002]). Our study concentrates on
structures in which two drugs are compared with respect to a shared attribute (e.g. how well they treat some disease). An assessment of their relative merit in this regard is indicated by their positions on a scale. The compared terms are expressed as noun phrases, which can be considered to be conjoined. The shared characteristic focused on is expressed as a predicate outside the comparative structure. An adjective or noun is used to denote the scale, and words such as than, as, with, and to serve as cues to identify the compared terms, the scale, and the relative position of the terms on the scale.

The first type of structure we address (called comp1 and illustrated in (3)) merely asserts that the primary and secondary terms (in bold) are being compared. A possible cue for identifying these structures is a form of compare. A further characteristic is that the compared terms are separated by a conjunction, or a preposition, as in (3).

(3) To compare misoprostol with dinoprostone for cervical ripening

As shown in (4), a scale may be mentioned (efficacy); however, in this study, we only identify the compared terms in structures of this type.

(4) [...] misoprostol with dinoprostone for [...]

In the more complex comparative expression we accommodate (called comp2), the relative ranking of two compared terms is indicated on a scale denoted by an adjective (e.g. effective in (5)). The relative position of the compared terms in scalar comparative structures of this type expresses either equality or inequality. Inequality is further divided into superiority, where the primary compared term is higher on the scale than the secondary, and inferiority, where the opposite is true. Cues associated with the adjective designating the scale signal these phenomena (e.g. as ADJ as in (5) for equality, ADJer than in (6) for superiority, and less ADJ than in (7) for inferiority).

(5) Azithromycin is as effective as erythromycin estolate for the [...]
(6) Naproxen is safer than aspirin in the treatment of the [...]
(7) Sodium valproate was [...] prochlorperazine in reducing pain [...]

In examples (3) through (7), the characteristic the compared drugs have in common is treatment of some disorder, for example treatment of pertussis [...]

Few studies describe an implemented automatic analysis of comparatives; however, Friedman [Friedman, 1989] is a notable exception. Jindal and Liu [Jindal and Liu, 2006] use machine learning to identify some comparative structures, but do not provide a semantic interpretation. We exploit SemRep machinery to interpret the aspects of comparative structures just described.

SemRep [Rindflesch and Fiszman, 2003; Rindflesch et al., 2005] recovers underspecified semantic propositions in biomedical text based on a partial syntactic analysis and structured domain knowledge from the UMLS. Several systems that extract entities and relations are under development in both the clinical and molecular biology domains. Examples of systems for clinical text are described in [Friedman et al., 1994], [Johnson et al., 1993], [Hahn et al., 2002], and [Christensen et al., 2002]. In molecular biology, examples include [Yen et al., 2006], [Chun et al., 2006], [Blaschke et al., 1999], [Leroy et al., 2003], [Rindflesch et al., 2005], [Friedman et al., 2001], and [Lussier et al., 2006].

During SemRep processing, a partial syntactic parse is produced that depends on lexical look-up in the SPECIALIST lexicon [McCray et al., 1994] and a part-of-speech tagger [Smith et al., 2004]. MetaMap [Aronson, 2001] then matches noun phrases to concepts in the Metathesaurus® and determines the semantic type for each concept. For example, the structure in (9), produced for (8), allows both syntactic and semantic information to be used in further SemRep processing that [...]

(8) [...] treatment of gastroesophageal reflux disease
Predicates are derived from indicator rules that map syntactic phenomena (such as verbs and nominalizations) to relationships in the UMLS Semantic Network. Argument identification is guided by dependency grammar rules as well as constraints imposed by the Semantic Network. In processing (8), for example, an indicator rule links the nominalization treatment with the Semantic Network relation “Pharmacologic Substance TREATS Disease or Syndrome.” Since the semantic types of the syntactic arguments identified for treatment in this sentence (‘Pharmacologic Substance’ for “lansoprazole” and ‘Disease or Syndrome’ for “Gastroesophageal reflux disease”) match the corresponding semantic types in the relation from the Semantic Network, the predication in (10) is constructed, where subject and object are Metathesaurus concepts.

Linguistic patterns
We extracted sentences for developing comparative processing from a set of some 10,000 MEDLINE citations reporting on the results of clinical trials, a rich source of comparative structures. In this sample, the most frequent patterns for comp1 (only announces that two terms are compared) and comp2 (includes a scale and positions on that scale) are given in (11) and (12). In the patterns, Term1 and Term2 refer to the primary and secondary compared terms, respectively. “{BE}” means that some form of be is optional, and slash indicates disjunction. These patterns served as guides for enhancing SemRep argument identification machinery but were not implemented as such. That is, they indicate necessary components but do not preclude [...]

(11) comp1: Compared terms
C1: Term1 {BE} compare with/to Term2
C2: compare Term1 with/to Term2
C3: compare Term1 and/versus Term2
C4a: Term1 comparison with/to Term2
C4b: comparison of Term1 with/to Term2
C4c: comparison of Term1 and/versus Term2

(12) comp2: Scalar patterns
S1: Term1 BE as ADJ as {BE} Term2
S2a: Term1 BE more ADJ than {BE} Term2
S2b: Term1 BE ADJer than {BE} Term2
S2c: Term1 BE less ADJ than {BE} Term2
S4: Term1 BE superior to Term2

As with SemRep in general, the interpretation of comparative structures exploits underspecified syntactic structure enhanced with Metathesaurus concepts and semantic types. Semantic groups [McCray et al., 2001] from the Semantic Network are also available. For this project, we exploit the group Chemicals & Drugs, which contains such semantic types as ‘Pharmacologic Substance’, ‘Antibiotic’, and ‘Immunologic Factor’. (The principles used here also apply to compared terms with semantic types from other semantic groups, such as ‘Procedures’.) In the comp1 patterns, a form of compare acts as an indicator of a comparative predication. In comp2, the adjective serves that function. Other words appearing in the patterns cue the indicator word (in comp2) and help identify the compared terms (in both comp1 and comp2). The conjunction versus is special in that it cues the secondary compared term (Term2) in comp1, but may also indicate a comp1 structure in the absence of a form of compare (C5).

3.2 Interpreting comp1 patterns
When SemRep encounters a form of compare, it assumes a comp1 structure and looks to the right for the first noun phrase immediately preceded by with, to, and, or versus. If the head of this phrase is mapped to a concept having a semantic type in the group Chemicals & Drugs, it is marked as the secondary compared term. The algorithm then looks to the left of that term for a noun phrase having a semantic type also in the group Chemicals & Drugs, which becomes the primary compared term. When this processing is applied to (13), the semantic predication (14) is produced, in which the [...] argument is the primary compared term and the other is the secondary. As noted earlier, although a scale is sometimes asserted in these structures (as in (13)), SemRep does not retrieve it. An assertion regarding position on the scale never appears in comp1 structures.

(13) [...] tolerability of Hypericum perforatum with imipramine in [...]

SemRep considers noun phrases occurring immediately to the right and left of versus as being compared terms if their heads have been mapped to Metathesaurus concepts having semantic types belonging to the group Chemicals & Drugs. Such noun phrases are interpreted as part of a comp1 structure, even if a form of compare has not occurred. The predication (16) is derived from (15).

(15) Intravenous lorazepam versus dimenhydrinate for treatment of [...]

SemRep treats compared terms as being coordinated. For example, this identification allows both “Lorazepam” and “Dimenhydrinate” to function as arguments of TREATS in (15). Consequently, in addition to (16), the predications in (17) are returned as the semantic interpretation of (15). Such processing is done for all comp1 and comp2 structures (although these results are not given for (13) and are not further discussed in this paper).

Interpreting comp2 patterns
In addition to identifying two compared terms when processing comp2 patterns, a scale must be named and the relative position of the terms on that scale indicated. The algorithm for finding compared terms in comp2 structures begins by locating one of the cues as, than, or to and then examines the next noun phrase to the right. If its head has been mapped to a concept with a semantic type in the group Chemicals & Drugs, it is marked as the secondary compared term. As in comp1, the algorithm then looks to the left for the first noun phrase having a head in the same semantic group, and that phrase is marked as the primary compared term.

To find the scale name, SemRep examines the secondary compared term and then locates the first adjective to its left. The nominalization of that adjective (as found in the SPECIALIST Lexicon) is designated as the scale and serves as an argument of the predicate SCALE in the interpretation. For adjectives superior and inferior (patterns S4 and S5 in (12)) the scale name is “goodness.”

In determining relative position on the scale, equality is contrasted with inequality. If the adjective of the construction is immediately preceded by as (pattern S1 in (12) above), the two compared terms have the same position on the scale (equality), and are construed as arguments of a predication with predicate SAME_AS. In all other comp2 constructions, the compared terms are in a relationship of inequality. The primary compared term is considered higher on the scale unless the adjective is inferior or is preceded by less, in which case the secondary term is higher. [...] LOWER_THAN are used to construct predications with the compared terms to interpret position on the scale. The equality construction in (18) is expressed as the predications in (19).

(18) Candesartan is as effective as lisinopril once daily in [...]
(19) Candesartan COMPARED_WITH [...]

The superiority construction in (20) is expressed as the predications in (21).

(20) Losartan was more effective than atenolol in reducing [...] hypertension, diabetes, and LVH.
(21) Losartan COMPARED_WITH Atenolol [...]
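The comp1 and comp2 interpretation steps described above can be illustrated with a small Python sketch. It is a toy under stated assumptions, not SemRep's implementation: the DRUGS set stands in for MetaMap mapping to the group Chemicals & Drugs, the SCALE_OF table stands in for nominalization look-up in the SPECIALIST Lexicon, a noun phrase is reduced to a single token, and all function names and the tuple format of the output are hypothetical. The versus-only case (C5) is not covered.

```python
import re

# Hand-made stand-ins (assumptions) for MetaMap and SPECIALIST Lexicon look-ups.
DRUGS = {"misoprostol", "dinoprostone", "candesartan", "lisinopril",
         "losartan", "atenolol"}
SCALE_OF = {"effective": "effectiveness", "superior": "goodness",
            "inferior": "goodness"}
COMP1_CUES = {"with", "to", "and", "versus"}
COMP2_CUES = {"as", "than", "to"}


def tokenize(sentence):
    """Lowercase the sentence and split it into word tokens."""
    return re.findall(r"[a-z0-9-]+", sentence.lower())


def cue_then_drug(tokens, cues, start=0):
    """Index of the first drug token immediately preceded by a cue word."""
    for i in range(max(start, 1), len(tokens)):
        if tokens[i] in DRUGS and tokens[i - 1] in cues:
            return i
    return None


def nearest_left(tokens, idx, vocabulary):
    """Index of the closest token to the left of idx found in vocabulary."""
    for i in range(idx - 1, -1, -1):
        if tokens[i] in vocabulary:
            return i
    return None


def interpret_comp1(tokens):
    """comp1: a form of 'compare' announces two compared terms."""
    # Crude test for a form of "compare" (also matches e.g. "comparison").
    verb = next((i for i, t in enumerate(tokens) if t.startswith("compar")), None)
    if verb is None:
        return []
    second = cue_then_drug(tokens, COMP1_CUES, verb + 1)
    first = nearest_left(tokens, second, DRUGS) if second is not None else None
    if first is None:
        return []
    return [(tokens[first], "COMPARED_WITH", tokens[second])]


def interpret_comp2(tokens):
    """comp2: compared terms plus a scale and a relative position on it."""
    second = cue_then_drug(tokens, COMP2_CUES)
    if second is None:
        return []
    first = nearest_left(tokens, second, DRUGS)
    adj = nearest_left(tokens, second, SCALE_OF)  # first adjective to the left
    if first is None or adj is None:
        return []
    before = tokens[adj - 1] if adj > 0 else ""
    if before == "as":
        position = "SAME_AS"
    elif tokens[adj] == "inferior" or before == "less":
        position = "LOWER_THAN"
    else:
        position = "HIGHER_THAN"
    a, b = tokens[first], tokens[second]
    return [(a, "COMPARED_WITH", b), ("SCALE", SCALE_OF[tokens[adj]]), (a, position, b)]


print(interpret_comp1(tokenize("To compare misoprostol with dinoprostone for cervical ripening")))
print(interpret_comp2(tokenize("Candesartan is as effective as lisinopril once daily")))
print(interpret_comp2(tokenize("Losartan was more effective than atenolol")))
```

The sketch keeps the order of decisions given in the text: the secondary term is found first (cue word followed by a drug), the primary term and the scale adjective are then sought to its left, and the position predicate depends only on the adjective and the word immediately before it.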
The inferiority construction in (22) is expressed as [...]

(22) Morphine-6-glucuronide was [...] morphine in producing pupil [...]

Accommodating negation
Negation in comparative structures affects the position of the compared terms on the scale, and is accommodated differently for equality and for inequality. When a scalar comparison of equality (pattern S1, as ADJ as) is negated, the primary term is lower on the scale than the secondary (rather than being at least equal). For example, in interpreting the negated equality construction in [...]

(24) Amoxicillin-clavulanate was not as effective as ciprofloxacin [...]

For patterns of inequality, SemRep negates the predication indicating position on the scale. For example, the predications in (27) represent the negated superiority comparison in (26). Negation of inferiority comparatives (e.g. “X is not less effective than Y”) is extremely rare in our sample.

(26) These data show that celecoxib is not better than diclofenac (P = 0.414) in terms of ulcer complications.
(27) [...] NEG_HIGHER_THAN [...]

Evaluation
To evaluate the effectiveness of the developed methods we created a test set of 300 sentences containing comparative structures. These were extracted by the second author (who did not participate in the development of the methodology) from 3000 MEDLINE citations published later in date than the citations used to develop the methodology. The citations were retrieved with a PubMed query specifying randomized controlled studies and comparative studies on drug therapy. Sentences containing direct comparisons of the pharmacological actions of two drugs expressed in the target structures (comp1 and comp2) were extracted starting from the latest retrieved citation [...] comparative structures had been examined. These were annotated with the PubMed ID of the citation, names of two drugs (COMPARED_WITH predication), the scale on which they are compared (SCALE), and the relative position of the primary drug with respect to the secondary (SAME_AS, [...]

[...] SemRep and evaluated against the annotated test set. We then computed recall and precision in several ways: overall for all comparative structures, for comp1 structures only, and for comp2 structures only. To understand how the overall identification of comparatives is influenced by the components of the construction, we also computed recall and precision separately for drug names, scale, and position on scale (SAME_AS, [...] together). Recall measures the proportion of manually annotated categories that have been correctly identified automatically. Precision measures what proportion of the automatically annotated categories is correct.

In addition, the overall identification of comparative structures was evaluated using the F-measure [Rijsbergen, 1979], which combines recall and precision. The F-measure was computed using macro-averaging and micro-averaging. Macro-averaging was computed over each category first and then averaged over the three categories (drug names, scale, and position on scale). This approach gives equal weight to each category. In micro-averaging (which gives an equal weight to the performance on each sentence) recall and precision were obtained by summing over all individual sentences. Because it is impossible to enumerate all entities and relations which are not drugs, scale, or position we did not use the classification error rate and other metrics that require computing of true negative values.

Upon inspection of the SemRep processing results we noticed that the test set contained nine duplicates. In addition, four sentences were not processed for various technical reasons. We report the results for the remaining 287 sentences, which contain 288 comparative structures occurring in 168 MEDLINE citations. Seventy-four citations contain 85 comp2 structures. The remaining 203 structures are comp1.

Correct identification of comparative structures of both types depends on two factors: 1) recognition of both drugs being compared, and 2) recognition of the presence of a comparative structure itself. In addition, correct identification of the comp2 structures depends on recognition of the scale on which the drugs are compared and the relative position of the drugs on the scale. Table 1 presents recall, precision, and F-score reflecting [...]

[Table 1. Column headers: Task, Recall, Precision. Table body not recoverable.]

We considered drug identification to be correct only if both drugs participating in the relationship were identified correctly. The recall results indicate that approximately 30% of the drugs and comparative structures of comp1, as well as 40% of comp2 structures, remain unrecognized; however, all components are identified with high precision. Macro-averaging over compared drug names, scale, and position on scale categories we achieve an F-score = 0.78. The micro-average score for 287 comparative sentences is 0.5.

5 Discussion
In examining SemRep errors, we determined that more than 60% of the false negatives (for both comp1 and comp2) were due to “empty heads” [Chodorow et al., 1985; Guthrie et al., 1990], in which the syntactic head of a noun phrase does not reflect semantic thrust. Such heads prevent SemRep from accurately determining the semantic type and group of the noun phrase. In our sample, expressions interpreted as empty heads include those referring to drug dosage and formulations, such as extended release (the latter often abbreviated as XR). Examples of missed interpretations are in sentences (28) and (29), where the empty heads are in bold. Ahlers et al. [Ahlers et al., 2007] discuss enhancements to SemRep for accommodating empty heads. These mechanisms are being incorporated into the processing for comparative structures.

(28) Oxybutynin 15 mg was more effective than propiverine 20 mg [...] patients.
(29) [...] effective as oxybutynin immediate release for increasing bladder [...]

False positives were due exclusively to word sense ambiguity. For example, in (30) bid (twice a day) was mapped to the concept “BID protein”, which belongs to the semantic group Chemicals & Drugs. The most recent version of MetaMap, which will soon be called by comparative processing, exploits word sense disambiguation [Humphrey et al., 2006] and will likely resolve [...]

(30) Retapamulin ointment 1% (bid) [...] oral cephalexin (bid) for 10 days in treatment of patients with SID, and was well tolerated.

Although, in this paper, we tested the method on structures in which the compared terms belong to the semantic group Chemicals & Drugs, we can
straightforwardly generalize the method by adding other semantic groups to the algorithm. For example, if SemRep recognized the noun phrases in bold in (31) and (32) as belonging to the group Procedures, comparative processing could proceed [...]

(31) Comparison of multi-slice spiral CT and magnetic resonance imaging in evaluation of the un- [...]
(32) Dynamic multi-slice spiral CT is better than dynamic magnetic resonance to some extent in [...]

The semantic predications returned by SemRep to represent comparative expressions can be considered a type of executable knowledge that supports reasoning. Since the arguments in these predications have been mapped to the UMLS, a structured knowledge source, they can be manipulated using that knowledge. It is also possible to compute the transitive closure of all SemRep output for a collection of texts to determine which drug was asserted in that collection to be the best with respect to some characteristic. This ability could be very useful in supporting question-answering applications.

As noted earlier, it is common in reporting on the results of randomized clinical trials and systematic reviews that a comp1 structure appears early in the discourse to announce the objectives of the study and that a comp2 structure often appears near the end to give the results. Another example of this phenomenon appears in (33) and (34) (from [...]

(33) To compare the efficacy of famotidine and omeprazole in [...]
(34) Omeprazole is more effective than famotidine for the control of [...]

We suggest one example of an application that can benefit from the information provided by the knowledge inherent in the semantic interpretation of comparative structures, and that is the interpretation of outcome statements in MEDLINE citations, as a method for supporting automatic access to the latest results from clinical trials [...]

Conclusion
We expanded a symbolic semantic interpreter to identify comparative constructions in biomedical text. The method relies on underspecified syntactic analysis and domain knowledge from the UMLS. We identify two compared terms and scalar comparative structures in MEDLINE citations. Although we restricted the method to comparisons of drug therapies, the method can be easily generalized to other entities such as diagnostic and therapeutic procedures. The availability of this information in computable format can support the identification of outcome sentences in MEDLINE, which in turn supports translation of biomedical research into improvements in quality of patient care.

Acknowledgement This study was supported in part by the Intramural Research Programs of the National Institutes of Health, National Library of Medicine.

References
Ahlers C, Fiszman M, Demner-Fushman D, Lang F, Rindflesch TC. 2007. Extracting semantic [...]
Aronson AR. 2001. Effective mapping of biomedical text to the UMLS Metathesaurus: The MetaMap [...]
Blaschke C, Andrade MA, Ouzounis C, and Valencia A. 1999. Automatic extraction of biological information from scientific text: protein-protein interactions. Proceedings of the 7th International Conference on Intelligent Systems for Molecular Biology. Morgan [...]
Chodorow MS, Byrd RJ, and Heidorn GE. 1985. Extracting Semantic Hierarchies from a Large On-Line Dictionary. Proceedings of the 23rd Annual Meeting of the Association for Computational [...]
Christensen L, Haug PJ, and Fiszman M. 2002. [...] understanding system. Proceedings of the Workshop on Natural Language Processing in the Biomedical Domain, Association for Computational Linguistics, [...]
Chun HW, Tsuruoka Y, Kim J-D, Shiba R, Nagata N, Hishiki T, and Tsujii J. 2006. Extraction of gene-disease relations from Medline using domain dictionaries and machine learning. Pac Symp [...]
Friedman C. 1989. A general computational treatment of the comparative. Proc 27th Annual Meeting Assoc [...]
Friedman C, Alderson PO, Austin JH, Cimino JJ, and Johnson SB. 1994. A general natural-language text processor for clinical radiology. J Am Med Inform [...]
Friedman C, Kra P, Yu H, Krauthammer M, and Rzhetsky A. 2001. GENIES: a natural-language processing system for the extraction of molecular pathways from journal articles. Bioinformatics, 17 [...]
Guthrie L, Slater BM, Wilks Y, Bruce R. 1990. Is there content in empty heads? Proceedings of the 13th Conference on Computational Linguistics, v3:138 – [...]
Hahn U, et al. 2002. MEDSYNDIKATE--a natural language system for the extraction of medical information from findings reports. Int J Med Inf, 67(1-3):63-74.
Huddleston R, and Pullum GK. 2002. The Cambridge Grammar of the English Language. Cambridge [...]
Humphrey SM, Rogers WJ, Kilicoglu H, Demner-Fushman D, Rindflesch TC. 2006. Word sense disambiguation by selecting the best semantic type based on Journal Descriptor Indexing: Preliminary experiment. J Am Soc Inf SciTech 57(1):96-113.
Humphreys BL, Lindberg DA, Schoolman HM, and Barnett OG. 1998. The Unified Medical Language System: An informatics research collaboration. J Am Med Inform Assoc, 5(1):1-11.
Jindal N and Liu B. 2006. Identifying comparative sentences in text documents. Proceedings of the 29th Annual International ACM SIGIR Conference on Research & Development on [...]
Johnson SB, Aguirre A, Peng P, and Cimino J. 1993. Interpreting natural language queries using the UMLS. Proc Annu Symp Comput Appl Med Care, [...]
Leroy G, Chen H, and Martinez JD. 2003. A shallow parser based on closed-class words to capture relations in biomedical text. J Biomed Inform, 36(3):145-158.
Lussier YA, Borlawsky T, Rappaport D, Liu Y, and Friedman C. 2006. PhenoGO: assigning phenotypic context to Gene Ontology annotations with natural language processing. Pac Symp Biocomput, 64-75.
McCray AT, Srinivasan S, and Browne AC. 1994. Lexical methods for managing variation in biomedical terminologies. Proc Annu Symp Comput Appl Med Care, 235-9.
McCray AT, Burgun A, and Bodenreider O. 2001. Aggregating UMLS semantic types for reducing conceptual complexity. Medinfo, 10(Pt 1): 216-20.
Rayner M and Banks A. 1990. An implementable semantics for comparative constructions. Computational Linguistics, 16(2):86-112.
Rijsbergen V. 1979. Information Retrieval, [...]
Rindflesch TC. 1995. Integrating natural language processing and biomedical domain knowledge for increased information retrieval effectiveness. Proc 5th Annual Dual-use Technologies and Applications [...]
Rindflesch TC and Fiszman M. 2003. The interaction of domain knowledge and linguistic structure in natural language processing: Interpreting hypernymic propositions in biomedical text. J Biomed Inform, [...]
Rindflesch TC, Fiszman M, and Libbus B. 2005. Semantic interpretation for the biomedical research literature. Medical informatics: Knowledge management and data mining in biomedicine. [...]
Ryan K. 1981. Corepresentational grammar and parsing English comparatives. Proc 19th Annual Meeting [...]
Smith L, Rindflesch T, and Wilbur WJ. 2004. MedPost: a part-of-speech tagger for biomedical text. [...]
Staab S and Hahn U. 1997. Comparatives in context. Proc 14th National Conference on Artificial Intelligence and 9th Innovative Applications of Artificial Intelligence Conference, 616-621.
Yen YT, Chen B, Chiu HW, Lee YC, Li YC, and Hsu CY. 2006. Developing an NLP and IR-based algorithm for analyzing gene-disease relationships. [...]