Named Entity Recognition - Techniques and Evaluation

Seminar paper, 2011

22 pages, grade: 1.7



1 Introduction
1.1 Scope of this Work
1.2 Applications of NERC
1.3 Type, Domain and Language Factors

2 NERC Evaluation
2.1 Ways of Evaluation
2.1.1 MUC Evaluations
2.1.2 Exact-match Evaluations
2.2 Evaluation Metrics

3 Features for NERC
3.1 List lookup features
3.2 Document and corpus features
3.3 Short Example

4 Overview of NERC Systems
4.1 Supervised Learners
4.1.1 Baseline Approach
4.1.2 Maximum Entropy Approach
4.2 Semi-Supervised Learners
4.3 Unsupervised Learners
4.3.1 Augmenting Ontologies
4.3.2 Generating Gazetteers and Resolving Ambiguity

5 Conclusion

List of Abbreviations

List of Listings

List of Figures

List of Tables


1 Introduction

A Named Entity (NE) is "anything that can be referred to with a proper name" [JJ08, p. 761]. Named Entity Recognition and Classification (NERC) (or Named Entity Recognition (NER); I will use these terms synonymously) is the task of finding Named Entities of given types in texts, usually including the three types ORGANIZATION, PERSON and LOCATION, known as the ENAMEX types since the MUC-6 competition. This can be done either with hand-crafted rules or with specially trained machine learning classifiers.

1.1 Scope of this Work

The scope of this work covers the machine learning classifiers only. There are three subtypes of machine learning classifiers, distinguished by the amount of manually labeled training data they need to achieve valuable results: supervised, semi-supervised, and unsupervised classifiers. All of them need a carefully chosen set of features in order to encode the input documents so that a machine can process them. Finally, the procedure and the metrics used for evaluating NERC systems are important points in the discussion of the topic. This work covers all of these fields and tries to give an overview of NERC as well as to identify the advantages and disadvantages of existing systems and techniques.

1.2 Applications of NERC

In [JJ08, p. 761], the authors claim that

The starting point for most information extraction applications is the detection and classification of the named entities in a text.

The core application of NERC is indeed the Information Extraction (IE) branch. When trying, for example, to assign a category label to a given text, it is very helpful to know about the organizations, people, locations and brands mentioned in that text. Other tasks where NERC can be used include automatic summarization and question answering. A natural language processing task that is important in developing working NERC systems is word sense disambiguation.

1.3 Type, Domain and Language Factors

Before starting with NERC, it is essential to note that three factors, namely the language, the text domain and the entity types used when building or training the system, have a great impact on the resulting recognition system. There has been research activity devoted to many different languages (see [NTM07, p. 2]) as well as to language-independent approaches [TKSDM03]. Having chosen a set of target languages for the NERC system to be built, one must consider that it can be difficult to port an existing system to a new domain: a classifier trained on legal texts will struggle with material originating from bioinformatics. In addition, most systems must be trained to recognize, e.g., car brands before being able to fulfill that task successfully; this is the entity type factor.

2 NERC Evaluation

In order to compare NERC systems, it is important to discuss the methods used in that comparison before going on to explain specific techniques that fulfill the NERC task. On the one hand, there are different views of what constitutes a correct answer of a classifier; on the other hand, different metrics can be used that return numerical values representing different aspects of a system's quality.

Because one aim of this work is to compare Named Entity Recognition and Classification systems, it starts by explaining the evaluation techniques; in the subsequent chapters, this knowledge is required for the evaluation and categorization of system results.

2.1 Ways of Evaluation

In simple words, the NERC task consists of tagging groups of words (or tokens) with tags marking the class of Named Entities they belong to. Thus, there are two possible sources of errors, which can of course occur in combined forms, too:

Boundary related errors The boundaries of a tag returned by the classifier are not correct. Either the words in the marked sequence do not belong to an NE (false alarm), the tag overlaps with another sequence (NE or not NE) or the boundaries are too short.

Type related errors A sequence is assigned a wrong type.

Imagine the following example output of a classifier, taken from [NTM07, p. 13]:

illustration not visible in this excerpt

Listing 2.1: "Example (erroneous) output of a NERC classifier"

The tagging syntax is the standard one from the Message Understanding Conferences; in the example, the standard NE types organization, location and person are tagged. The careful reader will instantly recognize several errors in the output. Table 2.1 summarizes the errors made by the classifier and also indicates whether they are boundary and/or type related.

There are different ways to deal with that information, some of which are explained below.

2.1.1 MUC Evaluations

In Message Understanding Conference (MUC) events, a quite natural two-axis approach is taken. A system is credited a correct TYPE if it tagged an entity with the correct NE type, regardless of the boundaries (which only have to overlap); if the system found an entity with correct boundaries, it is credited a correct TEXT (regardless of the assigned type). So it is possible to earn two points for each NE in the text; conversely, two points are lost if an entity is missed or tagged entirely wrong. For more information, see [NTM07].
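Under MUC rules, TYPE and TEXT credit can be computed independently per entity. The following is a minimal sketch, assuming entities are represented as (start, end, type) token spans with half-open boundaries; `muc_credits` is a hypothetical helper for illustration, not the official MUC scorer, and it does not handle an entity matching several counterparts:

```python
def muc_credits(gold, predicted):
    """Count MUC-style TYPE and TEXT credits.

    TYPE: the predicted type is correct and the spans overlap.
    TEXT: the predicted boundaries match exactly, type ignored.
    Entities are (start, end, type) with half-open [start, end) spans.
    """
    type_credit = text_credit = 0
    for g_start, g_end, g_type in gold:
        for p_start, p_end, p_type in predicted:
            overlaps = p_start < g_end and g_start < p_end
            if overlaps and p_type == g_type:
                type_credit += 1  # correct TYPE despite loose boundaries
            if (p_start, p_end) == (g_start, g_end):
                text_credit += 1  # correct TEXT despite possibly wrong type
    return type_credit, text_credit
```

Here a prediction with too-short boundaries still earns TYPE credit, and a prediction with exact boundaries but the wrong type still earns TEXT credit, mirroring the two-axis idea above.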

illustration not visible in this excerpt

Table 2.1: Error types in classifier output

illustration not visible in this excerpt

Table 2.2: Recall and precision for some example values

2.1.2 Exact-match Evaluations

An approach that is more restrictive than the one explained in section 2.1.1 is exact-match evaluation. There is only one axis, and the system is credited one point for each entity that was assigned both the correct TYPE and the correct TEXT. So there is no possibility for the system to score in the case of "partial matches".
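Compared with the MUC scheme, exact-match scoring collapses to a set intersection. A minimal sketch under the same hypothetical (start, end, type) span representation:

```python
def exact_match_credit(gold, predicted):
    # One point per predicted entity whose boundaries AND type both
    # agree exactly with a gold entity; partial matches score nothing.
    return len(set(gold) & set(predicted))
```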

2.2 Evaluation Metrics

The metrics commonly used in the evaluation of NERC systems are recall, precision and the F1 measure. Recall is the ratio of the number of correct system answers to the expected total number of answers; precision is the ratio of the number of correct system answers to the total number of system answers (in this context, an "answer" consists of assigning a tag to one or more tokens):

R = (number of correct system answers) / (total number of expected answers)
P = (number of correct system answers) / (total number of system answers)

Table 2.2 illustrates recall and precision for some example values.

The third familiar metric, the F1 measure, combines those two values. The general F measure is defined as

F = ((β² + 1) · P · R) / (β² · P + R)

where R is the recall and P the precision value. The parameter β can be used to weight the measure, thereby, depending on its value, favoring either precision (β < 1) or recall (β > 1). The F1 measure, commonly used in most papers dealing with NERC, is the result of setting β = 1:

F1 = (2 · P · R) / (P + R)

This ratio is identical to the harmonic mean of P and R. Table 2.3 illustrates the F1 measure with the same set of example values used in Table 2.2.
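The three metrics are easy to compute from raw answer counts. A minimal sketch; the function and argument names are illustrative, not from any standard library:

```python
def nerc_metrics(correct, returned, expected, beta=1.0):
    """Recall, precision, and F-measure from raw answer counts.

    correct  -- number of correct system answers
    returned -- total number of system answers
    expected -- total number of expected (gold) answers
    """
    recall = correct / expected if expected else 0.0
    precision = correct / returned if returned else 0.0
    if precision + recall == 0.0:
        f = 0.0  # avoid division by zero when the system finds nothing
    else:
        f = (beta**2 + 1) * precision * recall / (beta**2 * precision + recall)
    return recall, precision, f
```

For beta = 1 the formula reduces to the harmonic mean 2PR / (P + R).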

Imagine now that the values in Tables 2.2 and 2.3 are the results of three different NERC systems: each row originates from one single system, so we have values for recall, precision and F1 measure for each system. Which system should be preferred? Basically, this depends entirely on the intended application. In an Information Extraction (IE) task, the second system (note that it has the lowest F1 value) would probably be the chosen one. The reason for that choice is its precision value of 95%, the best of the three. If the output of the classifier is used to categorize the contents of the given input text, it is very important to have accurate answers: missing some NEs is acceptable as long as the actually tagged ones are mostly correctly recognized.

This demonstrates that one must be careful when only the F1 measure values are presented; a different metric could supply more valuable results.

Table 3.1: Examples of word-level features

- Part of speech: proper name, verb, noun, foreign word
- Lowercased / uppercased version of the word
- Word length
- Patterns

3 Features for NERC

A feature defines a mapping from words/tokens (a few approaches, such as [WP03], use characters instead) to a target space that is specially designed for algorithmic consumption. Having defined one or more such mappings, whole texts are represented as feature vectors, which serve as abstractions over the texts for the machine learner. The more detailed and significant the chosen set of features, the easier it will be for the learning process to develop reasonable rules, as the authors of [TKSDM03] recognized:

The choice of the learning approach is important for obtaining a good system for recognizing named entities. However, in the CoNLL-2002 shared task we found out that choice of features is at least as important.

Basically, all features can be divided into three groups: word-level features, list lookup features, and document and corpus features. Table 3.1 shows some basic example word-level features taken from [NTM07, p. 8].
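To make the idea of word-level features concrete, here is a sketch of a feature function in the spirit of Table 3.1; the feature names and the pattern encoding (letters mapped to X/x, digits to 0) are illustrative choices, not taken from any particular system:

```python
import re

def word_features(token):
    """Map a single token to a dictionary of word-level features."""
    return {
        "lower": token.lower(),                 # lowercased version
        "is_capitalized": token[:1].isupper(),  # cue for proper names
        "is_all_caps": token.isupper(),         # cue for acronyms
        "length": len(token),                   # word length
        # Pattern feature: uppercase -> X, lowercase -> x, digit -> 0
        "pattern": re.sub(r"[A-Z]", "X",
                   re.sub(r"[a-z]", "x",
                   re.sub(r"\d", "0", token))),
    }
```

A token like "IBM360" would then yield the pattern "XXX000", which lets the learner generalize over product names without memorizing each one.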

3.1 List lookup features

List lookup features play an important role in some NERC systems (see [NTM06]). By supplying lists of known NEs, an "is a" relation is defined which enables the system to check whether an input token could possibly be an NE. However, this cannot be the only foundation of a classification system, since extensive lists of NEs are hard to maintain, and some disambiguation technique is needed to distinguish e.g. between the organization name "Apple" and the common noun "apple". Of course, complete lists of entities are not the only type of list usable in NERC: further possibilities are lists of entity cues (e.g. "Dr." or "MD" hinting at a person) or general lists like dictionaries and lists of stop words (both of which can be used, e.g., for the disambiguation task).
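A list lookup feature, together with the disambiguation problem mentioned above, can be sketched as follows; the capitalization heuristic is a deliberately naive stand-in for a real disambiguation technique, and all names are illustrative:

```python
def list_lookup_features(tokens, gazetteer, stop_words):
    """Per-token list-lookup features from a gazetteer of known NEs.

    A gazetteer hit is only treated as a likely NE when the token is
    capitalized and not a stop word -- a crude heuristic that already
    separates the organization "Apple" from the common noun "apple",
    but fails e.g. at sentence-initial positions.
    """
    features = []
    for tok in tokens:
        in_gazetteer = tok.lower() in gazetteer
        likely_ne = (in_gazetteer
                     and tok[:1].isupper()
                     and tok.lower() not in stop_words)
        features.append({"in_gazetteer": in_gazetteer,
                         "likely_ne": likely_ne})
    return features
```

In practice such flags are merged into the per-token feature vector alongside the word-level features, leaving the final decision to the learner.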

3.2 Document and corpus features

Document and corpus features can supply clues about the context in which words occur that go beyond the word level.


Dominic Scheurer (author): Named Entity Recognition - Techniques and Evaluation. Seminar paper, Technische Universität Darmstadt (Department of Computer Science), course: Text Analytics. Munich: GRIN Verlag, 2011.

