A named entity recognizer identifies named entities and their semantic types in text. In CLAMP-Cancer, named entities typically refer to clinical concepts. As shown in the figure below, CLAMP-Cancer provides two different models for named entity recognition.
Each model is described in more detail below.
DF_CRF_based_named_entity_recognizer is the default named entity recognizer used in CLAMP-Cancer. The recognizer identifies three types of clinical concepts: problems, treatments, and tests. It was built by training a CRF (conditional random field) model on a dataset of clinical notes, the i2b2 2010 challenge corpus (https://www.i2b2.org/NLP/Relations/). Advanced users can use the config.conf file to change the default recognizer model, which is provided in the file defaultModel.jar.
DF_Dictionary_lookup identifies named entities by matching text directly against the terms in a dictionary. Currently, the defaultDic.txt file used in CLAMP-Cancer consists of terms and their semantic types from UMLS (https://www.nlm.nih.gov/research/umls/). The semantic type of the matched term in UMLS is assigned to the recognized named entity.
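The lookup described above can be illustrated with a minimal sketch. It is not CLAMP-Cancer's implementation: the dictionary entries, the whitespace tokenization, the longest-match strategy, and the function name dictionary_lookup are all assumptions made for illustration, and the actual format of defaultDic.txt is not shown here.

```python
# Hypothetical in-memory dictionary: term -> semantic type.
# The entries and type labels are illustrative, not taken from defaultDic.txt.
DICTIONARY = {
    "breast cancer": "Neoplastic Process",
    "aspirin": "Pharmacologic Substance",
}


def dictionary_lookup(text, dictionary, max_len=5):
    """Return (start_token, end_token, phrase, semantic_type) tuples.

    Scans the text left to right and, at each position, keeps the longest
    phrase (up to max_len tokens) that appears in the dictionary.
    """
    tokens = text.split()  # assumption: simple whitespace tokenization
    entities = []
    i = 0
    while i < len(tokens):
        match = None
        # Try the longest candidate phrase first, then shrink it.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            phrase = " ".join(tokens[i:i + n])
            if phrase in dictionary:
                # The semantic type of the matched term is assigned to the entity.
                match = (i, i + n, phrase, dictionary[phrase])
                break
        if match:
            entities.append(match)
            i = match[1]  # continue after the matched phrase
        else:
            i += 1
    return entities
```

For example, calling dictionary_lookup("patient with breast cancer given aspirin", DICTIONARY) in this sketch yields one entity for "breast cancer" and one for "aspirin", each carrying the semantic type stored with its dictionary term.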
To configure DF_Dictionary_lookup:
First, click on the config file under the DF_Dictionary_matcher folder. This opens a new window that takes the following three parameters: Case sensitive, Stemming, and Dictionaries (shown in the picture below).
Case sensitive
If you check the checkbox for "Case sensitive", the matcher will differentiate between capital and lowercase letters when searching for a term in the dictionary. For example, "Breast Cancer" will not be matched with "breast cancer".
Stemming
If you check the checkbox for "Stemming", the matcher will match the stemmed form of a candidate named entity with the terms in the dictionary. For example, "breast cancers" will be matched to "breast cancer".
Dictionaries
You can also replace or edit the default dictionary file used by this function.
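The effect of the "Case sensitive" and "Stemming" options can be sketched as a normalization step applied to both the candidate phrase and the dictionary term before they are compared. This is only an illustration: the function names are hypothetical, and naive_stem is a crude plural-stripping stand-in, since the stemmer that CLAMP-Cancer actually uses is not specified here.

```python
def naive_stem(word):
    """Crude plural stripping; a stand-in for a real stemmer (assumption)."""
    if len(word) > 3 and word.endswith("s") and not word.endswith("ss"):
        return word[:-1]
    return word


def normalize(phrase, case_sensitive=False, stemming=False):
    """Normalize a phrase according to the two matcher options."""
    words = phrase.split()
    if not case_sensitive:
        # Case is ignored, so fold everything to lowercase.
        words = [w.lower() for w in words]
    if stemming:
        # Compare stemmed forms instead of surface forms.
        words = [naive_stem(w) for w in words]
    return " ".join(words)


def matches(candidate, term, case_sensitive=False, stemming=False):
    """True if the candidate and the dictionary term agree after normalization."""
    return (normalize(candidate, case_sensitive, stemming)
            == normalize(term, case_sensitive, stemming))
```

In this sketch, matches("Breast Cancer", "breast cancer", case_sensitive=True) is false, while matches("breast cancers", "breast cancer", stemming=True) is true, mirroring the two examples given for the options.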
This module identifies named entities using the defaultRegExpr.txt file, which can contain several regular expressions. If a phrase matches a regular expression, it is recognized as a named entity. You can add your own regular expressions to the existing file by double-clicking the file and adding the items that you want to include.
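As a rough illustration of regular-expression-based recognition, the sketch below applies a small list of patterns to a sentence and reports each match as a named entity. The patterns, the labels attached to them, and the function name regex_ner are hypothetical; the actual syntax and layout of defaultRegExpr.txt may differ.

```python
import re

# Hypothetical patterns, each paired with an illustrative label.
# These are not taken from defaultRegExpr.txt.
PATTERNS = [
    (re.compile(r"\b\d+(?:\.\d+)?\s?(?:mg|ml|mcg)\b", re.IGNORECASE), "dosage"),
    (re.compile(r"\bstage\s+(?:IV|III|II|I)\b", re.IGNORECASE), "cancer_stage"),
]


def regex_ner(text, patterns):
    """Return (start, end, phrase, label) for every phrase matching a pattern."""
    entities = []
    for pattern, label in patterns:
        for m in pattern.finditer(text):
            # Any phrase that matches a pattern is treated as a named entity.
            entities.append((m.start(), m.end(), m.group(), label))
    return sorted(entities)  # order by character offset
```

Adding a new kind of entity in this sketch means appending one more pattern to the list, which parallels adding a line to the regular expression file.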