Named Entity Recognizer

A named entity recognizer identifies named entities and their semantic types in text. In CLAMP-Cancer, named entities typically refer to clinical concepts. As shown in the figure below, CLAMP-Cancer provides three different models for named entity recognition:

  1. DF_CRF_based_named_entity_recognizer
  2. DF_Dictionary_lookup
  3. DF_Regular_expression_NER

Each model is described in more detail below.

Three named entity recognizers and their configuration files
DF_CRF_based_named_entity_recognizer

DF_CRF_based_named_entity_recognizer is the default named entity recognizer used in CLAMP-Cancer. The recognizer identifies three types of clinical concepts: problems, treatments, and tests. It is built by training a CRF (conditional random field) model on a dataset of clinical notes, namely the i2b2 2010 challenge corpus (https://www.i2b2.org/NLP/Relations/). Advanced users can use the config.conf file to replace the default recognizer model, which is stored in the file defaultModel.jar, with their own model file.

  1. To replace the default file:
    1. Double-click the config.conf file to open it
    2. Click the button with three dots to browse for your own file
    3. Click the Open button
      How to replace the default file
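
To illustrate what a CRF-based recognizer produces, the following minimal sketch groups per-token labels into named entities. It assumes the model emits BIO tags (B- for the beginning of an entity, I- for its continuation, O for tokens outside any entity), which is a common convention for CRF sequence labelers but is not confirmed here as CLAMP-Cancer's internal format. The function name, the example sentence, and the tag names are hypothetical.

```python
from typing import List, Optional, Tuple


def bio_to_entities(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Group per-token BIO tags into (entity text, semantic type) pairs.

    Assumption: tags look like "B-problem", "I-problem", or "O".
    """
    entities: List[Tuple[str, str]] = []
    current_tokens: List[str] = []
    current_type: Optional[str] = None

    def close_open_entity() -> None:
        # Store the entity collected so far, if any, and reset the state.
        nonlocal current_tokens, current_type
        if current_tokens and current_type is not None:
            entities.append((" ".join(current_tokens), current_type))
        current_tokens = []
        current_type = None

    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A new entity starts here; finish the previous one first.
            close_open_entity()
            current_tokens = [token]
            current_type = tag[2:]
        elif tag.startswith("I-") and current_tokens and tag[2:] == current_type:
            # Continuation of the entity that is currently open.
            current_tokens.append(token)
        else:
            # "O", or an I- tag that does not continue an open entity.
            close_open_entity()

    close_open_entity()
    return entities


# Hypothetical tagger output for a short clinical sentence.
tokens = ["CT", "scan", "showed", "a", "lung", "mass", "treated", "with", "cisplatin"]
tags = ["B-test", "I-test", "O", "O", "B-problem", "I-problem", "O", "O", "B-treatment"]
entities = bio_to_entities(tokens, tags)
print(entities)
```

In this sketch, the three entity types correspond to the tests, problems, and treatments that the default recognizer is trained to identify.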
DF_Dictionary_lookup

DF_Dictionary_lookup matches candidate named entities in the text directly against the terms in a dictionary. Currently, the defaultDic.txt file used in CLAMP-Cancer consists of terms and their semantic types from UMLS (https://www.nlm.nih.gov/research/umls/). The semantic type of the matched term in UMLS is assigned to the recognized named entity.
To configure DF_Dictionary_lookup, first click on the config file under the DF_Dictionary_matcher folder. This opens a new window that takes the following three parameters: Case sensitive, Stemming, and Dictionaries (shown in the picture below).
Case sensitive
If you check the checkbox for "Case sensitive", the matcher will differentiate between capital and lowercase letters when searching for a term in the dictionary. For example, "Breast Cancer" will not be matched with "breast cancer".
Stemming
If you check the checkbox for "Stemming", the matcher will match the stemmed form of a candidate named entity with the terms in the dictionary. For example, "breast cancers" will be matched to "breast cancer".
Dictionaries
You can also replace or edit the default dictionary file used by this function.

  1. To replace the default dictionary file:
    1. Double-click the config.conf file to open it
    2. Click the button with three dots to browse for your own file
    3. Click the Open button
      Replace the default dictionary file
  2. To edit the current dictionary file:
    1. Double click on the defaultDict.txt file to open it
    2. Add the terms that you want to include in the dictionary file
    3. Click the Save button at the top of the page
      Edit the current dictionary file
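
The effect of the "Case sensitive" and "Stemming" options described above can be sketched as follows. This is only an illustration under stated assumptions, not CLAMP-Cancer's implementation: the dictionary is a hypothetical in-memory mapping from term to semantic type (the actual dictionary file format is not shown here), and naive_stem is a deliberately crude stand-in for a real stemmer.

```python
from typing import Dict, Optional


def naive_stem(word: str) -> str:
    """Crude stand-in for a stemmer: strips a single trailing plural 's'."""
    if len(word) > 3 and word.endswith("s") and not word.endswith("ss"):
        return word[:-1]
    return word


def normalize(phrase: str, case_sensitive: bool, stemming: bool) -> str:
    """Apply the selected options to a phrase before comparison."""
    words = phrase.split()
    if not case_sensitive:
        words = [w.lower() for w in words]
    if stemming:
        words = [naive_stem(w) for w in words]
    return " ".join(words)


def lookup(
    candidate: str,
    dictionary: Dict[str, str],
    case_sensitive: bool = False,
    stemming: bool = False,
) -> Optional[str]:
    """Return the semantic type of the matching term, or None if no term matches."""
    key = normalize(candidate, case_sensitive, stemming)
    for term, semantic_type in dictionary.items():
        if normalize(term, case_sensitive, stemming) == key:
            return semantic_type
    return None


# Hypothetical one-entry dictionary; the semantic type label is made up.
dictionary = {"breast cancer": "problem"}

print(lookup("Breast Cancer", dictionary, case_sensitive=True))   # None
print(lookup("Breast Cancer", dictionary, case_sensitive=False))  # problem
print(lookup("breast cancers", dictionary, stemming=True))        # problem
```

Normalizing both the candidate and the dictionary term with the same options keeps the comparison symmetric, so a term stored in plural form would also match a singular candidate when stemming is enabled.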
DF_Regular_expression_NER

This module identifies named entities using the defaultRegExpr.txt file, which can contain several regular expressions. If a phrase matches a regular expression, it is recognized as a named entity. You can add your own regular expressions to the existing file by double-clicking the file and adding the items that you want to include.
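
As a rough illustration of regular-expression-based recognition, the sketch below applies a small set of patterns to a sentence and reports each match as a named entity. The rules, the semantic type labels, and the example text are hypothetical, and the rules are defined directly in Python because the actual format of defaultRegExpr.txt is not described here.

```python
import re
from typing import List, Tuple

# Hypothetical rules: (semantic type, compiled pattern).
RULES = [
    ("dose", re.compile(r"\b\d+(?:\.\d+)?\s?(?:mg|ml|g)\b", re.IGNORECASE)),
    ("date", re.compile(r"\b\d{4}-\d{2}-\d{2}\b")),
]


def find_entities(text: str) -> List[Tuple[int, int, str, str]]:
    """Return (start, end, matched text, semantic type) for every rule match."""
    found = []
    for semantic_type, pattern in RULES:
        for match in pattern.finditer(text):
            found.append((match.start(), match.end(), match.group(), semantic_type))
    # Sort by position so entities appear in reading order.
    return sorted(found)


text = "Started cisplatin 50 mg on 2020-03-15."
entities = find_entities(text)
for start, end, phrase, semantic_type in entities:
    print(f"{start}-{end}\t{semantic_type}\t{phrase}")
```

Each phrase that matches a pattern is reported with the semantic type attached to that pattern, which mirrors the behavior described above: a phrase that matches a regular expression is recognized as a named entity.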