NER Feature Extractor

This component consists of several feature extractors (Figure 9.1), each of which extracts a different type of feature for named entity recognition. CLAMP-Cancer users use this component to build their own named entity recognizer in a corpus annotation project (see Section 4.2). As with the previous components, these features can be customized by changing or replacing their default config files. Each extractor is explained below:

List of NER feature extractors

DF_Brown_clustering_feature

This is a type of word representation feature generated from the unlabeled data provided by the SemEval 2014 Challenge. Advanced users can replace the system's default file with their own Brown clustering file.

To replace the default file:
1. Double-click the config.conf file to open it
2. Click the button with three dots to browse for your own file
3. Click the Open button
  
  How to replace the default file

For more information on how to create your own Brown clustering file, visit:
https://github.com/percyliang/brown-cluster
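
As a rough illustration, the brown-cluster tool linked above writes a "paths" file in which each line holds a cluster bit string, a word, and the word's count, separated by tabs. The Python sketch below reads text in that layout and derives bit-string prefix features for a token. Whether the CLAMP-Cancer default file uses exactly this layout is an assumption, and the function names, prefix lengths, and sample lines are hypothetical, so compare against the default file before preparing a replacement.

```python
from typing import Dict, Iterable, List, Sequence


def load_brown_paths(lines: Iterable[str]) -> Dict[str, str]:
    """Map each word to its cluster bit string.

    Assumed layout per line: bit string, TAB, word, TAB, count.
    """
    clusters: Dict[str, str] = {}
    for line in lines:
        parts = line.rstrip("\n").split("\t")
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        path, word = parts[0], parts[1]
        clusters[word] = path
    return clusters


def brown_features(word: str, clusters: Dict[str, str],
                   prefix_lengths: Sequence[int] = (4, 6)) -> List[str]:
    """Return bit-string prefixes of the word's cluster (lengths are hypothetical)."""
    path = clusters.get(word)
    if path is None:
        return []  # word not seen in the unlabeled data
    return [f"brown_{n}={path[:n]}" for n in prefix_lengths]


# Made-up sample lines, for illustration only.
sample = ["0010\taspirin\t12\n", "001101\tibuprofen\t7\n", "11\tthe\t950\n"]
clusters = load_brown_paths(sample)
print(brown_features("ibuprofen", clusters))
```

Shorter prefixes group a word with a broader set of distributionally similar words, and longer prefixes with a narrower set, which is why several prefix lengths are often emitted together.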

DF_Dictionary_lookup_feature

This extractor uses a dictionary consisting of terms and their semantic types from UMLS to extract potential features. Advanced users can replace or edit the default file following the steps below:
Note: The content must follow the same format as the default file: the phrase, then a tab, then the semantic type.

To replace the default file:
1. Double-click the config.conf file to open it
2. Click the button with three dots to browse for your own file
3. Click the Open button
  
  How to replace the default file

To edit the default file:
1. Double-click the word_path.txt file to open it
2. Add the terms that you want to include in the file
3. Click the Save button at the top of the page
  
  How to edit the default file
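
The dictionary format described in the note above (phrase, tab, semantic type) can be sketched in Python as follows. This is only an illustration of how such a file could be read and matched against tokens; the longest-match strategy, the function names, the maximum phrase length, and the sample entries are assumptions and do not describe CLAMP-Cancer's internal lookup.

```python
from typing import Dict, Iterable, List, Tuple


def load_dictionary(lines: Iterable[str]) -> Dict[str, str]:
    """Read 'phrase<TAB>semantic type' lines into a lowercase lookup table."""
    entries: Dict[str, str] = {}
    for line in lines:
        line = line.rstrip("\n")
        if "\t" not in line:
            continue  # skip lines that do not follow the expected format
        phrase, semantic_type = line.split("\t", 1)
        entries[phrase.strip().lower()] = semantic_type.strip()
    return entries


def lookup(tokens: List[str], entries: Dict[str, str],
           max_len: int = 4) -> List[Tuple[int, int, str]]:
    """Return (start, end, semantic_type) spans, longest match first."""
    matches: List[Tuple[int, int, str]] = []
    i = 0
    while i < len(tokens):
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            phrase = " ".join(tokens[i:i + n]).lower()
            if phrase in entries:
                matches.append((i, i + n, entries[phrase]))
                i += n  # continue after the matched phrase
                break
        else:
            i += 1  # no phrase starts at this token
    return matches


# Made-up sample entries, for illustration only.
sample = ["breast cancer\tNeoplastic Process\n",
          "tamoxifen\tPharmacologic Substance\n"]
entries = load_dictionary(sample)
tokens = "Patient with breast cancer started tamoxifen".split()
print(lookup(tokens, entries))
```

A line without a tab is skipped here, which mirrors why the note insists on keeping the default file's format when editing or replacing it.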

DF_Ngram_feature

This extractor uses words together with their part-of-speech (POS) tags as NER features.
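
To make the idea concrete, the Python sketch below builds word and POS n-gram features for one token from its immediate neighbors. The window size, the feature names, and the function name are hypothetical choices for illustration; the actual n-gram sizes used by CLAMP-Cancer are not stated here.

```python
from typing import List


def ngram_features(words: List[str], pos_tags: List[str], i: int) -> List[str]:
    """Word and POS unigrams and bigrams around position i (window is an assumption)."""
    w = [x.lower() for x in words]
    feats = [f"w0={w[i]}", f"pos0={pos_tags[i]}"]
    if i > 0:  # bigram with the previous token
        feats.append(f"w-1|w0={w[i - 1]}|{w[i]}")
        feats.append(f"pos-1|pos0={pos_tags[i - 1]}|{pos_tags[i]}")
    if i + 1 < len(w):  # bigram with the next token
        feats.append(f"w0|w+1={w[i]}|{w[i + 1]}")
        feats.append(f"pos0|pos+1={pos_tags[i]}|{pos_tags[i + 1]}")
    return feats


words = ["Started", "tamoxifen", "daily"]
pos_tags = ["VBD", "NN", "RB"]
print(ngram_features(words, pos_tags, 1))
```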

DF_prefix_suffix_feature

This extractor extracts the prefixes and suffixes of words, which may be representative of a specific type of named entity.
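
A minimal Python sketch of this kind of feature is shown below. The maximum affix length of three characters, the lowercasing, and the feature names are assumptions made for the example, not documented CLAMP-Cancer settings.

```python
from typing import List


def prefix_suffix_features(word: str, max_len: int = 3) -> List[str]:
    """Return prefixes and suffixes of 1..max_len characters (max_len is hypothetical)."""
    w = word.lower()
    feats: List[str] = []
    for n in range(1, max_len + 1):
        if len(w) >= n:  # skip affixes longer than the word itself
            feats.append(f"prefix{n}={w[:n]}")
            feats.append(f"suffix{n}={w[-n:]}")
    return feats


print(prefix_suffix_features("carcinoma"))
```

In this example the three-character suffix "oma" is the kind of affix that tends to recur across tumor names, which is what makes such features informative for an NER model.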

DF_Random_indexing_feature

Similar to Brown clustering, this is a type of word representation feature generated from unlabeled data using a third-party package. For more information, visit: https://jcheminf.springeropen.com

To replace the default file:
1. Double-click the config.conf file to open it
2. Click the button with three dots to browse for your own file
3. Click the Open button
  
  How to replace the default file

DF_Section_feature

This extractor identifies the section in which a candidate named entity appears.

DF_Sentence_pattern_feature

This extractor identifies the pattern of a sentence using CLAMP-Cancer's built-in rules.

DF_Word_embedding_feature

Similar to Brown clustering and random indexing, this is a type of distributed word representation feature. It is generated by a neural network from the unlabeled data (MIMIC II) provided by the SemEval 2014 Challenge. Advanced users can replace the default file with their own file.

To replace the default file:
1. Double-click the config.conf file to open it
2. Click the button with three dots to browse for your own file
3. Click the Open button
  
  How to replace the default file

DF_Word_shape_feature

This extractor captures the type of a word; for example, it identifies whether the word begins with an English letter, a number, etc.
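
The Python sketch below illustrates two simple word-shape features: the type of the first character, as described above, and a collapsed shape string. The collapsed shape is a common technique added here for illustration and is not confirmed to be what CLAMP-Cancer computes; all names and category labels are hypothetical.

```python
def first_char_type(word: str) -> str:
    """Classify the first character as an English letter, a digit, or other."""
    if not word:
        return "empty"
    c = word[0]
    if c.isascii() and c.isalpha():
        return "letter"
    if c.isascii() and c.isdigit():
        return "digit"
    return "other"


def word_shape(word: str) -> str:
    """Map characters to X (upper), x (lower), d (digit) and collapse repeats."""
    shape = []
    for c in word:
        if c.isupper():
            s = "X"
        elif c.islower():
            s = "x"
        elif c.isdigit():
            s = "d"
        else:
            s = c  # keep punctuation as-is
        if not shape or shape[-1] != s:
            shape.append(s)
    return "".join(shape)


for token in ["HER2", "5mg", "Stage-IV"]:
    print(token, first_char_type(token), word_shape(token))
```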

DF_Words_regular_expression_feature

This extractor matches words against regular expression patterns that may indicate a specific type of named entity. Advanced users can create their own regular expressions or edit the default file.

To replace the default file:
1. Double-click the config.conf file to open it
2. Click the button with three dots to browse for your own file
3. Click the Open button
  
  How to replace the default file

To edit the default file:
1. Double-click the reglist.txt file to open it
2. Add the regular expressions that you want to include in the file
3. Click the Save button at the top of the page
  
  How to edit the default file
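
As a hedged illustration of regular expression features, the Python sketch below loads a list of labeled patterns and reports which ones fully match a word. The layout of reglist.txt is not described in this section, so the "label, tab, pattern" layout used here is purely an assumption, as are the function names and the sample patterns; open the default file to see the format CLAMP-Cancer actually expects.

```python
import re
from typing import Iterable, List, Pattern, Tuple


def load_patterns(lines: Iterable[str]) -> List[Tuple[str, Pattern[str]]]:
    """Read 'label<TAB>pattern' lines (hypothetical layout) into compiled patterns."""
    patterns: List[Tuple[str, Pattern[str]]] = []
    for line in lines:
        line = line.rstrip("\n")
        if "\t" not in line:
            continue  # skip blank or malformed lines
        label, expr = line.split("\t", 1)
        patterns.append((label, re.compile(expr)))
    return patterns


def regex_features(word: str, patterns: List[Tuple[str, Pattern[str]]]) -> List[str]:
    """Return one feature per pattern that matches the whole word."""
    return [f"regex={label}" for label, p in patterns if p.fullmatch(word)]


# Made-up sample patterns, for illustration only.
sample = [
    "\t".join(["DOSE", r"\d+(\.\d+)?mg"]),
    "\t".join(["DATE", r"\d{1,2}/\d{1,2}/\d{4}"]),
]
patterns = load_patterns(sample)
for token in ["50mg", "01/02/2020", "aspirin"]:
    print(token, regex_features(token, patterns))
```

Testing a new pattern against a few sample words in this way before adding it to the file can help catch expressions that match more, or less, than intended.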