Machine learning model development

CLAMP-Cancer enables you to build your own machine learning model from a corpus that you have annotated yourself or from a pre-annotated corpus that you have imported into a corpus annotation project. The model can then be used to make predictions on new files. In the current version of CLAMP-Cancer, a CRF (Conditional Random Field) is used to build machine learning models for named entity recognition (NER).
The first step in building a machine learning model is to configure its schema. After configuring the schema, you can run the model training and evaluation processes. Once these processes are complete, you can view the generated model, its associated log files, and the named entities predicted by the model in the output folder. The steps below walk through this process.
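
As background on what a CRF-based NER model learns, the sketch below converts character-offset annotations into per-token BIO labels, a common encoding for sequence labelers. This is an illustration only: the source does not describe CLAMP-Cancer's internal representation, and the function name `to_bio`, the token tuples, and the entity type `problem` are all hypothetical.

```python
from typing import List, Tuple

Token = Tuple[str, int, int]    # (text, start offset, end offset)
Entity = Tuple[int, int, str]   # (start offset, end offset, entity type)


def to_bio(tokens: List[Token], entities: List[Entity]) -> List[str]:
    """Assign a BIO label to each token (hypothetical helper, not CLAMP's API)."""
    labels = ["O"] * len(tokens)
    for ent_start, ent_end, ent_type in entities:
        inside = False
        for i, (_, tok_start, tok_end) in enumerate(tokens):
            # A token belongs to the entity if it lies fully within its span.
            if tok_start >= ent_start and tok_end <= ent_end:
                labels[i] = ("I-" if inside else "B-") + ent_type
                inside = True
    return labels


if __name__ == "__main__":
    # Tokens of "Stage II breast cancer" with character offsets.
    tokens = [("Stage", 0, 5), ("II", 6, 8), ("breast", 9, 15), ("cancer", 16, 22)]
    entities = [(9, 22, "problem")]  # assumed entity type name
    print(list(zip([t[0] for t in tokens], to_bio(tokens, entities))))
```

A sequence labeler such as a CRF is trained to predict one of these labels for each token, given features computed from the token and its context.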

Building machine learning models
(NER model)

  1. Select the folder containing your training files on the Corpus panel.
  2. Click on the "Train Model" button at the top of the window as shown in the figure below.
    Train Model button
  3. In the pop-up window shown in the figure below, enter a name for the model that you are building.
    Configuration window for machine learning model building
  4. Select the checkboxes for the features that you want to include in your model.
  5. In the Evaluation box, choose whether to test the built model against a test dataset, to run an n-fold cross-validation during the training process, or both.

    If you choose to test the model against a test set, make sure that the annotated XMI files you want to use are in the folder of your choice. You can browse for the folder by clicking the three-dot button next to the checkboxes. With n-fold cross-validation, no separate test folder is needed, because the training data is also used to evaluate model performance.

  6. Click on the Finish button to start building the model.

    Once the building process starts, you can follow its progress in the Console window and in the progress bar at the bottom of the screen. You can stop the building process at any time by clicking the red stop button in the Progress window.
    Note: During the model building process, the training files cannot be annotated. Clicking on the text of a training file opens an alert window indicating that the user operation is waiting for a function to complete.

    Annotations on the training file will be paused during the model building process
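
To illustrate why n-fold cross-validation in step 5 needs no separate test folder, the following minimal sketch partitions a list of training files into n folds and, for each fold, holds it out while the remaining files are used for training. It shows the general technique only; how CLAMP-Cancer assigns files to folds is not stated here, and the function name `n_fold_splits` and the file names are hypothetical.

```python
from typing import List, Sequence, Tuple


def n_fold_splits(files: Sequence[str], n: int) -> List[Tuple[List[str], List[str]]]:
    """Return n (train, held_out) pairs covering every file exactly once as held-out."""
    if n < 2 or n > len(files):
        raise ValueError("n must be between 2 and the number of files")
    # Round-robin assignment keeps fold sizes within one file of each other.
    folds = [list(files[i::n]) for i in range(n)]
    splits = []
    for i, held_out in enumerate(folds):
        # Train on every fold except the one currently held out.
        train = [f for j, fold in enumerate(folds) if j != i for f in fold]
        splits.append((train, held_out))
    return splits


if __name__ == "__main__":
    corpus = [f"note_{k}.xmi" for k in range(1, 11)]  # hypothetical file names
    for idx, (train, held_out) in enumerate(n_fold_splits(corpus, 5), start=1):
        print(f"fold {idx}: train on {len(train)} files, evaluate on {held_out}")
```

Each file is evaluated exactly once by a model that did not see it during training, so the averaged scores estimate performance on unseen data without requiring additional annotated files.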