CLAMP-Cancer enables you to build your own machine learning model based on a corpus that you
have already annotated or a pre-annotated one that you have imported into a corpus
annotation project. The model can then be used for predictions on new files. In the current
version of CLAMP-Cancer, CRF (Conditional Random Field) is used to build machine learning models
for named entity recognition (NER).
The first step in building a machine learning model is to configure its schema. After
configuring the schema, you can start the model training and evaluation
processes. Once these processes are completed, you can view the generated model, its
associated log files, and the named entities predicted by the model in the output folder. The
following steps guide you through this procedure.
If you choose to test the model against a test set, make sure that the annotated .xmi files you want to use are in a folder of your choice. You can browse for the folder by clicking the three-dot button next to the checkboxes. With n-fold cross-validation, this is not required, because the training data is also used to test the model's performance.
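To make the n-fold cross-validation option more concrete, the following Python snippet is a minimal sketch of the general technique, not CLAMP-Cancer's actual implementation. The function name `n_fold_splits` and the file names are hypothetical and chosen only for illustration. It shows how the same annotated files can serve as both training and test data: the files are divided into n folds, and each fold is held out once for testing while the rest are used for training.

```python
def n_fold_splits(files, n):
    """Yield (train, test) file lists, holding out one fold per round.

    Hypothetical helper illustrating generic n-fold cross-validation;
    CLAMP-Cancer's internal splitting strategy may differ.
    """
    if n < 2 or n > len(files):
        raise ValueError("n must be between 2 and the number of files")
    # Distribute files round-robin so fold sizes differ by at most one.
    folds = [files[i::n] for i in range(n)]
    for i, held_out in enumerate(folds):
        # Train on every fold except the one currently held out.
        train = [f for j, fold in enumerate(folds) if j != i for f in fold]
        yield train, held_out


# Hypothetical annotated files; the names are placeholders.
files = [f"note_{i}.xmi" for i in range(10)]

for k, (train, test) in enumerate(n_fold_splits(files, 5), start=1):
    print(f"fold {k}: {len(train)} training files, {len(test)} test files")
```

Every file is used for testing exactly once and never appears in the training and test lists of the same round, which is why no separate test folder is needed for this option.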
Once the building process starts, you can check the progress in the Console window, as
well as in the progress bar at the bottom of the screen. You can also stop the building
process at any time by clicking the red stop button in the Progress window.
Note: During the model building process, the training files cannot be annotated. Clicking on
the text of a training file pops up an alert window indicating that the user operation is
waiting for a function to complete.