\(\mu\text{TC}\) Command Line Interface¶
For any given text classification task, \(\mu\text{TC}\) will try to find a suitable text model from a set of possible models defined in the configuration space using the command:
microtc-params -k3 -Smacrorecall -s24 -n24 user-profiling.json -o user-profiling.params
The parameters mean the following:
user-profiling.json is the database of exemplars, one JSON dictionary per line with text and klass keywords
-k3 uses three folds
-s24 specifies that the parameter space should be sampled at 24 points, keeping the best among them
-n24 specifies the number of processes to launch; it is a good idea to set -s to a multiple of -n
-o user-profiling.params specifies the file in which to store the configurations found by the parameter-selection process, in best-first order
-S or --score specifies the name of the fitness function (e.g., macrof1, microf1, macrorecall, accuracy, r2, pearsonr, spearmanr)
-H makes b4msa perform a final hill-climbing search for the parameter selection; in many cases, this produces much better configurations (never worse, guaranteed)
All of these parameters have default values, so no arguments are needed
Notes:
- “text” can be a string or an array of strings; in the latter case, the final vector treats each string as an independent text.
- There is no typo: we use “klass” instead of “class” for obscure historical reasons.
- -k accepts an a:b syntax that allows searching on a sample of size a and testing on b, for 0 < a < 1 and 0 < b < 1. It is common to use b = 1 - a; however, this is not a hard constraint, and you only need to ensure a + b <= 1 and non-overlapping ranges.
- If -S is r2, pearsonr, or spearmanr, then \(\mu\text{TC}\) computes the parameters for a regression task.
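As a minimal illustration of the expected input format, the following sketch writes a small training file with one JSON dictionary per line, each carrying the text and klass keywords described above. The filename and labels here are hypothetical examples, not part of the tool.

```python
import json

# Hypothetical exemplars; each line of the file is an independent
# JSON dictionary with the "text" and "klass" keywords that
# microtc-params expects.
exemplars = [
    {"text": "I love this product", "klass": "positive"},
    {"text": "this is terrible", "klass": "negative"},
    # "text" may also be an array of strings, each treated as an
    # independent text when building the final vector.
    {"text": ["great service", "fast delivery"], "klass": "positive"},
]

with open("user-profiling.json", "w") as fh:
    for ex in exemplars:
        fh.write(json.dumps(ex) + "\n")
```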
Training the model¶
At this point, we are in the position to train a model. Suppose the workload is emotions.json and the parameters are in emotions.params; then the following command will save the model in emotions.model
microtc-train -o emotions.model -m emotions.params emotions.json
You can create a regressor by adding the -R option to microtc-train
Using the model¶
At this point, we are in the position to test the model (i.e., emotions.model) on a new set. That is, we are in the position to ask the classifier to assign a label to a particular text.
microtc-predict -m emotions.model -o emotions-predicted.json test-emotions.json
Finally, you can evaluate the performance of the prediction as follows:
microtc-perf gold.json emotions-predicted.json
This will print a number of scores to the screen.
{
  "accuracy": 0.7025,
  "f1_anger": 0.705,
  "f1_fear": 0.6338797814207651,
  "f1_joy": 0.7920353982300885,
  "f1_sadness": 0.6596858638743456,
  "macrof1": 0.6976502608812997,
  "macrof1accuracy": 0.490099308269113,
  "macrorecall": 0.7024999999999999,
  "microf1": 0.7025,
  "quadratic_weighted_kappa": 0.5773930753564155
}
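The scores above are standard classification metrics. As a rough sketch of what two of them measure (not microtc-perf's actual implementation), accuracy and macro-recall can be computed from gold and predicted label lists as follows; the labels here are hypothetical:

```python
from collections import defaultdict

def accuracy(gold, pred):
    # Fraction of predictions that exactly match the gold label.
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)

def macrorecall(gold, pred):
    # Recall computed per class, then averaged with each class
    # weighted equally, regardless of its frequency.
    hits = defaultdict(int)
    totals = defaultdict(int)
    for g, p in zip(gold, pred):
        totals[g] += 1
        hits[g] += (g == p)
    return sum(hits[k] / totals[k] for k in totals) / len(totals)

gold = ["joy", "joy", "fear", "anger"]
pred = ["joy", "fear", "fear", "anger"]
print(accuracy(gold, pred))     # 0.75
print(macrorecall(gold, pred))  # (0.5 + 1.0 + 1.0) / 3
```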
or, if the --regression flag was provided:
{
  "filename": "some-path/some-name.predicted",
  "pearsonr": [
    0.6311471948385253,
    1.2734619266038659e-23
  ],
  "r2": 0.3276512897198096,
  "spearmanr": [
    0.6377984613587965,
    3.112636137077516e-24
  ]
}
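For the regression case, pearsonr and spearmanr are reported as pairs (correlation coefficient, p-value), and r2 is the coefficient of determination. As a rough sketch of the underlying formulas (not microtc-perf's actual implementation, which also reports the p-value), Pearson's r and r2 can be computed like this on hypothetical data:

```python
import math

def pearsonr(x, y):
    # Pearson correlation coefficient: covariance normalized by the
    # product of the standard deviations (p-value omitted here).
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def r2(gold, pred):
    # Coefficient of determination: 1 - SS_res / SS_tot.
    m = sum(gold) / len(gold)
    ss_res = sum((g - p) ** 2 for g, p in zip(gold, pred))
    ss_tot = sum((g - m) ** 2 for g in gold)
    return 1 - ss_res / ss_tot

gold = [1.0, 2.0, 3.0, 4.0]
pred = [1.1, 1.9, 3.2, 3.8]
print(r2(gold, pred))        # 0.98
print(pearsonr(gold, pred))
```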