Hand motion prediction - Part 2
In this post, we will convert the target variable to a binary class and compare the results to the multi-class models from the previous post. The two binary classes are the motion state and the steady state. Let's build the classification models.
The ROC curve is a useful way to measure model performance. Each point on the ROC curve corresponds to a different classification threshold. The AUC (Area Under the Curve) and the F1 score, which we also calculated for the multi-class models, are good metrics for evaluating the models.
AUC scores
gbt : 0.903
xgbt : 0.896
ada : 0.86
dt : 0.763
edt : 0.9
rf : 0.892
knn : 0.877
lr : 0.848
svc : 0.864
nb : 0.81
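Scores like those above can be computed from each model's predicted probabilities. Below is a minimal sketch using scikit-learn; the dataset here is synthetic (`make_classification`) as a stand-in for the hand-motion features, and only three of the ten models are shown:

```python
# Sketch: compute test-set AUC for several classifiers on a synthetic
# binary dataset standing in for the 32 hand-motion features.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import roc_auc_score

X, y = make_classification(n_samples=1000, n_features=32, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

models = {
    "gbt": GradientBoostingClassifier(random_state=42),
    "rf": RandomForestClassifier(random_state=42),
    "dt": DecisionTreeClassifier(random_state=42),
}

for name, model in models.items():
    model.fit(X_train, y_train)
    # AUC is computed from the predicted probability of the positive class,
    # not from the hard 0/1 predictions
    proba = model.predict_proba(X_test)[:, 1]
    print(f"{name} : {roc_auc_score(y_test, proba):.3f}")
```

Note that `roc_auc_score` takes probabilities (or decision scores), not class labels; passing hard predictions would collapse the curve to a single threshold.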
We got higher F1 scores than those from the multi-class models, as expected. If we built separate models, one only for rest and hold, and another only for stroke, preparation, and retraction, we might get better predictions. Even with the binary models, there are still misclassifications. If we split the data in two stages, first motion vs. rest, and then categorized the motion and rest samples into their individual phases, we might be able to increase the F1 score.
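The two-stage idea can be sketched as follows. This is a hypothetical illustration on synthetic data: label encoding (0-1 for the steady phases, 2-4 for the motion phases) and the choice of gradient boosting for each stage are assumptions, not the setup from the original experiment:

```python
# Sketch: stage 1 predicts motion vs. steady; stage 2 routes each sample
# to a phase classifier trained only on its own group.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.ensemble import GradientBoostingClassifier

# Synthetic 5-class stand-in: classes 0-1 = steady phases, 2-4 = motion phases
X, y = make_classification(n_samples=1500, n_features=32, n_informative=10,
                           n_classes=5, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

is_motion_tr = (y_tr >= 2).astype(int)  # stage-1 binary target

stage1 = GradientBoostingClassifier(random_state=0).fit(X_tr, is_motion_tr)
# Stage-2 models are each trained only on their own subset of the data
steady = GradientBoostingClassifier(random_state=0).fit(X_tr[y_tr < 2], y_tr[y_tr < 2])
motion = GradientBoostingClassifier(random_state=0).fit(X_tr[y_tr >= 2], y_tr[y_tr >= 2])

# Final prediction: stage 1 decides which stage-2 model's answer to keep
pred1 = stage1.predict(X_te)
final = np.where(pred1 == 1, motion.predict(X_te), steady.predict(X_te))
```

One design caveat: stage-2 errors compound with stage-1 errors, so this helps only when the motion/steady split is much easier than the full 5-way problem.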
We have 32 features in total; some are more relevant to the target variable and some less. Tree-based models such as decision trees, random forests, gradient boosting, and extra trees have an attribute, 'feature_importances_', that shows which features are more significant for predicting the target variable. The first two rows in the plot below show the feature importances for the binary models, and the bottom two rows show those for the multi-class models. The first two columns, 'sv_rh' and 'sv_rw', represent the scalar velocity of the right hand and the scalar velocity of the right wrist, respectively, and the third column represents the vectorial velocity of the right wrist in the y direction. We can assume that the three participants moved their right hands and wrists more in a certain direction, which is why these three features show the strongest importance for predicting the target variable. The features that start with 'va' are the vectorial accelerations in the x, y, or z direction, and they show the least correlation with the target variable.
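Extracting and ranking these importances is a one-liner on any fitted tree model. A minimal sketch, again on synthetic data; the placeholder names `f0`..`f31` stand in for the real feature names such as 'sv_rh' and 'sv_rw':

```python
# Sketch: rank features by importance from a fitted random forest.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

feature_names = [f"f{i}" for i in range(32)]  # replace with the real 32 names
X, y = make_classification(n_samples=800, n_features=32, random_state=1)

rf = RandomForestClassifier(random_state=1).fit(X, y)

# feature_importances_ is a length-32 array that sums to 1; sort descending
order = np.argsort(rf.feature_importances_)[::-1]
for i in order[:5]:
    print(feature_names[i], round(rf.feature_importances_[i], 3))
```

The same attribute exists on `GradientBoostingClassifier`, `ExtraTreesClassifier`, and `DecisionTreeClassifier`, so the four rows of the plot can all be produced this way.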
You can experiment with feature reduction by excluding or down-weighting the less relevant features and see how that changes the results. Another method you can try is ensemble modeling. We already built 10 classification models, so you can average the predicted probabilities from the 10 models to get a final prediction, or let the 10 models vote: if, say, 7 of them predict 1, the final prediction is 1. You can also play with thresholds. Depending on whether you care more about precision or recall, you can set the threshold to 0.7, so that if the average probability is greater than 0.7 the prediction is 1, and otherwise it is 0. It is worth trying different techniques if your main goal is to optimize your model for the best prediction.
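Both ensembling styles above fit in a few lines. The sketch below uses three models instead of ten and a vote cutoff of 2 instead of 7, purely for brevity; the helper names `average_predict` and `majority_vote` are made up for this illustration:

```python
# Sketch: probability averaging with a custom threshold, and majority voting.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier

def average_predict(models, X, threshold=0.7):
    """Average P(class=1) across models; predict 1 where it exceeds threshold."""
    probs = np.mean([m.predict_proba(X)[:, 1] for m in models], axis=0)
    return (probs > threshold).astype(int)

def majority_vote(models, X, min_votes=7):
    """Predict 1 where at least min_votes models predict class 1."""
    votes = np.sum([m.predict_proba(X)[:, 1] > 0.5 for m in models], axis=0)
    return (votes >= min_votes).astype(int)

X, y = make_classification(n_samples=500, n_features=32, random_state=3)
models = [
    LogisticRegression(max_iter=1000).fit(X, y),
    DecisionTreeClassifier(random_state=3).fit(X, y),
    RandomForestClassifier(random_state=3).fit(X, y),
]

pred_avg = average_predict(models, X, threshold=0.7)
pred_vote = majority_vote(models, X, min_votes=2)
```

Raising the threshold above 0.5 trades recall for precision: fewer samples are labeled 1, but those that are carry higher average confidence. scikit-learn's `VotingClassifier` with `voting="soft"` packages the averaging variant if you prefer a built-in.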