This section examines whether the performance of the selected models (MLP, CNN, LSTM and CNN-LSTM) differs significantly on each dataset.
Note
As shown in the section Impact of the amount of training data, the models' results do not follow a normal distribution. Therefore, the following comparisons employ non-parametric tests.
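To make the idea of a non-parametric comparison concrete, the following is a minimal sketch of a pairwise two-sided Mann-Whitney U test computed with the normal approximation. The text does not name the specific tests used, so the choice of test is an assumption made purely for illustration; the accuracy values and the helper names `average_ranks` and `mann_whitney_u` are hypothetical and are not taken from the analysis reported here.

```python
import math
from collections import Counter
from itertools import combinations
from statistics import NormalDist


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test (normal approximation, tie-corrected).

    Returns the U statistic of the first sample and the approximate p-value.
    """
    n1, n2 = len(x), len(y)
    n = n1 + n2
    combined = list(x) + list(y)
    ranks = average_ranks(combined)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    # Tie correction: sum of (t^3 - t) over groups of tied values.
    tie_term = sum(t**3 - t for t in Counter(combined).values())
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        return u1, 1.0  # all observations identical: no evidence of a difference
    z = (u1 - mean_u) / math.sqrt(var_u)
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return u1, p_value


# Hypothetical accuracies of each model over repeated runs (illustration only).
accuracies = {
    "MLP": [0.71, 0.74, 0.69, 0.73, 0.70, 0.72],
    "CNN": [0.88, 0.91, 0.86, 0.90, 0.89, 0.87],
    "LSTM": [0.84, 0.86, 0.83, 0.88, 0.85, 0.82],
    "CNN-LSTM": [0.87, 0.89, 0.85, 0.90, 0.86, 0.88],
}

for model_a, model_b in combinations(accuracies, 2):
    u, p = mann_whitney_u(accuracies[model_a], accuracies[model_b])
    print(f"{model_a} vs {model_b}: U={u:.1f}, p={p:.4f}")
```

The normal approximation is rough for samples this small, and with several pairwise comparisons a multiple-comparison correction would normally be applied; both points are left out of the sketch for brevity.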
Tables 8.1 and 8.2 show that the CNN is the best-performing model on every dataset and with any amount of data.
In the smartphone dataset, the CNN-LSTM also performs well for low amounts of data, while with higher quantities of data, there are no statistical differences among the MLP, LSTM and CNN-LSTM. Regarding the smartwatch dataset, the LSTM and CNN-LSTM perform best with medium and higher amounts of data, along with the CNN model.
For the fused dataset, the CNN and the CNN-LSTM show the best accuracies with any amount of data, while the MLP performs significantly worse than the other models.
The following sections address the performance of the selected models in each activity and data source.
SEATED
Tables 8.3 and 8.4 show that the results differ depending on the data source.
Using the smartphone dataset, the CNN-LSTM seems to perform well with low and high quantities of data, while the CNN and LSTM are also the best with high amounts of data. With the smartwatch dataset, the best results are obtained by the CNN, LSTM and CNN-LSTM with low amounts of data, while with higher amounts no significant differences are observed among models. Regarding the fused dataset, the CNN and the CNN-LSTM are the best-performing models, followed by the LSTM. The MLP provides the worst results.
The results in Tables 8.5 and 8.6 show that the CNN is the best-performing model with any amount of data and any data source. The CNN-LSTM is also among the best-performing models with the smartphone and fused datasets with any quantity of data, while with the smartwatch dataset it struggles with low quantities of data. The LSTM also performs well with high amounts of data using the smartphone and smartwatch datasets, and it provides better results than the MLP with the fused dataset.
The results shown in Tables 8.7 and 8.8 indicate that the CNN provides the best results with any amount of data across the three data sources. The CNN-LSTM obtains good results with low amounts of data with the smartphone dataset, and it also produces the best results with medium and high quantities of data with the fused dataset. The LSTM performs well using the smartwatch dataset, similar to the CNN. The MLP provides the worst results with the smartwatch dataset, although its results are not significantly different from those of the LSTM using the smartphone and fused datasets.
Tables 8.9 and 8.10 show that the CNN, LSTM and CNN-LSTM obtain the best results with the smartwatch dataset. These three models also perform well with low amounts of data using the smartphone and fused datasets. However, no significant differences among models are observed for \(n \geq 5\) and \(n \geq 7\), respectively. In the case of the smartphone dataset, significant differences appear again for \(n \geq 15\), with the MLP and the CNN being the best. With the fused dataset, for \(n \geq 21\), the LSTM provides significantly worse results.
The results from Tables 8.11 and 8.12 indicate that the CNN is the best-performing model with any quantity of data across data sources. The CNN-LSTM also performs well with any amount of data using the fused dataset, and with low and medium amounts of data using the smartphone and smartwatch datasets. The LSTM performs well with high amounts of data using the smartphone and smartwatch datasets, and provides better results than the MLP using the fused dataset. The MLP provides the worst results in every scenario.
The analyses show that the CNN is always the best-performing model in terms of overall accuracy for all data sources and any amount of data. The LSTM also performs well with the smartwatch dataset, and the CNN-LSTM with the smartwatch and fused datasets. The MLP is the worst-performing model across data sources, and the differences are significant for the smartwatch- and fused-trained models.
Regarding activity-wise performance, the CNN presents the best results in every activity and data source. The LSTM performs similarly to the CNN using the smartwatch dataset in all activities and the smartphone dataset in all activities except TURNING. The CNN-LSTM seems to work well in some activities with the smartphone and smartwatch datasets, although its results are somewhat unstable. On the other hand, the CNN-LSTM shines with the fused dataset, obtaining results similar to those of the CNN. The MLP presents the worst results in every case except WALKING and TURNING with the smartphone dataset.
These results are graphically summarized in Figures 8.1 and 8.2, which represent the best-performing model in terms of overall accuracy and per-activity F1-score. The following example shows how to interpret the figure: in the SITTING_DOWN activity and for \(n=1\), no significant differences among model types are observed when using the smartphone dataset; with the smartwatch dataset, the CNN obtains significantly the best performance; and with the fused dataset, the LSTM has the best performance, although it is not significantly better than at least one other model (which one, MLP, CNN or CNN-LSTM, can be determined by checking Table 8.11).
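The reading rule described above can be sketched in a few lines of code. This is only an illustration under stated assumptions: the function name `summarize_cell`, the significance level of 0.05, the use of the median as the summary metric, and all the scores and p-values are hypothetical, and the sketch is not the procedure used to build the figure.

```python
from statistics import median


def summarize_cell(samples, p_values, alpha=0.05):
    """Summarize one (activity, dataset, n) cell.

    samples:  dict mapping model name -> list of metric values (hypothetical).
    p_values: dict mapping frozenset({model_a, model_b}) -> pairwise p-value.
    Returns the model with the highest median metric and the sorted list of
    models whose results are NOT significantly different from it.
    """
    best = max(samples, key=lambda model: median(samples[model]))
    not_different = sorted(
        other
        for other in samples
        if other != best and p_values[frozenset((best, other))] >= alpha
    )
    return best, not_different


# Hypothetical F1-scores and pairwise p-values for a single cell.
samples = {
    "MLP": [0.60, 0.62, 0.61],
    "CNN": [0.70, 0.72, 0.71],
    "LSTM": [0.73, 0.74, 0.72],
    "CNN-LSTM": [0.65, 0.66, 0.64],
}
p_values = {
    frozenset(("LSTM", "CNN")): 0.40,
    frozenset(("LSTM", "MLP")): 0.01,
    frozenset(("LSTM", "CNN-LSTM")): 0.02,
    frozenset(("CNN", "MLP")): 0.01,
    frozenset(("CNN", "CNN-LSTM")): 0.03,
    frozenset(("MLP", "CNN-LSTM")): 0.04,
}

best, not_different = summarize_cell(samples, p_values)
print(best, not_different)
```

In this made-up cell the LSTM has the highest median but is not significantly better than the CNN, which mirrors the fused-dataset case in the example above: the best model is reported, and the pairwise table has to be consulted to see which models it cannot be separated from.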
The figure shows a clear dominance of the CNN, which still provides the best metrics even when the difference from other models is not significant. In addition, when it does not provide the best results, its results are in most cases not significantly different from those of the best model. The lack of influence of the model type is noticeable in the SEATED activity with the smartwatch dataset, and in the TURNING activity with the smartphone and fused datasets.
In summary, the CNN is the best of the considered models, since it performs well in every situation. In addition, the LSTM and the CNN-LSTM are also feasible options when using smartwatch and fused data. The use of the MLP is strongly discouraged, since it obtains the worst results both here and in related works.