This section aims to determine whether there are differences in how each data source, i.e., the smartphone (sp), smartwatch (sw) and fused datasets, performs with the selected models for HAR.
Plotly loading issue
This page contains Plotly interactive figures. Sometimes, the figures might not load properly and show a blank image. Reloading the page might solve the loading issue.
Note
As shown in Impact of the amount of training data, the models' results do not follow a normal distribution. Therefore, the following comparisons employ non-parametric tests.
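For reference, a minimal sketch of such a non-parametric comparison, using a Kruskal-Wallis omnibus test from SciPy. The score lists are made-up placeholders, not values from the tables in this chapter:

```python
from scipy.stats import kruskal

# Hypothetical accuracy scores of models trained on each data source
# (placeholder values, NOT taken from the tables in this chapter).
sp_scores = [0.81, 0.79, 0.83, 0.80, 0.82]      # smartphone-trained
sw_scores = [0.85, 0.86, 0.84, 0.87, 0.85]      # smartwatch-trained
fused_scores = [0.88, 0.86, 0.89, 0.87, 0.88]   # fused-trained

# H0: all three groups come from the same distribution.
# Kruskal-Wallis is rank-based, so it does not assume normality.
stat, p_value = kruskal(sp_scores, sw_scores, fused_scores)
print(f"H = {stat:.3f}, p = {p_value:.4f}")
```

A significant result only indicates that at least one data source differs; identifying which pairs differ requires a post-hoc analysis, as done in the tables below.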
Regarding the accuracy of the models (Table 7.1), the smartwatch dataset always presents the best performance with small amounts of data (i.e., \(n \in [3, 4]\)), while the fused dataset is the best with large amounts of data across all models. When comparing the smartphone and the smartwatch with the largest amounts of data, the smartphone is superior in the MLP models and the smartwatch in the LSTM models, while no significant differences are observed in the CNN and CNN-LSTM models.
Next, we focus on how each data source affects the performance of individual activities with each selected model.
SEATED
Results from Table 7.3 and Table 7.4 (post-hoc) show that the smartwatch-trained models obtain the best scores for the SEATED activity with any amount of data and across all models.
In the case of the CNN and CNN-LSTM models with \(n \geq 10\), the fused-trained models also achieve the best results, with no significant differences from the smartwatch-trained models. Consequently, the smartphone-trained models achieve the worst results in these models.
Regarding the MLP and LSTM models, their performance when trained with smartphone or fused data is significantly worse than when trained with smartwatch data. Between the smartphone- and fused-trained models, no significant differences exist in the MLP model with high amounts of data, although differences are observed in the LSTM model in favour of the fused-trained models.
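A pairwise post-hoc comparison like the ones summarised in Table 7.4 could be sketched as follows, using Mann-Whitney U tests with a Bonferroni correction. The groups and scores are hypothetical placeholders, and the study's exact post-hoc procedure may differ:

```python
from itertools import combinations
from scipy.stats import mannwhitneyu

# Hypothetical per-source scores (placeholders, not values from the study).
scores = {
    "sp":    [0.81, 0.79, 0.83, 0.80, 0.82],
    "sw":    [0.85, 0.86, 0.84, 0.87, 0.85],
    "fused": [0.88, 0.86, 0.89, 0.87, 0.88],
}

pairs = list(combinations(scores, 2))
alpha = 0.05 / len(pairs)  # Bonferroni correction for 3 pairwise tests

results = {}
for a, b in pairs:
    # Rank-based two-sample test; no normality assumption.
    _, p = mannwhitneyu(scores[a], scores[b], alternative="two-sided")
    results[(a, b)] = bool(p < alpha)  # significant difference for this pair?
```

The Bonferroni correction keeps the family-wise error rate at 0.05 across the three pairwise tests; less conservative corrections (e.g. Holm) would work similarly.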
STANDING_UP
Tables 7.5 and 7.6 show a similar pattern across all models, where the smartwatch-trained models produce the best results with low and medium amounts of data, while the fused-trained models are also the best-performing with medium and high amounts of data.
This pattern can be observed in the MLP, CNN and CNN-LSTM models, although the value of \(n\) at which the fused-trained models start to outperform the smartwatch-trained models varies. In the case of the CNN and CNN-LSTM, the models trained with smartwatch data are significantly better than those trained with smartphone data, while the opposite occurs in the MLP model. In the remaining model, the LSTM, no differences exist between the smartwatch- and fused-trained models, so the smartphone-trained models provide the worst results.
WALKING
Tables 7.7 and 7.8 show different patterns depending on the model employed. For the MLP and CNN models, the smartwatch models are the best with small amounts of data, but no significant differences among data sources are observed for \(n \in [4,5]\). From \(n \geq 6\), the smartphone- and fused-trained models obtain the best results.
In the LSTM and CNN-LSTM, the smartwatch-trained models are the best-performing with low amounts of data, while with medium and high amounts, the fused-trained models are the best. Regarding the models trained with smartphone and smartwatch data, significant differences exist in the CNN-LSTM models in favour of the smartphone-trained models, but not in the LSTM.
TURNING
The results presented in Tables 7.9 and 7.10 indicate that the smartphone- and fused-trained models obtain the best metrics in almost every case.
The smartphone-trained models consistently obtain the best results with any amount of data across all models. In contrast, the fused-trained models require medium amounts of data to match the smartphone results in the MLP and CNN-LSTM. The smartwatch-trained models only perform well when the minimum amount of training data is used; with any other quantity, they provide the worst results.
SITTING_DOWN
The results shown in Tables 7.11 and 7.12 present patterns similar to those of the STANDING_UP activity. On the one hand, in the MLP, CNN and CNN-LSTM, the smartwatch-trained models are the best-performing with low and medium amounts of data, while the fused-trained models provide the best metrics with medium and high amounts of data. On the other hand, the models trained with smartwatch and fused data are the best in the LSTM models.
As in the STANDING_UP activity, aside from the superiority of the fused-trained models, the smartwatch-trained models outperform the smartphone-trained models in the CNN and CNN-LSTM, while the opposite applies to the MLP models.
The obtained results show that, with low amounts of data, the smartwatch-trained models achieve the best overall accuracy and per-activity F1-scores across all models. On the other hand, the models trained with the fused dataset present the best overall accuracy with higher amounts of data.
When focusing on individual activities, the smartwatch-trained models are the best at recognizing the SEATED activity. They are also the best for the STANDING_UP activity with low and medium amounts of data, while with higher amounts the fused-trained models show better results. In the WALKING activity, the fused-trained models obtain the best results across all models, and the smartphone-trained models join them in the MLP and CNN models. Similarly, the models trained with the smartphone and the fused datasets are the best in the TURNING activity. For the SITTING_DOWN activity, the smartwatch-trained models perform well with low quantities of data, while the fused-trained models are the best with medium and higher quantities. It is worth noting that the patterns observed in the STANDING_UP and SITTING_DOWN activities are very similar, which can be explained by the inverse nature of these movements.
While the models trained with the fused dataset usually show the best results, they are sometimes not statistically better than the models trained with the other datasets. In other words, when the best results are shared by the fused-trained models and the smartphone- or smartwatch-trained ones, the fusion of the data is not always worthwhile. For example, in the TURNING activity, the smartphone- and fused-trained models are always the best, which indicates that fusing smartphone and smartwatch data does not improve the smartphone results. The same occurs in the WALKING activity with the MLP and CNN models. However, the fusion pays off for the remaining models in that activity and in the STANDING_UP and SITTING_DOWN activities.
These results are visually summarized and simplified in Figures 7.1 and 7.2. The following examples show how to interpret the figures: in the WALKING activity with the CNN model, for \(n=1\), the statistically best metrics are obtained with the smartwatch dataset; for \(n=2\), the best metrics are obtained with the smartwatch dataset, although they are not statistically better than those of another dataset (whether smartphone or fused should be determined by checking Table 7.7); and for \(n=4\), no significant difference is observed between data sources.
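The decision logic behind each cell of such a summary can be sketched as follows. The mean scores and post-hoc p-values are hypothetical placeholders, not values from the study:

```python
# For one activity/model/n cell: pick the source with the best mean score,
# then check whether it is statistically better than every other source.
# All numbers below are placeholders, NOT values from the tables.

mean_scores = {"sp": 0.82, "sw": 0.86, "fused": 0.85}
# Hypothetical corrected post-hoc p-values for each pair of sources.
p_values = {
    frozenset({"sp", "sw"}): 0.01,
    frozenset({"sp", "fused"}): 0.03,
    frozenset({"sw", "fused"}): 0.20,
}
ALPHA = 0.05

best = max(mean_scores, key=mean_scores.get)
statistically_best = all(
    p_values[frozenset({best, other})] < ALPHA
    for other in mean_scores if other != best
)
print(best, statistically_best)
```

In this placeholder example the smartwatch source has the best mean score but is not significantly better than the fused source, which corresponds to the second interpretation case described above.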
Given the obtained results, it is not possible to determine a clear winner. In the end, the most suitable data source will depend on the amount of data that can be collected, the target activities and the selected model – in line with . We can conclude that the fused dataset would be the best option with any model and a moderate amount of data, while the smartwatch dataset would be good for the SEATED activity and the smartphone dataset works well for the TURNING activity. The smartwatch dataset would also be the preferred choice for the STANDING_UP and SITTING_DOWN activities when using the LSTM model.