This section presents the results obtained from the preliminary dataset for localized HAR using Wi-Fi CSI data. The reported results consist of confusion matrices from CNN models trained and evaluated using the approaches described in Experimental procedure.
Figure 12.1 shows the confusion matrix of the \(10\)-fold cross-validation approach. The classification accuracy reaches \(100\%\) for SEATED_RX and SEATED_TX, and activities such as WALKING_TX and TURNING_TX reach accuracies over \(90\%\). Misclassifications can be observed between the sitting down and standing up activities, but a clear main diagonal (i.e., mostly correct predictions) can be seen.
Figure 12.2 (D1T/D1E) shows results similar to those of the previous approach: perfect accuracy on SEATED_RX and over \(90\%\) on WALKING_TX and SEATED_TX. In addition, the main diagonal is clearly visible.
In the D1T/D2 evaluation (Figure 12.3), the diagonal can still be observed, but activities such as SEATED_RX are mostly misclassified as WALKING_TX. SEATED_TX and TURNING_TX, however, are perfectly classified.
After \(30\) minutes (Figure 12.4), the diagonal starts to disappear for the activities going towards the Rx, which are misclassified as activities closer to the Tx, such as WALKING_TX and TURNING_TX.
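The per-activity accuracies read off the confusion matrices above correspond to the diagonal entry of each row divided by the row total. The following is a minimal sketch of that calculation, assuming rows hold the true activities and columns the predicted ones. The counts are hypothetical and are not taken from the figures; the function names are illustrative only.

```python
# Hypothetical 3-class confusion matrix: rows are true labels, columns are predictions.
# The counts are made up for illustration and do not come from Figures 12.1 to 12.4.
labels = ["SEATED_RX", "WALKING_TX", "TURNING_TX"]
confusion = [
    [20, 0, 0],   # every SEATED_RX sample is predicted correctly
    [1, 18, 1],   # WALKING_TX: 18 of 20 samples correct
    [0, 4, 16],   # TURNING_TX: 4 samples mistaken for WALKING_TX
]


def per_class_accuracy(matrix):
    """Return, for each true class, the fraction of its samples on the diagonal."""
    return [row[i] / sum(row) if sum(row) else 0.0 for i, row in enumerate(matrix)]


def overall_accuracy(matrix):
    """Return the diagonal sum divided by the total number of samples."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


for label, acc in zip(labels, per_class_accuracy(confusion)):
    print(f"{label}: {acc:.2%}")
print(f"overall: {overall_accuracy(confusion):.2%}")
```

A visible diagonal therefore means that most rows concentrate their mass on the diagonal entry, while a fading diagonal means that the mass moves to off-diagonal columns.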
Table 12.1 contains the accuracy, precision, recall and F1-score obtained in each evaluation approach, and Table 12.2 shows the relative decrement in each metric with respect to the first evaluation approach. The \(10\)-fold cross-validation achieves the best metrics, with averages around \(86\)–\(87\%\). In the second approach, where the temporal dependency of the data is maintained in D1, the performance metrics decrease slightly to around \(84\)–\(85\%\), a relative drop of \(\approx3\%\) in accuracy and recall, and \(\approx2\%\) in precision and F1-score.
Table 12.1: Summary of obtained metrics in the evaluation approaches.

| Evaluation | Accuracy | Precision | Recall | F1-score |
|---|---|---|---|---|
| CV | 0.869255 | 0.870630 | 0.869255 | 0.857086 |
| D1T/D1E | 0.841584 | 0.853984 | 0.841584 | 0.842004 |
| D1T/D2 | 0.519231 | 0.617909 | 0.519231 | 0.490104 |
| D1T/D3 | 0.382353 | 0.462571 | 0.382353 | 0.313558 |
| D1T/D4 | 0.265306 | 0.320546 | 0.265306 | 0.202500 |
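In Table 12.1 the accuracy and recall values coincide in every row. This would be consistent with support-weighted averaging of the per-class metrics, although the averaging scheme is not stated here, so it should be read as an assumption. A minimal sketch of such weighted metrics, computed from a hypothetical confusion matrix (the counts are not from the dataset and `weighted_metrics` is an illustrative name), is:

```python
def weighted_metrics(matrix):
    """Compute accuracy and support-weighted precision, recall and F1.

    matrix[i][j] counts the samples of true class i predicted as class j.
    """
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    accuracy = sum(matrix[i][i] for i in range(n)) / total
    precision = recall = f1 = 0.0
    for i in range(n):
        support = sum(matrix[i])                  # true samples of class i
        predicted = sum(row[i] for row in matrix)  # samples predicted as class i
        p = matrix[i][i] / predicted if predicted else 0.0
        r = matrix[i][i] / support if support else 0.0
        f = 2 * p * r / (p + r) if (p + r) else 0.0
        weight = support / total                  # class share of the evaluation set
        precision += weight * p
        recall += weight * r
        f1 += weight * f
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical counts, used only to illustrate the calculation.
toy = [
    [8, 2, 0],
    [1, 6, 3],
    [0, 5, 5],
]
metrics = weighted_metrics(toy)
print(metrics)
```

With this weighting, the recall reduces to the number of correct predictions divided by the number of samples, which is exactly the accuracy; precision and F1-score can still differ from it.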
The evaluation of the HAR model on D2, D3 and D4 shows a drastic drop in the reported metrics. For instance, the accuracy drops to \(\approx52\%\) in D2, \(\approx38\%\) in D3 and \(\approx26\%\) in D4. These values correspond to relative accuracy drops of \(40.27\%\), \(56.01\%\) and \(69.48\%\) with data gathered just \(10\), \(30\) and \(90\) minutes after the training data was collected. Similar drops can be observed in the remaining metrics.
```python
# Compare the cross-validation reports against each later evaluation report.
comparisons = {
    'CV vs. D1T/D1E': [cv_reports, d1_report],
    'CV vs. D1T/D2': [cv_reports, d2_report],
    'CV vs. D1T/D3': [cv_reports, d3_report],
    'CV vs. D1T/D4': [cv_reports, d4_report],
}
metric_increment_summary(comparisons)
```
Table 12.2: Decrement (%) of metrics in the evaluation approaches.

| Comparison | Accuracy | Precision | Recall | F1-score |
|---|---|---|---|---|
| CV vs. D1T/D1E | -3.183271 | -1.911931 | -3.183271 | -1.759685 |
| CV vs. D1T/D2 | -40.267145 | -29.027340 | -40.267145 | -42.817424 |
| CV vs. D1T/D3 | -56.013715 | -46.869374 | -56.013715 | -63.415865 |
| CV vs. D1T/D4 | -69.478904 | -63.182340 | -69.478904 | -76.373408 |
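The values in Table 12.2 match the relative change \(100 \cdot (m - m_{CV}) / m_{CV}\) of each metric \(m\) with respect to its cross-validation value \(m_{CV}\). The sketch below recomputes them from the rounded figures in Table 12.1. It is not the `metric_increment_summary` helper, whose implementation is not shown here; the function names are illustrative, and small differences in the last decimals are expected because the inputs are rounded.

```python
# Metrics copied from Table 12.1, in the order accuracy, precision, recall, F1-score.
table_12_1 = {
    "CV":      (0.869255, 0.870630, 0.869255, 0.857086),
    "D1T/D1E": (0.841584, 0.853984, 0.841584, 0.842004),
    "D1T/D2":  (0.519231, 0.617909, 0.519231, 0.490104),
    "D1T/D3":  (0.382353, 0.462571, 0.382353, 0.313558),
    "D1T/D4":  (0.265306, 0.320546, 0.265306, 0.202500),
}


def relative_change(baseline, value):
    """Return the change of `value` with respect to `baseline`, in percent."""
    return 100.0 * (value - baseline) / baseline


def relative_changes(table, baseline_key="CV"):
    """Compare every row of `table` against the baseline row, metric by metric."""
    baseline = table[baseline_key]
    return {
        f"{baseline_key} vs. {name}": tuple(
            relative_change(b, v) for b, v in zip(baseline, values)
        )
        for name, values in table.items()
        if name != baseline_key
    }


changes = relative_changes(table_12_1)
for name, row in changes.items():
    print(f"{name:16s}", " ".join(f"{x:9.3f}" for x in row))
```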
These results show a clear degradation in the classification accuracy of the employed CNN model when it is evaluated on data collected some time after the training data. That is, classification accuracy degrades quickly over time. Nevertheless, the temporal instability of the data is only one possible explanation for the poor results. Specifically, the following factors could be affecting the results:
- The selected methods might not be able to work properly with CSI data, i.e., to generalize from the training data. While CNN approaches have been shown to provide good results with CSI data, most related works using the ESP32 microcontroller employ other architectures, such as the MLP.
- The hardware employed for CSI extraction, the ESP32-S2 microcontroller, might not be appropriate for such a task. Other devices, such as the Intel 5300 or Atheros NICs, might be a better option.
- The collected dataset might have been affected by some external interference that altered the environment and changed the CSI data.
- The CSI data is not stable over time and therefore cannot be used for real-life applications.