Validation of employed methods

This section validates the employed methods (i.e., data preparation and DL model) on public datasets and applies another method from the literature to the collected dataset. The aim is to confirm or discard the identified factor (1) as the cause of the previously reported poor results.

Results

Code
import os

from libs.chapter5.analysis.reports import metrics_summary, metric_increment_summary
from libs.common.utils import load_json

# Directory containing the serialized model evaluation reports
REPORTS_DIR = os.path.join('data', 'chapter5', 'model-reports')
# Report file name template, parameterized by the evaluation identifier
REPORTS_FILE = '{}_report.json'
# Reports obtained by applying Choi's method to the collected dataset
CHOI_REPORTS_PATH = os.path.join(REPORTS_DIR, 'preliminar-dataset', 'choi-method')

Validation on public datasets

Table 13.1 shows the average accuracy, precision, recall, and F1-score of the cross-validation procedure obtained by the selected CNN model on the two public datasets described above.

The results of the model on the StanWiFi dataset are around 96% in all metrics. These results improve on those presented by the creators of the dataset (), although they are slightly worse than other solutions proposed in the literature.

Regarding the Multienvironment datasets (E1 and E2), the model obtains significantly worse results than other works in the literature. This can be explained by the higher complexity of these datasets compared with StanWiFi and the collected dataset. In addition, the employed model is an adaptation of the one from the previous section and is not optimized for these datasets, whereas other works focus entirely on them and use more complex methods such as specific feature extraction, adaptive windows, or windowing approaches with excessive overlap.
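
To illustrate the kind of windowing approach mentioned above, the following is a minimal sketch of overlapped sliding-window segmentation; it is not the segmentation used by those works, and the window_size, overlap, and (time, features) data layout are assumptions.

Code
import numpy as np

def sliding_windows(samples, window_size=50, overlap=0.5):
    """Segment a (time, features) array into fixed-size windows.

    An overlap of 0.5 means consecutive windows share half of their
    samples; higher overlap produces more, but more correlated, windows.
    Assumes len(samples) >= window_size.
    """
    step = max(1, int(window_size * (1 - overlap)))
    starts = range(0, len(samples) - window_size + 1, step)
    return np.stack([samples[s:s + window_size] for s in starts])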

Code
# Load the cross-validation reports of each public dataset and
# summarize the obtained metrics
stanwifi_reports = load_json(os.path.join(REPORTS_DIR, 'stanwifi', REPORTS_FILE.format('cv')))
e1_reports = load_json(os.path.join(REPORTS_DIR, 'multienvironment', REPORTS_FILE.format('e1-cv')))
e2_reports = load_json(os.path.join(REPORTS_DIR, 'multienvironment', REPORTS_FILE.format('e2-cv')))
metrics_summary([stanwifi_reports, e1_reports, e2_reports], ['StanWiFi CV', 'Multienvironment E1', 'Multienvironment E2'])
Table 13.1: Results of applying the proposed method on the StanWiFi and Multienvironment E1 and E2 datasets.
                     Accuracy   Precision  Recall     F1-score
StanWiFi CV          0.962032   0.963551   0.962032   0.961797
Multienvironment E1  0.830443   0.871437   0.830443   0.832676
Multienvironment E2  0.789978   0.821395   0.789978   0.783325
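
For reference, metrics of this kind can be computed per fold with scikit-learn and then averaged across folds. A minimal sketch, assuming per-fold labels and predictions are available; the weighted averaging is an assumption, although it is consistent with the identical Accuracy and Recall columns above, since weighted recall equals accuracy.

Code
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

def fold_metrics(y_true, y_pred):
    # Weighted averages weight each class by its support; in the
    # multi-class case, weighted recall coincides with accuracy.
    precision, recall, f1, _ = precision_recall_fscore_support(
        y_true, y_pred, average='weighted', zero_division=0)
    return {'accuracy': accuracy_score(y_true, y_pred),
            'precision': precision, 'recall': recall, 'f1-score': f1}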

Overall, these results show that while the employed method does not improve on existing results in the literature, it does not fail at its classification purpose either. Moreover, the model could yield more satisfactory results after proper optimization for both datasets.

Validation of another method

Table 13.2 contains the accuracy, precision, recall, and F1-score metrics obtained with each evaluation approach, and Table 13.3 includes the relative decrement in each metric with respect to the first evaluation approach.

Code
# Load the reports of Choi's method for the cross-validation and the
# four temporal evaluation approaches, then summarize the metrics
choi_cv_report = load_json(os.path.join(CHOI_REPORTS_PATH, REPORTS_FILE.format('cv')))
choi_d1_report = load_json(os.path.join(CHOI_REPORTS_PATH, REPORTS_FILE.format('d1')))
choi_d2_report = load_json(os.path.join(CHOI_REPORTS_PATH, REPORTS_FILE.format('d2')))
choi_d3_report = load_json(os.path.join(CHOI_REPORTS_PATH, REPORTS_FILE.format('d3')))
choi_d4_report = load_json(os.path.join(CHOI_REPORTS_PATH, REPORTS_FILE.format('d4')))

metrics_summary([choi_cv_report, choi_d1_report, choi_d2_report, choi_d3_report, choi_d4_report], ['CV', 'D1T/D1E', 'D1T/D2', 'D1T/D3', 'D1T/D4'])
Table 13.2: Summary of obtained metrics in the evaluation approaches using Choi’s method.
         Accuracy   Precision  Recall     F1-score
CV       0.898863   0.914035   0.898863   0.890320
D1T/D1E  0.841584   0.875248   0.841584   0.845221
D1T/D2   0.259615   0.305082   0.259615   0.253803
D1T/D3   0.284314   0.254167   0.284314   0.234527
D1T/D4   0.112245   0.112294   0.112245   0.082696

In the 10-fold cross-validation approach, Choi’s method achieves better results than the method employed in the previous section, by around three percentage points in all metrics.

Regarding the D1T/D1E evaluation, both methods obtain similar outcomes, around 84% in all metrics, except for the precision of Choi’s method, which is two percentage points higher than that of the other method.

However, Choi’s method fails when the effect of time is taken into account. Relative to the CV results, accuracy drops by 6.37% in D1T/D1E, and by 71.12%, 68.37% and 87.51% in the D1T/D2, D1T/D3 and D1T/D4 evaluations, which is much worse than the results presented previously (3.18%, 40.26%, 56.01% and 69.47% in our method).
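
These drops are plain relative changes with respect to the CV metrics. A minimal sketch of the computation (the helper name is hypothetical; metric_increment_summary below produces the full table):

Code
def relative_change(reference, value):
    # Percentage change with respect to the reference metric, e.g. for
    # the D1T/D1E accuracy: (0.841584 - 0.898863) / 0.898863 * 100 ≈ -6.37
    return (value - reference) / reference * 100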

Code
# Compare the CV report against each evaluation approach to obtain
# the relative change (%) of every metric
comparisons = {
    'CV vs. D1T/D1E': [choi_cv_report, choi_d1_report],
    'CV vs. D1T/D2': [choi_cv_report, choi_d2_report],
    'CV vs. D1T/D3': [choi_cv_report, choi_d3_report],
    'CV vs. D1T/D4': [choi_cv_report, choi_d4_report],
}
display(metric_increment_summary(comparisons))
Table 13.3: Decrement (%) of metrics in the evaluation approaches using Choi’s method.
                Accuracy    Precision   Recall      F1-score
CV vs. D1T/D1E   -6.372340   -4.243565   -6.372340   -5.065488
CV vs. D1T/D2   -71.117350  -66.622465  -71.117350  -71.493068
CV vs. D1T/D3   -68.369617  -72.192902  -68.369617  -73.658177
CV vs. D1T/D4   -87.512565  -87.714499  -87.512565  -90.711641

Summary

Based on the results obtained in the previous sections, we can determine that the employed methods and model – factor (1) – are not the cause of the poor results obtained with the collected dataset, since 1) the methods and model obtained acceptable results on other public datasets, and 2) a validated method from the literature also obtained very poor results on the collected dataset.

Code reference

Tip

The documentation of the Python functions employed in this section can be found in the Chapter 5 reference: