Activity | D1 | D2 | D3 | D4 |
---|---|---|---|---|
SEATED_RX | 2864 | 614 | 593 | 569 |
STANDING_UP_RX | 1305 | 293 | 276 | 269 |
WALKING_TX | 2285 | 455 | 466 | 469 |
TURN_TX | 1133 | 222 | 238 | 208 |
SITTING_DOWN_TX | 1538 | 351 | 301 | 315 |
SEATED_TX | 2890 | 415 | 504 | 499 |
STANDING_UP_TX | 1289 | 291 | 271 | 267 |
WALKING_RX | 2470 | 503 | 504 | 510 |
TURN_RX | 997 | 194 | 228 | 175 |
SITTING_DOWN_RX | 1524 | 353 | 304 | 301 |
Total | 18295 | 3691 | 3685 | 3582 |
Looking into the Future: Wi-Fi CSI based HAR
Inertial-based HAR with smartphones and smartwatches has proven feasible for real-life applications and achieves state-of-the-art performance. In recent years, another stream of research has emerged that frees users from carrying any type of device: Wi-Fi CSI (Channel State Information). The key point of Wi-Fi CSI-based systems is that they rely on a Wi-Fi infrastructure that enables device-free sensing (i.e., users do not have to wear any device). In addition, CSI can be used for diverse tasks, such as HAR and indoor positioning, which would be a major challenge using inertial sensors. In this chapter, we analyse the feasibility of a Wi-Fi CSI-based HAR and positioning system using a consumer router and an ESP32 microcontroller, evaluating it under simulated real-life conditions. Preliminary results show a clear instability of the CSI data, which makes the system unfeasible for real-life applications with the employed devices.
The contents of this section correspond to Chapter 5 of the dissertation and extend the work “Temporal Stability on Human Activity Recognition based on Wi-Fi CSI” (Matey-Sanz, Torres-Sospedra, and Moreira 2023), presented at the \(13^{th}\) International Conference on Indoor Positioning and Indoor Navigation (IPIN).
Methodology: preliminary localized HAR experiment
Data collection
A dataset was collected using a TP-Link Archer C80 router (one TX antenna) and a SparkFun Thing Plus ESP32-S2 WROOM (one RX antenna) connected to a laptop. The TX and RX were separated by \(5\) meters in a Line of Sight (LOS) condition, with two chairs placed between them, each at \(0.5\) meters from one of the devices. Although the chairs partially block the signal, we consider the setup to be in LOS condition since no heavy obstacles (e.g., walls) block the signal.
The TX device was configured to use the IEEE 802.11n standard on channel \(6\). The RX device was configured to establish a connection with the TX, send ping requests at \(100\) Hz, and extract the Wi-Fi CSI from the HT-LTF subcarriers (\(64\) subcarriers, \(56\) of them non-null) of the ping responses.
Figure 1 depicts the data collection process. It consisted of one subject moving from one chair to the other repeatedly, collecting data for the activities used throughout this thesis: SEATED, STANDING_UP, WALKING, TURNING and SITTING_DOWN. Since the subject performed the activities in both directions (i.e., from TX to RX and vice versa), the activities were labelled accordingly (e.g., SEATED_TX/RX, WALKING_TX/RX, etc.), adding a localization component to them.
Figure 2 depicts the data collection strategy, which was spaced out over time to explore potential degradation of CSI data over time. The following datasets were collected:
- D1: The subject performed the sequence of activities \(20\) times (\(10\) in each direction).
- D2: \(10\) minutes after collecting D1, the subject performed the sequence of activities again \(4\) times (\(2\) in each direction).
- D3: \(20\) minutes after collecting D2, the subject performed the sequence of activities again \(4\) times.
- D4: \(60\) minutes after collecting D3, the subject performed the sequence of activities again \(4\) times.
Table 1 shows the number of CSI samples collected for each activity and dataset.
Data preparation
First, the signal amplitude of each subcarrier was obtained from the raw CSI data using the equation \[ amplitude_{i} = \sqrt{real_{i}^2 + imaginary_{i}^2}, \] where \(real_{i}\) and \(imaginary_{i}\) are the corresponding components of the complex number associated with the \(i^{th}\) subcarrier. The phase of the signal was discarded.
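The amplitude equation can be illustrated with a minimal sketch. It assumes that one CSI sample is already available as a sequence of complex numbers, one per subcarrier; the function name `csi_amplitudes` and this in-memory representation are hypothetical and not part of the scripts of this chapter.

```python
import math


def csi_amplitudes(sample):
    """Return the amplitude of every subcarrier in one CSI sample.

    `sample` is assumed to be a sequence of complex numbers, one per
    subcarrier (a hypothetical representation chosen for this sketch).
    """
    # sqrt(real^2 + imaginary^2) per subcarrier; the phase is discarded.
    return [math.hypot(c.real, c.imag) for c in sample]


example_sample = [complex(3, 4), complex(-6, 8), complex(0, 0)]
print(csi_amplitudes(example_sample))  # [5.0, 10.0, 0.0]
```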
Next, the dataset was arranged in windows of \(50\) samples with a \(50\%\) overlap. Then, each window was processed using the following techniques:
- The DBSCAN clustering algorithm (Ester et al. 1996) was employed to detect outliers, which were replaced by the average of the \(5\) preceding and following values.
- A \(2\)-level discrete wavelet transform was used to decompose the signals, apply threshold-based filtering on the detail coefficients and reconstruct the signal with the inverse discrete wavelet transform.
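The windowing and the outlier replacement described above can be sketched as follows. This is only an illustration under stated assumptions: the outlier indices are taken as given (in the actual pipeline they would come from DBSCAN), up to \(5\) neighbours on each side are averaged, other flagged samples are excluded from the average, and the wavelet filtering step is not shown. The names `window_bounds` and `replace_outliers` are hypothetical.

```python
def window_bounds(n_samples, size=50, overlap=0.5):
    """Return (start, end) index pairs of fixed-size windows with overlap."""
    step = int(size * (1 - overlap))  # 25 samples for size=50 and 50% overlap
    return [(s, s + size) for s in range(0, n_samples - size + 1, step)]


def replace_outliers(series, outlier_idx, k=5):
    """Replace flagged samples by the mean of up to k neighbours per side.

    `outlier_idx` is assumed to be produced by a separate detector
    (e.g. the noise points of a DBSCAN clustering).
    """
    flagged = set(outlier_idx)
    cleaned = list(series)
    for i in flagged:
        lo, hi = max(0, i - k), min(len(series), i + k + 1)
        # Assumption: neighbours that are themselves outliers are skipped.
        neighbours = [series[j] for j in range(lo, hi) if j not in flagged]
        if neighbours:
            cleaned[i] = sum(neighbours) / len(neighbours)
    return cleaned


print(window_bounds(125))  # [(0, 50), (25, 75), (50, 100), (75, 125)]
signal = [1.0] * 5 + [100.0] + [1.0] * 5
print(replace_outliers(signal, [5])[5])  # 1.0
```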
Figure 3 and Figure 4 depict the raw (after amplitude extraction) and processed CSI data of the first two sequences of D1.
The script employed to execute this process is 01_1_preliminar-dataset-processing.py with the flag --method proposed.
Code
"""Data preprocessing script for the preliminary dataset.

Processes the raw data by arranging samples in windows and processing them using 1) DBSCAN for outlier detection
and 2-level DWT for threshold based filtering or 2) Choi et al. method.

**Example**:

    $ python 01_1_preliminar-dataset-processing.py
        --input_data_path <PATH_OF_RAW_DATA>
        --windowed_data_path <PATH_TO_STORE_RESULTS>
        --method <PROCESSING_METHOD>
        --window_size <WINDOW_SIZE>
        --window_overlap <WINDOW_OVERLAP>
"""

import argparse
import os
import sys
sys.path.append("../../..")

import numpy as np
from alive_progress import alive_bar
from libs.chapter5.pipeline.processing import proposed_method, choi_method
from libs.chapter5.pipeline.raw_data_loading import load_labelled_data

WINDOW_SIZE = 50
WINDOW_OVERLAP = 25


def create_windows(executions_amplitudes, executions_labels, window_size, window_overlap):
    win = {}
    win_labels = {}
    for execution_id in executions_amplitudes:
        amplitudes = executions_amplitudes[execution_id]
        exec_labels = executions_labels[execution_id]

        data = amplitudes
        n = data.shape[1] // window_overlap

        windows = []
        windows_labels = []
        for i in range(0, (n-1) * window_overlap, window_overlap):
            if i+window_size > data.shape[1]:
                break
            window_labels = exec_labels[i:i+window_size]
            values, counts = np.unique(window_labels, return_counts=True)
            # Discard windows that contain samples from more than one activity
            if len(values) != 1:
                continue

            windows.append(data[:,i:i+window_size])
            windows_labels.append(values[counts.argmax()])

        windows = np.array(windows)
        windows_labels = np.array(windows_labels)

        win[execution_id] = windows
        win_labels[execution_id] = windows_labels
    return win, win_labels


def process_windows(executions_windows, processing_function):
    processed_windows = {}
    executions_ids = executions_windows.keys()
    with alive_bar(len(executions_ids), title=f'Processing windows', force_tty=True) as progress_bar:
        for execution_id in executions_ids:
            proc_windows = []
            windows = executions_windows[execution_id]
            for window in windows:
                proc_windows.append(processing_function(window))
            processed_windows[execution_id] = np.array(proc_windows)
            progress_bar()
    return processed_windows


def save_windowed_data(data, labels, directory):
    if not os.path.exists(directory):
        os.makedirs(directory)

    x_file_path = os.path.join(directory, '{0}-x.npy')
    y_file_path = os.path.join(directory, '{0}-y.npy')

    for execution_id in data:
        x = data[execution_id]
        y = labels[execution_id]

        np.save(x_file_path.format(execution_id), x)
        np.save(y_file_path.format(execution_id), y)


if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument('--input_data_path', help='Path of input data', type=str, required=True)
    parser.add_argument('--windowed_data_path', help='Path to store windowed data', type=str, required=True)
    parser.add_argument('--method', help='Processing method', required=True, choices=['proposed', 'choi'])
    args = parser.parse_args()

    processing_function = proposed_method if args.method == 'proposed' else choi_method

    for dataset in ['D1', 'D2', 'D3', 'D4']:
        print(f'Processing dataset {dataset}')
        executions_amp, labels = load_labelled_data(os.path.join(args.input_data_path, dataset))
        windows, windows_labels = create_windows(executions_amp, labels, WINDOW_SIZE, WINDOW_OVERLAP)
        windows_processed = process_windows(windows, processing_function)
        save_windowed_data(windows_processed, windows_labels, os.path.join(args.windowed_data_path, dataset))
HAR classifier
Since a previous section showed that the CNN was the best-performing of the selected models, in this chapter we keep using a CNN architecture even though the domain of the input data is different. Grid search was used to determine the best hyperparameters for the selected architecture. The process was configured to train and evaluate each combination five times using the Adam optimizer for \(50\) epochs with a batch size of \(32\) windows. To reduce the computational cost, the process was executed in two phases: 1) optimization of layer and learning hyperparameters, and 2) optimization of the number of layers. Table 2 contains the best combination of hyperparameters.
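The idea of evaluating every combination several times and keeping the best average can be sketched in a few lines. This is a minimal illustration and not the tuner used in this chapter: the grid values, the `grid_search` helper and the `toy_evaluate` scoring function (a stand-in for training and validating a model) are all hypothetical, and the two-phase strategy is not shown.

```python
import itertools
import statistics


def grid_search(grid, evaluate, n_executions=5):
    """Return the parameter combination with the best mean score."""
    names = list(grid)
    best_params, best_score = None, float("-inf")
    for values in itertools.product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        # Average several executions to reduce the effect of random initialization.
        score = statistics.mean(evaluate(params, run) for run in range(n_executions))
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


# Hypothetical search space, for illustration only.
grid = {"filters": [8, 16, 32], "learning_rate": [1e-3, 1e-4]}


def toy_evaluate(params, run):
    # Stand-in for "train the model and return its validation accuracy".
    return params["filters"] / 32 - abs(params["learning_rate"] - 1e-4) * 100


best, score = grid_search(grid, toy_evaluate)
print(best)  # {'filters': 32, 'learning_rate': 0.0001}
```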
The script employed to execute the Grid Search is 02_hyperparameter-optimization.py with the flag --model cnn.
Code
"""Hyperparameters Grid Search script.

Performs an hyperparameter Grid Search on the specified model. The selected hyperparameters for the search
can be found in `tuning_configuration.py`.

**Example**:

    $ python 02_hyperparameter-optimization.py
        --data_dir <PATH_OF_DATA>
        --model <MLP,CNN>
        --phase <initial,extra-layers>
        --batch_size <BATCH_SIZE>
        --epochs <EPOCHS>
        --executions <EXECUTIONS>
"""

import argparse
import os
import sys
sys.path.append("../../..")

from libs.chapter5.pipeline.data_loading import load_data
from libs.chapter5.pipeline.data_grouping import combine_windows
from libs.chapter5.pipeline.hyperparameters_tuning import get_model_builder, create_tuner, tune, get_tuning_summary
from libs.chapter5.pipeline.tuning_configuration import get_tuning_configuration
from libs.common.data_loading import ground_truth_to_categorical
from libs.common.utils import save_json, set_seed

TUNING_DIR = 'GRID_SEARCH_{0}'
TUNING_SUMMARY_FILE = 'summary.json'

BATCH_SIZE = 32
EPOCHS = 50
N_EXECUTIONS = 5

MAPPING = {
    'SEATED_RX': 0,
    'STANDING_UP_RX': 1,
    'WALKING_TX': 2,
    'TURN_TX': 3,
    'SITTING_DOWN_TX': 4,
    'SEATED_TX': 5,
    'STANDING_UP_TX': 6,
    'WALKING_RX': 7,
    'TURN_RX': 8,
    'SITTING_DOWN_RX': 9,
}


def tune_model(data, model_type, batch_size, epochs, n_executions, phase):
    set_seed()
    model_builder = get_model_builder(model_type)
    optimizing_layers = phase == 'extra-layers'

    for source, (x, y) in data.items():
        features_dimension = x.shape[1]
        tuning_configuration = get_tuning_configuration(model_type, source if optimizing_layers else None)
        tuning_configuration['features_dimension'] = features_dimension
        tuning_project = f'{model_type}_{source}{"_layers" if optimizing_layers else ""}'
        print(f'Tuning {model_type} model with {source} data')

        tuner = create_tuner(
            model_builder,
            n_executions,
            tuning_configuration,
            TUNING_DIR.format(phase),
            tuning_project
        )

        tuner = tune(tuner, x, y, epochs, batch_size)
        save_tuning_summary(tuner, os.path.join(TUNING_DIR, tuning_project))


def save_tuning_summary(tuner, tuning_dir):
    save_json(get_tuning_summary(tuner), tuning_dir, TUNING_SUMMARY_FILE)


if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument('--data_dir', help='data directory', type=str, required=True)
    parser.add_argument('--model', help='optimize hyperparameters for selected model', type=str, choices=['mlp', 'cnn'])
    parser.add_argument('--phase', help='tuning phase: <initial> to tune layer hyperparameters and <extra-layers> to tune number of layers' , type=str, choices=['initial', 'extra-layers'])
    parser.add_argument('--batch_size', help='training batch size', type=int, default=BATCH_SIZE)
    parser.add_argument('--epochs', help='training epochs', type=int, default=EPOCHS)
    parser.add_argument('--executions', help='executions per trial', type=int, default=N_EXECUTIONS)
    args = parser.parse_args()

    d1_windows, d1_labels = load_data(args.data_dir)
    y = ground_truth_to_categorical(d1_labels, MAPPING)
    x, y = combine_windows(d1_windows, y)
    print(x.shape)
    data = {
        'csi': (x, y)
    }
    tune_model(data, args.model, args.batch_size, args.epochs, args.executions, args.phase)
Experimental procedure
Figure 5 depicts the three different evaluation approaches employed to determine the performance of a Wi-Fi CSI model for localized HAR and study the stability of the CSI data over time.
- K-fold cross-validation: a classical procedure widely employed in the literature for model evaluation. It consists of splitting the available data into \(K\) parts, where each part \(k_{i}\) is used to evaluate a model trained with the remaining \(K-1\) parts. We employ this evaluation approach with the D1 dataset and \(K=10\).
- Maintaining the temporal dependencies: K-fold cross-validation is not the most appropriate evaluation approach for time series, since it alters the temporal order of the data. To maintain that order, the first \(16\) sequences of activities (\(80\%\) of the data) from D1 are used for training and the remaining (last) \(4\) sequences (\(20\%\)) for evaluation. These subsets of D1 are named D1T (training) and D1E (evaluation). This is a basic approach to investigate the stability of the data.
- Effect of time: the model trained with D1T is evaluated using the data from D2, D3 and D4. This approach makes it possible to analyse how the classification performance varies across time frames (\(10\), \(30\) and \(90\) minutes after D1) and, therefore, to determine the stability of the CSI data.
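The difference between the first two evaluation approaches can be sketched on sequence identifiers alone. This is a simplified illustration, not the helpers used by the evaluation script: `k_fold_indices` assigns items to folds without shuffling, `temporal_split` cuts an ordered list at \(80\%\), and both names are hypothetical. The identifier format follows the one that appears in the script (e.g. `e01_rx_tx`).

```python
def k_fold_indices(n_items, k=10):
    """Yield (train, test) index lists; item i is tested in fold i % k."""
    for fold in range(k):
        test = [i for i in range(n_items) if i % k == fold]
        train = [i for i in range(n_items) if i % k != fold]
        yield train, test


def temporal_split(sequence_ids, train_fraction=0.8):
    """Split an ordered list so that training data always precedes test data."""
    cut = round(len(sequence_ids) * train_fraction)
    return sequence_ids[:cut], sequence_ids[cut:]


# 10 repetitions in each direction: 20 sequences in collection order.
sequence_ids = [f"e{n:02d}_{d}" for n in range(1, 11) for d in ("rx_tx", "tx_rx")]

d1t, d1e = temporal_split(sequence_ids)
print(len(d1t), d1e)  # 16 ['e09_rx_tx', 'e09_tx_rx', 'e10_rx_tx', 'e10_tx_rx']

folds = list(k_fold_indices(len(sequence_ids), k=10))
print(len(folds), folds[0][1])  # 10 [0, 10]
```

In the K-fold case, test items are interleaved with training items in time, which is exactly what the temporal split avoids.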
The script employed to execute this process is 03_1_multiple-evaluations.py with the flag --model cnn.
Code
"""Multiple evaluation script

Performs a cross-validation and an evaluation with different subsets collected at different time frames.

**Example**:

    $ python 03_1_multiple-evaluations.py
        --data_dir <PATH_OF_DATA>
        --reports_dir <PATH_TO_STORE_REPORTS>
        --model <MLP,CNN>
"""

import argparse
import os
import sys
sys.path.append("../../..")

from tensorflow import keras
from tensorflow.keras import layers
from libs.chapter5.pipeline.data_loading import load_data
from libs.chapter5.pipeline.data_grouping import combine_windows, split_train_test
from libs.chapter5.pipeline.ml import cross_validation, evaluate_model
from libs.common.data_loading import ground_truth_to_categorical
from libs.common.utils import save_json, set_seed

MAPPING = {
    'SEATED_RX': 0,
    'STANDING_UP_RX': 1,
    'WALKING_TX': 2,
    'TURN_TX': 3,
    'SITTING_DOWN_TX': 4,
    'SEATED_TX': 5,
    'STANDING_UP_TX': 6,
    'WALKING_RX': 7,
    'TURN_RX': 8,
    'SITTING_DOWN_RX': 9,
}
LABELS = ['SEATED_RX','STANDING_UP_RX','WALKING_TX','TURNING_TX','SITTING_DOWN_TX', 'SEATED_TX', 'STANDING_UP_TX','WALKING_RX','TURNING_RX','SITTING_DOWN_RX']
NUM_CLASSES = len(LABELS)

# First 16 sequences of D1 (D1T) for training, last 4 (D1E) for evaluation
TRAIN_IDS = ['e01_rx_tx', 'e01_tx_rx', 'e02_rx_tx', 'e02_tx_rx', 'e03_rx_tx', 'e03_tx_rx', 'e04_rx_tx', 'e04_tx_rx',
             'e05_rx_tx', 'e05_tx_rx', 'e06_rx_tx', 'e06_tx_rx', 'e07_rx_tx', 'e07_tx_rx', 'e08_rx_tx', 'e08_tx_rx']
TEST_IDS = ['e09_rx_tx', 'e09_tx_rx', 'e10_rx_tx', 'e10_tx_rx']

BATCH_SIZE = 32
EPOCHS = 50
FOLDS = 10


def mlp_model():
    set_seed()
    model = keras.Sequential([
        layers.Dense(128, activation='relu', input_shape=(500,)),
        layers.Dense(1024, activation='relu'),
        layers.Dense(1024, activation='relu'),
        layers.Dense(1024, activation='relu'),
        layers.Dense(NUM_CLASSES, activation='softmax')
    ])
    model.compile(loss='categorical_crossentropy', optimizer=keras.optimizers.Adam(learning_rate=0.0005), metrics=['accuracy'])
    return model


def cnn_model():
    set_seed()
    model = keras.Sequential([
        layers.Conv2D(filters=128, kernel_size=(5,25), input_shape=(56, 50, 1)),
        layers.BatchNormalization(),
        layers.Activation('relu'),
        layers.MaxPooling2D(),
        layers.Flatten(),
        layers.Dense(512, activation='relu'),
        layers.Dense(NUM_CLASSES, activation='softmax')
    ])
    model.compile(loss='categorical_crossentropy', optimizer=keras.optimizers.Adam(learning_rate=0.0001), metrics=['accuracy'])
    return model


def model_builder(model_type):
    if model_type == 'cnn':
        return cnn_model
    return mlp_model


if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument('--data_dir', help='data directory', type=str, required=True)
    parser.add_argument('--reports_dir', help='directory to store the generated classification reports', type=str, required=True)
    parser.add_argument('--model', help='model to evaluate', type=str, choices=['mlp', 'cnn'])
    args = parser.parse_args()

    d1_windows, d1_labels = load_data(os.path.join(args.data_dir, 'D1'))
    d1_labels_cat = ground_truth_to_categorical(d1_labels, MAPPING)
    x, y = combine_windows(d1_windows, d1_labels_cat)

    print("Starting 10-fold cross-validation")
    cv_reports = cross_validation(x, y, model_builder(args.model), FOLDS, BATCH_SIZE, EPOCHS, LABELS)
    save_json(cv_reports, args.reports_dir, 'cv_report.json')

    print("Starting D1T training and D1E evaluation")
    (x_d1t, y_d1t), (x_d1e, y_d1e) = split_train_test(d1_windows, d1_labels_cat, TRAIN_IDS, TEST_IDS)
    model = model_builder(args.model)()
    model.fit(x_d1t, y_d1t, batch_size=BATCH_SIZE, epochs=EPOCHS, verbose=0)
    report = evaluate_model(model, x_d1e, y_d1e, LABELS)
    save_json(report, args.reports_dir, 'd1_report.json')

    print("Starting D2, D3 and D4 evaluation")
    # The model trained with D1T is reused, without retraining, on the later datasets
    for eval_dataset in ['D2', 'D3', 'D4']:
        windows, labels = load_data(os.path.join(args.data_dir, eval_dataset))
        labels_cat = ground_truth_to_categorical(labels, MAPPING)
        x, y = combine_windows(windows, labels_cat)
        report = evaluate_model(model, x, y, LABELS)
        save_json(report, args.reports_dir, f'{eval_dataset.lower()}_report.json')
Investigating the causes of failure
The previous methodology produced unsatisfactory outcomes (see Localized HAR based on Wi-Fi CSI). The results showed a clear degradation in the classification accuracy of the employed CNN model when the evaluation data had been collected some time after the training data. That is, classification accuracy quickly degrades over time.
Nevertheless, temporal instability of the CSI data is only one possible explanation for the poor results. Specifically, the following factors could affect the results:
1. The selected methods (i.e., data preprocessing and model architecture) might not be able to work properly with CSI data, i.e., to generalize from the training data. While CNN approaches have been shown to provide good results with CSI data (Ma, Zhou, and Wang 2019), most related works using the ESP32 microcontroller employ other architectures, such as the MLP.
2. The hardware employed for CSI extraction, the ESP32-S2 microcontroller, might not be appropriate for such a task. Other devices, such as the Intel 5300 or Atheros NICs, might be more appropriate.
3. The collected dataset might have been affected by some external interference, altering the environment and changing the CSI data.
4. The CSI data is not stable over time and therefore cannot be used for real-life applications.
Next, we aim to determine the cause of the poor results presented in Localized HAR based on Wi-Fi CSI. First, to determine whether our method is appropriate for CSI data (1), we applied it to two public datasets and compared the results with other state-of-the-art works (Validation of employed methods). Then, to check whether alternative methods validated in the literature would have obtained results similar to those of our method (1), we applied the method from a related work to our collected dataset (Validation of employed methods). Finally, to verify the temporal stability of the CSI data (4), a new dataset was collected over several days to evaluate the similarity of the data across days (Temporal stability of Wi-Fi CSI data from ESP32 microcontrollers). The remaining factors could not be explored due to resource limitations (2) and the impossibility of determining whether external interference was present while collecting the dataset (3).
Validation of method on public datasets
Methodology
Two publicly available datasets have been used to validate the methods and model employed: the StanWiFi and the Multi-environment dataset.
- StanWiFi: collected by Yousefi et al. (2017) and made available on GitHub\(^{1}\). The dataset was collected with a Wi-Fi router (Tx) and an Intel 5300 NIC with three Rx antennas, separated by \(3\) meters in a LOS environment. The dataset contains CSI data from \(90\) subcarriers sampled at \(1000\) Hz corresponding to \(7\) activities: lie, fall, walk, run, sit down, stand up and pick up. For comparison purposes, the pick up activity was removed from the dataset, as other works do.
- Multi-environment: collected in three different environments, E1 and E2 in LOS conditions and E3 in NLOS conditions (Alsaify et al. 2020). The latter is discarded since we focus on LOS conditions. The datasets were collected using two computers (Tx and Rx) equipped with an Intel 5300 NIC, separated by \(3.7\) meters in E1 and \(7.6\) meters in E2. The CSI data was collected from \(90\) subcarriers at \(320\) Hz and corresponds to \(12\) different activities classified into \(6\) groups: no movement, falling, walking, sitting/standing, turning and pick up.
The data preparation steps described in Data preparation were applied to both datasets. While the windows of the collected dataset span \(0.5\) seconds of data, a window size of \(1\) second was employed for both public datasets since they contain a larger amount of data.
The script employed to execute the process on the StanWiFi dataset is 01_2_stanwifi-processing.py.
Code
"""Data preprocessing script for StanWiFi dataset.

Processes the raw data by processing the windows generated by the author's scripts using DBSCAN for outlier detection
and 2-level DWT for threshold based filtering.

**Example**:

    $ python 01_2_stanwifi-processing.py
        --input_data_path <PATH_OF_RAW_DATA>
        --windowed_data_path <PATH_TO_STORE_RESULTS>
"""

import argparse
import os
import sys
sys.path.append("../../..")

import numpy as np
from cross_vali_input_data import csv_import
from libs.chapter5.pipeline.processing import proposed_method


if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument('--windowed_data_path', help='Path to store windowed data', type=str, required=True)
    args = parser.parse_args()

    x_bed, x_fall, x_pickup, x_run, x_sitdown, x_standup, x_walk, \
    y_bed, y_fall, y_pickup, y_run, y_sitdown, y_standup, y_walk = csv_import()

    x_subsets = [x_bed, x_fall, x_run, x_sitdown, x_standup, x_walk]
    y_subsets = [y_bed, y_fall, y_run, y_sitdown, y_standup, y_walk]

    x_proc = []
    y_proc = []
    for x, y in zip(x_subsets, y_subsets):
        x_proc.append(proposed_method(x))
        y_proc.append(np.delete(y, [0,4], axis=1))

    x = np.vstack(x_proc)
    x = np.transpose(x, axes=(0,2,1))
    y = np.vstack(y_proc)

    np.save(os.path.join(args.windowed_data_path, 'x.npy'), x)
    np.save(os.path.join(args.windowed_data_path, 'y.npy'), y)
The script employed to execute the process on the Multi-environment dataset is 01_3_multienvironment-processing.py.
Code
"""Data preprocessing script for Multi-environment dataset.

Processes the raw data by arranging samples in windows and processing them using DBSCAN for outlier detection
and 2-level DWT for threshold based filtering

**Example**:

    $ python 01_3_multienvironment-processing.py
        --input_data_path <PATH_OF_RAW_DATA>
        --windowed_data_path <PATH_TO_STORE_RESULTS>
"""

import argparse
import copy
import os
import sys
sys.path.append("../../..")

import numpy as np
import pandas as pd
from alive_progress import alive_bar
from libs.chapter5.pipeline.processing import proposed_method
from math import sqrt

# Maps the 12 recorded activities to the 6 activity groups
ACTIVITY_MAPPING = {
    'A01': 'A1',
    'A02': 'A2',
    'A03': 'A1',
    'A04': 'A1',
    'A05': 'A2',
    'A06': 'A3',
    'A07': 'A5',
    'A08': 'A3',
    'A09': 'A5',
    'A10': 'A4',
    'A11': 'A4',
    'A12': 'A6',
}


def load_multienvironment_dataset(environment):
    data = {}
    subject_dirs = os.listdir(environment)
    subject_dirs = list(filter(lambda x: x.startswith('Subject'), subject_dirs))
    with alive_bar(len(subject_dirs), title=f'Loading data from subjects', force_tty=True) as progress_bar:
        for subject_dir in subject_dirs:
            subject = f'S{int(subject_dir.split(" ")[-1]):02d}'
            data[subject] = {}
            subject_dir_path = os.path.join(environment, subject_dir)
            for file in os.listdir(subject_dir_path):
                if not file.endswith('.csv'):
                    continue
                base_activity = file.split('_')[3]
                file_path = os.path.join(subject_dir_path, file)
                df = pd.read_csv(file_path)
                df = df.iloc[160:-160] #remove 0.5 sec after and before due to noise

                if base_activity not in data[subject]:
                    data[subject][base_activity] = df
                else:
                    data[subject][base_activity] = pd.concat([data[subject][base_activity], df])

            progress_bar()
    return data


def amplitude_from_raw_data(data):
    amplitudes = {}
    with alive_bar(len(data.keys()), title=f'Extracting amplitudes from subject\'s data', force_tty=True) as progress_bar:
        for subject in data:
            amplitudes[subject] = {}
            for activity in data[subject]:
                activity_data = data[subject][activity]
                activity_amplitudes = []
                for index, row in activity_data.iterrows():
                    instance_amplitudes = []
                    # 3 antennas x 30 subcarriers = 90 amplitude values per sample
                    for antenna in range(1,4):
                        for subcarrier in range(1,31):
                            csi_data = row[f'csi_1_{antenna}_{subcarrier}']
                            real, imaginary = csi_data.split('+')
                            real = int(real)
                            imaginary = int(imaginary[:-1])

                            instance_amplitudes.append(sqrt(imaginary ** 2 + real ** 2))

                    activity_amplitudes.append(instance_amplitudes)
                amplitudes[subject][activity] = np.array(activity_amplitudes)

            progress_bar()
    return amplitudes


def create_windows(amplitudes, window_size=320, window_overlap=160):
    windows = {}
    windows_labels = {}
    for subject_id in amplitudes:
        subject_windows = []
        subject_windows_labels = []
        for activity_id in amplitudes[subject_id]:
            activity_amplitudes = amplitudes[subject_id][activity_id].T

            n = activity_amplitudes.shape[1] // window_overlap
            for i in range(0, (n-1) * window_overlap, window_overlap):
                if i+window_size > activity_amplitudes.shape[1]:
                    break

                subject_windows.append(activity_amplitudes[:,i:i+window_size])
                subject_windows_labels.append(ACTIVITY_MAPPING[activity_id])

        windows[subject_id] = np.array(subject_windows)
        windows_labels[subject_id] = np.array(subject_windows_labels)
    return windows, windows_labels


def process_windows(windows):
    proc_windows = {}
    with alive_bar(len(windows.keys()), title=f'Processing subject\'s windows', force_tty=True) as progress_bar:
        for subject_id in windows:
            windows_copy = copy.deepcopy(windows[subject_id])
            for i in range(len(windows_copy)):
                windows_copy[i] = proposed_method(windows_copy[i])
            proc_windows[subject_id] = windows_copy

            progress_bar()
    return proc_windows


def save_windowed_data(data, labels, directory):
    for subject_id, subject_data in data.items():
        np.save(os.path.join(directory, f'{subject_id}_x.npy'), subject_data)
        # Labels go to a separate "_y" file so that they do not overwrite the windows
        np.save(os.path.join(directory, f'{subject_id}_y.npy'), labels[subject_id])


if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument('--input_data_path', help='Path of input data', type=str, required=True)
    parser.add_argument('--windowed_data_path', help='Path to store windowed data', type=str, required=True)
    args = parser.parse_args()

    for dataset in ['ENVIRONMENT 1', 'ENVIRONMENT 2']:
        print(f'Processing dataset {dataset}')
        data = load_multienvironment_dataset(os.path.join(args.input_data_path, dataset))
        amplitudes = amplitude_from_raw_data(data)
        windows, windows_labels = create_windows(amplitudes)

        del data, amplitudes
        proc_windows = process_windows(windows)
        save_windowed_data(proc_windows, windows_labels, args.windowed_data_path)
As regards the HAR classifier, the model architecture described in HAR classifier was employed, with minor adaptations of some hyperparameters due to computational limitations: the higher dimensionality of both datasets (higher sampling rate and data from more subcarriers) compared with the collected one makes it unfeasible to use the previous model due to memory limitations. The adaptations for each dataset are the following:
- StanWiFi: \(16\) filters, a batch size of \(128\) and \(30\) epochs.
- Multi-environment (E1 and E2): \(8\) filters, a batch size of \(256\) and \(30\) epochs.
Finally, the experimental procedure consisted of a \(10\)-fold cross-validation to evaluate the CNN model on the public datasets. The results are compared with those of related works that also employ a K-fold cross-validation approach.
The script employed to execute this process on the StanWiFi dataset is 03_2_cross-validation.py with the flag --dataset stanwifi. The same script was used for the Multi-environment dataset with the flag --dataset multienvironment.
Code
"""Cross-validation script

Performs a cross-validation on the selected dataset.

**Example**:

    $ python 03_2_cross-validation.py
        --data_dir <PATH_OF_DATA>
        --reports_dir <PATH_TO_STORE_REPORTS>
        --dataset <stanwifi,multienvironment>
"""

import argparse
import os
import sys
sys.path.append("../../..")

import numpy as np
from tensorflow import keras
from tensorflow.keras import layers
from libs.chapter5.pipeline.data_grouping import combine_windows
from libs.chapter5.pipeline.ml import cross_validation
from libs.common.data_loading import ground_truth_to_categorical
from libs.common.utils import save_json, set_seed

STANWIFI_LABELS = ['LIE DOWN', 'FALL', 'WALK', 'RUN', 'SITDOWN', 'STANDUP']
STANWIFI_BATCH_SIZE = 128

MULTI_ENV_LABELS = ['No movement', 'Falling', 'Walking', 'Sitting/Standing', 'Turning', 'Pick up pen']
MULTI_ENV_MAPPING = {'A1': 0, 'A2': 1, 'A3': 2, 'A4': 3, 'A5': 4, 'A6': 5}
MULTIENV_BATCH_SIZE = 256

FOLDS = 10
EPOCHS = 30


def stanwifi_model():
    set_seed()
    model = keras.Sequential([
        layers.Conv2D(filters=16, kernel_size=(5,25), input_shape=(90, 500, 1)),
        layers.BatchNormalization(),
        layers.Activation('relu'),
        layers.MaxPooling2D(),
        layers.Flatten(),
        layers.Dense(512, activation='relu'),
        layers.Dense(6, activation='softmax')
    ])
    model.compile(loss='categorical_crossentropy', optimizer=keras.optimizers.Adam(learning_rate=0.0001), metrics=['accuracy'])
    return model


def multienvironment_model():
    set_seed()
    model = keras.Sequential([
        layers.Conv2D(filters=8, kernel_size=(5,25), input_shape=(90, 320, 1)),
        layers.BatchNormalization(),
        layers.Activation('relu'),
        layers.MaxPooling2D(),
        layers.Flatten(),
        layers.Dense(512, activation='relu'),
        layers.Dense(6, activation='softmax')
    ])
    model.compile(loss='categorical_crossentropy', optimizer=keras.optimizers.Adam(learning_rate=0.0001), metrics=['accuracy'])
    return model


def load_multienv_data(path, dataset_dir):
    dataset_path = os.path.join(path, dataset_dir)
    subjects = ['S01', 'S02', 'S03', 'S04', 'S05', 'S06', 'S07', 'S08', 'S09', 'S10'] if dataset_dir == 'E1' else ['S11', 'S12', 'S13', 'S14', 'S15', 'S16', 'S17', 'S18', 'S19', 'S20']

    windows = {}
    windows_labels = {}
    for subject_id in subjects:
        windows[subject_id] = np.load(os.path.join(dataset_path, f'x_{subject_id}.npy'))
        windows_labels[subject_id] = np.load(os.path.join(dataset_path, f'y_{subject_id}.npy'))
    return windows, windows_labels


if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument('--data_dir', help='data directory', type=str, required=True)
    parser.add_argument('--reports_dir', help='directory to store the generated classification reports', type=str, required=True)
    parser.add_argument('--dataset', help='dataset to use in the cross-validation', type=str, choices=['stanwifi', 'multienvironment'])
    args = parser.parse_args()

    if args.dataset == 'stanwifi':
        x = np.load(os.path.join(args.data_dir, 'x.npy'))
        # Labels are stored in y.npy by the StanWiFi processing script
        y = np.load(os.path.join(args.data_dir, 'y.npy'))

        model_builder = stanwifi_model
        batch_size = STANWIFI_BATCH_SIZE
        labels = STANWIFI_LABELS
        reports = cross_validation(x, y, stanwifi_model, FOLDS, STANWIFI_BATCH_SIZE, EPOCHS, STANWIFI_LABELS)
        save_json(reports, args.reports_dir, 'cv_report.json')
    else:
        for dataset in ['E1', 'E2']:
            windows, windows_labels = load_multienv_data(args.data_dir, dataset)
            windows_labels_cat = ground_truth_to_categorical(windows_labels, MULTI_ENV_MAPPING)
            x, y = combine_windows(windows, windows_labels_cat)
            reports = cross_validation(x, y, multienvironment_model, FOLDS, MULTIENV_BATCH_SIZE, EPOCHS, MULTI_ENV_LABELS)
            save_json(reports, args.reports_dir, f'{dataset.lower()}-cv_report.json')
Validation of alternative method in the collected dataset
The methods proposed by Choi et al. (2022) have been applied to the collected dataset. In their work, the authors extract a set of hand-crafted features from the CSI data and employ an MLP model for crowd counting and localization. These methods were selected because their evaluation appropriately took the stability of the signal into account and showed only a small drop in performance.
Methodology
As in Data preparation, the amplitude is extracted from the CSI data and the dataset is arranged in windows of \(50\) samples with a \(50\%\) overlap. Then, the methods presented by Choi et al. (2022) are applied:
- Noise removal: the Hampel and the Savitzky-Golay filters are applied on each subcarrier.
- Feature extraction: the features used as input to the MLP model are the mean, standard deviation (SD), maximum, minimum, lower quartile, upper quartile, interquartile range (IQR), differences between adjacent subcarriers and the Euclidean distance.
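The two steps above can be illustrated with a minimal NumPy sketch. It is not the `choi_method` implementation from the repository: the Hampel window length and threshold are assumed values, the adjacent-subcarrier difference is assumed to be taken on the mean amplitude, and the Savitzky-Golay smoothing (available, for instance, as `scipy.signal.savgol_filter`) and the Euclidean distance feature are left out because their exact definition is not given here.

```python
import numpy as np


def hampel_filter(signal, half_window=3, n_sigmas=3.0):
    """Replace outliers of a 1-D signal with the local median (Hampel filter)."""
    filtered = signal.astype(float).copy()
    k = 1.4826  # scales the MAD to the standard deviation for Gaussian data
    for i in range(len(signal)):
        lo, hi = max(0, i - half_window), min(len(signal), i + half_window + 1)
        neighbourhood = signal[lo:hi]
        median = np.median(neighbourhood)
        mad = k * np.median(np.abs(neighbourhood - median))
        if np.abs(signal[i] - median) > n_sigmas * mad:
            filtered[i] = median
    return filtered


def window_features(window):
    """Per-subcarrier statistics of a window shaped (subcarriers, samples)."""
    q1, q3 = np.percentile(window, [25, 75], axis=1)
    stats = [window.mean(axis=1), window.std(axis=1), window.max(axis=1),
             window.min(axis=1), q1, q3, q3 - q1]
    # Assumption: adjacent-subcarrier differences are computed on the mean amplitude.
    adjacent_diff = np.diff(window.mean(axis=1))
    return np.concatenate(stats + [adjacent_diff])


# Synthetic window: 56 subcarriers x 50 samples with one injected spike.
rng = np.random.default_rng(0)
window = rng.normal(10.0, 1.0, size=(56, 50))
window[3, 20] = 100.0

clean = np.apply_along_axis(hampel_filter, 1, window)  # filter each subcarrier
features = window_features(clean)
print(features.shape)
```

With these assumptions each window yields seven statistics per subcarrier plus one difference per pair of adjacent subcarriers; the feature vector used in the actual experiments may have a different length.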
The script employed to execute this process is 01_1_preliminar-dataset-processing.py with the flag --method choi.
Code
"""Data preprocessing script for preliminar dataset.
Processes the raw data by arranging samples in windows and processing them using 1) DBSCAN for outlier detection
and 2-level DWT for threshold-based filtering or 2) the method of Choi et al.
**Example**:
$ python 01_1_preliminar-dataset-processing.py
--input_data_path <PATH_OF_RAW_DATA>
--windowed_data_path <PATH_TO_STORE_RESULTS>
--method <PROCESSING_METHOD>
--window_size <WINDOW_SIZE>
--window_overlap <WINDOW_OVERLAP>
"""
import argparse
import os
import sys

sys.path.append("../../..")

import numpy as np
from alive_progress import alive_bar
from libs.chapter5.pipeline.processing import proposed_method, choi_method
from libs.chapter5.pipeline.raw_data_loading import load_labelled_data

WINDOW_SIZE = 50
WINDOW_OVERLAP = 25

def create_windows(executions_amplitudes, executions_labels, window_size, window_overlap):
    win = {}
    win_labels = {}
    for execution_id in executions_amplitudes:
        amplitudes = executions_amplitudes[execution_id]
        exec_labels = executions_labels[execution_id]

        data = amplitudes
        n = data.shape[1] // window_overlap

        windows = []
        windows_labels = []
        for i in range(0, (n-1) * window_overlap, window_overlap):
            if i+window_size > data.shape[1]:
                break
            window_labels = exec_labels[i:i+window_size]
            values, counts = np.unique(window_labels, return_counts=True)
            # Discard windows that span more than one activity label.
            if len(values) != 1:
                continue

            windows.append(data[:,i:i+window_size])
            windows_labels.append(values[counts.argmax()])
        windows = np.array(windows)
        windows_labels = np.array(windows_labels)

        win[execution_id] = windows
        win_labels[execution_id] = windows_labels
    return win, win_labels

def process_windows(executions_windows, processing_function):
    processed_windows = {}
    executions_ids = executions_windows.keys()
    with alive_bar(len(executions_ids), title='Processing windows', force_tty=True) as progress_bar:
        for execution_id in executions_ids:
            proc_windows = []
            windows = executions_windows[execution_id]
            for window in windows:
                proc_windows.append(processing_function(window))
            processed_windows[execution_id] = np.array(proc_windows)

            progress_bar()
    return processed_windows

def save_windowed_data(data, labels, directory):
    if not os.path.exists(directory):
        os.makedirs(directory)
    x_file_path = os.path.join(directory, '{0}-x.npy')
    y_file_path = os.path.join(directory, '{0}-y.npy')

    for execution_id in data:
        x = data[execution_id]
        y = labels[execution_id]

        np.save(x_file_path.format(execution_id), x)
        np.save(y_file_path.format(execution_id), y)

if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument('--input_data_path', help='Path of input data', type=str, required=True)
    parser.add_argument('--windowed_data_path', help='Path to store windowed data', type=str, required=True)
    parser.add_argument('--method', help='Processing method', required=True, choices=['proposed', 'choi'])
    args = parser.parse_args()

    processing_function = proposed_method if args.method == 'proposed' else choi_method

    for dataset in ['D1', 'D2', 'D3', 'D4']:
        print(f'Processing dataset {dataset}')
        executions_amp, labels = load_labelled_data(os.path.join(args.input_data_path, dataset))
        windows, windows_labels = create_windows(executions_amp, labels, WINDOW_SIZE, WINDOW_OVERLAP)
        windows_processed = process_windows(windows, processing_function)
        save_windowed_data(windows_processed, windows_labels, os.path.join(args.windowed_data_path, dataset))
As the HAR classifier, an MLP model is employed. Instead of reusing the architecture employed by Choi et al., a Grid search was executed to determine the most appropriate hyperparameters for our dataset.
The Grid search was carried out as described in HAR classifier. Table 3 contains the best combination of hyperparameters.
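As a rough illustration of what a Grid search does, the sketch below enumerates every combination of a small search space and keeps the one with the best average score over several executions. The search space and the `toy_evaluate` function are hypothetical placeholders: the real grid lives in `tuning_configuration.py`, and in practice the score would come from training the model and measuring its validation accuracy.

```python
from itertools import product

# Hypothetical search space, for illustration only.
GRID = {
    'units': [128, 256, 512],
    'learning_rate': [1e-3, 5e-4, 1e-4],
}


def grid_search(grid, evaluate, n_executions=5):
    """Try every combination in `grid` and return the best one with its mean score."""
    names = list(grid)
    best_config, best_score = None, float('-inf')
    for values in product(*(grid[name] for name in names)):
        config = dict(zip(names, values))
        # Average several executions to reduce the effect of random initialisation.
        score = sum(evaluate(config, run) for run in range(n_executions)) / n_executions
        if score > best_score:
            best_config, best_score = config, score
    return best_config, best_score


def toy_evaluate(config, run):
    # Placeholder for "train the model and return its validation accuracy".
    return 1.0 - abs(config['units'] - 256) / 1024 - abs(config['learning_rate'] - 5e-4)


best_config, best_score = grid_search(GRID, toy_evaluate)
print(best_config, best_score)
```

Averaging over `n_executions` mirrors the "executions per trial" option of the script below; the exhaustive loop is what distinguishes a Grid search from random or Bayesian search strategies.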
The script employed to execute the Grid Search is 02_hyperparameter-optimization.py with the flag --model mlp.
Code
"""Hyperparameters Grid Search script.
Performs a hyperparameter Grid Search on the specified model. The selected hyperparameters for the search
can be found in `tuning_configuration.py`.
**Example**:
$ python 02_hyperparameter-optimization.py
--data_dir <PATH_OF_DATA>
--model <MLP,CNN>
--phase <initial,extra-layers>
--batch_size <BATCH_SIZE>
--epochs <EPOCHS>
--executions <EXECUTIONS>
"""
import argparse
import os
import sys

sys.path.append("../../..")

from libs.chapter5.pipeline.data_loading import load_data
from libs.chapter5.pipeline.data_grouping import combine_windows
from libs.chapter5.pipeline.hyperparameters_tuning import get_model_builder, create_tuner, tune, get_tuning_summary
from libs.chapter5.pipeline.tuning_configuration import get_tuning_configuration
from libs.common.data_loading import ground_truth_to_categorical
from libs.common.utils import save_json, set_seed

TUNING_DIR = 'GRID_SEARCH_{0}'
TUNING_SUMMARY_FILE = 'summary.json'

BATCH_SIZE = 32
EPOCHS = 50
N_EXECUTIONS = 5

MAPPING = {
    'SEATED_RX': 0,
    'STANDING_UP_RX': 1,
    'WALKING_TX': 2,
    'TURN_TX': 3,
    'SITTING_DOWN_TX': 4,
    'SEATED_TX': 5,
    'STANDING_UP_TX': 6,
    'WALKING_RX': 7,
    'TURN_RX': 8,
    'SITTING_DOWN_RX': 9,
}

def tune_model(data, model_type, batch_size, epochs, n_executions, phase):
    set_seed()
    model_builder = get_model_builder(model_type)
    optimizing_layers = phase == 'extra-layers'
    # Resolve the phase-specific output directory once, for both tuner and summary.
    tuning_dir = TUNING_DIR.format(phase)

    for source, (x, y) in data.items():
        features_dimension = x.shape[1]
        tuning_configuration = get_tuning_configuration(model_type, source if optimizing_layers else None)
        tuning_configuration['features_dimension'] = features_dimension
        tuning_project = f'{model_type}_{source}{"_layers" if optimizing_layers else ""}'
        print(f'Tuning {model_type} model with {source} data')

        tuner = create_tuner(
            model_builder,
            n_executions,
            tuning_configuration,
            tuning_dir,
            tuning_project
        )

        tuner = tune(tuner, x, y, epochs, batch_size)

        save_tuning_summary(tuner, os.path.join(tuning_dir, tuning_project))


def save_tuning_summary(tuner, tuning_dir):
    save_json(get_tuning_summary(tuner), tuning_dir, TUNING_SUMMARY_FILE)

if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument('--data_dir', help='data directory', type=str, required=True)
    parser.add_argument('--model', help='optimize hyperparameters for selected model', type=str, choices=['mlp', 'cnn'])
    parser.add_argument('--phase', help='tuning phase: <initial> to tune layer hyperparameters and <extra-layers> to tune number of layers' , type=str, choices=['initial', 'extra-layers'])
    parser.add_argument('--batch_size', help='training batch size', type=int, default=BATCH_SIZE)
    parser.add_argument('--epochs', help='training epochs', type=int, default=EPOCHS)
    parser.add_argument('--executions', help='executions per trial', type=int, default=N_EXECUTIONS)
    args = parser.parse_args()

    d1_windows, d1_labels = load_data(args.data_dir)
    y = ground_truth_to_categorical(d1_labels, MAPPING)
    x, y = combine_windows(d1_windows, y)
    print(x.shape)
    data = {
        'csi': (x, y)
    }
    tune_model(data, args.model, args.batch_size, args.epochs, args.executions, args.phase)
The experimental procedure described in Experimental procedure, with its three evaluation approaches, is repeated on our collected dataset using the method presented by Choi et al.
The script employed to execute this process is 03_1_multiple-evaluations.py with the flag --model mlp.
Code
"""Multiple evaluation script
Performs a cross-validation and an evaluation with different subsets collected at different time frames.
**Example**:
$ python 03_1_multiple_evaluations.py
--data_dir <PATH_OF_DATA>
--reports_dir <PATH_TO_STORE_REPORTS>
--model <MLP,CNN>
"""
import argparse
import os
import sys

sys.path.append("../../..")

from tensorflow import keras
from tensorflow.keras import layers
from libs.chapter5.pipeline.data_loading import load_data
from libs.chapter5.pipeline.data_grouping import combine_windows, split_train_test
from libs.chapter5.pipeline.ml import cross_validation, evaluate_model
from libs.common.data_loading import ground_truth_to_categorical
from libs.common.utils import save_json, set_seed

MAPPING = {
    'SEATED_RX': 0,
    'STANDING_UP_RX': 1,
    'WALKING_TX': 2,
    'TURN_TX': 3,
    'SITTING_DOWN_TX': 4,
    'SEATED_TX': 5,
    'STANDING_UP_TX': 6,
    'WALKING_RX': 7,
    'TURN_RX': 8,
    'SITTING_DOWN_RX': 9,
}
LABELS = ['SEATED_RX','STANDING_UP_RX','WALKING_TX','TURNING_TX','SITTING_DOWN_TX', 'SEATED_TX', 'STANDING_UP_TX','WALKING_RX','TURNING_RX','SITTING_DOWN_RX']
NUM_CLASSES = len(LABELS)

TRAIN_IDS = ['e01_rx_tx', 'e01_tx_rx', 'e02_rx_tx', 'e02_tx_rx', 'e03_rx_tx', 'e03_tx_rx', 'e04_rx_tx', 'e04_tx_rx',
             'e05_rx_tx', 'e05_tx_rx', 'e06_rx_tx', 'e06_tx_rx', 'e07_rx_tx', 'e07_tx_rx', 'e08_rx_tx', 'e08_tx_rx']
TEST_IDS = ['e09_rx_tx', 'e09_tx_rx', 'e10_rx_tx', 'e10_tx_rx']

BATCH_SIZE = 32
EPOCHS = 50
FOLDS = 10

def mlp_model():
    set_seed()
    model = keras.Sequential([
        layers.Dense(128, activation='relu', input_shape=(500,)),
        layers.Dense(1024, activation='relu'),
        layers.Dense(1024, activation='relu'),
        layers.Dense(1024, activation='relu'),
        layers.Dense(NUM_CLASSES, activation='softmax')
    ])
    model.compile(loss='categorical_crossentropy', optimizer=keras.optimizers.Adam(learning_rate=0.0005), metrics=['accuracy'])
    return model


def cnn_model():
    set_seed()
    model = keras.Sequential([
        layers.Conv2D(filters=128, kernel_size=(5,25), input_shape=(56, 50, 1)),
        layers.BatchNormalization(),
        layers.Activation('relu'),
        layers.MaxPooling2D(),
        layers.Flatten(),
        layers.Dense(512, activation='relu'),
        layers.Dense(NUM_CLASSES, activation='softmax')
    ])
    model.compile(loss='categorical_crossentropy', optimizer=keras.optimizers.Adam(learning_rate=0.0001), metrics=['accuracy'])
    return model


def model_builder(model_type):
    if model_type == 'cnn':
        return cnn_model
    return mlp_model

if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument('--data_dir', help='data directory', type=str, required=True)
    parser.add_argument('--reports_dir', help='directory to store the generated classification reports', type=str, required=True)
    parser.add_argument('--model', help='model to evaluate', type=str, choices=['mlp', 'cnn'])
    args = parser.parse_args()

    d1_windows, d1_labels = load_data(os.path.join(args.data_dir, 'D1'))
    d1_labels_cat = ground_truth_to_categorical(d1_labels, MAPPING)
    x, y = combine_windows(d1_windows, d1_labels_cat)

    print("Starting 10-fold cross-validation")
    cv_reports = cross_validation(x, y, model_builder(args.model), FOLDS, BATCH_SIZE, EPOCHS, LABELS)
    save_json(cv_reports, args.reports_dir, 'cv_report.json')

    print("Starting D1T training and D1E evaluation")
    (x_d1t, y_d1t), (x_d1e, y_d1e) = split_train_test(d1_windows, d1_labels_cat, TRAIN_IDS, TEST_IDS)
    model = model_builder(args.model)()
    model.fit(x_d1t, y_d1t, batch_size=BATCH_SIZE, epochs=EPOCHS, verbose=0)
    report = evaluate_model(model, x_d1e, y_d1e, LABELS)
    save_json(report, args.reports_dir, 'd1_report.json')

    print("Starting D2, D3 and D4 evaluation")
    # The model trained on D1T is reused to evaluate the datasets collected later.
    for eval_dataset in ['D2', 'D3', 'D4']:
        windows, labels = load_data(os.path.join(args.data_dir, eval_dataset))
        labels_cat = ground_truth_to_categorical(labels, MAPPING)
        x, y = combine_windows(windows, labels_cat)
        report = evaluate_model(model, x, y, LABELS)
        save_json(report, args.reports_dir, f'{eval_dataset.lower()}_report.json')
Verification of the stability of the CSI signal
This section describes a simple experiment to determine whether the CSI data is stable over time. To do so, a new dataset is collected while minimizing the disturbance of the environment by external factors. Then, an evaluation procedure is designed to determine the similarity of CSI samples collected in different time frames using DL classification models.
Methodology
A dataset was collected using a TP-Link Archer C80 (one Tx antenna) and a SparkFun Thing Plus ESP32-S3 WROOM (one Rx antenna) connected to a computer. The Tx and Rx were placed on a table, separated by \(1\) meter in LOS condition. The Tx device was configured to work with the IEEE 802.11n standard on channel \(6\). The Rx device was configured to connect to the Tx and extract Wi-Fi CSI data from the HT-LTF subcarriers generated by ping traffic at \(100\) Hz.
The data collection consisted of capturing CSI data in an unaltered university laboratory for several days: from March \(28^{th}\) to April \(1^{st}\), \(2024\), coinciding with the Easter holidays. During these days, no external human factors should have disturbed the environment and, thus, the CSI data. The collected CSI samples were labelled with the day on which they were collected (i.e., \(03/28\), \(03/29\), \(03/30\), \(03/31\), \(04/01\)).
The data preparation steps described in Data preparation were applied to the dataset with minor adaptations. Specifically, given the amount of collected data (\(24\) GB), the windowing procedure was set to arrange windows of size \(100\) without overlap.
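To make the difference between the two windowing configurations concrete, the following sketch lists the start index of each window for a given window size and overlap. The helper `window_starts` is a hypothetical illustration and is not part of the repository; at the \(100\) Hz sampling rate used here, a non-overlapping window of \(100\) samples spans one second of data.

```python
def window_starts(n_samples, window_size, overlap):
    """Start indices of all complete windows over a sequence of n_samples."""
    step = window_size - overlap  # samples advanced between consecutive windows
    return list(range(0, n_samples - window_size + 1, step))


# Windows of 50 samples with 50% overlap (25 samples), as in the HAR dataset.
print(window_starts(200, 50, 25))

# Windows of 100 samples without overlap, as in the stability dataset.
print(window_starts(300, 100, 0))
```

Dropping the overlap halves the number of windows produced per sample, which is one way of keeping a dataset of this size manageable.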
The script employed to execute this process is 01_4_lodo-dataset-processing.py.
Code
"""Data preprocessing script for LODO dataset.
Processes the raw data by arranging samples in windows and processing them using DBSCAN for outlier detection
and 2-level DWT for threshold-based filtering.
**Example**:
$ python 01_4_lodo-dataset-processing.py
--input_data_path <PATH_OF_RAW_DATA>
--windowed_data_path <PATH_TO_STORE_RESULTS>
--window_size <WINDOW_SIZE>
"""
import argparse
import copy
import os
import sys

sys.path.append("../../..")

import numpy as np
from alive_progress import alive_bar
from libs.chapter5.pipeline.processing import proposed_method

WINDOW_SIZE = 100

def create_windows(dataset, labels, window_size=100):
    splits = np.arange(window_size, dataset.shape[1], window_size)
    # The last chunk is dropped because it may be shorter than window_size.
    return np.array(np.split(dataset, splits, axis=1)[:-1]), np.array(np.split(labels, splits, axis=0)[:-1])[:,0]


def process_windows(windows):
    windows_copy = copy.deepcopy(windows)
    with alive_bar(len(windows_copy), title='Processing windows', force_tty=True, refresh_secs=5) as progress_bar:
        for i, window in enumerate(windows_copy):
            windows_copy[i] = proposed_method(window)

            progress_bar()
    return windows_copy

def save_windowed_data(data, labels, directory, name):
    if not os.path.exists(directory):
        os.makedirs(directory)
    # Files are named <day>_x.npy and <day>_y.npy, as expected by 03_3_lodo.py.
    np.save(os.path.join(directory, f'{name}_x.npy'), data)
    np.save(os.path.join(directory, f'{name}_y.npy'), labels)


if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument('--input_data_path', help='Path of input data', type=str, required=True)
    parser.add_argument('--windowed_data_path', help='Path to store windowed data', type=str, required=True)
    parser.add_argument('--window_size', help='Window size', type=int, default=WINDOW_SIZE)
    args = parser.parse_args()

    amplitude_files = ['amplitudes_03_28.npy', 'amplitudes_03_29.npy', 'amplitudes_03_30.npy', 'amplitudes_03_31.npy', 'amplitudes_04_01.npy']
    labels_files = ['labels_03_28.npy', 'labels_03_29.npy', 'labels_03_30.npy', 'labels_03_31.npy', 'labels_04_01.npy']

    for amplitude_file, label_file in zip(amplitude_files, labels_files):
        print(f'Processing dataset {amplitude_file}')
        # 'amplitudes_03_28.npy' -> '03_28'
        name = os.path.splitext(amplitude_file.split('_', 1)[1])[0]
        amplitudes = np.load(os.path.join(args.input_data_path, amplitude_file))
        labels = np.load(os.path.join(args.input_data_path, label_file))

        windows, windows_labels = create_windows(amplitudes, labels, args.window_size)

        del amplitudes, labels
        windows_processed = process_windows(windows)
        save_windowed_data(windows_processed, windows_labels, args.windowed_data_path, name)
As the HAR classifier, the model described in HAR classifier was employed, although with minor adaptations to some hyperparameters due to computational limitations caused by the large amount of data. Specifically, the number of filters, batch size and epochs were set to \(8\), \(512\) and \(30\), respectively.
Finally, the experimental procedure consisted of a \(5\)-fold cross-validation on the processed dataset. Each fold corresponds to the data collected in one day, a scheme that can be called Leaving-One-Day-Out (LODO). This procedure aims to evaluate how the models classify data from an unseen day, with two possible outcomes:
- The samples from a specific day are classified in any of the remaining days. In other words, a specific day’s samples are similar to those of any other day. These results would imply that the CSI data is stable over time.
- The samples from the day \(X_i\) are classified in the day \(X_{i-1}\) or \(X_{i+1}\). In other words, samples from a specific day are similar only to the adjacent days (i.e., the samples closest in time). These results would imply that the CSI data is not stable over time.
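One possible way to tell the two outcomes apart is to measure, for each held-out day, the fraction of its windows that the model assigns to an adjacent day and to compare it with the fraction expected if predictions were spread evenly over the training days. The sketch below is an assumption about how such a check could be written, not the analysis code used in this work, and the predictions in the example are made up.

```python
import numpy as np

DAYS = ['03/28', '03/29', '03/30', '03/31', '04/01']


def neighbours_of(held_out):
    """Indices of the days adjacent to the held-out day."""
    return [d for d in (held_out - 1, held_out + 1) if 0 <= d < len(DAYS)]


def adjacent_day_ratio(held_out, predicted):
    """Fraction of held-out-day windows predicted as an adjacent day."""
    return float(np.isin(predicted, neighbours_of(held_out)).mean())


def chance_ratio(held_out):
    """Expected ratio if predictions were uniform over the training days."""
    return len(neighbours_of(held_out)) / (len(DAYS) - 1)


# Made-up predicted day indices for windows of the held-out day 03/30 (index 2).
predicted = np.array([1, 1, 3, 3, 1, 0, 3, 1])
ratio = adjacent_day_ratio(2, predicted)
print(ratio, chance_ratio(2))
```

A ratio close to `chance_ratio` would be compatible with the first outcome (stable CSI), whereas a ratio well above it would point to the second one.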
The script employed to execute this process is 03_3_lodo.py.
Code
"""Leaving-One-Day-Out validation script
Performs a Leaving-One-Day-Out evaluation on the LODO dataset.
**Example**:
$ python 03_3_lodo.py
--data_dir <PATH_OF_DATA>
--reports_dir <PATH_TO_STORE_REPORTS>
"""
import argparse
import os
import sys

sys.path.append("../../..")

import numpy as np
from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras.utils import to_categorical
from libs.chapter5.pipeline.ml import evaluate_model
from libs.common.utils import save_json, set_seed

MAPPING = {
    '03/28': 0,
    '03/29': 1,
    '03/30': 2,
    '03/31': 3,
    '04/01': 4,
}

LABELS = ['03/28', '03/29', '03/30', '03/31', '04/01']
NUM_CLASSES = len(LABELS)

BATCH_SIZE = 512
EPOCHS = 30

def build_model():
    set_seed()
    model = keras.Sequential([
        layers.Conv2D(filters=8, kernel_size=(5,25), input_shape=(56, 100, 1)),
        layers.BatchNormalization(),
        layers.Activation('relu'),
        layers.MaxPooling2D(),
        layers.Flatten(),
        layers.Dense(512, activation='relu'),
        layers.Dense(NUM_CLASSES, activation='softmax')
    ])
    model.compile(loss='categorical_crossentropy', optimizer=keras.optimizers.Adam(learning_rate=0.0001), metrics=['accuracy'])
    return model

def train_models(datasets, labels, batch_size=BATCH_SIZE, epochs=EPOCHS, verbose=1):
    reports = []

    # Each iteration holds out one day for testing and trains on the remaining days.
    for i in range(len(datasets)):
        training_datasets = [datasets[j] for j in range(len(datasets)) if j != i]
        training_labels = [labels[j] for j in range(len(labels)) if j != i]
        print(f'Training with: {training_labels}')
        print(f'Testing with: {labels[i]}')
        x_train = np.vstack(training_datasets)
        y_train = one_hot_encoding(np.concatenate(training_labels), MAPPING)

        x_test = datasets[i]
        y_test = one_hot_encoding(labels[i], MAPPING)

        model = build_model()
        model.fit(x_train, y_train, batch_size=batch_size, epochs=epochs, verbose=verbose)
        report = evaluate_model(model, x_test, y_test, LABELS)

        reports.append(report)
        # Free memory before the next fold.
        del x_train
        del y_train
        del x_test
        del y_test
        del training_datasets
        del training_labels
        del model
    return reports


def one_hot_encoding(y, mapping):
    return to_categorical(list(map(lambda i: mapping[i], y)), num_classes=len(mapping.keys()))

if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument('--data_dir', help='data directory', type=str, required=True)
    parser.add_argument('--reports_dir', help='directory to store the generated classification reports', type=str, required=True)
    args = parser.parse_args()

    windows = []
    labels = []

    for day in ['03_28', '03_29', '03_30', '03_31', '04_01']:
        windows.append(np.load(os.path.join(args.data_dir, f'{day}_x.npy')))
        labels.append(np.load(os.path.join(args.data_dir, f'{day}_y.npy')))

    reports = train_models(windows, labels)

    save_json(reports, args.reports_dir, 'reports.json')