Abstract

Detection of human activities along with the associated context is of crucial importance for various application areas, including assisted living and well-being. To predict a user's context in daily-life situations, a system needs to learn from multimodal data that are often imbalanced and noisy, with missing values. The model is also likely to encounter missing sensors in real-life conditions (such as a user not wearing a smartwatch), and it fails to infer the context if any of the modalities used for training are missing. In this paper, we propose a method based on an adversarial autoencoder (AAE) for handling missing sensory features and synthesizing realistic samples. We empirically demonstrate the capability of our method, in comparison with classical approaches for filling in missing values, on a large-scale activity recognition dataset collected in the wild. We develop a fully-connected classification network by extending the encoder and systematically evaluate its multi-label classification performance when several modalities are missing. Furthermore, we show class-conditional artificial data generation, together with its visual and quantitative analysis on the context classification task, demonstrating the strong generative power of the AAE.

Overview of the proposed framework for robust context classification with missing sensory modalities.

Results

The multimodal AAE is developed to address two problems in multi-label user context detection: (a) the loss of all features of a modality at once, and (b) the synthesis of novel labeled samples. Our empirical results demonstrate that the AAE network trained with structured noise provides a more realistic reconstruction of features from the lost modalities than other methods, such as PCA. Similarly, we show that an AAE model trained with supervision at the decoder network produces realistic synthetic data, which can further be used for other applications.
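
As an illustration of the reconstruction idea, the sketch below shows an adversarial autoencoder trained with structured noise: entire modalities are zeroed out at random during training and the network is asked to restore them, while a discriminator pushes the latent codes towards a Gaussian prior. This is a minimal PyTorch sketch, not the exact architecture from the paper; the feature slices, layer sizes, and hyper-parameters are assumptions.

import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical per-modality feature slices (e.g., accelerometer, gyroscope, audio).
MODALITY_SLICES = [slice(0, 26), slice(26, 52), slice(52, 78)]
INPUT_DIM, LATENT_DIM = 78, 32

encoder = nn.Sequential(nn.Linear(INPUT_DIM, 128), nn.ReLU(), nn.Linear(128, LATENT_DIM))
decoder = nn.Sequential(nn.Linear(LATENT_DIM, 128), nn.ReLU(), nn.Linear(128, INPUT_DIM))
# The discriminator separates encoded samples from draws of the prior p(z) = N(0, I).
discriminator = nn.Sequential(nn.Linear(LATENT_DIM, 64), nn.ReLU(), nn.Linear(64, 1))

opt_ae = torch.optim.Adam(list(encoder.parameters()) + list(decoder.parameters()), lr=1e-3)
opt_d = torch.optim.Adam(discriminator.parameters(), lr=1e-3)
opt_g = torch.optim.Adam(encoder.parameters(), lr=1e-3)
bce = nn.BCEWithLogitsLoss()

def structured_noise(x, p_drop=0.3):
    """Zero out whole modalities at once to emulate missing sensors."""
    x = x.clone()
    for s in MODALITY_SLICES:
        dropped = torch.rand(x.size(0)) < p_drop   # one drop decision per sample
        x[dropped, s] = 0.0
    return x

def train_step(x_clean):
    ones = torch.ones(x_clean.size(0), 1)
    zeros = torch.zeros(x_clean.size(0), 1)

    # 1) Reconstruction: corrupt the input, reconstruct the clean target.
    x_noisy = structured_noise(x_clean)
    recon_loss = F.mse_loss(decoder(encoder(x_noisy)), x_clean)
    opt_ae.zero_grad(); recon_loss.backward(); opt_ae.step()

    # 2) Regularization: train the discriminator on prior samples vs. latent codes.
    z = encoder(x_noisy).detach()
    d_loss = bce(discriminator(torch.randn_like(z)), ones) + bce(discriminator(z), zeros)
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # 3) Generator: update the encoder so its codes fool the discriminator.
    g_loss = bce(discriminator(encoder(x_noisy)), ones)
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
    return recon_loss.item(), d_loss.item(), g_loss.item()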

Reconstruction

Root mean squared error (RMSE) for reconstructing each modality's features given the others, averaged over five user-split folds.
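
For reference, the per-modality RMSE reported in the table can be computed as in the short sketch below; the variable names are illustrative, and the values are averaged over the five user-split folds.

import numpy as np

def modality_rmse(x_true, x_imputed, modality_slice):
    """RMSE over the features of a single (dropped and then imputed) modality."""
    diff = x_true[:, modality_slice] - x_imputed[:, modality_slice]
    return float(np.sqrt(np.mean(diff ** 2)))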


Restoration of phone accelerometer feature values with the AAE and PCA. The entire
modality is dropped and reconstructed using features from the remaining signals.
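
The PCA baseline shown here can be realized with a simple projection-based imputation scheme, sketched below; this is one common variant and not necessarily the exact procedure used in the paper (the number of components and variable names are assumptions).

import numpy as np
from sklearn.decomposition import PCA

def pca_impute(x, missing_cols, pca, train_mean, n_iter=10):
    """Fill missing columns by repeatedly projecting onto the principal subspace."""
    x = x.copy()
    x[:, missing_cols] = train_mean[missing_cols]          # initial guess: training mean
    for _ in range(n_iter):
        recon = pca.inverse_transform(pca.transform(x))    # low-rank reconstruction
        x[:, missing_cols] = recon[:, missing_cols]        # overwrite only the missing part
    return x

# Usage (illustrative): pca = PCA(n_components=20).fit(x_train); train_mean = x_train.mean(axis=0)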


Averaged evaluation metrics for 51 contextual labels with 5-fold cross-validation. All the features
from the corresponding modality are dropped and imputed with the considered techniques.


Classification results for 5-fold cross-validation with different missing modalities restored with a
specific method. The reported metrics are averaged over 51 labels; BA stands for balanced accuracy.
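
Balanced accuracy is the mean of sensitivity and specificity, computed per label and then averaged over the 51 labels. A minimal sketch of this averaging, assuming binary indicator matrices for the multi-label targets:

import numpy as np
from sklearn.metrics import balanced_accuracy_score

def mean_balanced_accuracy(y_true, y_pred):
    """y_true, y_pred: (n_samples, n_labels) binary arrays; returns the label-averaged BA."""
    scores = [balanced_accuracy_score(y_true[:, i], y_pred[:, i])
              for i in range(y_true.shape[1])]
    return float(np.mean(scores))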


Recall of 51 contextual labels with 5-fold cross-validation. All the features from the accelerometer,
gyroscope, and audio modalities are dropped to emulate missing features and imputed with different
techniques to train a classifier.

Synthesizing

Performance of a 1-layer neural network for context recognition when: (a) both the training and the
test sets are real (Real, first row); (b) the model is trained with synthetic data and the test set is real
(TSTR, second row); and (c) the training set is real and the test set is synthetic (TRTS, bottom row).
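
The last two rows correspond to the usual "train on synthetic, test on real" (TSTR) and "train on real, test on synthetic" (TRTS) protocols. A sketch of how such an evaluation can be set up is given below; the classifier, its hidden size, and the use of subset accuracy as the score are illustrative choices rather than the paper's exact setup.

from sklearn.neural_network import MLPClassifier

def fit_and_score(train_x, train_y, test_x, test_y):
    """Fit a single-hidden-layer network and return its (subset) accuracy on the test set."""
    clf = MLPClassifier(hidden_layer_sizes=(64,), max_iter=300)
    clf.fit(train_x, train_y)
    return clf.score(test_x, test_y)

def tstr_trts(real_train, real_test, synthetic):
    """Each argument is a (features, labels) tuple."""
    real = fit_and_score(*real_train, *real_test)   # (a) train real, test real
    tstr = fit_and_score(*synthetic, *real_test)    # (b) train synthetic, test real
    trts = fit_and_score(*real_train, *synthetic)   # (c) train real, test synthetic
    return real, tstr, trts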


Balanced accuracy of 51 contextual labels for two classifiers trained with real and
synthetic samples, respectively; evaluation is done on real test data with 5-fold cross-validation.


Examples of real (blue, top) and generated (red, bottom) samples of a randomly selected feature with the AAE.
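
Such samples are obtained by conditioning the decoder on the desired context labels, i.e., decoding a latent code drawn from the prior together with a multi-hot label vector. A minimal sketch, with assumed shapes and layer sizes:

import torch
import torch.nn as nn

LATENT_DIM, NUM_LABELS, FEATURE_DIM = 32, 51, 78

# Decoder that receives the label vector alongside the latent code (supervision at the decoder).
cond_decoder = nn.Sequential(
    nn.Linear(LATENT_DIM + NUM_LABELS, 128), nn.ReLU(),
    nn.Linear(128, FEATURE_DIM),
)

def sample_synthetic(labels):
    """labels: (n, NUM_LABELS) multi-hot tensor of desired contexts -> (n, FEATURE_DIM) features."""
    z = torch.randn(labels.size(0), LATENT_DIM)      # draw latent codes from the N(0, I) prior
    with torch.no_grad():
        return cond_decoder(torch.cat([z, labels.float()], dim=1))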

Paper

Read Our Paper

Citation

Aaqib Saeed, Tanir Ozcelebi, and Johan Lukkien, "Synthesizing and reconstructing missing sensory modalities in behavioral context recognition." Sensors 18.9 (2018): 2967.

BibTeX

@article{saeed2018synthesizing,
    title={Synthesizing and reconstructing missing sensory modalities in behavioral context recognition},
    author={Saeed, Aaqib and Ozcelebi, Tanir and Lukkien, Johan},
    journal={Sensors},
    volume={18},
    number={9},
    pages={2967},
    year={2018},
    publisher={Multidisciplinary Digital Publishing Institute}
}

References

  • Alireza Makhzani et al., "Adversarial autoencoders." arXiv preprint arXiv:1511.05644 (2015).
  • Yonatan Vaizman et al., "Recognizing detailed human context in the wild from smartphones and smartwatches." IEEE Pervasive Computing 16.4 (2017): 62-74.

Various icons used in the figures are created by Anuar Zhumaev, Tim Madle, Shmidt Sergey, Alina Oleynik, Artdabana@Design and lipi from the Noun Project.