Looking for a way to use large-scale unlabeled sensory data to improve generalization on downstream tasks with few labeled instances? Use Sense and Learn, a self-supervised learning framework.

Learning general-purpose representations from multisensor data produced by omnipresent sensing systems (or IoT in general) has numerous applications in diverse areas. Existing purely supervised, end-to-end deep learning techniques depend on the availability of massive amounts of well-curated data to achieve a sufficient level of generalization on a task of interest, and acquiring such data is notoriously difficult. We propose a suite of self-supervised pretext tasks for pre-training deep neural networks without semantic labels, enabling representation learning from raw sensory data. Our auxiliary tasks learn high-level, broadly useful features entirely from unannotated data, without any human involvement in the tedious labeling process.

We demonstrate the efficacy of our approach on several publicly available datasets from different domains and in various settings, including linear separability, semi-supervised or few-shot learning, and transfer learning. Our methodology achieves results that are competitive with supervised approaches and, in most cases, closes the gap when the network is fine-tuned on the downstream task. In particular, we show that the self-supervised network can be used as an initialization to significantly boost performance in the low-data regime, with as few as 5 labeled instances per class, which is of high practical importance for real-world problems. Likewise, the representations learned with self-supervision are highly transferable between related datasets, even when only a few labeled instances are available from the target domain. The self-learning nature of our methodology opens up exciting possibilities for on-device continual learning.
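
The low-data experiments restrict the labeled training set to a fixed number of instances per class (as few as 5). The following is a minimal sketch of drawing such a stratified subset, assuming the labels are a flat Python list. `sample_few_shot` is a hypothetical helper for illustration, not the authors' sampling code; the self-supervised encoder would then be fine-tuned on only the selected examples.

```python
import random
from collections import defaultdict


def sample_few_shot(labels, k, seed=0):
    """Return sorted indices with at most k examples per class (hypothetical helper)."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)
    chosen = []
    # Iterate classes in a fixed order so the result is reproducible for a given seed.
    for y in sorted(by_class):
        idxs = by_class[y]
        chosen.extend(rng.sample(idxs, min(k, len(idxs))))
    return sorted(chosen)


labels = ["walk", "run", "sit"] * 20 + ["run"] * 3
subset = sample_few_shot(labels, k=5)
print(len(subset))  # 15
```

Fixing the seed makes the subset reproducible, which matters when comparing a randomly initialized network against a self-supervised initialization on exactly the same few labeled examples.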

Figure 1. Illustration of our Sense and Learn representation learning framework.

Self-Supervised Tasks

To learn semantic representations from unannotated sensory data, we develop eight self-supervised surrogate tasks for pre-training the deep network:

  • Blend Detection
  • Fusion Magnitude Prediction
  • Feature Prediction from Masked Window
  • Transformation Recognition
  • Temporal Shift Prediction
  • Modality Denoising
  • Odd Segment Recognition
  • Metric Learning with Triplet Loss
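
The list above only names the tasks. As a rough illustration of how such tasks obtain labels for free, the sketch below covers two of them under stated assumptions. For Transformation Recognition, it assumes a hypothetical set of five transformations applied to a single-channel window (a list of floats), with the transformation index serving as the class label; the paper's actual transformation set may differ. For Metric Learning with Triplet Loss, it uses the common hinge form on squared Euclidean distances with an assumed margin of 1.0; in practice the vectors would be encoder embeddings rather than hand-written lists.

```python
import math
import random


# Hypothetical transformation set (an assumption, not necessarily the paper's).
def identity(x, rng):
    return list(x)


def add_noise(x, rng, sigma=0.05):
    return [v + rng.gauss(0.0, sigma) for v in x]


def scale(x, rng):
    factor = rng.uniform(0.7, 1.3)
    return [v * factor for v in x]


def negate(x, rng):
    return [-v for v in x]


def reverse_time(x, rng):
    return list(reversed(x))


TRANSFORMS = [identity, add_noise, scale, negate, reverse_time]


def transformation_example(window, rng):
    """Create one (input, label) pair; the label is the index of the applied transform."""
    label = rng.randrange(len(TRANSFORMS))
    return TRANSFORMS[label](window, rng), label


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge triplet loss on squared Euclidean distances (assumed margin of 1.0)."""
    d_pos = sum((a - p) ** 2 for a, p in zip(anchor, positive))
    d_neg = sum((a - n) ** 2 for a, n in zip(anchor, negative))
    return max(0.0, d_pos - d_neg + margin)


rng = random.Random(42)
window = [math.sin(0.1 * t) for t in range(50)]
x_aug, y = transformation_example(window, rng)
print(y, len(x_aug))
print(triplet_loss([0.0, 0.0], [0.1, 0.0], [2.0, 0.0]))  # 0.0
print(triplet_loss([0.0, 0.0], [1.0, 0.0], [1.0, 0.0]))  # 1.0
```

A network trained to predict `y` from `x_aug` must pick up on signal properties such as direction and temporal order, while the triplet loss pulls the anchor toward the positive and pushes it away from the negative until their distance gap exceeds the margin.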



We assess the performance of Sense and Learn on 8 publicly available multisensor datasets from diverse domains. Table 1 briefly summarizes each data source.

Table 1. Key characteristics of the datasets used in the experiments.

Table 2. Performance evaluation (weighted F-score) of self-supervised representations with a linear classifier.
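
Table 2 reports the weighted F-score. Assuming the common support-weighted definition (per-class F1 averaged with weights proportional to each class's number of true instances), the metric can be sketched as below; this should agree with scikit-learn's `f1_score(..., average="weighted")`. The helper name `weighted_f1` is hypothetical.

```python
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Support-weighted mean of per-class F1 over the classes present in y_true."""
    support = Counter(y_true)
    total = len(y_true)
    score = 0.0
    for c, n_c in support.items():
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = n_c - tp
        # Denominator is positive because every class here has support n_c >= 1.
        f1 = 2 * tp / (2 * tp + fp + fn)
        score += (n_c / total) * f1
    return score


print(round(weighted_f1([0, 0, 1, 1], [0, 1, 1, 1]), 4))  # 0.7333
```

Weighting by support keeps frequent classes from being under-represented in the summary score, which matters for sensor datasets with imbalanced class distributions.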

Figure 2. Contribution of self-supervised pre-training for improving end-task performance with few labeled data.

Figure 3. Generalization of the self-supervised representations under transfer learning setting.


Aaqib Saeed, Victor Ungureanu, and Beat Gfeller. "Sense and Learn: Self-Supervision for Omnipresent Sensors." arXiv preprint arXiv:2009.13233 (2020).


  title={Sense and Learn: Self-supervision for omnipresent sensors},
  author={Saeed, Aaqib and Ungureanu, Victor and Gfeller, Beat},
  journal={Machine Learning with Applications},

Various icons used in the figure are created by Sriramteja SRT, Berkah Icon, Ben Davis, Eucalyp, ibrandify, Clockwise, Aenne Brielmann, Anuar Zhumaev, and Tim Madle from the Noun Project.