Aaqib Saeed

Research Scientist @ Philips Research | Visiting Researcher @ University of Cambridge
Self-Supervised Learning, Ubiquitous Computing, Sensing & On-device ML.

Recognizing Head Gestures and Facial Expressions with Earbuds

Shkurta Gashi, Aaqib Saeed, Alessandra Vicini, Elena Di Lascio, and Silvia Santini @ ACM ICMI 2021

Head gestures and facial expressions -- such as nodding or smiling -- are important indicators of the quality of human interactions, in physical meetings as well as in computer-mediated environments. Automated systems able to recognize such behavioral cues can support and improve human interactions.

In this work, we consider inertial signals collected from unobtrusive, ear-mounted devices to recognize gestures and facial expressions typically performed during social interactions -- head shaking, nodding, smiling, talking, and yawning. We propose a hierarchical classification approach with transfer learning to improve the generalization and data efficiency of the predictive model using raw IMU data.

Federated Self-Training for Semi-Supervised Audio Recognition

Vasileios Tsouvalas, Aaqib Saeed, Tanir Ozcelebi

Federated Learning is a distributed machine learning paradigm dealing with decentralized and personal datasets. Since data reside on devices like smartphones and virtual assistants, labeling must either be entrusted to clients or performed in an automated way. However, in the case of audio data, acquiring semantic annotations can be prohibitively expensive and time-consuming. As a result, an abundance of audio samples remains unlabeled and unexploited. We propose FedSTAR, a semi-supervised federated learning approach for audio recognition. FedSTAR leverages unlabeled data via self-training to improve the generalization of audio models.

We show that with as little as 3% of the data labeled, FedSTAR improves the recognition rate by 13.28% on average compared to the fully supervised federated model. We further demonstrate that self-supervised pre-trained models can accelerate the training of on-device models, significantly improving convergence within fewer training rounds.
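The self-training step can be sketched as follows -- a minimal NumPy illustration of confidence-based pseudo-labeling, where the function name and threshold are hypothetical stand-ins rather than values from the paper:

```python
import numpy as np

def pseudo_label(probs, threshold=0.9):
    """Keep only unlabeled examples whose top predicted class probability
    exceeds the confidence threshold; use the argmax as the pseudo-label."""
    confidence = probs.max(axis=1)
    mask = confidence >= threshold
    labels = probs.argmax(axis=1)
    return labels[mask], mask

# Model predictions over a batch of unlabeled audio clips (3 classes).
probs = np.array([[0.95, 0.03, 0.02],   # confident -> kept
                  [0.40, 0.35, 0.25],   # uncertain -> discarded
                  [0.05, 0.02, 0.93]])  # confident -> kept
labels, mask = pseudo_label(probs)
print(labels)  # -> [0 2]
```

The retained (clip, pseudo-label) pairs are then mixed into each client's local training set alongside its few labeled examples.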

LumNet: Learning to Estimate Vertical Visual Field Luminance for Adaptive Lighting Control

Prince Songwa, Aaqib Saeed, Sachin Bhardwaj, Thijs Kruisselbrink, Tanir Ozcelebi @ ACM IMWUT 2021 - Ubicomp 2021

We propose a novel approach to estimate desktop luminance using deep learning for adaptive lighting control. Our proposed LumNet model learns visual representations from ceiling-based images, which are collected in indoor spaces within the physical vicinity of the user to predict average desktop luminance as experienced in a real-life setting.

We also present a self-supervised contrastive method for pre-training LumNet with unlabeled data, and we demonstrate that the learned features transfer well to a small labeled dataset, which minimizes the need for costly data annotations.

Learning Sensory Representations with Minimal Supervision

Aaqib Saeed - PhD Thesis (2021) - Eindhoven University of Technology

We develop novel techniques that lie at the intersection of deep learning, ambient sensing, and ubiquitous computing to address issues pertaining to learning from unlabeled sensory data and making models robust to various input artifacts. The research focuses on representation learning with deep neural networks to realize the vision of self-learning for embedded intelligence in everyday devices, such as smartphones, wearables, earables, and more.

Our proposed methods are primarily based on the theme of self-supervised learning to extract generic representations from multi-modal sensory inputs, such as electroencephalogram, audio, accelerometer, and more. Our work enables deep neural networks to learn broadly useful representations that perform well on a spectrum of downstream tasks, are robust to noise and other artifacts, and generalize when transferred to other domains.

Contrastive Learning of General-Purpose Audio Representations

Aaqib Saeed, David Grangier, Neil Zeghidour @ IEEE ICASSP 2021

We introduce COLA, a self-supervised pre-training approach for learning a general-purpose representation of audio. We build on top of recent advances in contrastive learning for computer vision and reinforcement learning to design a lightweight, easy-to-implement self-supervised model of audio.

We pre-train embeddings on the large-scale Audioset database and transfer these representations to 9 diverse classification tasks, including speech, music, animal sounds, and acoustic scenes. We show that despite its simplicity, our method significantly outperforms previous self-supervised systems.
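The core contrastive objective can be sketched as follows -- a simplified NumPy illustration using cosine similarity with in-batch negatives (COLA itself learns a bilinear similarity; this is an assumption-laden toy version):

```python
import numpy as np

def contrastive_loss(anchors, positives):
    """Cross-entropy loss where each anchor must identify its own positive
    among all positives in the batch (the others act as negatives)."""
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    sim = a @ p.T                                    # pairwise similarities
    logits = sim - sim.max(axis=1, keepdims=True)    # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))              # diagonal = true pairs

rng = np.random.default_rng(0)
# Anchors and positives stand in for embeddings of two segments
# cropped from the same audio clip.
anchors = rng.normal(size=(8, 16))
positives = anchors + 0.01 * rng.normal(size=(8, 16))
loss = contrastive_loss(anchors, positives)
print(round(float(loss), 3))
```

Segments from the same clip are pulled together in embedding space, while segments from different clips in the batch are pushed apart.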

Learning from Heterogeneous EEG Signals with Differentiable Channel Reordering

Aaqib Saeed, David Grangier, Olivier Pietquin, Neil Zeghidour @ IEEE ICASSP 2021

We propose CHARM, a method for training a single neural network across inconsistent input channels. Our work is motivated by Electroencephalography (EEG), where data collection protocols from different headsets result in varying channel ordering and number, which limits the feasibility of transferring trained systems across datasets.

CHARM is differentiable and compatible with architectures (e.g., CNNs) that expect consistent channels. We show its robustness across different input-noising conditions, and we successfully perform transfer learning between datasets collected with different EEG headsets.
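The channel-remapping idea can be sketched as follows -- a hedged NumPy illustration, not the paper's exact parameterization: a learned score matrix is pushed through a softmax so that each canonical output channel is a differentiable, convex combination of whatever input channels a given headset provides:

```python
import numpy as np

def remap_channels(x, scores):
    """Map a recording with an arbitrary number/order of input channels
    onto a fixed set of canonical channels.
    x:      (in_channels, time) raw EEG signal
    scores: (canonical_channels, in_channels) learned logits
    """
    e = np.exp(scores - scores.max(axis=1, keepdims=True))
    attn = e / e.sum(axis=1, keepdims=True)  # softmax over input channels
    return attn @ x                          # (canonical_channels, time)

rng = np.random.default_rng(3)
x = rng.normal(size=(5, 100))       # headset with 5 channels
scores = rng.normal(size=(8, 5))    # map onto 8 canonical channels
y = remap_channels(x, scores)
print(y.shape)  # -> (8, 100)
```

Because the mapping is a softmax rather than a hard permutation, gradients flow through it, so the scores can be trained jointly with a downstream CNN that expects a fixed channel layout.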

Sense and Learn: Self-Supervision for Omnipresent Sensors

Aaqib Saeed, Victor Ungureanu, Beat Gfeller @ Machine Learning with Applications

Looking for a way to utilize large-scale unlabeled sensory (time-series) data to improve generalization on downstream tasks with few labeled data points? Try Sense and Learn, a self-supervised learning framework.

We propose a suite of self-supervised pretext tasks for pre-training deep neural networks without semantic labels. We evaluate the quality of the learned embeddings on a wide variety of end-tasks using a linear classifier on top of a fixed encoder, and assess effectiveness in the low-data regime and under transfer learning. Our approach opens up exciting possibilities for on-device continual learning without requiring supervision.

Federated Self-Supervised Learning of Multi-Sensor Representations for Embedded Intelligence

Aaqib Saeed, Flora D. Salim, Tanir Ozcelebi, Johan Lukkien @ IEEE Internet of Things Journal 2020

We present a self-supervised method for learning multi-sensor representations from unlabeled and decentralized data in a federated learning setting. Our scalogram-signal correspondence learning (SSCL) technique utilizes the wavelet transform and a contrastive objective to train a deep network to determine whether a given signal and its complementary view (i.e., a scalogram generated with the wavelet transform) align with each other.

We extensively assess the quality of features learned with SSCL on diverse public datasets comprising signals such as electroencephalography, blood volume pulse, accelerometer, and Wi-Fi channel state information. We demonstrate our approach's effectiveness in both centralized and federated settings through linear classification. Notably, SSCL significantly improves generalization in the low-data regime, reducing the volume of labeled data required by leveraging self-supervised learning.
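The correspondence objective can be sketched as follows -- a hedged NumPy illustration with dot-product scoring standing in for the learned network: given embeddings of signals and of scalogram views, the model is trained to classify whether a pair is aligned (same underlying recording) or mismatched:

```python
import numpy as np

def correspondence_labels(n):
    """Build a contrastive batch: the first n pairs are aligned
    (signal i with its own scalogram view), the next n are mismatched."""
    return np.concatenate([np.ones(n), np.zeros(n)])

def score_pairs(sig_emb, view_emb):
    """Dot-product score per (signal, scalogram) pair; a sigmoid turns it
    into the probability that the pair corresponds."""
    logits = np.sum(sig_emb * view_emb, axis=1)
    return 1.0 / (1.0 + np.exp(-logits))

rng = np.random.default_rng(1)
sig = rng.normal(size=(4, 32))                  # signal embeddings
aligned = sig + 0.05 * rng.normal(size=(4, 32)) # matching scalogram views
shuffled = aligned[::-1]                        # mismatched pairings
probs = score_pairs(np.vstack([sig, sig]), np.vstack([aligned, shuffled]))
labels = correspondence_labels(4)
print(probs.shape, labels.shape)  # -> (8,) (8,)
```

A binary cross-entropy loss on these (probability, label) pairs trains the encoders without any semantic annotations, which is what makes the setup suitable for decentralized, unlabeled device data.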

On-device Learning of Activity Recognition Networks

Leverage transfer learning to efficiently train activity-sensing models directly on an Android device, without sending data to a server.

Enabling next-generation privacy-preserving personal informatics apps!

Multi-Task Self-Supervised Learning for Human Activity Detection

Aaqib Saeed, Tanir Ozcelebi, Johan Lukkien @ ACM IMWUT June 2019 - Ubicomp 2019

Workshop Paper @ Self-supervised Learning Workshop, ICML 2019

We've created the Transformation Prediction Network, a self-supervised neural network for representation learning from sensory data that does not require access to any form of semantic labels, e.g., activity classes in human context detection. We demonstrate that simple auxiliary tasks of recognizing signal transformations provide strong supervision for extracting high-level features that generalize well on the downstream task, substantially improving performance under semi-supervised and transfer learning settings in the low-data regime.
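The pretext task can be sketched as follows -- a minimal NumPy illustration with four example transformations (the paper uses a larger, different set; the helper names are hypothetical):

```python
import numpy as np

# Signal transformations; the pretext task is to predict which was applied.
TRANSFORMS = [
    lambda s: s,          # 0: identity
    lambda s: -s,         # 1: negation
    lambda s: s[::-1],    # 2: time reversal
    lambda s: s * 1.5,    # 3: magnitude scaling
]

def make_pretext_example(signal, rng):
    """Create a (transformed signal, transformation label) pair, giving
    free supervision -- no semantic activity labels are needed."""
    label = int(rng.integers(len(TRANSFORMS)))
    return TRANSFORMS[label](signal), label

rng = np.random.default_rng(42)
signal = np.sin(np.linspace(0, 10, 128))   # stand-in for accelerometer data
view, label = make_pretext_example(signal, rng)
print(view.shape)  # -> (128,)
```

A network trained to classify which transformation produced each view must learn features that capture the signal's structure, and those features then transfer to activity recognition.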

End-to-End Multi-Modal Behavioral Context Recognition in a Real-Life Setting

Aaqib Saeed, Stojan Trajanovski, Tanir Ozcelebi, Johan Lukkien @ Fusion 2019

The automatic and unobtrusive sensing of human context can help develop solutions for assisted living, fitness tracking, sleep monitoring, and several other fields. To this end, we develop a multi-modal neural network capable of multi-label behavioral context recognition. Our empirical evaluation suggests that a deep convolutional network trained end-to-end achieves performance comparable to manual feature engineering, with minimal effort.

Synthesizing and Reconstructing Missing Sensory Modalities in Behavioral Context Recognition

Aaqib Saeed, Tanir Ozcelebi, Johan Lukkien @ MDPI Sensors 2018

We propose a method based on an adversarial autoencoder for handling missing sensory features and synthesizing realistic samples. We empirically demonstrate the capability of our approach in comparison with classical techniques for filling in missing values on a large-scale activity recognition dataset collected in the wild.

Model Adaptation and Personalization for Physiological Stress Detection

Aaqib Saeed, Tanir Ozcelebi, Johan Lukkien, Jan van Erp and Stojan Trajanovski @ IEEE DSAA 2018

Long-term exposure to stressful situations can have negative health consequences, such as an increased risk of cardiovascular diseases and immune system disorders. We utilize a deep reconstruction classification network and multi-task learning for domain adaptation and personalization of stress recognition models. The proposed methods perform significantly better than baselines on multimodal physiological (time-series) data collected during driving tasks, in both real-world and driving-simulator settings.

Personalized Driver Stress Detection with Multi-Task Neural Networks using Physiological Signals

Aaqib Saeed and Stojan Trajanovski @ ML4H Workshop NeurIPS 2017

Stress can be seen as a physiological response to everyday emotional, mental, and physical challenges. We propose a subjects-as-tasks approach for personalized stress detection: a multi-task neural network with hard parameter sharing -- a mutual representation feeding task-specific layers -- that uses skin conductance and heart rate from wearable devices.
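The hard-parameter-sharing architecture can be sketched as follows -- a hedged NumPy forward pass with made-up dimensions and subject names, purely to illustrate the shared-trunk / per-subject-head structure:

```python
import numpy as np

rng = np.random.default_rng(7)

# Shared trunk: one representation learned from all subjects' signals.
W_shared = rng.normal(size=(2, 16))   # inputs: skin conductance, heart rate

# Task-specific heads: one small output layer per subject ("subjects as tasks").
heads = {s: rng.normal(size=(16, 1)) for s in ["subject_a", "subject_b"]}

def predict_stress(x, subject):
    """Forward pass: shared representation, then the subject's own head."""
    h = np.tanh(x @ W_shared)               # mutual representation
    logit = h @ heads[subject]
    return 1.0 / (1.0 + np.exp(-logit))     # stress probability

x = rng.normal(size=(1, 2))   # one window of physiological features
p_a = predict_stress(x, "subject_a")
p_b = predict_stress(x, "subject_b")
print(p_a.shape, p_b.shape)  # -> (1, 1) (1, 1)
```

During training, gradients from every subject's loss update the shared trunk, while each head only sees its own subject's data -- which is what personalizes the model without training one network per person from scratch.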

Deep Physiological Arousal Detection in a Driving Simulator using Wearable Sensors

Aaqib Saeed, Stojan Trajanovski, Maurice van Keulen and Jan van Erp @ DMBIH Workshop IEEE ICDM 2017

Driving is an activity that requires considerable alertness. Insufficient attention, imperfect perception, inadequate information processing, and sub-optimal arousal are possible causes of poor human performance. Understanding these causes and implementing effective remedies is of crucial importance to increase traffic safety and improve drivers' well-being. For this purpose, we develop an arousal detection algorithm using a temporal convolutional neural network. The model is trained on raw physiological signals, i.e., heart rate, skin conductance, and skin temperature.

#WhoAmI in 160 Characters? Classifying Social Identities Based on Twitter

Anna Priante, Djoerd Hiemstra, Tijs van den Broek, Aaqib Saeed, Michel Ehrenhard and Ariana Need @ NLP and CSS Workshop EMNLP 2016

We combine social theory and NLP methods to classify English-speaking Twitter users' online social identity in profile descriptions. Our study shows how social theory can be used to guide NLP methods, and how such methods provide input to revisit traditional social theory that is strongly consolidated in offline settings.

Acknowledgments

This site was prepared using the Distill template, adapted and kindly open-sourced by Pierre Sermanet.