In this short blog post, I will cover some ideas for sensor fusion, i.e., combining data from multiple sensors, and different ways to represent (sensor) input data for deep neural networks.

In the last post, we saw how to apply a convolutional neural network to accelerometer data for human activity recognition. The input data had three components (x, y and z) from an accelerometer. A sliding window approach was used to extract fixed-size segments with class labels, which were then fed into a deep net for activity recognition. A depthwise convolution was applied to the input; it learns different weights for different input channels, in our case the different accelerometer components. The features learned by the convolution and pooling layers were then fed into a feed-forward neural network for classification.

Another way to represent the input data for a convolutional neural network is to keep the x, y and z components separate and apply separate convolution and/or pooling operations to each of them, so that different features are learned independently [1]. At a later stage, the outputs of the convolution or pooling layers are flattened and combined, and these new features are fed into densely connected layers for classification.

Likewise, another idea is to apply an FFT or compute a spectrogram of the accelerometer components and feed this new representation into a deep net. A spectrogram represents how the energy content of a signal changes as a function of frequency and time. Such representations of the raw signal can make it easier to learn interesting features, because they reduce the complexity of the task. For more information, please consult [2].

Similarly, if you have a dataset with multiple accelerometers sampled at the same rate, you can extract 2D segments (like images) in which the rows (or, alternatively, the columns) hold the x, y and z components of each sensor. I highly recommend that interested readers check the following papers: [3], [4].

Last but not least, if the data comes from multiple sensors with different sampling rates, the first thing to do is always to time-align the dataset.
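
To make the per-axis representation concrete, here is a minimal sketch in plain Python of the data flow only (not a trainable network): each of the x, y and z components passes through its own 1D convolution, ReLU and max pooling branch, and the branch outputs are flattened and concatenated into one feature vector for the dense layers. The kernels are fixed, made-up values purely for illustration (a real network learns them), and the helper names are hypothetical.

```python
from typing import Dict, List

def conv1d(signal: List[float], kernel: List[float]) -> List[float]:
    """'Valid' 1D convolution (cross-correlation, as in CNN layers), no padding."""
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k))
            for i in range(len(signal) - k + 1)]

def relu(values: List[float]) -> List[float]:
    return [max(0.0, v) for v in values]

def max_pool1d(values: List[float], size: int) -> List[float]:
    """Non-overlapping max pooling; a trailing partial window is dropped."""
    return [max(values[i:i + size])
            for i in range(0, len(values) - size + 1, size)]

def per_axis_features(segment: Dict[str, List[float]],
                      kernels: Dict[str, List[float]],
                      pool: int = 2) -> List[float]:
    """Separate conv -> ReLU -> pool branch per axis, then flatten and concatenate."""
    features: List[float] = []
    for axis in ("x", "y", "z"):
        # Each axis gets its own (illustrative, fixed) kernel, so its features
        # are computed independently of the other axes.
        branch = max_pool1d(relu(conv1d(segment[axis], kernels[axis])), pool)
        features.extend(branch)  # flatten + concatenate the branch outputs
    return features  # this vector would be the input to the dense layers

segment = {"x": [0, 1, 2, 3, 4, 5], "y": [5, 4, 3, 2, 1, 0], "z": [1, 1, 1, 1, 1, 1]}
kernels = {"x": [1.0, -1.0], "y": [1.0, -1.0], "z": [0.5, 0.5]}
print(per_axis_features(segment, kernels))  # [0.0, 0.0, 1.0, 1.0, 1.0, 1.0]
```

In a deep learning framework, each branch would typically be its own stack of convolution and pooling layers, with a concatenation layer joining them before the dense layers.
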
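
The spectrogram idea mentioned above can be sketched without any libraries: cut the signal into overlapping frames, multiply each frame by a Hann window, and take the squared magnitude of its discrete Fourier transform as the energy per frequency bin. Stacking the per-frame spectra over time gives the frequency-by-time representation. This naive O(n²) DFT is only for readability; in practice an FFT routine such as `numpy.fft.rfft` or `scipy.signal.spectrogram` would be used. The frame length and hop size are arbitrary illustrative values, not the settings used in [2].

```python
import cmath
import math
from typing import List

def hann(n: int) -> List[float]:
    """Periodic Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]

def power_spectrum(frame: List[float]) -> List[float]:
    """One-sided power spectrum |DFT|^2 for bins 0..n//2 of a real frame (naive DFT)."""
    n = len(frame)
    return [abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n)
                    for t in range(n))) ** 2
            for k in range(n // 2 + 1)]

def spectrogram(signal: List[float], frame_len: int = 16,
                hop: int = 8) -> List[List[float]]:
    """Frames x frequency bins: one windowed power spectrum per frame."""
    window = hann(frame_len)
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        chunk = signal[start:start + frame_len]
        frames.append(power_spectrum([s * w for s, w in zip(chunk, window)]))
    return frames

signal = [math.sin(2 * math.pi * t / 4) for t in range(64)]  # period: 4 samples
frames = spectrogram(signal, frame_len=16, hop=8)
peak_bins = [max(range(len(f)), key=f.__getitem__) for f in frames]
print(len(frames), peak_bins)  # 7 [4, 4, 4, 4, 4, 4, 4]
```

With 16-sample frames, a sine with a period of 4 samples puts its energy in bin 16 / 4 = 4 in every frame; for a real sampling rate fs, bin k corresponds to k * fs / frame_len Hz.
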
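
For several accelerometers that already share a sampling rate, one possible way to build such image-like segments is to stack every sensor axis as a row, giving a (3 × number of sensors) × window-width array per segment; transposing it gives the column-wise variant. The sketch assumes a hypothetical data layout (sensor name, then axis, then a list of samples), and the sensor names in the example are made up.

```python
from typing import Dict, List

# Hypothetical layout: sensor name -> axis ("x", "y", "z") -> list of samples,
# with all sensors already sampled at the same rate.
Recording = Dict[str, Dict[str, List[float]]]
AXES = ("x", "y", "z")

def segment_2d(recording: Recording, start: int, width: int) -> List[List[float]]:
    """One (3 * n_sensors) x width 'image': a row per sensor axis, a column per sample."""
    rows = []
    for sensor in sorted(recording):  # fixed, reproducible row order
        for axis in AXES:
            rows.append(recording[sensor][axis][start:start + width])
    return rows

def sliding_segments(recording: Recording, width: int,
                     step: int) -> List[List[List[float]]]:
    """All complete windows of the given width, advanced by `step` samples."""
    length = min(len(recording[s][a]) for s in recording for a in AXES)
    return [segment_2d(recording, start, width)
            for start in range(0, length - width + 1, step)]

def transpose(matrix: List[List[float]]) -> List[List[float]]:
    """Column-wise variant: a row per sample, a column per sensor axis."""
    return [list(col) for col in zip(*matrix)]

recording = {
    "wrist": {"x": [1, 2, 3, 4], "y": [5, 6, 7, 8], "z": [9, 10, 11, 12]},
    "ankle": {"x": [0, 0, 0, 0], "y": [1, 1, 1, 1], "z": [2, 2, 2, 2]},
}
segments = sliding_segments(recording, width=2, step=1)
print(len(segments))  # 3
print(segments[1])    # [[0, 0], [1, 1], [2, 2], [2, 3], [6, 7], [10, 11]]
```
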
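
One simple way to time-align streams recorded at different rates is to resample each of them onto a common time grid by linear interpolation, restricted to the time span that all sensors cover. This is just one option, and the sketch assumes each stream comes with increasing timestamps in seconds; the stream names and the 4 Hz target rate in the example are illustrative.

```python
import bisect
from typing import Dict, List, Tuple

Stream = Tuple[List[float], List[float]]  # (increasing timestamps in s, values)

def interpolate(times: List[float], values: List[float], t: float) -> float:
    """Linear interpolation at time t; clamps to the end values outside the data."""
    i = bisect.bisect_right(times, t)
    if i == 0:
        return values[0]
    if i == len(times):
        return values[-1]
    t0, t1 = times[i - 1], times[i]
    v0, v1 = values[i - 1], values[i]
    return v0 + (v1 - v0) * (t - t0) / (t1 - t0)

def align(streams: Dict[str, Stream],
          rate_hz: float) -> Tuple[List[float], Dict[str, List[float]]]:
    """Resample all streams onto one grid spanning the overlap of their time ranges."""
    start = max(times[0] for times, _ in streams.values())
    end = min(times[-1] for times, _ in streams.values())
    if end < start:
        raise ValueError("streams do not overlap in time")
    n = int((end - start) * rate_hz + 1e-9) + 1  # small epsilon guards float error
    grid = [start + k / rate_hz for k in range(n)]
    aligned = {name: [interpolate(times, values, t) for t in grid]
               for name, (times, values) in streams.items()}
    return grid, aligned

streams = {
    "acc": ([0.0, 0.25, 0.5, 0.75, 1.0], [0.0, 1.0, 2.0, 3.0, 4.0]),  # 4 Hz
    "gyro": ([0.0, 0.5, 1.0, 1.5], [10.0, 20.0, 30.0, 40.0]),       # 2 Hz, runs longer
}
grid, aligned = align(streams, rate_hz=4.0)
print(grid)             # [0.0, 0.25, 0.5, 0.75, 1.0]
print(aligned["gyro"])  # [10.0, 15.0, 20.0, 25.0, 30.0]
```

Each axis of a multi-axis sensor would be aligned the same way. When a fast stream is downsampled, low-pass filtering it first (as `scipy.signal.resample_poly` does) helps avoid aliasing.
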
Once the data is time-aligned, a separate convolutional neural network can be applied to each sensor's data to learn features independently. These learned features are then combined and can be fed into a recurrent network such as an LSTM to learn the interactions between the different sensors. More information on this approach can be found in [5].

I have discussed some of the ideas and techniques I picked up while reading papers. If you have more interesting thoughts, suggestions or feedback, please comment below.
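
For readers who want to see the per-sensor CNN plus LSTM idea as code, here is a rough, framework-free sketch of the data flow, loosely inspired by the approach in [5] but not a reproduction of it. It assumes the streams are already time-aligned and, for brevity, treats each sensor as a single channel. Each stream is split into short non-overlapping intervals; a per-sensor branch (one fixed convolution kernel, ReLU and mean pooling, standing in for a trained CNN) turns each interval into a feature; the per-sensor features are concatenated; and the resulting sequence runs through a single LSTM cell whose final hidden state would feed a classifier. All weights are random placeholders, so only the shapes and data flow are meaningful, and every name here is hypothetical.

```python
import math
import random
from typing import Dict, List, Tuple

def sigmoid(v: float) -> float:
    return 1.0 / (1.0 + math.exp(-v))

def sensor_branch(interval: List[float], kernel: List[float]) -> float:
    """Stand-in for a per-sensor CNN: valid 1D conv, ReLU, then mean pooling."""
    k = len(kernel)
    conv = [max(0.0, sum(interval[i + j] * kernel[j] for j in range(k)))
            for i in range(len(interval) - k + 1)]
    return sum(conv) / len(conv)

class LSTMCell:
    """Minimal LSTM cell with random placeholder weights (no training)."""

    def __init__(self, input_size: int, hidden_size: int, seed: int = 0):
        rng = random.Random(seed)
        self.hidden_size = hidden_size
        n_in = input_size + hidden_size
        # 4 * hidden_size rows: input, forget, candidate and output gate blocks.
        self.w = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
                  for _ in range(4 * hidden_size)]
        self.b = [0.0] * (4 * hidden_size)

    def step(self, x: List[float], h: List[float],
             c: List[float]) -> Tuple[List[float], List[float]]:
        z = x + h  # concatenate current input and previous hidden state
        pre = [sum(w * v for w, v in zip(row, z)) + b
               for row, b in zip(self.w, self.b)]
        n = self.hidden_size
        i = [sigmoid(v) for v in pre[:n]]
        f = [sigmoid(v) for v in pre[n:2 * n]]
        g = [math.tanh(v) for v in pre[2 * n:3 * n]]
        o = [sigmoid(v) for v in pre[3 * n:]]
        c = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
        h = [oj * math.tanh(cj) for oj, cj in zip(o, c)]
        return h, c

def fuse(aligned: Dict[str, List[float]], kernels: Dict[str, List[float]],
         interval_len: int, cell: LSTMCell) -> List[float]:
    """Per-sensor features per interval -> concatenate -> LSTM across intervals."""
    sensors = sorted(aligned)
    length = min(len(v) for v in aligned.values())
    h = [0.0] * cell.hidden_size
    c = [0.0] * cell.hidden_size
    for start in range(0, length - interval_len + 1, interval_len):
        x = [sensor_branch(aligned[s][start:start + interval_len], kernels[s])
             for s in sensors]
        h, c = cell.step(x, h, c)
    return h  # final hidden state; a classifier layer would consume this

aligned = {"acc": [math.sin(0.3 * t) for t in range(40)],
           "gyro": [math.cos(0.2 * t) for t in range(40)]}
kernels = {"acc": [1.0, -1.0], "gyro": [0.5, 0.5]}
cell = LSTMCell(input_size=len(aligned), hidden_size=4)
print(len(fuse(aligned, kernels, interval_len=8, cell=cell)))  # 4
```
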

*[1] Cui, Zhicheng, Wenlin Chen, and Yixin Chen. "Multi-scale convolutional neural networks for time series classification." arXiv preprint arXiv:1603.06995 (2016).*

*[2] Alsheikh, Mohammad Abu, et al. "Deep activity recognition models with triaxial accelerometers." arXiv preprint arXiv:1511.04664 (2015).*

*[3] Hammerla, Nils Y., Shane Halloran, and Thomas Ploetz. "Deep, convolutional, and recurrent models for human activity recognition using wearables." arXiv preprint arXiv:1604.08880 (2016).*

*[4] Yang, Jianbo, et al. "Deep Convolutional Neural Networks on Multichannel Time Series for Human Activity Recognition." IJCAI. 2015.*

*[5] Yao, Shuochao, et al. "DeepSense: A Unified Deep Learning Framework for Time-Series Mobile Sensing Data Processing." arXiv preprint arXiv:1611.01942 (2016).*