The Social Signal Interpretation (SSI) framework offers tools to record, analyze, and recognize human behavior in real time, such as gestures, facial expressions, head nods, and emotional speech. Following a patch-based design, pipelines are set up from autonomous components and allow the parallel and synchronized processing of sensor data from multiple input devices. In particular, SSI supports the machine learning pipeline in its full length and offers a graphical interface that assists users in collecting their own training corpora and obtaining personalized models. In addition to a large set of built-in components, SSI encourages developers to extend the available tools with new functions. For inexperienced users an easy-to-use XML editor is available to draft and run pipelines without special programming skills. SSI is written in C++ and optimized to run on computer systems with multiple CPUs. Binaries and source code are freely available under the GPL. Mobile support for SSI is currently in development.

Hands On

The following article, published in the ACM SIGMM Records, provides a good starting point for developers who want to learn more about SSI. It introduces the basic concepts and important functions of the framework and explains them by means of a simple pipeline example.

SSI an Open Source Platform for Social Signal Interpretation (ACM SIGMM Records)

Highlights

  • Synchronized reading from multiple sensor devices, e.g. microphone, ASIO audio interface, webcam, DV camera, Wiimote, Kinect, and physiological sensors
  • General filter and feature algorithms, such as image processing, signal filtering, frequency analysis, and statistical measurements, applied in real time
  • Event-based signal processing to combine and interpret high-level information, such as gestures, keywords, or emotional user states
  • Pattern recognition and machine learning tools for online and offline processing, including various algorithms for feature selection, clustering, and classification
  • Patch-based pipeline design (via the C++ API or an easy-to-use XML editor) and a plug-in system to integrate new components (see the sketch after this list)
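
As a minimal sketch of what such a patch-based pipeline looks like, the following XML connects a microphone sensor to a visualization component. The element and attribute names follow the pipeline examples distributed with SSI, but exact names may differ between releases, so treat this as illustrative rather than definitive.

    <?xml version="1.0"?>
    <pipeline ssi-v="1">

        <!-- sensor: read raw audio from the sound card and publish it on the pin "audio" -->
        <sensor create="Audio" option="audio" scale="true">
            <output channel="audio" pin="audio"/>
        </sensor>

        <!-- consumer: plot the incoming stream in 20 ms frames -->
        <consumer create="SignalPainter" title="AUDIO">
            <input pin="audio" frame="0.02s"/>
        </consumer>

    </pipeline>

Components exchange data only through named pins, which is what makes it possible to rearrange or replace patches without touching the rest of the pipeline.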

Recording

SSI supports synchronized recordings from different sensor devices. This allows us to capture user behavior during interaction with a piece of software, a virtual agent, or some other type of stimulus. In a typical setting we might decide to use separate cameras for body and face, capture speech with a wireless headset, and record user movements with one or more Wiimotes. To track the user's physiological condition, we might also include sensors that measure skin conductance and heart rate. Since the user is interacting with a virtual character, it is useful to capture the screen so that observed user behavior can later be connected with the agent's actions. Finally, we can apply real-time signal processing, such as face detection and noise filtering, and store the processed signals along with the raw streams.
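
A recording pipeline of this kind could be sketched in XML as below. The Butfilt transformer and FileWriter consumer appear in the standard SSI plug-ins, but the attributes shown here are assumptions and may differ in the current release.

    <?xml version="1.0"?>
    <pipeline ssi-v="1">

        <!-- synchronized sources: microphone and webcam
             (a dedicated video writer component, omitted here, would store the camera stream) -->
        <sensor create="Audio" option="audio" scale="true">
            <output channel="audio" pin="audio"/>
        </sensor>
        <sensor create="Camera" option="camera">
            <output channel="video" pin="video"/>
        </sensor>

        <!-- real-time processing: low-pass filter the audio to suppress noise -->
        <transformer create="Butfilt">
            <input pin="audio" frame="0.01s"/>
            <output pin="audio_filtered"/>
        </transformer>

        <!-- store the raw and the processed stream side by side -->
        <consumer create="FileWriter" path="audio_raw">
            <input pin="audio" frame="0.1s"/>
        </consumer>
        <consumer create="FileWriter" path="audio_filtered">
            <input pin="audio_filtered" frame="0.1s"/>
        </consumer>

    </pipeline>

Because all sensors run inside the same pipeline, the resulting files share a common timeline, which is exactly what the annotation step described next relies on.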

Training

Once a considerable amount of data has been collected (often covering several users recorded in different sessions), we are ready to inspect the signals. This is done with a graphical interface that allows us to replay the recorded signals and add descriptions to them, a step known as annotation. Since the recordings are synchronized, we can look for correlations between the signals and share the same annotation across several modalities.
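
Conceptually, an annotation is just a list of labeled time intervals that can be laid over any of the synchronized streams. The following fragment is purely illustrative (the actual file format used by SSI may differ); each line pairs a start and end time in seconds with a label:

    0.00    2.35    neutral
    2.35    4.10    smile
    4.10    6.80    neutral
    6.80    9.25    laughter

Since all streams share one clock, the interval from 2.35 s to 4.10 s selects the matching audio, video, and physiological samples alike.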

Recognition

After describing the observed behavior in a set of annotation files, we can train models that automatically detect and classify that behavior. To do so, we first apply filter and feature extraction methods to carve out important characteristics of the signals; in the case of audio, for example, we might use a compact representation of the frequency spectrum. Second, we present the feature chunks together with the corresponding descriptions to a classifier, whose task is to find a good separation between the categories. Finally, we add the trained classifier to our processing pipeline to classify user behavior in real time.
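
Put together, the online recognition step might look like the sketch below. "Mfcc" stands in for whichever spectral feature component the installed plug-ins provide, and "emotion" names a hypothetical model trained from the annotated data; both are assumptions for illustration.

    <?xml version="1.0"?>
    <pipeline ssi-v="1">

        <sensor create="Audio" option="audio" scale="true">
            <output channel="audio" pin="audio"/>
        </sensor>

        <!-- feature extraction: compact representation of the frequency spectrum
             ("Mfcc" is a placeholder for the actual feature component) -->
        <transformer create="Mfcc">
            <input pin="audio" frame="0.01s" delta="0.015s"/>
            <output pin="mfcc"/>
        </transformer>

        <!-- classification: apply the previously trained model to each feature chunk
             ("emotion" is a placeholder for a model trained from the annotations) -->
        <consumer create="Classifier" trainer="emotion">
            <input pin="mfcc" frame="0.5s"/>
        </consumer>

    </pipeline>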

Dissemination

SSI was presented in September 2011 at INTERSPEECH 2011 in Florence as part of the special event "Speech Processing Tools". In July 2013 it was chosen as the processing platform for the special neuroscience track at the Music Hack Day in Barcelona. In October 2013, SSI was accepted for oral presentation at the ACM Multimedia Open Source Software Competition in Barcelona and received an "honorable mention" from the competition judges. SSI has been, and continues to be, used in several EU-funded projects; please visit our project page for details.
