
Head Gesture - controlling software with head motion



How can we enable somebody to control software with their head? Traditionally we would set up physical buttons or attach sensors to a helmet. Both of these solutions have their issues, so I was asked if it would be possible to set up a 'virtual' system.

The aim is to make controllers that allow the students at Beaumont College to operate their Grid software, or to activate one of their many network-enabled devices (such as lamps), using head gestures. Many use physical head switches, but fatigue leads to user dissatisfaction. Head gesture recognition is another tool in the bag to enable interaction for people with mobility issues.

I got this running using two technologies - webcams and depth cameras. Webcams we all know about. Every time you turn around, somebody is waving one around on their mobile. Depth cameras are the kind of technology used in the Microsoft Kinect Xbox game controller and provide more of a 3D image. I used two types of depth camera (never do once what you can do twice with twice the effort) - the Microsoft Kinect and the Asus Xtion Pro. The Kinect allows an individual to be selected from a group - for instance, somebody can approach a TV to operate it while other people farther away are filtered out. The long term goal is to enable gesture recognition with other parts of the body, such as by recognising a hand gesture.

I tested implementations of both technologies. The webcam technology holds the most promise for a real-world implementation. Results and videos of both systems are presented below.

Hackaday put up an article on this project here.

Webcam

The webcam based head tracker (headBanger) was used as part of the SETT (Services for Enabling Technology Testing) project at http://producttesting.org.uk/?page_id=35. An article on this work will be published in the Journal of Assistive Technologies. I'll put up a link to the software I wrote once the article has been published.

I found some excellent open source head tracking software aimed at game players - FaceTrackNoIR. I use the joystick output from this with my software so that a simple sideways or nodding head movement can operate two controls. Each control emulates a keystroke. I can easily get the head nodding or twisting information as well as the side to side head movement. The advantages of a webcam over a depth camera are that the webcam:

  1. Is unobtrusive.
  2. Runs from battery.
  3. Can work close to the user.
  4. Is usually a part of the laptop or mobile you want to use anyway.

The disadvantage compared with a depth camera is that it cannot discern between different participants. This technology would suit a single user, in a wheelchair for example, who needs help operating software such as Sensory Software's Grid 2.
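
As a rough illustration of how a joystick-style axis can be turned into two switch presses, here is a minimal Python sketch. It is not the headBanger code. It assumes the head tracker reports side to side movement as a value between -1 and 1, and the `AxisTrigger` class, the thresholds and the `detect` helper are all hypothetical names made up for this example. Sending the actual keystroke is left out.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class AxisTrigger:
    """Turns one joystick-style axis into a 'left' and a 'right' switch."""
    press_at: float = 0.5    # assumed: how far the head must move to fire
    release_at: float = 0.2  # assumed: head must return this close to centre
    state: str = "centre"    # "centre", "left" or "right"

    def update(self, value: float) -> Optional[str]:
        """Feed one axis sample; return 'left'/'right' once per gesture."""
        if self.state == "centre":
            if value <= -self.press_at:
                self.state = "left"
                return "left"
            if value >= self.press_at:
                self.state = "right"
                return "right"
        elif abs(value) <= self.release_at:
            # Head is back near the middle, so re-arm the trigger.
            self.state = "centre"
        return None


def detect(samples: Iterable[float]) -> List[str]:
    """Run a fresh trigger over a sequence of samples and collect events."""
    trigger = AxisTrigger()
    events = []
    for value in samples:
        event = trigger.update(value)
        if event is not None:
            events.append(event)
    return events
```

Using a lower release level than the press level means a head held near the threshold does not fire the control over and over, which matters when each event emulates a keystroke.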

HeadBanger Using a Webcam


Here's a picture of Kevin testing the system at Beaumont College:

Depth Cameras

The depth camera work preceded the webcam system and used a simpler interface for a proof of concept. I published a research paper on the Asus Xtion implementation, 'HeadBanger: Tracking Head Position as a Controller', in the Journal of Communication Matters, August 2014.

Microsoft Kinect Controller

Here's a video showing me using the system to control an MP3 player set up using Sensory Software's Grid 2:


HeadBanger in action


Asus Xtion Controller

HeadBanger in action

I did some work using a much simpler system and open source software. But I probably only find it simple having wrestled with the beast that is the Microsoft Kinect SDK and Visual Studio.

We used the Asus Xtion Pro Live (basically a miniature version of the Microsoft Kinect depth camera) to set up head tracking. The picture below shows a happy student interacting with the prototype system during the first phase of testing. The Xtion can be seen on the stand just in front of the laptop and you can just about see the simple music controller grid that was set up for testing. This allows the participant to start, stop and select a track by head movement. The controls can be positioned in three dimensions, to allow for whatever comfortable head motion the participant can make. For instance, we could put the control in front of the head and have it activated using a nodding motion.
The images below show the virtual control software used for initial testing. This was developed by building on example code in Making Things See by Borenstein - an excellent book which got me going quickly. The pictures show what the user can see, which is the front view 'depth point cloud' - the student is the yellow guy in the middle. The positions of the two 'hotboxes' can be changed using the three slider bars above each one. The image on the left shows that the student has operated the left hand hotbox. This was programmed to trigger a control in the Sensory Software Grid software that the students use - in this case, to operate a music player or to turn on a network enabled light.
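
To give a feel for how a hotbox can work, here is a minimal Python sketch of one common approach: count how many points of the depth cloud fall inside a cube and treat the control as operated once enough of them do. This is a sketch under assumptions rather than the code used in the project. The `Hotbox` class, the cube shape, the `min_points` threshold and the units are all hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple

Point = Tuple[float, float, float]  # (x, y, z) in the camera's units


@dataclass
class Hotbox:
    """A cube in 3D space that acts as a virtual switch."""
    centre: Point          # set by the three sliders (x, y, z)
    size: float            # assumed: edge length, same units as the points
    min_points: int = 50   # assumed: points needed before the box fires

    def contains(self, point: Point) -> bool:
        """True if the point lies inside the cube on all three axes."""
        half = self.size / 2
        return all(abs(point[i] - self.centre[i]) <= half for i in range(3))

    def count_inside(self, cloud: Iterable[Point]) -> int:
        """Number of cloud points currently inside the cube."""
        return sum(1 for point in cloud if self.contains(point))

    def is_active(self, cloud: Iterable[Point]) -> bool:
        """True once enough of the cloud has entered the cube."""
        return self.count_inside(cloud) >= self.min_points
```

Requiring a minimum number of points, rather than a single one, is one way to stop stray depth noise from operating the control by accident.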

 
 

Now, the depth point cloud that we get out of the depth camera is 3D. So we can get a view as if we are looking vertically down at the student. This allows us to position the hotboxes so they really are to their left and right. The image on the left, below, shows a top view of the user operating the same control as in the images above. This shows how we have moved the hotboxes to be parallel with his head. The image on the right shows how we can move the hotbox to be directly in front of the head, so that a nod activates it.
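
The difference between the two views can be sketched in a few lines of Python. This assumes each point is an (x, y, z) tuple with z as the distance from the camera; the `project` function and the example coordinates are hypothetical and only illustrate the idea. The front view keeps x and y, the top view keeps x and z.

```python
from typing import List, Tuple

Point3 = Tuple[float, float, float]
Point2 = Tuple[float, float]


def project(cloud: List[Point3], view: str) -> List[Point2]:
    """Flatten a 3D cloud to 2D for drawing, as seen from 'front' or 'top'."""
    if view == "front":
        return [(x, y) for x, y, _z in cloud]   # drop depth
    if view == "top":
        return [(x, z) for x, _y, z in cloud]   # drop height
    raise ValueError("view must be 'front' or 'top'")


# Made-up positions: a head and a hotbox centre.
head = (0.0, 0.0, 900.0)
hotbox = (200.0, 0.0, 600.0)

front = project([head, hotbox], "front")
top = project([head, hotbox], "top")
```

In the front view both points share the same height, so the hotbox looks as if it sits beside the head. Only the top view reveals that it is 300 units nearer the camera, which is why the top view is useful for lining the hotboxes up with the head.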
 
 

The images below show the same hotboxes, but from the front. The image on the left shows the inactive hotbox, the image on the right shows it being activated by the student nodding into it.

 
 
The slider bars in the middle control the range of distances that the camera will look at - which allows me to remove anything behind the user that could be confusing. I'm lucky that the minimum imaging range of the Xtion is pretty low - actually it is lower than the manufacturer lets on.
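
The range sliders amount to a simple filter on the depth value of each point. A minimal Python sketch, assuming points are (x, y, z) tuples with z as the distance from the camera; `clip_depth` and the numbers in the usage line are hypothetical:

```python
from typing import List, Tuple

Point = Tuple[float, float, float]


def clip_depth(cloud: List[Point], near: float, far: float) -> List[Point]:
    """Keep only the points whose distance from the camera is in [near, far]."""
    return [point for point in cloud if near <= point[2] <= far]
```

For example, `clip_depth(cloud, 500, 1500)` would keep a user sitting about a metre away (if the units are millimetres) and discard a wall or passer-by behind them.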

I knocked up a simple interface to allow multiple users to be set up - each with their own hotbox positions and the maximum and minimum range that the camera will work at. A screen grab of this interface is shown below - I used the G4P library to produce this. This is a fairly basic interface, but it seems to do the job. As always, coding the user interface took a lot longer than it should have! This interface reads from and writes to a simple XML file. This XML file also records which user profile was used last and reloads it when the program starts.
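
Here is a minimal Python sketch of what such a profile file could look like, using the standard library's `xml.etree.ElementTree`. The layout is an assumption, not the format the program actually uses: the element names (`headbanger`, `user`, `hotbox`), the `lastUser` attribute, and the example profiles are all hypothetical.

```python
import os
import tempfile
import xml.etree.ElementTree as ET


def save_profiles(path, profiles, last_used):
    """Write every user profile, plus the name of the last one used."""
    root = ET.Element("headbanger", lastUser=last_used)
    for name, profile in profiles.items():
        user = ET.SubElement(root, "user", name=name,
                             near=str(profile["near"]),
                             far=str(profile["far"]))
        for x, y, z in profile["hotboxes"]:
            ET.SubElement(user, "hotbox", x=str(x), y=str(y), z=str(z))
    ET.ElementTree(root).write(path, encoding="utf-8", xml_declaration=True)


def load_profiles(path):
    """Read the file back; returns (profiles, name of last used profile)."""
    root = ET.parse(path).getroot()
    profiles = {}
    for user in root.findall("user"):
        profiles[user.get("name")] = {
            "near": float(user.get("near")),
            "far": float(user.get("far")),
            "hotboxes": [tuple(float(box.get(axis)) for axis in "xyz")
                         for box in user.findall("hotbox")],
        }
    return profiles, root.get("lastUser")


# Round trip with two made-up profiles.
example = {
    "user_a": {"near": 400.0, "far": 1200.0,
               "hotboxes": [(-200.0, 0.0, 800.0), (200.0, 0.0, 800.0)]},
    "user_b": {"near": 500.0, "far": 1500.0,
               "hotboxes": [(0.0, -100.0, 700.0)]},
}
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "profiles.xml")
    save_profiles(path, example, last_used="user_b")
    loaded, last = load_profiles(path)
```

On start-up the program would then pick `loaded[last]` to restore the hotbox positions and camera range of whoever used it most recently.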



The Xtion could be mounted on a wheelchair tray and still interact with the head. It is happy to run off USB power. I'll take Processing or Python over the Microsoft software development system any day! But that's just how my little brain works. Full power to the folk who can use the professional tools.

With thanks for the continuing support of the team at Beaumont College, the Technologists Rohan, Zak, Trevor and Steve and the OT department run by Rachel.


