Gesture-based operation has attracted attention as an alternative to operating devices with remote controls or voice recognition. Our research group aims to realize such operation using only image information acquired from cameras. For gestures to be practical, the relationship between an operation and its gesture must be intuitive, so that performing the gesture does not burden the user. In this research, we constructed a system that lets a user switch a home appliance on or off by pointing at it. In future work, our group will build new systems that extend the gesture-to-operation relationship to a wider variety of operations, and will also explore approaches using AR glasses.
To map many home-appliance operation commands onto the limited command space fixed on a tabletop, the command space is structured hierarchically. Any number of layers can be added. The active layer changes when the user selects a command space assigned to a layer transition, or when a specific home-appliance operation is performed. Voice guidance lets the user confirm that the layer has changed.
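The layered structure described above can be pictured as a small table of layers, each holding a few command spaces. The following is only a sketch under assumed names: `Slot`, `Layer`, `CommandHierarchy`, the layer names, and the appliance commands are all hypothetical, and the returned message string merely stands in for voice guidance.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class Slot:
    """One command space on the tabletop (hypothetical structure)."""
    command: Optional[str] = None      # appliance operation to run, if any
    next_layer: Optional[str] = None   # layer to switch to after selection, if any


@dataclass
class Layer:
    name: str
    slots: Dict[int, Slot] = field(default_factory=dict)  # slot index -> slot


class CommandHierarchy:
    def __init__(self, layers: Dict[str, Layer], start: str):
        self.layers = layers
        self.current = start

    def select(self, slot_index: int) -> Tuple[Optional[str], Optional[str]]:
        """Return (command to execute, guidance message) for the selected slot."""
        slot = self.layers[self.current].slots[slot_index]
        guidance = None
        if slot.next_layer is not None and slot.next_layer != self.current:
            self.current = slot.next_layer
            guidance = f"Switched to {self.current}"  # stands in for voice guidance
        return slot.command, guidance


# Assumed example layout: slot 0 of the top layer only changes layer;
# the TV power command runs and then returns to the top layer.
layers = {
    "top": Layer("top", {0: Slot(next_layer="tv"),
                         1: Slot(command="light_toggle")}),
    "tv": Layer("tv", {0: Slot(command="tv_power", next_layer="top"),
                       1: Slot(command="tv_volume_up")}),
}
hierarchy = CommandHierarchy(layers, start="top")
```

Because every slot may carry a command, a layer change, or both, the same small set of tabletop positions can be reused at each layer.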
Command Composition
Grouping multiple operations and assigning them to a single command space lets the user perform complex home-appliance operations with fewer actions.
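As a rough illustration of this idea, several operations can be bundled into one callable and assigned to a single command space. The operation names, the `compose` helper, and the `log` list (which stands in for real appliance control) are assumptions made for this sketch.

```python
from typing import Callable, Dict, List

log: List[str] = []  # records what would be sent; stands in for real appliance control


def tv_power_on() -> None:
    log.append("tv: power on")


def lights_dim() -> None:
    log.append("lights: dim")


def curtains_close() -> None:
    log.append("curtains: close")


def compose(*operations: Callable[[], None]) -> Callable[[], None]:
    """Bundle several operations into one command that runs them in order."""
    def run_all() -> None:
        for operation in operations:
            operation()
    return run_all


# Command space index -> command (hypothetical assignment)
command_spaces: Dict[int, Callable[[], None]] = {
    0: compose(tv_power_on, lights_dim, curtains_close),  # one selection, three operations
    1: tv_power_on,
}

command_spaces[0]()
```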
-Related Papers-
鹿野 巧, 川村 拓也, 梅田 和昇, "インテリジェントルームにおける複雑な家電操作の実現," 第35回日本ロボット学会学術講演会予稿集, 1I2-06, September 2017.
Command Space
We are constructing an Intelligent Room in which users control various home appliances through gestures, based on image-processing gesture recognition technologies such as hand-waving recognition. Images from multiple pan-tilt-zoom cameras installed on the ceiling are used to recognize the operator's gestures, enabling operations such as changing TV channels and adjusting the volume. The operator can control devices from anywhere in the room using only natural hand and finger gestures, without a remote controller, sensor glove, or microphone.
At present, we are working on a system in which a command space is placed virtually at a position defined relative to the operator. By waving a hand within that space, the user executes the operation assigned to it. The three-dimensional position and posture of the person, which this system requires, are obtained using the volume intersection method.
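To make the idea of a command space defined relative to the operator concrete, the sketch below converts a hand position from room coordinates into a frame centred on the operator and tests it against box-shaped regions. Everything here is assumed for illustration: the box shapes and sizes, the names `Box`, `to_operator_frame`, and `find_command`, and the use of a single heading angle. In the actual system the operator's position and posture come from the volume intersection method; here they are simply passed in.

```python
import math
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class Box:
    """Axis-aligned box in the operator frame (x: right, y: forward, z: up), in metres."""
    lo: Vec3
    hi: Vec3

    def contains(self, p: Vec3) -> bool:
        return all(a <= v <= b for a, v, b in zip(self.lo, p, self.hi))


def to_operator_frame(point: Vec3, operator_pos: Vec3, yaw: float) -> Vec3:
    """Convert a room-frame point into the operator frame.

    yaw is the direction the operator faces, in radians, measured
    counter-clockwise from the room's +x axis when seen from above.
    """
    dx, dy, dz = (p - o for p, o in zip(point, operator_pos))
    forward = dx * math.cos(yaw) + dy * math.sin(yaw)
    right = dx * math.sin(yaw) - dy * math.cos(yaw)
    return (right, forward, dz)


def find_command(hand: Vec3, operator_pos: Vec3, yaw: float,
                 spaces: Dict[str, Box]) -> Optional[str]:
    """Return the name of the command space containing the waving hand, if any."""
    local = to_operator_frame(hand, operator_pos, yaw)
    for name, box in spaces.items():
        if box.contains(local):
            return name
    return None


# Hypothetical command spaces, fixed relative to the operator's feet
spaces = {
    "tv_power": Box(lo=(-0.2, 0.3, 1.0), hi=(0.2, 0.7, 1.4)),   # in front of the body
    "volume_up": Box(lo=(0.3, 0.0, 1.0), hi=(0.7, 0.4, 1.4)),   # to the right
}
```

Since the boxes are expressed in the operator frame, they follow the operator around the room without being redefined.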
A core technology in our Intelligent Room is hand-waving detection. People commonly wave a hand to get someone's attention in everyday communication, so it can be regarded as a natural action for the operator. We have developed a method that robustly detects hand waving with a simple algorithm based on the fast Fourier transform (FFT).
First, the image is reduced in resolution. FFT is then applied to the time series of intensity values at each pixel of the low-resolution image. In a region where a hand is waving, the intensity values oscillate over time in a pattern close to a sine wave, so evaluating the power spectrum makes it possible to determine whether hand waving is being performed.
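The per-pixel test described above can be sketched as follows. This is a minimal illustration, not the published algorithm: the decision rule (one non-DC frequency must dominate the power spectrum and be strong enough) and the thresholds `min_amplitude` and `min_ratio` are assumptions. A plain DFT is used in place of an FFT to keep the example free of dependencies; it yields the same spectrum, only more slowly.

```python
import cmath
import math
from typing import List, Sequence


def power_spectrum(samples: Sequence[float]) -> List[float]:
    """Power at each non-negative frequency bin of a real signal (plain DFT)."""
    n = len(samples)
    return [abs(sum(v * cmath.exp(-2j * math.pi * k * t / n)
                    for t, v in enumerate(samples))) ** 2
            for k in range(n // 2 + 1)]


def is_waving(samples: Sequence[float], min_amplitude: float = 10.0,
              min_ratio: float = 0.5) -> bool:
    """True if one non-DC frequency dominates, i.e. the signal is close to a sine wave."""
    spectrum = power_spectrum(samples)[1:]  # drop the DC (mean intensity) bin
    peak = max(spectrum)
    # Approximate sine amplitude implied by the peak bin, in intensity levels
    amplitude = 2 * math.sqrt(peak) / len(samples)
    if amplitude < min_amplitude:
        return False  # too weak: static background or small noise
    return peak / sum(spectrum) >= min_ratio


def waving_mask(frames: List[List[List[float]]]) -> List[List[bool]]:
    """Apply is_waving to every pixel's intensity history; frames[t][y][x]."""
    height, width = len(frames[0]), len(frames[0][0])
    return [[is_waving([f[y][x] for f in frames]) for x in range(width)]
            for y in range(height)]


# Toy 2x2 low-resolution sequence: only the top-left pixel oscillates
frames = [[[128 + 40 * math.sin(2 * math.pi * 4 * t / 32), 128.0],
           [128.0, 128.0]] for t in range(32)]
mask = waving_mask(frames)
```

Working on a low-resolution image keeps the number of per-pixel spectra small, which is part of what makes the approach simple.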
-Related Papers-
入江耕太,梅田和昇:“濃淡値の時系列変化を利用した画像からの手振りの検出”,日本ロボット学会誌,Vol.21, No.8, pp.923-931, 2003.11.
浅野秀胤,永易 武,織茂達也,寺林賢司,太田 睦,梅田和昇:“フーリエ変換を用いた指振り検出と機器操作への応用”,精密工学会誌,Vol.79, No.6, pp.565-570, 2013.6.
In addition, we have conducted the following research on core technologies for building intelligent rooms.
Our laboratory focused on hand waving, a simple periodic motion, and developed a method for recognizing hand-waving motion from images in order to communicate the operator's position to a machine. However, this method only recognizes whether a repetitive motion, such as waving a hand left and right, is present, and therefore can convey only one type of gesture.
In this research, we extend this method and propose a periodic gesture recognition method that can communicate multiple simple intentions. Specifically, the input grayscale image is first reduced in resolution, and FFT is applied along the time axis at each pixel. Periodic-motion regions are detected from the resulting amplitude spectrum. The hand-motion region is then extracted from these regions, and the type of periodic gesture is identified from the phase spectrum within that region.
The proposed method enables intuitive, contactless operation. Because it uses grayscale images, it recognizes periodic gestures robustly despite differences in lighting conditions and individual skin color. It also requires no preprocessing such as extracting hand regions in advance from skin-color information, which keeps the processing simple. The method is therefore expected to contribute to system miniaturization and hardware implementation, and to provide a versatile interface.
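The pipeline described above (temporal transform per pixel, amplitude spectrum to find periodic pixels, phase spectrum to tell gestures apart) can be sketched as below. The text does not spell out how the phase spectrum maps to gesture types, so the rule used here is an assumption for illustration only: the direction along which the phase changes most between neighbouring periodic pixels is taken as the direction of motion. The frequency bin `k`, the thresholds, the labels, and the toy data generator `synthetic_frames` are likewise hypothetical, and a plain DFT stands in for the FFT.

```python
import cmath
import math
from typing import List, Optional

Frames = List[List[List[float]]]  # frames[t][y][x], low-resolution grayscale


def dft_bin(samples: List[float], k: int) -> complex:
    """Single DFT coefficient at bin k (an FFT would return the same value)."""
    n = len(samples)
    return sum(v * cmath.exp(-2j * math.pi * k * t / n) for t, v in enumerate(samples))


def classify_gesture(frames: Frames, k: int, min_amplitude: float = 10.0,
                     min_phase_step: float = 0.1) -> Optional[str]:
    """Label the motion 'horizontal' or 'vertical' from phase differences (assumed rule)."""
    n, height, width = len(frames), len(frames[0]), len(frames[0][0])
    coeffs = [[dft_bin([f[y][x] for f in frames], k) for x in range(width)]
              for y in range(height)]
    # Amplitude spectrum: keep pixels whose oscillation at bin k is strong enough
    mask = [[2 * abs(c) / n >= min_amplitude for c in row] for row in coeffs]

    def mean_phase_step(dy: int, dx: int) -> float:
        steps = []
        for y in range(height - dy):
            for x in range(width - dx):
                if mask[y][x] and mask[y + dy][x + dx]:
                    # The phase of a * conj(b) is the wrapped phase difference
                    diff = coeffs[y][x] * coeffs[y + dy][x + dx].conjugate()
                    steps.append(abs(cmath.phase(diff)))
        return sum(steps) / len(steps) if steps else 0.0

    horizontal, vertical = mean_phase_step(0, 1), mean_phase_step(1, 0)
    if max(horizontal, vertical) < min_phase_step:
        return None  # no periodic pixels, or no spatial phase pattern
    return "horizontal" if horizontal > vertical else "vertical"


def synthetic_frames(direction: str, n: int = 16, size: int = 4, k: int = 2) -> Frames:
    """Toy data: a sinusoid whose phase lags along the given direction."""
    def value(t: int, y: int, x: int) -> float:
        lag = 0.5 * (x if direction == "horizontal" else y)
        return 128 + 40 * math.sin(2 * math.pi * k * t / n - lag)
    return [[[value(t, y, x) for x in range(size)] for y in range(size)]
            for t in range(n)]
```

For example, `classify_gesture(synthetic_frames("horizontal"), k=2)` labels the toy sequence as horizontal motion. No hand region is segmented by color at any point; the decision relies only on grayscale intensity over time.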
Sensor Configuration
Acquired Image