Recent Publications
S. Phon-Amnuaisuk "Transformer-based Neuro-Animator for Qualitative Simulation of Soft Body Movement"
Technical Report: Media Informatics SIG, Universiti Teknologi Brunei (2024) https://arxiv.org/pdf/2408.15258
The human mind effortlessly simulates the movements of objects governed by the laws of physics, such as a fluttering, or a waving flag under wind force, without understanding the underlying physics. This suggests that human cognition can predict the unfolding of physical events using an intuitive prediction process. This process might result from memory recall, yielding a qualitatively believable mental image, though it may not be exactly according to real-world physics. Drawing inspiration from the intriguing human ability to qualitatively visualize and describe dynamic events from past experiences without explicitly engaging in mathematical computations, this paper investigates the application of recent transformer architectures as a neuro-animator model. The visual transformer model is trained to predict flag motions at the t+1 time step, given information of previous motions from t-n ... t time steps. The results show that the visual transformer-based architecture successfully learns temporal embedding of flag motions and produces reasonable quality simulations of flag waving under different wind forces.
S. Phon-Amnuaisuk et al., "Multi-agent Traffic Light Controls with SUMO," 2023 8th International Conference on Business and Industrial Research (ICBIR), Bangkok, Thailand, 2023, pp. 824-830, doi: 10.1109/ICBIR57571.2023.10147743.
https://ieeexplore.ieee.org/document/10147743
This study focuses on the optimization of Traffic Light System (TLS) control through the use of adaptive agents. The performance of adaptive cycle TLS was compared with fixed cycle TLS. Two different adaptive cycle TLS agents were investigated, reactive and Deep Q-Network (DQN) agents. The reactive agent adjusted its control signals based on traffic measures such as queue length, while the DQN agent employed a deep reinforcement learning algorithm to determine its control signals. Two sets of simulations were conducted to evaluate the performance of both approaches. The results showed that the adaptive cycle TLS was more effective in reducing waiting times and increasing traffic throughput than the fixed cycle TLS. Among the adaptive agents, the reactive agent outperformed the DQN agent, due to the difficulty in learning an optimal policy in the traffic control domain, which has a non-stationary and complex nature. This preliminary study showed the potential benefits of using adaptive agents for traffic light control, and further studies in various areas such as employing more advanced TLS control methods, expanding the scale of the study, and applying real-demand in the simulation, could be carried out in future work.
Somnuk Phon-Amnuaisuk
Exploring Graph Representation of Chorales. (http://arxiv.org/abs/2201.11745v1)
This work explores areas overlapping music, graph theory, and machine learning. An embedding representation of a node, in a weighted undirected graph G, is a representation that captures the meaning of nodes in an embedding space. In this work, 383 Bach chorales were compiled and represented as a graph. Two application cases were investigated in this paper: (i) learning node embedding representation using Continuous Bag of Words (CBOW), skip-gram, and node2vec algorithms, and (ii) learning node labels from neighboring nodes based on a collective classification approach. The results of this exploratory study ascertain many salient features of the graph-based representation approach applicable to music applications.
Somnuk Phon-Amnuaisuk, Peter David Shannon, Saiful Omar
Learning Robot Arm Controls Using Augmented Random Search in Simulated Environments. MIWAI 2021: 118-128
We investigate the learning of continuous action policy for controlling a six-axes robot arm. Traditional tabular Q-Learning can handle discrete actions well but less so for continuous actions since the tabular approach is constrained by the size of the state-value table. Recent advances in deep reinforcement learning and policy gradient learning abstract the look-up table using function approximators such as artificial neural networks. Artificial neural networks abstract look-up policy tables as policy networks that can predict discrete actions as well as continuous actions. However, deep reinforcement learning and policy gradient learning were criticized for their complexity. It was reported in recent works that Augmented Random Search (ARS) has a better sample efficiency and a simpler hyper-parameter tuning. This motivates us to apply the technique to our robot-arm reaching tasks. We constructed a custom simulated robot arm environment using Unity Machine Learning Agents game engine, then designed three robot-arm reaching tasks. Twelve models were trained using ARS techniques. Another four models were trained using the state-of-the-art PG learning technique i.e., proximal policy optimization (PPO). Results from models trained using PPO provide a baseline from the policy gradient technique. Empirical results of models trained using ARS and PPO were analyzed and discussed.
Somnuk Phon-Amnuaisuk, Ken T. Murata, La-or Kovavisaruch, Tiong-Hoo Lim, Praphan Pavarangkoon, Takamichi Mizuhara
Visual-Based Positioning and Pose Estimation. ICONIP (4) 2020: 597-605
Recent advances in deep learning and computer vision offer an excellent opportunity to investigate high-level visual analysis tasks such as human localization and human pose estimation. Although the performances of human localization and human pose estimation have significantly improved in recent reports, they are not perfect, and erroneous estimation of position and pose can be expected among video frames.
Studies on the integration of these techniques into a generic pipeline robust to those errors are still lacking. This paper addresses that gap. We explored and developed two working pipelines suited to visual-based positioning and pose estimation tasks. Analyses of the proposed pipelines were conducted on a badminton game. We showed that the concept of tracking by detection could work well, and errors in position and pose could be effectively handled by linear interpolation of information from nearby frames. The results showed that the Visual-based Positioning and Pose Estimation could deliver position and pose estimations with good spatial and temporal resolutions.
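The abstract above mentions repairing erroneous positions by linear interpolation from nearby frames. A minimal sketch of that idea follows, assuming a track is a per-frame list of (x, y) positions where a failed or rejected detection is stored as None. The function name fill_missing and the sample track are hypothetical and are not taken from the paper.

```python
from typing import List, Optional, Tuple

Point = Tuple[float, float]


def fill_missing(track: List[Optional[Point]]) -> List[Optional[Point]]:
    """Linearly interpolate missing (None) positions between known frames.

    Gaps before the first or after the last known frame are left as None,
    because there is no neighbour on both sides to interpolate between.
    """
    filled = list(track)
    known = [i for i, p in enumerate(track) if p is not None]
    for left, right in zip(known, known[1:]):
        (x0, y0), (x1, y1) = track[left], track[right]
        span = right - left
        for i in range(left + 1, right):
            w = (i - left) / span  # fraction of the way from left to right
            filled[i] = (x0 + w * (x1 - x0), y0 + w * (y1 - y0))
    return filled


# Hypothetical track: frames 1, 2 and 4 have no usable detection.
track = [(0.0, 0.0), None, None, (3.0, 6.0), None]
print(fill_missing(track))
```

The same weighting could be applied per joint to interpolate pose keypoints, under the same assumption that neighbouring frames are reliable.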
Somnuk Phon-Amnuaisuk, Shiqah Hadi, Saiful Omar
Exploring Spatiotemporal Features for Activity Classifications in Films. ICONIP (4) 2020: 410-417
Humans are able to appreciate implicit and explicit contexts in a visual scene within a few seconds. How we obtain the interpretations of the visual scene using computers has not been well understood, and so the question remains whether this ability could be emulated. We investigated activity classifications of movie clips using 3D convolutional neural network (CNN) as well as combinations of 2D CNN and long short-term memory (LSTM). This work was motivated by the concepts that CNN can effectively learn the representation of visual features, and LSTM can effectively learn temporal information. Hence, an architecture that combined information from many time slices should provide an effective means to capture the spatiotemporal features from a sequence of images. Eight experiments run on the following three main architectures were carried out: 3DCNN, ConvLSTM2D, and a pipeline of pre-trained CNN-LSTM. We analyzed the empirical output, followed by a critical discussion of the analyses and suggestions for future research directions in this domain.
Shiqah Hadi, Somnuk Phon-Amnuaisuk, Soon-Jiann Tan
Semantic Instance Segmentation in a 3D Traffic Scene Reconstruction task. SICE 2020: 186-191
We investigate a 3D Traffic Scene Reconstruction (3DTSR) task. 3DTSR aims to reconstruct a 3D traffic scene from video footage captured from a car's dash-camera. The 3D traffic scene provides a new platform for various services to exploit, for example, self-driving cars, driving behavior analysis, and traffic accident analysis. In our approach, we resort to a passive sensing approach which detects objects and their positions based on visual information. Spatial positions of objects in a 2D scene are lifted into a 3D scene based on information from multiple sources: (i) semantic instance segmentation, (ii) spatial position and volume estimation through orthogonal images, and (iii) prior knowledge concerning shape and volume of objects. In this paper, we focus on semantic instance segmentation, the first phase of the proposed 3DTSR method. The semantic instance segmentation task is accomplished with the Mask R-CNN model pre-trained on the COCO dataset. We report the performances of the semantic segmentation task from different Detectron2 models that have undergone the transfer learning process using information from various datasets. We show that it is feasible to obtain the shape and appearance of objects in the road scene using our proposed segmentation process.
Edge Computing for Road Safety Applications
Hadi, S.N., Murata, K.T., Phon-Amnuaisuk, S., Mizuhara, T., Jiann, T.S.
ICSEC 2019 - 23rd International Computer Science and Engineering Conference, 2019, pp. 170–175, 8974789
Modern technologies are being developed to address the alarming rise in road accidents caused by drivers’ errors. We leverage computer vision and deep learning at the edge (i.e., in a car) to detect vehicles and pedestrians that are in the surroundings. This information can then be employed to direct the driver’s attention to relevant information, minimizing the effects of human errors. This work explores various deep learning pre-trained models, from the Intel open model zoo and the TensorFlow detection model zoo, to run inference on Intel Movidius for edge computing. We analyze the performance to determine the practicality of using the pre-trained models for road safety purposes. The experiments conducted examine the various SSD-based network models. The accuracy, which we obtained as the harmonic average of the precision and recall of the models, together with the inference time and the low demand in computing power, determined that the TensorFlow detection model zoo is a practical object detector that we can implement to tackle road safety issues.
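The accuracy figure described above, the harmonic average of precision and recall, is commonly called the F1 score. A small sketch of the calculation from detection counts follows. The function name and the example counts are hypothetical and are not results from the paper.

```python
def f1_score(true_pos: int, false_pos: int, false_neg: int) -> float:
    """Harmonic mean of precision and recall, computed from detection counts."""
    detected = true_pos + false_pos   # everything the detector reported
    actual = true_pos + false_neg     # everything that was really there
    precision = true_pos / detected if detected else 0.0
    recall = true_pos / actual if actual else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts: 8 correct detections, 2 false alarms, 8 missed objects.
print(round(f1_score(8, 2, 8), 3))
```

Because the harmonic mean is dominated by the smaller of the two values, a detector with high precision but poor recall (or the reverse) still receives a low score.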
Somnuk Phon-Amnuaisuk, Ken T. Murata, Praphan Pavarangkoon, Takamichi Mizuhara, Shiqah Hadi
Children Activity Descriptions from Visual and Textual Associations. MIWAI 2019, pp. 121-132
Augmented visual monitoring devices with the ability to describe children’s activities, i.e., whether they are asleep, awake, crawling or climbing, open up possibilities for various applications in promoting safety and well-being amongst children. We explore children’s activity description based on an encoder-decoder framework. The correlations between the semantics of the image and its textual description are captured using a convolutional neural network (CNN) and a recurrent neural network (RNN). Encoding semantic information as activation patterns of a CNN and decoding textual descriptions using a probabilistic language model based on an RNN can produce relevant descriptions but often suffers from a lack of precision. This is because a probabilistic model generates descriptions based on the frequency of words conditioned by contexts. In this work, we explore the effects of adding contexts such as domain-specific images and adding pose information to the encoder-decoder models.
Somnuk Phon-Amnuaisuk
Exploring Music21 and Gensim for Music Data Analysis and Visualization. DMBD 2019: 3-12
Computational musicology has been garnering attention since the 1950s. Musicologists appreciate the utilisation of computing power to look for patterns in music. The bottlenecks in the early days were attributed to the lack of standardization of computer representation of music and the lack of computing techniques specialized for the music domain. However, due to the increase in computing power, advances in music technology and machine learning techniques in recent years; the field of computational musicology has been revitalized. In this paper, we explored Music21 toolkit and Gensim, the recent open-source data analytical tool which includes the Word2Vec model, for an analysis of Bach Chorales. The tools and techniques discussed in this paper have revealed many interesting exploratory fronts such as the semantic analogies of musical concepts which deserve a detailed investigation by computational musicologists.
Somnuk Phon-Amnuaisuk
Image Synthesis and Style Transfer
https://arxiv.org/abs/1901.04686
Affine transformation, layer blending, and artistic filters are popular processes that graphic designers employ to transform pixels of an image to create a desired effect. Here, we examine various approaches that synthesize new images: pixel-based compositing models and in particular, distributed representations of deep neural network models. This paper focuses on synthesizing new images from a learned representation model obtained from the VGG network. This approach offers an interesting creative process from its distributed representation of information in hidden layers of a deep VGG network i.e., information such as contour, shape, etc. are effectively captured in hidden layers of neural networks. Conceptually, if Φ is the function that transforms input pixels into distributed representations of VGG layers h, a new synthesized image X can be generated from its inverse function, X = Φ⁻¹(h). We describe the concept behind the approach, present some representative synthesized images and style-transferred image examples.
Somnuk Phon-Amnuaisuk, Noor Deenina Hj Mohd Salleh, Siew-Leing Woo
Pixel-Based LSTM Generative Model
CIIS 2018: 203-212
Applying computational intelligence techniques to create generative models of digits or alphabets has received relatively little attention compared with classification tasks. It is also more challenging to create a generative model that could successfully capture styles and detailed characteristics of symbols. In this paper, we describe the application of the Long Short-Term Memory (LSTM) model trained using a supervised learning approach for generating a variety of the letter A. LSTM is a recurrent neural network whose salient feature is its ability to handle long-range dependencies; hence, it is a popular choice for building intelligent applications for speech recognition, conversational agents and other problems in time series domains. To formulate the problem as a generative task, all the pixels in a 2D image representing an alphabet (i.e., the letter A in this study) are flattened into a long vector to train the LSTM model. We have shown that LSTM has successfully learned to generate new letters A showing many coherent stylistic features with the original letters from the training sets.
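The flattening step described above can be illustrated with a short sketch of how a 2D image might be turned into supervised training pairs for a sequence model. The abstract only states that pixels are flattened into a long vector; the row-major order, the fixed-length window and the next-pixel target used here are assumptions, and both function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def flatten_image(image: Sequence[Sequence[int]]) -> List[int]:
    """Concatenate the rows of a 2D image into one long pixel sequence."""
    return [pixel for row in image for pixel in row]


def next_pixel_pairs(pixels: Sequence[int], window: int) -> List[Tuple[List[int], int]]:
    """Build (previous `window` pixels, next pixel) training pairs."""
    return [
        (list(pixels[i:i + window]), pixels[i + window])
        for i in range(len(pixels) - window)
    ]


# Tiny hypothetical binary image standing in for a letter bitmap.
image = [
    [0, 1, 0],
    [1, 1, 1],
    [1, 0, 1],
]
sequence = flatten_image(image)
for context, target in next_pixel_pairs(sequence, window=4):
    print(context, "->", target)
```

A model trained on such pairs can generate a new image by repeatedly predicting the next pixel and feeding it back into the context window.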
Somnuk Phon-Amnuaisuk, Ken T. Murata, Praphan Pavarangkoon, Kazunori Yamamoto, Takamichi Mizuhara
Exploring the Applications of Faster R-CNN and Single-Shot Multi-box Detection in a Smart Nursery Domain
https://arxiv.org/abs/1808.08675
The ultimate goal of a baby detection task concerns detecting the presence of a baby and other objects in a sequence of 2D images, tracking them and understanding the semantic contents of the scene. Recent advances in deep learning and computer vision offer various powerful tools in general object detection and can be applied to a baby detection task. In this paper, the Faster Region-based Convolutional Neural Network and the Single-Shot Multi-Box Detection approaches are explored. They are the two state-of-the-art object detectors based on the region proposal tactic and the multi-box tactic. The presence of a baby in the scene, as obtained from these detectors and tested using different pre-trained models, is discussed. This study is important since the behaviors of these detectors in a baby detection task using different pre-trained models are still not well understood. This exploratory study reveals many useful insights into the applications of these object detectors in the smart nursery domain.
Somnuk Phon-Amnuaisuk
Learning to Play Pong using Policy Gradient Learning
http://arxiv.org/abs/1807.08452
Activities in reinforcement learning (RL) revolve around learning the Markov decision process (MDP) model, in particular, the following parameters: state values, V; state-action values, Q; and policy, pi. These parameters are commonly implemented as an array. Scaling up the problem means scaling up the size of the array and this will quickly lead to a computational bottleneck. To get around this, the RL problem is commonly formulated to learn a specific task using hand-crafted input features to curb the size of the array. In this report, we discuss an alternative end-to-end Deep Reinforcement Learning (DRL) approach where the DRL attempts to learn general task representations which in our context refers to learning to play the Pong game from a sequence of screen snapshots without game-specific hand-crafted features. We apply artificial neural networks (ANN) to approximate a policy of the RL model. The policy network, via Policy Gradients (PG) method, learns to play the Pong game from a sequence of frames without any extra semantics apart from the pixel information and the score. In contrast to the traditional tabular RL approach where the contents in the array have clear interpretations such as V or Q, the interpretation of knowledge content from the weights of the policy network is more elusive. In this work, we experiment with various Deep ANN architectures i.e., Feed forward ANN (FFNN), Convolution ANN (CNN) and Asynchronous Advantage Actor-Critic (A3C). We also examine the activation of hidden nodes and the weights between the input and the hidden layers, before and after the DRL has successfully learnt to play the Pong game. Insights into the internal learning mechanisms and future research directions are then discussed.
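In policy gradient learning from the score alone, each action is typically credited with the discounted sum of the rewards that follow it. The sketch below shows that credit assignment step only. It assumes a reward of +1 or -1 when a point is scored and 0 otherwise, and it resets the running sum at each scored point, which is a common Pong-specific convention rather than something stated in the report. The function name and the discount factor are hypothetical.

```python
from typing import List, Sequence


def discounted_returns(rewards: Sequence[float], gamma: float = 0.99) -> List[float]:
    """Return, for every time step, the discounted sum of later rewards.

    A nonzero reward is treated as the end of a rally, so credit does not
    leak from one point into the previous one (assumed convention).
    """
    returns = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        if rewards[t] != 0:
            running = 0.0  # a point was scored here: start a fresh rally
        running = running * gamma + rewards[t]
        returns[t] = running
    return returns


# Hypothetical reward trace: the agent wins one point, then loses one.
print(discounted_returns([0, 0, 1, 0, -1], gamma=0.5))
# [0.25, 0.5, 1.0, -0.5, -1.0]
```

These returns would then weight the log-probabilities of the actions taken, so that actions preceding a won point become more likely and actions preceding a lost point become less likely.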
Somnuk Phon-Amnuaisuk
What Does a Policy Network Learn After Mastering a Pong Game?
MIWAI 2017: 213-222
Activities in reinforcement learning (RL) revolve around learning the following components: state values V, state-action values Q, policy or the model of the Markov decision process (MDP). Due to high computational cost, the reinforcement learning problem is commonly formulated for learning task-specific representations with hand-crafted input features. In this report, we discuss an alternative end-to-end approach where the RL attempts to learn general task representations, in this context, learning how to play the Pong game from a sequence of screen snapshots. We apply artificial neural networks to approximate a policy of a reinforcement learning model. The policy network learns to play the game from a sequence of frames without any extra semantics apart from the pixel information and the score. Many games are simulated using different network architectures and different parameter settings. We examine the activation of hidden nodes and the weights between the input and the hidden layers, before and after the RL has successfully learned to play the game. Insights into the internal learning mechanisms and future research directions are discussed.
Mariatul Kiptiah binti Ariffin, Shiqah Hadi, Somnuk Phon-Amnuaisuk
Evolving 3D Models Using Interactive Genetic Algorithms and L-Systems.
MIWAI 2017: 485-493
The modeling of 3D objects is popularly obtained using a shell/boundary approach. This involves manipulating vertices and planes in a three-dimensional space using computers. Manually creating a 3D model in this way allows a designer full control over the creative processes but at the expense of long working hours. In this work, we explore the hybrid framework between the Interactive Genetic Algorithm (IGA) and the L-system. The L-system generates a 3D model from its production rules and the IGA evolves the 3D model by evolving the L-system’s production rules. In this study, we investigate whether the approach can successfully steer the 3D model design using subjective preference feedback from users. We analyze and discuss the creative processes in the proposed hybrid system and present the models generated by our approach.
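The L-system half of the hybrid described above can be sketched as parallel string rewriting: every symbol is replaced according to the production rules at each iteration. The rule below is a hypothetical textbook-style branching rule, not one from the paper, and the geometric interpretation of the resulting string as a 3D model is omitted.

```python
from typing import Dict


def expand(axiom: str, rules: Dict[str, str], iterations: int) -> str:
    """Apply the production rules to every symbol in parallel, repeatedly."""
    current = axiom
    for _ in range(iterations):
        # Symbols without a rule (such as brackets) are copied unchanged.
        current = "".join(rules.get(symbol, symbol) for symbol in current)
    return current


# Hypothetical rule: F grows a side branch; '[' and ']' push and pop state.
rules = {"F": "F[+F]F"}
print(expand("F", rules, 1))  # F[+F]F
print(expand("F", rules, 2))  # F[+F]F[+F[+F]F]F[+F]F
```

In an interactive genetic algorithm, the right-hand sides stored in `rules` would be the genotype: candidate rule sets are mutated and recombined, expanded into models, and ranked by the user's subjective preference.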
Peter David Shannon, Chrystopher L. Nehaniv, Somnuk Phon-Amnuaisuk
Enhancing Exploration and Exploitation of NSGA-II with GP and PDL.
ICSI (1) 2017: 349-361
In this paper, we show that NSGA-II can be applied to GP and the Process Description Language (PDL) and describe two modifications to NSGA-II. The first modification removes individuals which have the same behaviour from GP populations. It selects for de-duplication by taking the result of each objective fitness function together to make a comparison. NSGA-II is designed to expand its Pareto front of solutions by favouring individuals who have the highest or lowest value (boundary points) in a front, for any objective. The second modification enhances exploitation by preferring individuals who occupy an extreme position for most objective fitness functions. The results show, for the first time, that NSGA-II can be used with PDL and GP to successfully solve a robot control problem and that the suggested modifications offer significant improvements over an algorithm used previously with GP and PDL and unmodified NSGA-II for our test problem.
Somnuk Phon-Amnuaisuk, Soo-Young Lee
Investigating a Dictionary-Based Non-negative Matrix Factorization in Superimposed Digits Classification Tasks.
ICONIP (3) 2016: 335-343
The human visual system can recognize superimposed graphical components with ease while sophisticated computer vision systems still struggle to recognize them. This may be attributed to the fact that the image recognition task is framed as a classification task where a classification model is commonly constructed from appearance features. Hence, superimposed components are perceived as a single image unit. It seems logical to approach the recognition of superimposed digits by employing an approach that supports construction/deconstruction of superimposed components. Here, we resort to a dictionary-based non-negative matrix factorization (NMF). The dictionary-based NMF factors a given superimposed digit matrix, V, into the combination of entries in the dictionary matrix W. The H matrix from V = WH can be interpreted as corresponding superimposed digits. This work investigates three different dictionary representations: pixels' intensity, Fourier coefficients and activations from RBM hidden layers. The results show that (i) NMF can be employed as a classifier and (ii) dictionary-based NMF is capable of classifying superimposed digits with only a small set of dictionary entries derived from single digits.
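The idea of reading the components of a superimposed image from H in V = WH can be sketched for a single image vector v with a fixed dictionary W. The sketch uses the standard multiplicative update for non-negative least squares; whether the paper uses this exact update is an assumption. The three four-pixel "prototypes" are hypothetical toy data, and the function name is illustrative.

```python
from typing import List, Sequence


def solve_activations(
    W: Sequence[Sequence[float]],
    v: Sequence[float],
    iterations: int = 2000,
    eps: float = 1e-9,
) -> List[float]:
    """Find non-negative h with v ~ W h, keeping the dictionary W fixed.

    W has one row per pixel and one column per dictionary entry.
    Update rule: h <- h * (W^T v) / (W^T W h), applied element-wise.
    """
    n_pixels, n_atoms = len(W), len(W[0])
    h = [1.0] * n_atoms
    for _ in range(iterations):
        Wh = [sum(W[i][j] * h[j] for j in range(n_atoms)) for i in range(n_pixels)]
        for j in range(n_atoms):
            numerator = sum(W[i][j] * v[i] for i in range(n_pixels))
            denominator = sum(W[i][j] * Wh[i] for i in range(n_pixels)) + eps
            h[j] *= numerator / denominator
    return h


# Columns are three overlapping toy prototypes over four pixels.
W = [
    [1.0, 0.0, 0.0],
    [1.0, 1.0, 0.0],
    [0.0, 1.0, 1.0],
    [0.0, 0.0, 1.0],
]
v = [1.0, 1.0, 1.0, 1.0]  # prototype 0 superimposed on prototype 2
h = solve_activations(W, v)
top_two = sorted(sorted(range(len(h)), key=lambda j: h[j], reverse=True)[:2])
print(top_two, [round(x, 2) for x in h])
```

Classifying a superimposed pair then amounts to picking the dictionary entries with the largest activations. The activation of the unused prototype shrinks only slowly under this update, which is why the example runs many iterations.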
Somnuk Phon-Amnuaisuk, Soo-Young Lee
Classification of Distorted Handwritten Digits by Swarming an Affine Transform Space.
ICSI (2) 2016: 179-186
Given an affine transform image having a distorted appearance, if a transform function is known, then an inverse transform function can be applied to the image to produce the undistorted original image. However, if the transform function is not known, can we estimate its values by searching through this large affine transform space? Here, an unknown affine transform function of a given digit is estimated by searching through the affine transform space using the Particle Swarm Optimization (PSO) approach. In this paper, we present important concepts of the proposed approach, describe the experimental design and discuss our results which favorably support the potential of the approach. We successfully demonstrate the potential of this novel approach that could be used to classify a large set of unseen distorted affine transform digits with only a small set of digit prototypes.
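Searching a transform space with PSO can be sketched on a reduced problem: recovering an unknown rotation angle and scale that map a prototype point set onto an observed one. This is a deliberate simplification of the full affine search on digit images described above. The inertia and acceleration constants (0.7, 1.5, 1.5), the swarm size, the search bounds and all function names are assumptions for illustration.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, float]


def transform(points: Sequence[Point], angle: float, scale: float) -> List[Point]:
    """Rotate by `angle` (radians) and scale uniformly about the origin."""
    c, s = math.cos(angle), math.sin(angle)
    return [(scale * (c * x - s * y), scale * (s * x + c * y)) for x, y in points]


def mismatch(params: Sequence[float], prototype: Sequence[Point],
             observed: Sequence[Point]) -> float:
    """Sum of squared distances between transformed prototype and observation."""
    angle, scale = params
    moved = transform(prototype, angle, scale)
    return sum((u - x) ** 2 + (v - y) ** 2 for (u, v), (x, y) in zip(moved, observed))


def pso(cost: Callable[[Sequence[float]], float],
        bounds: Sequence[Tuple[float, float]],
        n_particles: int = 30, iterations: int = 200,
        seed: int = 0) -> Tuple[List[float], float]:
    """Basic global-best particle swarm optimisation within box bounds."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    best_pos = [p[:] for p in pos]
    best_cost = [cost(p) for p in pos]
    g = min(range(n_particles), key=best_cost.__getitem__)
    g_pos, g_cost = best_pos[g][:], best_cost[g]
    for _ in range(iterations):
        for i in range(n_particles):
            for d, (lo, hi) in enumerate(bounds):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (0.7 * vel[i][d]
                             + 1.5 * r1 * (best_pos[i][d] - pos[i][d])
                             + 1.5 * r2 * (g_pos[d] - pos[i][d]))
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            c = cost(pos[i])
            if c < best_cost[i]:
                best_pos[i], best_cost[i] = pos[i][:], c
                if c < g_cost:
                    g_pos, g_cost = pos[i][:], c
    return g_pos, g_cost


prototype = [(0.0, 0.0), (1.0, 0.0), (1.0, 2.0), (0.0, 2.0)]
observed = transform(prototype, 0.6, 1.3)  # distortion unknown to the search
(angle, scale), error = pso(lambda p: mismatch(p, prototype, observed),
                            [(-math.pi, math.pi), (0.5, 2.0)])
print(round(angle, 2), round(scale, 2))
```

For classification, the same search could be run once per digit prototype, with the prototype reaching the lowest residual mismatch taken as the predicted class.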
Somnuk Phon-Amnuaisuk, Ahmad Rafi, Thien-Wan Au, Saiful Omar, Nyuk-Hiong Voon
Crowd Simulation in 3D Virtual Environments.
MIWAI 2016: 162-172
Realistic animation of agents' activities in a 3D virtual environment has many useful applications, for example, creative industries, urban planning, military simulation and disaster management. It is tedious to manually pre-program each agent's actions, its interactions with other agents and with the environment. Simulation is a good approach in this kind of domain since complex global behaviors emerge from the local interactions. We simulate a crowd movement using a multi-agent approach where each agent is situated in the virtual environment. An agent can perceive and interact with other agents and with the environment. Complex behaviors emerging from these interactions are from local rules and without any central control. These behaviors reveal the complexity of the domain without explicitly programming the system. In this work, we investigate (i) the navigation of the agents and (ii) the corresponding animations of each agent's behaviors. Simulation results under different parameters are presented and discussed.
Somnuk Phon-Amnuaisuk, Saiful Omar, Thien-Wan Au, Rudy Ramlie
Mathematics Wall: Enriching Mathematics Education Through AI. ICSI (3) 2015: 309-317
We present the progress of our ongoing research titled Mathematics Wall which aims to create an interactive problem solving environment where the system intelligently interacts with users. In its full glory, the wall shall provide answers and useful explanations to the problem-solving process using artificial intelligence (AI) techniques. In this report, we discuss the following components: the digital ink segmentation task, the symbol recognition task, the structural analysis task, the mathematics expression recognition, the evaluation of the mathematics expressions and finally present the results. We then present and discuss the design decisions of the whole framework and subsequently, the implementation of the prototypes. Finally, future work on the explanation facility is discussed.
Somnuk Phon-Amnuaisuk
Investigating a hybrid of Tone-Model and Particle Swarm Optimization techniques in transcribing polyphonic guitar sound.
Appl. Soft Comput. 29: 211-220 (2015)
In this article, we describe a novel polyphonic analysis that employs a hybrid of Tone-Model (TM) and Particle Swarm Optimization (PSO) techniques. This hybrid approach exploits the strengths of model-based and heuristic search approaches. The correlations between each monophonic Tone-Model and the polyphonic input are used to predict relevant pitches such that the aggregations of the pitches' Tone-Models are able to describe the harmonic contents of the polyphonic input. These aggregations are then refined using PSO. PSO heuristically searches for a local optimal aggregation in which some Tone-Models suggested earlier may be excluded from the final best aggregation. We present and discuss the design of our approach. The experimental results from the proposed hybrid approach are compared and contrasted with the Non-Negative Matrix Factorization (NMF) technique. A performance comparison between synthesized guitar sound and acoustic guitar sound is discussed. The experimental results confirm the potential of TM-PSO in the polyphonic transcription task.
Noor Deenina Haji Mohamed Salleh, Somnuk Phon-Amnuaisuk
Quantifying aesthetic beauty through its dimensions: a case study on trochoids.
IJKESDP 5(1): 51-64 (2015)
This paper examines the aesthetic dimensions of the patterns generated from trochoids. A virtual trochoid was implemented and its parameters were varied to generate various patterns. These patterns were evaluated by a group of 101 participants. Inspired by Birkhoff's concept of measuring aesthetics as the ratio of order to complexity, we calculated order as the quality related to composition and complexity as the quality related to the intricacy of the structure. The result of this experiment suggests that the aesthetic measure was able to predict at least half of the patterns preferred by the participants and that those with art experience were more developed in considering a balanced composition. There were also certain patterns that both the experienced and non-experienced participants agreed on. This was further analysed using Arnheim's theory of compositional weight, where size, colour, negative space and central perceptual force determine whether the trochoids had a sense of unity in their composition. To quantify these factors, the patterns' ink density, ink distribution and ink gradient were measured. Our findings suggest that Birkhoff's aesthetic measure reveals useful aesthetic information of trochoid patterns and that patterns that are cohesive appear to be more appealing.
Somnuk Phon-Amnuaisuk
Evolving and Discovering Tetris Gameplay Strategies.
KES 2015: 458-467
This work is motivated by one of the important characteristics of an intelligent system: the ability to automatically discover new knowledge. This work employs an evolutionary technique to search for good solutions and then employs a data mining technique to extract knowledge implicitly encoded in the evolved solutions. In this paper, Genetic Algorithm (GA) is employed to evolve a solution for randomly generated tetromino sequences. In contrast to previous works in this area where an evolutionary strategy was employed to evolve weights (i.e., preferences) of predefined evaluation functions which were then used to determine players' actions, we directly evolve the gameplay actions. Each chromosome represents a plausible gameplay strategy and its fitness is evaluated by simulating the actual gameplay using gameplay instructions from each chromosome. In each simulation, 13 attributes relevant to the gameplay, i.e., contour patterns and actions of each tetromino, are recorded from the best evolved games. This produces 6583 instances, to which we then apply the Apriori algorithm to extract association patterns. The result illustrates that sensible gameplay strategies can be successfully extracted from evolved games even though the GA was not informed about these gameplay strategies.
Somnuk Phon-Amnuaisuk, Ramaswamy Palaniappan
Exploring Swarm-based Visual Effects
IES 2015: 333-341
In this paper, we explore the visual effects of animated 2D line strokes and 3D cubes. A given 2D image is segmented into either 2D line strokes or 3D cubes. Each segmented object (i.e., line stroke or each cube) is initialised with the position and the colour of the corresponding pixel in the image. The program animates these objects using the boid framework. This simulates a flocking behavior of line strokes in a 2D space and cubes in a 3D space. In this implementation the animation runs in a cycle from the disintegration of the original image to a swarm of line strokes or 3D cubes, then the swarm moves about and then integrates back into the original image.
Azhan Ahmad, Somnuk Phon-Amnuaisuk, Peter David Shannon
Emulating Pencil Sketches from 2D Images.
SCDM 2014: 571-580
In this paper we present a pixel-based approach to the production of pencil sketch style images. Input pixels are mapped, using their intensity via a texture-map, to the output sketches. Conceptually, pixels are grouped into regions and the texture obtained from the texture-map is applied to the output image for a given region. The hatching and cross-hatching textures give the resultant images the likeness of pencil sketches. By altering the texture-map applied during the transformation, good results can be obtained, often closely mimicking human sketches. We present details of our approach and give examples of sketches. In future work, we wish to enrich the texture-maps so that the texture could better reflect or hint at the surface properties of objects in the scene (e.g., hardness, softness, etc.).
Somnuk Phon-Amnuaisuk
GA-Tetris Bot: Evolving a better Tetris gameplay using adaptive evolution scheme
ICONIP (3) 2014: 579-586
Genetic Algorithm (GA) is employed to evolve a solution for any given tetromino sequence. In contrast to previous works in this area where an evolutionary strategy was employed to evolve weights (i.e., preferences) of predefined evaluation functions which then were used to determine players’ actions, we directly evolve the actions. Each chromosome represents a plausible gameplay strategy and its fitness is evaluated by simulating the game and rating the gameplay quality using two fitness evaluation approaches: evaluating the whole board at once, and evaluating local parts of the board that are expanded to the whole board as the evolution progresses. We compare the results of these two evaluation tactics and also compare the evolved gameplay with actual human gameplay.
Somnuk Phon-Amnuaisuk
Handling a Dynamic Mixture of Sources in Blind Source Separation Tasks.
TAAI 2013: 211-216
We investigate an audio scene consisting of two main sound sources: (i) instrumental music and (ii) speech sound. To date, independent component analysis (ICA) has emerged as a powerful technique for blind source separation tasks. However, ICA does not handle a dynamic mixture of sources. In this paper, we investigate this issue and propose a two-pass framework: in the first pass, the system segments the mixed-source input into different chunks based on the similarity of the audio features; in the second pass, the system applies ICA to each segmented chunk. We argue that different mixtures of sources have different audio characteristics. These characteristics can be extracted using machine learning techniques. The extracted features are used to segment the mixed-source input into different chunks. Performing source separation on these chunks yields a better extraction of the original sources than performing source separation without segmentation. We present the framework, experimental design and results from our proposed approach.
Somnuk Phon-Amnuaisuk
Transcribing Bach Chorales: Limitations and Potentials of Non-Negative Matrix Factorisation.
EURASIP J. Audio, Speech and Music Processing 2012: 11 (2012)
https://asmp-eurasipjournals.springeropen.com/track/pdf/10.1186/1687-4722-2012-11.pdf
This article discusses our research on polyphonic music transcription using non-negative matrix factorisation (NMF). The application of NMF in polyphonic transcription offers an alternative approach in which observed frequency spectra from polyphonic audio could be seen as an aggregation of spectra from monophonic components. However, it is not easy to find accurate aggregations using a standard NMF procedure since there are many ways to satisfy the factoring of V ≈ WH. Three limitations associated with the application of standard NMF to factor frequency spectra are (i) the permutation of transcription output; (ii) the unknown factoring r; and (iii) the factoring W and H that have a tendency to be trapped in a sub-optimal solution. This work explores the uses of the heuristics that exploit the harmonic information of each pitch to tackle these limitations. In our implementation, this harmonic information is learned from the training data consisting of the pitches from a desired instrument, while the unknown effective r is approximated from the correlation between the input signal and the training data. This approach offers an effective exploitation of the domain knowledge. The empirical results show that the proposed approach could significantly improve the accuracy of the transcription output as compared to the standard NMF approach.
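The factoring V ≈ WH referred to in the abstract above can be sketched with the standard multiplicative update rules for the squared Euclidean error. This is the plain NMF baseline only, not the harmonic heuristics proposed in the article; the function names, the random initialisation and the fixed rank `r` are assumptions made for illustration.

```python
import random


def matmul(a, b):
    # Plain-Python matrix product of two lists of rows.
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def frob_error(v, w, h):
    # Squared Frobenius norm of V - WH.
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2
               for i in range(len(v)) for j in range(len(v[0])))


def nmf(v, r, iters=200, seed=0, eps=1e-9):
    """Factor a non-negative matrix V (n x m) into W (n x r) and H (r x m)."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    # Random positive start; different seeds can reach different factorings.
    w = [[rng.random() + 0.1 for _ in range(r)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(r)]
    for _ in range(iters):
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[k][j] * num[k][j] / (den[k][j] + eps) for j in range(m)]
             for k in range(r)]
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][k] * num[i][k] / (den[i][k] + eps) for k in range(r)]
             for i in range(n)]
    return w, h
```

In a transcription setting, the columns of V would hold magnitude spectra over time, the columns of W would play the role of per-pitch spectral templates, and the rows of H their activations. Because the result depends on the starting point and on the chosen `r`, the sketch also shows where the limitations listed in the abstract arise.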
Edwin Hui Hean Law, Somnuk Phon-Amnuaisuk
Learning and Generating Folk Melodies Using MPF-Inspired Hierarchical Self-Organising Maps.
SEAL 2012: 371-380
One of the elements in human music creativity results from certain features in the brain that allow it to make predictions of events based on information learnt from past music experiences. Inspired by the Memory Prediction Framework (MPF) theory, we propose a method to learn and generate new melodies based on the MPF concept. We first show how an MPF-inspired Hierarchical Self Organizing Map (MPF-HSOM) is used to capture these important features of the brain from the perspective of MPF. This MPF-HSOM is then trained with a selection of melodies taken from a corpus of folk melodies. We then show that by using a prediction algorithm, we are able to generate new melodies based on the trained MPF-HSOM of old melodies. The system proposed here is an abstraction of the features of the brain according to MPF. The results indicate that the system is able to learn and to produce novel melodies of reasonable quality.
Kok-Chin Khor, Choo-Yee Ting, Somnuk Phon-Amnuaisuk
A cascaded classifier approach for improving detection rates on rare attack categories in network intrusion detection.
Appl. Intell. 36(2): 320-329 (2012)
Network intrusion detection research that employs the KDDCup 99 dataset often encounters challenges in creating classifiers that can handle unequally distributed attack categories. The accuracy of a classification model could be jeopardized if the distribution of attack categories in a training dataset is heavily imbalanced, with the rare categories making up less than 2% of the total population. In such cases, the model cannot efficiently learn the characteristics of rare categories, which results in poor detection rates. In this research, we introduce an efficient and effective approach for dealing with the unequal distribution of attack categories. Our approach relies on the training of cascaded classifiers using a dichotomized training dataset in each cascading stage. The training dataset is dichotomized based on the rare and non-rare attack categories. The empirical findings support our argument that training cascaded classifiers using the dichotomized dataset provides higher detection rates on the rare categories as well as comparably higher detection rates for the non-rare attack categories, compared to the findings reported in other research works. The higher detection rates are due to the mitigation of the influence of the dominant categories when the rare attack categories are separated from the dataset.
Choo-Yee Ting, Somnuk Phon-Amnuaisuk
Properties of Bayesian student model for INQPRO.
Appl. Intell. 36(2): 391-406 (2012)
Employing a probabilistic student model in a scientific inquiry learning environment often presents two challenges. First, what constitutes the appropriate variables for modeling scientific inquiry skills in such a learning environment, considering that it practices an exploratory learning approach? Under an exploratory learning approach, students are granted the freedom to navigate from one GUI to another. Second, do causal dependencies exist between the identified variables, and if they do, how should they be defined? To tackle these challenges, this research work adopted the Bayesian Networks framework. Using the framework, two student models were constructed to predict the acquisition of scientific inquiry skills for INQPRO, a scientific inquiry learning environment developed in this research work. The student models can be differentiated by the variables they model and the causal dependencies they encode. A field evaluation involving 101 students was performed to assess the most appropriate structure of INQPRO’s student model. To ensure fairness in model comparison, the same Dynamic Bayesian Network (DBN) construction approach was employed. Lastly, this paper highlights the properties of the student model that provide optimal results for modeling scientific inquiry skill acquisition in INQPRO.
Somnuk Phon-Amnuaisuk, Jirapat Panjapornpon
Controlling Generative Processes of Generative Art.
INNS-WC 2012: 43-52
Computers have now become a crucial component in the process of digital media content creation. Computer programs have been written to generate artistic artifacts, for example, poetry, painting and music. These artifacts could be classified along the spectrum of algorithmic complexity of the programs, where order is at one end and disorder is at the other end. The artifacts classified toward the order end of the spectrum possess a clear structure (e.g., symmetry and tiling), while the artifacts classified toward the disorder end do not have any structure at either the local or the global level (e.g., randomization). Highly ordered or disordered generative art artifacts are generated from efficient algorithms that are simpler than those used to generate artifacts lying between the order and disorder extremes. Control is embedded in the programs and it expresses the intention and strategy of the creative process. In this paper, we investigate the issue of control in generative art. We argue that the control expressed in the programs is a crucial component in a generative process. Hence, the ability to exert control is important in guiding the creative processes to intentionally generate complex and interesting artifacts. We describe the nature of the control observed in the generative process of computer generative art techniques. We then present examples of computer generated painting and discuss the control employed in the generative processes.
Somnuk Phon-Amnuaisuk
Learning Chasing Behaviours of Non-Player Characters in Games Using SARSA.
EvoApplications (1) 2011: 133-142
In this paper, we investigate the application of reinforcement learning in the learning of chasing behaviours of non-player characters (NPCs). One popular method for encoding intelligent behaviours in games is scripting, where the behaviours in the scene are predetermined. Many popular games have their game intelligence encoded in this manner. The application of machine learning techniques to learn non-player character behaviours is still being explored by game AI researchers. The application of machine learning in games could enhance the game playing experience. In this report, we investigate the design and implementation of reinforcement learning to learn the chasing behaviours of NPCs. The design and the simulation results are discussed and further work in this area is suggested.
Abdollah Dehzangi, Somnuk Phon-Amnuaisuk
Fold prediction problem: The application of new physical and physicochemical-based features
Protein and Peptide Letters 18 (2), 174-185 (2011)
One of the most important goals in bioinformatics is the ability to predict the tertiary structure of a protein from its amino acid sequence. In this paper, new feature groups based on the physical and physicochemical properties of amino acids (size of the amino acids' side chains, predicted secondary structure based on normalized frequency of β-Strands, Turns, and Reverse Turns) are proposed to tackle this task. The proposed features are extracted using a modified feature extraction method adapted from Dubchak et al. To study the effectiveness of the proposed features and the modified feature extraction method, we employ AdaBoost.M1, Multi-Layer Perceptron (MLP), and Support Vector Machine (SVM) classifiers, which have been commonly and successfully applied to the protein folding problem. Our experimental results show that the new feature groups, together with the modified feature extraction method, achieve higher protein fold prediction accuracy than previous works found in the literature.
Somnuk Phon-Amnuaisuk
Exploring Particle-Based Caricature Generations
International Conference on Informatics Engineering and Information Science ICIEIS 2011 pp 37-46 (2011)
Computer generated caricatures are commonly created using either line drawing or image warping technique. Two main paradigms employed to automate the caricature generation process are variance exaggeration which exaggerates facial components that deviate from norms; and example-based generation which exaggerates facial components according to provided templates. In this paper, we explore a novel application of an interactive particle-based technique for generating a caricature of a given face. This does not require prior examples but relies on users’ feedback to explore the caricature face space. In this approach, facial feature points are represented as particles and their movements are used to incrementally warp a given face until the desired exaggerations are achieved. We have shown that the proposed approach could successfully provide an interactive means for generating good quality facial caricatures.
Kian Chin Lee, Somnuk Phon-Amnuaisuk, Choo Yee Ting
A comparison of HMM, Naive Bayesian and Markov model in exploiting knowledge content in digital ink: A case study on handwritten music notation recognition
ICME 2010: 202-297 (2010)
The performance of a model is dependent not only on the amount of knowledge available to the model but also on how the knowledge is exploited. We investigate the recognition of handwritten musical notation based on three related probabilistic inference techniques: Hidden Markov Models (HMMs), Markov Models (MMs) and Naïve Bayes (NBs). Music notes are written on a tablet. A sequence of ink patterns representing each symbol is captured and subsequently employed for constructing the HMM, MM and NB models. The proposed approach exploits both global and local information derived from ink patterns; we demonstrate the exploitation of this information via different features employed in different HMMs. The specificity and sensitivity measures of these classification models are compared using unseen test datasets. The findings show that HMMs outperformed the MM and NB models, due to the ability of HMMs to exploit both transitional probability (transition matrix A) and the overall likelihood of the observed events (emission matrix B). Also, HMMs with more hidden states outperformed those with fewer states, since a larger model has more capacity. In conclusion, our approach demonstrated that HMMs can better exploit information extracted from ink patterns than MM or NB models, and are therefore an optimal inference technique for encoding useful information for musical notation representation.
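How an HMM combines the transition matrix A and the emission matrix B when scoring an observation sequence can be sketched with the standard forward algorithm. This is a generic illustration for discrete observations, not the feature pipeline of the paper; `forward_likelihood`, `classify` and the per-symbol model dictionary are hypothetical names, and a non-empty observation sequence is assumed.

```python
def forward_likelihood(pi, A, B, obs):
    """Return P(obs | model) for a discrete HMM via the forward algorithm.

    pi[i]   : probability of starting in hidden state i
    A[i][j] : probability of moving from state i to state j
    B[i][o] : probability of emitting observation symbol o in state i
    obs     : non-empty list of observation symbol indices
    """
    n = len(pi)
    # Initialisation: start in state i and emit the first symbol.
    alpha = [pi[i] * B[i][obs[0]] for i in range(n)]
    # Induction: propagate through A, then weight by the emission in B.
    for o in obs[1:]:
        alpha = [sum(alpha[i] * A[i][j] for i in range(n)) * B[j][o]
                 for j in range(n)]
    return sum(alpha)


def classify(models, obs):
    """Pick the label whose HMM assigns obs the highest likelihood.

    models maps a label to a (pi, A, B) tuple (hypothetical layout).
    """
    return max(models, key=lambda label: forward_likelihood(*models[label], obs))
```

A recogniser along these lines would train one model per notation symbol and label an unseen ink sequence with `classify`. A Markov model or Naïve Bayes classifier corresponds roughly to using only one of the two sources of information that the forward recursion combines.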
Adham Atyabi, Somnuk Phon-Amnuaisuk, Chin Kuan Ho
Navigating a robotic swarm in an uncharted 2D landscape
Applied Soft Computing 10(1): 149-169 (2010)
Navigation is a major issue in robotics because robots need to determine their course of movement. Navigation consists of two essential components: localization and planning. Localization refers to a robot's location with reference to a well-known position inside the map. Planning is the computation of a path through a map that represents the environment. The path is chosen based on the potential of the problem so that the expected destination is reached. A reliable map is therefore essential for navigation; without it, robots cannot accomplish their goals. In navigational approaches, the reliability of the map is challenged by the dynamic and unpredictable nature of real-world applications. It is consequently crucial to implement solutions for searching such environments, that is, those affected by dynamic and noisy constraints. In the present study, two enhanced versions of particle swarm optimization (PSO), called area extension PSO (AEPSO) and cooperative AEPSO (CAEPSO), are employed as decision-makers and movement controllers of simulated robots (hereafter referred to as agents). The agents’ task is to search for survivors in realistic simulations based on real-world hostile situations. This study examines the feasibility of AEPSO and CAEPSO in uncertain and time-dependent simulated environments. The simulations follow a two-phase model of training and testing: agents use knowledge gathered during the training phase in their testing phase. The study addresses the impacts of past knowledge, homogeneity and heterogeneity in robotic swarm search. The results demonstrate the feasibility of CAEPSO as a robot controller and decision-maker.
Somnuk Phon-Amnuaisuk
Investigating Music Pattern Formations from Heterogeneous Cellular Automata
Journal of New Music Research 39(3):253-267 (2010)
https://www.tandfonline.com/doi/abs/10.1080/09298215.2010.481360?src=recsys&journalCode=nnmr20
Patterns observed from melody lines, harmonic movements, textual appearances and formal structures in music pieces characterize individual composition. Composers exploit a large number of possible patterns and creatively compose a new piece of music by weaving various patterns together in a musically intelligent manner. To emulate high-level intelligent behaviours such as music composition skills, knowledge-intensive algorithmic composition approaches have been investigated with some successful outcomes. Nevertheless, a knowledge intensive approach has its limitations in a knowledge elicitation process. This paper discusses the applications of heterogeneous cellular automata (hetCA) in generating chorale melodies and Bach chorales harmonization. The machine learning approach is exploited in the learning of rewrite-rules for cellular automata. Rewrite-rules are learned from music examples using a time-delay neural network (TDNN). After each TDNN (hetCA model) has successfully learned musical patterns from the given examples, new patterns are generated from the hetCA model.
Somnuk Phon-Amnuaisuk
Learning Cooperative Behaviours in Multiagent Reinforcement Learning.
ICONIP (1) 2009: 570-579
We investigated the coordination among agents in a goal finding task in a partially observable environment. In our problem formulation, the task was to locate a goal in a 2D space. However, no information related to the goal was given to the agents unless they had formed a swarm. Furthermore, the goal had to be located by a swarm of agents, not a single agent. In this study, cooperative behaviours among agents were learned using our proposed context dependent multiagent SARSA (CDM-SARSA) algorithms. In essence, instead of tracking the actions of all the agents in the Q-table, i.e., Q(s, a) with a the joint action of all agents, CDM-SARSA tracked only the action a_i of agent i and the context c resulting from the actions of all the agents, i.e., Q_i(s, a_i, c). This approach reduced the size of the state space considerably. Tracking all the agents' actions was impractical since the state space increased exponentially with every new agent added to the system. In our opinion, tracking the context abstracted away unnecessary details, and this approach was a logical solution for multiagent reinforcement learning tasks. The proposed approach for learning cooperative behaviours was illustrated using different numbers of agents and different grid sizes. The empirical results confirmed that the proposed CDM-SARSA could learn cooperative behaviours successfully.
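The per-agent table Q_i(s, a_i, c) described in the abstract above can be sketched as an ordinary SARSA update keyed on (state, own action, context). The abstract does not specify how the context c is encoded, so the class below treats it as an opaque hashable label; `ContextSarsaAgent`, the two counting helpers and the learning parameters are hypothetical and only illustrate the idea.

```python
from collections import defaultdict


class ContextSarsaAgent:
    """One agent's table Q_i(s, a_i, c), updated with the SARSA rule."""

    def __init__(self, alpha=0.1, gamma=0.9):
        self.alpha = alpha          # learning rate (assumed value)
        self.gamma = gamma          # discount factor (assumed value)
        self.q = defaultdict(float)  # unseen entries default to 0.0

    def update(self, s, a, c, reward, s_next, a_next, c_next):
        # On-policy TD target: uses the action actually chosen next.
        target = reward + self.gamma * self.q[(s_next, a_next, c_next)]
        self.q[(s, a, c)] += self.alpha * (target - self.q[(s, a, c)])
        return self.q[(s, a, c)]


def joint_action_count(n_actions, n_agents):
    # Entries per state when the table tracks every agent's action.
    return n_actions ** n_agents


def context_action_count(n_actions, n_contexts):
    # Entries per state when the table tracks own action and context only.
    return n_actions * n_contexts
```

With 4 actions and 5 agents, a joint-action table needs 4^5 = 1024 entries per state, whereas a table over the agent's own action and, say, 2 context values needs 8. This is the kind of reduction the abstract attributes to tracking the context instead of all actions.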
Somnuk Phon-Amnuaisuk
Classify Event-Related Motor Potentials of Cued Motor Actions.
ICONIP (1) 2008: 153-160
Motor related potentials are generated when an individual is engaged in a task involving motor actions. The transient postsynaptic potential can be observed in the recorded electroencephalogram (EEG) signal. Properties derived from the time domain and the frequency domain, such as event-related motor potential and suppression in band power, can be useful EEG features. In this report, lateralised motor potential (LMP) and band power ratio (BPR) are used to classify cued left-finger and right-finger movements. Two classifiers are employed in this experiment: a minimum distance classifier (MDC) and a normal density Bayes classifier (NDBC). The results show that the features from LMP have more discriminative power than those from BPR. They also show that NDBC has a perfect performance in this task.
Somnuk Phon-Amnuaisuk, Edwin Law Hui Hean, Ho Chin Kuan
Evolving Music Generation with SOM-Fitness Genetic Programming.
EvoWorkShops 2007: 557-566
Most real life applications have huge search spaces. Evolutionary Computation provides the advantage of parallel exploration of many parts of the search space. In this report, Genetic Programming is the technique we use to search for good melodic fragments. It is generally accepted that knowledge is a crucial factor in guiding search. Here, we show that a SOM can be used to facilitate the encoding of domain knowledge into the system. The SOM was trained with music of the desired quality and was used as a fitness function. In this work, we are not interested in music with complex rules but in the simple music employed in computer games. We argue that this technique provides a flexible and adaptive means to capture the domain knowledge in the system.
Somnuk Phon-Amnuaisuk, Alan Smaill, Geraint Wiggins
Chorale Harmonization: A view from search control perspective
Journal of New Music Research 35(4): 279-305
https://www.tandfonline.com/doi/abs/10.1080/09298210701458835
Chorale harmonization is one of the most popular problem domains for AI-music researchers. The problem has been approached with various techniques ranging from a knowledge intensive approach on one end to a data intensive approach on the other end. Various approaches offer different strengths and pose different weaknesses. In this report, we explain our knowledge intensive approach. Here, we view chorale harmonization from a search control perspective. In this perspective, the harmonization activities are discretely captured as states. These states form a state space, which cannot be exhaustively examined since it is intractable by nature. To overcome the intractability problem, we propose a careful knowledge engineering approach. The approach offers a useful language specialized for the chorale harmonization task. This language controls the search at the meta-level through its three primitives, namely: rules, tests and measures. The harmonization outputs obtained from this method are very promising. The approach also offers a very promising application in the AI-education area.
Keh-Siong Chee, Somnuk Phon-Amnuaisuk
Intelligent Learning Environment: Building Hybrid System from Standard Application and Web Application.
ICALT 2005: 506-510
In this paper, we explore the idea of an intelligent learning environment (ILE) by building a system for teaching and learning music in a Web-based environment. Our system is a framework that utilises existing Web architecture. Instead of using a standard Web browser as our client tool, we develop our own. We address some design issues raised by this shift, together with their solutions. In a nutshell, this environment allows us to gain more control in monitoring students' learning activities within a single environment while shielding instructors from large amounts of computer code.
Somnuk Phon-Amnuaisuk
Control Language for Harmonisation Process.
ICMAI 2002: 155-167
An approach to the automatic harmonisation of chorales is described. This involves using musical rules about this style of music, and controlling the use of these rules when searching for a harmonisation. To gain more control over the search process, control knowledge is explicitly represented, allowing a hierarchical decomposition of the harmonisation process. When the control knowledge is made explicit, the control structures can be modified with ease and flexibility. When the control is hierarchically structured, the effects of its application are clear. Our explicitly structured control model provides flexibility, and automatically generates natural harmonisations. In this paper, we present our control language and some harmonisation outputs to show the flexibility of the language.