Widely deployed personal identification technologies can be imitated. Fingerprint recognition, for example, can be fooled with a simple silicone fingerprint. In the simplest case, a personal ID card is defeated merely by handing it to someone else. I therefore focused on the security aspects of personal identification.
I also wanted to make sure the system could not be exploited intentionally. I once had my university attendance recorded while I was absent; this is a security problem induced by convenience. So I considered convenience while preventing intentional abuse: if individuals can legitimately get a benefit, such as passing a highway tollgate automatically, there is no reason to abuse the system.
The goal of this project is personal identification through arm movement patterns while walking. Imagine you are at an airport transit gate or a library book-loan entrance: identification completes after just a short stretch of walking, without taking out a card.
Most people carry a smartphone, so I implement recognition of arm swings using the sensor data every Android phone already produces daily. The main goals are:
1. Ensure security through personal identification that cannot be imitated.
2. Ensure convenience through distinguishable individual identification.
3. Ensure security that cannot be intentionally exploited.
I set a specific target for the project: identify anyone before they finish walking ten steps.
The input data is a large volume of azimuth and acceleration values from the Android device, recorded as the arm moves.
After the motion patterns are analyzed through machine learning, per-individual weights are obtained as a result, and the individual is finally identified from them.
Differences in arm length due to height, in acceleration due to walking speed, and in arm angle are all factors that produce distinguishable differences.
2.1.1 Transferring acceleration data
The Android device sends real-time acceleration values to a Windows server via Bluetooth.
2.1.2 Get approval from the server
The app waits for an approval signal from the server, then stores the identification parameters received from the server.
2.2.1 Communicating with Android using Bluetooth
The server configures communication channels with multiple Android devices.
2.2.2 Storing and managing data by device ID
The server configures a profile to manage acceleration data per device, and compares incoming data against the profile when a device connects.
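As a rough illustration (the function and variable names here are my assumptions, not the project's actual code), the per-device profile store on the server might look like:

```python
# Minimal sketch of per-device profile management (hypothetical names).
profiles = {}  # device ID -> list of acceleration samples

def store_sample(device_id, sample):
    """Append an acceleration sample (e.g. an (x, y, z) tuple) to the
    device's profile, creating the profile on first contact."""
    profiles.setdefault(device_id, []).append(sample)

def has_profile(device_id):
    """On connect, decide whether to compare against an existing profile
    or to start recording a new one."""
    return device_id in profiles

# Example: first sample from a newly connected device.
store_sample("galaxy-01", (0.1, 0.2, 9.8))
```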
2.2.3 Pass the parameters to the deep learning server
The server forwards the parameters to the deep learning server and chooses whether to create a profile for the device or to identify it.
2.3.1 Build the learner
The data was processed into test sets and test labels so that the matrix of acceleration values could be learned.
2.3.2 Saving and loading learned models
The learned model is stored to save time and recalled when the corresponding device connects.
2.3.3 Calculate the loss
After walking data is received, the loss is calculated by comparing it with the model.
3.1.1 Modeling singleton activities
Although the activity context changes dynamically, the communication channel must be preserved. So instead of relying on the constructor path that sequentially calls 'onCreate()', 'onStart()', and 'onResume()' as defined by the activity lifecycle, I applied the singleton design pattern: each activity is initialized only once, and its instance is maintained afterwards.
Profile information and the acceptance state are stored in the main singleton activity; each activity then reads them in its 'onResume' procedure.
3.2.1 Passing parameters to jupyter server
The Windows server creates a profile from the client data and writes it to a file. It then sends an execution command to the deep learning server and transmits the parameters.
3.3.1 RNN test sets labeling
This part was the core of the deep learning. To learn, the data must be composed of sets and their corresponding label sets. I built the following learning data from the acceleration matrix produced by the Android device.
I set a unit stride size to be analyzed per unit time, then built the test sets using the transposed matrix. Because the label set must predict the next stride value, I concatenated one additional value after each test set; that is, each label set holds the next proper value corresponding to its test set.
The test sets and label sets now have a number of columns equal to the stride unit size.
Finally, label[n] is the matrix whose values the model is trained to predict from test[n]. The test and label matrices at each index are learned, and RNN training proceeds so as to minimize the difference over all test values.
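The windowing described above can be sketched as follows; the stride value and all names are illustrative assumptions, not the project's actual code.

```python
STRIDE = 4  # unit stride size: samples analysed per unit time (assumed value)

def make_sets(series, stride=STRIDE):
    """Slide a window of length `stride` over one acceleration axis.
    test[n] holds `stride` consecutive values; label[n] is the same
    window shifted forward by one sample, so each label column is the
    'next proper value' for the matching test column."""
    tests, labels = [], []
    for i in range(len(series) - stride):
        tests.append(series[i:i + stride])
        labels.append(series[i + 1:i + 1 + stride])
    return tests, labels

tests, labels = make_sets([0.1, 0.3, 0.2, 0.5, 0.4, 0.6, 0.5])
```

With this shape, minimizing the difference between the model's output for test[n] and label[n] trains the network to predict the next acceleration value at every position in the window.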
3.3.2 Saving and loading RNN learned models
An RNN data model is created for each of the X, Y, and Z axes, and each axis model has three parameters. Once the epoch count, hidden layer size, and learning rate are set, training starts with propagation.
The scheme of the model is as follows.
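A minimal sketch of the per-axis arrangement and the save/reload step might look like the following. Pickle, the hyperparameter values, and all names are my assumptions; the project may well use a framework's own checkpointing instead.

```python
import pickle

def new_model(epochs=100, hidden=32, learning_rate=0.01):
    """One RNN model per axis, carrying the three hyperparameters
    named in the text. Structure and values are illustrative."""
    return {"epochs": epochs, "hidden": hidden,
            "learning_rate": learning_rate,
            "weights": None}  # weights filled in by training

models = {axis: new_model() for axis in ("x", "y", "z")}

def save_models(models, path):
    """Persist all per-axis models so training time is not repeated."""
    with open(path, "wb") as f:
        pickle.dump(models, f)

def load_models(path):
    """Recall the stored models when a known device reconnects."""
    with open(path, "rb") as f:
        return pickle.load(f)
```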
3.3.3 Calculating the loss
Finally, the loss between the model and the actual walking data is calculated. To obtain the predicted value, walking data of length (walking unit - 1) is fed in, and the predicted value for the remaining length of 1 is concatenated. The loss is then the difference between the actual value and the predicted value; the losses are summed, divided by the total acceleration range, and expressed as a percentage.
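The loss step above might be sketched like this; the `predict` stand-in replaces the learned RNN, and all names are assumptions rather than the project's code.

```python
def loss_percent(windows, predict, acc_range):
    """For each window, feed the first (walking unit - 1) samples to the
    model, compare the predicted final sample with the actual one, sum
    the differences, and express them as a percentage of the total
    acceleration range."""
    total = 0.0
    for window in windows:
        predicted = predict(window[:-1])  # model sees walking unit - 1 values
        total += abs(window[-1] - predicted)
    return 100.0 * total / acc_range

# Naive stand-in model: predict that the next value repeats the last one.
persistence = lambda history: history[-1]
```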
In the first attempt, I used a logarithm to reduce the percentage differences in accuracy. With the logarithm there was almost no difference between users; most users showed about 98.5% accuracy against every model, which was useless for discriminating identities.
I then applied the square of the loss: the squared losses were summed and divided by the observed acceleration range. While this was much better at identifying users, it still did not extract user-specific features sufficiently.
The final method was to divide the sum of squared losses by the range between the minimum and maximum values of the current data, rather than by the total acceleration range. This produced a much more precise result. I ran the experiment several times with three distinct people.
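The three attempts can be summarized as follows. The function names and the exact form of the logarithm are my assumptions; only the normalizing denominators reflect what the text describes.

```python
import math

def loss_log(errors, total_range):
    # 1st attempt: a logarithm flattened the differences between users.
    return math.log(1.0 + sum(abs(e) for e in errors)) / total_range

def loss_sq_total(errors, total_range):
    # 2nd attempt: squared errors over the total acceleration range.
    return sum(e * e for e in errors) / total_range

def loss_sq_local(errors, data):
    # Final method: squared errors over the min-max range of the
    # current data, which separated the users most precisely.
    return sum(e * e for e in errors) / (max(data) - min(data))
```

Squaring amplifies large deviations, and dividing by the local range keeps the score comparable across recordings with different motion amplitudes.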
The following table compares the data of three people with distinct arm-movement characteristics when walking.
Joong-soo: In Joong-soo's case the loss was the lowest. The larger the arm motion, the better the identification: the difference from slight movements is greater and is separated more noticeably by the loss-squared approach.
Hyun-ryung: Hyun-ryung, who has the smallest arm movement, showed a consistently high recognition rate everywhere. Data whose values sit near the average scores highly against any model, so the system failed to distinguish her.
Young-sik: Young-sik was recognized to a medium degree. As with Joong-soo, the loss was smallest against his own model, so the system succeeded in distinguishing him at a medium level.
As a result, the larger the arm motion, the higher the recognition rate. This shows that the loss-squared calculation method was effective: the larger the gap between actual and expected data, the larger the squared value. People with big motions got satisfying results.
Moreover, I suspect the Android sensor values were not precise enough to analyze fine-grained data. The delivery cycle of the Android sensor appeared to be non-periodic: when the acceleration changes rapidly, values are sent more frequently, but when the change is small, the period between values grows. The sampling rate should be constant, and a more precise sensor would be needed to analyze fine steps perfectly.
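One possible software workaround for the non-periodic delivery (my suggestion, not something the project implemented) is to resample the timestamped stream to a fixed period by linear interpolation before windowing:

```python
def resample(samples, period):
    """samples: list of (timestamp, value) pairs sorted by timestamp.
    Returns values linearly interpolated at t0, t0 + period, t0 + 2*period, ...
    so downstream windowing sees a constant sampling rate."""
    out, i = [], 0
    t = samples[0][0]
    while t <= samples[-1][0]:
        # Advance to the segment [t0, t1] that contains t.
        while samples[i + 1][0] < t:
            i += 1
        (t0, v0), (t1, v1) = samples[i], samples[i + 1]
        out.append(v0 + (v1 - v0) * (t - t0) / (t1 - t0))
        t += period
    return out
```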
Analyzing strides requires a great deal of data and training time. I regret having left testing until the end of the schedule; building one model after another and filing up their data took a long time. I felt once again how important it is in deep learning both to find the right learning model and to find optimal values through long training.
The Android system is also applicable to smartwatches and wearable devices. Precise motion-analysis systems will process and manipulate big data for sophisticated customization. Using TensorFlow, NLTK (Natural Language Toolkit), and similar tools, numerically sophisticated programs can be created; this will require learning the mathematics behind deep learning and data processing.
My goal is to develop an intelligent security solution. I would like to develop an intrusion detection system that automatically blocks attacks of known patterns and learns to apply those patterns to the next attack. It could also be used for malicious-code analysis. I think the key is machine learning, or deep learning. Through this project, I would like to learn professional AI technologies and make it a stepping stone for future development.
- Docker for Windows download (and other OSes)
- http://pyrasis.com/Docker/Docker-HOWTO
- Running a Jupyter server
- https://plot.ly/python/ipython-notebook-tutorial/
- RNN tutorial
- http://aikorea.org/blog/rnn-tutorial-1/
- install NLTK
- http://www.nltk.org/install.html
- install theano
- http://deeplearning.net/software/theano/install.html
- "Machine Learning/Deep Learning for Everyone" ("모두를 위한 머신러닝/딥러닝 강의") | hunkim (Kim Sung-hun)
- http://hunkim.github.io/ml/
Good lecture videos in Korean. The lecture notes draw heavily on Stanford lecture slides, and the course organizes machine learning well from the very beginning.
- Stanford computer science artificial intelligence course 'CS231n'
- http://cs231n.stanford.edu/syllabus.html
It is systematic: you can follow the Stanford curriculum, including slides and assignment handouts. I envied students who get such good lectures at a good university.
- RNN introduction | Original : colah's blog
- https://brunch.co.kr/@chris-song/9
- http://colah.github.io/posts/2015-08-Understanding-LSTMs/
Good for building up the concept of the RNN (Recurrent Neural Network). You can grasp the basics, such as ReLU, without the mathematical formulas, and the LSTM structure among RNNs is explained in detail and is easy to understand.
- A site that beautifully visualizes the process of machine learning.
- http://www.r2d3.us/visual-intro-to-machine-learning-part-1/
It stays beautiful all the way down to the end as you scroll.
- The utility of LSTM
- http://karpathy.github.io/2015/05/21/rnn-effectiveness/
This is the original link. Beyond the RNN description, it shows technological achievements, especially with LSTM (Long Short-Term Memory): there are sentences generated by models trained on Shakespeare's writing, and various applications are shown visually. The original text was difficult, so I read it over a long period, but the content is good.
- Speeding up with Theano & GPU
- http://www.wildml.com/2015/09/speeding-up-your-neural-network-with-theano-and-the-gpu/
It shows theoretical aspects such as Theano's computation graph in great detail. I skipped the performance tuning part because that was not my problem.