3-Year Plan 2016-2018
These 3 years have been dedicated to the Knomee project.
Knomee is an iOS mobile app for "self-tracking with sense": Knomee's ambition is to make self-tracking "a fun journey to self-discovery". There are multiple challenges:
- Design & Usability challenges. Self-tracking is good for you (this is a fact proven by science), but it is boring and most people stop after a few attempts.
- Statistical and methodological challenges with a small amount of data that is self-collected and hence subject to biases.
- Machine Learning challenges in attempting to forecast from a very short time series. Forecasting from time series is always challenging, and doubly so when the data is personal (very often random data with no structure) and the series is short (50 to 200 data points), because of the obvious risk of over-fitting.
This project has been so intense that I am writing this 3-year plan in 2018 as a summary. It was a big adventure (writing a mobile app is a big project) with many scientific challenges. As I write these lines, I can say that I have learned a lot, but I would not call Knomee a success. However, this is a long-run race ... and the app has been very useful to me personally. Another side effect of the Knomee workload is that I have dropped my previous book project about Enterprises, Complexity and Serious Gaming.
The research agenda for 2016-2018 has three parts: the science/methodology part, how to write an iOS app, and how to perform robust machine learning on small time series.
1. Quantified Self and Causality
The first theme of my research is to better understand causality and what can be done/expected from self-tracking data.
The key reference for my research is "The Book of Why: the new science of cause and effect" by Judea Pearl. The central concept of Knomee is a quest, which is a set of one major (target) time series and up to three (factor) time series. A quest is a simple causality diagram (in Pearl's sense), that is, a hypothesis that your target tracker can be explained (partially) by the factor trackers.
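To make the notion of a quest concrete, here is a minimal Python sketch of the data structure described above. Knomee itself is written in Swift, and the names `Tracker`, `Quest`, `MAX_FACTORS` and `edges` are hypothetical, chosen for illustration only.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

MAX_FACTORS = 3  # the text allows up to three factor trackers per quest


@dataclass
class Tracker:
    """One self-tracked time series (hypothetical representation)."""
    name: str
    values: List[float] = field(default_factory=list)


@dataclass
class Quest:
    """Hypothesis that `target` is partially explained by `factors`."""
    target: Tracker
    factors: List[Tracker]

    def __post_init__(self) -> None:
        if len(self.factors) > MAX_FACTORS:
            raise ValueError(f"a quest has at most {MAX_FACTORS} factors")

    def edges(self) -> List[Tuple[str, str]]:
        """Arrows of the causal diagram: each factor points to the target."""
        return [(f.name, self.target.name) for f in self.factors]
```

For example, `Quest(Tracker("sleep"), [Tracker("coffee"), Tracker("exercise")]).edges()` yields the two arrows coffee -> sleep and exercise -> sleep.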
There are three possible situations with a quest:
- The data does not support your causality hypothesis. The hypothesis may be true or false, but your data is not different from random noise. The first goal of Knomee is to detect this situation since many quests are not grounded. Let me restate this because it is critical: the first behavior that is expected from self-tracking data analysis is to recognize noise.
- There are patterns in your data, hence if your causality diagram is correct (your hypothesis) the correlation may be seen as "a consequence of causality". This is interesting to know but it should come with a warning.
- The patterns in the data are strong enough to be used for forecasting (at a weak level of confidence and precision). This is a further step since it may be seen as a confirmation that your causality diagram is correct. It is tightly linked to Granger causality.
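One generic way to recognize the first situation (data that cannot be told apart from noise) is a permutation test on the correlation between a factor and the target. The sketch below is an illustration under stated assumptions, not the method used in Knomee: the function names are hypothetical, and shuffling assumes the observations are exchangeable, which autocorrelated self-tracking series only approximate.

```python
import math
import random
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 when either series is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def permutation_p_value(factor: Sequence[float], target: Sequence[float],
                        n_perm: int = 2000, seed: int = 0) -> float:
    """Share of shuffled factors that correlate with the target at least
    as strongly as the real factor does (two-sided, on |r|)."""
    rng = random.Random(seed)
    observed = abs(pearson(factor, target))
    shuffled = list(factor)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(pearson(shuffled, target)) >= observed:
            hits += 1
    # The +1 terms keep the estimate away from an impossible p-value of 0.
    return (hits + 1) / (n_perm + 1)
```

A large p-value means the observed correlation is what shuffled (structure-free) data would produce, so the quest should be reported as noise rather than as a pattern.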
The challenge here is that the self-tracking time series are very small, which means that statistical measures such as correlation are imprecise and that testing hypotheses based on sub-sampling is close to impossible.
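The imprecision of correlation on short series can be quantified with the classical Fisher z-transformation. The sketch below is a textbook approximation that assumes independent, roughly bivariate-normal observations (a strong assumption for self-tracking data); the function name is hypothetical.

```python
import math
from typing import Tuple


def correlation_confidence_interval(r: float, n: int,
                                    z_crit: float = 1.96) -> Tuple[float, float]:
    """Approximate 95% confidence interval for a Pearson correlation r
    measured on n points, using the Fisher z-transformation."""
    if n <= 3:
        raise ValueError("need more than 3 data points")
    z = math.atanh(r)                # Fisher transform of r
    se = 1.0 / math.sqrt(n - 3)      # standard error in z-space
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)
```

With 50 data points, a measured correlation of 0.3 gives an interval of roughly 0.02 to 0.53: the data is compatible with almost no link as well as with a fairly strong one.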
Here are the other books that have influenced me:
- "Why: A Guide to Finding and Using Causes" by Samantha Kleinberg.
- "Positive Computing - Technology for Wellbeing and Human Potential" by Rafael Calvo and Dorian Peters.
- "Machine Learning for the Quantified Self" by Mark Hoogendoorn and Burkhardt Funk
2. How to develop an iOS mobile application
This was totally new for me: I have been developing software since 1984 on desktop computers, but I had never written code for a mobile phone myself, even though I have been involved with mobile app development at Bouygues Telecom and AXA for the past 10 years. This was actually the first reason for this adventure: to get first-hand experience of mobile app development that would help me do my job better as a manager.
This is not a research project per se, it is more of an enabler for the two other parts which have a science/research component, especially the next part about machine learning. However, since software has always been a major topic of interest for me, I find that gathering this first-hand experience about mobile development was necessary for the years (and books) to come.
I have picked iOS as a mobile development platform for many reasons:
- Writing code with Swift: Swift is closer to CLAIRE than Java, and moving from CLAIRE to Swift was easier.
- iPhones are more popular with the intended target market for a self-tracking application.
- The UI libraries are more integrated and the Apple guidelines make it easier for me (I am an "algorithm geek" with close to no UI experience).
- Xcode is a wonderfully integrated tool, which is very convenient for someone like me who has many lives and very little time for my "pet software development projects" however exciting they might be.
I learned about mobile app development from many sources:
- I attended a few MOOCs, namely the Stanford one (CS193P).
- I bought a couple of technical books - they were instrumental to my success :)
- I spent a lot of time on Stack Overflow!
The key areas that I had to master:
- How to design a graphical UI with native features (as opposed to the Xcode UI library). The most critical component is the horizontal/vertical slider that makes entering data more efficient.
- HealthKit integration. Knomee imports data from trackers (e.g. Apple Watch) or other sources of data that are integrated into HealthKit.
- Geolocation, iCloud, and a few other Apple services.
3. Machine Learning with EMLA
This last part is the most research-oriented one, and a follow-up on previous research themes from the last 15 years.
EMLA stands for Evolutionary Machine Learning Agents, and is a child of GTES, which was developed in the previous years.
There have been three steps with the development of ML for Knomee:
- 2016 was the year to develop a first version of EMLA, which built on work done 15 years earlier when trying to forecast TV audience, while also reusing the algebra framework from "Learning hybrid algorithms for vehicle routing problems".
- 2017 was spent improving the robustness of forecasting, as more data became available. It became clear that the 2016 approach was influenced too much by my OR background and suffered badly from overfitting. I introduced regularization and simplified the algebra a little bit. I moved away from the "genetic" evolutionary algorithm of 2016 (stationary and too complex) and implemented RIES (Randomized Incremental Evolutionary Search) as a simpler (and incremental) alternative.
- 2018 is the last piece of the opus. Because the application has been available on the App Store and running well for a year, I have accumulated more data. I have further simplified the algebra and the randomized algorithm, and added new forms of control for algebra term generation ... which turned out to be a bad idea.
The cornerstone contribution was the development of ITP (Iterative Training Protocol), which is how I evaluate the various ML algorithms. This is nothing more than simulating a Knomee user's sequence of sessions: I feed the incremental algorithms with a sequence of measures and see how good their forecast is (based only on the data from the past).
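The idea behind ITP can be illustrated with a simple walk-forward evaluation loop. This is a minimal sketch under simplifying assumptions, not the actual ITP code: it is univariate (no factor trackers), uses mean absolute error as the score, and the names `iterative_evaluation`, `last_value`, `running_mean` and `warmup` are hypothetical.

```python
from typing import Callable, List, Sequence

# A forecaster sees only the past values and returns a guess for the next one.
Forecaster = Callable[[Sequence[float]], float]


def last_value(history: Sequence[float]) -> float:
    """Naive forecaster: tomorrow looks like today."""
    return history[-1]


def running_mean(history: Sequence[float]) -> float:
    """Naive forecaster: tomorrow looks like the average of the past."""
    return sum(history) / len(history)


def iterative_evaluation(series: Sequence[float], forecaster: Forecaster,
                         warmup: int = 10) -> float:
    """Mean absolute error of one-step-ahead forecasts, replaying the series
    one measure at a time as a user's sessions would produce it."""
    if len(series) <= warmup:
        raise ValueError("series must be longer than the warm-up period")
    errors: List[float] = []
    for t in range(warmup, len(series)):
        prediction = forecaster(series[:t])   # only data before step t
        errors.append(abs(prediction - series[t]))
    return sum(errors) / len(errors)
```

Because every forecast uses only earlier measures, an algorithm that over-fits the past is penalized as soon as the next value arrives, which is the point of evaluating this way on short series.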
I have also implemented three classical algorithms to compare with the variants from EMLA: Linear Regression (using time features), k-means clustering, and ARMA (AutoRegressive Moving Average). The good news is that EMLA does better; the bad news (but not surprising) is that the forecasting performance (when supported by the data series) is weak.
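As an illustration of the simplest of these baselines, here is a sketch of a linear regression on time features. The feature set (intercept, linear trend, position in a 7-step cycle) and the small ridge term are assumptions made for this example; the text does not say which time features the actual baseline uses, and all function names are hypothetical.

```python
from typing import List, Sequence


def time_features(t: int, period: int = 7) -> List[float]:
    """Intercept, linear trend, and one-hot position in the cycle
    (the first position is dropped to avoid redundancy with the intercept)."""
    phase = t % period
    return [1.0, float(t)] + [1.0 if phase == k else 0.0 for k in range(1, period)]


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit(values: Sequence[float], period: int = 7, ridge: float = 1e-6) -> List[float]:
    """Least-squares weights via the normal equations; the ridge term keeps
    the system solvable when the series is too short for all features."""
    rows = [time_features(t, period) for t in range(len(values))]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) + (ridge if i == j else 0.0)
            for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * v for r, v in zip(rows, values)) for i in range(k)]
    return solve(xtx, xty)


def predict(weights: Sequence[float], t: int, period: int = 7) -> float:
    """Forecast the value at time step t."""
    return sum(w * f for w, f in zip(weights, time_features(t, period)))
```

With eight weights for a series of 50 to 200 points, even this small model can over-fit, which is why a larger `ridge` value is worth trying on short series.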
My goal is to write a research paper this year to summarize these three years of experience with Quantified Self Machine Learning.