Stuart Reynolds, PhD


Machine Learning and AI Researcher

About

Currently I'm working as a Machine Learning Scientist at AiLive Inc. (Mountain View, CA), where our super-cool work putting machine learning to work in computer games drives and realizes the visions of some of the largest entertainment companies in the world. Our big products include:

  • the MotionPlus, which has delivered the motion-controlled gaming vision to Nintendo. The MotionPlus turned into one of the biggest consumer electronics items of 2009, and led to some of the fastest- and best-selling game titles (ever, and on any platform);
  • LiveMove, which learns to recognize motions made with the Wii Remote Controller (or any motion-sensitive device). LiveMove is the enabling technology for game developers working with motion controllers. As well as determining what type of motion is being performed, LiveMove can also track the position and orientation of the controllers, allowing a player's hand motions to be mapped 1:1 to what goes on in-game; and
  • LiveAI, which learns to imitate a game player by watching how they control characters in-game. LiveAI is the most advanced and entertaining AI product anywhere.

LiveMove and LiveAI both learn in real time -- you can see results after a few seconds. Both work in-game (within the limitations of a games console) and as developer tools.

Projects

(2012) LiveMove-powered Just Dance 2 (Wii) is still at #6 in global game sales (all platforms) and is the #1 Wii title. Just Dance 3 is #28 (global), #7 (Wii). The LiveMove-enabled Just Dance series: >28M units sold (http://www.vgchartz.com/gamedb/?name=just+dance). The MotionPlus-enabled Wii Sports Resort: ~30M units sold (http://www.vgchartz.com/gamedb/?name=wii+sports+resort).

(2011) AiLive's motion technology powered 4 of the top 20 games (all platforms) and 6 of the top 11 Wii titles. http://www.vgchartz.com/yearly/2011/Global/

(2010) Taking Games Beyond Whack and Tilt, by Anupam Chakravorty, Rob Kay, Stuart Reynolds, and Ian Wright. Gamasutra feature on applying motion control to games.

(2010) LiveMove 2 for PlayStation 3 Move Controller. Gamasutra news article.

(2008-2009) LiveMove 2 and MotionPlus. AiLive designed the hardware and software drivers for the Wii MotionPlus. LiveMove 2 provides 6D (position and orientation) tracking and superior gesture recognition of the Wii Remote when the MotionPlus is attached. It's the technology behind Wii Sports Resort (which outsold all other games on all other platforms during its launch) and Tiger Woods 10 (#1 in the US, #8 worldwide).

(2007) LiveMove Pro, AiLive. LiveMove, but better and faster, and with player motion synchronized to game animation. LiveMove was recently used to drive the controls for We Cheer 1 & 2 -- probably the most advanced use of motion control on the Wii.


(2006) LiveMove, AiLive. LiveMove is gesture capture for games. It recognizes arbitrary player motions made with a motion-sensitive device (such as the Wii Remote). The system is configured by providing examples of the types of motions you want to recognize; an association from controller data to motion type is learned.
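
To make the configure-by-example workflow concrete, here is a toy Python sketch of the general idea -- learn an association from labelled example recordings of controller data to motion types, then classify new motions. It is an illustration only, with made-up data, and is not LiveMove's actual algorithm:

    # Toy sketch of configure-by-example gesture recognition (NOT LiveMove's algorithm).
    import numpy as np

    def featurize(samples):
        """Summarize a (T, 3) array of accelerometer samples as a fixed-length vector."""
        samples = np.asarray(samples, dtype=float)
        return np.concatenate([samples.mean(axis=0),                        # average acceleration per axis
                               samples.std(axis=0),                         # how energetic each axis is
                               samples.max(axis=0) - samples.min(axis=0)])  # range per axis

    class GestureRecognizer:
        def __init__(self):
            self.examples = []  # list of (feature_vector, label) pairs

        def add_example(self, samples, label):
            """The developer provides a few example recordings of each motion type."""
            self.examples.append((featurize(samples), label))

        def classify(self, samples):
            """Label a new motion by its nearest labelled example in feature space."""
            f = featurize(samples)
            dists = [np.linalg.norm(f - ex) for ex, _ in self.examples]
            return self.examples[int(np.argmin(dists))][1]

    # Usage with synthetic data: two example "swings", one "stab", then a query motion.
    rng = np.random.default_rng(0)
    rec = GestureRecognizer()
    rec.add_example(rng.normal([0, 2, 0], 0.3, size=(50, 3)), "swing")
    rec.add_example(rng.normal([0, 2, 0], 0.3, size=(50, 3)), "swing")
    rec.add_example(rng.normal([3, 0, 0], 0.3, size=(50, 3)), "stab")
    print(rec.classify(rng.normal([0, 2, 0], 0.3, size=(50, 3))))  # -> "swing"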

(2003-Current) LiveCombat/LiveAI, AiLive. LiveCombat is behavior capture for games. You play the game and it builds a statistical model of your play in real-time (no lag). Develop NPC players in seconds. Play to train sidekicks.
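
As a rough illustration of what "behavior capture" means, the sketch below (again a toy, not AiLive's model; the state descriptions and action names are hypothetical) builds an online frequency model of a player's action choices and reuses it to drive an NPC:

    # Toy sketch of behavior capture: observe (state, action) pairs while a human
    # plays, then have an NPC sample actions the way the player did (NOT LiveAI's model).
    import random
    from collections import defaultdict, Counter

    class BehaviorModel:
        def __init__(self):
            self.counts = defaultdict(Counter)  # state features -> action frequencies

        def observe(self, state, action):
            """Called every frame while the human plays; learning is incremental."""
            self.counts[state][action] += 1

        def act(self, state, default="idle"):
            """NPC control: sample an action in proportion to how often the player chose it here."""
            actions = self.counts.get(state)
            if not actions:
                return default
            choices, weights = zip(*actions.items())
            return random.choices(choices, weights=weights)[0]

    # Usage: 'state' is a hypothetical coarse description of the game situation.
    model = BehaviorModel()
    model.observe(("enemy_near", "healthy"), "attack")
    model.observe(("enemy_near", "healthy"), "attack")
    model.observe(("enemy_near", "hurt"), "retreat")
    print(model.act(("enemy_near", "healthy")))  # usually "attack"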

(2001-2002) Pensor, Mindlathe. AI software development kit for the games industry. API for behavior architectures, steering, planning, path finding, scripting, multithreading. Support for anytime algorithms.

Publications

Patents

Various granted and pending.

Reviewed

  • Stuart I. Reynolds, Reinforcement Learning with Exploration. PhD Thesis, Technical Report, School of Computer Science, The University of Birmingham. December 2002. [pdf] [Abstract]
  • Stuart I. Reynolds, The Stability of General Discounted Reinforcement Learning with Linear Function Approximation, In Proceedings of the UK Workshop on Computational Intelligence (UKCI-02) Birmingham, UK, September 2002. Pages 139--146. ISBN: 0704423685. [pdf] [slides] ABSTRACT: This paper shows that general discounted return estimating reinforcement learning algorithms cannot diverge to infinity when a form of linear function approximator is used for approximating the value-function or Q-function. The results are significant insofar as examples of divergence of the value-function exist where similar linear function approximators are trained using a similar incremental gradient descent rule. A different gradient descent error criterion is used to produce a training rule which has a non-expansion property and therefore cannot possibly diverge. This training rule is found to be commonly used for reinforcement learning.
  • Stuart I. Reynolds, The Curse of Optimism, Proceedings of the Fifth European Workshop on Reinforcement Learning, Utrecht, The Netherlands, October 5-6, 2001. [pdf]
  • Stuart I. Reynolds, Experience Stack Reinforcement Learning: An Online Forward lambda-Return Method, Proceedings of the Fifth European Workshop on Reinforcement Learning, Utrecht, The Netherlands, October 5-6, 2001. [pdf]
  • Stuart I. Reynolds, Optimistic Initial Q-values and the max Operator, In Proceedings of the UK Workshop on Computational Intelligence (UKCI-01) Edinburgh, UK, September 2001. [pdf] [slides] (See the code sketch after this list.) ABSTRACT: This paper provides a surprising new insight into the role of the max operator used by reinforcement learning algorithms to estimate the future return available to an agent. It is shown how optimistic Q-value estimates prevent learning updates from being effective at quickly minimising the error in the predicted available maximum future return. Experimental results show that, when the effect of optimism on the agent's exploration strategy is accounted for, learning generally proceeds more quickly if non-optimistic initial Q-values are provided. In existing work, optimistic Q-values are frequently used when agents need to manage a tradeoff between exploration and exploitation. This paper presents a simple way to avoid the learning problems this can cause.
  • Stuart I. Reynolds, Adaptive Representation Methods for Reinforcement Learning, Advances in Artificial Intelligence, (14th Biennial Conference of the Canadian Society for Computational Studies of Intelligence (AI-2001), Ottawa, Canada, June 2001, Proceedings). Lecture Notes in Artificial Intelligence Series (LNAI 2056), Springer Verlag. (pp. 345-348) [pdf]
  • Stuart I. Reynolds, Decision Boundary Partitioning: Variable Resolution Model-Free Reinforcement Learning, Proceedings of the 17th International Conference on Machine Learning, 2000, Morgan Kaufmann. (ICML-2000), pp. 783--790. [pdf]

    An earlier version of this paper is available as a technical report.
    ABSTRACT: This paper presents a method to refine the resolution of a continuous state Q-function. Q-functions serve as an estimate of return for model-free reinforcement learning agents and are modified as a result of an agent's interaction with the environment. Traditional (non-adaptive) methods of approximating this function are bound by the parameters and resources with which they are initially provided. To overcome these limitations, the method presented here starts with a coarse discrete representation of the Q-function and refines those areas which are most important for the purposes of decision making. 
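
The sketch below is a minimal illustration of the setting discussed in "Optimistic Initial Q-values and the max Operator" above: tabular Q-learning on a toy chain task, run once with optimistic and once with non-optimistic initial Q-values. The environment and parameters are invented for illustration and are not the paper's experimental setup:

    # Tabular Q-learning on a toy chain task, comparing optimistic vs. zero initial
    # Q-values (illustrative only; not the experiments from the UKCI-01 paper).
    import random

    def run(q_init, episodes=200, n=8, alpha=0.5, gamma=0.9, epsilon=0.1):
        # Chain of n states; action 1 moves right, action 0 moves left; reward 1 at the far end.
        Q = {(s, a): q_init for s in range(n) for a in (0, 1)}
        total_steps = 0
        for _ in range(episodes):
            s = 0
            while s < n - 1:
                # epsilon-greedy action selection over the current Q estimates
                if random.random() < epsilon:
                    a = random.choice((0, 1))
                else:
                    a = max((0, 1), key=lambda b: Q[(s, b)])
                s2 = min(s + 1, n - 1) if a == 1 else max(s - 1, 0)
                r = 1.0 if s2 == n - 1 else 0.0
                # One-step Q-learning update: the max operator estimates the best future return.
                target = r if s2 == n - 1 else r + gamma * max(Q[(s2, 0)], Q[(s2, 1)])
                Q[(s, a)] += alpha * (target - Q[(s, a)])
                s = s2
                total_steps += 1
        return total_steps

    random.seed(0)
    print("optimistic init (Q0=1):", run(1.0), "total steps over 200 episodes")
    print("zero init (Q0=0):      ", run(0.0), "total steps over 200 episodes")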

Technical Reports

  • Tim Kovacs and Stuart I. Reynolds, Population Based Reinforcement Learning. Technical Report CSTR-03-001, Department of Computer Science, University of Bristol, Bristol, UK. January 2003. [pdf] ABSTRACT: We propose novel ways of solving Reinforcement Learning tasks (that is, stochastic optimal control tasks) by hybridising Evolutionary Algorithms with methods based on value functions. We call our approach Population-Based Reinforcement Learning. The key idea, from Evolutionary Computation, is that parallel interacting search processes (in this case Reinforcement Learning or Dynamic Programming algorithms) can aid each other, and produce improved results in less time than the same number of search processes running independently. This is a new and general direction in RL research, and is complementary to other directions as it can be combined with them. We briefly compare our approach to related ones.
  • Stuart I. Reynolds and Marco A. Wiering, Fast Q(lambda) Revisited. Technical Report CSRP-02-02, School of Computer Science, The University of Birmingham, Birmingham, B15 2TT, UK. May 2002. (18 pages) [pdf] [black and white pdf] (See the code sketch after this list.) ABSTRACT: Fast Q(lambda) is a model-free reinforcement learning technique for precisely implementing Q(lambda) and other eligibility trace learning algorithms at a hugely reduced computational cost. This report highlights some subtleties in the original description of Fast Q(lambda) that are likely to lead to it being incorrectly applied. We propose changes to Fast Q(lambda), without which the behaviour of the algorithm can be significantly different from (and inferior to) Q(lambda). With these changes the algorithm behaves precisely as Q(lambda) to a very high degree of precision. We also report on an empirical validation of the algorithm and also provide an exploration insensitive version (an analogue of Watkins' Q(lambda)).
  • Stuart I. Reynolds, Experience Stack Reinforcement Learning for Off-Policy Control. Technical Report CSRP-02-1, School of Computer Science, The University of Birmingham, Birmingham, B15 2TT, UK. January 2002. (37 pages) [pdf] [black and white pdf] ABSTRACT: This paper introduces a novel method for allowing backwards replay to be applied as an online learning algorithm. The general technique can be adapted to provide analogues of most existing eligibility trace algorithms. The new method remains as computationally cheap as current techniques but, as it directly employs lambda-return estimates in value updates, remains significantly simpler. The paper concentrates on multi-step off-policy control methods (such as Watkins' Q(lambda)) as a theoretically and practically important class of algorithms that are underutilised in practice. Experimental results show improvements upon existing eligibility trace methods across a wide range of parameter settings and also highlight the importance of the initial Q-function upon the performance of several reinforcement learning algorithms.
  • Stuart I. Reynolds, A Description of State Dynamics and Experiment Parameters for the Hoverbeam Task, School of Computer Science, The University of Birmingham, April 2000 (4 pages). (addendum to ICML-2000 paper) [pdf]
  • Stuart I. Reynolds, Decision Boundary Partitioning: Variable Resolution Model-Free Reinforcement Learning, Technical Report CSRP-99-15, School of Computer Science, The University of Birmingham, Birmingham, B15 2TT, UK. July 1999. (A revised version of this paper appeared in ICML-2000/ICML-2k). [pdf]
  • Stuart Reynolds, A Spatial Skills Tool, Computer Science/Software Engineering BSc Dissertation, School of Computer Science, The University of Birmingham, April 1997 (125 Pages). ABSTRACT: This dissertation gives an account of the development of the "Exercise Workshop", a spatial skills tool for use in the diagnosis of clinical psychology patients. The initial aim of the project was to produce a computer tool to be used in schools to test and exercise spatial skills. The dissertation is exceptional not only in that it deals with a "real-world" problem rather than an academic one, but also in that it gives an account of the collaboration with potential real users of the system. Comments are made with respect to the users' effect upon the development of the system and how their feedback was used to create and test the final product.
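
For readers unfamiliar with eligibility traces, the following is a plain (unoptimised) sketch of Watkins' Q(lambda), the textbook algorithm that the Fast Q(lambda) and experience stack reports above build on; the chain task and parameters are made up for illustration, and the reports' own algorithms are not reproduced here:

    # Plain Watkins' Q(lambda): the one-step TD error is spread backwards over recently
    # visited state-action pairs via eligibility traces, and traces are cut whenever a
    # non-greedy (exploratory) action is taken.  Toy task for illustration only.
    import random

    def watkins_q_lambda(episodes=100, n=8, alpha=0.3, gamma=0.9, lam=0.8, epsilon=0.1):
        def eps_greedy(s):
            if random.random() < epsilon:
                return random.choice((0, 1))
            return max((0, 1), key=lambda b: Q[(s, b)])

        Q = {(s, a): 0.0 for s in range(n) for a in (0, 1)}
        for _ in range(episodes):
            e = {sa: 0.0 for sa in Q}  # eligibility traces, reset at the start of each episode
            s = 0
            a = eps_greedy(s)
            while s < n - 1:
                # take action a: 1 moves right, 0 moves left; reward 1 on reaching the far end
                s2 = min(s + 1, n - 1) if a == 1 else max(s - 1, 0)
                r = 1.0 if s2 == n - 1 else 0.0
                a2 = eps_greedy(s2)                              # action actually taken next
                greedy2 = max((0, 1), key=lambda b: Q[(s2, b)])  # greedy action in the next state
                next_q = 0.0 if s2 == n - 1 else Q[(s2, greedy2)]
                delta = r + gamma * next_q - Q[(s, a)]
                e[(s, a)] += 1.0                                 # accumulating trace for this pair
                for sa in Q:                                     # spread the TD error along the traces
                    Q[sa] += alpha * delta * e[sa]
                    # decay the trace if the next action is greedy; cut it if it is exploratory
                    e[sa] = gamma * lam * e[sa] if Q[(s2, a2)] == Q[(s2, greedy2)] else 0.0
                s, a = s2, a2
        return Q

    random.seed(0)
    Q = watkins_q_lambda()
    print({s: round(max(Q[(s, 0)], Q[(s, 1)]), 2) for s in range(8)})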

Talks, Professional Activities

Speaker:
  • GDC 09
  • NVISION 08
  • Nintendo Wii Day 08
  • EWRL-5
  • EWRL-4 
  • AI'2002
  • UKCI-02
  • UKCI-01
  • ICML-2K

Reviewer / Program Committee Member:

  • IEEE ADPRL Symposium (2007), ETRI Journal (2006), Machine Learning Journal (1999, 2005), IJCAI (2003)

Conference Organizer:

  •  U.K. Workshop on Computational Intelligence, 2002 (UKCI-02)
Member:
  • International Game Developers Association (IGDA)
  • AI Game Programmer's Guild (AIGPG)

Contact Information

AiLive Inc.                                        
1200 Villa St
Mountain View, CA 94041
USA

View Stuart Reynolds's profile on LinkedIn

stu@stureynolds.com

Technologies

Machine learning, imitation learning, statistical modelling, artificial intelligence, reinforcement learning, gesture recognition, pattern recognition, computer games.
