Pattern Recognition

Training in 38C

Introduction

This theme introduces pattern recognition of sport performance data.

Richard Duda, Peter Hart and David Stork (2001:1) define pattern recognition as "the act of taking in raw data and making an action based on the category of the pattern". They observe:

The ease with which we recognize a face, understand spoken words, read handwritten characters, identify our car keys in our pocket by feel, and decide whether an apple is ripe by its smell belies the astoundingly complex processes that underlie these acts of pattern recognition.

They add:

It is natural that we should seek to design and build machines that can recognize patterns ... it is clear that reliable accurate pattern recognition by machine would be immensely useful. (2001:1)

We introduce pattern recognition here in order to explore how insights from data science might help coaches and athletes to transform performance. We see this as an important step from the study of human real-time pattern recognition (see, for example, Nicholas Smeeton, Paul Ward and Mark Williams, 2004; Christopher Moore and Sean Muller, 2014) and of decision-making (Joe Causer and Paul Ford, 2014).

Vincent Granville (2017) provides an overview of some of the terminology used in the process of pattern recognition. These terms include 'data science', 'data mining', 'knowledge discovery in databases', and 'machine learning'. Karl Broman (2013) reminds us:

If you're analyzing data, you're doing statistics. You can call it data science or informatics or analytics or whatever, but it's still statistics.

Segundo Guzman and his colleagues (2016) discuss some of the neurophysiological mechanisms involved in pattern recognition and provide insights into the synaptic mechanisms of pattern completion. Esko Kilpi (2017) explores the social dimensions of pattern recognition in the context of 'emergent interaction'.

Data Science

Gil Press (2013) points out:

The term “Data Science” has emerged only recently to specifically designate a new profession that is expected to make sense of the vast stores of big data. But making sense of data has a long history and has been discussed by scientists, statisticians, librarians, computer scientists and others for years.

Gil provides a timeline that traces the evolution of the term "Data Science". This includes:

1947 John W. Tukey coined the term 'bit'.

1948 Claude Shannon A Mathematical Theory of Communications

1962 John W. Tukey The Future of Data Analysis

1968 IFIP Guide to Concepts and Terms in Data Processing

1974 Peter Naur Concise Survey of Computer Methods

1977 John W. Tukey Exploratory Data Analysis

1977 International Association for Statistical Computing

1989 1st Knowledge Discovery in Databases Workshop

1996 IFCS Conference Data Science, Classification, and Related Methods

1996 From Data Mining to Knowledge Discovery in Databases

1997 Jeff Wu Statistics = Data Science?

1997 Data Mining and Knowledge Discovery Journal

2001 William Cleveland Data Science: An Action Plan

Gil's review details 21st-century developments up to 2013.

DJ Patil (2011) discussed building data science teams. He proposed four characteristics of a data scientist: technical expertise, curiosity, storytelling, and cleverness.

David Donoho (2015) considers a century of data science with a look back to 1962 and a look forward to 2065. He offers a vision of data science based on the activities of people who are ‘learning from data’, and describes an academic field dedicated to improving that activity in an evidence-based manner that is an academic enlargement of statistics and machine learning (2015:3).

Longbing Cao (2017) provides an overview of challenges and directions in data science from a complex-systems perspective.

Vincent Granville (2015a) shares his view of the precursors to data science.

Vincent Granville (2015b) proposes that "a new category of data scientists [is] emerging: data scientists with strong statistical knowledge, just [as] we already have a category of data scientists with significant engineering experience (Hadoop)".

He adds:

what makes data scientists different from computer scientists is that they have a much stronger statistics background, especially in computational statistics, but sometimes also in experimental design, sampling, and Monte Carlo simulations.

Michael Hochster (2014) identified two types of data scientists, Type A and Type B.

Type A is an Analyst: "very similar to a statistician (and may be one) but knows all the practical details of working with data that aren't taught in the statistics curriculum: data cleaning, methods for dealing with very large data sets, visualization, deep knowledge of a particular domain, writing well about data, and so on". Type A "may be an expert in experimental design, forecasting, modeling, statistical inference, or other things typically taught in statistics departments".

Type B is a Builder. They share some statistical background with the Analyst "but they are also very strong coders and may be trained software engineers".

Alex Castrounis (2017) identifies four pillars of data science expertise:

  • Business domain
  • Statistics and probability
  • Computer science and software programming
  • Written and verbal communication

He proposes this definition of a data scientist:

a data scientist is a person who should be able to leverage existing data sources, and create new ones as needed in order to extract meaningful information and actionable insights. These insights can be used to drive business decisions and changes intended to achieve business goals.

Ryan Swanstrom (2017) suggests there are three stages "of a truly mature data science organization":

1. Dashboards

(Requires investment in data storage in a single location, extract transform load (ETL) tools, and reporting tools)

2. Machine Learning

(A focus on estimating the causal outcomes of potential events.)

3. Actions

(Uses the results generated in stage 2 to take 'appropriate' actions.)

David Taylor (2016) has discussed the visualisation of data science and points out "As a field full of data nerds with a penchant for visualization, it's also unsurprising that a lot of them use Venn diagrams".

David and Alex both point to Stephan Kolassa's (2015) visualisation as one of their favourite visualisations:

Stephan's post about his diagram includes the R code used to generate the ellipses.

Thomson Nguyen (2014) considers the differences between data scientists and data analysts in this video:

Gregory Piatetsky (2013) connects data science and data mining and uses both terms interchangeably:

You can best learn data mining and data science by doing, so start analyzing data as soon as you can! However, don't forget to learn the theory, since you need a good statistical and machine learning foundation to understand what you are doing and to find real nuggets of value in the noise of Big Data.

Gregory provided an overview of the analytics industry in a presentation made in 2011.

Stephanie Hicks and Rafael Irizarry (2016) present a guide to teaching data science and share a case study of an introduction to data science that is organised around three themes: creating, connecting, computing.

Andrew Therriault (2017) presents a guide to data security. He cautions:

Everyone who creates, manages, analyzes, or even just has access to data is a potential point of failure in an organization’s security plan. So if you use data which is at all sensitive — that is, any data you wouldn’t freely give out to any random stranger on the internet — then it’s your responsibility to make sure that data is protected appropriately.

Shane Brennan (2017) notes ten fallacies of data science. He argues:

There exists a hidden gap between the more idealized view of the world given to data-science students and recent hires, and the issues they often face getting to grips with real-world data science problems in industry. All these new college courses in data analytics (they’re almost all newly-minted courses) aim at teaching students the basics of coding, statistics, data wrangling etc. However, the kind of challenges you’re expected to overcome in an actual data science job within industry are greatly under-represented.

George Krasadakis (2017) alerts us to the importance of data quality in an age of artificial intelligence.

Josh Devins (2017) shares an example of an enterprise discussion (at Soundcloud) about data science processes. See also this 2016 discussion of data informed decision making at Soundcloud.

Jake Moody (2017) created an infographic to draw distinctions between data engineering and data science, with data engineer responsibilities in the left column and data scientist responsibilities in the right.

Steph de Silva (2016a, 2016b) shared her insights into asking questions about data analysis. She shared these two infographics:

Source: 2016a

Source: 2016b

Data Mining

Tom Mitchell (1999:1) describes data mining as the use of historical data "to discover regularities and improve future decisions". This discovery of patterns in data requires:

  • A database
  • Data formatting and cleansing
  • Data visualisation and summary
  • Machine learning algorithms
  • Human expert domain knowledge

Ian Witten and Eibe Frank (2005: xxiii) define data mining as "the extraction of implicit, previously unknown, and potentially useful information from data". Data mining is "the process of discovering patterns in data" using automatic or semiautomatic processes.
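As a minimal illustration of this definition, the Python sketch below discovers frequently co-occurring match events by counting event pairs; the event names and data are invented for this example and are not drawn from any of the sources cited above.

```python
from collections import Counter
from itertools import combinations

# Hypothetical "transactions": the set of actions observed in each
# attacking phase of a match (illustrative data only).
phases = [
    {"cross", "header", "shot"},
    {"through_ball", "shot"},
    {"cross", "header"},
    {"cross", "shot"},
    {"through_ball", "dribble", "shot"},
]

# Count how often each pair of events co-occurs across phases.
pair_counts = Counter()
for phase in phases:
    for pair in combinations(sorted(phase), 2):
        pair_counts[pair] += 1

# Keep pairs meeting a minimum support threshold (here: 2 phases) --
# these are the "previously unknown, potentially useful" regularities.
frequent = {pair: n for pair, n in pair_counts.items() if n >= 2}
print(frequent)
```

This is the simplest form of frequent-itemset counting; real data mining tools add data cleansing, visualisation and domain expertise around the same core idea.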

Machine Learning

In her introduction to machine learning, Omoju Miller (2017) observes:

At its core, machine learning is not a new concept. The term was coined in 1959 by Arthur Samuel, a computer scientist at IBM, and it’s been widely used in software since the 1980s.

In his 1959 paper, Arthur notes:

We have at our command computers with adequate data-handling ability and with sufficient computational speed to make use of machine-learning techniques, but our knowledge of the basic principles of these techniques is still rudimentary. Lacking such knowledge, it is necessary to specify methods of problem solution in minute and exact detail, a time-consuming and costly procedure. Programming computers to learn from experience should eventually eliminate the need for much of this detailed programming effort. (1959:71)

(Note that the first operating checker program for the IBM 701 was written in 1952, recoded in 1954, completed in 1955 and demonstrated on television in February 1956.)

Arthur provided an update of his work in a 1967 paper.

Tom Mitchell (2006) points out:

Over the past 50 years the study of Machine Learning has grown from the efforts of a handful of computer engineers exploring whether computers could learn to play games, and a field of Statistics that largely ignored computational considerations, to a broad discipline that has produced fundamental statistical-computational theories of learning processes, has designed learning algorithms that are routinely used in commercial systems for speech recognition, computer vision, and a variety of other tasks, and has spun off an industry in data mining to discover hidden regularities in the growing volumes of online data.

He proposes that the discipline of machine learning ("a natural outgrowth of the intersection of Computer Science and statistics") seeks to answer the question 'How can we build computer systems that automatically improve with experience, and what are the fundamental laws that govern all learning processes?'

He adds:

we say that a machine learns with respect to a particular task T, performance metric P, and type of experience E, if the system reliably improves its performance P at task T, following experience E. Depending on how we specify T, P, and E, the learning task might also be called by names such as data mining, autonomous discovery, database updating, programming by example, etc. (2006:1) (Original emphasis)
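Mitchell's framing of task T, performance metric P and experience E can be made concrete with a small sketch. In the hypothetical example below (pure Python, invented data), T is classifying a noisy one-dimensional measurement into one of two classes, E is a set of labelled examples, and P is accuracy on a held-out test set; the learner "improves with experience" as E grows.

```python
import random

random.seed(1)

# Task T: classify a point as class 0 (near 0.0) or class 1 (near 1.0).
# Experience E: labelled examples. Performance P: accuracy on a test set.

def make_examples(n):
    data = []
    for _ in range(n):
        label = random.randint(0, 1)
        x = label + random.gauss(0, 0.6)   # noisy observation of the class
        data.append((x, label))
    return data

def train(examples):
    # "Learning" here is simply estimating each class centroid.
    sums, counts = [0.0, 0.0], [0, 0]
    for x, label in examples:
        sums[label] += x
        counts[label] += 1
    return [sums[c] / max(counts[c], 1) for c in (0, 1)]

def accuracy(centroids, test):
    correct = 0
    for x, label in test:
        pred = min((0, 1), key=lambda c: abs(x - centroids[c]))
        correct += (pred == label)
    return correct / len(test)

test_set = make_examples(500)
for n in (5, 50, 500):
    model = train(make_examples(n))
    print(n, round(accuracy(model, test_set), 3))
```

With more experience the centroid estimates stabilise and accuracy on the test set typically rises toward the limit set by the noise in the data.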

Stephen Marsland (2015:4) proposes that machine learning "is about making computers modify or adapt their actions ... so that these actions get more accurate, where accuracy is measured by how well the chosen actions reflect the correct ones". (Original emphasis)

Alex Castrounis (2016) says of machine learning:

Machine learning is a subfield of computer science, but is often also referred to as predictive analytics, or predictive modeling. Its goal and usage is to build new and/or leverage existing algorithms to learn from data, in order to build generalizable models that give accurate predictions, or to find patterns, particularly with new and unseen similar data. (Original emphasis)

The exploration of machine learning gives us an opportunity to think about the intersection of informatics and analytics (Lasse Holmström and Petri Koistinen, 2010; Ethem Alpaydin, 2011).

Wikipedia has a comprehensive machine learning portal. Raul Garreta (2015) has provided a gentle introduction to machine learning that presents "some initial concepts to invite the reader to continue investigating". R2D3 have a visual introduction to machine learning. Ophir Tanz and Cambron Carter (2017) have provided a conversational introduction to machine learning.

Daniel Tunkelang (2017) shares a list of ten things everyone should know about machine learning. His list:

  • Machine learning means learning from data
  • Machine learning is about data and algorithms, but mostly data
  • Unless you have a lot of data, you should stick to simple models
  • Machine learning can only be as good as the data you use to train it
  • Machine learning only works if your training data is representative
  • Most of the hard work for machine learning is data transformation
  • Deep learning is a revolutionary advance, but it isn’t a magic bullet
  • Machine learning systems are highly vulnerable to operator error
  • Machine learning can inadvertently create a self-fulfilling prophecy
  • AI is not going to become self-aware, rise up, and destroy humanity

You might find this glossary of machine learning terms helpful as you explore the machine learning literature.

For a discussion of Machine Learning with R and Python, see Tinniam Ganesh (2017a, 2017b).

Aliva Smith (2017) provides a visualisation of the knowledge discovery process.

For an introduction to decision trees as a machine learning algorithm that can be used for classification or regression, see Mohit Deshpande (2017).

Decision trees are one of the ten machine learning algorithms discussed by Sidath Asiri (2017).
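To give a flavour of how a decision tree classifies, the sketch below fits a one-level tree (a decision stump) by exhaustive search over thresholds, using classification error as the impurity measure. It is a simplified illustration only, not the implementation discussed by Deshpande or Asiri, and the sprint-time data are invented.

```python
# A minimal decision stump: a one-level decision tree fitted by trying
# every threshold and keeping the split with the fewest errors.

def fit_stump(xs, ys):
    best = None
    for threshold in sorted(set(xs)):
        for left_label in (0, 1):
            right_label = 1 - left_label
            errors = sum(
                (left_label if x <= threshold else right_label) != y
                for x, y in zip(xs, ys)
            )
            if best is None or errors < best[0]:
                best = (errors, threshold, left_label, right_label)
    return best[1:]  # (threshold, left_label, right_label)

def predict(stump, x):
    threshold, left_label, right_label = stump
    return left_label if x <= threshold else right_label

# Hypothetical data: sprint times (s), labelled 1 = selected, 0 = not.
times = [10.8, 11.0, 11.2, 11.9, 12.3, 12.5]
selected = [1, 1, 1, 0, 0, 0]
stump = fit_stump(times, selected)
print(stump, predict(stump, 11.1))
```

A full decision-tree learner applies this same split search recursively to each side of the split, usually with a better impurity measure such as Gini index or entropy.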

Artificial Intelligence

In 1955, a proposal for funding was made "that a 2 month, 10 man study of artificial intelligence be carried out during the summer of 1956 at Dartmouth College". There is a Wikipedia description of the workshop. Daniel Crevier (1993:49) has argued that "the conference is generally recognized as the official birthdate of the new science". (For more background information about the emergence of artificial intelligence as a field of study in the 1950s, see the National Research Council's (1999) account of government support for computing research.)

John McCarthy and Patrick Hayes (1969:2) observe:

The idea of an intelligent machine is old, but serious work on the artificial intelligence problem or even serious understanding of what the problem is awaited the stored program computer. We may regard the subject of artificial intelligence as beginning with Turing’s article Computing Machinery and Intelligence (Turing, 1950) and with Shannon’s (1950) discussion of how a machine might be programmed to play chess.

John and Patrick distinguish between the epistemological and heuristic aspects of artificial intelligence. They propose:

an entity is intelligent if it has an adequate model of the world (including the intellectual world of mathematics, understanding of its own goals and other mental processes), if it is clever enough to answer a wide variety of questions on the basis of this model, if it can get additional information from the external world when required, and can perform such tasks in the external world as its goals demand and its physical abilities permit. (1969:4)

Joseph Licklider (1960) was one of the pioneers of interactive computing. In his discussion of man-computer symbiosis, he observes:

it seems worthwhile to avoid argument with (other) enthusiasts for artificial intelligence by conceding dominance in the distant future of cerebration to machines alone. There will nevertheless be a fairly long interim during which the main intellectual advances will be made by men and computers working together in intimate association. (1960:5)

Terry Winograd & Fernando Flores published Understanding computers and cognition: A new foundation for design in 1986. They take an explicit philosophical approach to the design of computer technology and suggest "theories about the nature of biological existence, about language and about the nature of human action have a profound influence on the shape of what we build and how we use it" (1986:xii).

Stuart Russell and Peter Norvig (1995:3) suggest that the field of artificial intelligence (AI) attempts to understand intelligent entities. They add "but unlike philosophy and psychology, which are also concerned with intelligence, AI strives to build intelligent entities as well as understand them". They offer four categories of definitions of artificial intelligence (1995:5).

Jerry Kaplan (2016:5) suggests that the essence of artificial intelligence is "the ability to make appropriate generalizations in a timely fashion based on limited data".

Stefan van Duin and Naser Bakhshi (2017) visualise artificial intelligence thus:

In their model, "the concept of intelligence refers to some kind of ability to plan, reason and learn, sense and build some kind of perception of knowledge and communicate in natural language".

Raksham Pandey (2017) has visualised developments in artificial intelligence:

David Silver and his colleagues (2017) have published a paper on mastering the game of Go without human knowledge. They report:

Here we introduce an algorithm based solely on reinforcement learning, without human data, guidance or domain knowledge beyond game rules. AlphaGo becomes its own teacher: a neural network is trained to predict AlphaGo’s own move selections and also the winner of AlphaGo’s games. This neural network improves the strength of the tree search, resulting in higher quality move selection and stronger self-play in the next iteration. Starting tabula rasa, our new program AlphaGo Zero achieved superhuman performance, winning 100–0 against the previously published, champion-defeating AlphaGo.

Algorithms

Navneet Alang (2016) observed:

Every age has its organizing principles. The nineteenth century had the novel, and the twentieth had TV; in our more modern times, they come and go more quickly than ever—on Web 1.0 it was the website, for example, and a few years later, for 2.0, it was the app. And now, another shift is underway: Today’s organizing principle is the algorithm.

There is debate about the definition of what constitutes an 'algorithm'. Moshe Vardi (2012:5), for example, observes "the fact that we have an intuitive notion of what an algorithm is does not mean that we have a formal notion". He suggests that problems of definition relate to an algorithmic duality that "seems to be a fundamental principle of computer science":

An algorithm is both an abstract state machine and a recursor, and neither view by itself fully describes what an algorithm is. (2012:5)

Robin Hill (2016) worked through computer science and philosophical literature to explore this duality and proposed that:

An algorithm is a finite, abstract, effective, compound control structure, imperatively given, accomplishing a given purpose under given provisions.

In very basic terms, "the algorithm is the thing that programs implement, the thing that gets data processing and other computation done" (Hill, 2016). The Association for Computing Machinery (2017) define an algorithm as "a self-contained step-by-step set of operations that computers and other 'smart' devices carry out to perform calculation, data processing, and automated reasoning tasks".

Vaidehi Joshi (2017) takes a more direct view in her definition:

an algorithm is a really fancy name with a bad rap. They’re not nearly as scary as they sound. An algorithm is just a fancy term for a set of instructions of what a program should do, and how it should do it. In other words: it’s nothing more than a manual for your code. (Original emphases.)
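A classic concrete example of these definitions, "a self-contained step-by-step set of operations" accomplishing a given purpose, is Euclid's algorithm for the greatest common divisor:

```python
# Euclid's algorithm: a finite, effective, step-by-step procedure.
def gcd(a, b):
    while b != 0:
        a, b = b, a % b   # replace (a, b) with (b, a mod b) until b is 0
    return a

print(gcd(48, 36))  # -> 12
```

The while loop is the "compound control structure" in Hill's terms, and the purpose (the greatest common divisor) is accomplished in finitely many steps for any pair of non-negative integers.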

In this course, we propose to: explore some of the issues that arise in the age of the algorithm (Lee Rainie & Janna Anderson, 2017); consider the emergence of performance monitoring tools and the algorithms that are used (including discussions about 'algorithmic skin'); and reflect on some of the ethical issues that arise from black box acceptance of algorithms. In doing so, our aim is to contribute to conversations about 'algorithmic transparency' (Association for Computing Machinery, 2017).

For a detailed discussion of algorithms see Jason Brownlee's (2013) tour of machine learning algorithms. For a discussion of machine learning in R see Jason Brownlee (2014).

Mark van Rijmenam (2017), amongst others, has drawn attention to ethical issues surrounding the use of algorithms. He notes that algorithms have two major flaws. They are:

  • Extremely literal: they pursue their (ultimate) goal literally and do exactly what they are told, while ignoring any other important considerations.
  • Black boxes: whatever happens inside an algorithm is known only to the organisation that uses it, and quite often not even to them.

Mark argues for a transparent approach to the use of algorithms that Michael van Lent (2004) defined as 'explainable artificial intelligence' (XAI). Michael notes Edward Shortliffe and his colleagues' (1975) exposition of how a program can "explain its recommendations when queried". More recently, Pat Langley, Ben Meadows, Mohan Sridharan & Dongkyu Choi (2017) have discussed the importance of 'explainable agency'. They argue "we must take seriously the need to communicate the reasons for agents’ decisions to human partners".

Patterns

This topic explores two specific aspects of pattern discovery and pattern recognition (Hand, 2004) in sport:

1. The systematic observation of real-time and lapsed-time behaviour.

2. The use of supervised machine learning techniques to analyse data.

We believe that the disciplined observation of performance can lead to careful consideration of how computers can enrich our understanding of that performance through the generation of 'interesting' insights.

An example of such work is Gilbert Kotzbek's use of de-identified geographic information system (GIS) data in football game analysis. In a 2015 paper, Gilbert, and his PhD supervisor Wolfgang Kainz, describe their approach to GIS in detail. This short video is an example of their system output.

A 2016 paper described their use of these data to analyse scoring attempts in football. David Sumpter (2017b) has discussed the significance of this GIS work for the future of football analytics.

David Sumpter (2017a) draws attention to other uses of player tracking data to map the geometry of football formations. These include Voronoi diagrams and Delaunay triangulations. An example is the work of Jaime Sampaio and his colleagues at the CreativeLab in Vila Real, Portugal.
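The idea behind a Voronoi diagram of a formation is that each player "controls" the region of the pitch closer to them than to any other player. The sketch below approximates this with a discrete grid and invented player coordinates; libraries such as SciPy compute exact Voronoi diagrams and Delaunay triangulations.

```python
import math

# Discrete Voronoi sketch: assign each cell of a coarse pitch grid to the
# nearest player, approximating each player's "dominant region".
# Player coordinates (in metres) are invented for illustration.
players = {"A": (20.0, 30.0), "B": (50.0, 40.0), "C": (80.0, 20.0)}

def nearest_player(point):
    return min(players, key=lambda p: math.dist(point, players[p]))

# Count cells controlled by each player on a 105 x 68 m pitch, 1 m grid.
control = {name: 0 for name in players}
for x in range(105):
    for y in range(68):
        control[nearest_player((x + 0.5, y + 0.5))] += 1

print(control)
```

With real tracking data the same calculation, repeated frame by frame, shows how the balance of controlled space shifts as a team attacks or defends.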

Another example of this pattern recognition is presented in this Disney Research Hub (2017) video:

Artificial Neural Networks

James Anderson (1995) provides an introduction to neurocomputing and neural network algorithms. James Anderson & Edward Rosenfeld (1998) present further background to artificial neural networks in seventeen interviews with scientists involved in the theory and practice of neural networks.

The quest to recognise patterns in data includes the use of artificial neural network methods. Brian Ripley (1996:2) notes that "artificial neural networks have been developed by a community which was originally biologically motivated". He regarded a 'neural network' as a method "which arose or was popularized by the neural network community and has been or could be used for pattern recognition" (1996:2).

Imad Basheer & M. Hajmeer (2000:3) define these networks as "structures comprised of densely inter-connected adaptive simple processing elements (called artificial neurons or nodes) that are capable of performing massively parallel computations for data processing and knowledge representation".

Cristian Randieri (2017) says of neural networks:

An Artificial Neural Network is a simplified mathematical model of a biological one. Similarly, it uses nodes rather than neurons but builds the same sorts of complex interconnections between them (synapses). Rather than storing all data in a huge pool to be analyzed as a whole, neural networks are able to memorize and so remember associations between concepts, streamlining the process of retrieval and analysis.

This allows computer scientists to make algorithms for “deep learning,” which arranges ideas as layers of definitions. Small concepts collectively define larger ones, which define larger ones, and so on. With enough input information, a sufficiently detailed neural network can learn quite deeply indeed.

Cristian shares this visualisation of a neural network.
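The layered structure Cristian describes, where small concepts combine into larger ones, can be seen in miniature in the sketch below: a two-input network with one hidden layer computes XOR, a function no single "neuron" can represent. The weights are set by hand for illustration rather than learned from data.

```python
# A minimal feed-forward network in plain Python: two inputs, a hidden
# layer of two nodes, and one output node, with step activations.

def step(x):
    return 1 if x > 0 else 0

def neuron(inputs, weights, bias):
    return step(sum(i * w for i, w in zip(inputs, weights)) + bias)

def xor_net(a, b):
    h1 = neuron((a, b), (1, 1), -0.5)       # fires if a OR b
    h2 = neuron((a, b), (1, 1), -1.5)       # fires if a AND b
    return neuron((h1, h2), (1, -2), -0.5)  # fires if OR but not AND

for a, b in ((0, 0), (0, 1), (1, 0), (1, 1)):
    print(a, b, xor_net(a, b))
```

Each hidden node detects a simple concept (OR, AND) and the output node combines them into the larger concept (XOR); deep networks stack many such layers and learn the weights from data instead of fixing them by hand.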

Natalie Wolchover (2017) has provided an overview of developments in deep neural networks. Amongst others, she discusses in detail the work of Naftali Tishby and his account of deep learning and information bottlenecks.

Emil Wallner (2017) presents six snippets of code that made deep learning what it is today.

Roger Bartlett (2006) reports the use of artificial neural networks in sport. Mark Pfeiffer and Andreas Hohmann (2012) consider their use in training science and provide examples from swimming and handball. Ivars Namatevs, Ludmila Aleksejeva & Inese Polaka (2016) explore the application of neural network modelling to sports performance classification.

Jürgen Perl has written extensively about the use of artificial neural networks in sport. With Peter Dauscher (2006) he provided a review of dynamic pattern recognition in sport. Pedro Passos and his colleagues (2006) have extended this discussion.

An introductory example of the use of neural networks in sport can be found in Antonio Silva and his colleagues' (2007) account of modelling swimming performance. More recent examples include: cricket team selection (Subramanian Iyer & Ramesh Sharda, 2009); weight training (Hristo Novatchkov & Arnold Baca, 2013); 400m hurdle performance (Krzysztof Przednowek et al., 2014); basketball (Matthias Kempe, Andreas Grunz & Daniel Memmert, 2015); deep convolutional neural networks (Martin Wagenaar, 2016); and football (Daniel Memmert, Koen Lemmink & Jaime Sampaio, 2017).

This is Brandon Rohrer's introduction to neural networks:

Daniel Holden (2017) provides a helpful guide to troubleshooting when neural networks are not working.

Grant Sanderson (2017a) presents this introduction to neural networks and works through an example:

Alarije (2017) suggests some important points when designing a neural network:

  • Input data: use a representative number of examples, with sufficiently varied information, to avoid over-optimisation (overfitting). It is common to use 70% of the data to train the network, 20% to test the result and 10% as out-of-sample validation.
  • Control the number of neurons and levels: too few and the process becomes too general; too many and the network over-fits the data.
  • Chosen functions: start with the simplest function and then elaborate it according to observations and further requirements.
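The 70/20/10 split mentioned in the first point can be sketched in a few lines: shuffle the examples, then slice them into training, test and validation sets. The function name and seed below are illustrative choices, not part of Alarije's description.

```python
import random

# Shuffle, then slice into 70% training / 20% test / 10% validation.
def split_data(examples, seed=0):
    examples = list(examples)
    random.Random(seed).shuffle(examples)   # seeded for reproducibility
    n = len(examples)
    n_train = int(n * 0.7)
    n_test = int(n * 0.2)
    train = examples[:n_train]
    test = examples[n_train:n_train + n_test]
    validation = examples[n_train + n_test:]
    return train, test, validation

train, test, validation = split_data(range(100))
print(len(train), len(test), len(validation))  # -> 70 20 10
```

Shuffling before slicing matters: sports data often arrive ordered by match or by date, and slicing unshuffled data would give the network an unrepresentative training set.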

Andrej Karpathy (2017) proposes "Neural networks are not just another classifier, they represent the beginning of a fundamental shift in how we write software. They are Software 2.0". He adds:

A large portion of programmers of tomorrow do not maintain complex software repositories, write intricate programs, or analyze their running times. They collect, clean, manipulate, label, analyze and visualize data that feeds neural networks.

Ben Gorman (2017a) provides an introduction to neural networks and a worked example (2017b).

Hackathons

Gerard Briscoe and Catherine Mulligan (2014) note that a hackathon is "an event in which computer programmers and others involved in software development, including interface designers, graphic designers and project managers, collaborate intensively over a short period of time". Since their origins in the 1990s, hackathons have had "a significant impact on the culture of digital innovation" (Briscoe & Mulligan, 2014).

The NBA hosted its first basketball analytics hackathon in September 2016. 210 students responded to the four prompts shared on the day of the hackathon:

  • Develop a new method or tool for evaluation of defensive performance in the NBA.
  • Develop a new method or tool for evaluation of the effectiveness of timeouts as an offensive or defensive strategy.
  • Build a tool and/or model to predict the outcome of shots attempted.
  • Open topic: participants are allowed to pursue a creative, original topic (approved by the NBA League Office).

The 2017 NBA hackathon added a second stream, business analytics, to its challenge to participants. The rules for the event included these provisions:

  • Code used during the hackathon must be written during the designated contest period.
  • All software used by participants must be publicly and widely available.
  • Submissions become the property of the NBA.
  • You agree not to publicly disclose any submission without the prior consent of the NBA.

In February 2017, the Western Bulldogs Australian Rules Football Club hosted a hackathon in Ballarat Library, Victoria. One of the three challenges for the hackathon was the analysis of player tracking data and tactical behaviours. The hackathon provided access to data that had not been in the public domain previously. Use of some of the data provided required a non-disclosure agreement.

The announcement of the event included this introduction:

The Western Bulldogs in partnership with City of Ballarat look forward to welcoming participants of the inaugural Western Bulldogs Ballarat Hackathon. The event, which will run from the 24th to the 26th of February at the Ballarat Library, represents one of a number of initiatives implemented by the Bulldogs in ensuring they remain at the forefront of innovation in the quest for sustained success following the club’s 2016 AFL Premiership triumph. Teams at the event will have the opportunity to work on three challenges; one of these will focus on football performance & sport science.

Sam Robertson, Head of Research and Innovation at the club, observed:

The club sees this as a great opportunity to invite some of the best and brightest sports analysts around the country to address a few of the current key challenges faced by high performance sporting organisations. I am confident that the challenges presented at the Hackathon will produce some fantastic solutions from participating teams as well as provide a useful platform for individuals to showcase their work in front of a professional sporting club.

This hackathon was an excellent example of what (Briscoe & Mulligan, 2014) describe as a focus-centric, applied hackathon. Such hackathons "target software development to address or contribute to a social issue or a business objective".

ESPN hosted a hackathon at the 2017 Sloan Sports Analytics Conference. This was the third time the event had taken place at a Sloan conference. (News of the hackathons in 2015 and 2016.) The theme of the 2017 event was:

Sports analytics is often criticized for ignoring intangibles such as chemistry, leadership, heart, and instinct. Participants in the Hackathon will be asked to start pushing back on that by clearly defining and then measuring an aspect of on court performance that was previously talked about as an intangible attribute. Participants will utilize basketball player tracking data to facilitate their measurements and will be judged on completeness of their definition, measurement approach, and results of their process.

FC Nordsjælland hosted a two-day Tracking Data hackathon in March 2017. "Allocated into teams, attendees will look to combine tracking data and event data to create insights aimed to improve post-match analysis and preparation for the next game". At their hackathon "attendees will be provided with the data from FC Nordsjælland and Brøndby IF's 3 most recent games". These data include: Ball Events; Ball Tracking; and Player Tracking. Each attendee at the hackathon "will sign a Non-Disclosure Agreement to prohibit the distribution of the datasets to non-attendees". Mladen Sormaz and Dan Nichol provide an example of the work produced at this event.

Source: Joe Mulberry (Twitter)
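Combining tracking and event data of the kind supplied at such hackathons usually comes down to aligning the two streams by timestamp. The sketch below is illustrative only: the column names and values are invented, and it assumes the tracking feed is sampled at 10 Hz while events carry their own timestamps. It pairs each event with the nearest preceding tracking frame using pandas:

```python
import pandas as pd

# Hypothetical tracking frames: ball position sampled at 10 Hz.
tracking = pd.DataFrame({
    "t": [0.0, 0.1, 0.2, 0.3, 0.4, 0.5],
    "ball_x": [50.2, 51.0, 52.1, 53.4, 54.0, 54.8],
    "ball_y": [30.1, 30.3, 30.2, 29.8, 29.5, 29.4],
})

# Hypothetical ball events logged separately (pass, shot, ...).
events = pd.DataFrame({
    "t": [0.12, 0.43],
    "event": ["pass", "shot"],
})

# Attach to each event the nearest preceding tracking frame,
# giving every event an (x, y) location on the pitch.
merged = pd.merge_asof(events, tracking, on="t", direction="backward")
print(merged[["event", "ball_x", "ball_y"]])
```

Real tracking feeds also carry player coordinates per frame, but the same timestamp-alignment step is typically the starting point for the post-match analyses the hackathon brief describes.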

Students in the Sports Analytics Club at Simon Fraser University organised a hackathon in July 2017. Their hackathon description was:

Participants will compete in teams of 5 beginning on July 8th and finishing with a 5 minute presentation of their work on July 9th. The structure of the data sets will be published early to allow teams to have time to understand the data sets. We hope that this structure will allow teams to produce a high quality product rather than creating something rushed in a 24 hour period. Hockey, Soccer, and Basketball data sets will be provided. More specific details about the data will be coming soon. All Intellectual Property from the hackathon will be property of the data providers.

The hackathon was hosted at Simon Fraser University’s Harbour Centre campus in Vancouver.

In August 2017, the STATS company made available a basketball dataset which contained the x,y locations of the players and the ball. The sharing of these data coincided with the publication of a paper that explored trajectories using deep hierarchical networks. The company released football data too. One dataset contained player positions at ten frames per second in addition to ball events. A second dataset focussed on goal-scoring events, which enabled users to develop their own expected goals (xG) model.
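At its simplest, an expected goals model estimates the probability that a shot is scored from features of the shot. The sketch below is not the STATS data or any published model: it uses entirely synthetic shots with a single invented feature (distance to goal) and fits a logistic regression, which is one common starting point for xG:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Synthetic shot data: distance to goal in metres, and a binary
# outcome generated so that closer shots score more often (a toy
# assumption standing in for real labelled shot data).
distance = rng.uniform(5, 35, size=500)
goal = (rng.random(500) < 1 / (1 + np.exp(0.25 * (distance - 12)))).astype(int)

# Fit P(goal | distance); the predicted probability is the xG value.
model = LogisticRegression().fit(distance.reshape(-1, 1), goal)

xg_close = model.predict_proba([[6.0]])[0, 1]   # shot from 6 m
xg_far = model.predict_proba([[30.0]])[0, 1]    # shot from 30 m
print(round(xg_close, 2), round(xg_far, 2))
```

Published xG models typically add shot angle, body part, and game context as features, but the fitting and probability-scoring steps follow the same pattern.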

MySwimPro held their third annual hackathon in December 2017. Each participant prepared a design challenge statement modelled on design thinking methodology.

The fourth Hackathon at the 2018 MIT Sloan Sports Analytics Conference will take place on 22 February. Participants "will be given a choice of four prompts to work from to utilize the complex NBA player tracking data to tell a novel story and answer a question" that could not previously be answered from a quantitative perspective.

In January 2018, Tennis Australia's Game Insights Group announced "a world first Tennis Hackathon" titled 'From AO to AI'. Tennis Australia partnered with CrowdANALYTIX to host a hackathon throughout the Australian Open 2018. The Game Insights Group provided 10,000 points of Grand Slam tennis tracking data.

The Algorun 18 hackathon was hosted by Boğaziçi University's Computer Club in April 2018. Participants were provided with NBA data.

Recommended Reading

Aman Agarwal (2017). How DeepMind taught AI to play video games.

Wale Akinfaderin (2017). The mathematics of machine learning.

Talal Alsubaie (2008). Pattern Recognition.

American Statistician (2015). Special Issue on Statistics and the Undergraduate Curriculum. American Statistician, 69(4), 259-424.

Mara Averick (2017). Beyond basic bracketology: a March-Madness deep dive.

Gianluca Baio & Marta Blangiardo (2010). Bayesian hierarchical model for the prediction of football results.

Bryan Berend (2017). A magical introduction to classification algorithms.

Melissa Bierly (2016). 10 useful Python visualization libraries for any discipline.

Heidi Blake & John Templon (2016). The Tennis Racket. [Blog post.] (Github Python code.)

Joel Bock (2017). Empirical prediction of turnovers in NFL football. Sports, 5(1), 1.

Andrew Borrie, Gudberg Jonsson & Magnus Magnusson (2002). Temporal pattern analysis and its applicability in sport: an explanation and exemplar data. Journal of Sports Sciences, 20(10), 845-852.

Leo Breiman (2001). Statistical modeling: the two cultures.

Nicholas Carr (2017). A brutal intelligence: AI, chess, and the human mind.

Marti Casals & Caroline Finch (2016). Sports Biostatistician: a critical member of all sports science and medicine teams for injury prevention. Injury Prevention.

Maurizio Casarrubea et al (2015). T-pattern analysis for the study of temporal structure of animal and human behaviour. Journal of Neuroscience Methods, 239, 34-46.

Paolo Cintia, Michele Coscia & Luca Pappalardo (2016). The Haka Network: Evaluating Rugby Team Performance with Dynamic Graph Analysis. IEEE/ACM ASONAM, San Francisco, August.

Alex Castrounis (2016). Machine Learning: An In-Depth, Non-Technical Guide.

Lars-Erik Cederman & Nils Weidmann (2017). Predicting armed conflict: Time to adjust our expectations? Science, 355(6324), 474-476.

Thomas Cormen, Charles Leiserson, Ronald Rivest & Clifford Stein (2009). Introduction to Algorithms. Cambridge, MA: MIT Press.

Ami Drory, Gao Zhu, Hongdong Li & Richard Hartley (2017). Automated detection and tracking of slalom paddlers from broadcast image sequences using cascade classifiers and discriminative correlation filters. Computer Vision and Image Understanding, 159: 116-127.

Hubert Dreyfus (1963). What Computers Can't Do. New York: Harper & Row.

Hubert Dreyfus (1991). What Computers Still Can't Do. Cambridge, MA: The MIT Press.

Richard Duda, Peter Hart & David Stork (2001). Pattern Classification (Second Edition). New York: Wiley.

Martin Eastwood (2017). Analysing footballers' decisions in and around the penalty box.

Zyad Enam (2016). Why is machine learning hard?

Panna Felsen & Patrick Lucey (2017). 'Body shots': analyzing shooting styles in the NBA using body pose.

Lawrence Fisher (2017). Siri, Who is Terry Winograd?

Alexander Franks, Alexander D'Amour, Daniel Cervone & Luke Bornn (2016). Meta-Analytics: Tools for Understanding the Statistical Properties of Sports Metrics.

Ronald Gallimore (2004). What a coach can teach a teacher, 1975-2004: Reflections and reanalysis of John Wooden's teaching practices. The Sport Psychologist, 18, 119-137.

Nicolas Gakrelidz (2017). Predicting London Crime Rates Using Machine Learning.

Adam Geitgey (2014a). Machine Learning is Fun (Part 1).

Adam Geitgey (2016a). Machine Learning is Fun (Part 2).

Adam Geitgey (2016b). Machine Learning is Fun (Part 3).

Garry Gelade (2016). An Identikit for Shot Selection.

Peter Gleeson (2017). How machines make sense of big data: an introduction to clustering algorithms.

Segundo Guzman, Alois Schlogl, Michael Frotscher & Peter Jonas (2016). Synaptic mechanisms of pattern completion in the hippocampal CA3 network. Science, 353(6304), 1117-1123.

Garrett Grolemund & Hadley Wickham (2016). R for Data Science.

Thomas Grund (2012). Network structure and team performance: The case of English Premier League soccer teams. Social Networks, 34, 682-690.

Joachim Gudmundsson & Michael Horton (2016). Spatio-Temporal Analysis of Team Sports - A Survey.

David Hand (2004). Pattern recognition. Journal of Applied Statistics, 31(8), 883–884.

Jake Hofman, Amit Sharma & Duncan Watts (2017). Prediction and explanation in social systems. Science 355(6324), 486-488.

Vaidehi Joshi (2017). Sorting Out The Basics Behind Sorting Algorithms.

Benjamin Kadoch, Wouter Bos & Kai Schneider (2017). Directional change of fluid particles in two-dimensional turbulence and of football players. Physical Review Fluids.

KD Nuggets: Data Mining, Analytics, Big Data and Data Science.

Ujjwal Karn (2017). Machine Learning Tutorials.

Swati Kashyap (2016). 30 Top videos, tutorials and courses on machine learning and artificial intelligence from 2016.

George Kassabgi (2017). Deep learning in 7 lines of code.

Adam Kelleher (2016). Causal Data Science.

Robert Kelley (2017). Machine Learning Explained: Algorithms Are Your Friend.

Dilan Kiley et al. (2016). The game story space of professional sports: Australian Rules Football.

Esko Kilpi (2017). The Essential Skill of Pattern Recognition.

James Kirkpatrick et al. (2017a). Overcoming catastrophic forgetting in neural networks.

James Kirkpatrick et al. (2017b). Enabling Continual Learning in Neural Networks.

Will Knight (2016). Google's AI masters the game of Go a decade earlier than expected. MIT Technology Review. (See Gary Marcus's (2016) response.)

Stephanie Kovalchik (2016). Charting Serve Locations.

Hoang Le, Peter Carr, Yisong Yue & Patrick Lucey (2017). Data-driven ghosting using deep imitation learning.

Hoang Le, Yisong Yue, Peter Carr & Patrick Lucey (2017). Coordinated multi-agent imitation learning.

Richard Lewis (2015). The Pomelo Problem.

Scott Locklin (2016). Predicting with confidence: the best machine learning idea you never heard of.

Noah Lorang (2016a). Data scientists mostly just do arithmetic and that's a good thing.

Noah Lorang (2016b). Practical skills that practical data scientists need.

Keith Lyons (2016). R Resources.

Machine Learning and Data Mining for Sports Analytics (2016). Proceedings ECML/PKDD Workshop, September.

Mehrtash Manafifard, Hamid Ebadi & Abrishami Moghaddam (2017). A survey on player tracking in soccer videos. Computer Vision and Image Understanding, 159: 19-46.

Kevin Markham (2014). In-depth introduction to machine learning.

Stephen Marsland (2015). Machine learning: an algorithmic perspective. Boca Raton: CRC Press.

MathWorks (nd). Supervised Learning Workflows and Algorithms.

MathWorks (nd). Statistics and Machine Learning Toolbox.

Luis Martins (2011). Introduction to Pattern Recognition.

Alan McCall, Maurizio Fanchini & Aaron Coutts (2017). Prediction: the modern day sports science/medicine 'quest for the Holy Grail'. International Journal of Sports Physiology and Performance.

Nazanin Mehrasa, Yatao Zhong, Frederick Tung, Luke Bornn & Greg Mori (2017). Learning Person Trajectory Representations for Team Activity Analysis. arXiv:1706.00893.

Daniel Memmert & Jurgen Perl (2009). Game Creativity Analysis Using Neural Networks. Journal of Sports Science, 27(2), 139–149.

Bill Mills (2016). Writing Data: an introduction to choosing & using data formats.

Tom Mitchell (2006). The Discipline of Machine Learning.

Tom Mitchell (1999). Machine Learning and Data Mining.

Gareth Morgan, Bob Muir & Andy Abraham (2014). Systematic observation. In Lee Nelson, Ryan Groom & Paul Potrac (Eds.), Research Methods in Sports Coaching. Abingdon: Routledge.

Nafrondel (2017). Artificial Intelligence.

Ivan Namatevs, Ludmila Aleksejeva & Inese Polak (2016). Neural network modelling for sports performance classification as a complex socio-technical system.

Michael Nielsen (2016). Neural Networks and Deep Learning.

NNS (2016). Bayesian statistics explained to beginners.

Bahadorreeza Ofoghi, John Zeleznikow, Clare MacMahon & Markus Raab (2013). Data mining in elite sports: a review and framework. Measurement in Physical Education and Exercise Science, 17(3), 171-186.

Tony Ojeda (2017). Data exploration with Python.

Cathy O'Neil (2016). Weapons of Math Destruction. London: Allen Lane.

Muneaki Ohshima, Ning Zhong, Y Yao & Shinichi Murata (2004). Peculiarity oriented analysis in multi-people tracking images. In PAKDD, 508-518.

George Papadourakis (nd). Introduction to Neural Networks. (Accessed online 10 February 2015.)

Luca Pappalardo & Paolo Cintia (2017). Quantifying the relation between performance and success in soccer.

Jurgen Perl & Daniel Memmert (2012). Editorial: network approaches in complex environments. Human Movement Science, 31(2), 267–270.

Mark Pesce (2017). Disruptive machine learning.

Petbugs (2017). Applying CUSUM to hockey prediction models.

Mark Pfeiffer & Andreas Hohman (2012). Applications of Neural Networks in Training Science. Human Movement Science, 31(2), 344–359.

Du Phan (2017). On decision and confidence.

Paul Power, Hector Ruiz, Xinyu Wei & Patrick Lucey (2017). Not all passes are created equal: objectively measuring the risk and reward of passes in soccer from tracking data.

r2d3.us (2015a). A Visual introduction to Machine Learning: Part One.

Raghu Ramakrishnan & Bee-Chung Chen (2007). Exploratory mining in cube space.

Robert Rein, Dominik Raabe & Daniel Memmert (2017). 'Which pass is better?' Novel approaches to assessing passing effectiveness in elite soccer.

Robert Rein & Daniel Memmert (2016). Big data and tactical analysis in elite soccer: future challenges and opportunities for sports science.

Vito Reno et al (2017). A technology platform for automatic high-level tennis game analysis. Computer Vision and Image Understanding, 159: 164-175.

Brian Ripley (1996). Pattern Recognition and Neural Networks. Cambridge: Cambridge University Press.

Brando Rohrer (2016). How Bayesian Inference Works.

Alessio Rossi et al. (2017). Effective injury prediction in professional soccer with GPS data and machine learning.

Hector Ruiz, Paul Power, Xinyu Wei & Patrick Lucey (2017). "The Leicester City Fairytale?": Utilizing New Soccer Analytics Tools to Compare Performance in the 15/16 & 16/17 EPL Seasons.

Arthur Samuel (1953). Computing Bit by Bit or Digital Computers Made Easy. Proceedings of the IRE, 41(10), 1223-1230.

Arthur Samuel (1959). Some studies in machine learning using the game of checkers. IBM Journal of research and development, 3(3), 210-229.

Sharon Sazia (2016). Use it or lose it: the search for enlightenment in dark data.

Sharp Sights Lab (2015). How to start learning data science.

Antonio Silva, Aldo Costa, Paulo Oliveira, Victor Reis, Jose Saavedra, Jurgen Perl, Abel Rouboa & Daniel Marinho (2007). The Use of Neural Network Technology to Model Swimming Performance. Journal of Sports Science & Medicine, 6(1), 117–125.

David Silver et al. (2016). Mastering the game of Go with deep neural networks and tree search. Nature, 529, 484–489.

Herbert Simon (1962). The Architecture of Complexity. Proceedings of the American Philosophical Society, 106(6), 467-482.

Brian Skinner (2010). The price of anarchy in basketball. Journal of Quantitative Analysis in Sports, 6(1).

Alicia Smith (2016). The 7 Fundamental Steps to Complete a Data Project.

Anubhav Srivastava (2016). The best known machine learning algorithms?

Manuel Stein et al (2017). How to Make Sense of Team Sport Data: From Acquisition to Data Modeling and Research Aspects. Data, 2(1), 2.

David Sumpter (2016). Soccernomics. Oxford: Bloomsbury Publishing.

David Sumpter (2017a). The geometry of attacking football.

David Sumpter (2017b). Football analytics of the future.

David Sumpter (2017c). Automatically measuring decision-making on the pitch.

David Sumpter (2017d). Using Markov chains to evaluate football players' contributions.

David Sumpter (2017e). How an algorithm can measure defence and press in football.

Paul Taylor (2016). The Concept of 'Cat Face'. London Review of Books, 38(16), 30-32.

Jake Vanderplas (2016). Python Data Science Handbook.

Jan Van Haaren, Mehdi Kaytoue & Jesse Davis (2016). Machine Learning and Data Mining for Sports Analytics. Proceedings of the Workshop on Machine Learning and Data Mining for Sports Analytics 2016, Riva del Garda, Italy, September.

Fjodor Van Veen (2017). Neural network zoo.

Analytics Vidhya (2016). Tree based modeling.

Toby Walsh (2017). Know when to fold 'em: AI beats world's top poker players.

Xinyu Wei, Patrick Lucey, Stuart Morgan, Machar Reid & Sridha Sridharan (2016). “The Thin Edge of the Wedge”: Accurately Predicting Shot Outcomes in Tennis using Style and Context Priors. Paper presented at the MIT Sloan Sports Analytics Conference, March.

Daniel Weitzenfeld (2014). A Hierarchical Bayesian Model of the Premier League.

Geoffrey West (2017). Scaling: the surprising mathematics of life and civilisation.

Richard Whittall (2016a). How to build a simple football scouting algorithm, part 1.

Richard Whittall (2016b). How to build a simple football scouting algorithm, part 2.

Richard Whittall (2016c). How to build a simple football scouting algorithm, part 3.

Richard Whittall (2016d). How to build a simple football scouting algorithm, conclusion.

Terry Winograd & Fernando Flores (1986). Understanding Computers and Cognition. Norwood, NJ: Ablex.

Stephanie Yee & Tony Chu (2015). A visual Introduction to Machine Learning.

Yisong Yue, Patrick Lucey, Peter Carr, Alina Bialkowski & Iain Matthews (2014). Learning fine-grained spatial models for dynamic sports play prediction. In Data Mining (ICDM), 2014 IEEE International Conference.

Stephan Zheng, Yisong Yue & Patrick Lucey (2016). Generating long-term trajectories using deep hierarchical networks.

Suggested Reading

Jim Albert, Mark Glickman, Tim Swartz & Ruud Koning (2016). Handbook of Statistical Methods and Analyses in Sports. Boca Raton, Fl: CRC Press.

Cornelius Arndt & Ulf Brefeld (2016). Predicting the future performance of soccer players. Statistical Analysis and Data Mining, 9(5), 373-382.

Charles Babcock (2015). IBM Cognitive Colloquium Spotlights Uncovering Dark Data.

Robert Barker & Ted Kwartler (2015). Sport Analytics Using Open Source Logistic Regression Software to Classify Upcoming Play Type in the NFL. Journal of Applied Sport Management, 7(2).

Vinay Bettadapura, Caroline Pantofaru & Irfan Essa (2016). Leveraging Contextual Cues for Generating Basketball Highlights.

Per Harald Borgen (2016). Machine Learning in a Year.

Mike Bostock (2014). Visualizing Algorithms.

Edward Boyden (2017). Hybrid intelligence: coupling AI and the human brain.

Colin Brewer & Rob Jones (2002). A five-stage process for establishing contextually valid systematic observation instruments: the case of rugby union. Sport Psychologist, 16(20), 138-159.

Joel Brooks, Matthew Kerr & John Guttag (2016). Using machine learning to draw inferences from pass location data in soccer. Statistical Analysis and Data Mining, 9(5), 338-349.

Jason Brownlee (2017). How to handle missing data with Python.

Alfredo Canziani, Adam Paszke & Eugenio Culurciello (2017). An Analysis of Deep Neural Network Models for Practical Applications.

Jamie Coles (2017). A beginner's guide to predictive machine learning algorithms: an Alteryx infographic.

Victor Cordes & Lorne Olfman (2016). Sports Analytics: Predicting Athletic Performance with a Genetic Algorithm.

Chris Cushion, Stephen Harvey, Bob Muir & Lee Nelson (2012). Developing the Coach Analysis and Intervention System (CAIS): Establishing validity and reliability of a computerised systematic observation instrument. Journal of Sports Sciences, 30(2), 210-216.

Chris Cushion & Rob Jones (2001). A systematic observation of professional top-level youth soccer coaches. Journal of Sport Behaviour, 24(4).

DL4J (2016). What is Deeplearning4j?

Cory Doctorow (2016). Weapons of Math Destruction: invisible, ubiquitous algorithms are ruining millions of lives.

Sam Edgemon (2016). What does a winning thoroughbred horse look like?

Iztok Fister et al (2015). Computational intelligence in sports: Challenges and opportunities within a new research domain. Applied Mathematics and Computation, 262, 178-186.

Floorball Analytics (2017a). How to score - differences in goal scoring in Sweden/Finland.

Floorball Analytics (2017b). Corsi and Fenwick - advanced stats in Floorball.

Sofia Fonseca, João Milho, Bruno Travassos, & Duarte Araújo (2012). Spatial dynamics of team sports exposed by Voronoi diagrams. Human Movement Science, 31(6), 1652-1659.

Simon Fothergill, Robert Harle & Sean Holden (2008). Modeling the Model Athlete: Automatic Coaching of Rowing Technique. Joint IAPR Workshops on Structural & Syntactic and Statistical Pattern Recognition, Springer.

Jan Van Haaren, Albrecht Zimmermann & Jesse Davis (2016). MLSA15 - Proceedings of "Machine Learning and Data Mining for Sports Analytics", workshop @ ECML/PKDD 2015.

Nils Hammerla (2015). Activity recognition in naturalistic environments using body-worn sensors. PhD thesis, Newcastle University.

Amr Hassan, Norbert Schrapf, Wael Ramadan & Markus Tilp (2016). Evaluation of tactical training in team handball by means of artificial neural networks.

Geoffrey Hinton, Simon Osindero & Yee-Whye Teh (2006). A fast learning algorithm for deep belief nets. Neural computation, 18(7), 1527-1554.

Hamel Husain (2017). Automated Machine Learning — A Paradigm Shift That Accelerates Data Scientist Productivity @ Airbnb.

Aarshay Jain (2016). A Complete Tutorial to work on Big Data with Amazon Web Services (AWS).

Punit Jajodia (2017). Removing outliers using standard deviation in Python.

Will Knight (2017). Poker Is the Latest Game to Fold Against Artificial Intelligence.

Gunjan Kumar (2013). Machine Learning for Soccer Analytics.

Leonardo Lamas, Junior Barrera, & Guilherme Otranto (2014). Invasion team sports: strategy and match modeling. International Journal of Performance Analysis in Sport, 14, 307-329.

Hoang Le (2017). Beyond deep learning: a case study in sports analytics.

Steven Levy (2017). We are all Kasparov.

Brett Lieblich (2017). Presenting adjusted Pythagorean Theorem.

Henry Lin & Max Tegmark (2016). Why does deep and cheap learning work so well?

António Lopes, Sofia Fonseca, Roland Leser & Arnold Baca (2015). Using Voronoi diagrams to describe tactical behaviour in invasive team sports: an application in basketball. Cuadernos de Psicologia del Deporte, 15(1), 123-130.

Thomas Loridan (2016). Simulating the English Premier League season.

Vishal Maini (2017). Machine learning for humans.

Fumito Masui et al. (2015). Toward curling informatics — Digital scorebook development and game information analysis. Proceedings of IEEE Conference on Computational Games, September, 481-488.

Annalyn Ng & Kenneth Soo (2016). Random Forest Tutorial.

Jason O'Rawe (2016). R or Python for data science?

Sunil Ray (2016). Essentials of Machine Learning Algorithms.

Chris Rawles (2017). Data Science How-To: Using Apache Spark for Sports Analytics.

Robert Rein, Dominik Raabe, Jürgen Perl & Daniel Memmert (2016). Evaluation of changes in space control due to passing behavior in elite soccer using Voronoi-cells. In Proceedings of the 10th International Symposium on Computer Science in Sports (ISCSS), 179-183. Springer International Publishing: Berlin.

Jeff Sackmann (2016). Github Tennis Data.

Todd Schneider (2016). A Tale of Twenty-Two Million Citi Bike Rides: Analyzing the NYC Bike Share System.

Rajiv Shah & Rob Romijnders (2016). Applying deep learning to basketball trajectories.

Oliver Sutton (2012). Introduction to k Nearest Neighbour Classification and Condensed Nearest Neighbour Data Reduction.

David Sumpter (2017). Decentralised football is more effective than focusing on one or two players.

Martin Theuwissen (2015). R vs Python for Data Science?

Shantnu Tiwari (2015). Machine Learning for Beginners. [Video presentation].

Anton van den Hengel (2017). Can machines really tell us if we're sick?

Matt Woolman. Visual Complexity.

Steven Wu & Luke Bornn (2017). Modeling offensive player movement in professional basketball.

Xplenty (2017a). Introduction to data integration or what is ETL?

Xplenty (2017b). 5 reasons to use an ETL tool rather than 'script your own'.

Photo Credit

Pre-season (Keith Lyons, CC BY 4.0).