Pattern Recognition

Training in 38C

Introduction

This theme introduces pattern recognition of sport performance data.

Richard Duda, Peter Hart and David Stork (2001:1) define pattern recognition as "the act of taking in raw data and making an action based on the category of the pattern". They observe:

The ease with which we recognize a face, understand spoken words, read handwritten characters, identify our car keys in our pocket by feel, and decide whether an apple is ripe by its smell belies the astoundingly complex processes that underlie these acts of pattern recognition.

They add:

It is natural that we should seek to design and build machines that can recognize patterns ... it is clear that reliable accurate pattern recognition by machine would be immensely useful. (2001:1)

We introduce pattern recognition here in order to explore how insights from data science might help coaches and athletes to transform performance. We see this as an important step from the study of human real-time pattern recognition (see, for example, Nicholas Smeeton, Paul Ward and Mark Williams, 2004; Christopher Moore and Sean Muller, 2014) and of decision-making (Joe Causer and Paul Ford, 2014).

Vincent Granville (2017) provides an overview of some of the terminology used in the process of pattern recognition. These terms include 'data science', 'data mining', 'knowledge discovery in databases', and 'machine learning'. Karl Broman (2013) reminds us:

If you're analyzing data, you're doing statistics. You can call it data science or informatics or analytics or whatever, but it's still statistics.

Segundo Guzman and his colleagues (2016) discuss some of the neurophysiological mechanisms involved in pattern recognition and provide insights into the synaptic mechanisms of pattern completion. Esko Kilpi (2017) explores the social dimensions of pattern recognition in the context of 'emergent interaction'.

Data Science

Gil Press (2013) points out:

The term “Data Science” has emerged only recently to specifically designate a new profession that is expected to make sense of the vast stores of big data. But making sense of data has a long history and has been discussed by scientists, statisticians, librarians, computer scientists and others for years.

Gil provides a timeline that traces the evolution of the term "Data Science". This includes:

1947 John W. Tukey coined the term 'bit'.

1948 Claude Shannon A Mathematical Theory of Communications

1962 John W. Tukey The Future of Data Analysis

1968 IFIP Guide to Concepts and Terms in Data Processing

1974 Peter Naur Concise Survey of Computer Methods

1977 John W. Tukey Exploratory Data Analysis

1977 International Association for Statistical Computing

1989 1st Knowledge Discovery in Databases Workshop

1996 IFCS Conference Data Science, Classification, and Related Methods

1996 From Data Mining to Knowledge Discovery in Databases

1997 Jeff Wu Statistics = Data Science?

1997 Data Mining and Knowledge Discovery Journal

2001 William Cleveland Data Science: An Action Plan

Gil's review details 21st-century developments up to 2013.

DJ Patil (2011) discussed building data science teams. He proposed four characteristics of a data scientist: technical expertise, curiosity, storytelling, and cleverness.

David Donoho (2015) considers a century of data science with a look back to 1962 and a look forward to 2065. He offers a vision of data science based on the activities of people who are ‘learning from data’, and describes an academic field dedicated to improving that activity in an evidence-based manner that is an academic enlargement of statistics and machine learning (2015:3).

Longbing Cao (2017) provides an overview of challenges and directions in data science from a complex-systems perspective.

Vincent Granville (2015a) shares his view of the precursors to data science.

Vincent Granville (2015b) proposes that "a new category of data scientists [is] emerging: data scientists with strong statistical knowledge, just [as] we already have a category of data scientists with significant engineering experience (Hadoop)".

He adds:

what makes data scientists different from computer scientists is that they have a much stronger statistics background, especially in computational statistics, but sometimes also in experimental design, sampling, and Monte Carlo simulations.

Michael Hochster (2014) identified two types of data scientists, Type A and Type B.

Type A is an Analyst: "very similar to a statistician (and may be one) but knows all the practical details of working with data that aren't taught in the statistics curriculum: data cleaning, methods for dealing with very large data sets, visualization, deep knowledge of a particular domain, writing well about data, and so on". Type A "may be an expert in experimental design, forecasting, modeling, statistical inference, or other things typically taught in statistics departments".

Type B is a Builder. They share some statistical background with the Analyst "but they are also very strong coders and may be trained software engineers".

Alex Castrounis (2017) identifies four pillars of data science expertise:

  • Business domain
  • Statistics and probability
  • Computer science and software programming
  • Written and verbal communication

He proposes this definition of a data scientist:

a data scientist is a person who should be able to leverage existing data sources, and create new ones as needed in order to extract meaningful information and actionable insights. These insights can be used to drive business decisions and changes intended to achieve business goals.

Ryan Swanstrom (2017) suggests there are three stages "of a truly mature data science organization":

1. Dashboards

(Requires investment in data storage in a single location, extract transform load (ETL) tools, and reporting tools)

2. Machine Learning

(A focus on estimating the causal outcomes of potential events.)

3. Actions

(Uses the results generated in stage 2 to take 'appropriate' actions.)

David Taylor (2016) has discussed the visualisation of data science and points out "As a field full of data nerds with a penchant for visualization, it's also unsurprising that a lot of them use Venn diagrams".

David and Alex both point to Stephan Kolassa's (2015) visualisation as one of their favourite visualisations:

Stephan's post about his diagram includes the R code used to generate the ellipses.

Thomson Nguyen (2014) considers the differences between data scientists and data analysts in this video:

Gregory Piatetsky (2013) connects data science and data mining and uses both terms interchangeably:

You can best learn data mining and data science by doing, so start analyzing data as soon as you can! However, don't forget to learn the theory, since you need a good statistical and machine learning foundation to understand what you are doing and to find real nuggets of value in the noise of Big Data.

Gregory provided an overview of the analytics industry in a presentation made in 2011.

Stephanie Hicks and Rafael Irizarry (2016) present a guide to teaching data science and share a case study of an introduction to data science that is organised around three themes: creating, connecting, computing.

Andrew Therriault (2017) presents a guide to data security. He cautions:

Everyone who creates, manages, analyzes, or even just has access to data is a potential point of failure in an organization’s security plan. So if you use data which is at all sensitive — that is, any data you wouldn’t freely give out to any random stranger on the internet — then it’s your responsibility to make sure that data is protected appropriately.

Shane Brennan (2017) notes ten fallacies of data science. He argues:

There exists a hidden gap between the more idealized view of the world given to data-science students and recent hires, and the issues they often face getting to grips with real-world data science problems in industry. All these new college courses in data analytics (they’re almost all newly-minted courses) aim at teaching students the basics of coding, statistics, data wrangling etc. However, the kind of challenges you’re expected to overcome in an actual data science job within industry are greatly under-represented.

George Krasadakis (2017) alerts us to the importance of data quality in an age of artificial intelligence.

Josh Devins (2017) shares an example of an enterprise discussion (at Soundcloud) about data science processes. See also this 2016 discussion of data informed decision making at Soundcloud.

Jake Moody (2017) created an infographic to draw distinctions between data engineering and data science, with data engineer responsibilities in the left column and data scientist responsibilities in the right.

Steph de Silva (2016a, 2016b) shared her insights into asking questions about data analysis. She shared these two infographics:

Source: 2016a

Source: 2016b

Data Mining

Tom Mitchell (1999:1) describes data mining as the use of historical data "to discover regularities and improve future decisions". This discovery of patterns in data requires:

  • A database
  • Data formatting and cleansing
  • Data visualisation and summary
  • Machine learning algorithms
  • Human expert domain knowledge

Ian Witten and Eibe Frank (2005: xxiii) define data mining as "the extraction of implicit, previously unknown, and potentially useful information from data". Data mining is "the process of discovering patterns in data" using automatic or semiautomatic processes.
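As a minimal illustration of this definition, the Python sketch below discovers frequently co-occurring match events by counting event pairs; the event names and data are invented for this example and are not drawn from any of the sources cited above.

```python
from collections import Counter
from itertools import combinations

# Hypothetical "transactions": the set of actions observed in each
# attacking phase of a match (illustrative data only).
phases = [
    {"cross", "header", "shot"},
    {"through_ball", "shot"},
    {"cross", "header"},
    {"cross", "shot"},
    {"through_ball", "dribble", "shot"},
]

# Count how often each pair of events co-occurs across phases.
pair_counts = Counter()
for phase in phases:
    for pair in combinations(sorted(phase), 2):
        pair_counts[pair] += 1

# Keep pairs meeting a minimum support threshold (here: 2 phases) --
# these are the "previously unknown, potentially useful" regularities.
frequent = {pair: n for pair, n in pair_counts.items() if n >= 2}
print(frequent)
```

This is the simplest form of frequent-itemset counting; real data mining tools add data cleansing, visualisation and domain expertise around the same core idea.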

Machine Learning

In her introduction to machine learning, Omoju Miller (2017) observes:

At its core, machine learning is not a new concept. The term was coined in 1959 by Arthur Samuel, a computer scientist at IBM, and it’s been widely used in software since the 1980s.

In his 1959 paper, Arthur notes:

We have at our command computers with adequate data-handling ability and with sufficient computational speed to make use of machine-learning techniques, but our knowledge of the basic principles of these techniques is still rudimentary. Lacking such knowledge, it is necessary to specify methods of problem solution in minute and exact detail, a time-consuming and costly procedure. Programming computers to learn from experience should eventually eliminate the need for much of this detailed programming effort. (1959:71)

(Note that the first operating checker program for the IBM 701 was written in 1952, recoded in 1954, completed in 1955 and demonstrated on television in February 1956.)

Arthur provided an update of his work in a 1967 paper.

Tom Mitchell (2006) points out:

Over the past 50 years the study of Machine Learning has grown from the efforts of a handful of computer engineers exploring whether computers could learn to play games, and a field of Statistics that largely ignored computational considerations, to a broad discipline that has produced fundamental statistical-computational theories of learning processes, has designed learning algorithms that are routinely used in commercial systems for speech recognition, computer vision, and a variety of other tasks, and has spun off an industry in data mining to discover hidden regularities in the growing volumes of online data.

He proposes that the discipline of machine learning ("a natural outgrowth of the intersection of Computer Science and statistics") seeks to answer the question 'How can we build computer systems that automatically improve with experience, and what are the fundamental laws that govern all learning processes?'

He adds:

we say that a machine learns with respect to a particular task T, performance metric P, and type of experience E, if the system reliably improves its performance P at task T, following experience E. Depending on how we specify T, P, and E, the learning task might also be called by names such as data mining, autonomous discovery, database updating, programming by example, etc. (2006:1) (Original emphasis)
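Mitchell's framing of task T, performance metric P and experience E can be made concrete with a small sketch. In the hypothetical example below (pure Python, invented data), T is classifying a noisy one-dimensional measurement into one of two classes, E is a set of labelled examples, and P is accuracy on a held-out test set; the learner "improves with experience" as E grows.

```python
import random

random.seed(1)

# Task T: classify a point as class 0 (near 0.0) or class 1 (near 1.0).
# Experience E: labelled examples. Performance P: accuracy on a test set.

def make_examples(n):
    data = []
    for _ in range(n):
        label = random.randint(0, 1)
        x = label + random.gauss(0, 0.6)   # noisy observation of the class
        data.append((x, label))
    return data

def train(examples):
    # "Learning" here is simply estimating each class centroid.
    sums, counts = [0.0, 0.0], [0, 0]
    for x, label in examples:
        sums[label] += x
        counts[label] += 1
    return [sums[c] / max(counts[c], 1) for c in (0, 1)]

def accuracy(centroids, test):
    correct = 0
    for x, label in test:
        pred = min((0, 1), key=lambda c: abs(x - centroids[c]))
        correct += (pred == label)
    return correct / len(test)

test_set = make_examples(500)
for n in (5, 50, 500):
    model = train(make_examples(n))
    print(n, round(accuracy(model, test_set), 3))
```

With more experience the centroid estimates stabilise and accuracy on the test set typically rises toward the limit set by the noise in the data.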

Stephen Marsland (2015:4) proposes that machine learning "is about making computers modify or adapt their actions ... so that these actions get more accurate, where accuracy is measured by how well the chosen actions reflect the correct ones". (Original emphasis)

Alex Castrounis (2016) says of machine learning:

Machine learning is a subfield of computer science, but is often also referred to as predictive analytics, or predictive modeling. Its goal and usage is to build new and/or leverage existing algorithms to learn from data, in order to build generalizable models that give accurate predictions, or to find patterns, particularly with new and unseen similar data. (Original emphasis)

The exploration of machine learning gives us an opportunity to think about the intersection of informatics and analytics (Lasse Holmström and Petri Koistinen, 2010; Ethem Alpaydin, 2011).

Wikipedia has a comprehensive machine learning portal. Raul Garreta (2015) has provided a gentle introduction to machine learning that presents "some initial concepts to invite the reader to continue investigating". R2D3 have a visual introduction to machine learning. Ophir Tanz and Cambron Carter (2017) have provided a conversational introduction to machine learning.

Daniel Tunkelang (2017) shares a list of ten things everyone should know about machine learning. His list:

  • Machine learning means learning from data
  • Machine learning is about data and algorithms, but mostly data
  • Unless you have a lot of data, you should stick to simple models
  • Machine learning can only be as good as the data you use to train it
  • Machine learning only works if your training data is representative
  • Most of the hard work for machine learning is data transformation
  • Deep learning is a revolutionary advance, but it isn’t a magic bullet
  • Machine learning systems are highly vulnerable to operator error
  • Machine learning can inadvertently create a self-fulfilling prophecy
  • AI is not going to become self-aware, rise up, and destroy humanity

You might find this glossary of machine learning terms helpful as you explore the machine learning literature.

For a discussion of Machine Learning with R and Python, see Tinniam Ganesh (2017a, 2017b).

Aliva Smith (2017) provides a visualisation of the knowledge discovery process.

For an introduction to decision trees as a machine learning algorithm that can be used for classification or regression, see Mohit Deshpande (2017).

Decision trees are one of the ten machine learning algorithms discussed by Sidath Asiri (2017).
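To give a flavour of how a decision tree classifies, the sketch below fits a one-level tree (a decision stump) by exhaustive search over thresholds, using classification error as the impurity measure. It is a simplified illustration only, not the implementation discussed by Deshpande or Asiri, and the sprint-time data are invented.

```python
# A minimal decision stump: a one-level decision tree fitted by trying
# every threshold and keeping the split with the fewest errors.

def fit_stump(xs, ys):
    best = None
    for threshold in sorted(set(xs)):
        for left_label in (0, 1):
            right_label = 1 - left_label
            errors = sum(
                (left_label if x <= threshold else right_label) != y
                for x, y in zip(xs, ys)
            )
            if best is None or errors < best[0]:
                best = (errors, threshold, left_label, right_label)
    return best[1:]  # (threshold, left_label, right_label)

def predict(stump, x):
    threshold, left_label, right_label = stump
    return left_label if x <= threshold else right_label

# Hypothetical data: sprint times (s), labelled 1 = selected, 0 = not.
times = [10.8, 11.0, 11.2, 11.9, 12.3, 12.5]
selected = [1, 1, 1, 0, 0, 0]
stump = fit_stump(times, selected)
print(stump, predict(stump, 11.1))
```

A full decision-tree learner applies this same split search recursively to each side of the split, usually with a better impurity measure such as Gini index or entropy.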

Artificial Intelligence

In 1955, a proposal for funding was made "that a 2 month, 10 man study of artificial intelligence be carried out during the summer of 1956 at Dartmouth College". There is a Wikipedia description of the workshop. Daniel Crevier (1993:49) has argued that "the conference is generally recognized as the official birthdate of the new science". (For more background information about the emergence of artificial intelligence as a field of study in the 1950s, see the National Research Council's (1999) account of government support for computing research.)

John McCarthy and Patrick Hayes (1969:2) observe:

The idea of an intelligent machine is old, but serious work on the artificial intelligence problem or even serious understanding of what the problem is awaited the stored program computer. We may regard the subject of artificial intelligence as beginning with Turing’s article Computing Machinery and Intelligence (Turing, 1950) and with Shannon’s (1950) discussion of how a machine might be programmed to play chess.

John and Patrick distinguish between the epistemological and heuristic aspects of artificial intelligence. They propose:

an entity is intelligent if it has an adequate model of the world (including the intellectual world of mathematics, understanding of its own goals and other mental processes), if it is clever enough to answer a wide variety of questions on the basis of this model, if it can get additional information from the external world when required, and can perform such tasks in the external world as its goals demand and its physical abilities permit. (1969:4)

Joseph Licklider (1960) was one of the pioneers of interactive computing. In his discussion of man-computer symbiosis, he observes:

it seems worthwhile to avoid argument with (other) enthusiasts for artificial intelligence by conceding dominance in the distant future of cerebration to machines alone. There will nevertheless be a fairly long interim during which the main intellectual advances will be made by men and computers working together in intimate association. (1960:5)

Terry Winograd & Fernando Flores published Understanding computers and cognition: A new foundation for design in 1986. They take an explicit philosophical approach to the design of computer technology and suggest "theories about the nature of biological existence, about language and about the nature of human action have a profound influence on the shape of what we build and how we use it" (1986:xii).

Stuart Russell and Peter Norvig (1995:3) suggest that the field of artificial intelligence (AI) attempts to understand intelligent entities. They add "but unlike philosophy and psychology, which are also concerned with intelligence, AI strives to build intelligent entities as well as understand them". They offer four categories of definitions of artificial intelligence (1995:5).

Jerry Kaplan (2016:5) suggests that the essence of artificial intelligence is "the ability to make appropriate generalizations in a timely fashion based on limited data".

Stefan van Duin and Naser Bakhshi (2017) visualise artificial intelligence thus:

In their model, "the concept of intelligence refers to some kind of ability to plan, reason and learn, sense and build some kind of perception of knowledge and communicate in natural language".

Raksham Pandey (2017) has visualised developments in artificial intelligence:

David Silver and his colleagues (2017) have published a paper on mastering the game of Go without human knowledge. They report:

Here we introduce an algorithm based solely on reinforcement learning, without human data, guidance or domain knowledge beyond game rules. AlphaGo becomes its own teacher: a neural network is trained to predict AlphaGo’s own move selections and also the winner of AlphaGo’s games. This neural network improves the strength of the tree search, resulting in higher quality move selection and stronger self-play in the next iteration. Starting tabula rasa, our new program AlphaGo Zero achieved superhuman performance, winning 100–0 against the previously published, champion-defeating AlphaGo.

Algorithms

Navneet Alang (2016) observed:

Every age has its organizing principles. The nineteenth century had the novel, and the twentieth had TV; in our more modern times, they come and go more quickly than ever—on Web 1.0 it was the website, for example, and a few years later, for 2.0, it was the app. And now, another shift is underway: Today’s organizing principle is the algorithm.

There is debate about the definition of what constitutes an 'algorithm'. Moshe Vardi (2012:5), for example, observes "the fact that we have an intuitive notion of what an algorithm is does not mean that we have a formal notion". He suggests that problems of definition relate to an algorithmic duality that "seems to be a fundamental principle of computer science":

An algorithm is both an abstract state machine and a recursor, and neither view by itself fully describes what an algorithm is. (2012:5)

Robin Hill (2016) worked through computer science and philosophical literature to explore this duality and proposed that:

An algorithm is a finite, abstract, effective, compound control structure, imperatively given, accomplishing a given purpose under given provisions.

In very basic terms, "the algorithm is the thing that programs implement, the thing that gets data processing and other computation done" (Hill, 2016). The Association for Computing Machinery (2017) define an algorithm as "a self-contained step-by-step set of operations that computers and other 'smart' devices carry out to perform calculation, data processing, and automated reasoning tasks".

Vaidehi Joshi (2017) takes a more direct view in her definition:

an algorithm is a really fancy name with a bad rap. They’re not nearly as scary as they sound. An algorithm is just a fancy term for a set of instructions of what a program should do, and how it should do it. In other words: it’s nothing more than a manual for your code. (Original emphases.)
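A classic concrete example of these definitions, "a self-contained step-by-step set of operations" accomplishing a given purpose, is Euclid's algorithm for the greatest common divisor:

```python
# Euclid's algorithm: a finite, effective, step-by-step procedure.
def gcd(a, b):
    while b != 0:
        a, b = b, a % b   # replace (a, b) with (b, a mod b) until b is 0
    return a

print(gcd(48, 36))  # -> 12
```

The while loop is the "compound control structure" in Hill's terms, and the purpose (the greatest common divisor) is accomplished in finitely many steps for any pair of non-negative integers.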

In this course, we propose to: explore some of the issues that arise in the age of the algorithm (Lee Rainie & Janna Anderson, 2017); consider the emergence of performance monitoring tools and the algorithms that are used (including discussions about 'algorithmic skin'); and reflect on some of the ethical issues that arise from black box acceptance of algorithms. In doing so, our aim is to contribute to conversations about 'algorithmic transparency' (Association for Computing Machinery, 2017).

For a detailed discussion of algorithms see Jason Brownlee's (2013) tour of machine learning algorithms. For a discussion of machine learning in R see Jason Brownlee (2014).

Mark van Rijmenam (2017), amongst others, has drawn attention to ethical issues surrounding the use of algorithms. He notes that algorithms have two major flaws. They are:

  • Extremely literal: they pursue their (ultimate) goal literally and do exactly what they are told, while ignoring any other important considerations.
  • Black boxes: whatever happens inside an algorithm is known only to the organisation that uses it, and quite often not even to them.

Mark argues for a transparent approach to the use of algorithms that Michael van Lent (2004) defined as 'explainable artificial intelligence' (XAI). Michael notes Edward Shortliffe and his colleagues' (1975) exposition of how a program can "explain its recommendations when queried". More recently, Pat Langley, Ben Meadows, Mohan Sridharan & Dongkyu Choi (2017) have discussed the importance of 'explainable agency'. They argue "we must take seriously the need to communicate the reasons for agents’ decisions to human partners".

Patterns

This topic explores two specific aspects of pattern discovery and pattern recognition (Hand, 2004) in sport:

1. The systematic observation of real-time and lapsed-time behaviour.

2. The use of supervised machine learning techniques to analyse data.

We believe that the disciplined observation of performance can lead to careful consideration of how computers can enrich our understanding of that performance through the generation of 'interesting' insights.

An example of such work is Gilbert Kotzbek's use of de-identified geographic information system (GIS) data in football game analysis. In a 2015 paper, Gilbert, and his PhD supervisor Wolfgang Kainz, describe their approach to GIS in detail. This short video is an example of their system output.

A 2016 paper described their use of these data to analyse scoring attempts in football. David Sumpter (2017b) has discussed the significance of this GIS work for the future of football analytics.

David Sumpter (2017a) draws attention to other uses of player tracking data to map the geometry of football formations. These include Voronoi diagrams and Delaunay triangulations. An example is the work of Jaime Sampaio and his colleagues at the CreativeLab in Vila Real, Portugal.
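The idea behind a Voronoi diagram of a formation is that each player "controls" the region of the pitch closer to them than to any other player. The sketch below approximates this with a discrete grid and invented player coordinates; libraries such as SciPy compute exact Voronoi diagrams and Delaunay triangulations.

```python
import math

# Discrete Voronoi sketch: assign each cell of a coarse pitch grid to the
# nearest player, approximating each player's "dominant region".
# Player coordinates (in metres) are invented for illustration.
players = {"A": (20.0, 30.0), "B": (50.0, 40.0), "C": (80.0, 20.0)}

def nearest_player(point):
    return min(players, key=lambda p: math.dist(point, players[p]))

# Count cells controlled by each player on a 105 x 68 m pitch, 1 m grid.
control = {name: 0 for name in players}
for x in range(105):
    for y in range(68):
        control[nearest_player((x + 0.5, y + 0.5))] += 1

print(control)
```

With real tracking data the same calculation, repeated frame by frame, shows how the balance of controlled space shifts as a team attacks or defends.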

Another example of this pattern recognition is presented in this Disney Research Hub (2017) video:

Artificial Neural Networks

James Anderson (1995) provides an introduction to neurocomputing and neural network algorithms. James Anderson & Edward Rosenfeld (1998) present further background to artificial neural networks in seventeen interviews with scientists involved in the theory and practice of neural networks.

The quest to recognise patterns in data includes the use of artificial neural network methods. Brian Ripley (1996:2) notes that "artificial neural networks have been developed by a community which was originally biologically motivated". He regarded a 'neural network' as a method "which arose or was popularized by the neural network community and has been or could be used for pattern recognition" (1996:2).

Imad Basheer & M. Hajmeer (2000:3) define these networks as "structures comprised of densely inter-connected adaptive simple processing elements (called artificial neurons or nodes) that are capable of performing massively parallel computations for data processing and knowledge representation".

Cristian Randieri (2017) says of neural networks:

An Artificial Neural Network is a simplified mathematical model of a biological one. Similarly, it uses nodes rather than neurons but builds the same sorts of complex interconnections between them (synapses). Rather than storing all data in a huge pool to be analyzed as a whole, neural networks are able to memorize and so remember associations between concepts, streamlining the process of retrieval and analysis.

This allows computer scientists to make algorithms for “deep learning,” which arranges ideas as layers of definitions. Small concepts collectively define larger ones, which define larger ones, and so on. With enough input information, a sufficiently detailed neural network can learn quite deeply indeed.

Cristian shares this visualisation of a neural network.
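The layered structure Cristian describes, where small concepts combine into larger ones, can be seen in miniature in the sketch below: a two-input network with one hidden layer computes XOR, a function no single "neuron" can represent. The weights are set by hand for illustration rather than learned from data.

```python
# A minimal feed-forward network in plain Python: two inputs, a hidden
# layer of two nodes, and one output node, with step activations.

def step(x):
    return 1 if x > 0 else 0

def neuron(inputs, weights, bias):
    return step(sum(i * w for i, w in zip(inputs, weights)) + bias)

def xor_net(a, b):
    h1 = neuron((a, b), (1, 1), -0.5)       # fires if a OR b
    h2 = neuron((a, b), (1, 1), -1.5)       # fires if a AND b
    return neuron((h1, h2), (1, -2), -0.5)  # fires if OR but not AND

for a, b in ((0, 0), (0, 1), (1, 0), (1, 1)):
    print(a, b, xor_net(a, b))
```

Each hidden node detects a simple concept (OR, AND) and the output node combines them into the larger concept (XOR); deep networks stack many such layers and learn the weights from data instead of fixing them by hand.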

Natalie Wolchover (2017) has provided an overview of developments in deep neural networks. Amongst others, she discusses in detail the work of Naftali Tishby and his account of deep learning and information bottlenecks.

Emil Wallner (2017) presents six snippets of code that made deep learning what it is today.

Roger Bartlett (2006) reports the use of artificial neural networks in sport. Mark Pfeiffer and Andreas Hohmann (2012) consider their use in training science and provide examples from swimming and handball. Ivars Namatevs, Ludmila Aleksejeva & Inese Polaka (2016) explore the application of neural network modelling to sports performance classification.

Jürgen Perl has written extensively about the use of artificial neural networks in sport. With Peter Dauscher (2006) he provided a review of dynamic pattern recognition in sport. Pedro Passos and his colleagues (2006) have extended this discussion.

An introductory example of the use of neural networks in sport can be found in Antonio Silva and his colleagues' (2007) account of modelling swimming performance. More recent examples include: cricket team selection (Subramanian Iyer & Ramesh Sharda, 2009); weight training (Hristo Novatchkov & Arnold Baca, 2013); 400m hurdle performance (Krzysztof Przednowek et al., 2014); basketball (Matthias Kempe, Andreas Grunz & Daniel Memmert, 2015); deep convolutional neural networks (Martin Wagenaar, 2016); and football (Daniel Memmert, Koen Lemmink & Jaime Sampaio, 2017).

This is Brandon Rohrer's introduction to neural networks:

Daniel Holden (2017) provides a helpful guide to troubleshooting when neural networks are not working.

Grant Sanderson (2017a) presents this introduction to neural networks and works through an example:

Alarije (2017) suggests some important points when designing a neural network:

  • Input data: use a representative number of examples, with sufficiently varied information, to avoid over-optimisation (overfitting). It is common to use 70% of the data to train the network, 20% to test the result and 10% as out-of-sample validation.
  • Control the number of neurons and levels: too few and the process becomes too general; too many and the network over-fits the data.
  • Chosen functions: start with the simplest function and then elaborate it according to observations and further requirements.
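The 70/20/10 split mentioned in the first point can be sketched in a few lines: shuffle the examples, then slice them into training, test and validation sets. The function name and seed below are illustrative choices, not part of Alarije's description.

```python
import random

# Shuffle, then slice into 70% training / 20% test / 10% validation.
def split_data(examples, seed=0):
    examples = list(examples)
    random.Random(seed).shuffle(examples)   # seeded for reproducibility
    n = len(examples)
    n_train = int(n * 0.7)
    n_test = int(n * 0.2)
    train = examples[:n_train]
    test = examples[n_train:n_train + n_test]
    validation = examples[n_train + n_test:]
    return train, test, validation

train, test, validation = split_data(range(100))
print(len(train), len(test), len(validation))  # -> 70 20 10
```

Shuffling before slicing matters: sports data often arrive ordered by match or by date, and slicing unshuffled data would give the network an unrepresentative training set.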

Andrej Karpathy (2017) proposes "Neural networks are not just another classifier, they represent the beginning of a fundamental shift in how we write software. They are Software 2.0". He adds:

A large portion of programmers of tomorrow do not maintain complex software repositories, write intricate programs, or analyze their running times. They collect, clean, manipulate, label, analyze and visualize data that feeds neural networks.

Ben Gorman (2017a) provides an introduction to neural networks and a worked example (2017b).

Hackathons

Gerard Briscoe and Catherine Mulligan (2014) note that a hackathon is "an event in which computer programmers and others involved in software development, including interface designers, graphic designers and project managers, collaborate intensively over a short period of time". Since their origins in the 1990s, hackathons have had "a significant impact on the culture of digital innovation" (Briscoe & Mulligan, 2014).

The NBA hosted its first basketball analytics hackathon in September 2016. 210 students responded to the four prompts shared on the day of the hackathon:

  • Develop a new method or tool for evaluation of defensive performance in the NBA.
  • Develop a new method or tool for evaluation of the effectiveness of timeouts as an offensive or defensive strategy.
  • Build a tool and/or model to predict the outcome of shots attempted.
  • Open topic: participants are allowed to pursue a creative, original topic (approved by the NBA League Office).

The 2017 NBA hackathon added a second stream, business analytics, to its challenge to participants. The rules for the event included these provisions:

  • Code used during the hackathon must be written during the designated contest period.
  • All software used by participants must be publicly and widely available.
  • Submissions become the property of the NBA.
  • You agree not to publicly disclose any submission without the prior consent of the NBA.

In February 2017, the Western Bulldogs Australian Rules Football Club hosted a hackathon in Ballarat Library, Victoria. One of the three challenges for the hackathon was the analysis of player tracking data and tactical behaviours. The hackathon provided access to data that had not been in the public domain previously. Use of some of the data provided required a non-disclosure agreement.

The announcement of the event included this introduction:

The Western Bulldogs in partnership with City of Ballarat look forward to welcoming participants of the inaugural Western Bulldogs Ballarat Hackathon. The event, which will run from the 24th to the 26th of February at the Ballarat Library, represents one of a number of initiatives implemented by the Bulldogs in ensuring they remain at the forefront of innovation in the quest for sustained success following the club’s 2016 AFL Premiership triumph. Teams at the event will have the opportunity to work on three challenges; one of these will focus on football performance & sport science.

Sam Robertson, Head of Research and Innovation at the club, observed:

The club sees this as a great opportunity to invite some of the best and brightest sports analysts around the country to address a few of the current key challenges faced by high performance sporting organisations. I am confident that the challenges presented at the Hackathon will produce some fantastic solutions from participating teams as well as provide a useful platform for individuals to showcase their work in front of a professional sporting club.

This hackathon was an excellent example of what (Briscoe & Mulligan, 2014) describe as a focus-centric, applied hackathon. Such hackathons "target software development to address or contribute to a social issue or a business objective".

ESPN hosted a hackathon at the 2017 Sloan Sports Analytics Conference. This was the third time the event had taken place at a Sloan conference. (News of the hackathons in 2015 and 2016.) The theme of the 2017 event was:

Sports analytics is often criticized for ignoring intangibles such as chemistry, leadership, heart, and instinct. Participants in the Hackathon will be asked to start pushing back on that by clearly defining and then measuring an aspect of on court performance that was previously talked about as an intangible attribute. Participants will utilize basketball player tracking data to facilitate their measurements and will be judged on completeness of their definition, measurement approach, and results of their process.

FC Nordsjælland hosted a two-day Tracking Data hackathon in March 2017. "Allocated into teams, attendees will look to combine tracking data and event data to create insights aimed to improve post-match analysis and preparation for the next game". At their hackathon "attendees will be provided with the data from FC Nordsjælland and Brøndby IF's 3 most recent games". These data include: Ball Events; Ball Tracking; and Player Tracking. Each attendee at the hackathon "will sign a Non-Disclosure Agreement to prohibit the distribution of the datasets to non-attendees". Mladen Sormaz and Dan Nichol provide an example of the work produced at this event.

Source: Joe Mulberry (Twitter)
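Combining tracking and event data of the kind supplied at such hackathons usually comes down to aligning the two streams by timestamp. The sketch below is illustrative only: the column names and values are invented, and it assumes the tracking feed is sampled at 10 Hz while events carry their own timestamps. It pairs each event with the nearest preceding tracking frame using pandas:

```python
import pandas as pd

# Hypothetical tracking frames: ball position sampled at 10 Hz.
tracking = pd.DataFrame({
    "t": [0.0, 0.1, 0.2, 0.3, 0.4, 0.5],
    "ball_x": [50.2, 51.0, 52.1, 53.4, 54.0, 54.8],
    "ball_y": [30.1, 30.3, 30.2, 29.8, 29.5, 29.4],
})

# Hypothetical ball events logged separately (pass, shot, ...).
events = pd.DataFrame({
    "t": [0.12, 0.43],
    "event": ["pass", "shot"],
})

# Attach to each event the nearest preceding tracking frame,
# giving every event an (x, y) location on the pitch.
merged = pd.merge_asof(events, tracking, on="t", direction="backward")
print(merged[["event", "ball_x", "ball_y"]])
```

Real tracking feeds also carry player coordinates per frame, but the same timestamp-alignment step is typically the starting point for the post-match analyses the hackathon brief describes.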

Students in the Sports Analytics Club at Simon Fraser University organised a hackathon in July 2017. Their hackathon description was:

Participants will compete in teams of 5 beginning on July 8th and finishing with a 5 minute presentation of their work on July 9th. The structure of the data sets will be published early to allow teams to have time to understand the data sets. We hope that this structure will allow teams to produce a high quality product rather than creating something rushed in a 24 hour period. Hockey, Soccer, and Basketball data sets will be provided. More specific details about the data will be coming soon. All Intellectual Property from the hackathon will be property of the data providers.

The hackathon was hosted at Simon Fraser University’s Harbour Centre campus in Vancouver.

In August 2017, the STATS company made available a basketball dataset which contained the x,y locations of the players and the ball. The sharing of these data coincided with the publication of a paper that explored trajectories using deep hierarchical networks. The company released football data too. One dataset contained player positions at ten frames per second in addition to ball events. A second dataset focussed on goal-scoring events, which enabled users to develop their own expected goals (xG) model.
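At its simplest, an expected goals model estimates the probability that a shot is scored from features of the shot. The sketch below is not the STATS data or any published model: it uses entirely synthetic shots with a single invented feature (distance to goal) and fits a logistic regression, which is one common starting point for xG:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Synthetic shot data: distance to goal in metres, and a binary
# outcome generated so that closer shots score more often (a toy
# assumption standing in for real labelled shot data).
distance = rng.uniform(5, 35, size=500)
goal = (rng.random(500) < 1 / (1 + np.exp(0.25 * (distance - 12)))).astype(int)

# Fit P(goal | distance); the predicted probability is the xG value.
model = LogisticRegression().fit(distance.reshape(-1, 1), goal)

xg_close = model.predict_proba([[6.0]])[0, 1]   # shot from 6 m
xg_far = model.predict_proba([[30.0]])[0, 1]    # shot from 30 m
print(round(xg_close, 2), round(xg_far, 2))
```

Published xG models typically add shot angle, body part, and game context as features, but the fitting and probability-scoring steps follow the same pattern.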

MySwimPro held their third annual hackathon in December 2017. Each participant prepared a design challenge statement modelled on design thinking methodology.

The fourth Hackathon at the 2018 MIT Sloan Sports Analytics Conference will take place on 22 February. Participants "will be given a choice of four prompts to work from to utilize the complex NBA player tracking data to tell a novel story and answer a question" that could not previously be answered from a quantitative perspective.

In January 2018, Tennis Australia's Game Insights Group announced "a world first Tennis Hackathon" titled 'From AO to AI'. Tennis Australia partnered with CrowdANALYTIX to host a hackathon throughout the Australian Open 2018. The Game Insights Group provided 10,000 points of Grand Slam tennis tracking data.

The Algorun 18 hackathon was hosted by Boğaziçi University's Computer Club in April 2018. Participants were provided with NBA data.

Recommended Reading

Aman Agarwal (2017). How DeepMind taught AI to play video games.

Wale Akinfaderin (2017). The mathematics of machine learning.

Talal Alsubaie (2008). Pattern Recognition.

American Statistician (2015). Special Issue on Statistics and the Undergraduate Curriculum. American Statistician, 69(4), 259-424.

Mara Averick (2017). Beyond basic bracketology: a March-Madness deep dive.

Gianluca Baio & Marta Blangiardo (2010). Bayesian hierarchical model for the prediction of football results.

Bryan Berend (2017). A magical introduction to classification algorithms.

Melissa Bierly (2016). 10 useful Python visualization libraries for any discipline.

Heidi Blake & John Templon (2016). The Tennis Racket. [Blog post.] (Github Python code.)

Joel Bock (2017). Empirical prediction of turnovers in NFL football. Sports, 5(1), 1.

Andrew Borrie, Gudberg Jonsson & Magnus Magnusson (2002). Temporal pattern analysis and its applicability in sport: an explanation and exemplar data. Journal of Sports Sciences, 20(10), 845-852.

Leo Breiman (2001). Statistical modeling: the two cultures.

Nicholas Carr (2017). A brutal intelligence: AI, chess, and the human mind.

Marti Casals & Caroline Finch (2016). Sports Biostatistician: a critical member of all sports science and medicine teams for injury prevention. Injury Prevention.

Maurizio Casarrubea et al (2015). T-pattern analysis for the study of temporal structure of animal and human behaviour. Journal of Neuroscience Methods, 239, 34-46.

Paolo Cintia, Michele Coscia & Luca Pappalardo (2016). The Haka Network: Evaluating Rugby Team Performance with Dynamic Graph Analysis. IEEE/ACM ASONAM, San Francisco, August.

Alex Castrounis (2016). Machine Learning: An In-Depth, Non-Technical Guide.

Lars-Erik Cederman & Nils Weidmann (2017). Predicting armed conflict: Time to adjust our expectations? Science, 355(6324), 474-476.

Thomas Cormen, Charles Leiserson, Ronald Rivest & Clifford Stein (2009). Introduction to Algorithms. Cambridge, MA: MIT Press.

Ami Drory, Gao Zhu, Hongdong Li & Richard Hartley (2017). Automated detection and tracking of slalom paddlers from broadcast image sequences using cascade classifiers and discriminative correlation filters. Computer Vision and Image Understanding, 159: 116-127.

Hubert Dreyfus (1963). What Computers Can't Do. New York: Harper & Row.

Hubert Dreyfus (1991). What Computers Still Can't Do. Cambridge, MA: The MIT Press.

Richard Duda, Peter Hart & David Stork (2001). Pattern Classification (Second Edition). New York: Wiley.

Martin Eastwood (2017). Analysing footballers' decisions in and around the penalty box.

Zyad Enam (2016). Why is machine learning hard?

Panna Felsen & Patrick Lucey (2017). 'Body shots': analyzing shooting styles in the NBA using body pose.

Lawrence Fisher (2017). Siri, Who is Terry Winograd?

Alexander Franks, Alexander D'Amour, Daniel Cervone & Luke Bornn (2016). Meta-Analytics: Tools for Understanding the Statistical Properties of Sports Metrics.

Ronald Gallimore (2004). What a coach can teach a teacher, 1975-2004: Reflections and reanalysis of John Wooden's teaching practices. The Sport Psychologist, 18, 119-137.

Nicolas Gakrelidz (2017). Predicting London Crime Rates Using Machine Learning.

Adam Geitgey (2014a). Machine Learning is Fun (Part 1).

Adam Geitgey (2016a). Machine Learning is Fun (Part 2).

Adam Geitgey (2016b). Machine Learning is Fun (Part 3).

Garry Gelade (2016). An Identikit for Shot Selection.

Peter Gleeson (2017). How machines make sense of big data: an introduction to clustering algorithms.

Segundo Guzman, Alois Schlogl, Michael Frotscher & Peter Jonas (2016). Synaptic mechanisms of pattern completion in the hippocampal CA3 network. Science, 353(6304), 1117-1123.

Garrett Grolemund & Hadley Wickham (2016). R for Data Science.

Thomas Grund (2012). Network structure and team performance: The case of English Premier League soccer teams. Social Networks, 34, 682-690.

Joachim Gudmundsson & Michael Horton (2016). Spatio-Temporal Analysis of Team Sports - A Survey.

David Hand (2004). Pattern recognition. Journal of Applied Statistics, 31(8), 883–884.

Jake Hofman, Amit Sharma & Duncan Watts (2017). Prediction and explanation in social systems. Science 355(6324), 486-488.

Vaidehi Joshi (2017). Sorting Out The Basics Behind Sorting Algorithms.

Benjamin Kadoch, Wouter Bos & Kai Schneider (2017). Directional change of fluid particles in two-dimensional turbulence and of football players. Physical Review Fluids.

KD Nuggets: Data Mining, Analytics, Big Data and Data Science.

Ujjwal Karn (2017). Machine Learning Tutorials.

Swati Kashyap (2016). 30 Top videos, tutorials and courses on machine learning and artificial intelligence from 2016.

George Kassabgi (2017). Deep learning in 7 lines of code.

Adam Kelleher (2016). Causal Data Science.

Robert Kelley (2017). Machine Learning Explained: Algorithms Are Your Friend.

Dilan Kiley et al. (2016). The game story space of professional sports: Australian Rules Football.

Esko Kilpi (2017). The Essential Skill of Pattern Recognition.

James Kirkpatrick et al. (2017a). Overcoming catastrophic forgetting in neural networks.

James Kirkpatrick et al. (2017b). Enabling Continual Learning in Neural Networks.

Will Knight (2016). Google's AI masters the game of Go a decade earlier than expected. MIT Technology Review. (See Gary Marcus's (2016) response.)

Stephanie Kovalchik (2016). Charting Serve Locations.

Hoang Le, Peter Carr, Yisong Yue & Patrick Lucey (2017). Data-driven ghosting using deep imitation learning.

Hoang Le, Yisong Yue, Peter Carr & Patrick Lucey (2017). Coordinated multi-agent imitation learning.

Richard Lewis (2015). The Pomelo Problem.

Scott Locklin (2016). Predicting with confidence: the best machine learning idea you never heard of.

Noah Lorang (2016a). Data scientists mostly just do arithmetic and that's a good thing.

Noah Lorang (2016b). Practical skills that practical data scientists need.

Keith Lyons (2016). R Resources.

Machine Learning and Data Mining for Sports Analytics (2016). Proceedings ECML/PKDD Workshop, September.

Mehrtash Manafifard, Hamid Ebadi & Abrishami Moghaddam (2017). A survey on player tracking in soccer videos. Computer Vision and Image Understanding, 159: 19-46.

Kevin Markham (2014). In-depth introduction to machine learning.

Stephen Marsland (2015). Machine learning: an algorithmic perspective. Boca Raton: CRC Press.

MathWorks (nd). Supervised Learning Workflows and Algorithms.

MathWorks (nd). Statistics and Machine Learning Toolbox.

Luis Martins (2011). Introduction to Pattern Recognition.

Alan McCall, Maurizio Fanchini & Aaron Coutts (2017). Prediction: the modern day sports science/medicine 'quest for the Holy Grail'. International Journal of Sports Physiology and Performance.

Nazanin Mehrasa, Yatao Zhong, Frederick Tung, Luke Bornn & Greg Mori (2017). Learning Person Trajectory Representations for Team Activity Analysis. arXiv:1706.00893.

Daniel Memmert & Jurgen Perl (2009). Game Creativity Analysis Using Neural Networks. Journal of Sports Science, 27(2), 139–149.

Bill Mills (2016). Writing Data: an introduction to choosing & using data formats.

Tom Mitchell (2006). The Discipline of Machine Learning.

Tom Mitchell (1999). Machine Learning and Data Mining.

Gareth Morgan, Bob Muir & Andy Abraham (2014). Systematic observation. In Lee Nelson, Ryan Groom & Paul Potrac (Eds.), Research Methods in Sports Coaching. Abingdon: Routledge.

Nafrondel (2017). Artificial Intelligence.

Ivan Namatevs, Ludmila Aleksejeva & Inese Polak (2016). Neural network modelling for sports performance classification as a complex socio-technical system.

Michael Nielsen (2016). Neural Networks and Deep Learning.

NNS (2016). Bayesian statistics explained to beginners.

Bahadorreeza Ofoghi, John Zeleznikow, Clare MacMahon & Markus Raab (2013). Data mining in elite sports: a review and framework. Measurement in Physical Education and Exercise Science, 17(3), 171-186.

Tony Ojeda (2017). Data exploration with Python.

Cathy O'Neil (2016). Weapons of Math Destruction. London: Allen Lane.

Muneaki Ohshima, Ning Zhong, Y Yao & Shinichi Murata (2004). Peculiarity oriented analysis in multi-people tracking images. In PAKDD, 508-518.

George Papadourakis (nd). Introduction to Neural Networks. (Accessed online 10 February 2015.)

Luca Pappalardo & Paolo Cintia (2017). Quantifying the relation between performance and success in soccer.

Jurgen Perl & Daniel Memmert (2012). Editorial: network approaches in complex environments. Human Movement Science, 31(2), 267–270.

Mark Pesce (2017). Disruptive machine learning.

Petbugs (2017). Applying CUSUM to hockey prediction models.

Mark Pfeiffer & Andreas Hohman (2012). Applications of Neural Networks in Training Science. Human Movement Science, 31(2), 344–359.

Du Phan (2017). On decision and confidence.

Paul Power, Hector Ruiz, Xinyu Wei & Patrick Lucey (2017). Not all passes are created equal: objectively measuring the risk and reward of passes in soccer from tracking data.

r2d3.us (2015a). A Visual introduction to Machine Learning: Part One.

Raghu Ramakrishnan & Bee-Chung Chen (2007). Exploratory mining in cube space.

Robert Rein, Dominik Raabe & Daniel Memmert (2017). 'Which pass is better?' Novel approaches to assessing passing effectiveness in elite soccer.

Robert Rein & Daniel Memmert (2016). Big data and tactical analysis in elite soccer: future challenges and opportunities for sports science.

Vito Reno et al (2017). A technology platform for automatic high-level tennis game analysis. Computer Vision and Image Understanding, 159: 164-175.

Brian Ripley (1996). Pattern Recognition and Neural Networks. Cambridge: Cambridge University Press.

Brando Rohrer (2016). How Bayesian Inference Works.

Alessio Rossi et al. (2017). Effective injury prediction in professional soccer with GPS data and machine learning.

Hector Ruiz, Paul Power, Xinyu Wei & Patrick Lucey (2017). "The Leicester City Fairytale?": Utilizing New Soccer Analytics Tools to Compare Performance in the 15/16 & 16/17 EPL Seasons.

Arthur Samuel (1953). Computing Bit by Bit or Digital Computers Made Easy. Proceedings of the IRE, 41(10), 1223-1230.

Arthur Samuel (1959). Some studies in machine learning using the game of checkers. IBM Journal of research and development, 3(3), 210-229.

Sharon Sazia (2016). Use it or lose it: the search for enlightenment in dark data.

Sharp Sights Lab (2015). How to start learning data science.

Antonio Silva, Aldo Costa, Paulo Oliveira, Victor Reis, Jose Saavedra, Jurgen Perl, Abel Rouboa & Daniel Marinho (2007). The Use of Neural Network Technology to Model Swimming Performance. Journal of Sports Science & Medicine, 6(1), 117–125.

David Silver et al. (2016). Mastering the game of Go with deep neural networks and tree search. Nature, 529, 484–489.

Herbert Simon (1962). The Architecture of Complexity. Proceedings of the American Philosophical Society, 106(6), 467-482.

Brian Skinner (2010). The price of anarchy in basketball. Journal of Quantitative Analysis in Sports, 6(1).

Alicia Smith (2016). The 7 Fundamental Steps to Complete a Data Project.

Anubhav Srivastava (2016). The best known machine learning algorithms?

Manuel Stein et al (2017). How to Make Sense of Team Sport Data: From Acquisition to Data Modeling and Research Aspects. Data, 2(1), 2.

David Sumpter (2016). Soccernomics. Oxford: Bloomsbury Publishing.

David Sumpter (2017a). The geometry of attacking football.

David Sumpter (2017b). Football analytics of the future.

David Sumpter (2017c). Automatically measuring decision-making on the pitch.

David Sumpter (2017d). Using Markov chains to evaluate football players' contributions.

David Sumpter (2017e). How an algorithm can measure defence and press in football.

Paul Taylor (2016). The Concept of 'Cat Face'. London Review of Books, 38(16), 30-32.

Jake Vanderplas (2016). Python Data Science Handbook.

Jan Van Haaren, Mehdi Kaytoue & Jesse Davis (2016). Machine Learning and Data Mining for Sports Analytics. Proceedings of the Workshop on Machine Learning and Data Mining for Sports Analytics 2016, Riva del Garda, Italy, September.

Fjodor Van Veen (2017). Neural network zoo.

Analytics Vidhya (2016). Tree based modeling.

Toby Walsh (2017). Know when to fold 'em: AI beats world's top poker players.

Xinyu Wei, Patrick Lucey, Stuart Morgan, Machar Reid & Sridha Sridharan (2016). “The Thin Edge of the Wedge”: Accurately Predicting Shot Outcomes in Tennis using Style and Context Priors. Paper presented at the MIT Sloan Sports Analytics Conference, March.

Daniel Weitzenfeld (2014). A Hierarchical Bayesian Model of the Premier League.

Geoffrey West (2017). Scaling: the surprising mathematics of life and civilisation.

Richard Whittall (2016a). How to build a simple football scouting algorithm, part 1.

Richard Whittall (2016b). How to build a simple football scouting algorithm, part 2.

Richard Whittall (2016c). How to build a simple football scouting algorithm, part 3.

Richard Whittall (2016d). How to build a simple football scouting algorithm, conclusion.

Terry Winograd & Fernando Flores (1986). Understanding Computers and Cognition. Norwood, NJ: Ablex.

Stephanie Yee & Tony Chu (2015). A visual Introduction to Machine Learning.

Yisong Yue, Patrick Lucey, Peter Carr, Alina Bialkowski & Iain Matthews (2014). Learning fine-grained spatial models for dynamic sports play prediction. In Data Mining (ICDM), 2014 IEEE International Conference.

Stephan Zheng, Yisong Yue & Patrick Lucey (2016). Generating long-term trajectories using deep hierarchical networks.

Suggested Reading

Jim Albert, Mark Glickman, Tim Swartz & Ruud Koning (2016). Handbook of Statistical Methods and Analyses in Sports. Boca Raton, Fl: CRC Press.

Cornelius Arndt & Ulf Brefeld (2016). Predicting the future performance of soccer players. Statistical Analysis and Data Mining, 9(5), 373-382.

Charles Babcock (2015). IBM Cognitive Colloquium Spotlights Uncovering Dark Data.

Robert Barker & Ted Kwartler (2015). Sport Analytics Using Open Source Logistic Regression Software to Classify Upcoming Play Type in the NFL. Journal of Applied Sport Management, 7(2).

Vinay Bettadapura, Caroline Pantofaru & Irfan Essa (2016). Leveraging Contextual Cues for Generating Basketball Highlights.

Per Harald Borgen (2016). Machine Learning in a Year.

Mike Bostock (2014). Visualizing Algorithms.

Edward Boyden (2017). Hybrid intelligence: coupling AI and the human brain.

Colin Brewer & Rob Jones (2002). A five-stage process for establishing contextually valid systematic observation instruments: the case of rugby union. Sport Psychologist, 16(20), 138-159.

Joel Brooks, Matthew Kerr & John Guttag (2016). Using machine learning to draw inferences from pass location data in soccer. Statistical Analysis and Data Mining, 9(5), 338-349.

Jason Brownlee (2017). How to handle missing data with Python.

Alfredo Canziani, Adam Paszke & Eugenio Culurciello (2017). An Analysis of Deep Neural Network Models for Practical Applications.

Jamie Coles (2017). A beginner's guide to predictive machine learning algorithms: an Alteryx infographic.

Victor Cordes & Lorne Olfman (2016). Sports Analytics: Predicting Athletic Performance with a Genetic Algorithm.

Chris Cushion, Stephen Harvey, Bob Muir & Lee Nelson (2012). Developing the Coach Analysis and Intervention System (CAIS): Establishing validity and reliability of a computerised systematic observation instrument. Journal of Sports Sciences, 30(2), 210-216.

Chris Cushion & Rob Jones (2001). A systematic observation of professional top-level youth soccer coaches. Journal of Sport Behaviour, 24(4).

DL4J (2016). What is Deeplearning4j?

Cory Doctorow (2016). Weapons of Math Destruction: invisible, ubiquitous algorithms are ruining millions of lives.

Sam Edgemon (2016). What does a winning thoroughbred horse look like?

Iztok Fister et al (2015). Computational intelligence in sports: Challenges and opportunities within a new research domain. Applied Mathematics and Computation, 262, 178-186.

Floorball Analytics (2017a). How to score - differences in goal scoring in Sweden/Finland.

Floorball Analytics (2017b). Corsi and Fenwick - advanced stats in Floorball.

Sofia Fonseca, João Milho, Bruno Travassos, & Duarte Araújo (2012). Spatial dynamics of team sports exposed by Voronoi diagrams. Human Movement Science, 31(6), 1652-1659.

Simon Fothergill, Robert Harle & Sean Holden (2008). Modeling the Model Athlete: Automatic Coaching of Rowing Technique. Joint IAPR Workshops on Structural & Syntactic and Statistical Pattern Recognition, Springer.

Jan Van Haaren, Albrecht Zimmermann & Jesse Davis (2016). MLSA15 - Proceedings of "Machine Learning and Data Mining for Sports Analytics", workshop @ ECML/PKDD 2015.

Nils Hammerla (2015). Activity recognition in naturalistic environments using body-worn sensors. PhD thesis, Newcastle University.

Amr Hassan, Norbert Schrapf, Wael Ramadan & Markus Tilp (2016). Evaluation of tactical training in team handball by means of artificial neural networks.

Geoffrey Hinton, Simon Osindero & Yee-Whye Teh (2006). A fast learning algorithm for deep belief nets. Neural computation, 18(7), 1527-1554.

Hamel Husain (2017). Automated Machine Learning — A Paradigm Shift That Accelerates Data Scientist Productivity @ Airbnb.

Aarshay Jain (2016). A Complete Tutorial to work on Big Data with Amazon Web Services (AWS).

Punit Jajodia (2017). Removing outliers using standard deviation in Python.

Will Knight (2017). Poker Is the Latest Game to Fold Against Artificial Intelligence.

Gunjan Kumar (2013). Machine Learning for Soccer Analytics.

Leonardo Lamas, Junior Barrera, & Guilherme Otranto (2014). Invasion team sports: strategy and match modeling. International Journal of Performance Analysis in Sport, 14, 307-329.

Hoang Le (2017). Beyond deep learning: a case study in sports analytics.

Steven Levy (2017). We are all Kasparov.

Brett Lieblich (2017). Presenting adjusted Pythagorean Theorem.

Henry Lin & Max Tegmark (2016). Why does deep and cheap learning work so well?

António Lopes, Sofia Fonseca, Roland Leser & Arnold Baca (2015). Using Voronoi diagrams to describe tactical behaviour in invasive team sports: an application in basketball. Cuadernos de Psicologia del Deporte, 15(1), 123-130.

Thomas Loridan (2016). Simulating the English Premier League season.

Vishal Maini (2017). Machine learning for humans.

Fumito Masui et al. (2015). Toward curling informatics — Digital scorebook development and game information analysis. Proceedings of IEEE Conference on Computational Games, September, 481-488.

Annalyn Ng & Kenneth Soo (2016). Random Forest Tutorial.

Jason O'Rawe (2016). R or Python for data science?

Sunil Ray (2016). Essentials of Machine Learning Algorithms.

Chris Rawles (2017). Data Science How-To: Using Apache Spark for Sports Analytics.

Robert Rein, Dominik Raabe, Jürgen Perl & Daniel Memmert (2016). Evaluation of changes in space control due to passing behavior in elite soccer using Voronoi-cells. In Proceedings of the 10th International Symposium on Computer Science in Sports (ISCSS), 179-183. Springer International Publishing: Berlin.

Jeff Sackmann (2016). Github Tennis Data.

Todd Schneider (2016). A Tale of Twenty-Two Million Citi Bike Rides: Analyzing the NYC Bike Share System.

Rajiv Shah & Rob Romijnders (2016). Applying deep learning to basketball trajectories.

Oliver Sutton (2012). Introduction to k Nearest Neighbour Classification and Condensed Nearest Neighbour Data Reduction.

David Sumpter (2017). Decentralised football is more effective than focusing on one or two players.

Martin Theuwissen (2015). R vs Python for Data Science?

Shantnu Tiwari (2015). Machine Learning for Beginners. [Video presentation].

Anton van den Hengel (2017). Can machines really tell us if we're sick?

Matt Woolman. Visual Complexity.

Steven Wu & Luke Bornn (2017). Modeling offensive player movement in professional basketball.

Xplenty (2017a). Introduction to data integration or what is ETL?

Xplenty (2017b). 5 reasons to use an ETL tool rather than 'script your own'.

Photo Credit

Pre-season (Keith Lyons, CC BY 4.0).