Play Tennis Dataset Csv Download

(i) The Play Tennis / Don't Play Tennis dataset records whether tennis was played under different weather conditions. In this instance, we are interested in the probability that tennis is played given that it is sunny.

We must apply Bayes' theorem in order to calculate this. According to Bayes' theorem, P(A|B) = P(B|A) * P(A) / P(B). The event "play tennis" in this instance is A, whereas the event "sunny" is B.


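P(B|A) is the probability that it is sunny given that tennis is played. It is found by counting how many of the days on which tennis was played were sunny: 2 of the 9, so P(sunny | play tennis) = 2/9 (by contrast, 3 of the 5 days on which tennis was not played were sunny, so P(sunny | don't play tennis) = 3/5). The remaining terms are P(play tennis) = 9/14 and P(sunny) = 5/14, because 5 of the 14 days are sunny.

Putting everything together gives P(play tennis | sunny) = (2/9 * 9/14) / (5/14) = 2/5 = 0.4. This means that, given that it is sunny, there is a 40% chance that someone will be playing tennis, which matches the direct count: tennis was played on 2 of the 5 sunny days.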
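The same Bayes calculation can be reproduced by counting. The sketch below assumes the dataset is the classic 14-day PlayTennis table and keeps only its outlook and play columns; the list and function names are illustrative, not part of any library.

```python
from fractions import Fraction

# Outlook and label for each of the 14 days of the classic PlayTennis table (assumed).
OUTLOOK = ["sunny", "sunny", "overcast", "rain", "rain", "rain", "overcast",
           "sunny", "sunny", "rain", "sunny", "overcast", "overcast", "rain"]
PLAY = ["no", "no", "yes", "yes", "yes", "no", "yes",
        "no", "yes", "yes", "yes", "yes", "yes", "no"]


def posterior_play_given(outlook_value):
    """P(play | outlook) via Bayes' theorem, with every term estimated by counting."""
    n = len(PLAY)
    n_play = PLAY.count("yes")
    n_value = OUTLOOK.count(outlook_value)
    n_value_and_play = sum(
        1 for o, p in zip(OUTLOOK, PLAY) if o == outlook_value and p == "yes"
    )
    likelihood = Fraction(n_value_and_play, n_play)  # P(B|A)
    prior = Fraction(n_play, n)                       # P(A)
    evidence = Fraction(n_value, n)                   # P(B)
    return likelihood * prior / evidence


print(posterior_play_given("sunny"))  # 2/5
```

Exact fractions are used so the result can be compared directly with the hand calculation.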

(ii) Based on the weather, it is possible to predict whether or not someone will play tennis using the Play Tennis / Don't Play Tennis dataset. The dataset has a total of 14 data points, 9 of which have the category "play tennis" and 5 of which have the category "don't play tennis."

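We must first determine the likelihood that it will be sunny and windy before we can determine the likelihood that someone will play tennis under those conditions. Under the Naive Bayes assumption that the weather features are independent given the class, use the following formula:

P(sunny, windy) = P(sunny | play) * P(windy | play) * P(play) + P(sunny | don't play) * P(windy | don't play) * P(don't play)

Now that we have determined the likelihood of it being sunny and windy, we can use this result to determine the likelihood that someone will play tennis when it is sunny and windy, again with Bayes' theorem:

P(play | sunny, windy) = P(sunny | play) * P(windy | play) * P(play) / P(sunny, windy)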
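A minimal sketch of the sunny-and-windy calculation with exact fractions. It assumes the per-class counts of the classic PlayTennis table: sunny on 2 of the 9 play days and 3 of the 5 don't-play days, strong wind on 3 of the 9 play days and 3 of the 5 don't-play days. The dictionary names are illustrative.

```python
from fractions import Fraction as F

# Per-class estimates from the classic PlayTennis counts (assumed).
PRIOR = {"play": F(9, 14), "dont": F(5, 14)}
P_SUNNY = {"play": F(2, 9), "dont": F(3, 5)}
P_WINDY = {"play": F(3, 9), "dont": F(3, 5)}


def joint(c):
    """P(sunny, windy, class c) under the naive conditional-independence assumption."""
    return P_SUNNY[c] * P_WINDY[c] * PRIOR[c]


evidence = sum(joint(c) for c in PRIOR)    # P(sunny, windy)
posterior_play = joint("play") / evidence  # P(play | sunny, windy)

print(evidence, posterior_play)  # 37/210 10/37
```

The naive estimate P(sunny, windy) = 37/210 (about 0.18) differs from the raw frequency in that table, where 2 of the 14 days (about 0.14) are sunny and windy, and the posterior 10/37 (about 0.27) differs from the raw 1 of those 2 days. The gap comes entirely from the independence assumption.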

Example

What is the probability of playing tennis when it is sunny, hot, highly humid and windy? Using the tennis dataset, we can apply the Naive Bayes method to predict the probability that someone plays tennis under these weather conditions.
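A minimal from-scratch Naive Bayes sketch for this question. It assumes the data are the classic 14-day PlayTennis table (outlook, temperature, humidity, wind, play) and uses raw frequency estimates without smoothing; `fit` and `posterior` are hypothetical helper names, not a library API.

```python
from collections import Counter, defaultdict
from fractions import Fraction

# Classic 14-day PlayTennis table (assumed): outlook, temperature, humidity, wind, play.
DATA = [
    ("sunny", "hot", "high", "weak", "no"),
    ("sunny", "hot", "high", "strong", "no"),
    ("overcast", "hot", "high", "weak", "yes"),
    ("rain", "mild", "high", "weak", "yes"),
    ("rain", "cool", "normal", "weak", "yes"),
    ("rain", "cool", "normal", "strong", "no"),
    ("overcast", "cool", "normal", "strong", "yes"),
    ("sunny", "mild", "high", "weak", "no"),
    ("sunny", "cool", "normal", "weak", "yes"),
    ("rain", "mild", "normal", "weak", "yes"),
    ("sunny", "mild", "normal", "strong", "yes"),
    ("overcast", "mild", "high", "strong", "yes"),
    ("overcast", "hot", "normal", "weak", "yes"),
    ("rain", "mild", "high", "strong", "no"),
]


def fit(rows):
    """Count classes, and feature values within each class."""
    class_counts = Counter(row[-1] for row in rows)
    value_counts = defaultdict(Counter)  # (feature index, class) -> Counter of values
    for *features, label in rows:
        for i, value in enumerate(features):
            value_counts[(i, label)][value] += 1
    return class_counts, value_counts


def posterior(query, class_counts, value_counts):
    """Normalised P(class | query) = P(class) * prod_i P(x_i | class) / evidence."""
    n = sum(class_counts.values())
    scores = {}
    for label, n_label in class_counts.items():
        score = Fraction(n_label, n)  # prior P(class)
        for i, value in enumerate(query):
            score *= Fraction(value_counts[(i, label)][value], n_label)  # P(x_i | class)
        scores[label] = score
    total = sum(scores.values())
    return {label: s / total for label, s in scores.items()}


probs = posterior(("sunny", "hot", "high", "strong"), *fit(DATA))
print(probs)
```

With these counts the posterior for playing is 125/1583 (about 0.079), so the prediction is "don't play". Because no smoothing is applied, a value never seen with a class would zero out that class; Laplace (add-one) smoothing is the usual remedy.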

If you're not bound to this specific dataset, there are many alternative datasets with categorical features used in discrete classification tasks. See the UCI Machine Learning Repository: try filtering to the "Categorical" and/or "Mixed" attribute types and to "Classification" as the default task. Some candidate datasets for your task (with mostly categorical features):

Now comes the part where we apply the Discretizer object to the whole dataset. To that end, we define a NaiveBayesPreprocessor object. If a field is discrete (i.e. categorical), it leaves it mostly untouched; in practice, it eliminates the values that occur in no more than 1% of the rows. If the field is continuous, it bins it as above.
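A minimal sketch of what such a preprocessor could look like, in plain Python. The class below is hypothetical and not the original implementation: it assumes rare categorical values are replaced by a placeholder (`RARE`), and that continuous columns are cut into equal-width bins, since the exact binning scheme referred to above is not shown.

```python
from collections import Counter

RARE = "<rare>"  # hypothetical placeholder for eliminated categorical values


class NaiveBayesPreprocessor:
    """Hypothetical sketch: prune rare categories, equal-width-bin numeric columns."""

    def __init__(self, n_bins=5, min_freq=0.01):
        self.n_bins = n_bins
        self.min_freq = min_freq  # values seen in at most this fraction of rows are dropped
        self.kept = {}            # categorical column -> set of values that are kept
        self.bins = {}            # numeric column -> (lowest value, bin width)

    def fit(self, rows):
        n = len(rows)
        for col in rows[0]:
            values = [row[col] for row in rows]
            if all(isinstance(v, (int, float)) for v in values):
                lo, hi = min(values), max(values)
                width = (hi - lo) / self.n_bins or 1.0  # guard against a constant column
                self.bins[col] = (lo, width)
            else:
                counts = Counter(values)
                self.kept[col] = {v for v, c in counts.items() if c / n > self.min_freq}
        return self

    def transform(self, rows):
        out = []
        for row in rows:
            new_row = {}
            for col, v in row.items():
                if col in self.bins:
                    lo, width = self.bins[col]
                    index = int((v - lo) // width)
                    new_row[col] = min(max(index, 0), self.n_bins - 1)  # clamp to valid bins
                else:
                    new_row[col] = v if v in self.kept[col] else RARE
            out.append(new_row)
        return out


rows = [
    {"outlook": "sunny", "temp": 60.0},
    {"outlook": "rain", "temp": 70.0},
    {"outlook": "sunny", "temp": 80.0},
]
result = NaiveBayesPreprocessor(n_bins=2, min_freq=0.4).fit(rows).transform(rows)
print(result)
```

The toy call uses a large `min_freq` so that pruning is visible on three rows; with the 1% threshold described above, pruning only matters on large datasets.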

This time, BernoulliNB does well. This is because it binarizes the dataset before fitting the Bernoulli Naive Bayes model, using a threshold of 0, which happens to work well for predicting the activity. In the prior example, binarization added almost no value, since every feature was nonnegative.
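The binarization step can be illustrated on its own. The function below is a plain-Python sketch of the rule scikit-learn's BernoulliNB applies through its `binarize` parameter (default 0.0): values strictly greater than the threshold become 1, everything else 0. The toy matrices are made up for illustration.

```python
def binarize(rows, threshold=0.0):
    """Map each feature to 1 if it is strictly greater than threshold, else 0."""
    return [[1 if value > threshold else 0 for value in row] for row in rows]


# Nonnegative features (e.g. counts): threshold 0 only separates zero from nonzero.
nonnegative = [[0.0, 3.0, 1.0], [2.0, 0.0, 5.0]]
# Signed features: threshold 0 splits each value by its sign.
signed = [[-1.2, 0.4, 0.0], [0.8, -0.3, 2.1]]

print(binarize(nonnegative))  # [[0, 1, 1], [1, 0, 1]]
print(binarize(signed))       # [[0, 1, 0], [1, 0, 1]]
```

For nonnegative data the magnitudes are discarded and only "zero versus nonzero" survives, while for signed features the threshold keeps the sign. Passing `binarize=None` to BernoulliNB skips this step and treats the input as already binary.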

At this point, one could do some parameter tuning, experiment with the bin values, and so on. In any case, this dataset is not well suited to Naive Bayes-type algorithms, but I wanted to see how this implementation performs on such an example.

Let's say we have a table recording whether we played tennis under certain circumstances. These are the outlook of the weather, the temperature, the humidity and the strength of the wind:

Now we are in the testing phase. Say we are given a new instance and want to know whether we can play a game or not; we then look up the corresponding values in the tables above. This new instance is:
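The lookup step can be sketched as follows. The frequency tables are assumed to be those of the classic 14-day PlayTennis data, stored as (count among "yes" days, count among "no" days) per feature value, and the new instance is hypothetical since it is not reproduced here.

```python
from fractions import Fraction

# Frequency tables (assumed, classic PlayTennis data): value -> (count on "yes" days, count on "no" days).
CLASS_TOTALS = {"yes": 9, "no": 5}
CLASS_INDEX = {"yes": 0, "no": 1}
TABLES = {
    "outlook": {"sunny": (2, 3), "overcast": (4, 0), "rain": (3, 2)},
    "temperature": {"hot": (2, 2), "mild": (4, 2), "cool": (3, 1)},
    "humidity": {"high": (3, 4), "normal": (6, 1)},
    "wind": {"weak": (6, 2), "strong": (3, 3)},
}


def classify(instance):
    """Look up P(value | class) in the tables and combine them with the class priors."""
    n = sum(CLASS_TOTALS.values())
    scores = {}
    for label, n_label in CLASS_TOTALS.items():
        score = Fraction(n_label, n)  # prior P(class)
        for feature, value in instance.items():
            count = TABLES[feature][value][CLASS_INDEX[label]]
            score *= Fraction(count, n_label)  # P(value | class)
        scores[label] = score
    total = sum(scores.values())
    posteriors = {label: s / total for label, s in scores.items()}
    return max(posteriors, key=posteriors.get), scores, posteriors


# Hypothetical new instance.
new_day = {"outlook": "sunny", "temperature": "cool", "humidity": "high", "wind": "strong"}
label, scores, posteriors = classify(new_day)
print(label, scores, posteriors)
```

For this instance the unnormalised scores are 1/189 (about 0.0053) for "yes" and 18/875 (about 0.0206) for "no", so the prediction is "no", with posterior 486/611 (about 0.80).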

Previous tennis epidemiological reviews have found that acute injuries more typically occur in the lower limbs, whereas chronic overuse injuries occur more frequently in the upper limbs and trunk.2 Musculoskeletal injuries in tennis can affect almost any part of the body, with the majority of injuries being classified as overuse injuries resulting from repetitive microtrauma.3 Gender is not thought to influence injury rate.2 Identification of the site at risk of injury and associated factors contributing to the risk of injury can help target more specific injury prevention strategies to maximise player health and minimise injury risk.

Injury rate was lower for male players (17.7 injuries per 1000 sets played) than female players (23.4 injuries per 1000 sets played). There was variability in the numbers of injuries reported by men and women players over the 10-year period (figure 2).

We considered all matches played by professional tennis players between 1968 and 2010, and, on the basis of this data set, constructed a directed and weighted network of contacts. The resulting graph showed complex features, typical of many real networked systems studied in the literature. We developed a diffusion algorithm and applied it to the tennis contact network in order to rank professional players. Jimmy Connors was identified as the best player in the history of tennis according to our ranking procedure. We performed a complete analysis by determining the best players on specific playing surfaces as well as the best ones in each of the years covered by the data set. The results of our technique were compared to those of two other well-established methods. In general, we observed that our ranking method performed better: it had a higher predictive power and did not require the arbitrary introduction of external criteria for the correct assessment of the quality of players. The present work provides novel evidence of the utility of tools and methods of network theory in real applications.
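A minimal sketch of a PageRank-style diffusion on a contact network, under stated assumptions: each match adds an edge from the loser to the winner (repeated results add weight), a fraction q = 0.15 of the score is redistributed uniformly, and the score of unbeaten players is spread evenly over everyone. These choices and the toy results are illustrative, not the paper's exact algorithm or data.

```python
from collections import defaultdict


def prestige_scores(matches, q=0.15, tol=1e-12, max_iter=10_000):
    """PageRank-style diffusion on a loser -> winner network (illustrative sketch).

    matches: iterable of (winner, loser) pairs; repeated pairs add edge weight.
    q: probability of redistributing score uniformly (assumed damping parameter).
    """
    matches = list(matches)
    players = sorted({p for pair in matches for p in pair})
    n = len(players)
    weight = defaultdict(float)        # (loser, winner) -> number of such results
    out_strength = defaultdict(float)  # loser -> total number of losses
    for winner, loser in matches:
        weight[(loser, winner)] += 1.0
        out_strength[loser] += 1.0
    scores = {p: 1.0 / n for p in players}
    for _ in range(max_iter):
        # Unbeaten players have no outgoing edges; spread their score evenly.
        dangling = sum(scores[p] for p in players if out_strength[p] == 0)
        new = {p: q / n + (1 - q) * dangling / n for p in players}
        for (loser, winner), w in weight.items():
            new[winner] += (1 - q) * scores[loser] * w / out_strength[loser]
        if sum(abs(new[p] - scores[p]) for p in players) < tol:
            return new
        scores = new
    return scores


toy = [("A", "B"), ("A", "C"), ("B", "C")]  # hypothetical (winner, loser) results
scores = prestige_scores(toy)
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)  # ['A', 'B', 'C']
```

Because score flows from losers to winners in proportion to the losers' own scores, a win over a highly ranked player is worth more than a win over a weak one, which is what distinguishes this kind of ranking from simply counting victories.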

Data were collected from the website of the Association of Tennis Professionals (ATP, www.atpworldtour.com). We automatically downloaded all matches played by professional tennis players from January 1968 to October 2010, and restricted our analysis to matches played in Grand Slams and ATP World Tour tournaments, for a total of 3640 tournaments and 133261 matches. For illustrative purposes, the top plot of panel a of Figure 1 reports the number of tournaments played in each of the years covered by our data set. With the exception of the period between 1968 and 1970, when the ATP was still in its infancy, about 75 tournaments were played each year. Two periods of greater popularity occurred around 1980 and 1992, when more than 90 tournaments per year were played. The total number of different players in our data set is 3700, and the bottom plot of panel a of Figure 1 shows how many players played at least one match in each of the years covered by our analysis. This curve is less regular. On average, 400 different players played in each of the years between 1968 and 1996, although large fluctuations are visible, including a very high peak in 1980, when more than 500 players took part in ATP tournaments. Between 1996 and 2000, the number of players decreased almost linearly from 400 to 300. After that, participation in ATP tournaments became more stable, with small fluctuations around an average of about 300 players.

In panel a, we report the total number of tournaments (top panel) and players (bottom panel) as a function of time. In panel b, we plot the fraction of players having played (black circles), won (red squares) and lost (blue diamonds) a certain number of matches. The black dashed line corresponds to the best power-law fit with exponent consistent with the value .

In panel a, we draw the subgraph of the contact network restricted only to those players who have been number one in the ATP ranking. Intensities and widths are proportional to the logarithm of the weight carried by each directed edge. In panel b, we report a schematic view of the matches played during a single tournament, while in panel c we draw the network derived from it.

In the simplest case, in which the graph is obtained by aggregating the matches of a single tournament only, we can determine the solutions of Eqs. (1) analytically. In a single tournament, matches are hierarchically organized in a binary rooted tree, and the topology of the resulting contact network is very simple [see Figure 2, panels b and c]. Let $R$ denote the number of matches that the winner of the tournament must play (and win). The total number of players at the beginning of the tournament is then $N = 2^R$. The prestige score is simply a function of $r$, the number of matches won by a player, and can be denoted by $P_r$. We can rewrite Eqs. (1) as

$$P_r = C + (1-q)\sum_{s=0}^{r-1} P_s \qquad (2)$$

where $C = q/N + (1-q)\,P_R/N$ and $r = 0, 1, \ldots, R$. The score $P_r$ is given by the sum of two terms: $C$ stands for the equal contribution shared by all players independently of the number of victories, while $(1-q)\sum_{s=0}^{r-1} P_s$ represents the score accrued for the matches won. The former system of equations has the recursive solution

$$P_r = (2-q)\,P_{r-1} = (2-q)^r\, C \qquad (3)$$

which still depends on the constant $C$. This constant can be determined by imposing the normalization condition

$$\sum_{r=0}^{R-1} 2^{R-1-r} P_r + P_R = 1 \qquad (4)$$

since exactly $2^{R-1-r}$ players win $r$ matches for $r < R$, and only the winner wins all $R$.
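The single-tournament result can be checked numerically. The sketch below assumes that the diffusion equations are the PageRank-style recursion P_i = (1 - q) * sum_j P_j * w_ji / s_j_out + q/N + (1 - q)/N * (score of players with no losses), with an edge from each loser to the player who beat them. Under that assumption a player with r wins should receive P_r = (2 - q)^r * C, with C fixed by normalization. The bracket in which the lower-numbered player always wins is a hypothetical toy tournament, and the helper names are illustrative.

```python
def bracket_matches(rounds):
    """Hypothetical single-elimination bracket: 2**rounds players, lower id always wins."""
    alive = list(range(2 ** rounds))
    matches = []  # (winner, loser) pairs
    while len(alive) > 1:
        winners = []
        for a, b in zip(alive[::2], alive[1::2]):
            matches.append((a, b))
            winners.append(a)
        alive = winners
    return matches


def diffusion_scores(matches, n, q, iters=2000):
    """Power iteration for the assumed PageRank-style equations (loser -> winner edges)."""
    out = [0] * n  # out-strength = number of losses
    for _, loser in matches:
        out[loser] += 1
    p = [1.0 / n] * n
    for _ in range(iters):
        # Players with no losses (here only the champion) are dangling nodes;
        # their score is redistributed uniformly.
        dangling = sum(p[i] for i in range(n) if out[i] == 0)
        new = [q / n + (1 - q) * dangling / n] * n
        for winner, loser in matches:
            new[winner] += (1 - q) * p[loser] / out[loser]
        p = new
    return p


def closed_form(rounds, q):
    """P_r = (2 - q)**r * C, with C chosen so that all scores sum to 1."""
    base = 2 - q
    norm = sum(2 ** (rounds - 1 - r) * base ** r for r in range(rounds)) + base ** rounds
    c = 1.0 / norm
    return [c * base ** r for r in range(rounds + 1)]


R, q = 5, 0.15
matches = bracket_matches(R)
numeric = diffusion_scores(matches, 2 ** R, q)
wins = [0] * (2 ** R)
for winner, _ in matches:
    wins[winner] += 1
predicted = closed_form(R, q)
max_error = max(abs(numeric[i] - predicted[wins[i]]) for i in range(2 ** R))
print(max_error)
```

The ratio between the winner's score and that of a first-round loser is (2 - q)^R in this sketch, which grows exponentially with the number of rounds.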

In Figure 3, we plot Eqs. (6) and (7) for various values of $q$. In general, sufficiently low values of $q$ allow the winner of the tournament to be assigned a score about two orders of magnitude larger than the one given to players losing in the first round. The score of the winner is an exponential function of $R$, the length of the tournament. Grand Slams, for instance, have length $R = 7$, and their relative importance is therefore two or four times larger than that of other ATP tournaments, which typically have lengths $R = 6$ or $R = 5$.

In panel a, we present a scatter plot of the prestige rank versus the rank based on the number of victories (i.e., in-strength). Only players ranked in the top 30 positions in one of the two lists are reported. Rank positions are calculated on the network corresponding to all matches played between 1968 and 2010. In panel b, a similar scatter plot is presented, but now only matches of year 2009 are considered for the construction of the network. Prestige rank positions are compared with those assigned by ATP.