ACTIVITY
DETAILS
SCOPE
This project explores the statistical characteristics of FIFA 21 player data sourced from Kaggle. The goal is to apply foundational statistical concepts to a real-world sports dataset to better understand how variables are identified, measured, and analyzed. The dataset contains over 19,000 player records with attributes such as age, nationality, position, height, weight, and overall rating.
The key objectives of this project are:
To categorize the variables in the dataset as categorical or numerical based on their nature.
To identify the scale of measurement (nominal, ordinal, interval, ratio) appropriate for each variable.
To distinguish between discrete and continuous numerical variables with proper justification.
To create meaningful visualizations such as bar charts, histograms, and scatter plots to reveal patterns and trends in the data.
To interpret the visualizations and provide statistical reasoning for their selection.
By completing this project, we demonstrate our ability to analyze a large real-world dataset using fundamental statistical tools, laying the groundwork for more advanced data analysis in future courses.
The dataset is the FIFA 21 Complete Player Dataset from Kaggle, which contains detailed statistics on the football players featured in EA Sports' FIFA 21.
Source citation:
Bryanb (2020). FIFA 21 Complete Player Dataset. Retrieved from Kaggle: https://www.kaggle.com/datasets/bryanb/fifa-player-stats-database
This table classifies the football player data into categorical and numerical variables (the full comparison data is uploaded on the Activity-1 page).
○ Categorical variables (such as name, nationality, preferred foot, and position) represent labels or categories and cannot be measured numerically.
○ Numerical variables (such as age, height, weight, and overall rating) are measurable quantities that support arithmetic operations such as averaging.
Understanding the type of variable helps determine which statistical methods are appropriate for analysis and visualization.
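A minimal sketch of how this split could be automated, assuming each player record is a Python dict. The column names and values below are hypothetical, not the Kaggle file's actual headers or rows. The function infers the variable type from the stored Python type, which is only a heuristic: a numeric code such as a jersey number would be labelled numerical even though it behaves like a category, so the final classification still needs human judgment.

```python
from typing import Any

# Hypothetical toy records for illustration only; these are NOT rows from the Kaggle file.
records = [
    {"name": "Player A", "nationality": "Brazil", "preferred_foot": "Left",
     "age": 24, "height_cm": 180, "weight_kg": 75.5, "overall": 81},
    {"name": "Player B", "nationality": "Spain", "preferred_foot": "Right",
     "age": 31, "height_cm": 172, "weight_kg": 68.0, "overall": 77},
]


def classify_variables(rows: list[dict[str, Any]]) -> dict[str, str]:
    """Label each column 'numerical' if every value is an int or float, else 'categorical'."""
    kinds = {}
    for column in rows[0]:
        values = [row[column] for row in rows]
        # bool is a subclass of int in Python, so exclude it explicitly.
        numeric = all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in values)
        kinds[column] = "numerical" if numeric else "categorical"
    return kinds


for column, kind in classify_variables(records).items():
    print(f"{column:15} {kind}")
```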
Variables are categorized based on their measurement scale:
Nominal: Categories with no order (e.g., Name, Nationality).
Ordinal (not shown here): Categories with a meaningful order but no consistent spacing between them.
Interval: Numeric scale with equal intervals but no true zero (e.g., Overall Rating).
Ratio: Numeric scale with a true zero and meaningful ratios (e.g., Age, Height, Weight).
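The practical consequence of the scale is which summary statistics are meaningful. The sketch below encodes the commonly taught rule (following Stevens' levels of measurement) that each scale permits everything allowed at the scales below it plus a few new operations. The statistic lists are illustrative rather than exhaustive, and the scale assignments simply restate the ones above.

```python
# Order of the levels, from least to most informative.
LEVELS = ["nominal", "ordinal", "interval", "ratio"]

# Statistics that first become meaningful at each level (illustrative, not exhaustive).
ADDED_AT_LEVEL = {
    "nominal": ["mode", "frequency counts"],
    "ordinal": ["median", "percentiles"],
    "interval": ["mean", "standard deviation", "differences"],
    "ratio": ["ratios", "coefficient of variation"],
}

# Scale assignments restated from the classification in the text.
VARIABLE_SCALE = {
    "Name": "nominal",
    "Nationality": "nominal",
    "Overall Rating": "interval",
    "Age": "ratio",
    "Height": "ratio",
    "Weight": "ratio",
}


def permissible_statistics(scale: str) -> list[str]:
    """Return every statistic allowed at `scale`, including those inherited from lower levels."""
    allowed = []
    for level in LEVELS[: LEVELS.index(scale) + 1]:
        allowed.extend(ADDED_AT_LEVEL[level])
    return allowed


for variable, scale in VARIABLE_SCALE.items():
    print(f"{variable}: {scale} -> {', '.join(permissible_statistics(scale))}")
```

For example, saying a 30-year-old player is twice as old as a 15-year-old is meaningful because Age is on a ratio scale, whereas under the interval classification above an Overall Rating of 80 is not "twice" a rating of 40.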
Numerical variables can be further classified as:
Discrete: Countable values, typically whole numbers (e.g., Age recorded in whole years, Overall Rating).
Continuous: Can take any value within a range and can be measured to fractional precision (e.g., Height, Weight).
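The distinction changes how values are tabulated. In this sketch (all values are hypothetical, not taken from the dataset), a discrete variable like Age is counted value by value, while a continuous variable like Height is grouped into class intervals of a chosen width, here an assumed 10 cm. The helper name class_intervals is illustrative.

```python
import math
from collections import Counter

# Hypothetical sample values for illustration only (not taken from the Kaggle file).
ages = [21, 24, 24, 27, 27, 27, 31, 33]                                 # discrete: whole years
heights_cm = [168.2, 171.9, 175.0, 178.4, 180.1, 183.7, 186.5, 191.3]  # continuous

# A discrete variable can be tabulated exactly, one count per observed value.
age_counts = Counter(ages)


def class_intervals(values, width):
    """Count values falling in half-open intervals [lo, lo + width)."""
    counts = Counter(math.floor(v / width) * width for v in values)
    return {(lo, lo + width): n for lo, n in sorted(counts.items())}


# A continuous variable is summarised by grouping values into intervals.
height_counts = class_intervals(heights_cm, 10)

print(sorted(age_counts.items()))
for (lo, hi), n in height_counts.items():
    print(f"[{lo}, {hi}) cm: {n}")
```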
Each chart type suits a particular kind of variable:
Bar charts: Compare counts across the categories of a categorical variable (e.g., Preferred Foot, Position).
Histograms: Show the frequency distribution of a numerical variable (e.g., Age).
Scatter plots: Show the relationship between two numerical variables (e.g., Height vs. Weight).