Understanding the Data
Data Types: Identify the type of each variable (numerical, categorical, datetime) so that appropriate analysis techniques can be applied.
Data Structure: Examine the dataset's structure (rows, columns, data formats) to understand its organization and scope.
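A minimal sketch of these first checks, assuming the data arrive as string-valued records such as those produced by `csv.DictReader`. The sample rows and the `infer_type` helper are hypothetical. The helper tries numeric parsing first, then ISO-8601 dates, and otherwise treats a column as categorical. In pandas, `df.shape` and `df.dtypes` (or `df.info()`) give a similar overview.

```python
from datetime import date

# Hypothetical toy dataset: each dict is one row, each key one column.
rows = [
    {"age": "34", "city": "Lyon", "joined": "2021-03-14"},
    {"age": "29", "city": "Oslo", "joined": "2020-11-02"},
    {"age": "41", "city": "Lyon", "joined": "2022-01-20"},
]


def infer_type(values):
    """Classify a column as 'numerical', 'datetime', or 'categorical' (hypothetical helper)."""
    try:
        for v in values:
            float(v)  # raises ValueError for non-numeric text
        return "numerical"
    except ValueError:
        pass
    try:
        for v in values:
            date.fromisoformat(v)  # raises ValueError unless it is an ISO-8601 date
        return "datetime"
    except ValueError:
        return "categorical"


columns = list(rows[0])
shape = (len(rows), len(columns))  # (number of rows, number of columns)
column_types = {c: infer_type([row[c] for row in rows]) for c in columns}

print("shape:", shape)
print("types:", column_types)
```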
Summary Statistics: Calculate the mean, median, mode, standard deviation, and range of numerical variables to understand their central tendency and variability.
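Python's standard `statistics` module covers these measures directly. The sample below is made up for illustration. Note that `statistics.stdev` is the sample standard deviation (divisor n - 1), while `statistics.pstdev` gives the population version.

```python
import statistics

# Hypothetical sample of one numerical variable.
values = [4, 8, 6, 5, 3, 8, 9, 5, 8]

summary = {
    "mean": statistics.mean(values),
    "median": statistics.median(values),
    "mode": statistics.mode(values),      # most frequent value
    "std_dev": statistics.stdev(values),  # sample standard deviation (n - 1)
    "range": max(values) - min(values),
}

for name, value in summary.items():
    print(f"{name:>8}: {value:.3f}")
```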
Distribution: Analyze the distribution of each numerical variable using histograms, density plots, or box plots.
Histograms and Density Plots: Show the shape of a single numerical variable's distribution, such as its symmetry, skew, and number of peaks.
Box Plots: Summarize the median, quartiles, and spread of the data, and reveal outliers and skewness.
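The numbers behind these plots can be computed without any plotting library. This sketch, on a hypothetical sample, counts values in equal-width bins (the bar heights of a histogram) and computes the quartiles that form the box of a box plot. `histogram_counts` is an illustrative helper, not a library function.

```python
import statistics

# Hypothetical sample; 42 is deliberately far from the rest.
values = [12, 15, 14, 10, 18, 21, 13, 16, 15, 17, 14, 42]


def histogram_counts(data, bins):
    """Count values falling into `bins` equal-width intervals (what a histogram draws)."""
    lo, hi = min(data), max(data)
    width = (hi - lo) / bins
    counts = [0] * bins
    for x in data:
        # Clamp the maximum value into the last bin instead of overflowing.
        idx = min(int((x - lo) / width), bins - 1)
        counts[idx] += 1
    return counts


# Quartiles: the box of a box plot spans Q1..Q3, with the median inside.
q1, median, q3 = statistics.quantiles(values, n=4)

print(histogram_counts(values, bins=4))
print(min(values), q1, median, q3, max(values))
```

Here most values fall in the first bin while a single value sits alone in the last one: the long right tail that a box plot would show as a point beyond its upper whisker.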
Bar Charts: Display the frequency or proportion of each category in a categorical variable.
Scatter Plots: Examine the relationship between two numerical variables.
Pair Plots: Visualize the relationships between several numerical variables at once by drawing a scatter plot for every pair of variables.
Handling Missing Values: Identify missing values and choose a strategy to handle them, such as imputation or removal.
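A minimal sketch of both strategies on a single hypothetical column, using `None` to mark missing entries. Median imputation is one assumed choice among several (mean, mode, or model-based imputation are also common); the median is used here because it is less sensitive to outliers. In pandas, `df.isna().sum()`, `df.dropna()`, and `df.fillna(...)` cover the same steps.

```python
import statistics

# Hypothetical column with gaps; None marks a missing value.
incomes = [52000, None, 61000, 58000, None, 49000]

missing = sum(v is None for v in incomes)
print(f"missing: {missing} of {len(incomes)} ({missing / len(incomes):.0%})")

observed = [v for v in incomes if v is not None]

# Strategy 1: removal. Drop the incomplete entries (in a table, usually whole rows).
dropped = observed

# Strategy 2: imputation. Replace each gap with the median of the observed values.
fill = statistics.median(observed)
imputed = [fill if v is None else v for v in incomes]

print("dropped:", dropped)
print("imputed:", imputed)
```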
Outliers: Detect outliers using visualizations such as box plots, then decide whether to keep, cap, or remove them.
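Box plots typically flag outliers with Tukey's rule: points more than 1.5 times the interquartile range (IQR) below the first quartile or above the third quartile (matplotlib's `boxplot` uses `whis=1.5` by default). The sketch below applies that rule to hypothetical data and, as one possible handling choice, caps extreme values at the fences rather than deleting them.

```python
import statistics

# Hypothetical sample with one suspicious reading.
values = [12, 15, 14, 10, 18, 21, 13, 16, 15, 17, 14, 42]

q1, _, q3 = statistics.quantiles(values, n=4)
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr  # the usual box-plot whisker limits

outliers = [x for x in values if x < lower or x > upper]

# One possible handling choice: clip values to the fences instead of deleting them.
capped = [min(max(x, lower), upper) for x in values]

print("fences:", lower, upper)
print("outliers:", outliers)
```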
Data Transformation: Normalize or standardize numerical variables so that features measured on different scales become comparable before analysis.
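Two common transformations, sketched on a hypothetical feature: min-max normalization maps values onto [0, 1], and standardization converts them to z-scores with mean 0 and standard deviation 1. The population standard deviation is used for the z-scores, which matches the convention of scikit-learn's `StandardScaler`.

```python
import statistics

# Hypothetical feature measured in centimetres.
heights = [150.0, 160.0, 170.0, 180.0, 190.0]

# Min-max normalization: rescale to the [0, 1] interval.
lo, hi = min(heights), max(heights)
normalized = [(x - lo) / (hi - lo) for x in heights]

# Standardization (z-scores): subtract the mean, divide by the standard deviation.
mu = statistics.mean(heights)
sigma = statistics.pstdev(heights)  # population SD (divisor n)
standardized = [(x - mu) / sigma for x in heights]

print("normalized:", normalized)
print("standardized:", [round(z, 3) for z in standardized])
```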
Correlation Matrix: Calculate correlation coefficients (e.g., Pearson's r) between every pair of numerical variables.
Heatmaps: Display the correlation matrix as a color-coded grid, making strong and weak correlations easy to spot.
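A stdlib-only sketch of a Pearson correlation matrix over a few hypothetical columns. A heatmap (for example `seaborn.heatmap` applied to pandas' `df.corr()`) would simply color these cells; here strong pairs are flagged with an assumed threshold of |r| >= 0.7, which is a rule of thumb rather than a fixed standard.

```python
import math
from itertools import combinations

# Hypothetical numerical columns of equal length.
data = {
    "hours_studied": [2, 4, 6, 8, 10],
    "exam_score": [55, 62, 70, 79, 88],
    "hours_slept": [8, 6, 9, 5, 7],
}


def pearson(x, y):
    """Pearson correlation coefficient r between two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


names = list(data)
# Full symmetric matrix; the diagonal is 1 (up to floating-point rounding).
matrix = {(a, b): pearson(data[a], data[b]) for a in names for b in names}

# Assumed rule of thumb: |r| >= 0.7 counts as "strong".
strong = [(a, b) for a, b in combinations(names, 2) if abs(matrix[(a, b)]) >= 0.7]

for a in names:
    print(f"{a:>14}", " ".join(f"{matrix[(a, b)]:+.2f}" for b in names))
print("strong pairs:", strong)
```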
Frequency Tables: Summarize the count of each category in categorical variables.
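Counting categories needs only `collections.Counter`. The values below are hypothetical; the resulting counts and proportions are exactly what a bar chart or pie chart would display. In pandas, `Series.value_counts(normalize=True)` returns the proportions.

```python
from collections import Counter

# Hypothetical categorical variable.
payment = ["card", "cash", "card", "transfer", "card", "cash"]

counts = Counter(payment)
total = sum(counts.values())

# Frequency table: count and proportion per category, most common first.
for category, n in counts.most_common():
    print(f"{category:<10}{n:>3}  {n / total:.1%}")
```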
Bar Charts and Pie Charts: Visualize the distribution of categorical data.
Grouping and Aggregation: Group data by categorical variables and calculate aggregated statistics to identify patterns.
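A minimal group-and-aggregate sketch on hypothetical `(region, amount)` records: first collect the values belonging to each group, then compute several statistics per group. The pandas equivalent would be along the lines of `df.groupby("region")["amount"].agg(["count", "sum", "mean"])`.

```python
import statistics
from collections import defaultdict

# Hypothetical records: (region, sales amount).
sales = [("north", 120), ("south", 80), ("north", 150), ("east", 95), ("south", 110)]

# Group: collect amounts per region.
groups = defaultdict(list)
for region, amount in sales:
    groups[region].append(amount)

# Aggregate: several statistics per group.
summary = {
    region: {"count": len(v), "total": sum(v), "mean": statistics.mean(v)}
    for region, v in groups.items()
}

for region, stats in summary.items():
    print(region, stats)
```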
Cross-tabulation: Analyze the relationship between two or more categorical variables by counting how often each combination of categories occurs.
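A cross-tabulation (contingency table) counts each combination of categories. This sketch builds one for two hypothetical variables, device type and whether a purchase was made; `pandas.crosstab` produces the same kind of table from two columns.

```python
from collections import Counter

# Hypothetical paired observations of two categorical variables: (device, bought).
records = [
    ("mobile", "yes"), ("desktop", "no"), ("mobile", "yes"),
    ("desktop", "yes"), ("mobile", "no"), ("desktop", "no"),
]

pair_counts = Counter(records)
rows = sorted({device for device, _ in records})
cols = sorted({bought for _, bought in records})

# Contingency table: one row per device, one column per purchase outcome.
# Counter returns 0 for combinations that never occur.
table = [[pair_counts[(r, c)] for c in cols] for r in rows]

print("", *cols, sep="\t")
for r, counts in zip(rows, table):
    print(r, *counts, sep="\t")
```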
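Statistical Tests: Use hypothesis tests such as t-tests (to compare the means of two groups) or chi-square tests (to check whether categorical variables are independent) to assess whether observed differences and relationships are statistically significant.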
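As an example of a chi-square test of independence, the sketch below computes Pearson's statistic for a hypothetical 2x2 table and derives the p-value from the identity P(χ² > x) = erfc(√(x/2)), which holds only for one degree of freedom (the 2x2 case). Larger tables, and t-tests, need the full chi-square or t distribution, for example via `scipy.stats.chi2_contingency` or `scipy.stats.ttest_ind` (pass `equal_var=False` for Welch's test). Note that `chi2_contingency` applies Yates' continuity correction to 2x2 tables by default, so its statistic will differ from this uncorrected one.

```python
import math

# Hypothetical 2x2 contingency table: rows = group A/B, columns = outcome yes/no.
observed = [[30, 20],
            [15, 35]]


def chi_square_2x2(table):
    """Pearson chi-square test of independence for a 2x2 table (no continuity correction)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n  # count expected under independence
            stat += (obs - expected) ** 2 / expected
    # A 2x2 table has 1 degree of freedom; for df = 1, P(chi2 > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


stat, p = chi_square_2x2(observed)
print(f"chi2 = {stat:.3f}, p = {p:.4f}")
```

With these counts the statistic is about 9.09; a small p-value suggests that group and outcome are not independent, though the test says nothing about the direction or size of the effect.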