I. Importance of Statistical Graphs
Humans are visual creatures by nature; things make sense to us when they are represented in an easy-to-understand visualization. Statistical graphics, also known as data visualization, play a crucial role in exploratory data analysis and communicating findings effectively. They provide a visual representation of data, allowing us to understand patterns, relationships, and distributions more easily.
II. Types of Statistical Graphs
1) Bar Graph
A bar chart presents categorical data with rectangular bars with heights or lengths proportional to the values that they represent. (The bars can be plotted vertically or horizontally. A vertical bar chart is sometimes called a column chart.)
A bar graph shows comparisons among discrete categories. One axis of the chart shows the specific categories being compared, and the other axis represents a measured value. (Some bar graphs present bars clustered in groups of more than one, showing the values of more than one measured variable.)
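As a rough illustration of bar lengths being proportional to the values they represent, the sketch below counts observations per category and draws a text bar chart. The category labels and the helper name `text_bar_chart` are made up for illustration and do not come from any dataset used in this article.

```python
from collections import Counter

def text_bar_chart(labels, width=20):
    """Return one text line per category; bar length is proportional to its count."""
    counts = Counter(labels)
    largest = max(counts.values())
    lines = []
    for category in sorted(counts):
        # Scale each bar so that the largest category fills the full width.
        bar = "#" * round(counts[category] / largest * width)
        lines.append(f"{category:<8}|{bar} {counts[category]}")
    return lines

# Hypothetical categorical observations (illustrative only).
observations = ["A", "B", "A", "C", "A", "B"]
for line in text_bar_chart(observations):
    print(line)
```

One axis (the rows) lists the categories, and the other (the bar length) carries the measured value, here a simple count.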
Advantages:
1. Easy to read and interpret.
2. Useful for comparing values between categories or data points, allowing quick identification of differences and similarities.
Disadvantages:
1. Not useful for displaying continuous data.
2. Can be misleading if the scale is inappropriate or if the data is presented in a way designed to mislead the viewer.
3. Can display only one or two variables at a time, so they are less useful for multivariate data.
2) Line Graph
A line graph uses points connected by lines to represent change over time.
It depicts quantitative data for two changing variables with a straight line or curve that joins a series of successive data points.
The horizontal axis represents time or another continuous variable, while the vertical axis represents the variable being measured.
Advantages:
1. Work well for showing trends chronologically.
2. Clearly display relationships in continuous, periodic data.
3. Make changes in the data visible at a glance.
Disadvantages:
1. Work best only with periodic data.
2. Become cluttered if many categories are compared in one line chart.
3) Pie Chart
A pie chart is a circular statistical graphic that is divided into slices to illustrate numerical proportions.
In a pie chart, the arc length of each slice is proportional to the quantity it represents.
Pie charts are very widely used for comparing percentages between categories.
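Because the arc length of a slice is proportional to its central angle, the size of each slice can be worked out directly from the category totals. The following is a minimal sketch; the function name `pie_slices` and the sample categories are hypothetical.

```python
def pie_slices(values):
    """Map each category to (share of the total, central angle in degrees)."""
    total = sum(values.values())
    return {
        name: (value / total, 360 * value / total)
        for name, value in values.items()
    }

# Hypothetical category totals (illustrative only).
budget = {"rent": 50, "food": 30, "other": 20}
for name, (share, angle) in pie_slices(budget).items():
    print(f"{name}: {share:.0%} of the pie, {angle:.0f} degrees")
```

The angles always add up to 360 degrees, which is why a pie chart shows ratios between categories rather than their absolute values.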
Advantages:
1. Represent the proportional relationship between categories effectively when the focus is on comparing ratios rather than numerical values.
Disadvantages:
1. Inadequate for making accurate comparisons between categories.
2. A single pie chart does not facilitate the comparison of more than one set of data.
4) Histogram
A histogram is a visual representation of the distribution of quantitative data.
To construct a histogram, the first step is to "bin" (or "bucket") the range of values, that is, to divide the entire range of values into a series of intervals, and then count how many values fall into each interval. The bins are usually specified as consecutive, non-overlapping intervals of a variable. The bins (intervals) are adjacent and are typically (but are not required to be) of equal size.
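The binning step described above can be sketched as follows. This version assumes equal-width bins and places the maximum value in the last bin; the function name `histogram_counts` and the sample values are hypothetical.

```python
def histogram_counts(values, num_bins):
    """Count values in num_bins equal-width, adjacent, non-overlapping bins."""
    low, high = min(values), max(values)
    if high == low:
        raise ValueError("values must not all be equal")
    width = (high - low) / num_bins
    counts = [0] * num_bins
    for v in values:
        # Each bin is [left, right); the maximum value goes into the last bin.
        index = min(int((v - low) / width), num_bins - 1)
        counts[index] += 1
    edges = [low + i * width for i in range(num_bins + 1)]
    return edges, counts

# Hypothetical measurements (illustrative only).
edges, counts = histogram_counts([1, 2, 2, 3, 5, 6, 8, 9, 10], num_bins=3)
print(edges)   # bin boundaries
print(counts)  # number of values in each bin
```

Drawing one bar per bin, with height equal to the count, gives the histogram. Changing `num_bins` changes the shape of the picture, so it is worth trying a few values.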
Advantages:
1. Provide a visual representation of the distribution of continuous data.
2. Symmetry, skewness, and multimodality can be visually assessed, providing insights into the underlying data properties.
Disadvantages:
1. Summarize the data distribution, so specific data points or values are lost.
5) Scatter Plot
A scatter chart shows the relationship between two variables.
The horizontal X axis carries the independent variable and the vertical Y axis carries the dependent variable. An even scale is created on both axes, and a dot is placed at the point where the two coordinates intersect.
Two kinds of pattern can be found within a scatter chart:
Linear or nonlinear: With a linear correlation, a straight line can be drawn through the data points, whereas a non-linear correlation shows a curved relationship.
Weak or strong: The stronger the correlation is, the closer together the dots will be. A weak correlation will have data points that are more spread out.
In order to clearly show these relationships and trends, many scatter charts utilize trend lines. A trend line is drawn on the chart to emphasize the direction and strength of the trend.
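The article does not say how a trend line is calculated. One common choice, assumed here, is an ordinary least-squares line, with the Pearson correlation coefficient `r` as a measure of how strong the linear relationship is. The function name `trend_line` and the sample points are hypothetical.

```python
from math import sqrt

def trend_line(xs, ys):
    """Return (slope, intercept, r) for a least-squares line through the points."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Pearson r: close to +1 or -1 is a strong linear link, close to 0 is weak.
    r = sxy / sqrt(sxx * syy)
    return slope, intercept, r

# Hypothetical points (illustrative only).
slope, intercept, r = trend_line([1, 2, 3, 4], [3, 5, 7, 9])
print(f"y = {slope:.2f} * x + {intercept:.2f}, r = {r:.2f}")
```

Note that `r` only measures linear correlation, so a clearly curved (non-linear) pattern can still produce a value close to 0.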
Advantages:
1. Can reveal clusters or groupings within the data, which can help identify subpopulations or distinct patterns within the larger dataset. Clustering can provide valuable information about specific subsets of data and their characteristics.
2. Can help identify outliers, which are data points that deviate significantly from the general pattern. Outliers can indicate anomalies or errors in the data, or they may represent unique observations that require further examination.
Disadvantages:
1. Can only represent the relationship between two variables. Additional scatter plots are required if there are more than two variables of interest.
2. Show how certain variables correlate with each other, but a strong correlation doesn't necessarily mean causation. This may lead to false assumptions when interpreting the data from a scatter plot.
6) Box Plot
A box plot, or boxplot, is a method for graphically demonstrating the locality, spread, and skewness of groups of numerical data through their quartiles. In addition to the box, there can be lines (called whiskers) extending from the box that indicate variability outside the upper and lower quartiles; for this reason, the plot is also called a box-and-whisker plot or box-and-whisker diagram. Outliers that differ significantly from the rest of the dataset may be plotted as individual points beyond the whiskers. Box plots can be drawn either horizontally or vertically.
A boxplot is a standardized way of displaying the dataset based on the five-number summary: the minimum, the maximum, the sample median, and the first and third quartiles.
Minimum (Q0 or 0th percentile): the lowest data point in the data set excluding any outliers
Maximum (Q4 or 100th percentile): the highest data point in the data set excluding any outliers
Median (Q2 or 50th percentile): the middle value in the data set
First quartile (Q1 or 25th percentile): also known as the lower quartile qn(0.25), it is the median of the lower half of the dataset.
Third quartile (Q3 or 75th percentile): also known as the upper quartile qn(0.75), it is the median of the upper half of the dataset.
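The five values above can be computed with a short sketch like the one below. It follows the "median of the lower half / upper half" definition and, as an assumption, leaves the middle value out of both halves when the number of observations is odd. It does not detect outliers, so the minimum and maximum here are simply the smallest and largest values. The function name `five_number_summary` and the sample values are hypothetical, and at least two values are required.

```python
from statistics import median

def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) using the median-of-halves rule."""
    data = sorted(values)
    half = len(data) // 2
    lower = data[:half]               # values below the middle position
    upper = data[len(data) - half:]   # values above the middle position
    return data[0], median(lower), median(data), median(upper), data[-1]

# Hypothetical measurements (illustrative only).
print(five_number_summary([7, 1, 3, 5, 9]))
```

Other quartile conventions exist and can give slightly different Q1 and Q3 values for the same data, so results may not match a particular plotting tool exactly.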
Advantages:
1. Box plots offer a concise summary of the distribution characteristics of continuous data. They display key statistical measures, including the median, quartiles, and potential outliers, providing a comprehensive overview of the data's central tendency, spread, and skewness.
2. Box plots visually highlight potential outliers in the data.
3. Box plots are easy to understand, even for individuals without a background in statistics or data analysis.
Disadvantages:
1. Box plots summarize the distribution, so specific data points or values are lost in the process.
2. Box plots may not be as effective when dealing with small datasets with few data points.
3. Box plots are primarily designed for continuous data and may not be suitable for representing categorical or ordinal data. The concept of quartiles and medians is not directly applicable to non-numeric data.
III. Try Yourself
Task 1: Try Playing Different Plots
Here is the link to the Shiny app. Try selecting different graphs and settings to see how the graph changes.
Task 2: Question Practice
Use the link to answer the following questions:
1. How many variables are there, and what are they?
2. Write down the type of each variable.
3. If you want to know how many observations there are of each species, which graph will you choose?
4. What does the boxplot show you when you choose sepal length as the x-variable?
5. Which graph can show you the relationship between two variables, and what does it show?
6. If you want to analyze the dataset, which graph(s) will you use for the visualization?
IV. How to choose the right graph?
A project's success or failure depends very much on how well its data is visualized. If you spend a lot of time and energy modeling and analyzing your data but then display the results with the wrong chart type, your audience will not appreciate the amount of work you put in or understand how to use the results.
There are so many different kinds of charts that selecting the right one can be difficult and perplexing. This post will provide you with an easy-to-follow method for choosing the type of chart that best communicates your data and accurately portrays it.
How do I get started?
Before you start choosing a chart type, you need to ask yourself 5 key questions about your data. By asking yourself these questions, you will be able to understand your data better and select the appropriate type of chart to show it.
1) What’s the story your data is trying to deliver?
How and why was this data collected? Was it collected to identify patterns? To compare options? Does it show a distribution? Or is it used to see how several sets of values relate to one another?
It will be a lot easier for you to choose a chart type if you know what your data is trying to convey and how it came to be.
2) Who will you present your results to?
After you have understood the meaning of your data, you must figure out to whom you will present your findings. You may choose a different kind of chart when presenting your analysis of stock market trends to experienced businesspeople than when presenting it to beginners, because these two audiences have different levels of financial knowledge.
The main goal of data visualization is to communicate data more effectively. Because of this, you must be aware of your target audience in order to select the most appropriate chart type to present your data to them.
3) How big is your data?
The kind of chart you choose will be greatly influenced by the size of your data. Some chart types work well with large datasets, while others are not recommended for them. For instance, scatter plots perform better with large datasets, while pie charts perform best with small datasets.
4) What is your data type?
Data can be described as continuous, qualitative, categorical, or in some other way. Certain chart types can be excluded based on the type of data. If your data is continuous, for instance, a line chart can be a better option than a bar chart. Similarly, it could be wise to use a pie chart or bar chart if your data is categorical. Categories are by definition discrete and limited in number, not continuous, so you probably shouldn't use a line chart with categorical data.
5) What connections exist between the different elements of your data?
Lastly, you should consider the relationships between the different elements of your data. Is your data ordered by time, size, or type? Does it represent a ranking determined by a variable, or a correlation between several variables? Does your data change over time (a time series), or is it more of a distribution?
It may be easier to choose the type of chart to use if you consider the relationship between the numbers in your dataset.
Bar Graph
When to use:
1. You are comparing portions of a larger dataset, emphasizing various categories, or illustrating changes across time.
2. The category names are long.
3. You want to display both positive and negative values in the dataset.
When to avoid:
1. If more than one data point is being used.
2. If you have many categories, try not to overcrowd your graph. There shouldn't be more than ten bars on your graph.
Line Graph
When to use:
1. You have a continuous dataset that changes over time.
2. The dataset is too big for a bar chart.
3. You want to show different series on the same timeline.
4. You want to show trends rather than precise numbers.
When to avoid:
1. If your dataset is small. Line charts perform better with larger datasets, and a bar chart is the better option for a small one.
Pie Chart
When to use:
1. You want to display the overall dataset's proportions and percentages.
2. Your dataset is small.
3. You are contrasting how ONE element affects several categories.
When to avoid:
1. If your dataset is large.
2. If you wish to compare values in an exact or absolute manner.
Histogram
When to use:
1. You want to investigate the distribution of members within a category in a dataset.
2. You have one continuous numerical value that can be divided into several bins.
3. Your goal is to understand how values are distributed within a single category.
Scatter Plot
When to use:
1. To display clustering and correlation in large datasets.
2. If each point in your dataset has a pair of values.
3. If the order of the points in the dataset is not important.
When to avoid:
1. If your dataset is small.
2. If your dataset's values are not correlated.
Box Plot
When to use:
1. To compare distributions (medians).
2. To find outliers.
3. To understand spread and variability.
When to avoid:
1. If your dataset is small.
2. If your data is categorical.
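One way to condense the guidance in this section is a small lookup function. The sketch below is only a simplification of the points listed for each chart, not a fixed rule; the function name `suggest_chart`, its parameter, and the goal labels are all hypothetical.

```python
def suggest_chart(goal, large_dataset=False):
    """Suggest a chart type for an analysis goal (a simplification, not a rule)."""
    if goal == "compare categories":
        return "bar graph"
    if goal == "trend over time":
        # Line graphs suit larger datasets; bar graphs suit small ones.
        return "line graph" if large_dataset else "bar graph"
    if goal == "proportions":
        # Pie charts work best with small datasets.
        return "bar graph" if large_dataset else "pie chart"
    if goal == "distribution":
        return "histogram"
    if goal == "compare distributions":
        return "box plot"
    if goal == "relationship":
        return "scatter plot"
    raise ValueError(f"unknown goal: {goal!r}")

print(suggest_chart("trend over time", large_dataset=True))
print(suggest_chart("proportions"))
```

In practice the audience and the story you want to tell also matter, so treat the result as a starting point rather than a final answer.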
V. Conclusion
Statistical graphs are therefore used to interpret and evaluate statistical information, which simplifies our task. They communicate complex information in tabular, representational, and pictorial formats that make the data clear and easy to understand.
Statistical graphs facilitate the comparison of data or information from many sources and enable us to draw quantitative conclusions. Facts presented this way are visually appealing and easy to interpret. These kinds of graphs can be created by using a statistics graph generator.
Statistics also aids in drawing conclusions about population parameters from sample data, together with an indication of how reliable those conclusions are.
Although statistical graphs have drawbacks and restrictions, they are still commonly utilized in projects, presentations, and other similar contexts.
As a result, statistical graphs and the various kinds of them play a crucial role in our daily lives.
VI. References
Calzon, B. (2023, April 20). See 20 different types of graphs and charts with examples. datapine. https://www.datapine.com/blog/different-types-of-graphs-charts-examples/#graphs-charts-types
UCI Machine Learning. (2016, September 27). Iris species. Kaggle. https://www.kaggle.com/datasets/uciml/iris
Unacademy. (2022, May 16). Statistical graphs. https://unacademy.com/content/ssc/study-material/statistics/statistical-graphs/
Wikimedia Foundation. (2024a, April 8). Bar chart. Wikipedia. https://en.wikipedia.org/wiki/Bar_chart
Wikimedia Foundation. (2024b, April 4). Box plot. Wikipedia. https://en.wikipedia.org/wiki/Box_plot
Wikimedia Foundation. (2024c, April 19). Histogram. Wikipedia. https://en.wikipedia.org/wiki/Histogram
Wikimedia Foundation. (2024d, March 26). Pie chart. Wikipedia. https://en.wikipedia.org/wiki/Pie_chart