COURSE CODE : THEORY
CONTACT HOURS: 60 Hours
FULL MARKS: 35
Unit-1
Introduction to Statistics:
Statistics deals with the collection, analysis, interpretation, presentation, and organization of numerical data. It involves methods for collecting data from various sources, summarizing the data using descriptive statistics, and making inferences or predictions based on the data using inferential statistics. Statistics plays a crucial role in decision-making, research, planning, and problem-solving across diverse fields and industries.
Nature of Statistics:
The nature of statistics is characterized by its quantitative approach to data analysis and its reliance on mathematical methods and techniques. It involves systematic and objective methods for collecting data, organizing data into meaningful patterns, analyzing data to extract insights and trends, and drawing conclusions or making predictions based on data analysis. Statistics is both a science and an art, combining mathematical rigor with practical application in real-world contexts.
Importance of Statistics:
Statistics is essential for several reasons:
Data-driven Decision Making: Statistics provides a systematic framework for making informed decisions based on data analysis. It helps individuals, businesses, governments, and organizations evaluate options, assess risks, and prioritize actions using statistical tools and techniques.
Research and Analysis: Statistics is fundamental to scientific research, social research, market research, and data analysis in various fields. It enables researchers to design experiments, conduct surveys, analyze data, test hypotheses, and draw valid conclusions from empirical evidence.
Planning and Forecasting: Statistics is used for planning, forecasting, and predicting outcomes in areas such as economics, finance, marketing, healthcare, agriculture, and environmental studies. It helps organizations anticipate trends, identify opportunities, mitigate risks, and optimize resources.
Policy Development: Governments and policymakers rely on statistical data and analysis to formulate policies, monitor progress, evaluate programs, and make evidence-based decisions. Statistics informs public policy in areas such as education, healthcare, social welfare, and economic development.
Quality Control: Statistics plays a crucial role in quality control and process improvement in industries such as manufacturing, engineering, and healthcare. It helps monitor product quality, identify defects, analyze production processes, and implement corrective actions.
Relation with Allied Subjects:
Statistics is closely related to several allied subjects, including:
Mathematics: Statistics is a branch of mathematics and shares principles, concepts, and methods with mathematical disciplines such as algebra, calculus, probability theory, and linear algebra.
Economics: Statistics is widely used in economic analysis, econometrics, financial modeling, and economic forecasting. It helps economists analyze market trends, measure economic indicators, and evaluate policy impacts.
Psychology: Statistics is essential in psychological research, experimental design, and data analysis. It helps psychologists conduct experiments, analyze survey data, and draw conclusions about human behavior.
Sociology: Statistics is used in sociological research, social surveys, and demographic analysis. It helps sociologists study population trends, social phenomena, and patterns of behavior within society.
Business and Management: Statistics is integral to business analytics, data science, market research, and strategic decision-making in business and management. It helps businesses analyze customer data, predict market trends, optimize operations, and improve performance.
Uses of Statistics:
The uses of statistics are vast and diverse, including:
Data Analysis: Statistics is used to analyze data sets, identify patterns, calculate averages, measure variability, and summarize data using descriptive statistics.
Probability: Statistics deals with probability theory, including calculating probabilities, assessing risks, modeling random events, and making probabilistic predictions.
Inference: Statistics involves inferential methods for drawing conclusions, making predictions, testing hypotheses, and estimating parameters based on sample data.
Forecasting: Statistics is used for forecasting future trends, making projections, and predicting outcomes based on historical data and statistical models.
Decision Making: Statistics aids decision-making processes by providing quantitative insights, assessing uncertainties, evaluating alternatives, and optimizing choices based on statistical analysis.
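The descriptive and inferential uses above can be sketched with Python's standard library; the monthly sales figures below are hypothetical, chosen only to illustrate the calculations:

```python
# A minimal sketch of descriptive statistics using Python's built-in
# `statistics` module. The sales data are hypothetical.
import statistics

monthly_sales = [120, 135, 128, 150, 142, 160]  # hypothetical data

mean = statistics.mean(monthly_sales)      # central tendency
median = statistics.median(monthly_sales)  # middle value, robust to extremes
stdev = statistics.stdev(monthly_sales)    # sample measure of variability

print(f"mean={mean:.1f}, median={median:.1f}, stdev={stdev:.1f}")
```

The same three quantities (average, middle value, spread) underlie most of the summaries described in this unit.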
Misuses of Statistics:
While statistics is a powerful tool for data analysis and decision-making, it can be misused or misinterpreted in several ways:
Misleading Interpretations: Statistics can be misinterpreted or manipulated to support biased or misleading conclusions. Cherry-picking data, using inappropriate statistical methods, or misrepresenting findings can lead to erroneous interpretations.
Sampling Biases: Improper sampling techniques or biased sampling methods can skew results and lead to inaccurate conclusions about populations or phenomena.
Correlation vs. Causation: Statistics can show correlations between variables, but it's essential to avoid inferring causation without sufficient evidence or rigorous experimental design.
Statistical Fallacies: Misunderstanding statistical concepts or falling prey to statistical fallacies (e.g., survivorship bias, regression to the mean, Simpson's paradox) can lead to faulty reasoning and incorrect conclusions.
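Simpson's paradox, mentioned above, is easy to demonstrate numerically. The figures below follow the well-known kidney-stone treatment example: treatment A has the higher success rate within each subgroup, yet the lower rate once the subgroups are pooled:

```python
# Simpson's paradox: A wins within every subgroup but loses overall.
# Figures follow the classic kidney-stone treatment example:
# (successes, total) per treatment per subgroup.
groups = {
    "small stones": {"A": (81, 87),   "B": (234, 270)},
    "large stones": {"A": (192, 263), "B": (55, 80)},
}

def rate(success, total):
    return success / total

for name, g in groups.items():
    print(f"{name}: A={rate(*g['A']):.1%}, B={rate(*g['B']):.1%}")  # A higher in both

def pooled(treatment):
    succ = sum(g[treatment][0] for g in groups.values())
    total = sum(g[treatment][1] for g in groups.values())
    return succ / total

print(f"overall: A={pooled('A'):.1%}, B={pooled('B'):.1%}")  # B higher overall
```

The reversal happens because the subgroup sizes are unbalanced across treatments, which is exactly why correlation in aggregated data must not be read as causation.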
Data Privacy and Ethics: Misuse of statistics can involve unethical practices such as data manipulation, data falsification, or violating privacy rights in data collection and analysis.
The different types of data along with examples for each type:
Primary Data and Secondary Data:
Primary Data: This type of data is collected firsthand through methods such as surveys, interviews, experiments, or observations. It is original and specific to the research or study at hand.
Example: Survey responses collected directly from customers about their preferences for a new product.
Secondary Data: Secondary data is collected from existing sources such as books, journals, databases, or official records. It is already available and was collected for a different purpose.
Example: Using census data published by a government agency to study population demographics.
Quantitative Data and Qualitative Data:
Quantitative Data: Quantitative data is numerical and measurable, allowing for mathematical analysis. It includes variables like height, weight, age, income, etc.
Example: Number of products sold in a month, temperature readings in Celsius.
Qualitative Data: Qualitative data is descriptive and non-numerical, providing insights into qualities, characteristics, or attributes. It includes data like opinions, preferences, colors, etc.
Example: Customer feedback comments about a service, types of animals in a zoo.
Discrete Data and Continuous Data:
Discrete Data: Discrete data consists of separate, distinct values with gaps between them. It often represents countable items or whole numbers.
Example: Number of students in a classroom, number of goals scored in a football match.
Continuous Data: Continuous data can take any value within a range and has no gaps between values. It is often measured on a scale.
Example: Height measurements, temperature in degrees Celsius.
Time Series Data, Spatial Series Data, and Cross-Sectional Data:
Time Series Data: Time series data is collected over time at regular intervals, allowing for analysis of trends and patterns.
Example: Monthly sales data for a year, stock prices over a week.
Spatial Series Data: Spatial series data is collected across different geographical locations or areas.
Example: Temperature readings across different cities, population density in various regions.
Cross-Sectional Data: Cross-sectional data is collected at a specific point in time and represents a snapshot or cross-section of a population or phenomenon.
Example: Survey responses collected from individuals at a particular moment, income levels of households in a city.
Ordinal Data and Nominal Data:
Ordinal Data: Ordinal data represents categories with a natural order or ranking but does not have equal intervals between categories.
Example: Rating scales (e.g., 1 to 5 stars for a product), education levels (e.g., high school, college, graduate).
Nominal Data: Nominal data represents categories without a natural order or ranking. The categories are purely qualitative.
Example: Types of fruits (e.g., apple, banana, orange), colors (e.g., red, blue, green).
Illustration with Examples:
Let's combine these types of data with examples:
Primary Data: Survey responses collected directly from customers about their satisfaction with a new restaurant (Quantitative and Qualitative Data).
Secondary Data: Using sales data from a company's database to analyze revenue trends over the past year (Time Series Data).
Discrete Data: Counting the number of students who scored A grades in a class (Discrete Data).
Continuous Data: Measuring the temperature in degrees Celsius at different times of the day (Continuous Data).
Time Series Data: Tracking the monthly website traffic for a year to identify peak periods (Time Series Data).
Spatial Series Data: Comparing pollution levels across different neighborhoods in a city (Spatial Series Data).
Cross-Sectional Data: Conducting a survey to gather information about people's preferences for different smartphone brands (Cross-Sectional Data).
Ordinal Data: Ranking job satisfaction levels on a scale from "Very Satisfied" to "Not Satisfied" (Ordinal Data).
Nominal Data: Classifying cars based on their colors (Nominal Data).
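The practical difference between ordinal and nominal data is what operations are valid on them. The sketch below uses a hypothetical satisfaction scale: ordinal levels can be ranked and sorted, while nominal labels (fruit names) can only be counted:

```python
# Ordinal categories carry an order that nominal ones do not.
# The satisfaction scale and responses below are hypothetical.
from collections import Counter

SCALE = ["Not Satisfied", "Neutral", "Satisfied", "Very Satisfied"]  # ordinal
rank = {level: i for i, level in enumerate(SCALE)}

responses = ["Satisfied", "Not Satisfied", "Very Satisfied", "Satisfied"]
responses.sort(key=rank.__getitem__)  # valid: ordinal data has a ranking
print(responses)

fruits = ["apple", "banana", "apple", "orange"]  # nominal: counting only
print(Counter(fruits).most_common(1))            # most frequent category
```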
Unit-2
Let's break down the concepts of data collection, including questionnaires, schedules, pilot surveys, designing methods, and outliers:
Questionnaire and Its Basic Characteristics:
A questionnaire is a data collection tool used to gather information from respondents. It consists of a set of questions designed to elicit specific responses related to the research objectives.
Basic characteristics of a questionnaire include clarity (clear and understandable questions), relevance (questions related to research objectives), neutrality (avoiding bias or leading questions), simplicity (concise and straightforward questions), and completeness (covering all relevant aspects of the topic).
Definition of Schedule and Pilot Survey:
A schedule is a structured form used to collect data from multiple sources or respondents. It typically includes a set of questions or items along with spaces for recording responses.
A pilot survey is a preliminary or trial survey conducted on a small scale to test the effectiveness of the questionnaire or schedule before implementing it on a larger scale. It helps identify potential issues, refine questions, and ensure data quality and reliability.
Designing a Questionnaire and Schedule:
Designing a questionnaire involves several steps:
a. Defining research objectives and determining the information needed.
b. Choosing appropriate question types (open-ended, closed-ended, scaled, etc.).
c. Structuring questions logically and sequentially.
d. Using clear language and avoiding jargon or ambiguity.
e. Pretesting the questionnaire with a small group to identify and rectify any issues.
Designing a schedule involves creating a structured format for data collection, including sections for demographic information, multiple-choice questions, Likert scales, and open-ended responses. It should be organized, easy to navigate, and user-friendly for respondents.
Concept of Outliers:
Outliers are data points that significantly deviate from the rest of the data in a dataset. They can distort statistical analysis and affect the accuracy of results if not addressed appropriately.
Outliers can occur due to measurement errors, data entry mistakes, natural variation, or extreme values in the data.
Identifying outliers involves statistical techniques such as box plots, z-scores, or visual inspection of data distributions. Once identified, researchers may choose to remove outliers if they are determined to be anomalies or errors, or they may analyze them separately to understand their impact on the overall data.
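The two screening techniques named above, z-scores and the box-plot (IQR) rule, can be sketched as follows. The dataset is hypothetical, and the thresholds (|z| > 2, 1.5 × IQR) are conventional choices, not fixed rules:

```python
# Two common outlier screens: z-scores and the 1.5*IQR (box-plot) rule.
# The dataset is hypothetical; 98 is the planted suspect value.
import statistics

data = [12, 14, 13, 15, 14, 13, 98, 12, 15, 14]

mean = statistics.mean(data)
sd = statistics.stdev(data)
# |z| > 2 used here; a stricter cut-off of 3 is also common.
z_outliers = [x for x in data if abs((x - mean) / sd) > 2]

q = statistics.quantiles(data, n=4)          # [Q1, Q2, Q3]
iqr = q[2] - q[0]
lo, hi = q[0] - 1.5 * iqr, q[2] + 1.5 * iqr  # box-plot "whisker" fences
iqr_outliers = [x for x in data if x < lo or x > hi]

print(z_outliers, iqr_outliers)
```

Note that a single extreme value inflates the mean and standard deviation, which is why the IQR rule, being based on quartiles, is often the more reliable screen.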
Scrutiny of data involves a thorough examination of data to ensure its accuracy, reliability, and validity. This process includes checking for internal consistency and detecting errors that may have occurred during data collection and recording. Here are the key aspects of scrutinizing data:
Checking Internal Consistency:
Internal consistency refers to the degree of agreement or coherence among different parts or elements of the data. It ensures that the data collected are reliable and free from inconsistencies.
Techniques for checking internal consistency include:
a. Cross-verification: Comparing data from different sources or variables to check for discrepancies or contradictions. For example, comparing responses to related questions in a questionnaire.
b. Reliability tests: Using statistical methods such as Cronbach's alpha (for scale-based questions) or inter-rater reliability (for subjective judgments) to measure the consistency and reliability of data.
c. Logic checks: Applying logical reasoning to identify illogical or implausible responses. For instance, checking for contradictory answers (e.g., the age of a child being recorded as higher than the parent's age).
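The logic checks described in point (c) can be sketched over hypothetical survey records. Each check flags a record for review rather than silently correcting it:

```python
# A sketch of internal-consistency logic checks over hypothetical records.
records = [
    {"id": 1, "child_age": 12, "parent_age": 40},
    {"id": 2, "child_age": 45, "parent_age": 30},  # implausible
    {"id": 3, "child_age": 8,  "parent_age": 7},   # implausible
]

def logic_check(rec):
    problems = []
    if rec["child_age"] >= rec["parent_age"]:
        problems.append("child not younger than parent")
    if not (0 <= rec["child_age"] <= 120):
        problems.append("age out of plausible range")
    return problems

flagged = {r["id"]: logic_check(r) for r in records if logic_check(r)}
print(flagged)  # records 2 and 3 need follow-up
```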
Detection of Errors in Collection and Recording:
Errors can occur at various stages of data collection and recording, including during survey administration, data entry, or data processing. Detecting and correcting these errors is crucial for maintaining data accuracy.
Common types of errors and methods for detection include:
a. Data Entry Errors: Double-entry verification, where data is entered twice by different individuals and compared for discrepancies.
b. Missing Data: Checking for missing values and determining if they are due to non-response, skipped questions, or data entry mistakes.
c. Outliers: Using statistical techniques (e.g., box plots, z-scores) to identify outliers that may indicate errors or anomalies in the data.
d. Measurement Errors: Reviewing data collection instruments (e.g., questionnaires, surveys) for clarity, completeness, and appropriateness of measurement scales.
e. Coding Errors: Checking data coding for accuracy and consistency, especially in qualitative data analysis.
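Two of the detection methods above, double-entry verification (a) and the missing-data scan (b), can be sketched directly; the entry batches and field names are hypothetical:

```python
# Double-entry verification: the same form keyed twice, then compared.
entry_1 = {"q1": "yes", "q2": 34, "q3": "urban"}
entry_2 = {"q1": "yes", "q2": 43, "q3": "urban"}  # q2 mistyped on re-entry

mismatches = [k for k in entry_1 if entry_1[k] != entry_2.get(k)]
print("re-key for:", mismatches)

# Missing-data scan: count None values per field.
responses = [{"age": 31,   "income": 52000},
             {"age": None, "income": 48000},   # non-response on age
             {"age": 27,   "income": None}]
missing = {field: sum(r[field] is None for r in responses)
           for field in ("age", "income")}
print("missing counts:", missing)
```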
Data Cleaning and Validation:
After identifying errors and inconsistencies, data cleaning involves correcting errors, resolving discrepancies, and validating data to ensure its integrity and reliability.
Techniques for data cleaning and validation include:
a. Imputation: Filling in missing data using statistical methods or imputation techniques based on patterns in the data.
b. Validation checks: Conducting thorough validation checks to verify data accuracy, completeness, and consistency.
c. Data transformation: Converting data into standardized formats, units, or scales to facilitate analysis and comparison.
d. Documentation: Maintaining detailed documentation of data cleaning procedures, including changes made and reasons for corrections.
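A minimal sketch of steps (a) and (b): mean imputation for missing values followed by validation checks. The income figures are hypothetical, and in practice every change would be documented as step (d) requires:

```python
# Mean imputation plus simple validation checks, on hypothetical data.
import statistics

incomes = [52000, 48000, None, 61000, None]

observed = [x for x in incomes if x is not None]
fill = statistics.mean(observed)                   # imputed value
cleaned = [x if x is not None else fill for x in incomes]

assert all(x is not None for x in cleaned)         # completeness check
assert all(x >= 0 for x in cleaned)                # validity check
print(cleaned)
```

Mean imputation is only one option; more careful methods model the pattern of missingness, but the validate-after-cleaning step stays the same.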
Unit-3
Presentation of data involves conveying information in a clear, organized, and visually appealing manner to facilitate understanding and interpretation. Here are various methods of data presentation:
Textual Representation
Textual representation involves presenting data in written or descriptive form using paragraphs, bullet points, or lists.
It can include summaries, explanations, interpretations, and key findings derived from the data.
Textual representation is useful for providing context, background information, and detailed explanations alongside numerical data.
Examples:
In a strike called by the trade unions of the shoe-making industry in the city of Delhi, 50% of the workers reported for duty, and only 2 out of the 20 factories in the city were totally closed.
Surveys conducted by non-governmental organisations reveal that in the state of Punjab, the area under pulses shrank by 40% while the area under rice and wheat expanded by 20% between the years 2001 and 2011.
Advantages:
The following are some of the advantages of using textual presentations:
It provides an ample amount of information and detail.
It includes short but precise descriptions and explanations.
It is effective for small amounts of data.
It gives the presenter an opportunity to explain things properly.
Disadvantages:
This method of presentation is ineffective when the quantity of data is too large.
If the data are not presented with proper facts and figures, they may lead to wrong analysis.
Tabular Representation
The systematic presentation of numerical data in rows and columns is known as Tabulation. It is designed to make presentation simpler and analysis easier. This type of presentation facilitates comparison by putting relevant information close to one another, and it helps in further statistical analysis and interpretation. One of the most important devices for presenting the data in a condensed and readily comprehensible form is tabulation. It aims to provide as much information as possible in the minimum possible space while maintaining the quality and usefulness of the data.
“Tabulation involves the orderly and systematic presentation of numerical data in a form designed to elucidate the problem under consideration.”
Objectives of Tabulation:
The aim of tabulation is to summarise a large amount of numerical information into the simplest form. The following are the main objectives of tabulation:
To make complex data simpler: The main aim of tabulation is to present the classified data in a systematic way. The purpose is to condense the bulk of information (data) under investigation into a simple and meaningful form.
To save space: Tabulation tries to save space by condensing data in a meaningful form while maintaining the quality and quantity of the data.
To facilitate comparison: It also aims to facilitate quick comparison of various observations by providing the data in a tabular form.
To facilitate statistical analysis: Tabulation aims to facilitate statistical analysis because it is the stage between data classification and data presentation. Various statistical measures, including averages, dispersion, correlation, and others, are easily calculated from data that has been systematically tabulated.
To provide a reference: Since data may be easily identifiable and used when organised in tables with titles and table numbers, tabulation aims to provide a reference for future studies.
Tabulation is a very specialised job. It requires a thorough knowledge of statistical methods, as well as abilities, experience, and common sense. A good table must have the following characteristics:
Title: The table must have a title at its top, and the title should be clear and attractive.
Manageable Size: The table shouldn’t be too big or too small. The size of the table should be in accordance with its objectives and the characteristics of the data. It should completely cover all significant characteristics of data.
Attractive: A table should have an appealing appearance that appeals to both the sight and the mind so that the reader can grasp it easily without any strain.
Special Emphasis: The data to be compared should be placed in the left-hand corner of columns, with their titles in bold letters.
Fit with the Objective: The table should reflect the objective of the statistical investigation.
Simplicity: To make the table easily understandable, it should be simple and compact.
Data Comparison: The data to be compared must be placed closely in the columns.
Numbered Columns and Rows: When there are several rows and columns in a table, they must be numbered for reference.
Clarity: A table should be prepared so that even a layman may make conclusions from it. The table should contain all necessary information and it must be self-explanatory.
Units: The unit designations should be written on the top of the table, below the title. For example, Height in cm, Weight in kg, Price in ₹, etc. However, if different items have different units, then they should be mentioned in the respective rows and columns.
Suitably Approximated: If the figures are large, then they should be rounded or approximated.
Scientifically Prepared: The preparation of the table should be done in a systematic and logical manner and should be free from any kind of ambiguity and overlapping.
A table’s preparation is an art that requires skilled data handling. It’s crucial to understand the components of a good statistical table before constructing one. A table is created when all of these components are put together in a systematic order. In simple terms, a good table should include the following components:
1. Table Number:
Each table needs to have a number so it may be quickly identified and used as a reference.
If there are many tables, they should be numbered in a logical order.
The table number can be given at the top of the table or the beginning of the table title.
The table can also be numbered to indicate its location, using decimal numbers like 1.2, 2.1, etc. For instance, Table Number 3.1 should be read as the first table of the third chapter.
2. Title:
Each table should have a suitable title. A table’s contents are briefly described in the title.
The title should be simple, self-explanatory, and free from ambiguity.
A title should be brief and presented clearly, usually below the table number.
In certain cases, a long title is preferable for clarification. In these cases, a ‘Catch Title’ may be placed above the ‘Main Title’. For instance, the table’s contents might come after the firm’s name, which appears as a catch title.
Contents of Title: The title should include the following information:
(i) Nature of data, or classification criteria
(ii) Subject-matter
(iii) Place to which the data relates
(iv) Time to which the data relates
(v) Source to which the data belongs
(vi) Reference to the data, if available.
3. Captions or Column Headings:
A column designation is given to explain the figures in the column at the top of each column in a table. This is referred to as a “Column heading” or “Caption”.
Captions are used to describe the names or heads of vertical columns.
To save space, captions are generally placed in small letters in the middle of the columns.
4. Stubs or Row Headings:
Each row of the table needs to have a heading, similar to a caption or column heading. The headers of horizontal rows are referred to as stubs. A brief description of the row headers may also be provided at the table’s left-hand top.
5. Body of Table:
The table’s most crucial component is its body, which contains data (numerical information).
The location of any one figure or data in the table is fixed and determined by the row and column of the table.
The columns and rows in the main body’s arrangement of numerical data are arranged from top to bottom.
The size and shape of the main body should be planned in accordance with the nature of the figures and the purpose of the study.
As the body of the table summarises the facts and conclusions of the statistical investigation, it must be ensured that the table does not have irrelevant information.
6. Unit of Measurement:
If the unit of measurement of the figures in the table (real data) does not change throughout the table, it should always be provided along with the title.
However, these units must be mentioned together with stubs or captions if rows or columns have different units.
If there are large figures, they should be rounded up and the rounding method should be stated.
7. Head Notes:
If the main title does not convey enough information, a head note is included in small brackets in prominent words right below the main title.
A head-note is included to convey any relevant information.
For instance, the table frequently uses the units of measurement “in million rupees,” “in tonnes,” “in kilometres,” etc. Head notes are also known as Prefatory Notes.
8. Source Note:
A source note refers to the place where information was obtained.
In the case of secondary data, a source note is provided.
Name of the book, page number, table number, etc., from which the data were collected should all be included in the source. If there are multiple sources, each one must be listed in the source note.
If a reader wants to refer to the original data, the source note enables him to locate the data. Usually, the source note appears at the bottom of the table. For example, the source note may be: ‘Census of India, 2011’.
Importance: A source note is useful for three reasons:
-> It provides credit to the source (person or group), who collected the data;
-> It provides a reference to source material that may be more complete;
-> It offers some insight into the reliability of the information and its source.
9. Footnotes:
The footnote is the last part of the table. The unique characteristic of the data content of the table that is not self-explanatory and has not previously been explained is mentioned in the footnote.
Footnotes are used to provide additional information that is not provided by the heading, title, stubs, caption, etc.
When there are many footnotes, they are numbered in order.
Footnotes are identified by the symbols *, @, £, etc.
In general, footnotes are used for the following reasons:
(i) To highlight any exceptions to the data;
(ii) To note any special circumstances affecting the data; and
(iii) To clarify any information in the data.
The following are the merits of tabular presentation of data:
Brief and Simple Presentation: Tabular presentation is possibly the simplest method of data presentation. As a result, information is simple to understand. A significant amount of statistical data is also presented in a very brief manner.
Facilitates Comparison: By grouping the data into different classes, tabulation facilitates data comparison.
Simple Analysis: Analysing data from tables is quite simple. One can determine the data’s central tendency, dispersion, and correlation by organising the data as a table.
Highlights Characteristics of the Data: Tabulation highlights characteristics of the data. As a result of this, it is simple to remember the statistical facts.
Cost-effective: Tabular presentation is a very cost-effective way to convey data. It saves time and space.
Provides Reference: As the data provided in a tabular presentation can be used for other studies and research, it acts as a source of reference.
Notes:
Different Types of Tables:
The tables can be categorised into various categories depending upon different aspects, such as the purpose, the nature of data used for the investigation, and the extent of coverage of the table. The following are the various kinds of tables that are commonly used in studies of statistics.
There are two kinds of tables based on the objective or purpose:
1. General Purpose Table:
A General Purpose Table covers a variety of information on a particular subject and shows the raw data in complete detail. For instance, the table provided in the census report. It is also referred to as a Reference Table or a Repository Table.
A general purpose table does not have any specific analytic objective while presenting the data. These tables are often big in size and are provided as a reference in the appendix. There are several uses of these tables. They are commonly used in the departmental reports of the government.
2. Special Purpose Table:
These tables offer information specific to a particular inquiry. For instance, the profit/loss figures of the business over the years. It is also referred to as a Text Table, Summary Table, or Analytical Table. These tables, which present the findings of data analysis, are usually quite concise.
There are two kinds of tables based on the originality of the data:
1. Original Table:
This style of the table does not round off its figures; instead, it presents statistical data in its original format. It is often referred to as the Primary Table or the Classification Table. An original table includes information that was first gathered from the original (primary) source.
2. Derived Table:
A Derived Table is a table that displays findings derived from the original data, such as averages, percentages, ratios, etc. It is often referred to as a Derivative Table. A derived table shows the information generated/derived from the primary or the original tables.
There are two kinds of tables based on the number of characteristics presented:
1. Simple Table:
In this type of table, a single characteristic is used to present the data. It is the simplest type of table and is often referred to as a First Order Table or a One-way Table. These are used to show the univariate frequency distribution because they examine only one variable.
For instance, a table that displays the number of students enrolled in every section of B.Com.
2. Complex Table:
A complex table displays data in accordance with two or more characteristics. The complex table can be classified into three parts based on characteristics:
(i) Two-way Table (also known as Double Table): It provides details on two characteristics of a certain phenomenon that are interrelated to each other.
For instance, the table would change to a two-way table if the number of students in B.Com in every section was further divided by morning and night shifts.
(ii) Three-way Table (also known as Treble Table): It provides details on three characteristics of a certain phenomenon that are interrelated to each other.
For instance, the table would change to a three-way table if the number of students in B.Com in every section contains information regarding morning and night batches, further classified by gender.
(iii) Manifold Table: A manifold table is a table that explains more than three characteristics of the data. These tables offer information on a wide range of phenomena that are interrelated with each other. This is the most complicated type of table.
For instance, the table would change to a manifold table if the number of students in B.Com in every section contains information regarding morning and night batches, further classified by gender, family income, and housing.
Diagrammatic representation
Diagrammatic representation refers to the use of visual elements such as lines, bars, circles, and symbols to represent data, information, or concepts graphically. It is a powerful tool for conveying complex ideas, relationships, trends, and patterns in a clear and understandable manner. Diagrammatic representation can be used across various disciplines, including statistics, mathematics, science, business, and education. Here are the details about different types of diagrammatic representation commonly used:
Line Diagrams (Line Charts):
Line diagrams use lines to connect data points plotted on a graph. They are used to illustrate trends, changes over time, and relationships between variables.
Key features:
X-axis: Represents the independent variable (e.g., time, categories).
Y-axis: Represents the dependent variable (e.g., numerical values, measurements).
Data points: Represent specific values plotted at corresponding points on the graph.
Line diagrams are effective for showing continuous data series, identifying patterns, and visualizing changes or fluctuations.
Multiple Axes Diagrams:
Multiple axes diagrams have more than one axis (X or Y axis) on the same graph, allowing for the representation of multiple datasets or variables with different units of measurement or scales.
Key features:
Multiple X-axes or Y-axes: Each axis represents a different variable or dataset, enabling comparison and correlation analysis.
Different scales: Each axis may have its own scale or measurement units.
Multiple axes diagrams are useful for comparing related but distinct datasets, highlighting relationships, and identifying correlations or trends.
Bar Diagrams (Bar Charts):
Bar diagrams use rectangular bars to represent data values, with the length or height of each bar proportional to the data it represents.
Key features:
Horizontal bar diagrams: Bars extend horizontally from the y-axis, making it easy to compare values across categories.
Vertical bar diagrams: Bars extend vertically from the x-axis, suitable for comparing values within categories or groups.
Grouped or stacked bars: Multiple bars can be grouped or stacked within each category, representing different datasets or variables.
Bar diagrams are effective for comparing quantities, frequencies, proportions, and percentages across categories or groups.
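Since this section assumes no particular plotting library, a bar diagram's idea can be sketched in plain text: one bar per category, length proportional to the value. The regional sales figures are hypothetical:

```python
# A text-mode horizontal bar diagram over hypothetical sales figures.
sales = {"North": 42, "South": 35, "East": 18, "West": 27}

width = max(len(region) for region in sales)  # align the category labels
for region, value in sales.items():
    # one '#' per unit sold, so bar length is proportional to the value
    print(f"{region:<{width}} | {'#' * value} {value}")
```

With a plotting library such as matplotlib, the same dictionary would feed a `bar` or `barh` call, but the principle (length encodes quantity) is identical.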
Pie Diagrams (Pie Charts):
Pie diagrams (or pie charts) use slices of a circle to represent proportions or percentages of a whole.
Key features:
Slices of the pie: Each slice represents a portion or category, with the size of the slice proportional to its share of the whole.
Total circle: Represents 100% or the whole dataset.
Pie diagrams are effective for showing composition, distribution, shares, and proportions within a dataset.
Other Diagrams:
Scatter plots: Use dots or markers to represent individual data points, showing the relationship between two variables.
Histograms: Use bars to represent frequency distribution of continuous data, showing the distribution of values within intervals or bins.
Area charts: Similar to line charts but fill the area under the lines, emphasizing cumulative totals or trends over time.
Diagrammatic representation plays a crucial role in data analysis, decision-making, communication, and storytelling. Choosing the appropriate type of diagram depends on the nature of the data, the message to be conveyed, and the audience's understanding and preferences for visual presentation.
Unit-4
There are four types of frequency distributions:
Ungrouped frequency distributions: The number of observations of each value of a variable.
You can use this type of frequency distribution for categorical variables.
Grouped frequency distributions: The number of observations of each class interval of a variable. Class intervals are ordered groupings of a variable’s values.
You can use this type of frequency distribution for quantitative variables.
Relative frequency distributions: The proportion of observations of each value or class interval of a variable.
You can use this type of frequency distribution for any type of variable when you’re more interested in comparing frequencies than the actual number of observations.
Cumulative frequency distributions: The sum of the frequencies less than or equal to each value or class interval of a variable.
You can use this type of frequency distribution for ordinal or quantitative variables when you want to understand how often observations fall below certain values.
Ungrouped Frequency Distribution Table
Grouped Frequency Distribution Table
Cumulative Frequency Distribution Table
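The four distribution types above can be computed with the standard library. The exam scores below are hypothetical, and the class width of 10 for the grouped table is an arbitrary illustrative choice:

```python
# Ungrouped, grouped, relative, and cumulative frequency distributions
# for a hypothetical set of exam scores.
from collections import Counter

scores = [55, 62, 55, 71, 68, 55, 62, 90, 71, 68]

# Ungrouped: frequency of each distinct value
ungrouped = dict(sorted(Counter(scores).items()))
print(ungrouped)

# Grouped: frequency per class interval (width 10; lower bound shown)
grouped = dict(sorted(Counter((s // 10) * 10 for s in scores).items()))
print(grouped)

# Relative: proportion of observations per value (sums to 1)
relative = {v: c / len(scores) for v, c in ungrouped.items()}
print(relative)

# Cumulative: running total of frequencies up to each value
running, cumulative = 0, {}
for v, c in ungrouped.items():
    running += c
    cumulative[v] = running
print(cumulative)
```

The final cumulative count always equals the total number of observations, which is a quick sanity check on any cumulative table.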