Extra Activity 2

I obtained the 'COVID-19 in India' dataset from Kaggle and used it for the analysis below.

COVID-19 in India

Let's identify the type of each variable:

Categorical Variables: These variables assign each record to a category or group. Their values are labels rather than measured quantities.

Date: Categorical Variable in this analysis. Each date is used as a label that groups the records reported on that day, although dates also have a natural order and a measurable spacing (the number of days between them).
Time: Categorical Variable. Like Date, it is used here as a label for when a record was reported rather than as a measured quantity.
State/Union Territories: Categorical Variable. States and Union Territories are distinct categories and do not have a numerical order.
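Because categorical values are labels, the natural summary for them is a frequency count rather than an average. The short Python sketch below shows this with the standard library; the column name "State/UnionTerritory" is assumed to match the Kaggle CSV header, and the rows are made-up values, not figures from the dataset.

```python
from collections import Counter

# Hypothetical sample rows: column names assumed to match the Kaggle CSV,
# values invented purely for illustration.
rows = [
    {"Date": "2020-03-01", "State/UnionTerritory": "Kerala"},
    {"Date": "2020-03-01", "State/UnionTerritory": "Delhi"},
    {"Date": "2020-03-02", "State/UnionTerritory": "Kerala"},
]


def frequency_table(records, column):
    """Count how often each category appears in the given column."""
    return Counter(record[column] for record in records)


states = frequency_table(rows, "State/UnionTerritory")
print(states.most_common())  # [('Kerala', 2), ('Delhi', 1)]
```

Counting is the only arithmetic that makes sense here: there is no meaningful "average state".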

Numerical Variables: These variables represent measurable quantities and can be either discrete or continuous.

Cured: Numerical Variable. It represents the count of individuals who have recovered from COVID-19, making it a quantitative variable.
Deaths: Numerical Variable. Like "Cured," it represents a count of events and is a quantitative variable.
Confirmed: Numerical Variable. This variable represents the count of confirmed cases and is quantitative.
Sno (Serial Number): Stored as a whole number, but it is only a unique identifier for each record, so it does not measure a quantity and arithmetic on it (such as an average) has no real meaning.
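Numerical variables, by contrast, support arithmetic summaries such as the minimum, maximum, mean, and median. This sketch computes them for Cured, Deaths, and Confirmed using Python's `statistics` module; the rows are hypothetical values and `summarize` is an illustrative helper name, not part of any library. Sno is left out on purpose, since the mean of serial numbers 1 to n is just (n + 1) / 2 and only reflects how many rows there are.

```python
import statistics

# Hypothetical rows (made-up values); column names assumed to match the CSV.
rows = [
    {"Sno": 1, "Cured": 0, "Deaths": 0, "Confirmed": 3},
    {"Sno": 2, "Cured": 2, "Deaths": 0, "Confirmed": 5},
    {"Sno": 3, "Cured": 4, "Deaths": 1, "Confirmed": 10},
]


def summarize(records, column):
    """Basic numeric summaries; these only make sense for real quantities."""
    values = [record[column] for record in records]
    return {
        "min": min(values),
        "max": max(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
    }


# Sno is deliberately excluded: it is an identifier, not a measurement.
for column in ("Cured", "Deaths", "Confirmed"):
    print(column, summarize(rows, column))
```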

Let's identify the scale of measurement for each variable:

Nominal Scale:
- Sno (Serial Number): Acts as a unique label for each record. The numbers follow row order, but the difference between two serial numbers has no meaningful interpretation, so it is best treated as nominal (at most ordinal).
- State/Union Territories: Represents distinct categories with no inherent order.
Ordinal Scale:
- Time: If grouped into categories such as "morning" and "afternoon", which have an order but no measurable spacing between them.
Interval Scale:
- Date: Dates have a natural order and the difference between two dates (in days) is meaningful, but there is no true zero, so ratios of dates are not meaningful.
- Time: If recorded as clock time, the same reasoning as for Date applies.
Ratio Scale:
- Cured: Counts with a true zero point (zero means no recoveries), so ratios between values are meaningful.
- Deaths: Similar to "Cured," with a true zero point and meaningful ratios.
- Confirmed: Also a ratio scale with a true zero point and meaningful ratios.
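One practical way to check a variable's scale is to ask which arithmetic gives a meaningful answer. The sketch below (hypothetical helper names, made-up numbers) contrasts a ratio of two counts, which is meaningful because zero means "no cases", with the difference between two calendar dates, which is meaningful even though dates have no true zero.

```python
from datetime import date


def ratio(a: float, b: float) -> float:
    """Ratio of two values; only meaningful on a ratio scale (true zero)."""
    return a / b


def days_between(earlier: date, later: date) -> int:
    """Difference between two dates in days; meaningful for interval data."""
    return (later - earlier).days


# Ratio scale: hypothetical confirmed counts on two days.
print(ratio(100, 50))  # 2.0, i.e. twice as many confirmed cases

# Interval-type operation: the gap between two report dates.
print(days_between(date(2020, 3, 1), date(2020, 3, 15)))  # 14

# There is no analogous "ratio" of two dates: date objects do not
# support division, and 15 March is not "twice" 1 March in any sense.
```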

Let's identify the discrete and continuous variables in the dataset:

Discrete Variables:

- Sno (Serial Number): Represents unique identifiers, which are whole numbers with no meaningful values in between.

- Cured: Represents counts, so it is discrete. A recovery rate derived from it (for example, Cured / Confirmed) would be continuous.

- Deaths: Represents counts, so it is discrete. A death rate derived from it (for example, Deaths / Confirmed) would be continuous.

- Confirmed: Represents counts, so it is discrete. Rates or proportions derived from it would be continuous.

Continuous Variables:

- Date: Often treated as discrete (one value per day), but it can be considered continuous when measuring time intervals.

- Time: Can be treated as continuous, especially when expressed in hours, minutes, or seconds.
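The difference between discrete counts and continuous rates can be seen directly in code: the counts are whole numbers, while a rate computed from them can take any value between 0 and 1. The sketch below assumes a crude rate defined as Deaths / Confirmed or Cured / Confirmed; the function name `rate` and the input numbers are illustrative only, not taken from the dataset.

```python
def rate(part: int, total: int) -> float:
    """Return part / total as a proportion between 0 and 1.

    The inputs are discrete counts (whole numbers); the result is a
    continuous quantity that can fall anywhere between 0 and 1.
    """
    if total <= 0:
        raise ValueError("total must be a positive count")
    if not 0 <= part <= total:
        raise ValueError("part must be between 0 and total")
    return part / total


# Hypothetical counts for one state on one day (not real data).
confirmed, cured, deaths = 10, 4, 1

death_rate = rate(deaths, confirmed)
recovery_rate = rate(cured, confirmed)
print(f"death rate: {death_rate:.2f}")        # death rate: 0.10
print(f"recovery rate: {recovery_rate:.2f}")  # recovery rate: 0.40
```

The validation checks guard against dividing by zero and against a count larger than the total, which would otherwise produce a "rate" above 1.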
