Precision agriculture is changing modern farming by enabling data-driven decisions. This dataset can be used to build predictive models that recommend suitable crops from environmental parameters.
Original Dataset: Crop Recommendation Dataset (kaggle.com)
Categorical Variables:
Crop: Categorizes data into different crop types without any inherent order.
Season: Defines the growing season (Kharif, Rabi, Zaid) with no inherent order.
Soil Type: Represents different soil types (Clay, Loamy, Sandy, etc.) with no order.
Numerical Variables:
Nitrogen Ratio: Quantifies nitrogen content in the soil on a ratio scale.
Phosphorous Ratio: Measures phosphorus content in the soil, also on a ratio scale.
Potassium Ratio: Represents potassium content, measured on a ratio scale.
Temperature: Recorded in degrees Celsius on an interval scale (0 °C is not a true zero, so ratios of temperatures are not meaningful).
Humidity: Expressed as a percentage on a ratio scale.
pH: Represents the pH level of the soil on an interval scale, since pH is logarithmic and a value of 0 does not mean an absence of acidity.
Rainfall: Measured in mm on a ratio scale.
Central Tendency:
Crop: The mode represents the most frequent crop type. This is useful for identifying the most commonly grown crops in the dataset.
Season: The mode represents the most common growing season. This helps to determine which season is most frequently recorded in the dataset.
Soil Type: The mode represents the most frequent soil type. This identifies the most commonly occurring soil type in the dataset.
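As a rough illustration of how the mode of a categorical variable could be computed, here is a minimal sketch using only Python's standard library. The sample values and the helper name `modes` are hypothetical and are not taken from the dataset.

```python
from collections import Counter

# Hypothetical sample of the "Soil Type" column; real values and counts will differ.
soil_types = ["Clay", "Loamy", "Sandy", "Loamy", "Clay", "Loamy"]


def modes(values):
    """Return every most-frequent category, sorted, so ties are not hidden."""
    counts = Counter(values)
    top = max(counts.values())
    return sorted(category for category, n in counts.items() if n == top)


soil_modes = modes(soil_types)
print(soil_modes)
```

Returning a list rather than a single value is a deliberate choice here: a nominal variable can have more than one mode, and reporting only one of them would be misleading.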
Central Tendency and Dispersion:
Nitrogen Ratio: The mean, median, and mode provide measures of central tendency. The range gives the overall spread, while the variance and standard deviation measure how far nitrogen ratios deviate from the mean. For more analysis and description, refer to subsheet 'Variable 4' of the Dataset sheet.
Phosphorous Ratio: The mean, median, and mode are used to summarize the central values of phosphorous ratios. The range, variance, and standard deviation describe how widely phosphorous ratios are spread. For more analysis and description, refer to subsheet 'Variable 5' of the Dataset sheet.
Potassium Ratio: The mean, median, and mode provide insights into the central values of potassium ratios. The range, variance, and standard deviation measure the dispersion of potassium ratios. For more analysis and description, refer to subsheet 'Variable 6' of the Dataset sheet.
Temperature: The mean, median, and mode summarize temperature data, while the range, variance, and standard deviation show temperature variability. For more analysis and description, refer to subsheet 'Variable 7' of the Dataset sheet.
Humidity: The mean, median, and mode represent central values of humidity, with range, variance, and standard deviation indicating dispersion. For more analysis and description, refer to subsheet 'Variable 8' of the Dataset sheet.
pH: The mean, median, and mode provide central tendencies for pH levels. The range, variance, and standard deviation reflect the dispersion of pH values. For more analysis and description, refer to subsheet 'Variable 9' of the Dataset sheet.
Rainfall: The mean, median, and mode give central values for rainfall amounts, while the range, variance, and standard deviation describe the variability in rainfall. For more analysis and description, refer to subsheet 'Variable 10' of the Dataset sheet.
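The summary measures listed for each numerical variable could be computed along the following lines. This is only a sketch: the temperature readings are hypothetical, and it assumes the sample (n - 1) forms of variance and standard deviation, which may or may not match the formulas used in the dataset's subsheets.

```python
import statistics

# Hypothetical temperature readings in degrees Celsius (not taken from the dataset).
temperature = [20.0, 22.0, 22.0, 25.0, 31.0]

summary = {
    # Central tendency
    "mean": statistics.mean(temperature),
    "median": statistics.median(temperature),
    "mode": statistics.multimode(temperature),  # list, because ties are possible
    # Dispersion
    "range": max(temperature) - min(temperature),
    "variance": statistics.variance(temperature),  # sample variance (divides by n - 1)
    "std_dev": statistics.stdev(temperature),      # square root of the sample variance
}

for name, value in summary.items():
    print(f"{name}: {value}")
```

If the population forms are wanted instead, `statistics.pvariance` and `statistics.pstdev` divide by n rather than n - 1.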
Crop And Soil
Association:
(i) Contingency Table
(ii) Row Relative Contingency Table
Reason:
(i) We can generate a contingency table based on the counts of the variables "Crop" and "Soil Type". Analyzing this table reveals how different crops are distributed across various soil types.
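(ii) The row relative contingency table divides each count by its row total, giving the proportion of each crop's records that fall under each soil type. Because every row sums to 1, crops with different numbers of records can be compared directly, which makes any soil preference easier to see.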
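Both tables can be sketched in a few lines of standard-library Python. The (crop, soil type) records below are hypothetical and only illustrate the mechanics: count each pair, then divide each row by its total.

```python
from collections import Counter

# Hypothetical (crop, soil type) records; the real dataset will differ.
records = [
    ("Rice", "Clay"), ("Rice", "Clay"), ("Rice", "Loamy"),
    ("Wheat", "Loamy"), ("Wheat", "Loamy"), ("Wheat", "Sandy"),
    ("Maize", "Sandy"), ("Maize", "Loamy"),
]

crops = sorted({crop for crop, _ in records})
soils = sorted({soil for _, soil in records})
pair_counts = Counter(records)

# (i) Contingency table: rows are crops, columns are soil types, cells are counts.
contingency = {
    crop: {soil: pair_counts[(crop, soil)] for soil in soils} for crop in crops
}

# (ii) Row relative table: each cell divided by its row total, so each row sums to 1.
row_relative = {}
for crop, row in contingency.items():
    total = sum(row.values())
    row_relative[crop] = {soil: count / total for soil, count in row.items()}

for crop in crops:
    print(crop, contingency[crop], row_relative[crop])
```

With a data-frame library the same result is usually a single cross-tabulation call; the explicit version is shown here only to make the row normalization visible.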
Temperature And Humidity
Association:
(i) Scatter Plot
(ii) Covariance
(iii) Correlation Coefficient
Reason:
(i) The scatter plot shows a weak positive association between Temperature and Humidity. This suggests that as Temperature increases, Humidity tends to increase slightly.
(ii) The positive covariance of 23.14 indicates a positive relationship between Temperature and Humidity, meaning that an increase in one variable is associated with an increase in the other. Its magnitude depends on the units of both variables, so it shows the direction of the relationship rather than its strength.
(iii) The correlation coefficient of 0.21 reflects a weak positive correlation between Temperature and Humidity. There is only a slight linear tendency for higher values of one variable to accompany higher values of the other.
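The relationship between covariance and the correlation coefficient can be illustrated with a short sketch. The readings below are hypothetical and will not reproduce the values 23.14 or 0.21 reported above; the helper names `sample_covariance` and `pearson_r` are also illustrative, and the sample (n - 1) form of covariance is an assumption.

```python
import math

# Hypothetical paired readings (not taken from the dataset).
temperature_c = [20.0, 22.0, 25.0, 28.0, 30.0]  # degrees Celsius
humidity = [60.0, 75.0, 58.0, 80.0, 70.0]       # percent


def sample_covariance(x, y):
    """Sample covariance: sum of products of deviations, divided by n - 1."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)


def pearson_r(x, y):
    """Pearson correlation: covariance divided by the product of the standard deviations."""
    return sample_covariance(x, y) / math.sqrt(
        sample_covariance(x, x) * sample_covariance(y, y)
    )


cov_c = sample_covariance(temperature_c, humidity)
r_c = pearson_r(temperature_c, humidity)

# Re-express temperature in Fahrenheit: the covariance is rescaled, the correlation is not.
temperature_f = [t * 9 / 5 + 32 for t in temperature_c]
cov_f = sample_covariance(temperature_f, humidity)
r_f = pearson_r(temperature_f, humidity)

print(f"covariance (C): {cov_c:.2f}, correlation: {r_c:.2f}")
print(f"covariance (F): {cov_f:.2f}, correlation: {r_f:.2f}")
```

The change of units multiplies the covariance by 1.8 while leaving the correlation coefficient unchanged, which is why the coefficient, not the covariance, is used to judge the strength of the association.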
The Updated Crop Recommendation Dataset from Kaggle supports precision agriculture by relating crop recommendations to environmental factors. It includes categorical variables (Crop, Season, Soil Type) and numerical variables (Nitrogen Ratio, Phosphorous Ratio, Potassium Ratio, Temperature, Humidity, pH, Rainfall). The categorical variables are summarized by their mode, while the numerical variables are summarized by mean, median, and mode, with dispersion assessed through range, variance, and standard deviation. The association between Crop and Soil Type is analyzed using contingency tables, while Temperature and Humidity are examined through a scatter plot, covariance, and the correlation coefficient. Detailed insights are available in the dataset’s subsheets.