Python

Project 1: Basic Statistics from Input


On the left is a project I created outside of class to double-check my work in my statistics class. It reads numeric input and computes the following descriptive statistics:

Central Tendency: Mean, Median, Mode, and a 10% Trimmed Mean (robust to outliers).

Dispersion: Minimum, Maximum, Range, Variance, Standard Deviation, Interquartile Range (IQR), and Coefficient of Variation (CV).

Distribution Shape: Skewness (asymmetry) and Kurtosis (tailedness).

Outlier Detection: Lower Inner Fence (LIF), Upper Inner Fence (UIF), and a list of potential outliers.

Below is an example output of this code:

I have also made a similar project in Java. This work demonstrates ideas such as descriptive statistics and exploratory data analysis (EDA).
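As a rough illustration of the statistics listed for this project, here is a minimal sketch using only Python's standard library. It is not the code shown on this page. The function names `parse_numbers` and `describe`, the sample variance (dividing by n - 1), the default quartile method of `statistics.quantiles`, the moment-based skewness and excess kurtosis formulas, and the 1.5 x IQR fence rule are all assumptions made for the example.

```python
import statistics


def parse_numbers(text):
    """Turn a string like '2, 4 4, 5' into a list of floats (hypothetical helper)."""
    return [float(token) for token in text.replace(",", " ").split()]


def describe(data):
    """Return a dict of descriptive statistics for a list of numbers."""
    xs = sorted(data)
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two data points")

    # Central tendency
    mean = statistics.fmean(xs)
    median = statistics.median(xs)
    modes = statistics.multimode(xs)  # every value tied for most frequent
    k = n // 10  # 10% trimmed mean: drop k values from each end
    trimmed_mean = statistics.fmean(xs[k:n - k])

    # Dispersion (sample variance and standard deviation, dividing by n - 1)
    variance = statistics.variance(xs)
    stdev = statistics.stdev(xs)
    q1, _, q3 = statistics.quantiles(xs, n=4)  # default "exclusive" method
    iqr = q3 - q1
    cv = stdev / mean if mean != 0 else float("nan")

    # Distribution shape from central moments (population form, an assumption)
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    skewness = m3 / m2 ** 1.5 if m2 > 0 else float("nan")
    kurtosis = m4 / m2 ** 2 - 3 if m2 > 0 else float("nan")  # excess kurtosis

    # Outlier detection with inner fences at 1.5 * IQR beyond the quartiles
    lif = q1 - 1.5 * iqr
    uif = q3 + 1.5 * iqr
    outliers = [x for x in xs if x < lif or x > uif]

    return {
        "mean": mean, "median": median, "mode": modes,
        "trimmed_mean": trimmed_mean,
        "min": xs[0], "max": xs[-1], "range": xs[-1] - xs[0],
        "variance": variance, "stdev": stdev,
        "q1": q1, "q3": q3, "iqr": iqr, "cv": cv,
        "skewness": skewness, "kurtosis": kurtosis,
        "lif": lif, "uif": uif, "outliers": outliers,
    }
```

For example, `describe([2, 4, 4, 5, 6, 7, 8, 9, 10, 50])` reports a mean of 10.5 but a median of 6.5, and flags 50 as the only potential outlier. Other quartile conventions (for instance `method="inclusive"`) would shift the fences slightly, so results may differ from a textbook or calculator that uses a different rule.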

Project 2: Data Visualization from Mock Statistics

On the left are mock statistics provided by one of my professors to help us practice wrangling data and, ultimately, create meaningful data visualizations in Python that identify trends. The image on the left is a small excerpt of the Excel spreadsheet. The full dataset spans thousands of rows, which is why it was chosen: it contains enough data points to support many kinds of graphs.

The main code block of this project handles several important data visualization tasks using the pandas, matplotlib, and seaborn libraries. Here's a quick summary:

Data Loading: The project starts by importing the dataset into a pandas DataFrame.
Geographical Distribution: It then creates a scatter plot with matplotlib to visualize the geographic spread of housing in California. The plot uses longitude and latitude for the coordinates, with point colors representing the median house value and sizes based on population.
Income Distribution: Next, a histogram is generated using matplotlib to display the distribution of median income across households in the dataset.
Feature Correlation: Lastly, seaborn is used to generate a correlation heat map, showing the linear relationships between all numerical features, which helps in identifying how different variables are related.

Below is the code that was run and the graphs it produced:

Data Visualization

Geographical Distribution: Housing values are highest along the coast and in urban centers, often correlating with higher population density.

Median Income Histogram: The distribution of median income is right-skewed, meaning most households have lower to mid-range incomes, with fewer in the high-income brackets.

Correlation Heat map: There's a strong positive correlation between median_income and median_house_value.
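The three plots described in this project can be sketched roughly as follows. This is an illustrative reconstruction under stated assumptions, not the code shown on this page: the function names `load_housing` and `plot_housing`, the column names `longitude`, `latitude`, and `population`, the marker scaling, the bin count, and the color maps are all guesses. Only `median_income` and `median_house_value` are column names taken from the text above.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend so the script runs without a display

import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns


def load_housing(path):
    """Load the dataset into a DataFrame; file name and format are assumptions."""
    if str(path).lower().endswith((".xlsx", ".xls")):
        return pd.read_excel(path)  # needs an Excel engine such as openpyxl
    return pd.read_csv(path)


def plot_housing(df):
    """Draw the scatter plot, histogram, and correlation heat map on one figure."""
    fig, (ax_geo, ax_inc, ax_corr) = plt.subplots(1, 3, figsize=(20, 6))

    # Geographical distribution: color by house value, size by population
    points = ax_geo.scatter(
        df["longitude"],
        df["latitude"],
        c=df["median_house_value"],
        s=df["population"] / 100,  # assumed scaling to keep markers small
        cmap="viridis",
        alpha=0.4,
    )
    fig.colorbar(points, ax=ax_geo, label="median_house_value")
    ax_geo.set_xlabel("longitude")
    ax_geo.set_ylabel("latitude")
    ax_geo.set_title("Geographical Distribution")

    # Income distribution
    ax_inc.hist(df["median_income"], bins=30, edgecolor="black")
    ax_inc.set_xlabel("median_income")
    ax_inc.set_ylabel("count")
    ax_inc.set_title("Median Income Histogram")

    # Pairwise Pearson correlations between the numeric columns
    corr = df.select_dtypes(include="number").corr()
    sns.heatmap(corr, annot=True, fmt=".2f", cmap="coolwarm", ax=ax_corr)
    ax_corr.set_title("Correlation Heat Map")

    fig.tight_layout()
    return fig
```

A typical call might be `fig = plot_housing(load_housing("housing.csv"))` followed by `fig.savefig("housing_plots.png")`, where both file names are hypothetical. Returning the figure instead of calling `plt.show()` keeps the function usable in scripts and notebooks alike.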

Contact at:

Cell Number - (205) 420 - 1586

Email - Dawson.Miskelly@gmail.com

Handshake - Dawson Miskelly