In this project, I demonstrate several approaches to cleaning a dataset with SQL, using the Nashville Housing dataset included in the GitHub repository linked below. I converted the date format, self-joined the table to find values that could populate nulls, split addresses into separate columns with the SUBSTRING and PARSENAME functions, updated tables, used CASE statements to standardize entries, set up a CTE to identify and remove duplicates, and dropped unused columns.
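As a rough illustration of a few of these steps, here is a minimal sketch that runs the same kinds of statements against an in-memory SQLite database from Python. It is not the project's actual code: the table name `housing`, its columns, and the four sample rows are all hypothetical, and SQLite is swapped in for SQL Server. Because of that swap, SQLite's `substr`/`instr` stand in for SUBSTRING and PARSENAME, and a correlated subquery on the same table stands in for the self join. The window function in the last step assumes SQLite 3.25 or newer.

```python
import sqlite3

# Hypothetical miniature table; names and rows are invented for illustration.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("""
    CREATE TABLE housing (
        unique_id        INTEGER,
        parcel_id        TEXT,
        property_address TEXT,
        sold_as_vacant   TEXT
    )
""")
cur.executemany(
    "INSERT INTO housing VALUES (?, ?, ?, ?)",
    [
        (1, "A-100", "12 Oak St, Nashville", "Y"),
        (2, "A-100", None, "No"),                        # address missing
        (3, "B-200", "9 Elm Ave, Goodlettsville", "N"),
        (4, "B-200", "9 Elm Ave, Goodlettsville", "N"),  # duplicate of row 3
    ],
)

# 1. Fill a null address from another row that shares the same parcel_id.
cur.execute("""
    UPDATE housing
    SET property_address = (
        SELECT b.property_address
        FROM housing AS b
        WHERE b.parcel_id = housing.parcel_id
          AND b.unique_id <> housing.unique_id
          AND b.property_address IS NOT NULL
        LIMIT 1
    )
    WHERE property_address IS NULL
""")

# 2. Standardize 'Y'/'N' to 'Yes'/'No' with a CASE expression.
cur.execute("""
    UPDATE housing
    SET sold_as_vacant = CASE sold_as_vacant
        WHEN 'Y' THEN 'Yes'
        WHEN 'N' THEN 'No'
        ELSE sold_as_vacant
    END
""")

# 3. Split the address at the comma into street and city columns.
cur.execute("ALTER TABLE housing ADD COLUMN street TEXT")
cur.execute("ALTER TABLE housing ADD COLUMN city TEXT")
cur.execute("""
    UPDATE housing
    SET street = trim(substr(property_address, 1, instr(property_address, ',') - 1)),
        city   = trim(substr(property_address, instr(property_address, ',') + 1))
""")

# 4. Number the rows within each group of identical values in a CTE,
#    then delete every row after the first in its group.
cur.execute("""
    WITH ranked AS (
        SELECT unique_id,
               ROW_NUMBER() OVER (
                   PARTITION BY parcel_id, property_address, sold_as_vacant
                   ORDER BY unique_id
               ) AS row_num
        FROM housing
    )
    DELETE FROM housing
    WHERE unique_id IN (SELECT unique_id FROM ranked WHERE row_num > 1)
""")
conn.commit()

rows = cur.execute(
    "SELECT unique_id, street, city, sold_as_vacant FROM housing ORDER BY unique_id"
).fetchall()
for row in rows:
    print(row)
```

Row 2 picks up its address from row 1, and row 4 is removed as a duplicate of row 3. Rows 1 and 2 both survive because their `sold_as_vacant` values differ, so they do not fall into the same partition.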
In this project, I use the txhousing data set, which ships with the ggplot2 package for R, to build and evaluate simple and multiple regression models.