Week of 8/31
Began programming a data scraper to collect Common Data Set spreadsheets. Had difficulties with hardware platforms, library compatibility, and library installation. However, all necessary components are now installed properly, and I am planning on finishing it on Sat. Nov. 5.
Week of 10/7
I switched the website I will be pulling college sites from. Also added a tag finder to try to pull only the links I need from the site data.
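A tag finder like the one described could be sketched roughly as follows. This is only an illustration using Python's standard `html.parser`; the log does not say which library was actually used, and the `LinkFinder` class, the keyword, and the sample HTML are all made up for the example.

```python
from html.parser import HTMLParser


class LinkFinder(HTMLParser):
    """Collect href values from <a> tags whose link contains a keyword."""

    def __init__(self, keyword):
        super().__init__()
        self.keyword = keyword.lower()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # Only anchor tags carry the links we care about.
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value and self.keyword in value.lower():
                self.links.append(value)


# Hypothetical page content, for illustration only.
sample_html = """
<a href="/about">About</a>
<a href="/files/common-data-set-2021.pdf">Common Data Set</a>
<a href="/admissions">Admissions</a>
"""

finder = LinkFinder("common-data-set")
finder.feed(sample_html)
print(finder.links)
```

Filtering on the link text instead of the href would work the same way, by overriding `handle_data` as well.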
11/14
Looking at college websites and where they placed their Common Data Set downloads, I realized it might not be possible to pull them just from each website's home page. Going to research more deeply how this can be done.
11/18
Found a data set with college data available to download here. Very thankful I found this, as using a data scraper to get data from all schools was looking more impossible by the day. Though the set is not complete, it should be good enough for me to begin working with.
Data Scraper will be scrapped. I will now move on to data processing.
11/21
Began looking at reading and extracting data from these PDF documents. Will need to decide whether to use Python libraries and scripts or Optical Character Recognition (OCR), since the CDS is laid out in tables, which could make extraction difficult.
11/23
Decided to just convert them using an online engine. That should be easier and more accurate than the inconsistent and unreliable output of a script.
11/28
Goal for the week - Convert and parse all of my collected data from PDFs into a usable format - hopefully looking to be able to move on to data sorting by the end of the week.
12/2
Found a website that can reliably convert the tables to Excel sheets. I still have to convert the files one by one.
12/9
Finished converting all PDFs - will next have to find out how to process the Excel sheets and their data using scripts.
12/12
Began looking into how to handle all of my datasets as one, most likely by creating lists and dictionaries separated by data type.
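One possible shape for "lists and dictionaries separated by data type" is sketched below. It assumes each school is already a dictionary of field values; the `split_by_type` name, the field names, and the sample numbers are hypothetical and not taken from the actual data.

```python
from collections import defaultdict


def split_by_type(records):
    """Group field values into per-field lists, numeric fields apart from text fields."""
    numeric = defaultdict(list)
    text = defaultdict(list)
    for record in records:
        for field, value in record.items():
            # bool is a subclass of int, so exclude it from the numeric group.
            if isinstance(value, (int, float)) and not isinstance(value, bool):
                numeric[field].append(value)
            else:
                text[field].append(value)
    return dict(numeric), dict(text)


# Made-up sample records, for illustration only.
schools = [
    {"name": "School A", "enrollment": 5200, "acceptance_rate": 0.41},
    {"name": "School B", "enrollment": 18000, "acceptance_rate": 0.67},
]

numeric, text = split_by_type(schools)
print(numeric)
print(text)
```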
12/20
Initial framework of the script is written. Needs finalization and execution. Will then move on to building a search engine.
1/5
Upon review of the converted datasheets, their contents did not convert into a coherent and consistent format. Attempting to write scripts to parse the data would be unsuccessful. I am instead going to build a website in order to have a product to show, and then might build datasets by hand using schools in NY.
1/9
Have scrapped my data collection - will now manually collect data sheets from New York schools, and make a small data set using Back4App. I will later create a website showing my sample concept using that data.
1/17
This week's goal was to collect Common Data Set spreadsheets from a sample of New York schools so that I can pull data from them manually later on. I currently have 11 schools, and will download their sheets and pull data from them next week.
2/4
Continuing to collect data from the CDS files.
I am about halfway through the collection.
2/10
Data collection is now complete.
Will start building my website and its framework.
2/17
Began writing my script, which is currently just a skeleton with no working functionality.
3/3
Continuing work on the data sorting algorithm. Want to have divided lists to read from by the end of the week. Need to properly learn how to use the library I imported (openpyxl).
3/17
Fixed my initial syntax errors and typos. I have also solved all the errors that were caused by improperly importing variables and libraries. I am now trying to get my program to actually read my Excel file.
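Reading a sheet with openpyxl might look roughly like the sketch below. It assumes the first row of the sheet holds column headers, which the log does not confirm; `rows_to_records` and `read_sheet` are hypothetical helper names, and the file name in the usage comment is a placeholder.

```python
def rows_to_records(rows):
    """Turn rows (header row first) into a list of dicts, skipping fully empty rows."""
    rows = iter(rows)
    header = next(rows)
    records = []
    for row in rows:
        if all(cell is None for cell in row):
            continue
        records.append(dict(zip(header, row)))
    return records


def read_sheet(path):
    """Read the active worksheet of an .xlsx file into a list of dicts."""
    # Third-party dependency (pip install openpyxl), imported here so the
    # helper above can be used and checked without it.
    from openpyxl import load_workbook

    # data_only=True returns cached cell values instead of formulas.
    workbook = load_workbook(path, data_only=True)
    sheet = workbook.active
    return rows_to_records(sheet.iter_rows(values_only=True))


# Usage (placeholder file name):
# records = read_sheet("schools.xlsx")
```

Keeping the row-to-dictionary step separate from the file reading makes it possible to check the parsing logic with plain tuples before a real workbook is involved.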
3/24
This week I hope to finish my first sort algorithm, and begin working on the user side: how my program will respond to user input.
I have also started my poster, and put in most of what I can as of right now.
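For the sorting and user-input goal in this entry, a first pass could be as simple as the sketch below. It assumes each school is a dictionary and uses Python's built-in `sorted` in place of a hand-written sort; the `sort_schools` name, field names, and sample values are hypothetical.

```python
def sort_schools(records, field, descending=False):
    """Sort records by the given field; records missing that field go last."""
    present = [r for r in records if r.get(field) is not None]
    missing = [r for r in records if r.get(field) is None]
    ordered = sorted(present, key=lambda r: r[field], reverse=descending)
    return ordered + missing


# Made-up sample records, for illustration only.
schools = [
    {"name": "School A", "enrollment": 5200},
    {"name": "School B", "enrollment": 18000},
    {"name": "School C"},
    {"name": "School D", "enrollment": 9400},
]

# In an interactive run the field could come from input();
# it is fixed here so the sketch runs unattended.
field = "enrollment"
for school in sort_schools(schools, field, descending=True):
    print(school["name"], school.get(field))
```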
3/30
The data and the initial data script are complete. However, I still need to verify that the script works. Separately, the resetting of my Chromebook's data has been very frustrating, but my project should be able to move along soon.
4/21
My goal this week is to finally finish writing my program and get the damn thing to work on my PC at home (since this Chromebook keeps shooting me in the foot). I hope to begin my website by Monday.