I did this mostly for fun and to put my data science skills into practice. It may be an unconventional use case, but it is a start at applying concepts like statistics, data wrangling, randomness, distributions, and visualization.
In short, this project scrapes lotto results online and stores them in a spreadsheet, which currently serves as my data holder (or database). From there, I can transform the data, extract information from it with visuals, and use what I find to make better-informed decisions on real betting.
This project only covers the 6/42 Lotto held in the Philippines.
Right now, the whole thing is messy and outdated because of time constraints, but I make progress on it from time to time.
1. The data is scraped from the web, specifically from the website shown in the diagram, using Python with the BeautifulSoup and Selenium libraries.
2. The data is transformed into a friendlier format (i.e. XML to tabular) and then transferred into the spreadsheet.
3. Those spreadsheets can be used for analysis and/or visualizations.
4. Visualizations are written in Jupyter notebooks and shared online using Plotly Chart Studio.
5. Separate individual analyses are also written in Jupyter notebooks.
Steps 1 and 2 are automated using PowerShell. The script runs in my local environment and checks whether a new result is available.
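The transform-and-store part of steps 1 and 2 might look roughly like the sketch below. It is not the project's actual script (which relies on BeautifulSoup, Selenium, and PowerShell); it uses only the Python standard library, and the XML layout, the column names, the `append_if_new` helper, and the sample values are all made up for illustration.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical XML layout with invented values; the real scraped markup differs.
SAMPLE_XML = """<results>
  <draw date="2021-01-02"><n>5</n><n>12</n><n>19</n><n>23</n><n>31</n><n>40</n></draw>
  <draw date="2021-01-05"><n>3</n><n>8</n><n>14</n><n>27</n><n>36</n><n>42</n></draw>
</results>"""

FIELDS = ["date", "n1", "n2", "n3", "n4", "n5", "n6"]


def xml_to_rows(xml_text):
    """Flatten each <draw> element into one tabular row (a dict)."""
    rows = []
    for draw in ET.fromstring(xml_text).iter("draw"):
        row = {"date": draw.get("date")}
        for i, n in enumerate(draw.findall("n"), start=1):
            row[f"n{i}"] = int(n.text)
        rows.append(row)
    return rows


def append_if_new(existing_rows, scraped_rows):
    """Return the stored rows plus any scraped rows whose date is not stored yet."""
    seen = {row["date"] for row in existing_rows}
    new_rows = [row for row in scraped_rows if row["date"] not in seen]
    return existing_rows + new_rows, len(new_rows)


def rows_to_csv(rows):
    """Serialize rows to CSV text, standing in for the spreadsheet."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

Calling `append_if_new(stored, xml_to_rows(SAMPLE_XML))` returns the updated rows and the number of draws that were actually new, so a scheduled script can exit early when that count is zero.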
Notes:
The website I provided is not the sole source of the data. The historical data came from several websites that I scraped and consolidated into one spreadsheet. The link in the diagram is the primary website for the latest results.
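Consolidating several scraped sources into one table mostly comes down to merging on the draw date and dropping duplicates. Here is a minimal sketch, assuming each source has already been reduced to rows with a `date` key and a `numbers` tuple. Both key names and the sample rows are hypothetical.

```python
def consolidate(*sources):
    """Merge rows from several sources, keeping one row per draw date.

    Earlier sources take priority. Rows that disagree on the numbers are
    collected for manual review instead of being silently overwritten.
    """
    merged = {}
    conflicts = []
    for source in sources:
        for row in source:
            kept = merged.setdefault(row["date"], row)
            if kept["numbers"] != row["numbers"]:
                conflicts.append((kept, row))
    # ISO-formatted dates (YYYY-MM-DD) sort chronologically as plain strings.
    return [merged[date] for date in sorted(merged)], conflicts


# Invented sample rows, not real draw results.
site_a = [{"date": "2021-01-02", "numbers": (5, 12, 19, 23, 31, 40)}]
site_b = [
    {"date": "2021-01-05", "numbers": (3, 8, 14, 27, 36, 42)},
    {"date": "2021-01-02", "numbers": (5, 12, 19, 23, 31, 40)},
]
history, conflicts = consolidate(site_a, site_b)
```

Keeping the conflicting rows around makes it easier to spot a source that reports a draw incorrectly.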
Currently, the whole framework is a mess. It's just a bunch of ideas thrown around with no centerpiece. So here is a high-level list of what I plan to implement in the near future:
1. Fix the visualizations and create a central dashboard for everything, which also means choosing the right tool for it.
2. Create statistical analyses and find a way to publish them on the web.
3. Fix the code structure, probably with an OOP framework.
4. Improve the database system to reduce the burden for Item #1.
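As a possible starting point for the statistical analysis item, the sketch below counts how often each number appears and compares that with the count expected if all 42 numbers were equally likely (number of draws × 6 / 42). The function names are illustrative assumptions and not part of the project.

```python
from collections import Counter

NUMBERS = range(1, 43)  # 6/42 Lotto: six numbers drawn from 1..42
PICKS_PER_DRAW = 6


def number_frequencies(draws):
    """Count how many times each number from 1 to 42 appears across all draws."""
    counts = Counter(n for draw in draws for n in draw)
    return {n: counts.get(n, 0) for n in NUMBERS}


def chi_square(draws):
    """Pearson chi-square statistic against a uniform expected count."""
    if not draws:
        raise ValueError("need at least one draw")
    observed = number_frequencies(draws)
    expected = len(draws) * PICKS_PER_DRAW / len(NUMBERS)
    return sum((obs - expected) ** 2 / expected for obs in observed.values())
```

Because the six numbers in a draw are picked without replacement, the counts are not independent, so the statistic is better treated as a rough descriptive measure than read directly off a standard chi-square table.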