I like the formatting.
Title
- I would remove "doing" from your title, making it "Predictive Analytics with Linear Regression"
Problem Statement
- Should all of "US spatial wildfires occurrence dataset" be capitalized if it is a name?
- Change the semicolon to a comma in "for large fire events; get ahead of"
- Are you only using the date feature? Is your model not being trained on anything else?
Predictive Analytics
- What is Predictive Analytics
- I am curious about the difference between a data-driven and a statistical approach. Aren't statistics determined from data?
- Remove the repeated phrase in "In predictive analytics, the data is used to create a model, the data is used to create model-defining parameters, weights, or coefficients."
- I would use "it" more often. I think you say "predictive analytics" too much
- I don't think you need the semicolon here "different techniques, such as; predictive modelling,"
- Origin
- use the phrase "predictive analytics" less often
- Methodology and Applications
- "Predictive analytics methods" vs. "Predictive analytic techniques": choose one form of "analytics" for these types of phrases, with or without the "s"
- Check this throughout the paper
- I would use a colon instead of a semicolon here: "they all follow;"
- Put a comma between "deployed" and "it" in "once a model is deployed it must be"
- Should it be "known" instead of "know"? "Parametric models assume know distributions in data."
- It is not clear to me what parametric models are
- I would turn "have no assumption about data," into "have no assumptions about the data"
- This sentence does not make sense to me "Predictive analytic algorithms can also be grouped by similarity; regression, instance-based, regularization, decision tree, clustering, Bayesian for example [4]."
- I would put this sentence in the origin section "The field got its start within the business field, and was used then and now for consumer behavior predictions, market analysis, economic predictions and more [7]."
- I would make the comma a period and split the two sentences. "limited to business purposes though, the city of Chicago has had"
Linear Regression
- Overview
- I would say "the model is simple, and as a result ..." instead of "is a model known for its simplicity, and as result it is known as one of the easiest"
- Make one sentence by turning first period into a comma: "With x as the input and y as the output. This formula is the one used for simple linear regression."
- I would put a caption on the image
- I would make the dash a comma "over-fitting – when a model is too"
- History
- Applications
- I think I would end this section by saying why you chose linear regression, or by clearly stating why you think it is a good choice for your problem
Related works
- wildfire predictions
- Should this be capitalized? "Predictive Services"
- I would mention what methods they use and how effective those methods were
- Have you described the dataset yet at this point? What data are you talking about in "the data described above from past fire sizes and"?
- I don't get the second half of this sentence "However, it does not appear that they use linear regression, or the data described above from past fire sizes and dates have been used to make long term predictions."
- dataset
- How successful were those projects?
Methodology
- goals
- I wouldn't say "answer the prediction question" here; I would define the question again: "and from scratch that is able to answer the prediction question."
- You say "goals" and use "are" but only state one goal in this sentence: "Our main goals for this project are to first implement basic linear regression using both a package – scikit-learn – and from scratch that is able to answer the prediction question."
- I don't really like this setup, with the semicolon followed by just the first goal: "As a group our three goals are as follows; first to conduct a comparative analysis". I don't know if my dislike is justified, though.
- I would remove "we are studying" from "how the different algorithms we are studying would compare when"
- linear regression
- most commonly used for what? "because it is most commonly used."
- This sentence seems too casual and has a typo at the end: "The trend of the data seemed as if it could be linear, so we decided to test it and see it it was."
- dataset
- testing the fit
- I would remove this sentence "Underfitting is the opposite of the overfitting concept described above."
- Run-on sentence. Maybe use "and" instead of the comma: "Overfitting is when a model is too closely fitted to a dataset, underfitting is when the model doesn’t fit the data enough."
- I had trouble reading this sentence: "This line of testing will ensure that our algorithm is first working properly through comparison, over and under-fitting, then finally allow for fine tuning and improvement after building a solid base model."
- I would also still like to know why you are looking at accuracy to judge your model and not recall, precision, F1 score, etc.
- cleaning the data
- I would replace the words "line" and "column" with the machine learning terms: "and if the line had data in both columns we needed,"
- Define which columns you were looking for: "had data in both columns we needed,"
- What does this mean?: "and added them into appropriate lists for each variable."
- progress
I think you can shorten the section about predictive analytics. I don't think it added enough to the paper for it to be that long and I got a little bored while reading it. I think your linear regression section is the more important part.
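To make my question under "testing the fit" about accuracy versus recall, precision, and F1 more concrete, here is a minimal sketch of how those metrics are computed. It assumes a binary label (for example, large fire vs. not), which is my assumption and not something taken from your paper; the function name and the sample labels are made up for illustration. These are classification metrics, so they may not apply directly if your model outputs a continuous value.

```python
def classification_metrics(y_true, y_pred):
    """Return accuracy, precision, recall, and F1 for binary 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    # Count the four confusion-matrix cells.
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)

    # Guard every ratio against a zero denominator.
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Made-up labels: 1 = large fire, 0 = not (hypothetical, not from the paper).
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]  # a model that never predicts a large fire
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

With these made-up labels, a model that never predicts the rare class still reaches 0.8 accuracy while its recall is 0, which is why I am asking about the other metrics.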