My notes:
- Maybe talk more in depth about predictive analytics
- I am not sure why but I don't like the transition between the first and second paragraph.
- Remove all uses of "my" and "I"
- I would add a good, or better, adjective to "allowed me to get an understanding" so it sounds like you know what you are talking about.
- I would either give each reason to use linear regression its own paragraph or combine them all into one paragraph, rather than spreading them over two
- Maybe go into why linear regression is flexible.
- I would explain more about what linear regression is
- I know what linear regression is, and I still didn't get much understanding from the paragraph "It models a dependent ..."
- I would try to raise the wording a little to sound more professional, for example by replacing "a lot" with something that sounds better, or with a number such as ">70000"
- Instead of listing what the data has, I would make a table with the name, type, range, etc.
- Is there a reason you are specifically looking into whether fires have worsened, and not whether they have improved or stayed the same?
- I would pick a concrete start date for the test set rather than saying the past 2 or so years
- I would use machine learning terminology like test set, training set, accuracy, recall, etc
- Instead of "1 million lines", say "1 million training and test instances" (or whatever the proper term is)
- Instead of "columns", say "features"
- Be more specific about how you will analyze the algorithm and why you chose these: recall, accuracy, precision, F1 score, etc.
- Will you have a validation set as well, or use k-fold cross-validation?
- I would find a better way to say that parts of the data didn't make any sense
- Maybe find a more scientific reason for choosing Python over R; this might not be necessary, though
- What is the currently known prediction rate? Is that prediction rate on fires? What creates the predictions?
- Maybe explain more about why/how you are comparing algorithms trained and tested on different data sets
- Overall, whenever you say "analyze", I would say exactly how you are going to do that.
I thought your paper was good overall. I would just work on being more descriptive and using more scientific/machine learning terminology.
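
To make the notes about a concrete test-set start date and about naming your evaluation metrics a bit more tangible, here is a rough sketch of what that could look like in Python. Everything in it is an assumption for illustration only: the records, the `SPLIT_DATE` cutoff, the helper names `split_by_date` and `classification_metrics`, and the 0.5 threshold that stands in for a model's predictions. It is not your actual data or model, and these metrics only apply if the task is framed as a fire / no-fire classification.

```python
from datetime import date

# Hypothetical instances: (observation date, one feature value, label),
# where label 1 = fire and 0 = no fire. Invented purely for illustration.
records = [
    (date(2016, 7, 1), 0.9, 1),
    (date(2016, 12, 15), 0.2, 0),
    (date(2017, 8, 3), 0.7, 1),
    (date(2017, 11, 20), 0.4, 0),
    (date(2018, 6, 10), 0.8, 1),
    (date(2018, 9, 5), 0.6, 0),
    (date(2019, 1, 12), 0.3, 1),
    (date(2019, 7, 22), 0.1, 0),
]

# Assumed cutoff: everything before it is training data, the rest is test data.
SPLIT_DATE = date(2018, 1, 1)


def split_by_date(rows, cutoff):
    """Split instances into a training set (before cutoff) and a test set."""
    train = [r for r in rows if r[0] < cutoff]
    test = [r for r in rows if r[0] >= cutoff]
    return train, test


def classification_metrics(y_true, y_pred):
    """Compute accuracy, precision, recall, and F1 for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


train_set, test_set = split_by_date(records, SPLIT_DATE)

# Placeholder predictions: threshold the feature at 0.5 instead of a real model.
y_true = [label for _, _, label in test_set]
y_pred = [1 if value >= 0.5 else 0 for _, value, _ in test_set]

metrics = classification_metrics(y_true, y_pred)
print(len(train_set), len(test_set), metrics)
```

Stating the cutoff as a named constant makes the training/test boundary explicit and reproducible, which is the main point of picking a concrete date.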