This dataset includes CSV files that contain the Tweet IDs and sentiment scores of tweets related to the COVID-19 pandemic. The tweets were collected by an ongoing project deployed at https://live.rlamsal.com.np.
Date range - 20 March to 18 November, with a separate file for each day.
Number of tweet IDs per file (per day) - around 30-40 lakh (3-4 million)
Hashtags and keywords - "corona", "#corona", "coronavirus", "#coronavirus", "covid", "#covid", "covid19", "#covid19", "covid-19", "#covid-19", "sarscov2", "#sarscov2", "sars cov2", "sars cov 2", "covid_19", "#covid_19", "#ncov", "ncov", "#ncov2019", "ncov2019", "2019-ncov", "#2019-ncov", "pandemic", "#pandemic", "#2019ncov", "2019ncov", "quarantine", "#quarantine", "flatten the curve", "flattening the curve", "#flatteningthecurve", "#flattenthecurve", "hand sanitizer", "#handsanitizer", "#lockdown", "lockdown", "social distancing", "#socialdistancing", "work from home", "#workfromhome", "working from home", "#workingfromhome", "ppe", "n95", "#ppe", "#n95", "#covidiots", "covidiots", "herd immunity", "#herdimmunity", "pneumonia", "#pneumonia", "chinese virus", "#chinesevirus", "wuhan virus", "#wuhanvirus", "kung flu", "#kungflu", "wearamask", "#wearamask", "wear a mask", "vaccine", "vaccines", "#vaccine", "#vaccines", "corona vaccine", "corona vaccines", "#coronavaccine", "#coronavaccines", "face shield", "#faceshield", "face shields", "#faceshields", "health worker", "#healthworker", "health workers", "#healthworkers", "#stayhomestaysafe", "#coronaupdate", "#frontlineheroes", "#coronawarriors", "#homeschool", "#homeschooling", "#hometasking", "#masks4all", "#wfh", "wash ur hands", "wash your hands", "#washurhands", "#washyourhands", "#stayathome", "#stayhome", "#selfisolating", "self isolating"
Hydration - We picked 9 days spanning a 2-month period and extracted the tweets using the Twarc hydrator. August and September were chosen for analysis because the period is recent and many interesting things happened during it. We collected tweets for the following dates (listed as file - date - tweet count):
1.csv - 3 August - 3810246
2.csv - 10 August - 3911443
3.csv - 17 August - 3680958
4.csv - 24 August - 3982835
5.csv - 31 August - 3725303
6.csv - 7 September - 3428867
7.csv - 14 September - 3590356
8.csv - 21 September - 3647346
9.csv - 28 September - 2828438
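The hydration step can be sketched roughly as follows. This is a minimal illustration, not the exact script we ran: it assumes each daily CSV has no header row and holds the tweet ID in its first column, and the helper names read_tweet_ids and hydrate are made up for this example. The client argument is expected to be a twarc.Twarc instance (twarc v1), whose hydrate() method takes an iterable of IDs and yields full tweet objects.

```python
import csv
from typing import Iterable, Iterator, List


def read_tweet_ids(path: str) -> List[str]:
    """Return the first column of every non-empty row of a CSV file.

    Assumption: no header row, tweet ID in column 0. IDs are kept as
    strings so that the long numeric IDs are never rounded.
    """
    with open(path, newline="") as handle:
        return [row[0].strip() for row in csv.reader(handle)
                if row and row[0].strip()]


def hydrate(ids: Iterable[str], client) -> Iterator[dict]:
    """Yield one tweet object per ID that the client can resolve.

    `client` is any object with a hydrate(iterable_of_ids) method, for
    example twarc.Twarc(consumer_key, consumer_secret,
    access_token, access_token_secret) built with your own credentials.
    """
    yield from client.hydrate(ids)
```

With real credentials, something like hydrate(read_tweet_ids("1.csv"), client) would stream the tweets for 3 August. Deleted or protected tweets are generally not returned, so the hydrated count can be lower than the number of IDs in the file.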
To get the new confirmed cases for every country on a given date, we used the API at https://corona-api.com/countries/{code}. The API is called with a country's alpha-2 code and returns that country's new confirmed cases by date. From the response we extracted the data for the 9 dates listed above.
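A rough sketch of this lookup is given below. The response layout is an assumption on our part: the code expects the JSON to carry a list under data -> timeline whose entries have "date" (YYYY-MM-DD) and "new_confirmed" keys, so check a live response before relying on it. The function names fetch_country and new_confirmed_on are illustrative only.

```python
import json
from typing import Dict, Iterable, Optional
from urllib.request import urlopen

API_URL = "https://corona-api.com/countries/{code}"


def fetch_country(code: str) -> dict:
    """Download the JSON document for one alpha-2 country code."""
    with urlopen(API_URL.format(code=code), timeout=30) as response:
        return json.load(response)


def new_confirmed_on(payload: dict,
                     dates: Iterable[str]) -> Dict[str, Optional[int]]:
    """Map each requested date to its new confirmed cases.

    Assumed layout: payload["data"]["timeline"] is a list of dicts with
    "date" and "new_confirmed" keys. Dates missing from the timeline
    map to None instead of raising.
    """
    by_date = {entry["date"]: entry["new_confirmed"]
               for entry in payload["data"]["timeline"]}
    return {day: by_date.get(day) for day in dates}
```

For example, new_confirmed_on(fetch_country("IN"), ["2020-08-03", "2020-08-10"]) would return the figures for the first two sampled days, assuming the dates fall in 2020.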
We collected the alpha-2 codes for all countries using the 'pycountry' library, which converts country names to alpha-2 codes and vice versa.
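The conversion can be sketched as below, using pycountry.countries.lookup(), which matches a name or a code and raises LookupError when nothing matches. The function names and the optional registry parameter are hypothetical conveniences for this example (the parameter lets the helpers run against a stand-in object); by default the real pycountry database is used.

```python
from typing import List, Optional


def _default_registry():
    # Imported lazily so the helpers can also run with a stand-in registry.
    import pycountry  # third-party: pip install pycountry
    return pycountry.countries


def name_to_alpha2(name: str, registry=None) -> Optional[str]:
    """Return the alpha-2 code for a country name, or None if unknown."""
    registry = registry if registry is not None else _default_registry()
    try:
        return registry.lookup(name).alpha_2
    except LookupError:
        return None


def alpha2_to_name(code: str, registry=None) -> Optional[str]:
    """Return the country name for an alpha-2 code, or None if unknown."""
    registry = registry if registry is not None else _default_registry()
    try:
        return registry.lookup(code).name
    except LookupError:
        return None


def all_alpha2_codes(registry=None) -> List[str]:
    """Return the alpha-2 code of every country in the registry."""
    registry = registry if registry is not None else _default_registry()
    return [country.alpha_2 for country in registry]
```

Returning None for unknown names keeps a long loop over free-text country names from stopping at the first spelling that pycountry does not recognise.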
We used parseapi (https://parseapi.back4app.com/classes/Country) to collect the names of states and cities of some major countries: USA, Canada, China, India, Australia, Russia, Brazil, Iran, South Africa, Spain, Germany, France, and UK.
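As a hedged sketch of how such a request can be built: the endpoint follows the standard Parse REST conventions, so a query is a GET with the X-Parse-Application-Id and X-Parse-REST-API-Key headers and optional where, limit and skip parameters. The credentials are placeholders you must obtain from Back4App, the function name is made up for this example, and any field used in the where filter (such as "name" below) is an assumption about the schema, as is the question of which classes hold the state and city records.

```python
import json
from typing import Optional
from urllib.parse import urlencode
from urllib.request import Request

BASE_URL = "https://parseapi.back4app.com/classes/Country"


def build_country_request(app_id: str, rest_key: str,
                          where: Optional[dict] = None,
                          limit: int = 100, skip: int = 0) -> Request:
    """Build (but do not send) a Parse REST query for the Country class.

    `where` is a Parse constraint dict, sent JSON-encoded; `limit` and
    `skip` page through the results.
    """
    params = {"limit": limit, "skip": skip}
    if where:
        params["where"] = json.dumps(where)
    return Request(
        f"{BASE_URL}?{urlencode(params)}",
        headers={
            "X-Parse-Application-Id": app_id,    # placeholder credential
            "X-Parse-REST-API-Key": rest_key,    # placeholder credential
        },
    )
```

Passing the returned object to urllib.request.urlopen() sends the query; Parse responses place the matching rows under a "results" key.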