https://paperswithcode.com/area/natural-language-processing
Sentiment Analysis https://paperswithcode.com/task/sentiment-analysis
Named Entity Recognition https://paperswithcode.com/task/named-entity-recognition-ner
Fake News Detection https://paperswithcode.com/task/fake-news-detection
Introduction to Data Science in Python
Course Description: Introduces Python (environment, lambdas, CSV files, the NumPy library) and data manipulation/cleaning with pandas: Series and DataFrame as data structures for data analysis, and functions such as groupby, merge, and pivot tables.
Course Goal: Take tabular data, clean it, manipulate it, and run basic inferential statistical analyses.
Prerequisites: Basic Python or programming background.
Who is this class for: Learners who want to apply statistics, machine learning, information visualization, social network analysis, and text analysis techniques to gain new insight into data. The class is taught in a tutorial format using the pandas library. Only a minimal statistics background is expected, and the first course includes a refresher on these basic concepts. Learners with formal training in Computer Science but without formal training in data science will still find the skills they acquire in these courses valuable in their studies and careers.
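A minimal sketch of the pandas operations named in the course description above (cleaning, merge, groupby, pivot tables). The tables, column names, and values are made up for illustration and are not taken from the course material.

```python
import pandas as pd

# Hypothetical toy tables; names and values are invented for illustration.
orders = pd.DataFrame({
    "customer_id": [1, 2, 1, 3, 2],
    "year": [2020, 2020, 2021, 2021, 2021],
    "amount": [10.0, None, 5.0, 7.5, 2.5],
})
customers = pd.DataFrame({
    "customer_id": [1, 2, 3],
    "region": ["north", "south", "north"],
})

# Cleaning: drop rows where the amount is missing.
clean = orders.dropna(subset=["amount"])

# merge: attach each customer's region to their orders.
merged = clean.merge(customers, on="customer_id", how="left")

# groupby: total amount per region.
totals = merged.groupby("region")["amount"].sum()

# pivot table: regions as rows, years as columns, summed amounts as cells.
pivot = merged.pivot_table(
    index="region", columns="year", values="amount",
    aggfunc="sum", fill_value=0,
)

print(totals)
print(pivot)
```

Dropping incomplete rows is only one cleaning choice; filling missing values with fillna may be more appropriate depending on the data.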
Applied Plotting, Charting & Data Representation in Python
Course Description: Introduces information visualization basics using the matplotlib library, from a design and information literacy perspective: what makes a good or bad visualization, and how statistical measures translate into visualizations. Demonstrates a variety of basic statistical charts, helping learners identify when a particular method suits a particular problem. The course ends with a discussion of other forms of structuring and visualizing data.
Prerequisites: Take after Introduction to Data Science in Python https://www.coursera.org/learn/python-data-analysis?authMode=login and before the remainder of the Applied Data Science with Python courses: Applied Machine Learning in Python, Applied Text Mining in Python, and Applied Social Network Analysis in Python. Requires a basic computer science background; the minimal statistics background needed is already covered in the first course.
Who is this class for: Learners who want to apply statistics, machine learning, information visualization, social network analysis, and text analysis techniques to gain new insight into data, and/or who want a tutorial on the matplotlib library.
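A small sketch of two basic statistical charts in matplotlib, as an example of the kind of chart the course description mentions. The data values are made up, and the choice of a histogram and a box plot is an assumption for illustration, not a list taken from the course.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so this runs without a display
import matplotlib.pyplot as plt

# Made-up measurements with two visible groups.
values = [2.1, 2.4, 2.2, 3.8, 4.0, 3.9, 2.3, 4.1]

fig, (ax_hist, ax_box) = plt.subplots(1, 2, figsize=(8, 3))

# Histogram: shows the shape of the distribution (here, two clusters).
ax_hist.hist(values, bins=4)
ax_hist.set_title("Histogram")
ax_hist.set_xlabel("value")
ax_hist.set_ylabel("count")

# Box plot: summarizes median and quartiles but hides the two clusters.
ax_box.boxplot(values)
ax_box.set_title("Box plot")
ax_box.set_ylabel("value")

fig.tight_layout()

# Render to an in-memory PNG; pass a filename instead to write to disk.
buffer = io.BytesIO()
fig.savefig(buffer, format="png")
```

Plotting the same data both ways illustrates the point about matching the chart to the problem: the histogram reveals the two groups, while the box plot only shows the overall spread.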
Applied Machine Learning in Python
Course Description: Introduces applied machine learning, focusing more on the techniques and methods than on the statistics behind them. Covers how machine learning differs from descriptive statistics and introduces the scikit-learn toolkit. Addresses the dimensionality of data and the task of clustering data, as well as evaluating those clusters. Covers supervised approaches for creating predictive models; learners will be able to apply the scikit-learn predictive modelling methods while understanding process issues related to data generalizability (e.g. cross validation, overfitting). The course ends with a look at more advanced techniques, such as building ensembles, and the practical limitations of predictive models.
Course Goal: To identify the difference between a supervised (classification) and an unsupervised (clustering) technique, identify which technique to apply for a particular data set and need, engineer features to meet that need, and write Python code to carry out an analysis.
Prerequisite: Basic Python or programming background. Only a minimal statistics background is expected. This course should be taken after Introduction to Data Science in Python https://www.coursera.org/learn/python-data-analysis?authMode=login and Applied Plotting, Charting & Data Representation in Python https://www.coursera.org/learn/python-plotting?authMode=login and before Applied Text Mining in Python and Applied Social Network Analysis in Python.
Who is this class for: This course is intended for learners who want to apply statistics, machine learning, information visualization, social network analysis, and text analysis techniques to gain new insight into data.
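A brief sketch of the supervised vs. unsupervised distinction from the course goal above, using scikit-learn. The Iris dataset, logistic regression, k-means, and the silhouette score are choices made here for illustration; the notes above do not say which datasets or models the course uses.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import silhouette_score
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)

# Supervised (classification): the labels y are used for training.
# 5-fold cross-validation estimates how well the model generalizes.
clf = LogisticRegression(max_iter=1000)
cv_scores = cross_val_score(clf, X, y, cv=5)
print("mean CV accuracy:", cv_scores.mean())

# Unsupervised (clustering): the labels are ignored entirely.
# The silhouette score evaluates the clusters without ground truth.
km = KMeans(n_clusters=3, n_init=10, random_state=0)
cluster_labels = km.fit_predict(X)
sil = silhouette_score(X, cluster_labels)
print("silhouette score:", sil)
```

Comparing cross-validated accuracy against training accuracy is one simple way to spot the overfitting issue the course description mentions.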
Applied Social Network Analysis in Python
Course Description: This course introduces network analysis using the NetworkX library.
Prerequisite: Basic Python or programming background. If you are not familiar with Python, take this course after Introduction to Data Science in Python https://www.coursera.org/learn/python-data-analysis?authMode=login
Who is this class for: This course is intended for learners who want to apply statistics, machine learning, information visualization, social network analysis, and text analysis techniques to gain new insight into data.
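A tiny NetworkX sketch of the kind of network analysis this course covers. The graph, the node names, and the choice of degree centrality and shortest paths are assumptions made for illustration.

```python
import networkx as nx

# Hypothetical friendship network; names are invented.
G = nx.Graph()
G.add_edges_from([
    ("ana", "ben"),
    ("ana", "cy"),
    ("ana", "dee"),
    ("ben", "cy"),
    ("dee", "eli"),
])

# Degree centrality: fraction of the other nodes each node is connected to.
degree_centrality = nx.degree_centrality(G)

# Shortest path between two people (fewest hops).
path = nx.shortest_path(G, "ben", "eli")

print(degree_centrality)
print(path)
```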
Analysis and visualization of very large networks using Pajek http://mrvar.fdv.uni-lj.si/pajek/
NodeXL supports social network and content analysis https://www.smrfoundation.org/nodexl/
WoS2Pajek http://vlado.fmf.uni-lj.si/pub/networks/pajek/WoS2Pajek/default.htm
https://www.coursera.org/learn/algorithmic-thinking-1 teaches algorithmic efficiency and considers its application to several problems from graph theory. As the central part of the course, students implement several important graph algorithms in Python and then use these algorithms to analyze two large real-world data sets. The main focus of these tasks is to understand the interaction between the algorithms and the structure of the data sets they analyze.
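As an example of the kind of graph algorithm one might implement in Python, here is a breadth-first search sketch that computes hop distances and connected components. The notes above do not say which algorithms the course assigns, so BFS is an assumption chosen for illustration, and the graph is made up.

```python
from collections import deque


def bfs_distances(graph, start):
    """Return a dict mapping each node reachable from start to its hop distance."""
    distances = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in distances:
                distances[neighbor] = distances[node] + 1
                queue.append(neighbor)
    return distances


def connected_components(graph):
    """Return a list of node sets, one per connected component."""
    seen = set()
    components = []
    for node in graph:
        if node not in seen:
            component = set(bfs_distances(graph, node))
            seen |= component
            components.append(component)
    return components


# Undirected graph as an adjacency dict (made-up example with two components).
graph = {0: {1, 2}, 1: {0, 3}, 2: {0}, 3: {1}, 4: {5}, 5: {4}}

print(bfs_distances(graph, 0))
print(connected_components(graph))
```

Each BFS visits every node and edge of a component once, so finding all components is linear in the size of the graph, which matters when the data sets are large.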
FakeNewsChallenge
https://remicnrd.github.io/Aspect-based-sentiment-analysis/
The above uses CSV files; you can convert XML to CSV using http://blog.appliedinformaticsinc.com/how-to-parse-and-convert-xml-to-csv-using-python/
IMDB Movie reviews (Bag of Words vs Word Embedding) https://www.kaggle.com/c/word2vec-nlp-tutorial
Reuters News (Word Embedding) https://www.kaggle.com/hoonkeng/deep-eda-word-embeddings-sentiment-analysis
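For the XML-to-CSV conversion mentioned above, a minimal standard-library sketch follows. The XML layout (a reviews root with review elements carrying an id attribute plus text and polarity children) is hypothetical; real datasets will have their own tags, and this is not the method from the linked blog post.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical XML layout, invented for illustration.
XML_TEXT = """<reviews>
  <review id="1"><text>Great food, slow service.</text><polarity>mixed</polarity></review>
  <review id="2"><text>Loved it!</text><polarity>positive</polarity></review>
</reviews>"""


def xml_to_csv(xml_text):
    """Flatten each <review> element into one CSV row and return the CSV text."""
    root = ET.fromstring(xml_text)
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["id", "text", "polarity"])
    for review in root.iter("review"):
        writer.writerow([
            review.get("id"),
            review.findtext("text"),
            review.findtext("polarity"),
        ])
    return out.getvalue()


csv_text = xml_to_csv(XML_TEXT)
print(csv_text)
```

Using the csv module rather than joining strings by hand means fields that contain commas, such as the first review text, are quoted correctly.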
Exploratory Data Analysis
Tweets, https://www.kaggle.com/erikbruin/text-mining-the-clinton-and-trump-election-tweets
NYT Comments, Word Clouds, https://www.kaggle.com/aashita/word-clouds-of-various-shapes
News Headlines, ngrams, https://www.kaggle.com/gunnvant/what-india-reads-about-a-visual-essay/notebook
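A short sketch of n-gram counting, the technique named for the news headlines notebook above, using only the standard library. The headlines and the simple regex tokenizer are made up for illustration and are not taken from the linked notebook.

```python
import re
from collections import Counter


def ngrams(text, n):
    """Lowercase the text, split it into word tokens, and return its n-grams as tuples."""
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


# Invented headlines for illustration.
headlines = [
    "Stock market hits record high",
    "Stock market falls after record high",
    "Monsoon hits the coast",
]

# Count bigrams across all headlines.
bigram_counts = Counter(bg for h in headlines for bg in ngrams(h, 2))

print(bigram_counts.most_common(2))
```

The same counts can feed a word cloud or a bar chart of the most frequent phrases during exploratory analysis.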
Pre-processing + Classification
Kaggle Challenges