Progress reports

Week 1: Established a budget, a meeting schedule, and platforms for a shared workspace (Google Drive, GitHub).

Week 2: Shifted focus from Alzheimer's and dementia to depression and mood disorders. A large amount of data about depression is available on the Internet, much of it firsthand accounts from individuals diagnosed with it. We will collect these datasets as training material for supervised learning.

Week 3: Began developing a survey for potential users, set up a meeting with a doctor at the counseling center (CAPS), and researched depression treatments and natural language processing (NLP).

The best approach for the NLP appears to be Word2Vec (semantic analysis based on the contexts in which words appear):
- https://skymind.ai/wiki/word2vec
- https://medium.com/arvind-internet/applying-word2vec-on-our-catalog-data-2d74dfee419d
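
To illustrate what "based on context" means, here is a minimal sketch of how the skip-gram variant of Word2Vec turns a sentence into (target, context) training pairs. The function name `skipgram_pairs`, the window size, and the example sentence are assumptions made for illustration; this is not the project's code, and a real model (for example gensim's) adds subsampling, negative sampling, and the actual network training on top of these pairs.

```python
from typing import List, Tuple


def skipgram_pairs(tokens: List[str], window: int = 2) -> List[Tuple[str, str]]:
    """Return (target, context) pairs for each token and its neighbours
    within `window` positions on either side."""
    pairs = []
    for i, target in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # a word is not its own context
                pairs.append((target, tokens[j]))
    return pairs


if __name__ == "__main__":
    # Hypothetical example sentence, not taken from the collected data.
    sentence = "i feel tired all the time".split()
    for target, context in skipgram_pairs(sentence, window=1):
        print(target, "->", context)
```

Words that tend to share the same context words end up with similar vectors, which is what makes the embeddings useful for comparing how language is used.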

Week 4: Found suitable datasets on both Twitter and Reddit; additional resources and data are pending approval by Dr. Chandramouli.

Awaiting approval on RSDD (Reddit Self-reported Depression Diagnosis) data set access: https://georgetown-ir-lab.github.io/emnlp17-depression/

Week 5: Survey approved by Stevens Head Counselor, Dr. Rose, and published. Began collecting responses.

Week 6: Survey reached over 225 responses, giving us a wealth of data and opinions on professional mental healthcare from potential users

https://docs.google.com/forms/d/e/1FAIpQLSdJ-2Lfln5AIXvmV909RrGGSC53BUK43pk_eLMXipOovC4nWQ/viewform?usp=sf_link

Week 7/8: Researched external resources and the final product's potential competitors. Practiced using the components of the NLP.

Week 9/10: Pulled data from Reddit using the Reddit API and filtered out emojis and other non-ASCII text. Used NLTK part-of-speech (POS) tagging on posts from r/depression, r/SuicideWatch, and other subreddits relating to depression, anxiety, or other mood disorders. Used multidimensional scaling (MDS) to visualize the POS tagging and observe relationships. In the first graph below, we plotted the relationships in language use among the subreddits and compared them to those of "id-depression": posts in which users said they had been diagnosed with depression. In the second graph below, we took all posts (in any subreddit) by each of those users and compared their language use to that of other subreddits.
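
The cleaning and comparison steps can be sketched roughly as follows. This is an illustrative sketch under stated assumptions, not the project's actual code: the function names, the choice of Euclidean distance, and the hard-coded tag lists are all hypothetical. The tags are written out by hand so the example runs without NLTK; in practice they would come from something like `nltk.pos_tag(nltk.word_tokenize(text))`.

```python
import math
import re
from collections import Counter
from typing import Dict, List, Tuple

# Matches runs of anything outside printable ASCII: emojis, accented
# letters, and control characters such as newlines and tabs.
NON_ASCII = re.compile(r"[^\x20-\x7E]+")


def strip_non_ascii(text: str) -> str:
    """Replace non-ASCII runs with a space, then collapse whitespace."""
    return " ".join(NON_ASCII.sub(" ", text).split())


def tag_profile(tags: List[str]) -> Dict[str, float]:
    """Relative frequency of each POS tag in a list of tags."""
    counts = Counter(tags)
    total = sum(counts.values())
    return {tag: n / total for tag, n in counts.items()}


def profile_distance(p: Dict[str, float], q: Dict[str, float]) -> float:
    """Euclidean distance between two tag profiles (assumed metric)."""
    keys = set(p) | set(q)
    return math.sqrt(sum((p.get(k, 0.0) - q.get(k, 0.0)) ** 2 for k in keys))


def distance_matrix(
    profiles: Dict[str, Dict[str, float]]
) -> Tuple[List[str], List[List[float]]]:
    """Pairwise distances between named groups, in sorted name order."""
    names = sorted(profiles)
    matrix = [
        [profile_distance(profiles[a], profiles[b]) for b in names]
        for a in names
    ]
    return names, matrix


if __name__ == "__main__":
    print(strip_non_ascii("so tired \U0001F622 today"))

    # Hypothetical Penn Treebank tags for two groups of posts.
    profiles = {
        "r/depression": tag_profile(["PRP", "VBP", "JJ", "PRP", "RB"]),
        "r/other": tag_profile(["DT", "NN", "VBZ", "JJ", "NN"]),
    }
    names, matrix = distance_matrix(profiles)
    print(names)
    print(matrix)
```

A symmetric distance matrix like this one is the kind of input that an MDS implementation accepting precomputed dissimilarities (such as scikit-learn's `sklearn.manifold.MDS`) can project into two dimensions for plotting.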

Weeks 10-13: Continue to draw conclusions from MDS. Collect more data from Reddit. Train a gensim Word2Vec model to find out which issues we are likely to run into when developing our own word embedding model.

Issues include:

- Spelling mistakes by users
- Negation not taken into account
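
Both issues are commonly handled in preprocessing, before the text reaches the embedding model. The sketch below shows one possible approach, not the one the project necessarily adopted: negation is handled by appending a `_NEG` suffix to every token between a negation word and the next punctuation mark, and spelling is handled by snapping a token to its closest match in a known vocabulary using the standard library's `difflib`. The negation word list, the 0.8 similarity cutoff, and the function names are assumptions chosen for illustration.

```python
import difflib
import re
from typing import List, Sequence

NEGATIONS = {"not", "no", "never", "cannot"}  # assumed, not exhaustive
PUNCTUATION = set(".,!?;:")
TOKEN = re.compile(r"[a-z']+|[.,!?;:]")


def mark_negation(text: str) -> List[str]:
    """Lowercase and tokenize `text`, suffixing tokens that follow a
    negation word with _NEG until the next punctuation mark."""
    marked = []
    negated = False
    for tok in TOKEN.findall(text.lower()):
        if tok in PUNCTUATION:
            negated = False  # punctuation ends the negation scope
            marked.append(tok)
        elif tok in NEGATIONS or tok.endswith("n't"):
            negated = True
            marked.append(tok)
        else:
            marked.append(tok + "_NEG" if negated else tok)
    return marked


def normalize_spelling(token: str, vocabulary: Sequence[str],
                       cutoff: float = 0.8) -> str:
    """Return the closest vocabulary word, or the token unchanged if
    nothing is similar enough."""
    matches = difflib.get_close_matches(token, vocabulary, n=1, cutoff=cutoff)
    return matches[0] if matches else token


if __name__ == "__main__":
    print(mark_negation("I am not happy today. I am fine."))
    print(normalize_spelling("depresion", ["depression", "anxiety"]))
```

With this marking, "happy" and "happy_NEG" become separate vocabulary entries, so the model can learn different vectors for them, at the cost of a larger vocabulary and sparser counts per token.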

Winter intersession: Begin work on front end for iOS and Android devices

Weeks 14-20: Split into individual assignments, as follows:

- Athina: Back-end topic analysis integration and refinement
- Charles: iOS front-end design and implementation
- AJ: Android front-end design and implementation