Project

Over the course of the term, students will work on a term-long project in 3-person groups.

The objective of each project is to leverage rich, high-quality datasets to address open problems in the health domain. Project tasks can include data mining, modeling, prediction, classification, etc., but most importantly, projects should aim to advance the state of the art in the research literature or in practice.

To get started, see strategies and resources for finding a research dataset.

Deliverables

  1. In-class presentations (P1 & P5)

  2. A written report (this can be a publishable paper written for submission to a fitting venue or a report written strictly for this course). In either case, the paper should be written using a target venue's paper template and should follow the guidelines provided by a relevant journal/conference. See the guidelines below.

  3. A project website to document each project and progress. This website will serve as a final portfolio for the work that is done throughout the term. Some examples from previous terms are linked below:

Guidelines for Final Paper

Every project group is expected to write a final paper to share their research results and findings. There will be writing milestones due throughout the term (see schedule) to keep teams on task with writing. The primary publication venue is:

However, alternative venues can be selected based on the team project and preference. Other example venues are:

Important Notes:

  • Manuscript length: ~10 - 12 pages, not including references. This guideline is for teams writing for ACM HEALTH, for which no page limit is given.

  • Organization: Every manuscript must follow instructions provided by the selected publication venue. An example of submission guidelines for ACM HEALTH can be found here.

  • Template: All papers should use the appropriate template provided by the selected publication venue. An example of such a template for ACM HEALTH can be found here.

  • Reference Papers: It is always a good idea to have a few example papers from the selected publication venue to use as a reference while writing your own paper. Some example reference papers for ACM HEALTH can be found here.

  • LaTeX: All final papers should be written using LaTeX. Each project team should use Overleaf, an online, collaborative LaTeX editor.

Project Milestones

There will be several milestones to track progress of the project throughout the term:

  • P1 (5%): Open Problems in ConditionX (week 2)

  • P2 (6%): Exploratory & Initial Analysis (~week 4)

  • P3 (9%): Introduction & Related Work sections (~week 5)

  • P4 (10%): Method & Initial Results (~week 7)

  • P5 (30%): Final Presentation & Final Paper (week 9)

P1 (5%) - Open Problems in ConditionX (in-class presentations, 12 min)

These presentations will be during the class period on Tuesday (4/5) and Thursday (4/7). Below is a guideline:

  1. Describe the background and underlying mechanism behind a certain health condition or group of conditions

  2. Describe the state-of-the-art for care/treatment/management etc.

  3. Identify 3 - 4 open problems that data science solutions can address

  4. Provide solid rationale and literature support for each problem identified

P2 (6%) - Exploratory & Initial Analysis

Exploratory data analysis (EDA) is a critical and often neglected step in data analysis. In this assignment, students should conduct EDA suited to their dataset, with the primary goal of understanding the dataset fully and identifying the types of research questions that it can answer. Some guidelines on conducting EDA can be found on the resources page.
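As a starting point before plotting, a first EDA pass often checks missingness and basic summary statistics per column. The sketch below is a minimal illustration under stated assumptions: the records, the column names (age, systolic_bp, sex), and the summarize helper are all hypothetical and stand in for your own dataset and code. It uses only the Python standard library; in practice a dataframe and plotting library would likely be more convenient.

```python
import statistics

# Hypothetical toy records standing in for a real health dataset;
# None marks a missing value.
records = [
    {"age": 54, "systolic_bp": 132, "sex": "F"},
    {"age": 61, "systolic_bp": None, "sex": "M"},
    {"age": 47, "systolic_bp": 118, "sex": "F"},
    {"age": None, "systolic_bp": 141, "sex": "M"},
]


def summarize(rows):
    """Return per-column missing counts plus basic descriptive statistics."""
    summary = {}
    columns = sorted({key for row in rows for key in row})
    for col in columns:
        values = [row.get(col) for row in rows]
        present = [v for v in values if v is not None]
        info = {"missing": len(values) - len(present)}
        if present and all(isinstance(v, (int, float)) for v in present):
            # Numeric column: report location and range.
            info.update(
                mean=statistics.mean(present),
                median=statistics.median(present),
                min=min(present),
                max=max(present),
            )
        else:
            # Categorical column: count how often each level occurs.
            counts = {}
            for v in present:
                counts[v] = counts.get(v, 0) + 1
            info["counts"] = counts
        summary[col] = info
    return summary


if __name__ == "__main__":
    for column, info in summarize(records).items():
        print(column, info)
```

A summary like this helps decide which columns are complete enough to plot and which research questions the dataset can realistically support.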

Assignment Requirements:

  1. Write clear and clean code (in Python using Jupyter Notebook/Google Colab) with appropriate comments and section titles to produce 5 - 10 descriptive figures exploring various dimensions of your research dataset.

  2. Create your project website using a freely available service (e.g., Google Sites) that includes:

    • Title

    • Group Members

    • Objective (What is the goal of this project?)

    • Innovation (1-paragraph description of why this work is innovative; you must support this with citations/references to related work in the literature)

    • Data Description (1-paragraph description of the dataset and its important features)

    • Exploratory Analysis (embed the written code from #1 here, either as a .pdf or directly on the website)

    • References (using an appropriate citation format)

  3. Submit a link to your project website via Canvas.

    • Ensure that your website is publicly accessible through the link submitted, especially if you use Google Sites.

Need some inspiration?

See examples from previous terms below:

P3 (9%) - Introduction & Related Work

This assignment is the first milestone toward your written report/paper. Regardless of whether your team plans for a publishable paper or a class report, this paper should be written using the appropriate template for a journal/conference in the space. See the Guidelines for Final Paper for additional instructions.

Assignment Requirements

Write the Introduction & Related Work sections (~2 pages) of your written report/paper.

Guidelines on items to address are below:

  1. Why is the problem space important?

  2. What specific gap exists in the space?

  3. Describe related work in the space (~5 to 8 other papers that attempted to address the identified problem or similar problems in the space).

  4. What is your own research objective(s)?

  5. Provide a brief description (1 - 2 sentences) of how you plan to accomplish the stated objectives

  6. What are the key contributions of your work?

Note:

  • Numbers 1 - 3 above must be supported with references.

  • If writing a research paper is new to everyone in your project group, please reach out to the teaching team for additional guidance. We would be glad to help!

P4 (13%): Method & Initial Results

In this assignment, you will continue implementing data science methods toward your project objective/goal. Then you will write the methods and results sections of your paper (continuing with the template you used in P3).

Assignment Requirements:

  1. Write clear and clean code (in Python using Jupyter Notebook/Google Colab) with appropriate comments and section titles to implement methods toward your project objective/goal.

  2. Embed the written code and result figures from #1 on your project website, either as a .pdf or directly on the website.

  3. Write the methods and results sections of your project report/paper. See guidelines for writing each section below.

  4. Submit a .pdf of your paper on Canvas.


Guidelines to Consider for the Methods section:

  • Start with a data description subsection. Describe key attributes of your dataset that are important for understanding your methods and results.

  • Break down your method into smaller components and describe each component in its own subsection. For example, data cleaning -> feature extraction -> feature selection -> classification.

  • Consider creating a flowchart of your full methodology and approach for analysis, starting from the raw data to the output. Include such a flowchart in your methods section.

  • Make sure to describe (albeit briefly) each out-of-the-box method used. For example, if you use lasso regression, don’t simply assume the readers know what this is; start by describing lasso regression at a high level and/or with equations, then cite references where more details can be found.

  • Look at examples from other papers we have read in class or from student papers in DS4H last year (see the deliverables above).

  • Make sure you are writing in the format of a publishable paper and not a class project report. The language and style of these two are quite different.
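To illustrate the level of detail suggested for an out-of-the-box method, a brief lasso description could state the objective as sketched below. The notation is an assumption made for illustration: n observations, response y_i, feature vector x_i with p entries, intercept \beta_0, coefficients \beta, and a tuning parameter \lambda \ge 0. The 1/(2n) scaling is one common convention; others appear in the literature.

```latex
\hat{\beta}_0, \hat{\beta}
  = \arg\min_{\beta_0,\, \beta}
    \left\{
      \frac{1}{2n} \sum_{i=1}^{n} \left( y_i - \beta_0 - x_i^{\top} \beta \right)^2
      + \lambda \sum_{j=1}^{p} \lvert \beta_j \rvert
    \right\}
```

The L1 penalty can shrink some coefficients exactly to zero, which is why lasso is often used for feature selection as well as prediction; \lambda is typically chosen by cross-validation. A sentence or two like this, followed by a citation, is usually enough.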


Guidelines to Consider for the Results section:

  • Start with key result(s) or finding(s) from your analysis then move into the less significant results.

  • Summarize the full results using 1 or 2 tables and/or figures. Ensure these are legible, e.g. legible axis labels, etc.

  • Make sure you have text/paragraphs dedicated to describing what should be seen in, or the take-away from, your tables/figures.

  • Before writing, look at examples from other papers and/or student papers in DS4H last year (see the deliverables above).

P5: Final Presentation (10%)

The final presentations will be on Tuesday (5/24) and Thursday (5/26) during our regular class time. All presentations should be 12 min long with 3 min for Q&A.

The presentation should include the following:

  1. Title, Authors/Presenters

  2. Motivation/Background (why should the audience care?)

  3. Research objective (what is the specific goal of the work? why is it important?)

  4. Data Description/Summary (use text and visuals - this is a good place for some of your exploratory analysis)

  5. Methodology (think flowchart if there are multiple steps, give grounded rationale for the approach taken)

  6. Results (key findings/takeaways)

  7. Limitations/Challenges (including how you would address these in future work)

  8. Top learnings from the project experience

  9. References (on appropriate slides)

Things to consider:

  • The grading scale is as follows: A+ (100%), A- (94%), B+ (88%), B- (82%), C+ (76%), C- (70%), Less than C-.

  • You are the authors of the paper, the researchers behind the work, the experts on the topic. Make sure to present accordingly.

  • You are not graded based on whether you achieved good/bad results. Instead, you are graded on the soundness of your approach, knowledge of the space, and ability to communicate the work.

  • Use good presentation practice (for example: slides should not be too busy with text and/or visuals, figures should be legible with clear axis labels and legends, etc.)

  • There is a strict time limit. It's a good idea to practice your talk beforehand.

  • Have fun!!! If you're not enthused talking about your own work, then chances are the audience is not enthused listening.

  • Presentation order is TBD.

P5: Final Paper (20%)

The final paper is due on Friday (5/27). This should be a fully polished version that includes revisions based on comments received in prior submissions and other improvements that your team has identified.

The final paper should include sections for introduction, related work, methods, and results per P3 & P4. In addition, this version should have a newly added discussion section. The discussion section should include implications of your results/findings, comparison with results from related studies, limitations of the work presented, and directions for future research.

As you finalize your paper, be sure to leverage examples such as this one, which is accessible from the project website of students from a prior term (see the Deliverables section above).

Additional Requirements:

  • Update your project website to include the final version of your written code (which must be well organized and commented) and your final paper.

  • Submit a link to your project website as a comment on Canvas.

  • Submit a .pdf of your final paper on Canvas.