Project guidelines

Final projects will entail original investigation into any area of computer vision defined very broadly, or a focused literature review in a topic from such an area. That means that machine learning over visual data, HCI, computational photography, computer graphics, language-vision interfaces, computer vision applied to domains such as medical images, and so on, are all acceptable topics in addition to the core computer vision topics.

Scope

As a broad target, the final project should involve approximately as much work as two mini-project assignments for each student in the group. This year, groups are limited to at most two students. The total work should scale roughly linearly with the group size and be distributed roughly equally among the members. Similarly, multi-purpose projects that are submitted for multiple classes should scale with the number of classes involved. An ambitious, well-done project from a group of two (or shared between two or more classes) should be on the order of a conference paper in depth of experimentation. I encourage you to tackle large problems in groups, for multiple classes, or both.

Note that in order for a project to span multiple classes you need prior approval from all instructors.

Milestones

  • Oct 10: Abstract due

  • Dec 1: Project presentation (10am - noon ET) [New]

  • Dec 7: Final reports due

The abstract is just a short paragraph or two telling me who is in your group, describing the problem you've chosen, sketching the general approach you intend to take, and stating the kinds of data you're using. If you haven't already spoken to me about project ideas, you may want to stop by my office hours or make an appointment before this point. The abstract mainly gives me a chance to make sure that you're on a good path and to see who is doing what. Abstracts must be uploaded to Gradescope as a single PDF file. One submission per team is sufficient.

Towards the end of the class each team will make a short presentation or a poster describing their intermediate results. An important skill in research is being able to tell, a week or two in advance, whether your ideas are going to work, well before you've finished all the engineering and experiments.

The final write-up should be on the order of 6-8 pages, describing your approach, results, data analysis, and so on. The initial abstract is a required checkpoint, but you will receive the bulk of the points at the end, based on your final write-ups. Take a look at the detailed rubric below.

Under normal circumstances, all group members will receive the same grade for the final project. Late days will not apply to the final reports. I have to get your grades in to the university, and I'm already giving you as long as I possibly can.

Grading (23% Total)

  • Abstract: 2%

  • Final report: 18%

    • write-up: 6%

      • clarity, structure, language, references: 2%

      • background literature survey, good understanding of the problem: 2%

      • good insights and discussions of methodology, analysis, results, etc.: 2%

    • technical: 7%

      • correctness: 3%

      • depth: 2%

      • innovation: 2%

    • evaluation and results: 5%

      • sound evaluation metric: 2%

      • thoroughness in analysis and experimentation: 2%

      • results and performance: 1%

  • Poster/Presentation: 3%

Ideas

You are welcome to come up with your own topics -- some of you may already have done so. Take a look at the resources listed at the end of this page for potential topics. You are also welcome to come by my office hours to get ideas from me.

Project resources

Some ideas:

  • Organizing personal photo collections. Think of all the photos you take on your mobile phone. What is a useful way of browsing and searching such a collection?

  • Better field guides to categorize animals and plants using computer vision. Here is one for identifying tree species: http://leafsnap.com. Take a look at the Fine-grained Visual Recognition workshops from recent years (http://fgvc.org/).

  • Detecting interesting events in ego-centric cameras, e.g., GoPro. How can you tell when something interesting happens in the video stream?

  • Analyzing architecture – what cities are similar to Chicago in terms of the style of buildings?

  • Analyzing 3D dataset collections – how can you retrieve a 3D model from a computer graphics database using a photo? There are many 3D models available for download at https://3dwarehouse.sketchup.com. You might want to focus on a sub-category, say, airplanes.

  • List of computer vision datasets: http://www.cvpapers.com/datasets.html.

  • A list of project ideas from Serge Belongie: http://cseweb.ucsd.edu/classes/wi06/cse190a/projects.html

  • There are a number of computer vision startups with a wide range of applications. These include sports replays, such as the “Goal-line” technology used this year in the FIFA World Cup, medical applications, robotics, industrial inspection, etc. David Lowe maintains a (somewhat outdated) list of computer vision applications in industry: http://www.cs.ubc.ca/~lowe/vision.html

  • https://paperswithcode.com/ shows you the state of the art across various datasets among other things.

  • https://registry.opendata.aws/ contains a large number of publicly available datasets hosted on AWS. These include satellite imagery, RADAR and other data on which you can try out some computer vision techniques.

  • Explore the use of computer vision services on the cloud to solve some challenging problems. Some choices are https://aws.amazon.com/rekognition/, https://cloud.google.com/vision, https://azure.microsoft.com/en-us/services/cognitive-services/computer-vision/

  • Latest papers from CVPR, ECCV, ICCV, NeurIPS, ICML, etc.

A sample of projects from a prior course offering:

  • Scene text recognition

  • Improving object detection using depth estimation

  • Dust removal from images

  • Fast face-retrieval using vocabulary trees on deep features

  • Hyperspectral image classification

  • Character recognition in movies

  • Cloud motion analysis

  • Analysis of medical images

  • Stereo reconstruction survey

  • Counting heads in images

  • Implementation of a VR engine

  • Poselet-based person identification

  • Gaze tracker

  • Photo stitching across seasons/day-night

  • Segmentation using CNNs

  • 3D Sketching

Computing Resources

Some vision projects may involve large-scale data and require GPU computing resources. We recommend checking out "AWS Education" and "Google Cloud Platform".

  • AWS: https://aws.amazon.com/education/awseducate/

    • UMass is an "AWS member institution", so you are in the higher allowance tier. Use your .edu email and the full school name "University of Massachusetts Amherst" when you register to get the full benefits (a total of $100 annually).

    • To get GPUs, use g3 (up to 4 NVIDIA Tesla M60 GPUs) or p2 (up to 16 NVIDIA K80 GPUs) instances in EC2. Check the pricing first and make your plan accordingly!

  • Google Cloud Platform: https://cloud.google.com/

    • You get $300 in credits for the first 12 months, and their free-tier resources (not including GPUs) are always free.
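When planning around the credits mentioned above, it can help to estimate how many GPU hours a given budget buys before you launch anything. The following is only a rough sketch: the function name `affordable_gpu_hours` is made up for illustration, and the hourly rate is a placeholder assumption, not an actual AWS or Google Cloud price. Look up the current price of the instance type you intend to use and substitute it.

```python
def affordable_gpu_hours(credit_usd: float, hourly_rate_usd: float) -> float:
    """Return how many instance-hours a credit balance covers at a given hourly rate."""
    if hourly_rate_usd <= 0:
        raise ValueError("hourly_rate_usd must be positive")
    return credit_usd / hourly_rate_usd


# Credit amount taken from the AWS note above ($100 annually).
AWS_CREDIT_USD = 100.0

# ASSUMPTION: placeholder rate for illustration only, NOT a real price.
# Replace it with the current on-demand price of your chosen instance type.
ASSUMED_HOURLY_RATE_USD = 1.00

hours = affordable_gpu_hours(AWS_CREDIT_USD, ASSUMED_HOURLY_RATE_USD)
print(f"Budget covers about {hours:.0f} instance-hours "
      f"at ${ASSUMED_HOURLY_RATE_USD:.2f}/hour")
```

Note that this counts only instance time; storage and data transfer may be billed separately, so leave some margin in your plan.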

Final Presentation Guidelines

Towards the end of the semester we will have a presentation from each group. Details TBD.