Assignment 02 - Motion Cued Face Detection, Due April 13 (originally due April 10)

See the Addendum Section below for updates

Overview

The purpose of this assignment is to test the value of masks derived from motion detection in video as a source of constraint when performing face detection in video. The data for this assignment consists of 32 videos from the Point-and-Shoot Face Recognition Challenge (PaSC). You will be provided links to this data within the CS Department. You are responsible for limiting access to this data while you work with it. Under no circumstances may you redistribute the PaSC video data.

In the course of carrying out this assignment you will gain experience in a variety of areas:

  • Working with video data.
  • Running the cascade face detector from OpenCV. (done in Homework 2)
  • Restricting attention to regions of interest.
  • Carrying out quantitative empirical analysis of detector performance.
  • Using "ground truth" data as a critical component in evaluation, and also properly recognizing the limits of machine-generated "ground truth".
  • Summarizing results of comparative experiments.
  • Opportunity to explore ways to improve face detection.

The driving question behind this assignment is whether face detection can be improved by taking advantage of a moving foreground versus stable background extraction process. In principle, face detection should be more reliable when false positives outside of the regions where a person is moving are filtered out (or never generated to begin with). However, the foreground extraction algorithm itself is not perfect, and whether limiting the choices available to the cascade classifier in OpenCV actually helps is by no means a certainty. Hence, the outcome of your tests is not a foregone conclusion. Further, as part of this assignment, you are encouraged to explore your own ideas that might lead to better face detection.

What You are Being Given

In this assignment you are being given 32 video files, two videos for each of 16 people. This is a relatively small portion of the PaSC handheld video. For each frame in each video, you are being provided a binary mask that highlights where a video stabilization and foreground extraction process has attempted to identify the moving person.

The video and mask data is only readable on our unix machines by accounts in the computer vision group. All students in this class are now members of this unix group. Also, as we have discussed in class, this data is provided to us under license and by using it you are taking responsibility for seeing to it that access to this data is restricted to members of the vision group. Under no circumstances may you repost or redistribute this data. The path to the data is: ~vision/data/cs510spring2015/assign02

You are also being provided a file containing face detection information obtained using an algorithm available to NIST: the PittPatt SDK 5.2.2 algorithm. Detections adjusted for stabilization are included in the second file.

Note, this is a subset of the larger file pasc_video_pittpatt_detections.csv.

Your Tasks

Here the major portions of this assignment are broken out into tasks and sub-tasks:

Task 0: Infrastructure

You will need to build code to run the cascade face detector over the videos and automatically score the result. For the sake of this assignment, the face detections found by the PittPatt algorithm will be treated as "ground truth". There are of course weaknesses in this assumption, but the alternative of hand labeling all video frames is outside the scope of this assignment.

On each frame of video for which PittPatt has detected a face, there are two distinct things to check.

  1. Does the cascade detector find the face? It is best to use a fifty percent overlap rule.
  2. Does the cascade detector generate false positives, and if so, how many?
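
As one way to make the two checks above concrete, here is a minimal sketch in Python. It assumes boxes are (x, y, width, height) tuples and interprets the "fifty percent overlap rule" as intersection over union of at least 0.5. That interpretation, and the names overlap_ratio and score_frame, are illustrative assumptions and not part of the assignment.

```python
def overlap_ratio(box_a, box_b):
    """Intersection over union of two (x, y, w, h) boxes (assumed box format)."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    # Width and height of the intersection rectangle, clamped at zero.
    ix = max(0, min(ax + aw, bx + bw) - max(ax, bx))
    iy = max(0, min(ay + ah, by + bh) - max(ay, by))
    inter = ix * iy
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def score_frame(truth_box, detections, threshold=0.5):
    """Return (face_found, false_positive_count) for one frame.

    A detection counts as a hit when it overlaps the ground-truth box by at
    least `threshold`. Every detection that is not a hit is counted as a false
    positive. Extra hits on the same face are not penalized here; whether they
    should be is a design decision left open.
    """
    hits = [d for d in detections if overlap_ratio(truth_box, d) >= threshold]
    face_found = len(hits) > 0
    false_positives = len(detections) - len(hits)
    return face_found, false_positives
```

For example, score_frame((0, 0, 10, 10), [(1, 1, 10, 10), (50, 50, 10, 10)]) reports that the face was found and that one detection was a false positive.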

You will need to return a pair of numbers for any given experiment with the face detector:

  1. The false negative rate.
  2. The false positive rate.

Also track the absolute numbers underlying these rates. In other words, the actual number of false negatives and false positives.
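
The bookkeeping for the rates and absolute counts might be sketched as follows. The assignment does not fix the denominators, so this sketch assumes one reasonable choice: the false negative rate is the fraction of ground-truth frames in which the face was missed, and the false positive rate is reported as false positives per ground-truth frame. The function name summarize and the (face_found, false_positive_count) input format are hypothetical.

```python
def summarize(frame_scores):
    """Aggregate per-frame scores into counts and rates.

    frame_scores: iterable of (face_found, false_positive_count) pairs, one
    per frame for which ground truth exists (an assumed input format).
    """
    frames = 0
    false_negatives = 0
    false_positives = 0
    for face_found, fp_count in frame_scores:
        frames += 1
        if not face_found:
            false_negatives += 1
        false_positives += fp_count
    if frames == 0:
        raise ValueError("no scored frames")
    return {
        "frames": frames,
        "false_negatives": false_negatives,
        "false_positives": false_positives,
        # Assumed definitions; other denominators are defensible.
        "false_negative_rate": false_negatives / frames,
        "false_positives_per_frame": false_positives / frames,
    }
```

Whatever definitions you settle on, state them explicitly in your report so that the rates can be interpreted.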

You are strongly encouraged to log the results of a given experiment in a single comma-separated values file (.csv format) in order to make subsequent analysis using either a spreadsheet program or R easier.
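
A minimal sketch of such logging, using Python's standard csv module, is given here. The column names in FIELDS and the function name log_result are illustrative assumptions. Choose whatever columns suit your experiments.

```python
import csv
import os

# Hypothetical column layout: one row per experiment.
FIELDS = ["experiment", "frames", "false_negatives", "false_positives",
          "false_negative_rate", "false_positives_per_frame"]


def log_result(path, row):
    """Append one experiment's results to a CSV file, writing the header once."""
    write_header = not os.path.exists(path) or os.path.getsize(path) == 0
    with open(path, "a", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=FIELDS)
        if write_header:
            writer.writeheader()
        writer.writerow(row)
```

Appending to a single file keeps all experiments in one table, which loads directly into a spreadsheet or into R.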

Task 1: Establish a Baseline

Design a set of experiments to test different configurations of the cascade classifier and record performance using the tools you developed in Task 0. At a minimum, you must explore 4 different configurations of the face detection algorithm. Of course you may explore more, and the goal is to determine a configuration that delivers good performance, better than a set of alternatives, on the videos provided.
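
One way to organize the configurations is as a small parameter grid, sketched below. The sketch assumes the detector is an OpenCV cv2.CascadeClassifier, whose detectMultiScale method accepts scaleFactor, minNeighbors, and minSize keyword arguments. The specific parameter values and the helper names make_configurations and detect_faces are illustrative only, not recommended settings.

```python
from itertools import product


def make_configurations(scale_factors, min_neighbors_values, min_sizes):
    """Build one keyword-argument dict per combination of parameter values."""
    return [
        {"scaleFactor": sf, "minNeighbors": mn, "minSize": ms}
        for sf, mn, ms in product(scale_factors, min_neighbors_values, min_sizes)
    ]


def detect_faces(cascade, gray_frame, config):
    """Run the detector on one grayscale frame with the given configuration.

    `cascade` is expected to be a cv2.CascadeClassifier already loaded with a
    face cascade file; it is passed in so this sketch does not assume where
    that file lives on your system.
    """
    boxes = cascade.detectMultiScale(gray_frame, **config)
    # Normalize to a list of plain (x, y, w, h) integer tuples.
    return [tuple(int(v) for v in box) for box in boxes]


# Example grid with four configurations (values chosen only for illustration).
CONFIGS = make_configurations([1.1, 1.3], [3, 5], [(30, 30)])
```

Running every video under each entry of CONFIGS and logging one row per configuration gives a table from which the baseline can be chosen.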

Task 2: Use Motion Cues

Modify the standard face detection algorithm to consider only faces consistent with where the foreground masks suggest a person is present. How exactly you choose to do this is up to you. There are a variety of alternative ways to use the masks as a constraint, and you need to think through the approach that makes the most sense to you. We will discuss this in class and you may want to run alternatives by the instructor.

In developing your approach to using the foreground masks, you should consider at a minimum four alternative configurations of your new approach. Then, in a manner similar to that used in Task 1, conduct experiments to determine the best configuration relative to the alternatives considered.
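
As a starting point for discussion, one simple way to use a mask as a constraint is to discard detections whose bounding box contains too little foreground. The sketch below assumes the mask is a single-channel NumPy array in which nonzero pixels mark foreground and that boxes are (x, y, width, height) tuples. The function names and the min_fraction parameter are hypothetical, and this is only one of many possible approaches, not the required one.

```python
import numpy as np


def foreground_fraction(mask, box):
    """Fraction of pixels inside `box` (clipped to the image) that are foreground."""
    x, y, w, h = box
    rows, cols = mask.shape[:2]
    # Clip the box to the mask bounds before slicing.
    x0, y0 = max(x, 0), max(y, 0)
    x1, y1 = min(x + w, cols), min(y + h, rows)
    if x1 <= x0 or y1 <= y0:
        return 0.0
    region = mask[y0:y1, x0:x1]
    return float(np.count_nonzero(region)) / region.size


def filter_by_mask(detections, mask, min_fraction=0.5):
    """Keep only detections whose box is at least `min_fraction` foreground."""
    return [box for box in detections
            if foreground_fraction(mask, box) >= min_fraction]
```

Varying min_fraction is one natural source of alternative configurations. Masking the frame before detection, rather than filtering afterwards, is a different approach worth comparing.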

Task 3: Write it Up

Empirical research in computer science is about writing code and running code. However, the product is seldom the code, but instead what you can teach others. In that spirit, the principal product of this assignment will be a report. Your report will clearly present what you have done in carrying out Tasks 1 and 2 above. Note that Task 0 does not explicitly come up in your report.

In terms of pure mechanics of writing, you are being explicitly asked to learn to use LaTeX for this assignment. Discussing writing support among computer scientists can quickly become like discussing "the best editor". However, all arguments aside, LaTeX is used extensively for research publications and being familiar with it constitutes an important research skill. To get you started, here is a simple LaTeX Template.

In no particular order, here is a series of recommendations to keep in mind when writing your report. You must include figures to illustrate key points and summarize quantitative comparisons. You must have a bibliography and also a related work section. This assignment is drawing upon two lines of published research: face detection and motion segmentation. You are expected to ask the instructor if you feel you need help in refining the scope of the related work section. You must have both an introduction and a conclusion. You may directly quote from other work if you do so sparingly and because it adds value. Quoting directly from cited works without clearly delineating the material as being quoted is called plagiarism. Generally, writing clear paragraphs is better than relying upon lists. Review your writing carefully and remember that revision is key to good writing. First drafts are not meant to be read by others.

Submission and Grading

You will email the instructor a single PDF file containing your report. The file name will begin with "assignment02" followed by an underscore followed by your last name. Grading will be based upon reading of the report and also on a one-on-one code walk through with the instructor.

Addendum

PittPatt Face Coordinates (3/27/15)
Note that you are actually provided two videos, the pre-stabilization mp4 video and the post-stabilization avi file. The motion masks are registered to the stabilized video, while the PittPatt face detection coordinates are relative to the original unstabilized video. The PittPatt detections corrected for the stabilized videos are now available.