Project 2: Template Tracking Video Stabilization - Due Friday, April 2nd (extended from the original Friday, March 26th; see the Addenda below).

Jet landing at Nagoya Airport, Japan

See the Addenda at the bottom of the page (3/23/2010)

Don't forget the student note page with tips on doing the project

What

In this project you will write code that stabilizes a video sequence so that the primary object of interest, an airplane coming in for a landing, stays in the exact center of the image. The video sequence, "Continental Airlines Boeing 737-800 Landing at Nagoya", is being used with permission of the author. You are being provided a local version consisting of the first 24 seconds converted to 720 individual image frames. While this is very wasteful in terms of disk space, it will save you time insofar as you need not develop code for reading and writing video files.

Your program will take as input the video frames and a bounding box specification of the 'tracking target' in the first video frame. In other words, a box just large enough to encompass the airplane in the first image frame. Your program will write out new images which are stabilized versions in which the airplane always appears in the center of the image. Therefore, while in the current video the airplane jumps around a bit relative to the center of the image, in your new version it will remain exactly centered - or as nearly centered as your code is able.

Most of you will want to work with the PPM images unless you have already worked out a good relationship with a different, more compact image file format API. The full set of 720 ppm images takes up nearly 2 GB of disk space, so if you are working on the CS Unix Cluster please do NOT make your own copy. You can reach the ppm images directly from ~cs510/data/ and you will not need your own copies.

For those of you working on your own machines, here is a link to the entire ppm sequence in a gzipped tar bundle: nagoyaFrames.tar.gz (397 MB)

For students wanting the images in PNG or movie format, here are additional links: nagoyaFramesPNG.zip (390 MB), nagoyaLanding24sec.mp4 (12.9 MB) and nagoyaLanding24sec.mov (6.2 MB)

Why

There are several reasons why this assignment is a useful learning exercise. First and foremost, template matching is one of the most fundamental tricks in image interpretation, and while limited and primitive in many ways, it is also extremely easy to implement relative to many more complex methods, and should always be considered as a first option for tasks where it may be sufficient. Further, as you will discover, these first 24 seconds of video are sufficient for you to observe template matching both succeeding wonderfully - in the early frames - and starting to experience problems as the airplane grows larger and changes its 3D angle relative to the camera. Second, video stabilization is an important area of image processing, and while it is often thought of as a low-level operation not concerned with semantic content, i.e. the specific objects present, this specific example was chosen precisely because it forces one to think about how stabilization as a generic task relates to real-world objectives such as 'keep the plane in the center of the image'.

How

Most of you will tackle this problem in the following manner. If you want to depart from this approach and attempt some variant, that is fine so long as you first review your alternate plan with the instructor.

The essential operation within your code will be to compute the cross-correlation between the target and each new frame of video, and the target will simply be the portion of the first image frame that contains the airplane. Note that the bounding box specifying this part of the image is an input to your program, and that is how you determine the exact pixels that constitute the template. In the first frame of video, determine the translation, delta pixels horizontal and delta pixels vertical, between the exact center of the template and the exact center of the image. Use these values to create a new output version of the first video frame in which the center of the template now appears at the exact center of the image. Keep the width and height of this new image the same as the original, and fill pixels in the new image that lack a corresponding pixel in the original with a neutral value - black is fine. Note this step is an initialization step and does not require any template matching.
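To make the geometry concrete, here is a minimal sketch of this centering step in Python, assuming NumPy and frames loaded as arrays; the function name and the (row, col, height, width) box convention are illustrative choices, not requirements:

    import numpy as np

    def center_on_template(frame, box):
        # Shift frame so the center of the bounding box lands at the
        # image center; pixels with no source pixel are filled with
        # black (0). box is (row, col, height, width) of the target.
        h, w = frame.shape[:2]
        r, c, bh, bw = box
        # Translation from the template center to the image center.
        dy = h // 2 - (r + bh // 2)
        dx = w // 2 - (c + bw // 2)
        out = np.zeros_like(frame)
        # Copy only the region that stays inside both images.
        r0, r1 = max(0, -dy), min(h, h - dy)
        c0, c1 = max(0, -dx), min(w, w - dx)
        out[r0 + dy:r1 + dy, c0 + dx:c1 + dx] = frame[r0:r1, c0:c1]
        return out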

Now proceed to the next frame of video and use cross-correlation to find the center of the template in this new image. In your first implementation, compute the cross-correlation by brute force in the spatial domain. Now produce a version of this image where the center of the template appears at the center of the image and write this image to disk. Repeat this process until you have processed all 720 frames of video. Keep this exact mode of operation as one option for your system. In other words, you are going to start making your system more efficient and more robust, but each major extension which follows should be invoked optionally through additional command line arguments to your system.
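As an illustration, a brute-force spatial matcher might look like the following sketch (again assuming NumPy and grayscale arrays). The raw cross-correlation score used here is the simplest choice, but note that it is biased toward bright regions of the image, so a mean-subtracted or normalized score may prove more reliable:

    import numpy as np

    def brute_force_match(frame, template):
        # Slide the template over every position in the frame and
        # return the top-left (row, col) of the best-scoring window.
        # Cost is O(frame area x template area), so expect it to be
        # slow on full 1280x720 frames.
        fh, fw = frame.shape
        th, tw = template.shape
        best_score, best_pos = -np.inf, (0, 0)
        for r in range(fh - th + 1):
            for c in range(fw - tw + 1):
                window = frame[r:r + th, c:c + tw]
                score = np.sum(window * template)  # raw cross-correlation
                if score > best_score:
                    best_score, best_pos = score, (r, c)
        return best_pos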

The first extension to the system described above is to replace the brute force cross-correlation step with a procedure that computes the cross-correlation in the frequency domain using the Fast Fourier Transform. Keep in mind as you do this that the FFT has a constraint regarding the size of images - typically the width and height must be a power of two. Also keep in mind that you will likely need to fill out the template image, either explicitly or conceptually, to match the dimensions of the video image.
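One possible sketch of the frequency-domain version follows; NumPy's fft2 actually accepts arbitrary sizes, but padding to powers of two matches the constraint just described and keeps the transform fast. Zero-padding both images to a common size handles the template fill-out conceptually:

    import numpy as np

    def fft_match(frame, template):
        # Cross-correlation via the FFT: zero-pad both images to a
        # common power-of-two size, multiply the frame spectrum by the
        # conjugate of the template spectrum, and invert.
        fh, fw = frame.shape
        th, tw = template.shape
        H = 1 << (fh - 1).bit_length()  # next power of two >= fh
        W = 1 << (fw - 1).bit_length()
        F = np.fft.fft2(frame, s=(H, W))     # s=(H, W) zero-pads
        T = np.fft.fft2(template, s=(H, W))
        corr = np.real(np.fft.ifft2(F * np.conj(T)))
        # Only shifts keeping the template fully inside the frame are
        # valid placements; ignore the wrap-around region.
        valid = corr[:fh - th + 1, :fw - tw + 1]
        return np.unravel_index(np.argmax(valid), valid.shape)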

The second extension to the system is to introduce the notion of template updating. You will notice that over time in the video the template from the first frame will start to become less and less like the appearance of the airplane in the current frame. There are many ways to try to correct this problem, and you may explore several if you wish. At a minimum, you must implement the ability to periodically refresh the template. So, for example, track using the first template for 30 frames, then extract a new template from the center of the 30th frame after it has been centered on the target. Now use the new template for an additional 30 frames, and repeat the process. Keep in mind that the airplane is approaching the camera and thus growing in size in the image. This is probably something that you want to take into account when replacing an older template with a newer one.
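Tying the pieces together, the overall loop with periodic refresh might be structured like this sketch, which reuses the illustrative helpers from the earlier sketches; the 30-frame interval and the same-size re-extraction are merely starting points, and enlarging the box over time is one way to track the plane's growth:

    def stabilize(frames, box, refresh_every=30):
        # frames: list of grayscale images; box: (row, col, height,
        # width) of the target in the first frame.
        r, c, th, tw = box
        template = frames[0][r:r + th, c:c + tw].copy()
        outputs = []
        for i, frame in enumerate(frames):
            if i == 0:
                pos = (r, c)          # frame 0: box given, no matching
            else:
                pos = fft_match(frame, template)  # or brute_force_match
            centered = center_on_template(frame, (pos[0], pos[1], th, tw))
            outputs.append(centered)
            if (i + 1) % refresh_every == 0:
                # Re-extract the template from the center of the
                # just-centered frame.
                h, w = centered.shape[:2]
                r0, c0 = h // 2 - th // 2, w // 2 - tw // 2
                template = centered[r0:r0 + th, c0:c0 + tw].copy()
        return outputs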

Tips and Issues:

The full 24 seconds is a lot of data! Keep your development work manageable by starting off small. If your code hits an error on the first eight frames, running more frames won't help, but it will surely slow down your testing. So, work incrementally and only head for the full data set toward the end, as you become confident your system is doing what you expect with some reasonable degree of efficiency. For example, running the brute force correlation version on all 720 frames is not necessary and may not be pleasant.

A note pad has been created for collecting additional useful comments and pointers.

Formalities:

You may write your video stabilization algorithm in C, C++, Java, Pascal, Python or other languages with prior approval. You may also use libraries to assist with common operations such as reading and writing images and taking Fourier Transforms. As with project 1, you will have the opportunity to present and discuss your system one-on-one with the instructor. You will also be asked to submit your code through RamCT.

You will also notice that writing out an entire sequence at full resolution (1280x720) takes roughly 2 GB of disk space. You are strongly encouraged to downsample to half that (640x360) in all but your final run. Also, there are tools for turning your output into a more compact video format after your program runs, so while you may briefly require 2 GB of space, you can then create a video and discard the individual frames.
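If you do downsample, a simple 2x2 block average suffices; here is a minimal sketch assuming NumPy and even image dimensions (which 1280x720 satisfies):

    import numpy as np

    def half_size(img):
        # Average each 2x2 block: 1280x720 -> 640x360. Works for
        # grayscale (2D) or color (3D) arrays with even height/width.
        a = img.astype(np.float64)
        out = (a[0::2, 0::2] + a[0::2, 1::2] +
               a[1::2, 0::2] + a[1::2, 1::2]) / 4.0
        return out.astype(img.dtype)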

Addenda:

In consultation with several students on Tuesday, March 23rd, it became apparent that the naive solution on these images, even done well, is slower (and more naive) than I originally expected. We will discuss this more in class on March 24th; for now, note that the due date has been extended by a week. (Ross 3/23/2010)