Jason Heflinger

Overview

This project applies SIFT to take multiple images of an object and attempt to construct a 3D model from them! But before we dive into the details, we have to know what exactly SIFT is!

What is SIFT?

SIFT stands for Scale-Invariant Feature Transform, an algorithm introduced by David Lowe in 1999. It's a computer vision technique for detecting and describing local features in images. Today it is widely used for applications such as image matching, object recognition, and image stitching. In this case, we will be using it to attempt 3D model reconstruction from 2 images!

SIFT Feature Matching

How Does SIFT Work?

SIFT has 4 major steps: scale-space extrema detection, keypoint localization, orientation assignment, and finally descriptor generation. I'll go into more detail on each of these steps in the following sections.

Scale-Space Extrema Detection

Scale-space extrema detection is the step where we identify potential keypoints to use in our reconstruction. SIFT first creates a series of progressively blurred versions of the original image, called the scale-space representation. Then we subtract adjacent blurred versions to get the DoG (Difference of Gaussians), and any pixel that is a local minimum or maximum compared to its neighbors in both position and scale becomes a potential keypoint.
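To make that concrete, here's a minimal sketch in Python (not the project's MATLAB code). It works on a 1D signal instead of a 2D image to keep things short, and the function names and the 3-sigma kernel cutoff are my own assumptions for illustration. Real SIFT also works across multiple octaves, which is skipped here.

```python
import math

def gaussian_kernel(sigma):
    """Normalized 1D Gaussian kernel, truncated at about 3 sigma (an assumption)."""
    radius = max(1, int(math.ceil(3 * sigma)))
    weights = [math.exp(-(x * x) / (2 * sigma * sigma))
               for x in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def blur(signal, sigma):
    """Convolve a 1D signal with a Gaussian, clamping indices at the borders."""
    kernel = gaussian_kernel(sigma)
    radius = len(kernel) // 2
    n = len(signal)
    out = []
    for i in range(n):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = min(max(i + k - radius, 0), n - 1)
            acc += w * signal[j]
        out.append(acc)
    return out

def dog_stack(signal, sigmas):
    """Blur at each sigma, then subtract each level from the next blurrier one."""
    blurred = [blur(signal, s) for s in sigmas]
    return [[b - a for a, b in zip(lo, hi)]
            for lo, hi in zip(blurred, blurred[1:])]

def scale_space_extrema(dog):
    """Return (level, position) pairs that are strict extrema among all
    neighbors in position and in the adjacent scale levels."""
    found = []
    for s in range(1, len(dog) - 1):
        for i in range(1, len(dog[s]) - 1):
            center = dog[s][i]
            neighbors = [dog[t][j]
                         for t in (s - 1, s, s + 1)
                         for j in (i - 1, i, i + 1)
                         if (t, j) != (s, i)]
            if center > max(neighbors) or center < min(neighbors):
                found.append((s, i))
    return found
```

Calling scale_space_extrema(dog_stack(signal, [1.0, 1.6, 2.56, 4.1])) would return the candidate keypoints for a signal, with the list of sigmas being just an example choice.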

Keypoint Localization

Now that we have a ton of potential keypoints, we have to narrow them down. Keypoint localization eliminates the unstable ones by fitting a quadratic function to nearby samples, which refines each keypoint's position and lets us reject keypoints with low contrast. We now have keypoints we can be more confident in.
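Here's a rough sketch of the quadratic fit, again in 1D and in Python for illustration only. It takes the DoG value at a candidate and at its two neighbors, fits a parabola through them, and returns the sub-sample offset and the interpolated value. The function name is hypothetical, the 0.03 contrast threshold assumes intensities scaled to [0, 1], and rejecting far-off candidates outright (instead of re-centering them) is a simplification.

```python
def refine_extremum(d_prev, d_here, d_next, contrast_threshold=0.03):
    """Fit a parabola through three DoG samples around a candidate.

    Returns (offset, value) for the refined extremum, or None if the
    candidate is rejected.
    """
    first = (d_next - d_prev) / 2.0          # finite-difference first derivative
    second = d_next - 2.0 * d_here + d_prev  # finite-difference second derivative
    if second == 0.0:
        return None                          # flat: no well-defined extremum
    offset = -first / second                 # where the parabola's derivative is zero
    value = d_here + 0.5 * first * offset    # interpolated DoG value at that point
    if abs(offset) > 0.5:
        return None                          # extremum lies closer to another sample
    if abs(value) < contrast_threshold:
        return None                          # low contrast: unstable keypoint
    return offset, value
```

For example, samples taken from the parabola 1 - (x - 0.2)^2 at x = -1, 0, 1 come back with an offset of about 0.2 and a value of about 1.0, while a tiny bump like (0.0, 0.01, 0.0) is rejected for low contrast.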

Orientation Assignment

Next, each keypoint needs an orientation so that its feature will be invariant to image rotation. This is done by building a local histogram of gradient orientations in the neighborhood of each keypoint and taking the dominant direction. Since the descriptor is measured relative to this orientation, the same feature can still be matched between images even when one is rotated.
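As a sketch of the histogram idea (in Python, with a hypothetical function name), the snippet below votes each pixel's gradient direction into one of 36 bins, weighted by gradient magnitude, and returns the center of the strongest bin. It leaves out the Gaussian weighting and the secondary peaks that the full algorithm uses, so treat it as a simplified illustration.

```python
import math

def dominant_orientation(patch, num_bins=36):
    """Return the center angle (degrees) of the strongest gradient-orientation bin.

    patch is a 2D list of intensities around a keypoint; only interior
    pixels are used so that central differences are defined.
    """
    hist = [0.0] * num_bins
    bin_width = 360.0 / num_bins
    for y in range(1, len(patch) - 1):
        for x in range(1, len(patch[0]) - 1):
            dx = patch[y][x + 1] - patch[y][x - 1]
            dy = patch[y + 1][x] - patch[y - 1][x]
            magnitude = math.hypot(dx, dy)
            angle = math.degrees(math.atan2(dy, dx)) % 360.0
            hist[int(angle / bin_width) % num_bins] += magnitude
    best = max(range(num_bins), key=hist.__getitem__)
    return (best + 0.5) * bin_width
```

A patch whose intensity increases along both x and y, for instance, has its gradient pointing at 45 degrees, so the function returns the center of the 40 to 50 degree bin.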

Descriptor Generation

Finally, we give each keypoint a feature descriptor for the local image region. This is done by capturing the appearance and gradient information from the keypoint neighborhood. It is important to note that while this descriptor is robust to scale and rotation, it is only partially robust to illumination changes and affine transformations. For our purposes, however, this will be good enough!
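To show where the partial illumination robustness comes from, here's a simplified Python sketch of a SIFT-style descriptor. It splits an 18x18 patch (16x16 usable interior) into 4x4 cells with 8 orientation bins each, giving 128 values, then normalizes, clamps large entries, and normalizes again. The names are hypothetical, the 0.2 clamp is the commonly cited value, and rotation to the keypoint orientation plus the smooth interpolation between bins are left out.

```python
import math

def normalize(vec):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else vec

def describe_patch(patch, cells=4, bins=8, clamp=0.2):
    """Build a cells*cells*bins gradient-histogram descriptor from a square patch."""
    cell = (len(patch) - 2) // cells      # interior pixels per cell side
    desc = [0.0] * (cells * cells * bins)
    bin_width = 360.0 / bins
    for y in range(1, 1 + cell * cells):
        for x in range(1, 1 + cell * cells):
            dx = patch[y][x + 1] - patch[y][x - 1]
            dy = patch[y + 1][x] - patch[y - 1][x]
            angle = math.degrees(math.atan2(dy, dx)) % 360.0
            b = int(angle / bin_width) % bins
            cy, cx = (y - 1) // cell, (x - 1) // cell
            desc[(cy * cells + cx) * bins + b] += math.hypot(dx, dy)
    desc = normalize(desc)                # cancels uniform contrast changes
    desc = [min(v, clamp) for v in desc]  # limits the influence of a few huge gradients
    return normalize(desc)
```

Because the descriptor only uses intensity differences, adding a constant brightness to the patch changes nothing, and because it is normalized, multiplying the patch by a constant changes nothing either. Non-uniform lighting changes are only partly absorbed by the clamp.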

What's Next?

Well, we've finished using SIFT, but what did that get us? SIFT only gives us a set of keypoints in each of the two images, along with descriptors that summarize the region around each one. How can we use this to create a 3D model? Well, we have 4 more steps: first we use a FLANN-based matcher, then a KNN match with Lowe's ratio test, then RANSAC to get our final correspondences, and finally we put it all together using basic homography and profit! As before, I'll go into more depth on each of those steps in the following sections.

FLANN-Based Matcher

We now have 1000s of features detected, but most of them still don't work for our purposes. To reduce this, we use a FLANN-based matcher (FLANN is the Fast Library for Approximate Nearest Neighbors) to index the descriptors in a k-d tree. Then we can search for matches using the distance between feature descriptors, and reduce our working set significantly!
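To give a feel for the tree-based search, here's a small exact k-d tree in Python. This is not FLANN itself: FLANN uses approximate search (for example over several randomized trees) to stay fast in 128 dimensions, while this sketch is a single exact tree with hypothetical function names, just to show the idea of pruning branches by distance.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def build_kdtree(items, depth=0):
    """Build a k-d tree from a list of (vector, label) pairs."""
    if not items:
        return None
    axis = depth % len(items[0][0])       # cycle through the dimensions
    items = sorted(items, key=lambda it: it[0][axis])
    mid = len(items) // 2
    return {
        "item": items[mid],
        "axis": axis,
        "left": build_kdtree(items[:mid], depth + 1),
        "right": build_kdtree(items[mid + 1:], depth + 1),
    }

def nearest(node, query, best=None):
    """Return (squared_distance, label) of the stored vector closest to query."""
    if node is None:
        return best
    vector, label = node["item"]
    d = sq_dist(vector, query)
    if best is None or d < best[0]:
        best = (d, label)
    diff = query[node["axis"]] - vector[node["axis"]]
    near, far = ((node["left"], node["right"]) if diff < 0
                 else (node["right"], node["left"]))
    best = nearest(near, query, best)
    if diff * diff < best[0]:             # the far side could still hold a closer point
        best = nearest(far, query, best)
    return best
```

You would build the tree once from the descriptors of one image, then call nearest for each descriptor of the other image.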

KNN Matching and Lowe's Ratio Test

Now KNN matching can be used to find the k best matches for each feature (we will be using k=2 in this case). Then we can use Lowe's ratio test to ensure there is no close second match: a match is kept only if its best distance is clearly smaller than its second-best distance. This leaves us with distinctive matches instead of weak correlations.
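Here's a minimal Python sketch of the k=2 match plus ratio test. It uses a brute-force search instead of FLANN to stay short, the function name is hypothetical, and the 0.75 ratio is an assumed value (the exact threshold used in this project isn't stated).

```python
import math

def ratio_test_matches(desc_a, desc_b, ratio=0.75):
    """Match each descriptor in desc_a to desc_b, keeping only distinctive matches.

    Returns a list of (index_in_a, index_in_b, distance) tuples.
    """
    matches = []
    for i, a in enumerate(desc_a):
        # Brute-force 2-nearest-neighbor search (k = 2).
        dists = sorted((math.dist(a, b), j) for j, b in enumerate(desc_b))
        if len(dists) < 2:
            continue
        (best, j), (second, _) = dists[0], dists[1]
        # Lowe's ratio test: the best match must clearly beat the runner-up.
        if best < ratio * second:
            matches.append((i, j, best))
    return matches
```

A feature that is almost equally close to two candidates gets thrown out, since we can't tell which one it really belongs to.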

RANSAC

Finally, we have reasonable feature matches left. Now we can use RANSAC to keep only the matches that agree with a single homography, discarding the rest as outliers. After all, these are the only points that are usable for our purpose of constructing the 3D model.
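Below is a sketch of that RANSAC loop in Python, for illustration only. Each round picks 4 random matches, solves for the homography that maps them exactly (with the last entry fixed to 1), and counts how many other matches land within a pixel threshold. All function names, the iteration count, the threshold, and the fixed seed are assumptions, and a real pipeline would usually refit the homography on all inliers at the end.

```python
import math
import random

def solve_linear(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            return None                   # singular, e.g. collinear sample points
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def homography_from_4(pairs):
    """Homography (9 values, row-major, last one fixed to 1) from four (src, dst) pairs."""
    a, b = [], []
    for (x, y), (u, v) in pairs:
        a.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        b.append(u)
        a.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.append(v)
    h = solve_linear(a, b)
    return None if h is None else h + [1.0]

def project(h, point):
    """Apply homography h to a 2D point; None if the point maps to infinity."""
    x, y = point
    w = h[6] * x + h[7] * y + h[8]
    if abs(w) < 1e-12:
        return None
    return ((h[0] * x + h[1] * y + h[2]) / w, (h[3] * x + h[4] * y + h[5]) / w)

def reprojection_error(h, src, dst):
    """Distance between where h sends src and where the match says it should go."""
    p = project(h, src)
    return float("inf") if p is None else math.hypot(p[0] - dst[0], p[1] - dst[1])

def ransac_homography(pairs, iterations=200, threshold=1.0, seed=0):
    """Return (homography, inlier_indices) for the model with the most inliers."""
    rng = random.Random(seed)
    best_h, best_inliers = None, []
    for _ in range(iterations):
        h = homography_from_4(rng.sample(pairs, 4))
        if h is None:
            continue
        inliers = [i for i, (src, dst) in enumerate(pairs)
                   if reprojection_error(h, src, dst) < threshold]
        if len(inliers) > len(best_inliers):
            best_h, best_inliers = h, inliers
    return best_h, best_inliers
```

The matches listed in the returned inlier indices are the ones that survive, and everything else is treated as an outlier.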

Profit

And now that we have our final filtered set, we can use basic homography and some other techniques to construct a wireframe, then transform each feature from the images onto the wireframe to make a 3D model. Tada! We are all done!

SIFT Feature Matching: An Academic Building at Rose-Hulman

Results

During this process, some manual input was needed near the end to create the wireframe. So unfortunately, SIFT alone was not enough to create a 3D model. However, there are plenty of techniques in computer vision that can automate the wireframe step, so this approach still looks valid for this application!

Technologies

Unfortunately, this project was all done in MATLAB (ew, yes I know). The objective was to do this all from scratch, and since I was already learning MATLAB for a computer vision class, it seemed like a two-birds-with-one-stone type of deal.