This project applies SIFT to take multiple images of an object and
construct a 3D model from them! But before we dive into
the details, we have to know what exactly SIFT is!
SIFT stands for Scale-Invariant Feature Transform, an algorithm introduced
by David Lowe in 1999. It's a computer vision technique used for detecting and describing local features
in images. It is now widely used for applications such as image matching, object recognition, and image
stitching. In this case, we will be using it to attempt 3D model reconstruction from 2 images!
SIFT Feature Matching
SIFT has 4 major steps: scale-space extrema detection, keypoint localization,
orientation assignment, and finally descriptor generation.
I'll go into more detail on each of these steps in the following sections.
Scale-space extrema detection is the step where we identify potential keypoints
to use in our reconstruction. SIFT first creates a series of progressively blurred versions of the original
image. This is called the scale-space representation. Then we subtract neighboring blur levels to get the
DoG (Difference of Gaussians), and the local extrema of the DoG become our potential keypoints.
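To make the blur-then-subtract idea concrete, here is a minimal Python sketch (not the project's MATLAB code). The function names, the 3-sigma kernel cutoff, and the border clamping are my own assumptions for illustration. A real SIFT implementation also works across several octaves and compares each sample against its neighbors in scale as well as in space.

```python
import math

def gaussian_kernel(sigma):
    """Normalized 1D Gaussian kernel, truncated at 3 sigma (assumed cutoff)."""
    radius = max(1, int(math.ceil(3 * sigma)))
    weights = [math.exp(-(i * i) / (2 * sigma * sigma))
               for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def blur_rows(image, kernel):
    """Convolve every row with the kernel, clamping indices at the borders."""
    radius = len(kernel) // 2
    width = len(image[0])
    return [
        [sum(kernel[k] * row[min(max(x + k - radius, 0), width - 1)]
             for k in range(len(kernel)))
         for x in range(width)]
        for row in image
    ]

def transpose(image):
    return [list(column) for column in zip(*image)]

def gaussian_blur(image, sigma):
    """Separable blur: filter the rows, then the columns."""
    kernel = gaussian_kernel(sigma)
    return transpose(blur_rows(transpose(blur_rows(image, kernel)), kernel))

def difference_of_gaussians(image, sigmas):
    """Blur at each sigma, then subtract each level from the next blurrier one."""
    blurred = [gaussian_blur(image, s) for s in sigmas]
    return [
        [[b - a for a, b in zip(row_a, row_b)]
         for row_a, row_b in zip(sharper, blurrier)]
        for sharper, blurrier in zip(blurred, blurred[1:])
    ]
```

Feeding this an image with a single bright pixel gives a DoG layer whose strongest response sits right on that pixel, which is the kind of blob-like spot SIFT picks up as a candidate keypoint.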
Now that we have a ton of potential keypoints, we have to narrow these down. Keypoint
localization eliminates the unstable ones by fitting a quadratic
function to nearby samples, which refines each keypoint's position and lets us reject keypoints with low contrast.
We now have keypoints we can be more confident in.
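Here is a one-dimensional Python sketch of that quadratic fit, just to show the idea. SIFT does the fit in three dimensions (x, y, and scale), so treat this as a simplification. The function names are hypothetical, and the 0.03 contrast threshold assumes DoG values from an image scaled to the range 0 to 1. I also simply reject a keypoint whose refined offset is more than half a sample away, where a fuller implementation would move to the neighboring sample and refit.

```python
def refine_extremum_1d(left, center, right):
    """Fit a parabola through three equally spaced samples.

    Returns (offset, value): the sub-sample position of the extremum
    relative to the center sample, and the interpolated value there.
    """
    gradient = (right - left) / 2.0
    curvature = right - 2.0 * center + left
    if curvature == 0:
        return 0.0, center  # flat or linear: nothing to refine
    offset = -gradient / curvature
    value = center + 0.5 * gradient * offset
    return offset, value

def is_stable(left, center, right, contrast_threshold=0.03):
    """Keep a keypoint only if the refined extremum is close and strong enough."""
    offset, value = refine_extremum_1d(left, center, right)
    return abs(offset) <= 0.5 and abs(value) >= contrast_threshold
```

For samples taken from a parabola that peaks at 0.25 with height 1.0, `refine_extremum_1d(-0.5625, 0.9375, 0.4375)` recovers exactly that offset and value, while a weak bump like `is_stable(0.0, 0.01, 0.0)` gets rejected.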
Now, each keypoint also needs an orientation so that its features will be invariant
to image rotation. This is done using a local orientation histogram built from the gradient
orientations in the neighborhood of each keypoint. By doing this, paired keypoints can be aligned to a common
orientation, so the features we extract can be compared no matter how each image is rotated.
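A rough Python sketch of the orientation histogram might look like the following. The function name and the 36-bin default are my assumptions, and I left out the Gaussian weighting and the handling of secondary peaks that a full SIFT implementation applies.

```python
import math

def dominant_orientation(patch, num_bins=36):
    """Histogram of gradient directions in a patch, weighted by gradient magnitude.

    Returns (angle_in_degrees, histogram), where the angle is the start of
    the strongest bin. Image rows grow downward, so angles follow that axis.
    """
    bin_width = 360.0 / num_bins
    histogram = [0.0] * num_bins
    for y in range(1, len(patch) - 1):
        for x in range(1, len(patch[0]) - 1):
            # Central differences approximate the image gradient.
            dx = patch[y][x + 1] - patch[y][x - 1]
            dy = patch[y + 1][x] - patch[y - 1][x]
            magnitude = math.hypot(dx, dy)
            angle = math.degrees(math.atan2(dy, dx)) % 360.0
            histogram[int(angle // bin_width) % num_bins] += magnitude
    peak = max(range(num_bins), key=lambda i: histogram[i])
    return peak * bin_width, histogram
```

A patch that gets brighter toward the right lands in the 0 degree bin, and rotating the patch content moves the peak to a different bin, which is exactly the information used to normalize the keypoint's rotation.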
Finally, we give each keypoint a feature descriptor for the local image region. This is done
by capturing the appearance and gradient information from the keypoint neighborhood. It is important to note
that while this descriptor is robust to scale and rotation, it is only partially robust when it comes to illumination
and affine transformations. For our purposes, however, this will be good enough!
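As a sketch of what such a descriptor can look like, the Python below splits a patch into a 4x4 grid of cells with an 8-bin gradient histogram per cell, giving the 128 values a standard SIFT descriptor has. The function names are hypothetical, and I skip the rotation to the keypoint orientation, the Gaussian weighting, and the interpolation between bins. The normalize, clamp at 0.2, renormalize step is where the partial illumination robustness comes from.

```python
import math

def normalize(vector, clamp=0.2):
    """Scale to unit length, clamp large entries, then rescale to unit length."""
    norm = math.sqrt(sum(v * v for v in vector)) or 1.0
    vector = [min(v / norm, clamp) for v in vector]
    norm = math.sqrt(sum(v * v for v in vector)) or 1.0
    return [v / norm for v in vector]

def describe_patch(patch, cells=4, bins=8):
    """Build a cells*cells*bins gradient histogram descriptor for a square patch.

    The outermost pixel ring is only used for central differences, so an
    18x18 patch yields a 16x16 grid of gradients.
    """
    cell_size = (len(patch) - 2) // cells
    descriptor = [0.0] * (cells * cells * bins)
    for y in range(1, 1 + cells * cell_size):
        for x in range(1, 1 + cells * cell_size):
            dx = patch[y][x + 1] - patch[y][x - 1]
            dy = patch[y + 1][x] - patch[y - 1][x]
            angle = math.atan2(dy, dx) % (2 * math.pi)
            direction_bin = int(angle / (2 * math.pi) * bins) % bins
            cell = ((y - 1) // cell_size) * cells + (x - 1) // cell_size
            descriptor[cell * bins + direction_bin] += math.hypot(dx, dy)
    return normalize(descriptor)
```

Doubling the brightness of every pixel doubles every gradient, but the normalization cancels that out, so the descriptor stays the same.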
Well, we've finished using SIFT, but what did that get us? What's next? SIFT only gets us
a set of keypoints in each of the two images, along with descriptors that summarize the region around each one.
How can we use this to create a 3D model? Well, we have to do 4 more steps: first we use a FLANN-based matcher, then a KNN
match with Lowe's ratio test, then RANSAC to get our final correspondences, and finally we put it all together using
basic homography and profit! As before, I'll go into more depth on each of those steps in the following sections.
We now have 1000s of features detected, but most of them still won't work for our purposes. To reduce this,
we use a FLANN-based matcher (FLANN is the Fast Library for Approximate Nearest Neighbors), which indexes the descriptors, typically in a k-d tree. Then we can search for matches using the feature distance
and reduce our working set significantly!
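To show what "searching by feature distance" means, here is a tiny Python sketch. Note that it swaps in an exact brute-force search instead of FLANN's approximate index, so it returns the same kind of result but would be far too slow for thousands of 128-dimensional descriptors. The function name is hypothetical.

```python
import math

def k_nearest(query, candidates, k=2):
    """Return the k closest candidates as (distance, index) pairs, nearest first.

    Brute force: measures the Euclidean distance to every candidate
    descriptor. FLANN avoids this full scan by using an index structure.
    """
    distances = [(math.dist(query, c), i) for i, c in enumerate(candidates)]
    return sorted(distances)[:k]
```

For example, `k_nearest((0, 0), [(3, 4), (1, 0), (0, 2)])` picks out the candidates at index 1 and index 2 as the two nearest.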
Now KNN matching can be used to find the k best matches for each feature (we will be using k=2 in this case). Then we can use
Lowe's ratio test to check that the best match is clearly better than the second-best one, and discard the feature if it is not.
This way we keep only features that match distinctively between the images instead of weak correlations.
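The ratio test itself is only a few lines. In this Python sketch the input layout, the function name, and the 0.75 threshold are all assumptions for illustration (values around 0.7 to 0.8 are common choices).

```python
def ratio_test(knn_matches, ratio=0.75):
    """Keep a match only if its best distance clearly beats the second best.

    knn_matches: list of (query_index, neighbors), where neighbors is a list
    of (distance, train_index) pairs sorted nearest first (k=2).
    Returns a list of (query_index, train_index) pairs that passed.
    """
    good = []
    for query_index, neighbors in knn_matches:
        if len(neighbors) < 2:
            continue  # no second match to compare against
        (best_distance, best_index), (second_distance, _) = neighbors[:2]
        if best_distance < ratio * second_distance:
            good.append((query_index, best_index))
    return good
```

A feature whose two nearest neighbors sit at distances 0.2 and 0.9 passes easily, while one with distances 0.5 and 0.55 is ambiguous and gets dropped.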
Finally, we have reasonable feature matches left. Now we can use RANSAC to ensure that
only the matches consistent with a single homography are kept. After all, these are the only points that are usable
for our purpose of constructing the 3D model.
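Here is a compact Python sketch of RANSAC with a homography model, under a few assumptions of mine: each sample of 4 matches is solved directly with the bottom-right entry of H fixed to 1, the inlier threshold is 2 pixels, and the iteration count is fixed. All function names are hypothetical, and a production version would normally refit H on all inliers at the end.

```python
import math
import random

def solve_linear(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            return None  # degenerate sample, e.g. collinear points
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def homography_from_4(pairs):
    """Estimate a 3x3 H (h33 fixed to 1) from four ((x, y), (u, v)) pairs."""
    a, b = [], []
    for (x, y), (u, v) in pairs:
        a.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        b.append(u)
        a.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.append(v)
    h = solve_linear(a, b)
    return None if h is None else [h[0:3], h[3:6], [h[6], h[7], 1.0]]

def reprojection_error(h, pair):
    """Distance between where H sends (x, y) and where the match says it is."""
    (x, y), (u, v) = pair
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    if abs(w) < 1e-12:
        return math.inf
    px = (h[0][0] * x + h[0][1] * y + h[0][2]) / w
    py = (h[1][0] * x + h[1][1] * y + h[1][2]) / w
    return math.hypot(px - u, py - v)

def ransac_homography(pairs, iterations=500, threshold=2.0, seed=0):
    """Return (H, inliers) for the random 4-match sample with the most inliers."""
    rng = random.Random(seed)
    best_h, best_inliers = None, []
    for _ in range(iterations):
        h = homography_from_4(rng.sample(pairs, 4))
        if h is None:
            continue
        inliers = [p for p in pairs if reprojection_error(h, p) < threshold]
        if len(inliers) > len(best_inliers):
            best_h, best_inliers = h, inliers
    return best_h, best_inliers
```

The key point is that a wrong match can only win if it happens to agree with many other matches, which is very unlikely, so the largest consistent group is almost always the set of true correspondences.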
And now that we have our final filtered set, we can use basic homography and some other techniques to
construct a wireframe and transform each feature from the image into the wireframe to make a 3D model. Tada! We are all done!
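The "basic homography" part boils down to pushing each point through a 3x3 matrix in homogeneous coordinates. A minimal Python sketch, with a hypothetical function name and H given as a plain nested list:

```python
def apply_homography(h, points):
    """Map (x, y) points through a 3x3 homography given as nested lists."""
    mapped = []
    for x, y in points:
        # Multiply H by (x, y, 1), then divide by the third coordinate.
        w = h[2][0] * x + h[2][1] * y + h[2][2]
        mapped.append(((h[0][0] * x + h[0][1] * y + h[0][2]) / w,
                       (h[1][0] * x + h[1][1] * y + h[1][2]) / w))
    return mapped
```

With a pure translation such as `[[1, 0, 5], [0, 1, -2], [0, 0, 1]]`, the point (1, 1) lands on (6, -1). Scaling the whole matrix by a constant changes nothing, because the division by the third coordinate cancels it out.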
SIFT Feature Matching An Academic Building at Rose-Hulman
During this process, some manual input was needed near the end to create the wireframe. Because of this,
it is apparent that SIFT alone was unfortunately not enough to create a 3D model. However, there are plenty
of techniques in computer vision that can automate the wireframe step, so this approach is still valid for this
application!
Unfortunately, this project was all done in MATLAB (ew, yes I know). The objective was to do this all
from scratch, and since I was already learning MATLAB for a computer vision class, it seemed like a two-birds-with-one-stone
type of deal.