SIFT (Scale-Invariant Feature Transform)


SIFT

1) Overview

    - SIFT extracts features that are invariant to scale and rotation, and robust to changes in intensity.

    - SIFT features can be matched reliably across changes in 3D viewpoint and in the presence of noise.

    - SIFT includes both an interest point detector and a descriptor.

Interest point detector
- It is an algorithm that chooses points from an image based on some criterion.
- Harris, Min Eigen, and FAST are interest point detectors.

Descriptor
- It is a vector of values that describes the image patch around an interest point (raw pixel values, histogram of gradients, etc.).

(ref: https://dsp.stackexchange.com/questions/24346/what-is-the-difference-between-feature-detectors-and-feature-descriptors)

 

STEPS

1) DoG

(Image source: https://docs.opencv.org/master/da/df5/tutorial_py_sift_intro.html)

    - SIFT interest point locations are found using difference-of-Gaussian functions (DoG).

$$D(X, \sigma) = [G_{k\sigma}(X)-G_{\sigma}(X)]*I(X)=[G_{k\sigma}-G_{\sigma}]*I=I_{k\sigma}-I_{\sigma}$$

    - Before detecting interest points, the image is blurred with a Gaussian to suppress noise.

$$I_{\sigma} = G_{\sigma} * I:\ \text{the grayscale image } I \text{ blurred with a Gaussian of standard deviation } \sigma$$

$$k:\ \text{constant factor determining the separation between adjacent scales}$$

    - Interest points are the maxima and minima of D(X, σ) across both image location and scale.

DoG
- Computed by subtracting two versions of the image blurred with Gaussian kernels of different standard deviations.
- The result approximates the LoG (Laplacian of Gaussian).
- LoG requires two rounds of derivatives, so it needs more computing power.
- Furthermore, for scale invariance, LoG must be normalized by σ²; the DoG already incorporates this normalization (see the sketch below).
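
A minimal sketch of the DoG computation above, assuming only OpenCV and NumPy and reusing the 'building.jpg' file from the Code section below:

import cv2
import numpy as np

img = cv2.imread('building.jpg', cv2.IMREAD_GRAYSCALE).astype(np.float32)

sigma = 1.6        # base scale (Lowe's paper uses 1.6)
k = 2 ** (1 / 3)   # scale separation for n = 3 scales per octave

# D(X, sigma) = I_{k*sigma} - I_{sigma}: subtract two blurs whose stds differ by k.
# Kernel size (0, 0) lets OpenCV derive the kernel size from sigma.
I_sigma = cv2.GaussianBlur(img, (0, 0), sigma)
I_ksigma = cv2.GaussianBlur(img, (0, 0), k * sigma)
dog = I_ksigma - I_sigma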

 

2) Find interest points (maxima and minima of D(X, σ); DoG)

(Image source: https://docs.opencv.org/master/da/df5/tutorial_py_sift_intro.html)

    - DoG images are searched for local extrema (maxima and minima) over scale and space.

    - The figure referenced above shows an example. Within one octave, each pixel is compared with its 8 neighbors in the same scale image as well as the 9 pixels in each of the two adjacent scale images (26 neighbors in total). If it is larger or smaller than all of them, it is a potential keypoint.

    - To detect extrema at n scales per octave, each octave needs n + 2 DoG images, and therefore n + 3 Gaussian-blurred images. The scale difference between adjacent images is 2^(1/n) (see the sketch after the definition below).

Octave: a group of images of the same size; each successive octave is downsampled by a factor of 2.
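
The 26-neighbor extremum test can be sketched as follows (a hypothetical is_extremum helper; dogs is a list of consecutive DoG images from one octave, built as above, and (y, x) is an interior pixel):

def is_extremum(dogs, s, y, x):
    # Compare the center value against its 3 x 3 x 3 neighborhood:
    # 8 same-scale neighbors plus 9 pixels in each adjacent scale.
    cube = np.stack([d[y - 1:y + 2, x - 1:x + 2] for d in dogs[s - 1:s + 2]])
    center = dogs[s][y, x]
    return center == cube.max() or center == cube.min()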

 

3) Interest point localization

    - Refine the potential keypoints to get more accurate results, rejecting unreliable ones (edge responses and low-contrast keypoints).

 

    (1) Elimination of edge response

        - DoG has a strong response along edges, even when the location along the edge is poorly determined.

        - A 2 × 2 Hessian matrix of the DoG image is used to compare the principal curvatures, a concept similar to the Harris corner detector; keypoints whose curvature ratio exceeds a threshold (r = 10 in the paper) are rejected as edges.

 

    (2) Elimination of low contrast keypoints

        - Use a Taylor series expansion of the scale space to get a more accurate (sub-pixel) location of each extremum.

        - If the DoG value at a potential keypoint's refined location is less than a threshold (0.03 in the paper, with intensities scaled to [0, 1]), it is rejected as a low-contrast keypoint.
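
Both rejection tests can be sketched with finite differences on a single DoG image (a hypothetical passes_tests helper using Lowe's thresholds; it omits the Taylor-series refinement step):

def passes_tests(dog, y, x, contrast_thr=0.03, r=10):
    # Low-contrast rejection (assumes dog values are scaled to [0, 1])
    if abs(dog[y, x]) < contrast_thr:
        return False
    # 2 x 2 Hessian from finite differences; an edge has one large and one
    # small principal curvature, which makes tr^2 / det large.
    dxx = dog[y, x + 1] + dog[y, x - 1] - 2 * dog[y, x]
    dyy = dog[y + 1, x] + dog[y - 1, x] - 2 * dog[y, x]
    dxy = (dog[y + 1, x + 1] - dog[y + 1, x - 1]
           - dog[y - 1, x + 1] + dog[y - 1, x - 1]) / 4
    tr = dxx + dyy
    det = dxx * dyy - dxy ** 2
    if det <= 0:                      # curvatures of opposite sign: an edge, reject
        return False
    return tr ** 2 / det < (r + 1) ** 2 / r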

 

4) Orientation assignment

    - To achieve invariance to rotation, gradient magnitudes and directions are computed in a neighborhood around each keypoint at its detected scale.

    - The samples are weighted by a Gaussian-weighted circular window with a standard deviation of 1.5 times the keypoint's scale.

    - An orientation histogram with 36 bins covering 360 degrees is created.

(Image source: https://salkuma.files.wordpress.com/2014/04/sifteca095eba6ac.pdf)

    - The highest peak of the histogram gives the keypoint's orientation. If there is another peak higher than 80% of the top peak, a keypoint is created for that orientation as well (same location and scale, different direction).
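
A rough sketch of this 36-bin histogram (a hypothetical dominant_orientations helper; patch is a square float32 region cut around the keypoint and scale is its detected scale):

def dominant_orientations(patch, scale):
    gx = cv2.Sobel(patch, cv2.CV_32F, 1, 0)
    gy = cv2.Sobel(patch, cv2.CV_32F, 0, 1)
    mag = np.sqrt(gx ** 2 + gy ** 2)
    ang = np.rad2deg(np.arctan2(gy, gx)) % 360
    # Gaussian weighting window with std = 1.5 * scale, centered on the patch
    h, w = patch.shape
    ys, xs = np.mgrid[:h, :w]
    g = np.exp(-((ys - h / 2) ** 2 + (xs - w / 2) ** 2) / (2 * (1.5 * scale) ** 2))
    # 36 bins covering 360 degrees, weighted by gradient magnitude
    hist, _ = np.histogram(ang, bins=36, range=(0, 360), weights=mag * g)
    # Main peak plus any peak above 80% of it, returned as bin centers in degrees
    return np.flatnonzero(hist >= 0.8 * hist.max()) * 10 + 5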

 

5) Keypoint Descriptor

    - The local region around each interest point is summarized as a vector of values, which forms the interest point descriptor.

    - To achieve invariance to rotation, a reference direction is chosen based on the direction and magnitude of the image gradient around each point.

    - Compute a descriptor based on the keypoint's position, scale, and rotation.

(Image source: https://salkuma.files.wordpress.com/2014/04/sifteca095eba6ac.pdf)

    - The descriptor takes a grid of subregions around the point and, for each subregion, computes an image-gradient orientation histogram. (Standard setting: 4 × 4 subregions with 8-bin orientation histograms, giving a 4 × 4 × 8 = 128-dimensional vector.)
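
The 128-dimensional layout is easy to confirm with OpenCV (gray is any grayscale image, as in the Code section below):

sift = cv2.SIFT_create()
kp, des = sift.detectAndCompute(gray, None)
print(des.shape)   # (number_of_keypoints, 128): 4 x 4 subregions x 8 bins each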

 

6) Keypoint matching

    - To match a feature in one image to a feature in another, the ratio of the distances to the two closest matching features is used.

    - Compute the Euclidean distance between each descriptor in one image and every descriptor in the other.

    - Using the ratio of the distances (1st nearest vs. 2nd nearest), check whether the match is reliable: accept it only if the ratio is below a threshold of 0.8 (see the sketch below).
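
The ratio test is a few lines with OpenCV's knnMatch (a sketch assuming descriptors_1 and descriptors_2 are SIFT descriptor arrays as computed in the Code section below; crossCheck must stay disabled when k = 2 neighbors are requested):

bf = cv2.BFMatcher(cv2.NORM_L2)
pairs = bf.knnMatch(descriptors_1, descriptors_2, k=2)           # two nearest neighbors each
good = [m for m, n in pairs if m.distance < 0.8 * n.distance]    # Lowe's ratio test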

 

Code

1) Keypoints

import cv2
from matplotlib import pyplot as plt

img = cv2.imread('building.jpg')
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# Detect SIFT keypoints (OpenCV >= 4.4; older builds use cv2.xfeatures2d.SIFT_create)
sift = cv2.SIFT_create()
kp = sift.detect(gray, None)

# Draw the detected keypoints on top of the image
img = cv2.drawKeypoints(gray, kp, img)

plt.figure(figsize=(10, 7))
plt.imshow(img)
plt.show()

 

2) Feature matching

img1 = cv2.imread('/users/sejongpyo/downloads/631.jpg')
gray1 = cv2.cvtColor(img1, cv2.COLOR_BGR2GRAY)

img2 = cv2.imread('/users/sejongpyo/downloads/632.png')
gray2 = cv2.cvtColor(img2, cv2.COLOR_BGR2GRAY)

# SIFT keypoints and 128-dimensional descriptors
sift = cv2.SIFT_create()
keypoints_1, descriptors_1 = sift.detectAndCompute(gray1, None)
keypoints_2, descriptors_2 = sift.detectAndCompute(gray2, None)

# Brute-force matching on L2 distance; crossCheck keeps only mutual best matches
bf = cv2.BFMatcher(cv2.NORM_L2, crossCheck=True)
matches = bf.match(descriptors_1, descriptors_2)
matches = sorted(matches, key=lambda x: x.distance)

# Draw the 30 best matches (convert BGR to RGB for matplotlib)
img3 = cv2.drawMatches(img1, keypoints_1, img2, keypoints_2, matches[:30], None, flags=2)
plt.imshow(cv2.cvtColor(img3, cv2.COLOR_BGR2RGB))
plt.axis('off')
plt.show()

References

https://salkuma.files.wordpress.com/2014/04/sifteca095eba6ac.pdf

http://programmingcomputervision.com/

https://blueskyvision.tistory.com/21

https://docs.opencv.org/master/da/df5/tutorial_py_sift_intro.html
