Tej Bade
In this project, we are given digitized Prokudin-Gorskii glass plate images, each containing three sub-images side-by-side. One such image, tobolsk.jpg, is shown below.
Each of the three sub-images records the scene through the filter for one color channel (red, green, or blue). The goal of the project is to extract the red, green, and blue channels from the plate and produce a color image by stacking them on top of each other. The channels must be aligned before they are stacked, which involves shifting them and scoring each candidate shift with a metric that measures how well the channels line up. In the next few sections, I outline my approaches to alignment.
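The extraction and stacking steps can be sketched in NumPy as follows. This is a minimal illustration rather than my exact code: the function names split_plate and stack_rgb are hypothetical, and the split axis and the blue, green, red ordering of the sub-images are assumptions that should be checked against the actual scan.

```python
import numpy as np

def split_plate(plate, axis=0):
    """Split a scanned plate into three equal sub-images along `axis`.

    Assumption: the three exposures occupy equal thirds of the plate.
    Leftover rows/columns (when the length is not divisible by 3) are dropped.
    """
    third = plate.shape[axis] // 3
    return tuple(
        np.take(plate, np.arange(i * third, (i + 1) * third), axis=axis)
        for i in range(3)
    )

def stack_rgb(red, green, blue):
    """Stack three aligned 2-D channels into an (H, W, 3) RGB image."""
    return np.dstack([red, green, blue])
```

With a plate loaded as a 2-D array, something like `blue, green, red = split_plate(plate)` followed by `stack_rgb(red, green, blue)` gives the unaligned color image.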
To align all of the channels, I aligned the green and red channels separately to the blue channel. For each candidate displacement, I shift the channel by that amount in the x and y directions using np.roll and then compute the normalized cross-correlation (NCC) between the shifted channel and the blue channel. Treating the channels as matrices, the NCC is found by first normalizing each channel, dividing every element by the Frobenius norm of the matrix. We then take the dot product of the two normalized matrices, multiplying element-wise and summing the products, to get the final score. A larger NCC indicates a better alignment.
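The NCC score described above can be written in a few lines of NumPy. This is a sketch under the definition given in the text; the function name ncc and the handling of an all-zero channel are my own choices for illustration.

```python
import numpy as np

def ncc(a, b):
    """Normalized cross-correlation of two equally sized 2-D channels."""
    a = a.astype(np.float64)
    b = b.astype(np.float64)
    # For a 2-D array, np.linalg.norm defaults to the Frobenius norm.
    a_norm = np.linalg.norm(a)
    b_norm = np.linalg.norm(b)
    if a_norm == 0 or b_norm == 0:
        # Assumption: treat an all-zero channel as having no correlation.
        return 0.0
    # Element-wise product of the normalized matrices, summed to one score.
    return float(np.sum((a / a_norm) * (b / b_norm)))
```

Because both inputs are scaled to unit norm, the score of a channel against itself (or against any positive multiple of itself) is 1, which is the maximum possible value.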
To find the best displacement vector for each channel, I tried every integer vector (x, y) with both x and y in the range [-20, 20], which is 41 x 41 = 1,681 candidates, and kept the one with the highest NCC. The image below is the result of using this method on tobolsk.jpg.
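A minimal sketch of this exhaustive search is shown below. The names align_exhaustive and ncc are hypothetical, and mapping x to columns (axis 1) and y to rows (axis 0) is an assumption made for the example.

```python
import numpy as np

def ncc(a, b):
    """Normalized cross-correlation (Frobenius-normalized dot product)."""
    a_norm, b_norm = np.linalg.norm(a), np.linalg.norm(b)
    if a_norm == 0 or b_norm == 0:
        return 0.0
    return float(np.sum((a / a_norm) * (b / b_norm)))

def align_exhaustive(channel, reference, radius=20):
    """Return the (x, y) shift in [-radius, radius]^2 with the highest NCC."""
    best_score, best_shift = -np.inf, (0, 0)
    for dx in range(-radius, radius + 1):
        for dy in range(-radius, radius + 1):
            # np.roll wraps pixels around the border; rows shift by dy, columns by dx.
            shifted = np.roll(channel, (dy, dx), axis=(0, 1))
            score = ncc(shifted, reference)
            if score > best_score:
                best_score, best_shift = score, (dx, dy)
    return best_shift
```

The cost grows with the square of the radius, since every candidate shift requires a full pass over the image.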
The naïve approach is a brute-force method that works when the required displacement of each channel is small. Enlarging the search window to cover larger displacements makes the computation very time-consuming, because the number of candidate shifts grows with the square of the window width and each candidate requires a pass over the full-resolution image. To combat this, I use a recursive algorithm called the pyramid method. I repeatedly downsize the channels by a factor of 2 until each channel falls under a particular size (I use channel.size ≤ 100000). In this base case, I use a search window of [-20, 20] x [-20, 20]. For each recursive call, I take the displacement vector calculated on the downsized image and scale it by a factor of 2. I then search a smaller window of [-5, 5] x [-5, 5] centered on this scaled vector, which gives a refined displacement vector. This recursion continues until we have a displacement vector for the full-resolution image. The image below is the result of using this method on icon.tif.
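The recursion can be sketched as follows. This is an illustration under stated assumptions, not my exact implementation: the function names are hypothetical, x is taken to mean columns and y rows, and the 2x2 block averaging in downsample stands in for whatever resizing routine is actually used.

```python
import numpy as np

def ncc(a, b):
    """Normalized cross-correlation (Frobenius-normalized dot product)."""
    a_norm, b_norm = np.linalg.norm(a), np.linalg.norm(b)
    if a_norm == 0 or b_norm == 0:
        return 0.0
    return float(np.sum((a / a_norm) * (b / b_norm)))

def search(channel, reference, center, radius):
    """Best (x, y) shift within `radius` of `center`, scored by NCC."""
    best_score, best_shift = -np.inf, center
    for dx in range(center[0] - radius, center[0] + radius + 1):
        for dy in range(center[1] - radius, center[1] + radius + 1):
            score = ncc(np.roll(channel, (dy, dx), axis=(0, 1)), reference)
            if score > best_score:
                best_score, best_shift = score, (dx, dy)
    return best_shift

def downsample(img):
    """Halve each dimension by averaging 2x2 blocks (assumed resize method)."""
    h, w = img.shape[0] // 2 * 2, img.shape[1] // 2 * 2
    img = img[:h, :w].astype(np.float64)
    return (img[0::2, 0::2] + img[1::2, 0::2] + img[0::2, 1::2] + img[1::2, 1::2]) / 4.0

def align_pyramid(channel, reference, base_size=100_000, base_radius=20, refine_radius=5):
    """Coarse-to-fine alignment: solve on a half-size image, then refine."""
    if channel.size <= base_size:
        # Base case: full window search on the smallest image.
        return search(channel, reference, (0, 0), base_radius)
    cx, cy = align_pyramid(downsample(channel), downsample(reference),
                           base_size, base_radius, refine_radius)
    # One pixel at the coarse level corresponds to two pixels at this level.
    return search(channel, reference, (2 * cx, 2 * cy), refine_radius)
```

Each level only searches an 11 x 11 window, so the full-resolution image is scanned 121 times instead of once per candidate in a window large enough to cover the whole displacement.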
For some images like emir.tif, there is a noticeable misalignment of the channels even after the pyramid method. This is because the blue jacket in the image reflects much less light into the green and red channels than into the blue channel, so the raw brightness values disagree across channels and throw off the NCC metric. Instead of using the brightness of the images in the alignment, we can use edge detection and align the resulting edge maps. I used OpenCV’s implementation of the Canny edge detection algorithm. The algorithm starts off by removing the noise in the image with a 5x5 Gaussian filter. Then, it finds the intensity gradient of the image by filtering with a Sobel kernel in both the horizontal and vertical directions. Finally, it thins the edges with non-maximum suppression and keeps only the strong, connected ones with hysteresis thresholding. The images below show the result of the pyramid method with and without edge detection on emir.tif and harvesters.tif.
Using the pyramid method on emir.tif
Using the pyramid method on harvesters.tif
Using the pyramid method with Canny edge detection on emir.tif
Using the pyramid method with Canny edge detection on harvesters.tif
All of the pictures below are the result of the pyramid method with Canny edge detection except for lady.tif, which uses the regular pyramid method (since edge detection does not improve its alignment). The caption for each image contains the final displacement vectors used for the green and red color channels.
monastery G: (-3, 2) R: (3, 2)
tobolsk G: (3, 3) R: (6, 3)
emir G: (49, 24) R: (107, 40)
icon G: (39, 16) R: (89, 23)
onion_church G: (52, 24) R: (107, 35)
self_portrait G: (77, 29) R: (175, 37)
train G: (41, 0) R: (85, 29)
cathedral G: (5, 2) R: (12, 3)
church G: (25, 3) R: (58, -4)
harvesters G: (60, 17) R: (123, 14)
melons G: (80, 10) R: (176, 14)
sculpture G: (33, -11) R: (140, -27)
three_generations G: (56, 12) R: (111, 8)
lady G: (57, -6) R: (123, -17)