Improved Transformation Invariant Reverse Image Search

Introduction

This is an improvement on a previous algorithm I worked on for reverse image search of transformed images, which can be seen here. The methodology is based on finding and extracting image fragments and then applying a transformation to those fragments to enable constant-time matching of transformed images.

I am publishing this work because it is an interesting and novel alternative approach to carrying out this type of search. The algorithm would require significant additional work before being production-ready but in limited testing it was able to correctly match some transformed images which existing methods could not. Overall I think the algorithm has the potential, in future more developed iterations, to enhance reverse image search. Even if this does not prove the case, the solution will hopefully be of interest for its originality and analytical character.

Compared to the previous algorithm, the main improvement in the new algorithm is efficiency. The running time of the previous algorithm was $O(n^3)$ with respect to the size/complexity of the image, so it ran extremely slowly on large, complex images. The new algorithm scales linearly, $O(n)$, with respect to the size/complexity of the image.

Below I first give an overview of how the entire algorithm works. I then focus on a core element of the algorithm: how it applies transformations to image fragments to enable image matching. In this regard I give a summary of the transformation process and then outline the mathematics involved.

Overview

Reverse image search is a brilliantly useful tool. It is also a frustrating tool. It frequently fails to recognize slightly edited, cropped, skewed or rotated images.

This algorithm solves this problem by defining shapes in an image based on connected edges and then splitting the image into the image fragments bounded by those shapes. Using some maths, we find a common transformation for similar shapes across all images. Transforming a shape, of course, also transforms the image fragment it encloses. Our algorithm then hashes these fragments, allowing us to perform partial image matching in constant time.

Figure 1. (A) and (B): the two transformed images we want to match; (C) and (D): a fragment extracted from each image; (E) and (F): the same fragments after the common transformation has been applied.

For example, in Figure 1 images (A) and (B) are the images we hope to match using our algorithm. Traditional reverse image search will fail to match these images because they have been transformed. We match them by first dividing up the images into shapes by finding connected edges. (Figure 2 below visually represents an algorithmic search for connected edges in an image). The shapes enclose image fragments, which we then extract.

In this example, images (C) and (D) show one of the fragments extracted from each image. The fragment in this case is the polygon drawn in blue and is shown after we have found a common rotation and scale.

To find a common scale, we simply scale to some constant. We find a common rotation by hashing the fragment 360 times with a 1 degree change in rotation each time. I recognise this way of finding a rotation is somewhat crude: ideally the algorithm would be able to select an appropriate rotation without cycling through so many options. This is clearly an area for possible improvement but the current solution, while inelegant, still works quite well.
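As a rough sketch of this rotation step (assuming the Pillow and imagehash libraries, which the article does not name), the fragment can simply be hashed at every whole-degree rotation:

from PIL import Image
import imagehash

def rotation_hashes(fragment: Image.Image) -> list:
    # Perceptually hash the fragment at every whole-degree rotation, so a rotated
    # copy of the same fragment should collide with one of these hashes
    # (to within the tolerance of the perceptual hash).
    hashes = []
    for angle in range(360):
        rotated = fragment.rotate(angle, expand=True)  # expand keeps the whole fragment in frame
        hashes.append(imagehash.phash(rotated))
    return hashes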

Finding a common transformation is more complicated than finding a common rotation and scale, so I give a more in-depth explanation, including the maths, below.

First we take the edge image using Canny edge detection.

Then we connect the edges to find contours using OpenCV's findContours function.

Then, for each contour, we take the convex hull, transform it to the common transformation and hash the area around it. (A rough code sketch of this pipeline is given below the figure.)

Figure 2.
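As a rough illustration of those three steps (my own sketch, not the article's implementation; the Canny thresholds are placeholders), the fragment-extraction stage could look something like this in Python with OpenCV:

import cv2
import numpy as np

def extract_fragments(image: np.ndarray) -> list[np.ndarray]:
    # Find closed shapes via edges and contours, then cut out the image
    # fragment bounded by each shape's convex hull.
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    edges = cv2.Canny(gray, 100, 200)  # placeholder thresholds
    contours, _ = cv2.findContours(edges, cv2.RETR_LIST, cv2.CHAIN_APPROX_SIMPLE)

    fragments = []
    for contour in contours:
        hull = cv2.convexHull(contour)
        # Mask out everything outside the hull, then crop to its bounding box
        mask = np.zeros(gray.shape, dtype=np.uint8)
        cv2.fillConvexPoly(mask, hull, 255)
        masked = cv2.bitwise_and(image, image, mask=mask)
        x, y, w, h = cv2.boundingRect(hull)
        fragments.append(masked[y:y + h, x:x + w])
    return fragments

Each fragment returned here would then be scaled, rotated and transformed as described in the rest of the article before being hashed.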

Finding a common transformation

Intuitively, it is difficult to determine whether (C) and (D) are the same image. However, after our algorithm has applied a transformation to both, we end up with images (E) and (F); it is now easy to see that the images are the same.

To actually go from images (C) to (E) and (D) to (F) in Figure 1 we need to agree on some common way to transform both image fragments (a 'common transformation') so we can compare them and decide if they are a match. However, unlike in the simplified example shown above, our algorithm must avoid comparing images directly when computing a common transformation. This is so the running time of the algorithm does not scale linearly with the number of images in our database. Once we have found a common transformation, we can hash the transformed fragments and then make use of traditional reverse image search techniques to perform partial image matching with our database images in constant time.
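For the constant-time matching step, one simple way to picture the index (a toy sketch that assumes exact hash equality rather than a Hamming-distance search) is an ordinary hash table keyed by the fragment hash:

# fragment hash -> ids of the database images containing that fragment
index: dict[str, set[str]] = {}

def add_fragment(fragment_hash: str, image_id: str) -> None:
    index.setdefault(fragment_hash, set()).add(image_id)

def lookup(fragment_hash: str) -> set[str]:
    # Average O(1): the cost does not grow with the number of database images.
    return index.get(fragment_hash, set())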

To find a common transformation between two visually similar images without directly comparing them, our algorithm uses the properties of the shapes we extract from the images. We do this in such a way that visually similar fragments, even ones which have been subjected to transformations of various types, will be transformed again (by the algorithm) into the same final form.

This is shown in the demo below.

We have extracted the same fragment from the two visually similar images on the left. The fragments only differ by a transformation; they are otherwise identical. We have derived a function for the transformations we can apply to this fragment. The local minimum for this function will always give us the same fragment. You can see this in the demo: move the orange circle to the local minimum and the two image fragments will be identical.

Figure 3. Interactive demo: click and drag on the images to transform them. At the local minimum of the plotted function the two fragments become identical and produce the same perceptual hash (pHash FDEA9660CD986640, Hamming distance 0).

Finding a Common Transformation: The Maths

I will now describe how we derive the function plotted in Figure 3 to find a common transformation for image fragments. It is important to note at the outset that it is not feasible to handle all possible transformations that can be applied to an image; some transformations are simply too destructive. For this algorithm we have only dealt with 2D affine transformations. These include a wide range of changes to images: rotation, translation, scaling, skewing and any combination of these. 2D affine transformations are easy to work with mathematically and, because they preserve lines and parallelism, we can guarantee the transformations aren't so destructive that we can't find a common transformation between two images. We can see examples of 2D affine transformations below.

$$f(a, b, c, d) = [\textnormal{Input Shape}] \begin{bmatrix} a & b\\ c & d \end{bmatrix} = \underbrace{\begin{bmatrix} 1 & 1\\ -1 & 1\\ -1 & -1\\ 1 & -1 \end{bmatrix}}_{\textnormal{Input x, y coordinates}} \begin{bmatrix} 1 & 1\\ -1 & 1 \end{bmatrix} = \underbrace{\begin{bmatrix} 0 & 2\\ -2 & 0\\ 0 & -2\\ 2 & 0 \end{bmatrix}}_{\textnormal{Output x, y coordinates}}$$
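To make the example concrete, here is a small check (using NumPy, which the article itself does not mention) that applies the same matrix to the four corners of the square:

import numpy as np

shape = np.array([[1, 1], [-1, 1], [-1, -1], [1, -1]])  # one point per row
M = np.array([[1, 1], [-1, 1]])                          # the example matrix above

print(shape @ M)
# [[ 0  2]
#  [-2  0]
#  [ 0 -2]
#  [ 2  0]]   -> the square rotated by 45 degrees and scaled by sqrt(2)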

To reduce the number of transformations we have to handle, we will deal with rotation and scale separately. This allows us to reduce the number of parameters in our 2D affine transformation matrix from 4 parameters:

$$f(a, b, c, d) = \begin{bmatrix} a & b\\ c & d \end{bmatrix}$$

to 2 parameters:

$$f(a, b) = \begin{bmatrix} a & b\\ 0 & \frac{1}{a} \end{bmatrix}$$

We can do this by extracting a rotation and scale matrix from our original matrix and then dropping the rotation and uniform-scale parts of the decomposition. We can do this because, as outlined above, we handle rotation and uniform scale separately.

Now we have a matrix which represents all our transformations except uniform scale and rotation:

$$\begin{bmatrix} a & b\\ 0 & \frac{1}{a} \end{bmatrix}$$

Notice how with the new transformation matrix we lose the ability to set the rotation and the area with changes to the input parameters: its determinant is always $a \cdot \frac{1}{a} - b \cdot 0 = 1$, so it can never uniformly scale the shape. This is what we want because rotation and scaling are handled separately. By removing the ability to control these we have reduced our parameters from 4 to 2.

Now that we have reduced the number of parameters we can derive a function to give us a common transformation. We will find the transformation which minimises the squared distance from the center of our shape to every single point in our shape. We choose this objective because, for any arbitrary shape, it has one easy-to-find global minimum. We'll use the coordinates of that minimum as the parameters for our common transformation.

We will start with the function which sums the squared distance from the center to each point:

$$f(\textnormal{shape}) = \sum_i \left( x_i^2 + y_i^2 \right)$$

Now apply our transformation matrix to each point:

$$\begin{bmatrix} a & b\\ 0 & \frac{1}{a} \end{bmatrix} \underbrace{\begin{bmatrix} x\\ y \end{bmatrix}}_{\textnormal{Input Point}} = \underbrace{\begin{bmatrix} ax + by\\ \frac{y}{a} \end{bmatrix}}_{\textnormal{Output Point}}$$

Substitute in the new values for x and y:

$$f(\textnormal{shape}, a, b) = \sum_i \left[ (ax_i + by_i)^2 + \left(\frac{y_i}{a}\right)^2 \right]$$

Now let's clean this up a little to make it easier to manipulate:

$$f(\textnormal{shape}, a, b) = \sum_i \left[ a^2x_i^2 + 2abx_iy_i + b^2y_i^2 + \frac{y_i^2}{a^2} \right]$$
$$f(\textnormal{shape}, a, b) = a^2\sum x_i^2 + 2ab\sum x_iy_i + b^2\sum y_i^2 + \frac{1}{a^2}\sum y_i^2$$
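To make this function concrete, here is a small sketch (the names are my own) that evaluates it for a set of points and a candidate (a, b); evaluating it over a grid of (a, b) values gives a surface like the one plotted below:

import numpy as np

def spread_cost(points: np.ndarray, a: float, b: float) -> float:
    # Sum of squared distances to the origin after applying [[a, b], [0, 1/a]]
    # to each (x, y) point; the points are assumed to be centred on the shape's centre.
    x, y = points[:, 0], points[:, 1]
    return float(a**2 * np.sum(x**2) + 2*a*b*np.sum(x*y)
                 + b**2 * np.sum(y**2) + np.sum(y**2) / a**2)

rect = np.array([[2.0, 1.0], [-2.0, 1.0], [-2.0, -1.0], [2.0, -1.0]])
print(spread_cost(rect, a=1.0, b=0.0))       # ~20.0: the untransformed spread
print(spread_cost(rect, a=0.5**0.5, b=0.0))  # ~16.0: lower, closer to the minimum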

If we plot this function for a square it looks like this:

The bottom axes are a and b in our transformation matrix and the vertical axis is the output of our function.

You can see from the graph above that the minimum of this function corresponds to the transformation that brings every part of the shape as close as possible to its center: rectangles become squares, triangles become equilateral triangles, ovals become circles. So the values of a and b which minimise this function are the values we insert into our 2D transformation matrix to get the common transformation we are looking for.

To minimise the function we first take its derivatives with respect to a and b.

So we take this function from above

$$f(\textnormal{shape}, a, b) = a^2\sum x_i^2 + 2ab\sum x_iy_i + b^2\sum y_i^2 + \frac{1}{a^2}\sum y_i^2$$

and differentiate with respect to a and b

$$\frac{df(\textnormal{shape}, a, b)}{da} = 2a\sum x_i^2 + 2b\sum x_iy_i - \frac{2\sum y_i^2}{a^3}$$
$$\frac{df(\textnormal{shape}, a, b)}{db} = 2a\sum x_iy_i + 2b\sum y_i^2$$

...find the global minimum by setting the derivatives equal to 0...

$$0 = 2a\sum x_iy_i + 2b\sum y_i^2$$

...isolate b...

$$b = -\frac{a\sum x_iy_i}{\sum y_i^2}$$

...substitute in b...

$$0 = 2a\sum x_i^2 + 2\left(-\frac{a\sum x_iy_i}{\sum y_i^2}\right)\sum x_iy_i - \frac{2\sum y_i^2}{a^3}$$

...isolate a...

$$a^4 = \frac{\left(\sum y_i^2\right)^2}{\sum x_i^2 \sum y_i^2 - \left(\sum x_iy_i\right)^2}$$

Our final two equations for a and b are:

$$a = \sqrt[4]{\frac{\left(\sum y_i^2\right)^2}{\sum x_i^2 \sum y_i^2 - \left(\sum x_iy_i\right)^2}}$$
$$b = -\frac{a\sum x_iy_i}{\sum y_i^2}$$

Applying this matrix to our shape gives the shape with our common transformation applied. These are the transformation matrices which, when applied to images (C) and (D) in Figure 1, give us (E) and (F).

The great thing about solving something analytically is that we're left with a short, simple function. Our final pseudocode is:

/**
 * Get the 2 parameters (a, b) of a transformation matrix [[a, b], [0, 1/a]] which will spread
 * our input shape out evenly in all directions
 * @param xs - the sum of the squares of the x values of all the points in our shape
 * @param ys - the sum of the squares of the y values of all the points in our shape
 * @param xy - the sum of the products (x*y) of the x and y values of all the points in our shape
 **/
function get_a(xs, ys, xy) {
  return Math.sqrt(Math.sqrt(Math.pow(ys, 2) / (xs * ys - Math.pow(xy, 2))));
}

function get_b(xs, ys, xy) {
  return get_a(xs, ys, xy) * (-xy / ys);
}
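As a quick sanity check of these formulas (my own example, not from the original write-up), take a 4×2 rectangle centred at the origin with vertices (±2, ±1). Then

$$\sum x_i^2 = 16, \quad \sum y_i^2 = 4, \quad \sum x_iy_i = 0$$

so

$$a = \sqrt[4]{\frac{4^2}{16 \cdot 4 - 0^2}} = \sqrt[4]{\tfrac{1}{4}} = \tfrac{1}{\sqrt{2}}, \qquad b = 0$$

Applying $\begin{bmatrix} a & b\\ 0 & \frac{1}{a} \end{bmatrix}$ sends each vertex $(x, y)$ to $\left(\tfrac{x}{\sqrt{2}}, \sqrt{2}\,y\right)$, so the vertices become $(\pm\sqrt{2}, \pm\sqrt{2})$: the rectangle is squeezed into a square, exactly as described above.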

We now have an algorithm which works on the vertices of a polygon. However, two shapes can be visually similar but have completely different vertices. If we describe the same shape with different vertices, our algorithm will fail to match them. To fix this we will use integration to take the entire area of the shape into account.

Firstly, in order to avoid summing to an infinite value, we need to replace $\sum y_i^2$, $\sum x_i^2$ and $\sum x_iy_i$ in our equations with the average values of $y_i^2$, $x_i^2$ and $x_iy_i$. So our equations for a and b become:

$$a = \sqrt[4]{\frac{\left(\sum y_i^2\right)^2}{\sum x_i^2 \sum y_i^2 - \left(\sum x_iy_i\right)^2}} \;\rightarrow\; a = \sqrt[4]{\frac{\left(\frac{\sum y_i^2}{n}\right)^2}{\frac{\sum x_i^2}{n}\,\frac{\sum y_i^2}{n} - \left(\frac{\sum x_iy_i}{n}\right)^2}}$$

We can do this because we only care about the ratio of these sums and not their actual values: the numerator and the denominator are both products of two sums, so dividing every sum by n divides both by $n^2$ and leaves a unchanged.

To compute the values of $\frac{\sum y_i^2}{n}$, $\frac{\sum x_i^2}{n}$ and $\frac{\sum x_iy_i}{n}$ we break our polygon into segments at each vertex, treating each edge as a line segment, and then combine the averages of the segments. We can use integration on each of these line segments to get the average value of $y_i^2$, $x_i^2$ and $x_iy_i$ over every point under the line. We skip the derivation here but the derived values are:

$$\frac{\sum y_i^2}{n} = \frac{\int_{x_1}^{x_2}(mx+c)^3\,dx}{3\int_{x_1}^{x_2}(mx+c)\,dx}$$

We can use the same equation to get $\frac{\sum x_i^2}{n}$ by rotating the shape 90 degrees. And the final equation for the average of $x_iy_i$ is:

$$\frac{\sum x_iy_i}{n} = \frac{\int_{x_1}^{x_2} x(mx+c)^2\,dx}{2\int_{x_1}^{x_2}(mx+c)\,dx}$$

We then substitute these values into our equations for a and b and solve, which gives us the parameters of our transformation matrix. This common transformation allows us to transform our input images into images we can easily match using traditional reverse image search techniques.
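As a sketch of this final, area-based version (my own simplification: it approximates the area averages by rasterising the shape with OpenCV instead of using the closed-form integrals derived above, and it uses the area centroid as the shape's centre):

import cv2
import numpy as np

def common_transform_params(polygon: np.ndarray, size: int = 512) -> tuple:
    # Estimate (a, b) for [[a, b], [0, 1/a]] using averages of x^2, y^2 and x*y
    # taken over the whole filled area of the polygon, not just its vertices.
    # The polygon's vertices are assumed to fit inside a size x size canvas.
    mask = np.zeros((size, size), dtype=np.uint8)
    cv2.fillPoly(mask, [polygon.reshape(-1, 1, 2).astype(np.int32)], 255)

    ys, xs = np.nonzero(mask)        # coordinates of every pixel inside the shape
    x = xs - xs.mean()               # centre the shape on its area centroid
    y = ys - ys.mean()

    x2, y2, xy = np.mean(x**2), np.mean(y**2), np.mean(x*y)
    a = (y2**2 / (x2*y2 - xy**2)) ** 0.25
    b = -a * xy / y2
    return a, b

The closed-form integrals in the article would replace the rasterised averages here, which would also remove any dependence on the raster resolution.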

Summary

This new algorithm builds on previous work of mine to improve reverse image search for transformed images. This page contains a brief summary of how the algorithm works and a more in-depth discussion of how the algorithm computes common transformations to apply to image fragments to enable matching.

Comments on this piece of work can be sent to murphyt7@tcd.ie