Geometric Transformations

Transforming Points

Geometric transformations do exactly what the name suggests. You are familiar with the rotation matrix, which rotates a 2D vector counterclockwise through an angle \(\theta\):

\[\begin{bmatrix} \cos \theta & - \sin \theta\\ \sin \theta & \cos \theta \end{bmatrix}\]
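For concreteness, a minimal numpy check (the `rotation` helper is just for this example):

```python
import numpy as np

def rotation(theta):
    """2D rotation matrix for a counterclockwise rotation by theta (radians)."""
    return np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta),  np.cos(theta)]])

# Rotating the unit vector along the x-axis by 90 degrees lands on the y-axis.
v = np.array([1.0, 0.0])
print(rotation(np.pi / 2) @ v)  # approximately [0, 1]
```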

One transformation that can’t be expressed as a \(2 \times 2\) matrix multiplication is translation (moving every point by the same offset). We are used to expressing it with vector addition:

\[\v x' = \v x + \v t\]

Still, we really want to use a matrix multiplication because our graphics cards are great at them. The solution is to use homogeneous coordinates. Here’s how that works:

\[\begin{aligned} \text{Euclidean to homogeneous } \v x &= \begin{bmatrix}x_1\\x_2\end{bmatrix} \longrightarrow \hv x = \begin{bmatrix}x_1\\x_2\\1\end{bmatrix}\\ \text{Homogeneous to Euclidean } \hv x &= \begin{bmatrix}x_1\\x_2\\a\end{bmatrix} \longrightarrow \v x = \begin{bmatrix}x_1/a\\ x_2/a\end{bmatrix} \end{aligned}\]
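The `e2h` and `h2e` helpers imported from `utils` in the session below are not shown in this text; a minimal sketch of what they could look like (assuming points are stored as columns) is:

```python
import numpy as np

def e2h(x):
    """Euclidean -> homogeneous: append a row of ones (points as columns)."""
    x = np.atleast_2d(x)
    return np.vstack([x, np.ones((1, x.shape[1]))])

def h2e(xh):
    """Homogeneous -> Euclidean: divide by the last row, then drop it."""
    return xh[:-1] / xh[-1]

p = np.array([[3.0], [4.0]])
print(h2e(e2h(p)))  # round-trips back to [[3.], [4.]]
```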

Now we can assemble a matrix in homogeneous coordinates that can translate.

\[\begin{bmatrix}1&0&t_1\\0&1&t_2\\0&0&1\end{bmatrix} \begin{bmatrix}x_1\\x_2\\1\end{bmatrix} = \begin{bmatrix}x_1 + t_1\\x_2 + t_2\\1\end{bmatrix}\]
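A quick numeric check of the translation matrix (a sketch with numpy; the offset values are arbitrary):

```python
import numpy as np

t1, t2 = 2.0, -1.0
T = np.array([[1, 0, t1],
              [0, 1, t2],
              [0, 0, 1]])

x = np.array([3.0, 4.0, 1.0])  # the point (3, 4) in homogeneous coordinates
print(T @ x)  # [5., 3., 1.] -- the point translated by (2, -1)
```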

We could fit the rotation in as well, giving us 3 degrees of freedom – the angle \(\theta\) and the distance of translation along each axis. This is called a Rigid Body Motion (RBM).

\[\begin{bmatrix} \cos \theta & -\sin \theta & t_1\\ \sin \theta & \cos \theta & t_2\\ 0 & 0 & 1 \end{bmatrix}\]

There are more named transformations, like the affine transformation, which has six degrees of freedom, and the projective transformation, which has eight degrees of freedom.

\[\begin{aligned} &\text{Rotation} &\begin{bmatrix} \cos \theta & -\sin \theta & 0\\ \sin \theta & \cos \theta & 0\\ 0 & 0 & 1 \end{bmatrix}\\ &\text{Translation} &\begin{bmatrix}1&0&t_1\\0&1&t_2\\0&0&1\end{bmatrix}\\ &\text{Rigid Body Motion} &\begin{bmatrix} \cos \theta & -\sin \theta & t_1\\ \sin \theta & \cos \theta & t_2\\ 0 & 0 & 1 \end{bmatrix}\\ &\text{Affine} &\begin{bmatrix}a&b&c\\d&e&f\\0&0&1\end{bmatrix}\\ &\text{Projective} &\begin{bmatrix}a&b&c\\d&e&f\\g&h&1\end{bmatrix} \end{aligned}\]
In [1]: import numpy as np
   ...: from utils import e2h, h2e, plot_polygon_transformations

In [2]: square = np.array([[0,0],[1,0],[1,1],[0,1]])

# Define transformation matrix for a Rigid Body Motion
In [3]: theta = (1/4)*np.pi

In [4]: RBM = np.array([[np.cos(theta), -np.sin(theta), 1],
   ...:                 [np.sin(theta), np.cos(theta), 1.5],
   ...:                 [0,0,1]])
   ...: 

# Affine transformation
In [5]: Af = np.array([[1,0.7,2],
   ...:                 [0.2,0.8,1],
   ...:                 [0,0,1]])
   ...: 

# Projective transformation
In [6]: Pr = np.array([[1,2,3],
   ...:                 [0.9,0.85,4],
   ...:                 [0.05,0.45,1]])
   ...: 

In [7]: hom_square = e2h(square.T)

In [8]: rbm_square = h2e((RBM @ hom_square)).T

In [9]: af_square = h2e((Af @ hom_square)).T

In [10]: pr_square = h2e((Pr @ hom_square)).T

In [11]: plot_polygon_transformations(
   ....:     [square, rbm_square, af_square, pr_square],
   ....:     ['start', 'rbm', 'affine', 'projective']);
   ....: 
[figure ../_images/transformations.png: the square and its RBM, affine, and projective transformations]

Transforming Lines

We can see lines as collections of points and transform them that way, but it would be nice to think of them a little more abstractly. The general equation of a line is:

\[l_1 x_1 + l_2 x_2 + l_3 = 0 \]

Here \(\begin{bmatrix}l_1\\l_2\end{bmatrix}\) is the normal vector to the line, and \(l_3\) determines the line’s distance from the origin: that distance is \(|l_3| / \sqrt{l_1^2 + l_2^2}\), so if the normal has unit length it is simply \(|l_3|\). In homogeneous coordinates we can nicely express this entirely in one matrix product:

\[\begin{bmatrix}l_1&l_2&l_3\end{bmatrix} \begin{bmatrix}x_1\\x_2\\1\end{bmatrix} = 0\\ \v l^T \hv x = 0\\\]

Now we can convey all the information about a line in homogeneous coordinates in one vector. Note: it is common practice to denote lines with row vectors and points with column vectors. This makes it easy to distinguish one from the other.

It turns out transforming lines in homogeneous coordinates is not as straightforward as transforming points. If \(M\) is some transformation matrix, \(\hv x\) a point, \(\v l\) a line, and a prime (\('\)) marks their transformed versions:

\[\begin{aligned} \hv x' &= M \hv x\\ \v l' &= (M^{-1})^T \v l \end{aligned}\]
Proof
\[\begin{aligned} \v l'^T \hv x' &= 0\\ \v l'^T M \hv x &= 0\\ \left(M^T \v l' \right)^T \hv x &= 0 \quad \text{using } (AB)^T = B^T A^T\\ \v l^T \hv x &= (M^T \v l')^T \hv x\\ \v l^T &= (M^T \v l')^T\\ \v l &= M^T \v l'\\ (M^T)^{-1} \v l &= \v l'\\ (M^{-1})^T \v l &= \v l'\\ \end{aligned}\]
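We can also sanity-check the rule numerically: pick a point on a line, transform both with some matrix, and verify the transformed point still lies on the transformed line (a sketch with numpy; the matrix values are arbitrary):

```python
import numpy as np

M = np.array([[1.0, 0.7, 2.0],
              [0.2, 0.8, 1.0],
              [0.0, 0.0, 1.0]])

l = np.array([1.0, -1.0, 0.0])  # the line x1 - x2 = 0
x = np.array([2.0, 2.0, 1.0])   # homogeneous point on that line
print(l @ x)                    # 0.0: the point lies on the line

x_new = M @ x
l_new = np.linalg.inv(M).T @ l  # lines transform with the inverse transpose
print(l_new @ x_new)            # still (numerically) 0
```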

Estimating a Transformation

Affine

What if we have some origin points and transformed versions of those points, and we are looking for the matching transformation matrix? Let’s start with an affine transformation.

\[\begin{bmatrix}x_1'\\y_1'\\1\end{bmatrix} = \begin{bmatrix}a&b&c\\d&e&f\\0&0&1\end{bmatrix}\begin{bmatrix}x_1\\y_1\\1\end{bmatrix}\\\]

And do some rewriting:

\[\begin{bmatrix}x_1'\\y_1'\end{bmatrix} = \begin{bmatrix}x_1&y_1&1&0&0&0\\0&0&0&x_1&y_1&1\end{bmatrix} \begin{bmatrix}a\\b\\c\\d\\e\\f\end{bmatrix}\]

This is more familiar territory. Now we can see that if we invert the matrix right after the equals sign (let’s call it \(A\)), we can solve for our missing parameters \(\{a, b, \dots, f\}\). But \(A\) is \(2 \times 6\), so it is not even square: we are trying to reverse-engineer six degrees of freedom from how a single point moved. We need more before-and-after point correspondences.

\[\begin{bmatrix}x_1'\\y_1'\\x'_2\\y_2'\\x_3'\\y_3'\end{bmatrix} = \begin{bmatrix} x_1&y_1&1&0&0&0\\ 0&0&0&x_1&y_1&1\\ x_2&y_2&1&0&0&0\\ 0&0&0&x_2&y_2&1\\ x_3&y_3&1&0&0&0\\ 0&0&0&x_3&y_3&1\\ \end{bmatrix} \begin{bmatrix}a\\b\\c\\d\\e\\f\end{bmatrix}\]
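With three correspondences the \(6 \times 6\) system can be solved directly. A sketch in numpy, where we first apply a known affine map to three points and then recover its parameters (the names here are illustrative, not from the text):

```python
import numpy as np

# Ground-truth affine parameters a..f, only used to generate test data.
true_params = np.array([1.0, 0.7, 2.0, 0.2, 0.8, 1.0])
a, b, c, d, e, f = true_params

pts = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])  # three source points
pts_t = np.stack([a * pts[:, 0] + b * pts[:, 1] + c,
                  d * pts[:, 0] + e * pts[:, 1] + f], axis=1)

# Build the 6x6 system: two rows per point correspondence.
rows, rhs = [], []
for (x, y), (xp, yp) in zip(pts, pts_t):
    rows.append([x, y, 1, 0, 0, 0])
    rows.append([0, 0, 0, x, y, 1])
    rhs.extend([xp, yp])
A = np.array(rows, dtype=float)

params = np.linalg.solve(A, np.array(rhs))
print(params)  # recovers [1.0, 0.7, 2.0, 0.2, 0.8, 1.0]
```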

If we invert \(A\) (the largest matrix here) we have the transformation down to a T! But what if our point correspondences are not exactly correct? Maybe we are building a paper-scanning application in which we try to detect the corners of the sheet of paper in view and transform it into a tidy scan. It is quite possible that the corners we found are off by a little bit. In that case we should use the pseudo-inverse:

\[(A^T A)^{-1} A^T\]

Using this method we can also add more than three point correspondences, in hopes of a more accurate estimate. Of course, if the added correspondences are themselves inaccurate, they will only dilute the pool of points, resulting in a worse estimate of the transformation.
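With more than three (possibly noisy) correspondences the system is overdetermined; `np.linalg.lstsq` computes the same least-squares solution as the pseudo-inverse. A sketch with four slightly perturbed points (the noise level is made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
true_params = np.array([1.0, 0.7, 2.0, 0.2, 0.8, 1.0])
a, b, c, d, e, f = true_params

pts = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0]])
pts_t = np.stack([a * pts[:, 0] + b * pts[:, 1] + c,
                  d * pts[:, 0] + e * pts[:, 1] + f], axis=1)
pts_t += rng.normal(scale=0.01, size=pts_t.shape)  # simulated detection noise

rows, rhs = [], []
for (x, y), (xp, yp) in zip(pts, pts_t):
    rows.append([x, y, 1, 0, 0, 0])
    rows.append([0, 0, 0, x, y, 1])
    rhs.extend([xp, yp])
A, b_vec = np.array(rows), np.array(rhs)

# Least-squares solution, equivalent to (A^T A)^-1 A^T b_vec.
params, *_ = np.linalg.lstsq(A, b_vec, rcond=None)
print(params)  # close to true_params, up to the noise
```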

Projective

Reverse-engineering a projective transformation is a little more complicated, because the component we divide by when going back from homogeneous to Euclidean coordinates is almost certainly not \(1\). If we apply the same rewriting as for the affine transformation, we have an extra factor \(\lambda\) to account for (here we write the bottom-right entry as \(i\) and deal with the overall scale later):

\[\lambda \begin{bmatrix}x_1'\\y_1'\\1\end{bmatrix} = \begin{bmatrix}a&b&c\\d&e&f\\g&h&i\end{bmatrix} \begin{bmatrix}x_1\\y_1\\1\end{bmatrix}\]

This equation shows \(\lambda\) is equal to \(gx_1 + hy_1 + i\), so we have:

\[\begin{aligned} \left(gx_1 + hy_1 + i\right)x_1' = ax_1 + by_1 + c\\ \left(gx_1 + hy_1 + i\right)y_1' = dx_1 + ey_1 + f\\ \end{aligned}\]

Some rewriting later:

\[\begin{aligned} \begin{bmatrix} x_1 & y_1 & 1 & 0 & 0 & 0 & -x_1' x_1 & -x_1' y_1 & -x_1'\\ 0 & 0 & 0 & x_1 & y_1 & 1 & -y_1' x_1 & -y_1' y_1 & -y_1' \end{bmatrix} \begin{bmatrix} a\\b\\c\\d\\e\\f\\g\\h\\i \end{bmatrix} = \v 0 \end{aligned}\]
Proof
\[\begin{aligned} \left(gx+hy+i\right)x' &= ax + by + c\\ 0 &= ax+by+c-x'xg - x'yh - x'i\\ (gx + hy + i)y' &= dx + ey + f\\ 0 &= dx + ey + f - y'xg - y'yh - y'i\\ \end{aligned}\]

The matrix form above is the combination of lines two and four here.

Again adding more point correspondences so that we can solve the system of equations:

\[\begin{bmatrix} x_1 & y_1 & 1 & 0 & 0 & 0 & -x_1'x_1 & -x_1'y_1 & -x_1'\\ 0 & 0 & 0 & x_1 & y_1 & 1 & -y_1'x_1 & -y_1'y_1 & -y_1'\\ x_2 & y_2 & 1 & 0 & 0 & 0 & -x_2'x_2 & -x_2'y_2 & -x_2'\\ 0 & 0 & 0 & x_2 & y_2 & 1 & -y_2'x_2 & -y_2'y_2 & -y_2'\\ x_3 & y_3 & 1 & 0 & 0 & 0 & -x_3'x_3 & -x_3'y_3 & -x_3'\\ 0 & 0 & 0 & x_3 & y_3 & 1 & -y_3'x_3 & -y_3'y_3 & -y_3'\\ x_4 & y_4 & 1 & 0 & 0 & 0 & -x_4'x_4 & -x_4'y_4 & -x_4'\\ 0 & 0 & 0 & x_4 & y_4 & 1 & -y_4'x_4 & -y_4'y_4 & -y_4' \end{bmatrix} \begin{bmatrix}a\\b\\c\\d\\e\\f\\g\\h\\i\end{bmatrix} = \v 0\]

If we write this down as \(\v A \v p=\v 0\), it is clear that \(\v p\) (the parameter vector we are looking for) lies in \(\v A\)’s null space! Of course the trivial solution \(\v p = \v 0\) doesn’t make any sense, and we may again be dealing with point correspondences that are not quite perfect. So let’s say we are not looking for an exact solution, but for a nonzero vector \(\v p\) for which \(\v A \v p\) has a norm close to \(0\).

The trick is to use \(\v A\)’s Singular Value Decomposition: \(\v A = \v U \v \Sigma \v V^T\). It turns out the best solution for \(\v p\) is the last column of \(\v V\). When programming this, don’t forget you need to reshape this column vector of parameters to be in a \(3 \times 3\) shape.
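A sketch of this SVD-based estimation (often called the Direct Linear Transform) in numpy, recovering a known projective matrix from the four corners of the unit square; the matrix values are arbitrary:

```python
import numpy as np

H_true = np.array([[1.0, 2.0, 3.0],
                   [0.9, 0.85, 4.0],
                   [0.05, 0.45, 1.0]])

def apply_h(H, pts):
    """Apply a homography to Nx2 Euclidean points."""
    ph = np.hstack([pts, np.ones((len(pts), 1))]) @ H.T
    return ph[:, :2] / ph[:, 2:]

pts = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0]])
pts_t = apply_h(H_true, pts)

# Build A: two rows per correspondence, as in the 8x9 system above.
rows = []
for (x, y), (xp, yp) in zip(pts, pts_t):
    rows.append([x, y, 1, 0, 0, 0, -xp * x, -xp * y, -xp])
    rows.append([0, 0, 0, x, y, 1, -yp * x, -yp * y, -yp])
A = np.array(rows)

# The solution is the right-singular vector belonging to the smallest
# singular value, i.e. the last row of Vt (= last column of V).
_, _, Vt = np.linalg.svd(A)
H_est = Vt[-1].reshape(3, 3)
H_est /= H_est[2, 2]  # fix the free scale so the corner entry is 1
print(np.allclose(H_est, H_true))  # True
```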

Proof: why \(\v p\) is the last column of \(\v V\)

Let’s say we are looking for a \(\v p\) of unit length, so that we will not find \(\v 0\) as a solution. The primary objective is to minimize \(|| \v A \v p ||\), getting as close as possible to the ideal \(\v A \v p = \v 0\).

\[\begin{aligned} \min || \v A \v p || \quad &\text{ while } ||\v p|| = 1\\ \min || \v U \v \Sigma \v V^T \v p || \quad &\text{ while } ||\v p|| = 1\\ \min || \v \Sigma \v V^T \v p || \quad &\text{ while } ||\v p|| = 1\\ \text{ (}\v U \text{ is an orthogonal matrix;}&\text{ it does not change length)}\\ \text{let } \v n = \v V^T \v p \text{, then }& \v p = \v V \v n\\ \text{ (because } \v V &\text{ is also orthogonal)}\\ \min || \v \Sigma \v n || &\text{ while } ||\v V \v n|| = 1\\ \min || \v \Sigma \v n || &\text{ while } || \v n || = 1\\ \end{aligned}\]

Now we need to use the fact that \(\v \Sigma\) is a diagonal matrix with the singular values on the diagonal, sorted from large to small. So if we want to minimize \(|| \v \Sigma \v n ||\) we should simply set \(\v n\) to \(\begin{bmatrix}0&0&\dots&0&1\end{bmatrix}^T\), keeping only the smallest singular value from \(\v \Sigma\). We can fill in:

\[\begin{aligned} \v n &= \begin{bmatrix}0&0&\dots&0&1\end{bmatrix}^T\\ \v p &= \v V \v n = \v V \begin{bmatrix}0\\0\\ \vdots\\0\\1\end{bmatrix}\\ \text{i.e. } \v p &= \text{the last column of } \v V \end{aligned}\]