# Geometric Transformations¶

## Transforming Points¶

Geometric transformations do exactly what the name suggests. You are familiar with the rotation matrix, which rotates a 2D vector counterclockwise through an angle \(\theta\):

\[\begin{bmatrix}x'\\y'\end{bmatrix} = \begin{bmatrix}\cos\theta & -\sin\theta\\ \sin\theta & \cos\theta\end{bmatrix}\begin{bmatrix}x\\y\end{bmatrix}\]

One transformation that can’t be expressed as such a matrix multiplication is translation (simply moving vectors in their entirety). We are used to expressing it with vector addition:

\[\begin{bmatrix}x'\\y'\end{bmatrix} = \begin{bmatrix}x\\y\end{bmatrix} + \begin{bmatrix}t_x\\t_y\end{bmatrix}\]

Still, we really want to use a matrix multiplication, because our graphics cards are great at them. The solution is to use homogeneous coordinates. Here’s how that works: we append a third coordinate \(w = 1\) to every point, and to go back we divide by \(w\) and drop it:

\[\begin{bmatrix}x\\y\end{bmatrix} \rightarrow \begin{bmatrix}x\\y\\1\end{bmatrix}, \qquad \begin{bmatrix}x\\y\\w\end{bmatrix} \rightarrow \begin{bmatrix}x/w\\y/w\end{bmatrix}\]

Now we can assemble a matrix in homogeneous coordinates that can translate:

\[\begin{bmatrix}1&0&t_x\\0&1&t_y\\0&0&1\end{bmatrix}\begin{bmatrix}x\\y\\1\end{bmatrix} = \begin{bmatrix}x+t_x\\y+t_y\\1\end{bmatrix}\]

We could fit the rotation in as well, giving us 3 degrees of freedom – the angle \(\theta\) and the translation distances \(t_x\) and \(t_y\) along both axes. This is called a Rigid Body Motion (RBM):

\[\begin{bmatrix}\cos\theta & -\sin\theta & t_x\\ \sin\theta & \cos\theta & t_y\\ 0&0&1\end{bmatrix}\]

There are more named transformations, like the affine transformation, which has six degrees of freedom, and the projective transformation, which has eight degrees of freedom (nine matrix entries, but the matrix is only defined up to scale):

\[\text{affine: } \begin{bmatrix}a&b&c\\d&e&f\\0&0&1\end{bmatrix}, \qquad \text{projective: } \begin{bmatrix}a&b&c\\d&e&f\\g&h&i\end{bmatrix}\]

```
In [1]: import numpy as np

In [2]: from utils import e2h, h2e, plot_polygon_transformations

In [3]: square = np.array([[0, 0], [1, 0], [1, 1], [0, 1]])

# Define transformation matrix for a Rigid Body Motion
In [4]: theta = (1/4) * np.pi

In [5]: RBM = np.array([[np.cos(theta), -np.sin(theta), 1],
   ...:                 [np.sin(theta),  np.cos(theta), 1.5],
   ...:                 [0, 0, 1]])

# Affine transformation
In [6]: Af = np.array([[1, 0.7, 2],
   ...:                [0.2, 0.8, 1],
   ...:                [0, 0, 1]])

# Projective transformation
In [7]: Pr = np.array([[1, 2, 3],
   ...:                [0.9, 0.85, 4],
   ...:                [0.05, 0.45, 1]])

In [8]: hom_square = e2h(square.T)

In [9]: rbm_square = h2e(RBM @ hom_square).T

In [10]: af_square = h2e(Af @ hom_square).T

In [11]: pr_square = h2e(Pr @ hom_square).T

In [12]: plot_polygon_transformations(
   ....:     [square, rbm_square, af_square, pr_square],
   ....:     ['start', 'rbm', 'affine', 'projective']);
```
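The `e2h` and `h2e` helpers used above convert between Euclidean and homogeneous coordinates. A minimal sketch of what they might look like (the real `utils` module may differ):

```python
import numpy as np

def e2h(points):
    """Euclidean -> homogeneous: append a row of ones.

    `points` is a 2 x N array of column-vector points.
    """
    return np.vstack([points, np.ones(points.shape[1])])

def h2e(points):
    """Homogeneous -> Euclidean: divide by the last row and drop it."""
    return points[:-1] / points[-1]
```

Dividing by the last coordinate in `h2e` is what makes projective transformations (where \(w \neq 1\) after the multiplication) come out right; for RBMs and affine transformations the last row stays \(1\) and the division is a no-op.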

## Transforming Lines¶

We can see lines as collections of points and transform them that way, but it would be nice to think of them a little more abstractly. The general equation of a line is:

\[l_1 x + l_2 y + l_3 = 0\]

Here \(\begin{bmatrix}l_1\\l_2\end{bmatrix}\) is the normal vector to the line and \(l_3\) determines the distance from the origin to the line (equal to \(|l_3|\) when the normal has unit length). In homogeneous coordinates we can nicely express this entirely in one matrix product:

\[\v l \, \hv x = \begin{bmatrix}l_1 & l_2 & l_3\end{bmatrix}\begin{bmatrix}x\\y\\1\end{bmatrix} = 0\]

Now we can convey all the information about a line in homogeneous coordinates in one vector. Note: it is common practice to denote lines with row vectors and points with column vectors. This makes it easy to distinguish one from the other.

It turns out transforming lines in homogeneous coordinates is not as straightforward as transforming points. If \(M\) is some transformation matrix, \(\hv x\) is a point, \(\v l\) is a line, and an accent (\('\)) signifies their transformed versions:

\[\hv x' = M \hv x, \qquad \v l' = \v l M^{-1}\]
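We can also check the rule \(\v l' = \v l M^{-1}\) numerically; the line, point, and transformation below are arbitrary examples chosen for this sketch:

```python
import numpy as np

# A line l (row vector) and a homogeneous point x on it:
# x + y - 2 = 0 contains the point (1, 1).
l = np.array([1.0, 1.0, -2.0])
x = np.array([1.0, 1.0, 1.0])
assert np.isclose(l @ x, 0.0)  # x lies on l

# An arbitrary invertible transformation (a rotation plus a translation).
theta = np.pi / 3
M = np.array([[np.cos(theta), -np.sin(theta), 2.0],
              [np.sin(theta),  np.cos(theta), 0.5],
              [0.0, 0.0, 1.0]])

x_prime = M @ x                  # transform the point: x' = M x
l_prime = l @ np.linalg.inv(M)   # transform the line:  l' = l M^{-1}

# The transformed point still lies on the transformed line.
assert np.isclose(l_prime @ x_prime, 0.0)
```

The final assertion works because \(\v l' \hv x' = \v l M^{-1} M \hv x = \v l \hv x = 0\).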

## Proof

Any point \(\hv x\) on the line satisfies \(\v l \hv x = 0\). For the transformed point \(\hv x' = M \hv x\) we then have

\[\v l' \hv x' = \v l M^{-1} M \hv x = \v l \hv x = 0,\]

so \(\v l' = \v l M^{-1}\) is indeed the line through all the transformed points.

## Estimating a Transformation¶

### Affine¶

What if we have some original points and transformed versions of those points, and we are looking for the matching transformation matrix? Let’s start with an affine transformation:

\[\begin{bmatrix}x'_1\\y'_1\\1\end{bmatrix} = \begin{bmatrix}a&b&c\\d&e&f\\0&0&1\end{bmatrix}\begin{bmatrix}x_1\\y_1\\1\end{bmatrix}\]

And do some rewriting:

\[\begin{bmatrix}x'_1\\y'_1\end{bmatrix} = \begin{bmatrix}x_1&y_1&1&0&0&0\\0&0&0&x_1&y_1&1\end{bmatrix}\begin{bmatrix}a\\b\\c\\d\\e\\f\end{bmatrix}\]

This is more familiar territory. Now we can see that if we invert the first matrix after the equals sign (let’s call it \(A\)), we can solve for our missing variables \(\{a, b, \dots, f\}\). But we are trying to reverse-engineer six degrees of freedom from how *one* point moved, so we need more before-and-after point correspondences. With three of them:

\[\begin{bmatrix}x'_1\\y'_1\\x'_2\\y'_2\\x'_3\\y'_3\end{bmatrix} = \begin{bmatrix}x_1&y_1&1&0&0&0\\0&0&0&x_1&y_1&1\\x_2&y_2&1&0&0&0\\0&0&0&x_2&y_2&1\\x_3&y_3&1&0&0&0\\0&0&0&x_3&y_3&1\end{bmatrix}\begin{bmatrix}a\\b\\c\\d\\e\\f\end{bmatrix}\]

If we invert \(A\) (the largest matrix here) we should have the transformation down to a T! But what if we don’t have the point correspondences *exactly* correct? Maybe we are making some kind of paper scanning application where we try to detect the corners of the piece of paper in view and transform it into a tidy scan. It’s quite possible that the corners we found are off by a little bit. We should use the pseudo-inverse, which gives the least-squares solution:

\[\begin{bmatrix}a\\b\\c\\d\\e\\f\end{bmatrix} = A^+ \begin{bmatrix}x'_1\\y'_1\\\vdots\\y'_3\end{bmatrix}, \qquad A^+ = (A^T A)^{-1} A^T\]

Using this method we may also add more points in hopes of getting a more accurate approximation. Of course, if these added point correspondences are not that accurate, they will only dilute the pool of points, resulting in a worse estimate of the transformation.
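A sketch of this estimation in NumPy; `estimate_affine` is a hypothetical helper and the correspondences are synthetic (`np.linalg.lstsq` computes the pseudo-inverse solution for us):

```python
import numpy as np

def estimate_affine(src, dst):
    """Least-squares estimate of the affine matrix mapping src to dst.

    src, dst: N x 2 arrays of corresponding points (N >= 3).
    """
    A = np.zeros((2 * len(src), 6))
    rhs = dst.reshape(-1)  # [x'_1, y'_1, x'_2, y'_2, ...]
    for i, (x, y) in enumerate(src):
        A[2 * i]     = [x, y, 1, 0, 0, 0]
        A[2 * i + 1] = [0, 0, 0, x, y, 1]
    params, *_ = np.linalg.lstsq(A, rhs, rcond=None)
    a, b, c, d, e, f = params
    return np.array([[a, b, c], [d, e, f], [0, 0, 1]])

# Example: recover a known affine transformation from 4 correspondences.
Af = np.array([[1, 0.7, 2], [0.2, 0.8, 1], [0, 0, 1]])
src = np.array([[0, 0], [1, 0], [1, 1], [0, 1]], dtype=float)
dst = (Af @ np.vstack([src.T, np.ones(4)]))[:2].T
assert np.allclose(estimate_affine(src, dst), Af)
```

With exact correspondences the residual is zero and the original matrix is recovered exactly; with noisy corners the same call returns the least-squares fit.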

### Projective¶

Reverse-engineering a projective transformation is a little more complicated, because the component we are dividing by when going back from homogeneous to Euclidean coordinates is almost certainly not \(1\). If we apply the same rewriting as we did for the affine transformation we have an extra factor \(\lambda\) to account for:

\[\lambda \begin{bmatrix}x'_1\\y'_1\\1\end{bmatrix} = \begin{bmatrix}a&b&c\\d&e&f\\g&h&i\end{bmatrix}\begin{bmatrix}x_1\\y_1\\1\end{bmatrix}\]

This equation shows \(\lambda\) is equal to \(gx_1 + hy_1 + i\), so we have:

\[x'_1 = \frac{a x_1 + b y_1 + c}{g x_1 + h y_1 + i}, \qquad y'_1 = \frac{d x_1 + e y_1 + f}{g x_1 + h y_1 + i}\]

Some rewriting later:

\[\begin{bmatrix}x_1&y_1&1&0&0&0&-x'_1 x_1&-x'_1 y_1&-x'_1\\0&0&0&x_1&y_1&1&-y'_1 x_1&-y'_1 y_1&-y'_1\end{bmatrix}\begin{bmatrix}a\\b\\c\\d\\e\\f\\g\\h\\i\end{bmatrix} = \begin{bmatrix}0\\0\end{bmatrix}\]

## Proof

Multiplying both equations by the denominator and moving everything to one side:

\[\begin{aligned}
x'_1 (g x_1 + h y_1 + i) &= a x_1 + b y_1 + c\\
a x_1 + b y_1 + c - x'_1 x_1 g - x'_1 y_1 h - x'_1 i &= 0\\
y'_1 (g x_1 + h y_1 + i) &= d x_1 + e y_1 + f\\
d x_1 + e y_1 + f - y'_1 x_1 g - y'_1 y_1 h - y'_1 i &= 0
\end{aligned}\]

The matrix form above equals the combination of lines two and four here.

Again adding more point correspondences (at least four) so that we can solve the system of equations:

\[\begin{bmatrix}x_1&y_1&1&0&0&0&-x'_1 x_1&-x'_1 y_1&-x'_1\\0&0&0&x_1&y_1&1&-y'_1 x_1&-y'_1 y_1&-y'_1\\&&&&\vdots&&&&\\x_n&y_n&1&0&0&0&-x'_n x_n&-x'_n y_n&-x'_n\\0&0&0&x_n&y_n&1&-y'_n x_n&-y'_n y_n&-y'_n\end{bmatrix}\begin{bmatrix}a\\b\\c\\d\\e\\f\\g\\h\\i\end{bmatrix} = \v 0\]

If we write this down as \(\v A \v p=\v 0\), it is clear that \(\v p\) (our *parameter* vector that we are looking for) is in \(\v A\)’s null space! Of course the trivial solution \(\v p = \v 0\) doesn’t make any sense, and we may again be dealing with some point correspondences that are not quite perfect. So let’s say we’re not looking for a \(\v p\) that makes \(\v A \v p\) exactly \(\v 0\), but one that makes the norm \(\|\v A \v p\|\) as close to \(0\) as possible.

The trick is to use \(\v A\)’s Singular Value Decomposition: \(\v A = \v U \v \Sigma \v V^T\). It turns out the best solution for \(\v p\) is the last column of \(\v V\). When programming this, don’t forget you need to reshape this column vector of parameters to be in a \(3 \times 3\) shape.
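A sketch of this recipe; `estimate_projective` is a hypothetical helper and the correspondences are synthetic:

```python
import numpy as np

def estimate_projective(src, dst):
    """Estimate the 3x3 projective matrix mapping src to dst (N >= 4 points).

    src, dst: N x 2 arrays of corresponding points. The result is only
    determined up to scale.
    """
    rows = []
    for (x, y), (xp, yp) in zip(src, dst):
        rows.append([x, y, 1, 0, 0, 0, -xp * x, -xp * y, -xp])
        rows.append([0, 0, 0, x, y, 1, -yp * x, -yp * y, -yp])
    A = np.array(rows)
    _, _, Vt = np.linalg.svd(A)
    # The best parameter vector is the last column of V, i.e. the last
    # row of V^T; reshape it into the 3x3 transformation matrix.
    return Vt[-1].reshape(3, 3)

# Example: recover a known projective transformation from 4 correspondences.
Pr = np.array([[1, 2, 3], [0.9, 0.85, 4], [0.05, 0.45, 1]])
src = np.array([[0, 0], [1, 0], [1, 1], [0, 1]], dtype=float)
hom = Pr @ np.vstack([src.T, np.ones(4)])
dst = (hom[:2] / hom[2]).T          # back to Euclidean coordinates
P = estimate_projective(src, dst)
P = P / P[2, 2]                     # fix the arbitrary scale (Pr has i = 1)
assert np.allclose(P, Pr)
```

Note the final normalization: the SVD returns a unit-length \(\v p\) with an arbitrary sign, so the estimate only matches the original matrix after fixing the scale.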

## Proof: why \(\v p\) is the last column of \(\v V\)

Let’s just say we are looking for a \(\v p\) of unit length, so that we are not going to find \(\v 0\) as a solution. The primary objective is to minimize \(\| \v A \v p \|\), so that we get as close as possible to the ideal \(\v A \v p = \v 0\). Substituting the SVD, and using that multiplying by the orthogonal matrix \(\v U\) does not change a vector’s norm:

\[\| \v A \v p \| = \| \v U \v \Sigma \v V^T \v p \| = \| \v \Sigma \v V^T \v p \| = \| \v \Sigma \v n \|, \qquad \text{with } \v n = \v V^T \v p \text{ and } \|\v n\| = \|\v p\| = 1\]

Now we need to use the fact that \(\v \Sigma\) is a diagonal matrix with the singular values on the diagonal, sorted from large to small. So if we want to minimize \(\| \v \Sigma \v n \|\) we should simply set \(\v n\) to \(\begin{bmatrix}0&0&\dots&0&1\end{bmatrix}^T\) to end up with only the smallest value from \(\v \Sigma\). We can fill in:

\[\v p = \v V \v n = \v V \begin{bmatrix}0\\0\\\vdots\\0\\1\end{bmatrix} = \text{the last column of } \v V\]