Section 14.2 Digital Image Processing
At its core, a digital image is a matrix where each entry corresponds to a pixel. The value of the entry can represent the intensity of the pixel (in a grayscale image) or the intensity of the red, green, and blue channels (in a color image).
Each entry of the matrix has coordinates \((i,j)\text{,}\) indicating its position in the matrix, with \(i\) representing the row and \(j\) representing the column. We can consider the coordinates \((i,j)\) as a vector and apply the transformation
\begin{equation*}
T(i,j)=(i',j')\text{.}
\end{equation*}
This allows us to manipulate the entire image by transforming each \((i,j)\) coordinate, producing a new image with pixels at position \((i',j')\text{.}\)
It's important to note that in computer science \((0,0)\) is typically located at the top-left corner of the image, with the \(i\)-coordinate increasing downwards and the \(j\)-coordinate increasing to the right. This means that, if we want our output to follow the standard Cartesian coordinate system, our \(x\)-coordinate is the column index \(j\text{,}\) which measures horizontal position, and our \(y\)-coordinate is the row index \(i\text{,}\) which measures vertical position and must increase upwards rather than downwards.
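The convention can be made concrete with a small sketch. It assumes the image is stored as a plain Python list of rows (a stand-in for the matrices used in this section), and the helper names to_cartesian and flip_rows are illustrative only.

```python
def to_cartesian(i, j):
    """Map matrix indices (row i, column j) to Cartesian (x, y).

    Assumes row 0 is drawn at the bottom, so the row index is the height.
    """
    return (j, i)


def flip_rows(image):
    """Convert a top-down image (row 0 at the top) to bottom-up order."""
    return image[::-1]


# A 3 x 3 image written top-down, with one lit pixel in the top-right corner.
top_down = [
    [0, 0, 1],
    [0, 0, 0],
    [0, 0, 0],
]
bottom_up = flip_rows(top_down)

# Collect the Cartesian coordinates of every lit pixel.
lit = [to_cartesian(i, j)
       for i, row in enumerate(bottom_up)
       for j, value in enumerate(row) if value > 0]
print(lit)  # [(2, 2)]
```

After the flip, the top-right pixel has the largest \(x\) and the largest \(y\text{,}\) as it would in a Cartesian plot.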
Subsection 14.2.1 Rotating a Single Point
We start with the simplest possible image: a single black pixel in a \(10 \times 10\) grid of white pixels. The black pixel corresponds to a value of \(1\) and the white pixels correspond to a value of \(0\text{.}\) Note that we pass flip_y=False as an argument to the matrix_plot function to display the image with the origin at the bottom-left.
Before rotating, there is a practical issue to address. Our rotation matrix rotates around the origin \((0, 0)\text{,}\) which can move the transformed pixel outside the bounds of the image entirely. To prevent this, we pad the image: we embed our matrix inside a larger matrix of zeros using block_matrix, giving the transformed pixels room to land.
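As a rough illustration of the padding step, here is a minimal sketch in plain Python. It does not use block_matrix; the helper pad_image, the pixel position \((5, 5)\text{,}\) and the padding of \(10\) on every side are assumptions chosen to be consistent with a \(30 \times 30\) result.

```python
def pad_image(image, left, right, bottom, top):
    """Embed image in a larger grid of zeros (row 0 is the bottom row)."""
    new_width = left + len(image[0]) + right
    padded = [[0] * new_width for _ in range(bottom)]
    for row in image:
        padded.append([0] * left + list(row) + [0] * right)
    padded.extend([0] * new_width for _ in range(top))
    return padded


# A 10 x 10 image with a single lit pixel (position assumed for illustration).
A = [[0] * 10 for _ in range(10)]
A[5][5] = 1

# Ten rows or columns of zeros on every side give a 30 x 30 grid.
AP = pad_image(A, left=10, right=10, bottom=10, top=10)
print(len(AP), len(AP[0]))  # 30 30
print(AP[15][15])           # 1
```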
Our pixel now sits at position \((15, 15)\) inside a \(30 \times 30\) grid. We can now define the rotation matrix:
\begin{equation*}
R(\theta) =
\begin{bmatrix}
\cos(\theta) & -\sin(\theta) \\
\sin(\theta) & \cos(\theta)
\end{bmatrix}
\end{equation*}
For example, the transformation matrix for a counterclockwise rotation of \(45^\circ\) is \(R(\frac{\pi}{4})\text{.}\)
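A minimal sketch of this matrix as a plain Python function follows. The name apply is a hypothetical helper for multiplying a \(2 \times 2\) matrix by a vector; it is not part of the worksheet.

```python
import math


def R(theta):
    """Return the 2 x 2 rotation matrix for angle theta (in radians)."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s],
            [s, c]]


def apply(M, v):
    """Multiply the 2 x 2 matrix M by the vector v = (x, y)."""
    x, y = v
    return (M[0][0] * x + M[0][1] * y,
            M[1][0] * x + M[1][1] * y)


# Rotating the unit vector (1, 0) by 45 degrees moves it onto the diagonal.
x, y = apply(R(math.pi / 4), (1, 0))
print(x, y)
```

Both printed components are approximately \(\frac{\sqrt{2}}{2}\text{,}\) so the vector has turned counterclockwise towards the positive \(y\)-axis.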
Next, we collect the coordinates of every lit pixel in the image, using a for loop to check which entries of the matrix are greater than zero.
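The loop might look like the following sketch, which again uses a list of rows in place of a matrix. The function name lit_pixels is illustrative, and we assume each coordinate is stored as \((x, y)\) with \(x\) the column index and \(y\) the row index.

```python
def lit_pixels(image):
    """Return the (x, y) coordinates of every entry greater than zero."""
    coords = []
    for i, row in enumerate(image):        # i is the row index (height)
        for j, value in enumerate(row):    # j is the column index
            if value > 0:
                coords.append((j, i))
    return coords


# A 30 x 30 grid with one lit pixel, matching the padded image described above.
AP = [[0] * 30 for _ in range(30)]
AP[15][15] = 1
print(lit_pixels(AP))  # [(15, 15)]
```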
We obtain a coordinate vector that indicates there is a black pixel at \((15,15)\text{.}\)
Now we construct a matrix \(D\) whose columns are the coordinate vectors obtained. Then we compute the product \(R(\theta)D\text{,}\) which rotates all pixels simultaneously via a single matrix multiplication, since each column of the product is the rotated copy of the corresponding column of \(D\text{.}\) We store the rotated pixels in the original matrix \(AP\text{.}\)
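The whole step can be sketched in plain Python as follows. The helpers matmul and rotate_into are hypothetical names, the bounds check is an added safeguard, and Python's built-in round stands in for whatever rounding the worksheet applies.

```python
import math


def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(A[r][k] * B[k][c] for k in range(len(B)))
             for c in range(len(B[0]))]
            for r in range(len(A))]


def rotate_into(image, theta):
    """Rotate every lit pixel about the origin and mark where it lands."""
    R = [[math.cos(theta), -math.sin(theta)],
         [math.sin(theta), math.cos(theta)]]
    lit = [(j, i) for i, row in enumerate(image)
           for j, value in enumerate(row) if value > 0]
    # D has one column per lit pixel: the first row holds x, the second y.
    D = [[x for x, _ in lit], [y for _, y in lit]]
    rotated = matmul(R, D)
    for c in range(len(lit)):
        x, y = round(rotated[0][c]), round(rotated[1][c])
        # Skip any pixel that would land outside the grid.
        if 0 <= y < len(image) and 0 <= x < len(image[0]):
            image[y][x] = 1
    return image


AP = [[0] * 30 for _ in range(30)]
AP[15][15] = 1
rotate_into(AP, math.pi / 4)
print(AP[15][15], AP[21][0])  # 1 1
```

Because the rotated pixels are written into the same grid, the original pixel stays lit alongside its image.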
Both the original pixel and the rotated pixel are now visible. The original pixel sits at \((15, 15)\text{,}\) and a rotation of \(\theta = \pi/4\) maps the pixel to:
\begin{equation*}
R\!\left(\frac{\pi}{4}\right)
\begin{bmatrix} 15 \\ 15 \end{bmatrix}
=
\begin{bmatrix} 0 \\ 15\sqrt{2} \end{bmatrix}
\end{equation*}
Because a pixel can only occupy an entry with integer indices, we use round() to round each coordinate to the nearest integer:
\begin{equation*}
\begin{bmatrix} 0 \\ 15\sqrt{2} \end{bmatrix}
\approx
\begin{bmatrix} 0 \\ 21 \end{bmatrix}
\end{equation*}
which we can verify directly in the image.
Subsection 14.2.2 Rotating the Letter F
The same approach applies to any shape. We encode the letter “F” as a \(10 \times 10\) binary matrix.
Plotting with flip_y=False displays row \(0\) of the matrix at the bottom of the image, with \(y\) increasing upward. This means the matrix must be defined with the bottom of the image in row \(0\) and the top in the last row. As a result, the letter looks upside down when reading the code, but displays correctly when plotted.
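The sketch below shows what such an upside-down definition can look like. The particular pixel pattern is a hypothetical F chosen for illustration, not the matrix used in the worksheet.

```python
# Row 0 is the BOTTOM of the picture, so the letter reads upside down here.
F_rows = [
    "0000000000",
    "0011000000",
    "0011000000",
    "0011000000",
    "0011111000",
    "0011000000",
    "0011000000",
    "0011111100",
    "0011111100",
    "0000000000",
]
F = [[int(ch) for ch in row] for row in F_rows]

# Printing the rows in reverse order shows the letter the right way up,
# which mimics drawing row 0 at the bottom of the image.
for row in reversed(F):
    print("".join("#" if value else "." for value in row))
```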
We again embed the letter inside a larger grid of zeros. Since the rotation is around the origin and the letter will swing up and to the left, we add more padding in those directions to ensure the rotated pixels stay in bounds.
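To see why uneven padding helps, we can compute where a rotated \(10 \times 10\) block lands for different offsets. This is a sketch with hypothetical padding values; rotated_bounds is an illustrative helper that assumes \(x\) is the column index and \(y\) is the row index.

```python
import math


def rotated_bounds(left, bottom, size, theta):
    """Bounds (min_x, max_x, min_y, max_y) of a rotated size x size block.

    The block's lower-left pixel sits at (left, bottom) before rotating.
    """
    c, s = math.cos(theta), math.sin(theta)
    xs, ys = [], []
    for x in range(left, left + size):
        for y in range(bottom, bottom + size):
            xs.append(round(c * x - s * y))
            ys.append(round(s * x + c * y))
    return (min(xs), max(xs), min(ys), max(ys))


# Equal padding of 10 on the left and bottom: some pixels get a negative x.
even = rotated_bounds(10, 10, 10, math.pi / 4)
print(even[0] < 0)      # True

# More padding on the left than on the bottom keeps every x non-negative.
shifted = rotated_bounds(20, 5, 10, math.pi / 4)
print(shifted[0] >= 0)  # True
```

The largest rotated \(y\)-coordinate, shifted[3], indicates how many rows the grid needs above the letter.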
Now we apply the rotation. Since the function \(R\) is already defined from the previous cell, we can use it directly.
The letter F has been rotated \(45^\circ\) around the origin.
Subsection 14.2.3 Stretching the Letter F
This method also works with other transformation matrices. For example, we can stretch or shrink an image.
Matrix \(S\) is the transformation matrix used to stretch an image.
\begin{equation*}
S =
\begin{bmatrix}
a & 0 \\
0 & b
\end{bmatrix}
\end{equation*}
How much the image shrinks or stretches is determined by the stretching factors. Matrix \(S\) has two of them. The horizontal stretching factor \(a\) scales the first coordinate, \(x\text{,}\) and the vertical stretching factor \(b\) scales the second coordinate, \(y\text{.}\)
If a stretching factor is positive and less than \(1\text{,}\) the image shrinks in that direction. If it is greater than \(1\text{,}\) the image stretches. To keep the original size in a given direction, the factor must equal \(1\text{.}\)
For example, to shrink the image vertically to half its original size while keeping its horizontal dimensions, we can use the transformation matrix \(S(1,\frac{1}{2})\text{.}\)
To perform this transformation, we assign the values \(a=1\) and \(b=1/2\) in the transformation matrix \(S\text{.}\)
Note that this method works by changing the coordinates of each pixel, not just the size of the shape. The transformed F keeps the same horizontal position as the original F, but it sits only halfway between the \(x\)-axis and the original F. Because we multiply by a factor of \(1/2\text{,}\) every \(y\)-coordinate, including the distance from the \(x\)-axis, becomes \(1/2\) of its original value.
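The effect on individual coordinates can be checked with a short sketch. The sample points are hypothetical, transform is an illustrative helper, and we assume each point is stored as \((x, y)\) with \(a\) scaling \(x\) and \(b\) scaling \(y\text{.}\)

```python
def S(a, b):
    """Return the 2 x 2 stretching matrix with factors a and b."""
    return [[a, 0],
            [0, b]]


def transform(M, point):
    """Apply the 2 x 2 matrix M to (x, y) and round to integer indices."""
    x, y = point
    return (round(M[0][0] * x + M[0][1] * y),
            round(M[1][0] * x + M[1][1] * y))


# Three hypothetical pixels of a shape, stored as (x, y).
points = [(12, 10), (12, 14), (16, 14)]
shrunk = [transform(S(1, 0.5), p) for p in points]
print(shrunk)  # [(12, 5), (12, 7), (16, 7)]
```

The height of the shape drops from \(4\) to \(2\text{,}\) and its lowest point also moves from \(y = 10\) to \(y = 5\text{.}\)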
Let's try to stretch the letter F by a factor of \(3\) in both the horizontal and vertical directions. As mentioned earlier, the size of the graph will need to change so that there is room for all of the points to land on the graph. To do this, we can either reduce the coordinates of the original shape by removing some padding on the left and bottom, or increase the bounds of the graph to accommodate the new coordinates.
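The trade-off between the two options can be estimated before building the grid. The following sketch uses hypothetical corner coordinates and an illustrative helper, required_size, to compute the smallest grid that holds every stretched pixel.

```python
def required_size(coords, a, b):
    """Smallest (width, height) that holds every stretched (x, y) pixel."""
    max_x = max(round(a * x) for x, _ in coords)
    max_y = max(round(b * y) for _, y in coords)
    return (max_x + 1, max_y + 1)


# Hypothetical lower-left and upper-right pixels of a letter with no padding.
corners = [(2, 1), (7, 8)]
print(required_size(corners, 3, 3))  # (22, 25)

# The same letter with 10 columns and 10 rows of padding on the left and bottom.
padded = [(x + 10, y + 10) for x, y in corners]
print(required_size(padded, 3, 3))   # (52, 55)
```

Padding on the left and bottom is multiplied by the stretching factor too, which is why removing it keeps the final grid small.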
First, we set up our original matrix with the F shape and embed it in another matrix with substantial padding so that the stretched shape stays within the bounds of the matrix.
When we transform the shape, we do not add any new pixels. As a result, stretched images contain gaps, because stretching increases the distance between pixels. Shrinking does not have this issue, because it decreases the distance between pixels.
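A one-dimensional sketch makes the gaps visible. The helpers scale_positions and has_gaps are illustrative names, and the three adjacent positions are hypothetical.

```python
def scale_positions(xs, factor):
    """Scale pixel positions, round them, and drop duplicates."""
    return sorted({round(factor * x) for x in xs})


def has_gaps(xs):
    """True if any two consecutive lit positions are more than 1 apart."""
    return any(b - a > 1 for a, b in zip(xs, xs[1:]))


adjacent = [4, 5, 6]                      # three neighbouring lit pixels

stretched = scale_positions(adjacent, 3)
print(stretched, has_gaps(stretched))     # [12, 15, 18] True

shrunk = scale_positions(adjacent, 0.5)
print(has_gaps(shrunk))                   # False
```

Shrinking can also send several pixels to the same cell, so the shrunken shape may contain fewer distinct pixels than the original.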
