Single Camera Model
We assume the camera adheres to the “pin-hole” model, where points in space project along straight lines through the camera aperture (the origin of the “scene” coordinate system) and intersect an image plane at “image points”. This image plane represents the idealized physical location of the image sensor, and contains a 2D projection of the 3D scene points.
The diagram below describes the model:
Where:
- image point: $(x_i, y_i, f)$, which lies on the image plane
- scene point: $(x_s, y_s, z_s)$
We can compute image points from scene points via:

$$x_i = f\frac{x_s}{z_s}, \quad y_i = f\frac{y_s}{z_s}$$

If you delay the division by $z_s$, the equations become linear:

$$\begin{bmatrix} x_i z_s \\ y_i z_s \\ z_s \end{bmatrix} = \begin{bmatrix} f & 0 & 0 \\ 0 & f & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} x_s \\ y_s \\ z_s \end{bmatrix}$$

$x_i$ and $y_i$ can be recovered by dividing through by $z_s$:

$$x_i = \frac{x_i z_s}{z_s}, \quad y_i = \frac{y_i z_s}{z_s}$$
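As a minimal numpy sketch of the two forms above (the focal length and point are made-up values, not from any real camera):

```python
import numpy as np

f = 8.0                                  # focal length (arbitrary units)
p_s = np.array([0.1, -0.2, 1.5])         # scene point (x_s, y_s, z_s)

# Linear (homogeneous) form: delay the division by z_s
K_f = np.diag([f, f, 1.0])
p_h = K_f @ p_s                          # [x_i*z_s, y_i*z_s, z_s]

# Recover the image point by dividing through by z_s
x_i, y_i = p_h[0] / p_h[2], p_h[1] / p_h[2]
```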
Transformation from Image Points to Pixel Coordinates
We assume there is a simple offset and scaling relationship between image points $(x_i, y_i)$ and pixel coordinates $(x_p, y_p)$.
The diagram below describes this transformation:
Where:
- image center: $(x_o, y_o)$
- scaling factors: $s_x$, $s_y$
The image center accounts for the fact that the image sensor might be shifted from the optical center, and the scaling factors account for the physical size of the pixels.
This leads to:

$$x_p = s_x x_i + x_o, \quad y_p = s_y y_i + y_o$$

Again, delaying the division by $z_s$ results in a linear equation:

$$\begin{bmatrix} x_p z_s \\ y_p z_s \\ z_s \end{bmatrix} = A \begin{bmatrix} x_s \\ y_s \\ z_s \end{bmatrix}$$

where:

$$A = \begin{bmatrix} \alpha_x & 0 & x_o \\ 0 & \alpha_y & y_o \\ 0 & 0 & 1 \end{bmatrix}, \quad \alpha_x = f s_x, \quad \alpha_y = f s_y$$
In an ideal world, the model would be complete and all we would need to do to characterize the camera “intrinsics” would be to find the $A$ matrix. But, in the real world, there are distortions due to the camera sensor and lens. These include radial distortion introduced by lens manufacturing defects, the camera sensor not being perfectly orthogonal to the lens axis, etc., and are discussed in the next section.
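A short sketch of mapping a scene point directly to pixel coordinates with the $A$ matrix (the intrinsic values here are made up for illustration):

```python
import numpy as np

# alpha_x = f*s_x, alpha_y = f*s_y; (x_o, y_o) is the image center
alpha_x, alpha_y, x_o, y_o = 1000.0, 1000.0, 320.0, 240.0
A = np.array([[alpha_x, 0.0, x_o],
              [0.0, alpha_y, y_o],
              [0.0, 0.0, 1.0]])

p_s = np.array([0.1, -0.2, 1.5])   # scene point
p_h = A @ p_s                      # [x_p*z_s, y_p*z_s, z_s]
x_p, y_p = p_h[:2] / p_h[2]        # divide by z_s to get pixel coordinates
```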
Lens/Camera Distortion Model
Two popular lens/camera distortion models are heikkila97 and wang08. In my opinion, wang08 makes more sense intuitively than heikkila97, although the overall performance of the two is about the same. Since heikkila97 is used in many toolboxes (e.g. Bouguet’s toolbox), I’ll discuss it here for the sake of example.
First, we must make a distinction between undistorted image points:

$$(x_i, y_i)$$

and distorted image points:

$$(\bar{x}_i, \bar{y}_i)$$

We assume there are radial and tangential lens distortions:

$$\bar{x}_i = x_i(1 + k_1 r^2 + k_2 r^4) + 2 p_1 x_i y_i + p_2(r^2 + 2 x_i^2)$$

$$\bar{y}_i = y_i(1 + k_1 r^2 + k_2 r^4) + p_1(r^2 + 2 y_i^2) + 2 p_2 x_i y_i$$

Where:

$$r^2 = x_i^2 + y_i^2$$

$k_1$ and $k_2$ are the radial distortion coefficients, and $p_1$ and $p_2$ are the tangential distortion coefficients.

All together:

$$\bar{x}_p = s_x \bar{x}_i + x_o, \quad \bar{y}_p = s_y \bar{y}_i + y_o$$
The following diagram describes this model:
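A sketch of the heikkila97-style radial plus tangential distortion, following the equations above (the function name is mine, not from any toolbox):

```python
import numpy as np

def distort(x, y, k1, k2, p1, p2):
    # radial term: (1 + k1*r^2 + k2*r^4)
    r2 = x**2 + y**2
    radial = 1.0 + k1 * r2 + k2 * r2**2
    # tangential terms follow the equations above
    x_d = x * radial + 2.0 * p1 * x * y + p2 * (r2 + 2.0 * x**2)
    y_d = y * radial + p1 * (r2 + 2.0 * y**2) + 2.0 * p2 * x * y
    return x_d, y_d
```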
We now have most of the camera model addressed. For camera calibration, however, a calibration pattern with known geometry is used. This geometry is described in coordinates relative to the calibration board, known as “world” coordinates, and is discussed in the next section.
World Coordinates to Pixel Coordinates
First, the transformation from world to scene coordinates is a simple rigid body translation and rotation, shown below:

$$\begin{bmatrix} x_s \\ y_s \\ z_s \end{bmatrix} = R \begin{bmatrix} x_w \\ y_w \\ z_w \end{bmatrix} + t$$

Where:

- $R$ = rotation matrix, $t$ = translation vector
- world point: $(x_w, y_w, z_w)$

This leads to:

$$\begin{bmatrix} x_s \\ y_s \\ z_s \end{bmatrix} = \begin{bmatrix} R & t \end{bmatrix} \begin{bmatrix} x_w \\ y_w \\ z_w \\ 1 \end{bmatrix}$$

If we left-multiply both sides by $A$:

$$\begin{bmatrix} x_p z_s \\ y_p z_s \\ z_s \end{bmatrix} = A \begin{bmatrix} R & t \end{bmatrix} \begin{bmatrix} x_w \\ y_w \\ z_w \\ 1 \end{bmatrix}$$

This is a linear equation which transforms world coordinates to pixel coordinates (recall: $x_p = \frac{x_p z_s}{z_s}$ and $y_p = \frac{y_p z_s}{z_s}$).
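A sketch of the full chain from world coordinates to (distorted) pixel coordinates, reusing the `distort` sketch from the previous section ($R$, $t$, $A$, and the distortion coefficients are assumed given):

```python
import numpy as np

def world_to_pixel(p_w, R, t, A, k1=0.0, k2=0.0, p1=0.0, p2=0.0):
    x_s, y_s, z_s = R @ p_w + t               # world -> scene (rigid body)
    x, y = x_s / z_s, y_s / z_s               # perspective division
    x_d, y_d = distort(x, y, k1, k2, p1, p2)  # lens distortion (see above)
    p_h = A @ np.array([x_d, y_d, 1.0])       # apply intrinsics
    return p_h[:2]                            # (x_p, y_p)
```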
Now, we have our entire model. In order to characterize the camera “intrinsics”, we need to compute $A$ and the distortion coefficients. The next section describes a numerical implementation to do this.
Single Camera Calibration Numerical Implementation
We now have a full camera model to work with. Some things:
- Camera calibration involves using a calibration target with known world coordinates ($x_w$, $y_w$, and $z_w$). We take pictures of this calibration target, find these known coordinates in pixel coordinates, and then find the “intrinsic” and “extrinsic” parameters, through linear and non-linear optimization, which make:

$$\left( \bar{x}_p^{model}, \bar{y}_p^{model} \right) \approx \left( \bar{x}_p^{measured}, \bar{y}_p^{measured} \right)$$

i.e. we’re trying to make the model fit our measurements.
- We re-parameterized things to be linear in the previous section so that linear optimization can be used. This is possible up until lens/camera distortion is involved. So, an initial guess can be made (usually assuming zero lens distortion, a reasonable assumption for most applications) with linear optimization, and then it is followed up with non-linear refinement with the full camera model.
- For non-linear optimization, the rotation matrix needs to be re-parameterized so that it has 3 parameters. This is because the rotation matrix has 9 elements, but only 3 degrees of freedom. I use Euler angles here, but Rodrigues’ formula is also popular.
- If the calibration target is flat, then $z_w = 0$ and:

$$\begin{bmatrix} x_p z_s \\ y_p z_s \\ z_s \end{bmatrix} = A \begin{bmatrix} r_1 & r_2 & t \end{bmatrix} \begin{bmatrix} x_w \\ y_w \\ 1 \end{bmatrix} = H \begin{bmatrix} x_w \\ y_w \\ 1 \end{bmatrix}$$

where $r_1$ and $r_2$ are the first two columns of $R$. $H$ is known as a homography and acts as a projection operator for homogeneous coordinates. Often, it is found up to a scale factor.
For this article, I’m assuming a 2D calibration board is used for calibration. The next sections explain how calibration (i.e. estimation of “intrinsic” and “extrinsic” parameters) is done for this case.
Computing Homographies
If we have two sets of points, for example a set of points on our calibration board in “world coordinates” and the corresponding set of points in pixel coordinates (i.e. we have detected and localized these points on the calibration board image), then we can approximate the transformation between these two sets of points using a homography. The assumption here is we’re using the pin-hole camera model and there is very little lens distortion.
Let:

$$\left\{ \left( x_w^{(j)}, y_w^{(j)} \right) \right\}_{j=1}^{N} \quad \text{and} \quad \left\{ \left( x_p^{(j)}, y_p^{(j)} \right) \right\}_{j=1}^{N}$$

be the world and pixel point sets, respectively. Then:

$$\begin{bmatrix} x_p^{(j)} z_s^{(j)} \\ y_p^{(j)} z_s^{(j)} \\ z_s^{(j)} \end{bmatrix} = H \begin{bmatrix} x_w^{(j)} \\ y_w^{(j)} \\ 1 \end{bmatrix}$$

Let:

$$H = \begin{bmatrix} h_{11} & h_{12} & h_{13} \\ h_{21} & h_{22} & h_{23} \\ h_{31} & h_{32} & h_{33} \end{bmatrix}$$

Then:

$$x_p^{(j)} = \frac{h_{11} x_w^{(j)} + h_{12} y_w^{(j)} + h_{13}}{h_{31} x_w^{(j)} + h_{32} y_w^{(j)} + h_{33}}, \quad y_p^{(j)} = \frac{h_{21} x_w^{(j)} + h_{22} y_w^{(j)} + h_{23}}{h_{31} x_w^{(j)} + h_{32} y_w^{(j)} + h_{33}}$$

We form homogeneous equations by cross-multiplying:

$$x_w^{(j)} h_{11} + y_w^{(j)} h_{12} + h_{13} - x_p^{(j)} x_w^{(j)} h_{31} - x_p^{(j)} y_w^{(j)} h_{32} - x_p^{(j)} h_{33} = 0$$

$$x_w^{(j)} h_{21} + y_w^{(j)} h_{22} + h_{23} - y_p^{(j)} x_w^{(j)} h_{31} - y_p^{(j)} y_w^{(j)} h_{32} - y_p^{(j)} h_{33} = 0$$

Then, we put them in matrix form, where each point contributes two rows to $L$:

$$L h = 0, \quad h = \begin{bmatrix} h_{11} & h_{12} & h_{13} & h_{21} & h_{22} & h_{23} & h_{31} & h_{32} & h_{33} \end{bmatrix}^T$$
To solve:

$$\min_h \|L h\|^2 \quad \text{subject to} \quad \|h\| = 1$$

The solution is the eigenvector of $L^T L$ associated with the smallest eigenvalue. To compute it, use the SVD:

$$L = U S V^T$$

Assuming the singular values are sorted in descending order, the last column of $V$ will contain the least eigenvector.

Note that since this is a homogeneous equation, once we solve for $h$, what we will actually find is $\lambda h$, i.e. a scaled version of $h$. It’s important to keep this in mind.
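A sketch of the linear (unnormalized) fit: build $L$, take the SVD, and reshape the least eigenvector into $H$ (`pts_w` and `pts_p` are assumed to be `(N, 2)` arrays):

```python
import numpy as np

def homography_dlt(pts_w, pts_p):
    L = []
    for (xw, yw), (xp, yp) in zip(pts_w, pts_p):
        # two homogeneous equations per point correspondence
        L.append([xw, yw, 1, 0, 0, 0, -xp * xw, -xp * yw, -xp])
        L.append([0, 0, 0, xw, yw, 1, -yp * xw, -yp * yw, -yp])
    _, _, Vt = np.linalg.svd(np.asarray(L, dtype=float))
    return Vt[-1].reshape(3, 3)   # least (right) singular vector, up to scale
```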
Also, there is a normalized version of the above which is more stable and accounts for the fact that $(x_w, y_w)$ and $(x_p, y_p)$ may have very different scales, which can make $L$ ill-conditioned.
The normalization step does two things:
- translates both sets of points so their centroids are at the origin
- both sets of points are scaled so the average distance from the origin is equal to $\sqrt{2}$, which I believe is done to ensure the x and y components are close to 1.
A “normalization matrix” is used:

$$T_p = \begin{bmatrix} s_p & 0 & -s_p \mu_{x_p} \\ 0 & s_p & -s_p \mu_{y_p} \\ 0 & 0 & 1 \end{bmatrix}$$

where:

$\mu_{x_p}$ = mean of the $x_p$’s
$\mu_{y_p}$ = mean of the $y_p$’s
$s_p$ = $\sqrt{2}$ divided by the average distance of the points from their centroid

The same matrix is also constructed for the set of points in world coordinates, which we call $T_w$.
Note:

- Original homography (with $p_w = \begin{bmatrix} x_w & y_w & 1 \end{bmatrix}^T$ and $p_p = \begin{bmatrix} x_p z_s & y_p z_s & z_s \end{bmatrix}^T$):

$$p_p \sim H p_w$$

- Preconditioned homography:

$$T_p p_p \sim \tilde{H} T_w p_w$$

where:

$$\tilde{H} = T_p H T_w^{-1}$$

If we do the math to see the updated equations, with $\tilde{p}_p = T_p p_p$ and $\tilde{p}_w = T_w p_w$:

$$\tilde{p}_p \sim \tilde{H} \tilde{p}_w$$

Then:

$$\tilde{x}_p = \frac{\tilde{h}_{11} \tilde{x}_w + \tilde{h}_{12} \tilde{y}_w + \tilde{h}_{13}}{\tilde{h}_{31} \tilde{x}_w + \tilde{h}_{32} \tilde{y}_w + \tilde{h}_{33}}, \quad \tilde{y}_p = \frac{\tilde{h}_{21} \tilde{x}_w + \tilde{h}_{22} \tilde{y}_w + \tilde{h}_{23}}{\tilde{h}_{31} \tilde{x}_w + \tilde{h}_{32} \tilde{y}_w + \tilde{h}_{33}}$$

We form homogeneous equations and put them in matrix form exactly as before, using the normalized points:

$$\tilde{L} \tilde{h} = 0$$

We can now solve for $\tilde{h}$ using the SVD. But, we must recover the original homography after we have computed the preconditioned homography; to recover it, note that:

$$\tilde{H} = T_p H T_w^{-1}$$

Which implies:

$$H = T_p^{-1} \tilde{H} T_w$$
So, to use the preconditioned approach:

- Use $\tilde{L}$ and the SVD method to solve for $\tilde{H}$
- Recover $H$ with $H = T_p^{-1} \tilde{H} T_w$
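A sketch of the preconditioned version, reusing `homography_dlt` from the previous sketch:

```python
import numpy as np

def norm_matrix(pts):
    # translate centroid to origin, scale mean distance to sqrt(2)
    c = pts.mean(axis=0)
    s = np.sqrt(2.0) / np.mean(np.linalg.norm(pts - c, axis=1))
    return np.array([[s, 0, -s * c[0]],
                     [0, s, -s * c[1]],
                     [0, 0, 1]])

def homography_dlt_normalized(pts_w, pts_p):
    Tw, Tp = norm_matrix(pts_w), norm_matrix(pts_p)
    def apply(T, pts):
        ph = (T @ np.column_stack([pts, np.ones(len(pts))]).T).T
        return ph[:, :2] / ph[:, 2:]
    H_tilde = homography_dlt(apply(Tw, pts_w), apply(Tp, pts_p))
    return np.linalg.inv(Tp) @ H_tilde @ Tw   # recover the original H
```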
So now we have a linear method for computing homographies. Another refinement you can do is a nonlinear optimization of the homography using the linear method as the initial guess. This is more or less optional but included for completeness.
The first step is to divide the initial linear homography guess by $h_{33}$ so that $h_{33} = 1$. Homographies have 8 degrees of freedom and this is a typical constraint added to parameterize the homography for nonlinear optimization.
Next, we derive the Jacobian so we can perform Gauss-Newton iterations (note the initial guess is typically very good and the optimization is well behaved, so in my experience more robust methods like Levenberg-Marquardt aren’t necessary).
We start with:

$$x_{p,i} = \frac{h_{11} x_w + h_{12} y_w + h_{13}}{h_{31} x_w + h_{32} y_w + 1}, \quad y_{p,i} = \frac{h_{21} x_w + h_{22} y_w + h_{23}}{h_{31} x_w + h_{32} y_w + 1}$$

Let:

$$h = \begin{bmatrix} h_{11} & h_{12} & h_{13} & h_{21} & h_{22} & h_{23} & h_{31} & h_{32} \end{bmatrix}^T$$

Set the cost function to:

$$C(h) = \sum_{j=1}^{N} \left[ \left( x_{p,m}^{(j)} - x_{p,i}^{(j)} \right)^2 + \left( y_{p,m}^{(j)} - y_{p,i}^{(j)} \right)^2 \right]$$

where i = ideal, m = measured, and p = pixel.
To compute the Jacobian, we start off with the residuals:

$$r_x^{(j)} = x_{p,m}^{(j)} - x_{p,i}^{(j)}, \quad r_y^{(j)} = y_{p,m}^{(j)} - y_{p,i}^{(j)}$$

Take the partial derivative (with $w = h_{31} x_w + h_{32} y_w + 1$):

$$\frac{\partial x_{p,i}}{\partial h} = \begin{bmatrix} \frac{x_w}{w} & \frac{y_w}{w} & \frac{1}{w} & 0 & 0 & 0 & -\frac{x_{p,i} x_w}{w} & -\frac{x_{p,i} y_w}{w} \end{bmatrix}$$

$$\frac{\partial y_{p,i}}{\partial h} = \begin{bmatrix} 0 & 0 & 0 & \frac{x_w}{w} & \frac{y_w}{w} & \frac{1}{w} & -\frac{y_{p,i} x_w}{w} & -\frac{y_{p,i} y_w}{w} \end{bmatrix}$$

The Jacobian has the form:

$$J = \begin{bmatrix} \frac{\partial x_{p,i}^{(1)}}{\partial h} \\ \frac{\partial y_{p,i}^{(1)}}{\partial h} \\ \vdots \\ \frac{\partial x_{p,i}^{(N)}}{\partial h} \\ \frac{\partial y_{p,i}^{(N)}}{\partial h} \end{bmatrix}$$

And you can perform Gauss-Newton updates using the pseudo inverse (although you really should use a direct linear solver):

$$h \leftarrow h + (J^T J)^{-1} J^T r$$

Until some convergence condition is met.
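A sketch of the Gauss-Newton refinement with $h_{33}$ fixed to 1 (a fixed iteration count stands in for a proper convergence check):

```python
import numpy as np

def refine_homography(H, pts_w, pts_p, iters=10):
    h = (H / H[2, 2]).ravel()[:8]          # 8 free parameters, h33 = 1
    for _ in range(iters):
        J, r = [], []
        for (xw, yw), (xp, yp) in zip(pts_w, pts_p):
            w = h[6] * xw + h[7] * yw + 1.0
            u = (h[0] * xw + h[1] * yw + h[2]) / w   # ideal x_p
            v = (h[3] * xw + h[4] * yw + h[5]) / w   # ideal y_p
            J.append([xw / w, yw / w, 1 / w, 0, 0, 0, -u * xw / w, -u * yw / w])
            J.append([0, 0, 0, xw / w, yw / w, 1 / w, -v * xw / w, -v * yw / w])
            r.extend([xp - u, yp - v])               # measured - ideal
        # Gauss-Newton step via a least squares solve (not the pseudo inverse)
        h += np.linalg.lstsq(np.asarray(J), np.asarray(r), rcond=None)[0]
    return np.append(h, 1.0).reshape(3, 3)
```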
Now, we have a method to determine a fully optimized homography given two sets of points.
Initial Guess of Intrinsic Parameters
After homographies for each image of a calibration board are computed, we need to compute an initial guess for the intrinsic parameters.
We can only find an initial linear guess for the parameters in $A$: $\alpha_x$, $\alpha_y$, $x_o$, and $y_o$. We assume that the lens distortion parameters are zero, which is reasonable for most camera/lens combinations.
Furthermore, we assume the following:

$$x_o = \frac{\text{image width} + 1}{2}, \quad y_o = \frac{\text{image height} + 1}{2}, \quad \alpha_x = \alpha_y = \alpha$$

We assume the first two things because the principal point estimation is usually difficult and unstable, so we assume it’s at the middle of the image (this also assumes the first pixel has a position of (1,1); i.e. one-based indexing). It’s also a good assumption for most cameras that $s_x = s_y$ (assuming the pixels are square) and hence $\alpha_x = \alpha_y = \alpha$. These constraints will allow us to get a better initial guess for $\alpha$.
Let:

$$H = \lambda A \begin{bmatrix} r_1 & r_2 & t \end{bmatrix}$$

(recall the homography is found up to a scale factor $\lambda$). Now, let:

$$h_1' = A^{-1} h_1, \quad h_2' = A^{-1} h_2$$

where $h_1$ and $h_2$ are the first two columns of $H$. Then:

$$h_1' = \lambda r_1, \quad h_2' = \lambda r_2$$

Then, since $r_1$ and $r_2$ are orthogonal:

$$h_1'^T h_2' = 0$$

With the assumed form of $A$, we have $h_j' = \begin{bmatrix} \frac{h_{1j} - x_o h_{3j}}{\alpha} & \frac{h_{2j} - y_o h_{3j}}{\alpha} & h_{3j} \end{bmatrix}^T$. Letting $\hat{h}_j = \begin{bmatrix} h_{1j} - x_o h_{3j} \\ h_{2j} - y_o h_{3j} \end{bmatrix}$ and $b = \frac{1}{\alpha^2}$, we arrive at the following constraint:

$$\left( \hat{h}_1^T \hat{h}_2 \right) b = -h_{31} h_{32}$$

If you repeat the same steps for $\|h_1'\| = \|h_2'\|$ (since $r_1$ and $r_2$ are unit vectors), you get the additional constraint:

$$\left( \hat{h}_1^T \hat{h}_1 - \hat{h}_2^T \hat{h}_2 \right) b = -\left( h_{31}^2 - h_{32}^2 \right)$$

We put these constraints into matrix form:

$$v b = u$$

where $v$ stacks the left-hand coefficients and $u$ stacks the right-hand sides over all calibration board images. Then, you can solve for $b$ and take the inverse and square root to recover $\alpha = \frac{1}{\sqrt{b}}$.
Some things:
- You can normalize $v_1$, $u_1$, $v_2$, and $u_2$ (the coefficients coming from each of the two constraints) by dividing each by their respective norm before solving for $b$. This does not affect the orthogonality constraint and prevents issues in case one of them is very large.
- You can solve for $b$ using least squares. However, you can also solve for the $b$ that makes the projection of $vb$ onto $u$ equal to $u$:

$$b = \frac{u^T u}{v^T u}$$

This is what Bouguet does in his Matlab toolbox and from my tests it appears to work slightly better than the least squares result. I have a feeling it might be because you obtain a larger $b$ when $u$ and $v$ are not collinear, which in turn results in a smaller $\alpha$ and prevents overestimates, but beyond that I’m not really sure. A sketch of this initial guess is given below.
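A sketch of the initial $\alpha$ estimate from a list of homographies and an assumed principal point, following the two constraints derived above (the per-constraint normalization mentioned in the first note is omitted for brevity):

```python
import numpy as np

def init_alpha(Hs, x_o, y_o):
    v, u = [], []
    for H in Hs:
        h = [H[:, 0], H[:, 1]]   # first two columns of H
        # remove the principal point: hat_h_j = first two rows of alpha * A^-1 h_j
        hat = [np.array([hj[0] - x_o * hj[2], hj[1] - y_o * hj[2]]) for hj in h]
        # orthogonality constraint: b * (hat_h1 . hat_h2) = -h31 * h32
        v.append(hat[0] @ hat[1])
        u.append(-h[0][2] * h[1][2])
        # equal-norm constraint: b * (|hat_h1|^2 - |hat_h2|^2) = -(h31^2 - h32^2)
        v.append(hat[0] @ hat[0] - hat[1] @ hat[1])
        u.append(-(h[0][2]**2 - h[1][2]**2))
    v, u = np.asarray(v), np.asarray(u)
    b = (u @ u) / (v @ u)        # Bouguet-style projection solution (see note)
    return 1.0 / np.sqrt(b)      # alpha = 1 / sqrt(b)
```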
Initial Guess of Extrinsic Parameters
The next step is to get an initial guess of the extrinsic parameters per calibration board image.
Since we now have an initial guess for $A$:

$$A^{-1} H = \lambda \begin{bmatrix} r_1 & r_2 & t \end{bmatrix}$$

This results in the following:

$$A^{-1} h_1 = \lambda r_1, \quad A^{-1} h_2 = \lambda r_2, \quad A^{-1} h_3 = \lambda t$$

Since $r_1$ and $r_2$ are supposed to be unit vectors:

$$\lambda = \frac{\|A^{-1} h_1\| + \|A^{-1} h_2\|}{2}$$

We can then solve for the extrinsics:

$$r_1 = \frac{A^{-1} h_1}{\lambda}, \quad r_2 = \frac{A^{-1} h_2}{\lambda}, \quad r_3 = r_1 \times r_2, \quad t = \frac{A^{-1} h_3}{\lambda}$$
A few things:
- The parameterization of the homography is only unique because we constrained $\lambda$ to be positive. To see the effect this has, look at $H = \lambda A \begin{bmatrix} r_1 & r_2 & t \end{bmatrix}$; in particular $h_{33}$:

$$h_{33} = \lambda t_z$$

$h_{33}$ is 1 (recall we set this constraint when we computed the homography). So, if $\lambda$ is constrained to be positive, the effect this has is to ensure $t_z$ is positive. This is correct, as the calibration board is in the positive z direction.
- $\|A^{-1} h_1\|$ and $\|A^{-1} h_2\|$ should be equal, but will probably be slightly off due to some combination of model misfit, noise, etc., so we take the average of the two when computing $\lambda$.
- $r_1$ and $r_2$ should be orthogonal, but again, due to model misfit, noise, etc., will probably be slightly off. To make $R$ an orthogonal matrix, we can compute the best orthogonal approximation (see the sketch after this list):

Take the SVD of $R$:

$$R = U S V^T$$

Then:

$$\hat{R} = U V^T$$

Solves:

$$\min_{\hat{R}} \|\hat{R} - R\|_F^2 \quad \text{subject to} \quad \hat{R}^T \hat{R} = I$$

I won’t give a proof, but it should be somewhat intuitive.
- You can (optionally) do nonlinear refinement of just the extrinsics (on a per calibration board image basis) before optimizing all parameters, but this only requires modifying the global optimization problem slightly, which I’ll talk about next.
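A sketch of recovering an initial $R$ and $t$ from a homography and the intrinsic matrix, including the best-orthogonal-approximation step described above:

```python
import numpy as np

def init_extrinsics(H, A):
    A_inv = np.linalg.inv(A)
    h1, h2, h3 = A_inv @ H[:, 0], A_inv @ H[:, 1], A_inv @ H[:, 2]
    lam = (np.linalg.norm(h1) + np.linalg.norm(h2)) / 2.0  # average the norms
    r1, r2, t = h1 / lam, h2 / lam, h3 / lam
    R = np.column_stack([r1, r2, np.cross(r1, r2)])        # r3 = r1 x r2
    U, _, Vt = np.linalg.svd(R)                            # best orthogonal
    return U @ Vt, t                                       # approximation of R
```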
Nonlinear Refinement of All Parameters (Single Camera)
Note that by now we have initial guesses for all intrinsic and extrinsic parameters. We can now define a cost function and optimize it across all intrinsic and extrinsic parameters.
But, before we do that, it makes things easier to parameterize the camera model with “normalized” coordinates.
Let:

$$x_n = \frac{x_s}{z_s}, \quad y_n = \frac{y_s}{z_s}$$

Then:

$$\begin{bmatrix} x_p \\ y_p \\ 1 \end{bmatrix} = A \begin{bmatrix} x_n \\ y_n \\ 1 \end{bmatrix}$$

Let’s parameterize our camera model with normalized coordinates, including distortion:

$$\bar{x}_p = \alpha_x \left[ x_n \left( 1 + k_1 r^2 + k_2 r^4 \right) + 2 p_1 x_n y_n + p_2 \left( r^2 + 2 x_n^2 \right) \right] + x_o$$

Doing the same for $\bar{y}_p$, we obtain:

$$\bar{y}_p = \alpha_y \left[ y_n \left( 1 + k_1 r^2 + k_2 r^4 \right) + p_1 \left( r^2 + 2 y_n^2 \right) + 2 p_2 x_n y_n \right] + y_o$$

Where:

$$r^2 = x_n^2 + y_n^2$$

Let:

$a$ = vector of intrinsic parameters, $\begin{bmatrix} \alpha_x & \alpha_y & x_o & y_o & k_1 & k_2 & p_1 & p_2 \end{bmatrix}^T$
$e$ = parameterization of extrinsic parameters (expanded on soon)
Now, take the partial derivatives of $\bar{x}_p$:

$$\frac{\partial \bar{x}_p}{\partial a}, \quad \frac{\partial \bar{x}_p}{\partial e} = \frac{\partial \bar{x}_p}{\partial x_n} \frac{\partial x_n}{\partial e} + \frac{\partial \bar{x}_p}{\partial y_n} \frac{\partial y_n}{\partial e}$$

Do the same for $\bar{y}_p$:

$$\frac{\partial \bar{y}_p}{\partial a}, \quad \frac{\partial \bar{y}_p}{\partial e} = \frac{\partial \bar{y}_p}{\partial x_n} \frac{\partial x_n}{\partial e} + \frac{\partial \bar{y}_p}{\partial y_n} \frac{\partial y_n}{\partial e}$$

Note that the partial derivatives with respect to the intrinsics, and with respect to $x_n$ and $y_n$, follow directly from the model equations above.

Now we must find $\frac{\partial x_n}{\partial e}$ and $\frac{\partial y_n}{\partial e}$.
But, before we can do that, we must parameterize rotation. For that, I’ve elected to use Euler angles; I specify an X-Y-Z rotation sequence:

$$R = R_z(\theta_z) R_y(\theta_y) R_x(\theta_x)$$

Note that the inverse is:

$$\theta_y = -\sin^{-1}(r_{31}), \quad \theta_x = \operatorname{atan2}(r_{32}, r_{33}), \quad \theta_z = \operatorname{atan2}(r_{21}, r_{11})$$

We take the Jacobian:

$$\frac{\partial R}{\partial (\theta_x, \theta_y, \theta_z)}$$

Each entry of $R$ is a product of sines and cosines of the Euler angles, so this is straightforward to write out.
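A sketch of this Euler parameterization and its inverse, assuming the X-Y-Z sequence means rotating about x, then y, then z (conventions vary, so treat this ordering as an assumption):

```python
import numpy as np

def euler_to_R(tx, ty, tz):
    cx, sx = np.cos(tx), np.sin(tx)
    cy, sy = np.cos(ty), np.sin(ty)
    cz, sz = np.cos(tz), np.sin(tz)
    Rx = np.array([[1, 0, 0], [0, cx, -sx], [0, sx, cx]])
    Ry = np.array([[cy, 0, sy], [0, 1, 0], [-sy, 0, cy]])
    Rz = np.array([[cz, -sz, 0], [sz, cz, 0], [0, 0, 1]])
    return Rz @ Ry @ Rx

def R_to_euler(R):
    # inverse of the above (away from the gimbal-lock singularity)
    ty = -np.arcsin(R[2, 0])
    tx = np.arctan2(R[2, 1], R[2, 2])
    tz = np.arctan2(R[1, 0], R[0, 0])
    return tx, ty, tz
```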
Now, we can compute the full Jacobian by chaining:

$$\frac{\partial x_n}{\partial e} = \frac{\partial x_n}{\partial (R, t)} \frac{\partial (R, t)}{\partial e}$$

Where, from the world-to-scene transformation:

$$x_n = \frac{r_{11} x_w + r_{12} y_w + r_{13} z_w + t_x}{r_{31} x_w + r_{32} y_w + r_{33} z_w + t_z}$$

and:

$$y_n = \frac{r_{21} x_w + r_{22} y_w + r_{23} z_w + t_y}{r_{31} x_w + r_{32} y_w + r_{33} z_w + t_z}$$

so the partial derivatives with respect to the elements of $R$ and $t$ are simple quotient-rule expressions, which are then chained with $\frac{\partial R}{\partial (\theta_x, \theta_y, \theta_z)}$ from above.
We can now define our cost function for single camera calibration:

$$C = \sum_{k=1}^{M} \sum_{j=1}^{N} \left[ \left( \bar{x}_{p,m}^{(k,j)} - \bar{x}_{p,i}^{(k,j)} \right)^2 + \left( \bar{y}_{p,m}^{(k,j)} - \bar{y}_{p,i}^{(k,j)} \right)^2 \right]$$

Where:

- M = number of calibration board images
- N = number of points on the calibration board

We put all of our parameters (intrinsic + extrinsic) in a single vector, $q$:

$$q = \begin{bmatrix} a^T & e_1^T & \cdots & e_M^T \end{bmatrix}^T$$

Which results in $8 + 6M$ total parameters.
To compute the Jacobian, we start off with the residuals:

$$r^{(k,j)} = \begin{bmatrix} \bar{x}_{p,m}^{(k,j)} - \bar{x}_{p,i}^{(k,j)} \\ \bar{y}_{p,m}^{(k,j)} - \bar{y}_{p,i}^{(k,j)} \end{bmatrix}$$

Take the partial derivative with respect to $q$. The Jacobian has the form:

$$J = \begin{bmatrix} * & * & 0 & \cdots & 0 \\ * & 0 & * & \cdots & 0 \\ \vdots & \vdots & & \ddots & \vdots \\ * & 0 & 0 & \cdots & * \end{bmatrix}$$

Where the first block column corresponds to the intrinsics $a$ and the remaining block columns correspond to $e_1, \ldots, e_M$; the $*$’s represent filled blocks and $0$’s are empty blocks. This makes intuitive sense, as the intrinsic parameters will affect the residuals for all calibration board images. For extrinsics, only the extrinsic parameters for a specific calibration board image will affect its residuals.
To solve this, I’d recommend using the Levenberg-Marquardt method, since the nonlinear optimization is not well-behaved under certain circumstances (in my tests, if very few calibration boards are provided, the Gauss-Newton method can fail):

$$q \leftarrow q + \left( J^T J + \lambda I \right)^{-1} J^T r$$
As usual, it’s recommended to use a sparse linear solver instead of directly taking the inverse.
Some things:
- All initial guesses are used from the previous sections. $k_1$, $k_2$, $p_1$, and $p_2$ are all initialized to zero.
- $\lambda$ is usually initialized to 0.01 and will either grow or shrink every iteration depending on whether the mean squared error (MSE) increases or decreases. If MSE decreases, then $\lambda$ is decreased and we proceed to the next iteration. If MSE increases, then we increase $\lambda$ until MSE decreases. Increasing $\lambda$ has two effects: 1) it makes the step closer to the gradient direction and 2) it reduces the step size. Decreasing $\lambda$ makes the step closer to the Gauss-Newton step, which is an aggressive optimization. Using small steps in the direction of the gradient is incredibly robust, but slightly slow (see the sketch after this list).
- If there is only a single calibration board image, the principal point is removed from the optimization (simply remove the columns from the Jacobian corresponding to those two parameters, and then do not update them during the iterations), as it is very unstable.
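A sketch of the Levenberg-Marquardt schedule described above; `compute_residuals` and `compute_jacobian` are hypothetical placeholders for functions implementing the full camera model, and a fixed iteration count stands in for a proper convergence check:

```python
import numpy as np

def lm_optimize(q, compute_residuals, compute_jacobian, iters=100, lam=0.01):
    r = compute_residuals(q)                # residuals = measured - model
    for _ in range(iters):
        J = compute_jacobian(q)
        while True:
            # damped normal equations (use a sparse solver in practice)
            dq = np.linalg.solve(J.T @ J + lam * np.eye(len(q)), J.T @ r)
            r_new = compute_residuals(q + dq)
            if r_new @ r_new < r @ r:       # MSE decreased: accept the step,
                q, r = q + dq, r_new        # relax toward Gauss-Newton
                lam /= 10.0
                break
            lam *= 10.0                     # MSE increased: damp more, retry
    return q
```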
We now have enough information to fully calibrate and optimize a camera using a calibration board.
Stereo Vision
The first thing we need to consider is how to incorporate two cameras into the model. We do this by simply adding the transformation between the two cameras as follows:
$$\begin{bmatrix} x_{s_2} \\ y_{s_2} \\ z_{s_2} \end{bmatrix} = R_s \begin{bmatrix} x_{s_1} \\ y_{s_1} \\ z_{s_1} \end{bmatrix} + t_s$$

This results in:

$$R_2 = R_s R_1, \quad t_2 = R_s t_1 + t_s$$

We can solve for $R_s$ and $t_s$:

$$R_s = R_2 R_1^T, \quad t_s = t_2 - R_s t_1$$
In order to optimize the stereo model, we do the following:
- We get initial guesses for the intrinsics, distortion coefficients, $R$’s, and $t$’s by doing single camera optimization on each camera separately.
- Instead of optimizing $R_1$, $t_1$, $R_2$, and $t_2$ for each calibration board image, we instead optimize $R_1$, $t_1$, $R_s$, and $t_s$. This reduces the number of extrinsic parameters from 12M to 6M+6, where M is the number of calibration board image pairs.
- To get an initial guess for $R_s$ and $t_s$, we do the following (see the sketch after this list). For each calibration board image pair $k$:

$$R_s^{(k)} = R_2^{(k)} \left( R_1^{(k)} \right)^T$$

and:

$$t_s^{(k)} = t_2^{(k)} - R_s^{(k)} t_1^{(k)}$$

If we put these in matrix form (stacking over all $M$ pairs):

$$R_s \begin{bmatrix} R_1^{(1)} & \cdots & R_1^{(M)} \end{bmatrix} = \begin{bmatrix} R_2^{(1)} & \cdots & R_2^{(M)} \end{bmatrix}$$

and:

$$\begin{bmatrix} I \\ \vdots \\ I \end{bmatrix} t_s = \begin{bmatrix} t_2^{(1)} - R_s t_1^{(1)} \\ \vdots \\ t_2^{(M)} - R_s t_1^{(M)} \end{bmatrix}$$

which can be solved in a least squares sense (followed by the best orthogonal approximation for $R_s$).
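A sketch of this initial guess, averaging over board image pairs and re-orthogonalizing $R_s$ with the SVD trick from the single camera section (`R1s`, `t1s`, `R2s`, `t2s` are assumed lists of per-board poses):

```python
import numpy as np

def init_stereo(R1s, t1s, R2s, t2s):
    # R_s @ R1_k ~= R2_k for every pair, so average R2_k @ R1_k^T over pairs...
    R_s = sum(R2 @ R1.T for R1, R2 in zip(R1s, R2s)) / len(R1s)
    U, _, Vt = np.linalg.svd(R_s)   # ...then take the best orthogonal
    R_s = U @ Vt                    # approximation of the averaged matrix
    # t2_k = R_s @ t1_k + t_s, so average the implied t_s over all pairs
    t_s = sum(t2 - R_s @ t1 for t1, t2 in zip(t1s, t2s)) / len(t1s)
    return R_s, t_s
```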
We now have a model for stereo vision and a way to provide an initial guess for this model.
Some things:
- $\frac{\partial \bar{x}_{p_1}}{\partial a_1}$, $\frac{\partial \bar{y}_{p_1}}{\partial a_1}$, $\frac{\partial \bar{x}_{p_2}}{\partial a_2}$, and $\frac{\partial \bar{y}_{p_2}}{\partial a_2}$ have the same form as single camera calibration.
- $\frac{\partial \bar{x}_{p_1}}{\partial e_1}$ and $\frac{\partial \bar{y}_{p_1}}{\partial e_1}$ have the same form as single camera calibration.
- $\frac{\partial \bar{x}_{p_2}}{\partial e_1}$, $\frac{\partial \bar{y}_{p_2}}{\partial e_1}$, $\frac{\partial \bar{x}_{p_2}}{\partial e_s}$, and $\frac{\partial \bar{y}_{p_2}}{\partial e_s}$ are new quantities that we need to derive (where $e_s$ is the parameterization of $R_s$ and $t_s$).

We need to compute the Jacobians of camera 2’s residuals through the composed transformation $R_2 = R_s R_1$, $t_2 = R_s t_1 + t_s$. The only quantities we haven’t derived yet are:

$$\frac{\partial (R_2, t_2)}{\partial e_1} \quad \text{and} \quad \frac{\partial (R_2, t_2)}{\partial e_s}$$

The rest of the Jacobians have already been derived in the single camera calibration section.
Now, we’re ready to define our cost function for stereo calibration:

$$C = \sum_{c=1}^{2} \sum_{k=1}^{M} \sum_{j=1}^{N} \left[ \left( \bar{x}_{p_c,m}^{(k,j)} - \bar{x}_{p_c,i}^{(k,j)} \right)^2 + \left( \bar{y}_{p_c,m}^{(k,j)} - \bar{y}_{p_c,i}^{(k,j)} \right)^2 \right]$$

Where:

- M = number of calibration board image pairs
- N = number of points on the calibration board

We put all of our parameters (intrinsic + extrinsic) in a single vector, $q$:

$$q = \begin{bmatrix} a_1^T & a_2^T & e_s^T & e_1^T & \cdots & e_M^T \end{bmatrix}^T$$

Which results in $22 + 6M$ total parameters (8 intrinsics per camera, 6 for $e_s$, and 6 per calibration board pose).
To compute the Jacobian, we start off with the residuals, one set per camera per calibration board image pair. Take the partial derivative with respect to $q$. The Jacobian has the form:

$$J = \begin{bmatrix} * & 0 & 0 & * & \cdots & 0 \\ 0 & * & * & * & \cdots & 0 \\ \vdots & \vdots & \vdots & & \ddots & \vdots \end{bmatrix}$$

Where the block columns correspond to $a_1$, $a_2$, $e_s$, and $e_1, \ldots, e_M$: camera 1 residuals depend only on $a_1$ and the board poses, while camera 2 residuals depend on $a_2$, $e_s$, and the board poses. Like before, the $*$’s represent filled blocks and $0$’s are empty blocks.
To solve this, again, I’d recommend using the Levenberg-Marquardt method since the optimization is not well-behaved under certain circumstances:

$$q \leftarrow q + \left( J^T J + \lambda I \right)^{-1} J^T r$$

Until some convergence condition is met.
Some things:
- $\lambda$ is set and used as described in the single camera calibration section.
- If there is only a single calibration board image pair, the principal point is again removed from the optimization, similarly to the single camera calibration section.
Now, we can perform camera calibration for stereo cameras using a checkerboard calibration target.
Multi-Camera
Coming soon…