# Beautiful plots while simulating loss in two-part Procrustes problem

psychometrics
statistics
EN
Published

April 15, 2015

Today, I was working on a two part Procrustes problem and wanted to find out why my minimization algorithm sometimes does not converge properly or renders unexpected results. The loss function to be minimized is

$L(\mathbf{Q},c) = \| c \mathbf{A_1Q} - \mathbf{B_1} \|^2 + \| \mathbf{A_2Q} - \mathbf{B_2} \|^2 \rightarrow min$

with $$\| \cdot \|$$ denoting the Frobenius norm, $$c$$ is an unknown scalar and $$\mathbf{Q}$$ an unknown rotation matrix, i.e. $$\mathbf{Q}^T\mathbf{Q}=\mathbf{I}$$. $$\;\mathbf{A_1}, \mathbf{A_2}, \mathbf{B_1}$$, and $$\mathbf{B_1}$$ are four real valued matrices. The minimum for $$c$$ is easily found by partial derivation and setting to zero.

$c = \frac {tr \; \mathbf{Q}^T \mathbf{A_1}^T \mathbf{B_1}} { \| \mathbf{A_1} \|^2 }$

By plugging $$c$$ into the loss function $$L(\mathbf{Q},c)$$ we get a new loss function $$L(\mathbf{Q})$$ that only depends on $$\mathbf{Q}$$. This is the starting situation.

When trying to find out why the algorithm to minimize $$L(\mathbf{Q})$$ did not work as expected, I got stuck. So I decided to conduct a small simulation and generate random rotation matrices to study the relation between the parameter $$c$$ and the value of the loss function $$L(\mathbf{Q})$$. Before looking at the results, let’s visualize the results for the first part of the above Procrustes problem, with $$c$$ having the same solution as before.

$L(\mathbf{Q},c) = \| c \mathbf{A_1Q} - \mathbf{B_1} \|^2 \rightarrow min$

This is a well behaved relation, for each scaling parameter the loss value is identical. Now let’s look at the full two-part loss function. The result turns out to be beautiful.

For the graphic above I used the following matrices.

$A1= \begin{pmatrix} 0.0 & 0.4 & -0.5 \\ -0.4 & -0.8 & -0.5 \\ -0.1 & -0.5 & 0.2 \\ \end{pmatrix} \mkern18mu , \mkern36mu B1= \begin{pmatrix} -0.1 & -0.8 & -0.1 \\ 0.3 & 0.2 & -0.9 \\ 0.1 & -0.3 & -0.5 \\ \end{pmatrix}$ $A2= \begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \\ \end{pmatrix} \mkern18mu , \mkern36mu B2= \begin{pmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \\ \end{pmatrix}$

And the following R-code.

tr <- function(X) sum(diag(X))

# random matrix type 1
rmat_1 <- function(n=3, p=3, min=-1, max=1){
matrix(runif(n*p, min, max), ncol=p)
}

# random matrix type 2, sparse
rmat_2 <- function(p=3) {
diag(p)[, sample(1:p, p)]
}

# generate random rotation matrix Q. Based on Q find
# optimal scaling factor c and calculate loss function value
#
one_sample <- function(n=2, p=2)
{
Q <- mixAK::rRotationMatrix(n=1, dim=p) %*%         # random rotation matrix
s <- tr( t(Q) %*% t(A1) %*% B1 ) / norm(A1, "F")^2  # scaling factor c
rss <- norm(s*A1 %*% Q - B1, "F")^2 +
norm(A2 %*% Q - B2, "F")^2
}

# find c and rss or many random rotation matrices
#
set.seed(10)  # nice case for 3 x 3
n <- 3
p <- 3
A1 <- round(rmat_1(n, p), 1)
B1 <- round(rmat_1(n, p), 1)
A2 <- rmat_2(p)
B2 <- rmat_2(p)

x <- rdply(40000, one_sample(3,3))
plot(x$s, x$rss, pch=16, cex=.4, xlab="c", ylab="L(Q)", col="#00000010")

Below you find some more graphics with different matrices as inputs.

Here, we do not have a one-to-one relation between the scaling parameter and the loss function any more. I do not quite know what to make of this yet. But for now I am happy that it has aesthetic value.