1 Introduction

In the following, the randomized Power Method for computing the Singular Value Decomposition (SVD) of matrices is implemented and then compared with the built-in svd() function of R. Moreover, there is a chapter about utilizing the SVD for the compression of grayscale images.

2 The Randomized Power Method

2.1 Preparations

Load Packages:

library(tidyverse) 
library(plotly)
library(printr)

Function for generating random matrices (used to test the implementation later on):

gen_rand_mat <- function(n = 1000, m = 100, s = 2) {
  set.seed(s)
  # draw n*m normal values with recycled means 1:10 and sds 1:33, then shuffle them
  vec <- sample(rnorm(m*n, mean = 1:10, sd = 1:33))
  # shuffle once more and reshape into an n x m matrix
  return(matrix(sample(vec), nrow = n, ncol = m))
}
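
A quick illustrative call (the parameter choices here are arbitrary) shows the shape of the generated matrices:

# example: a 100 x 10 random test matrix
test_mat <- gen_rand_mat(n = 100, m = 10, s = 1)
dim(test_mat)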

2.2 The theory

This is just a short recap of the formulas used; the proofs of the assertions will not be given. More details can be found in your favourite Linear Algebra / Data Analysis book, or just google it :).


Definition: Singular value decomposition

Let \(A \in \mathbb{K}^{n \times m}\). Then the following factorization is called the Singular Value Decomposition of \(A\):


\(A = U\Sigma V^H\)


where \(U,V\) are orthogonal matrices of compatible size and \(\Sigma\) is diagonal with non-negative entries, the singular values.
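
As a quick numerical sanity check (a minimal sketch using the helper gen_rand_mat from above and R's built-in svd()), the factorization can be verified directly:

# verify A = U %*% Sigma %*% t(V) for a small random (real) matrix
A <- gen_rand_mat(n = 6, m = 4, s = 1)
decomp <- svd(A)
max(abs(A - decomp$u %*% diag(decomp$d) %*% t(decomp$v))) # essentially zero (machine precision)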



Theorem: Randomized Power Method

Let \(A \in \mathbb{K}^{n \times m}\) and \(x \in \mathbb{K}^{n}\) be a random unit length vector. Let \(V\) be the space spanned by the left singular vectors of \(A\) corresponding to singular values greater than \((1-\epsilon)\sigma_{1}\). Let \(i\) be \(\Omega(\frac{\ln(d^{\beta}/\epsilon)}{\epsilon})\), for \(\beta \geq 0.5\) (in the implementation below \(d\) is taken to be the number of columns \(m\) of \(A\)). Let \(\omega^*\) be the unit vector after \(i\) iterations of the power method, defined as follows:


\(\omega^{*} = \frac{(AA^{H})^{i}x}{\parallel(AA^{H})^{i}x\parallel_{2}}\)


Then, with probability at least \(1-2\alpha \sqrt{2d-1}\), the component of \(\omega^*\) orthogonal to \(V\) has norm at most \(O(\frac{\epsilon}{\alpha d^{\beta}})\).
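
To see why the iteration concentrates on the space \(V\), expand the starting vector in the left singular vectors (a standard argument, sketched here without the probabilistic details): writing \(x = \sum_{l=1}^{r} c_l u_l + x_{\perp}\) with \(x_{\perp}\) orthogonal to the column space of \(A\) gives, for \(i \geq 1\),

\((AA^{H})^{i}x = \sum_{l=1}^{r} c_l \sigma_l^{2i} u_l\)

since \(AA^{H}u_l = \sigma_l^{2} u_l\) and \(AA^{H}x_{\perp} = 0\). After normalization the terms with \(\sigma_l\) close to \(\sigma_1\) dominate.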


This implies that three parameters have to be set for the implementation of the algorithm: \(\epsilon, \alpha, \beta\). Their influence is clear from the formulas and will be discussed later using visualizations.

Moreover, it should be emphasised that \(x\) is used so that each iteration performs a matrix-vector multiplication instead of a matrix-matrix multiplication. To further reduce the cost of these multiplications one could use \(A^{H}A\) instead, as most of the time \(n>m\).
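
As an aside, here is a minimal sketch of that variant (the helper rpm_iterate_AtA is only illustrative and not the implementation used below): the iteration then runs with the \(m \times m\) matrix \(A^{H}A\), converges to the first right singular vector, and \(u_1\) is recovered at the end.

# power iteration with t(A) %*% A (m x m) instead of A %*% t(A) (n x n);
# x lives in K^m and converges to the first right singular vector v_1
rpm_iterate_AtA <- function(matr, iter_num) {
  set.seed(2)
  x <- rnorm(ncol(matr))
  x <- x/sqrt(sum(x^2))
  AtA <- t(matr) %*% matr
  for (i in 1:iter_num) {
    temp_vec <- AtA %*% x
    x <- temp_vec/sqrt(sum(temp_vec^2))
  }
  sigma1 <- sqrt(sum((matr %*% x)^2)) # first singular value
  u1 <- (matr %*% x)/sigma1           # first left singular vector
  return(list(lvec = u1, rvec = x, singular_val = sigma1))
}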

Another fact worth mentioning is that the above only gives the formula for computing \(u_1\), the first column of \(U\). The computation of the first singular value (first entry of \(\Sigma\)) is then as easy as computing \(\sigma_{1}= \parallel A^H u_1 \parallel_2\), and the first right singular vector is \(v_1 = \frac{A^H u_1}{\sigma_1}\). But this still gives only the first of the \(r\) (\(r = rank(A)\)) singular values and left/right singular vectors that are needed for the full decomposition. By applying the above method iteratively to \(A - \sigma_1 u_1 v_{1}^{H}\) (deflation) one can obtain the full SVD. In the following implementation only the first step is implemented, i.e. the first singular value and its singular vectors are computed; a sketch of such a deflation loop is given right after the implementation below.

2.3 The implementation

rpm_algo <- function(matr, eps = 0.1, alph = 0.01, bet = 0.5){
  stopifnot(bet >= 0.5,alph <= 1,alph >= 0, eps <= 1,eps >= 0)
  dim_matr <- dim(matr)
  # generate random unit vector:
  set.seed(2)
  unit_vec_raw <- rnorm(dim_matr[1])
  unit_vec <- unit_vec_raw/(sqrt(sum(unit_vec_raw^2)))
  # calculate the iteration number:
  iter_num <- ceiling(log((dim_matr[2]^bet)/eps)/eps)
  # do the iterations:
  x <- unit_vec
  AAt <- matr %*% t(matr)
  for (i in 1:iter_num) {
    temp_vec <- AAt %*% x
    x <- temp_vec/sqrt(sum(temp_vec^2))
  }
  singular_vec <- sqrt(sum((t(matr) %*% x)^2)) # first singular value sigma_1
  rvec <- (t(matr) %*% x)/singular_vec         # first right singular vector v_1
  # bounds from the theorem:
  orth_dist_bound <- eps/(alph*dim_matr[2]^bet)
  prob_bound <- 1 - 2*alph*sqrt(2*dim_matr[2]-1)
  return(list(lvec = x,
              rvec = rvec,
              singular_vec = singular_vec,
              orth_dist_bound = orth_dist_bound,
              prob_bound = prob_bound,
              eps = eps,
              alph = alph,
              bet = bet))
}
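
For completeness, here is a minimal sketch of how the full SVD could be obtained by deflation as described above, i.e. by repeatedly subtracting \(\sigma_1 u_1 v_1^H\) and re-running rpm_algo on the residual. The helper name rpm_full_svd and its interface are only illustrative and not used in the rest of this report:

# illustrative deflation sketch; assumes rpm_algo from above
rpm_full_svd <- function(matr, r, ...) {
  # r: number of singular triplets to extract (at most rank(A))
  U <- matrix(0, nrow = nrow(matr), ncol = r)
  V <- matrix(0, nrow = ncol(matr), ncol = r)
  d <- numeric(r)
  residual <- matr
  for (j in 1:r) {
    res <- rpm_algo(residual, ...)
    U[, j] <- res$lvec
    V[, j] <- res$rvec
    d[j] <- res$singular_vec
    # deflate: remove the component just found
    residual <- residual - d[j] * (U[, j] %*% t(V[, j]))
  }
  return(list(u = U, d = d, v = V))
}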

2.4 Test the implementation

Evaluate the algorithm for random matrices:

# generate random matrices
matr_list <- sapply(1:10, function(i){
  gen_rand_mat(n = 100,m = 10,s = i)
},simplify = FALSE)
names(matr_list) <- paste0("mat",1:10)

tune_para <- expand.grid(eps = seq(0.05,0.5,length.out = 10),
                         alph = seq(0.001,0.004, length.out = 10),
                         bet = seq(0.5,5,length.out = 10))

# use the build in svd() function for these matrices
u1_svd <- sapply(matr_list,function(matr){
  svd_temp <- svd(matr,nu = 1,nv = 0)
  return(svd_temp$u)
},simplify = TRUE)
colnames(u1_svd) <- paste0("mat",1:10)

# compare the results of the algorithm with the ones of the svd() function
rpm_res <- sapply(1:dim(tune_para)[1], function(i){
  temp_params <- tune_para[i,]
  individual_res <- sapply(names(matr_list), function(matr_name){
    res_list <- rpm_algo(matr = matr_list[[matr_name]],eps = temp_params$eps,alph = temp_params$alph,bet = temp_params$bet)
    u1_from_svd <- u1_svd[,matr_name]
    orth_projection <- sum(u1_from_svd*as.vector(res_list$lvec))*u1_from_svd
    real_orth_error <- sqrt(sum((orth_projection-as.vector(res_list$lvec))^2))
    return(c(eps = temp_params$eps,alph = temp_params$alph,bet = temp_params$bet,
             orth_dist_bound = res_list$orth_dist_bound,
             prob_bound = res_list$prob_bound,
             real_orth_error = real_orth_error))
  },simplify = TRUE)
  return(rowMeans(individual_res))
},simplify = TRUE)
rpm_res <- as_tibble(t(rpm_res))

Visualize the bounds (in red/triangles) against the actual error (in purple/points) by parameter.

# visualize the results of the comparison
plot_orth_error_vs_bound <- rpm_res %>%
  pivot_longer(cols = c("eps","alph","bet"),names_to = "para",values_to = "para_val") %>%
  ggplot()+
    geom_jitter(aes(x = para_val, y = orth_dist_bound, col = prob_bound), shape = 2, alpha = 0.5)+
    geom_jitter(aes(x = para_val, y = real_orth_error, col = prob_bound), shape = 16, alpha = 0.5)+
    geom_smooth(aes(x = para_val, y = orth_dist_bound),se = FALSE, col = "red", method = "gam",
                formula = y ~ s(x, bs = "cs"))+
    geom_smooth(aes(x = para_val, y = real_orth_error),se = FALSE, col = "purple", method = "gam",
                formula = y ~ s(x, bs = "cs"))+
    facet_wrap(~para,scales = "free")+
    labs(x = "parameter value",y = "log orthogonal error distance")+
    scale_y_log10()+
    theme_classic()
#plot_orth_error_vs_bound
ggplotly(plot_orth_error_vs_bound)

We observe that on average the bound from the theorem holds. But as it only holds with high probability, we will take a closer look:

# relative frequency of the bound not holding
mean(rpm_res$orth_dist_bound < rpm_res$real_orth_error)
## [1] 0.038
# mean violation of the bound
mean(abs((rpm_res$orth_dist_bound - rpm_res$real_orth_error)[rpm_res$orth_dist_bound < rpm_res$real_orth_error]))
## [1] 0.001709332
# statistical summary of the errors
summary(rpm_res$real_orth_error)
##      Min.   1st Qu.    Median      Mean   3rd Qu.      Max.
##         0  7.13e-05 0.0030237 0.0322212 0.0232572 0.4318084

From this we see that even when the bound is violated, it is not badly violated. So the bounds look good. In the next step, a closer look at the parameters is taken by plotting without the bounds.

plot_orth_error <- rpm_res %>%
  pivot_longer(cols = c("eps","alph","bet"),names_to = "para",values_to = "para_val") %>%
  ggplot()+
    geom_jitter(aes(x = para_val, y = real_orth_error, col = prob_bound), shape = 16, alpha = 0.5)+
    geom_smooth(aes(x = para_val, y = real_orth_error),se = FALSE, col = "purple", method = "gam",
                formula = y ~ s(x, bs = "cs"))+
    facet_wrap(~para,scales = "free")+
    labs(x = "parameter value",y = "log orthogonal error distance (real)")+
    scale_y_log10()+
    theme_classic() 
#plot_orth_error
ggplotly(plot_orth_error)

From this one can get some meaningful insights:

  • \(\alpha\) does not influence the real orthogonal error distance. Raising \(\alpha\) tightens the bound for the orthogonal error distance, but at the cost of a lower probability of the bound holding.

  • \(\beta\) does influence the real orthogonal error distance. Raising \(\beta\) leads to better bounds and a lower real error, at the cost of more iterations.

  • \(\epsilon\) does influence the real orthogonal error distance. Raising \(\epsilon\) leads to larger bounds and a larger real error, but reduces the number of iterations (because \(\sigma_1 \gg \sigma_2\) highly favors convergence); see the sketch after this list for how the iteration count scales.
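
A minimal sketch (reusing the iteration-count formula from rpm_algo, with m = 10 as in the test matrices above) of how the number of iterations reacts to \(\epsilon\) and \(\beta\):

# iteration count as used in rpm_algo: ceiling(log(m^bet / eps) / eps)
iter_count <- function(m, eps, bet) ceiling(log((m^bet)/eps)/eps)
expand.grid(eps = c(0.05, 0.1, 0.5), bet = c(0.5, 2, 5)) %>%
  mutate(iters = iter_count(m = 10, eps = eps, bet = bet))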

2.5 A real world example

In the following, a quick example with real world data will demonstrate that the built-in function for computing the first singular value and vectors yields the same results as the one built above relying on the Randomized Power Method.

Read in and clean the data:

countries_data <- read_delim("countries_data.txt", delim = " ",skip = 2)
col_count <- colnames(countries_data)
col_count[1] <- "country"
col_count[3] <- "avg_income"
colnames(countries_data) <- col_count

countries_data <- countries_data %>%
  mutate_at(vars(!starts_with("c")),function(x){as.numeric(str_trim(x))}) %>%
  mutate_if(is.numeric,scale)

countries_matrix <- as.matrix(countries_data[,-1])
rownames(countries_matrix) <- countries_data$country
countries_matrix <- t(countries_matrix)

Comparison:

countries_rpm <- rpm_algo(countries_matrix, eps = 0.1, alph = 0.1/(2*2*sqrt(3)), bet = 0.5)

c("diff left singvector" = sum(abs(svd(countries_matrix,nu = 1, nv = 0)$u - countries_rpm$lvec)),
  "diff singvalue" = abs(countries_rpm$singular_vec - svd(countries_matrix,nu = 0, nv = 0)$d[1]),
  "diff right singvector" = sum(abs(svd(countries_matrix,nu = 0, nv = 1)$v - countries_rpm$rvec)))
##  diff left singvector        diff singvalue diff right singvector 
##          1.289517e-08          8.881784e-16          2.279701e-08

Very small differences between the built-in SVD and the implemented function! :)

3 Image compression using SVD

The method used to compress the image is the best k rank approximation of a matrix using its SVD. As the SVD can be expressed as \(A = \sum_{l=1}^{r} \sigma_l u_l v_l^H\), with r being the rank of the matrix, one can also use \(k<r\) to reduce the information that has to be stored to display the image. The result is \(A_k = \sum_{l=1}^{k} \sigma_l u_l v_l^H\), the k rank approximation of A. Actually one can prove that this approximation is the best k rank approximation possible with respect to norms like the Frobenius norm.
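
For reference, the approximation error can be written explicitly in terms of the discarded singular values (this is the Eckart-Young theorem, stated here without proof):

\(\parallel A - A_k \parallel_{F} = \sqrt{\sum_{l=k+1}^{r} \sigma_l^{2}}, \qquad \parallel A - A_k \parallel_{2} = \sigma_{k+1}\)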

Given an algorithm that computes the SVD, the implementation is very easy:

best_k_rank <- function(svd_obj, k){
  # rank-k approximation A_k = sum_{l=1}^{k} sigma_l * u_l * t(v_l)
  Ak <- svd_obj$d[1]*(svd_obj$u[,1] %*% t(svd_obj$v[,1]))
  if (k > 1) {
    for (i in 2:k) {
      Ak <- Ak + svd_obj$d[i]*(svd_obj$u[,i] %*% t(svd_obj$v[,i]))
    }
  }
  return(Ak)
}
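
As a side note, the same approximation can be computed without the loop (a sketch; the nrow argument of diag() is needed so that k = 1 also works):

# vectorized k rank approximation via matrix products
best_k_rank_vec <- function(svd_obj, k){
  svd_obj$u[, 1:k, drop = FALSE] %*%
    diag(svd_obj$d[1:k], nrow = k) %*%
    t(svd_obj$v[, 1:k, drop = FALSE])
}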

Now the picture of choice (greyscale for this approach) is loaded and displayed:

library(png)
file <- system.file('extdata/parrots.png',package='imager')
my_image <- readPNG(file)
img_mat <- my_image[,,1] # first channel; readPNG returns intensities already scaled to [0,1]
img_mat_vis <- t(apply(img_mat, 2, rev)) # otherwise the image would be displayed rotated
image(img_mat_vis, col  = gray((0:255)/255))

Compute the SVD and plot the singular values on a logarithmic scale:

svd_image <- svd(img_mat)
tibble(id = 1:(length(svd_image$d)-1),
       sing = svd_image$d[1:(length(svd_image$d)-1)])%>%
  ggplot(aes(x = id, y = sing)) +
    geom_line()+
    geom_point()+
    scale_y_log10()+
    labs(x = "#singular vector", y = "value")+
    theme_classic()

From this one can conclude that meaningful values of k should not be less than 20, so we will use this as a starting point.
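
To make this choice a bit more quantitative, one can look at the fraction of the total squared singular values captured by the first k terms (a minimal sketch; the 0.99 threshold is just an arbitrary example):

# cumulative share of sum(sigma_l^2) captured by the first k singular values
energy <- cumsum(svd_image$d^2)/sum(svd_image$d^2)
c(k20 = energy[20], k50 = energy[50], k150 = energy[150],
  smallest_k_for_99_percent = which(energy >= 0.99)[1])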

img_mat_20 <- best_k_rank(svd_image,20)
img_mat_20 <- t(apply(img_mat_20, 2, rev)) # same reorientation as for the original image
image(img_mat_20, col = gray((0:255)/255))

# reduction in memory usage:
as.numeric(object.size(list(svd_image$d[1:20],svd_image$u[,1:20],svd_image$v[,1:20]))/object.size(img_mat))
## [1] 0.06532856

This obviously captures the most important features of the image, but it is still quite blurry. The reduction w.r.t. memory size is very large (more than 15 times less memory!). So the next step is to step up the numbers and try out 50 and 150 as k.
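
The memory ratios reported by object.size() should be close to the theoretical storage ratio of the truncated SVD: storing k singular values plus k left and right singular vectors needs k(n+m+1) numbers instead of n*m (a small sketch, reusing img_mat from above):

# theoretical fraction of storage needed by a k rank truncated SVD
storage_ratio <- function(n, m, k) k*(n + m + 1)/(n*m)
sapply(c(k20 = 20, k50 = 50, k150 = 150),
       function(k) storage_ratio(nrow(img_mat), ncol(img_mat), k))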

img_mat_50 <- best_k_rank(svd_image,50)
img_mat_50 <- t(apply(img_mat_50, 2, rev))
image(img_mat_50, col = gray((0:255)/255))

# reduction in memory usage:
as.numeric(object.size(list(svd_image$d[1:50],svd_image$u[,1:50],svd_image$v[,1:50]))/object.size(img_mat))
## [1] 0.1630544
img_mat_150 <- best_k_rank(svd_image,150)
img_mat_150 <- t(apply(img_mat_150, 2, rev))
image(img_mat_150, col = gray((0:255)/255))

# reduction in memory usage:
as.numeric(object.size(list(svd_image$d[1:150],svd_image$u[,1:150],svd_image$v[,1:150]))/object.size(img_mat))
## [1] 0.4888072

The results speak for themselves: even for k equal to 150 one has a reduction in memory needed of roughly a half, and one can hardly detect a difference between the two pictures. Still, one should note that this method is quite heuristic and it can be quite computationally expensive, especially when considering the Power Method for the computation of the SVD. But the implementation is very straightforward, which is of course nice :D.