Out of Tensor factorisation – Nipun Batra Blog

In a previous post, we had looked at predicting for users who weren’t a part of the original matrix factorisation. In this post, we’ll look at the same for 3-d tensors. In case you want to learn more about tensor factorisations, look at my earlier post.

General Tensor Factorisation

General tensor factorisation for a 3d tensor A (M X N X O) would produce 3 factors- X (M X K), Y (N X K) and Z (O X K). The \(A_{ijl}\) entry can be found as (Khatri product) :

\[ A_{ijl} = \sum_k{X_{ik}Y_{jk}Z_{lk}}\]

Learning \(X_M\) factor

However, we’d assume that the \(M^{th}\) entry isn’t a part of this decomposition. So, how do we obtain the X factors correspondonding to \(M^{th}\) entry? We learn the Y and Z factors from the tensor A (excluding the \(M^{th}\) row entries). We assume the Y and Z learnt to be shared across the entries across rows of A (1 through M).

The above figure shows the latent factor for X (\(X_{M}\)) corresponding to the \(M^{th}\) entry of X that we wish to learn. On the LHS, we see the matrix corresponding to \(A_{M}\). The highlighted entry of \(A_{M}\) is created by element-wise multiplication of \(X_M, Y_0, Z_0\) and then summing. Thus, each of the N X O entries of \(A_M\) are created by multiplying \(X_M\) with a row from Y and a row from Z. In general,

\[A_{M, n, o} = \sum_k{X_{M, k} \times Y_{n, k} \times Z_{o, k}}\]

Now, to learn \(X_M\), we plan to use least squares. For that, we need to reduce the problem into \(\alpha x = \beta\) We do this as follows:

We flatten out the A_M matrix into a vector containing N X O entries and call it \(\beta\)
We create a matrix by element-wise multiplication of each row of Y with each row of Z to create \(\alpha\) of shape (N X O, K)

We can now write,

\[ \alpha X_M^T \approx \beta \] Thus, X_M^T = Least Squares (\(\alpha, \beta\))

Ofcourse, \(\beta\) can have missing entries, which we mask out. Thus, we can write:

\(X_M^T\) = Least Squares (\(\alpha [Mask], \beta [Mask]\))

In case we’re doing a non-negative tensor factorisation, we can instead learn \(X_M^T\) as follows: \(X_M^T\) = Non-negative Least Squares (\(\alpha [Mask], \beta [Mask]\))

Code example

import tensorly
from tensorly.decomposition import parafac, non_negative_parafac
import numpy as np

Creating the tensor to be decomposed

M, N, O = 10, 4, 3 #user, movie, feature
t = np.arange(M*N*O).reshape(M, N, O).astype('float32')
t[0] #First entry

array([[  0.,   1.,   2.],
       [  3.,   4.,   5.],
       [  6.,   7.,   8.],
       [  9.,  10.,  11.]], dtype=float32)

Setting a few entries of the last user to be unavailable/unknown

t_orig = t.copy() # creating a copy
t[-1,:,:][0, 0] = np.NAN
t[-1,:,:][2, 2] = np.NAN
t[-1,:,:]

array([[  nan,  109.,  110.],
       [ 111.,  112.,  113.],
       [ 114.,  115.,   nan],
       [ 117.,  118.,  119.]], dtype=float32)

Standard Non-negative PARAFAC decomposition

K = 2
# Notice, we factorise a tensor with one less user. thus, t[:-1, :, :]
X, Y, Z = non_negative_parafac(t[:-1,:,:], rank=K)

X.shape, Y.shape, Z.shape

((9, 2), (4, 2), (3, 2))

array([[ 0.48012616,  1.13542261],
       [ 0.49409014,  2.98947262],
       [ 0.5072998 ,  5.03375154],
       [ 0.52051081,  7.07682331]])

array([[ 0.57589198,  1.55655956],
       [ 0.58183329,  1.7695134 ],
       [ 0.58778163,  1.98182137]])

Creating \(\alpha\) by element-wise multiplication of Y, Z and reshaping

alpha = np.einsum('nk, ok -> nok', Y, Z).reshape((N*O, K))
print alpha

print "\nShape of alpha = ", alpha.shape

[[  0.27650081   1.76735291]
 [  0.27935339   2.00914552]
 [  0.28220934   2.25020479]
 [  0.28454255   4.65329218]
 [  0.28747809   5.28991186]
 [  0.29041711   5.92460074]
 [  0.29214989   7.83533407]
 [  0.29516391   8.9072908 ]
 [  0.29818151   9.9759964 ]
 [  0.299758    11.01549695]
 [  0.30285052  12.52253365]
 [  0.30594669  14.02499969]]

Shape of alpha =  (12, 2)

from scipy.optimize import nnls

Creating \(\beta\)

beta = t[-1,:,:].reshape(N*O, 1)
mask = ~np.isnan(beta).flatten()
beta[mask].reshape(-1, 1)

array([[ 109.],
       [ 110.],
       [ 111.],
       [ 112.],
       [ 113.],
       [ 114.],
       [ 115.],
       [ 117.],
       [ 118.],
       [ 119.]], dtype=float32)

Learning \(X_M\)

X_M = nnls(alpha[mask], beta[mask].reshape(-1, ))[0].reshape((1, K))
X_M

array([[ 389.73825036,    0.        ]])

Comparing X_M with other entries from X

array([[  7.40340055e-01,   7.62705972e-01],
       [  4.14288653e+01,   7.57249713e-01],
       [  8.51282259e+01,   6.56239315e-01],
       [  1.29063811e+02,   5.46019997e-01],
       [  1.73739412e+02,   4.06496594e-01],
       [  2.19798887e+02,   2.11453297e-01],
       [  2.64609697e+02,   6.54705290e-02],
       [  3.01392149e+02,   2.39700484e-01],
       [  3.39963876e+02,   3.41824756e-01]])

It seems that the first column captures the increasing trend of values in the tensor

Predicting missing entries using tensor multiplication

np.round(np.einsum('ir, jr, kr -> ijk', X_M, Y, Z))

array([[[ 108.,  109.,  110.],
        [ 111.,  112.,  113.],
        [ 114.,  115.,  116.],
        [ 117.,  118.,  119.]]])

Actual entries

t_orig[-1, :, :]

array([[ 108.,  109.,  110.],
       [ 111.,  112.,  113.],
       [ 114.,  115.,  116.],
       [ 117.,  118.,  119.]], dtype=float32)

Not bad! We’re exactly there!