# Out of Tensor factorisation

Out of tensor factorisation
ML
Author

Nipun Batra

Published

April 20, 2017

In a previous post, we had looked at predicting for users who weren’t a part of the original matrix factorisation. In this post, we’ll look at the same for 3-d tensors. In case you want to learn more about tensor factorisations, look at my earlier post.

### General Tensor Factorisation

General tensor factorisation for a 3d tensor A (M X N X O) would produce 3 factors- X (M X K), Y (N X K) and Z (O X K). The $$A_{ijl}$$ entry can be found as (Khatri product) :

$A_{ijl} = \sum_k{X_{ik}Y_{jk}Z_{lk}}$

### Learning $$X_M$$ factor

However, we’d assume that the $$M^{th}$$ entry isn’t a part of this decomposition. So, how do we obtain the X factors correspondonding to $$M^{th}$$ entry? We learn the Y and Z factors from the tensor A (excluding the $$M^{th}$$ row entries). We assume the Y and Z learnt to be shared across the entries across rows of A (1 through M).

The above figure shows the latent factor for X ($$X_{M}$$) corresponding to the $$M^{th}$$ entry of X that we wish to learn. On the LHS, we see the matrix corresponding to $$A_{M}$$. The highlighted entry of $$A_{M}$$ is created by element-wise multiplication of $$X_M, Y_0, Z_0$$ and then summing. Thus, each of the N X O entries of $$A_M$$ are created by multiplying $$X_M$$ with a row from Y and a row from Z. In general,

$A_{M, n, o} = \sum_k{X_{M, k} \times Y_{n, k} \times Z_{o, k}}$

Now, to learn $$X_M$$, we plan to use least squares. For that, we need to reduce the problem into $$\alpha x = \beta$$ We do this as follows:

1. We flatten out the A_M matrix into a vector containing N X O entries and call it $$\beta$$
2. We create a matrix by element-wise multiplication of each row of Y with each row of Z to create $$\alpha$$ of shape (N X O, K)

We can now write,

$\alpha X_M^T \approx \beta$ Thus, X_M^T = Least Squares ($$\alpha, \beta$$)

Ofcourse, $$\beta$$ can have missing entries, which we mask out. Thus, we can write:

$$X_M^T$$ = Least Squares ($$\alpha [Mask], \beta [Mask]$$)

In case we’re doing a non-negative tensor factorisation, we can instead learn $$X_M^T$$ as follows: $$X_M^T$$ = Non-negative Least Squares ($$\alpha [Mask], \beta [Mask]$$)

### Code example

import tensorly
from tensorly.decomposition import parafac, non_negative_parafac
import numpy as np

#### Creating the tensor to be decomposed

M, N, O = 10, 4, 3 #user, movie, feature
t = np.arange(M*N*O).reshape(M, N, O).astype('float32')
t[0] #First entry
array([[  0.,   1.,   2.],
[  3.,   4.,   5.],
[  6.,   7.,   8.],
[  9.,  10.,  11.]], dtype=float32)

#### Setting a few entries of the last user to be unavailable/unknown

t_orig = t.copy() # creating a copy
t[-1,:,:][0, 0] = np.NAN
t[-1,:,:][2, 2] = np.NAN
t[-1,:,:]
array([[  nan,  109.,  110.],
[ 111.,  112.,  113.],
[ 114.,  115.,   nan],
[ 117.,  118.,  119.]], dtype=float32)

#### Standard Non-negative PARAFAC decomposition

K = 2
# Notice, we factorise a tensor with one less user. thus, t[:-1, :, :]
X, Y, Z = non_negative_parafac(t[:-1,:,:], rank=K)
X.shape, Y.shape, Z.shape
((9, 2), (4, 2), (3, 2))
Y
array([[ 0.48012616,  1.13542261],
[ 0.49409014,  2.98947262],
[ 0.5072998 ,  5.03375154],
[ 0.52051081,  7.07682331]])
Z
array([[ 0.57589198,  1.55655956],
[ 0.58183329,  1.7695134 ],
[ 0.58778163,  1.98182137]])

#### Creating $$\alpha$$ by element-wise multiplication of Y, Z and reshaping

alpha = np.einsum('nk, ok -> nok', Y, Z).reshape((N*O, K))
print alpha

print "\nShape of alpha = ", alpha.shape
[[  0.27650081   1.76735291]
[  0.27935339   2.00914552]
[  0.28220934   2.25020479]
[  0.28454255   4.65329218]
[  0.28747809   5.28991186]
[  0.29041711   5.92460074]
[  0.29214989   7.83533407]
[  0.29516391   8.9072908 ]
[  0.29818151   9.9759964 ]
[  0.299758    11.01549695]
[  0.30285052  12.52253365]
[  0.30594669  14.02499969]]

Shape of alpha =  (12, 2)
from scipy.optimize import nnls

#### Creating $$\beta$$

beta = t[-1,:,:].reshape(N*O, 1)
beta[mask].reshape(-1, 1)
array([[ 109.],
[ 110.],
[ 111.],
[ 112.],
[ 113.],
[ 114.],
[ 115.],
[ 117.],
[ 118.],
[ 119.]], dtype=float32)

#### Learning $$X_M$$

X_M = nnls(alpha[mask], beta[mask].reshape(-1, ))[0].reshape((1, K))
X_M
array([[ 389.73825036,    0.        ]])

#### Comparing X_M with other entries from X

X
array([[  7.40340055e-01,   7.62705972e-01],
[  4.14288653e+01,   7.57249713e-01],
[  8.51282259e+01,   6.56239315e-01],
[  1.29063811e+02,   5.46019997e-01],
[  1.73739412e+02,   4.06496594e-01],
[  2.19798887e+02,   2.11453297e-01],
[  2.64609697e+02,   6.54705290e-02],
[  3.01392149e+02,   2.39700484e-01],
[  3.39963876e+02,   3.41824756e-01]])

It seems that the first column captures the increasing trend of values in the tensor

#### Predicting missing entries using tensor multiplication

np.round(np.einsum('ir, jr, kr -> ijk', X_M, Y, Z))
array([[[ 108.,  109.,  110.],
[ 111.,  112.,  113.],
[ 114.,  115.,  116.],
[ 117.,  118.,  119.]]])

#### Actual entries

t_orig[-1, :, :]
array([[ 108.,  109.,  110.],
[ 111.,  112.,  113.],
[ 114.,  115.,  116.],
[ 117.,  118.,  119.]], dtype=float32)