Out-of-tensor factorisation

In a previous post, we looked at predicting for users who weren't part of the original matrix factorisation. In this post, we'll do the same for 3-d tensors. In case you want to learn more about tensor factorisations, have a look at my earlier post.

General Tensor Factorisation

A general factorisation of a 3-d tensor A (M X N X O) produces 3 factor matrices: X (M X K), Y (N X K) and Z (O X K). The $A_{ijl}$ entry can then be recovered as (this is the CP/PARAFAC model, whose matricised form involves the Khatri-Rao product):

$$ A_{ijl} = \sum_k{X_{ik}Y_{jk}Z_{lk}}$$
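To make the notation concrete, here is a minimal NumPy sketch (sizes and factor matrices are made up purely for illustration) that reconstructs a tensor from its three factor matrices using exactly this sum:

import numpy as np

# Illustrative sizes and rank
M, N, O, K = 5, 4, 3, 2
X = np.random.rand(M, K)
Y = np.random.rand(N, K)
Z = np.random.rand(O, K)

# A[i, j, l] = sum_k X[i, k] * Y[j, k] * Z[l, k]
A = np.einsum('ik,jk,lk->ijl', X, Y, Z)
print(A.shape)  # (5, 4, 3)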

Learning the $X_M$ factor

However, we now assume that the $M^{th}$ row isn't a part of this decomposition. So, how do we obtain the X factor corresponding to the $M^{th}$ entry? We learn the Y and Z factors from the tensor A, excluding the $M^{th}$ row entries, and assume the learnt Y and Z to be shared across all the rows of A (1 through M).

The above figure shows the latent factor of X ($X_{M}$) corresponding to the $M^{th}$ entry of X that we wish to learn. On the LHS, we see the matrix corresponding to $A_{M}$. The highlighted entry of $A_{M}$ is obtained by element-wise multiplication of $X_M, Y_0, Z_0$ and then summing. Thus, each of the N X O entries of $A_M$ is created by multiplying $X_M$ with a row from Y and a row from Z. In general,

$$A_{M, n, o} = \sum_k{X_{M, k} \times Y_{n, k} \times Z_{o, k}}$$
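Continuing the illustrative sketch above, the whole $M^{th}$ slice can be computed in one go (here using the last row of X as a stand-in for $X_M$):

X_M = X[-1]                                # shape (K,)
A_M = np.einsum('k,nk,ok->no', X_M, Y, Z)  # shape (N, O)
assert np.allclose(A_M, A[-1])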

Now, to learn $X_M$, we use least squares. For that, we need to reduce the problem to the form $\alpha x = \beta$. We do this as follows:

  1. We flatten out the $A_M$ matrix into a vector containing N X O entries and call it $\beta$.
  2. We create a matrix by element-wise multiplication of each row of Y with each row of Z to create $\alpha$ of shape (N X O, K).

We can now write,

$$ \alpha X_M^T \approx \beta $$

Thus, $X_M^T$ = Least Squares ($\alpha, \beta$)

Of course, $\beta$ can have missing entries, which we mask out. Thus, we can write:

$X_M^T$ = Least Squares ($\alpha [Mask], \beta [Mask]$)

In case we're doing a non-negative tensor factorisation, we can instead learn $X_M^T$ as follows: $X_M^T$ = Non-negative Least Squares ($\alpha [Mask], \beta [Mask]$)
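As a minimal sketch of this masked solve, assuming $\alpha$ and $\beta$ have already been built as NumPy arrays exactly as described above (the actual notebook code follows in the next section):

import numpy as np
from scipy.optimize import nnls

# alpha: (N*O, K), beta: (N*O,) with NaNs marking missing entries
mask = ~np.isnan(beta)

# Plain least squares on the observed entries
X_M_ls, *_ = np.linalg.lstsq(alpha[mask], beta[mask], rcond=None)

# Non-negative least squares on the observed entries
X_M_nnls, _ = nnls(alpha[mask], beta[mask])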

Code example

In [1]:
import tensorly
from tensorly.decomposition import parafac, non_negative_parafac
import numpy as np

Creating the tensor to be decomposed

In [2]:
M, N, O = 10, 4, 3 #user, movie, feature
t = np.arange(M*N*O).reshape(M, N, O).astype('float32')
t[0] #First entry
Out[2]:
array([[  0.,   1.,   2.],
       [  3.,   4.,   5.],
       [  6.,   7.,   8.],
       [  9.,  10.,  11.]], dtype=float32)

Setting a few entries of the last user to be unavailable/unknown

In [3]:
t_orig = t.copy() # creating a copy
t[-1, 0, 0] = np.nan
t[-1, 2, 2] = np.nan
t[-1,:,:]
Out[3]:
array([[  nan,  109.,  110.],
       [ 111.,  112.,  113.],
       [ 114.,  115.,   nan],
       [ 117.,  118.,  119.]], dtype=float32)

Standard Non-negative PARAFAC decomposition

In [4]:
K = 2
# Notice: we factorise a tensor with one less user, thus t[:-1, :, :]
X, Y, Z = non_negative_parafac(t[:-1,:,:], rank=K)
In [5]:
X.shape, Y.shape, Z.shape
Out[5]:
((9, 2), (4, 2), (3, 2))
In [6]:
Y
Out[6]:
array([[ 0.48012616,  1.13542261],
       [ 0.49409014,  2.98947262],
       [ 0.5072998 ,  5.03375154],
       [ 0.52051081,  7.07682331]])
In [7]:
Z
Out[7]:
array([[ 0.57589198,  1.55655956],
       [ 0.58183329,  1.7695134 ],
       [ 0.58778163,  1.98182137]])

Creating $\alpha$ by element-wise multiplication of Y, Z and reshaping

In [8]:
alpha = np.einsum('nk, ok -> nok', Y, Z).reshape((N*O, K))
print(alpha)

print("\nShape of alpha = ", alpha.shape)
[[  0.27650081   1.76735291]
 [  0.27935339   2.00914552]
 [  0.28220934   2.25020479]
 [  0.28454255   4.65329218]
 [  0.28747809   5.28991186]
 [  0.29041711   5.92460074]
 [  0.29214989   7.83533407]
 [  0.29516391   8.9072908 ]
 [  0.29818151   9.9759964 ]
 [  0.299758    11.01549695]
 [  0.30285052  12.52253365]
 [  0.30594669  14.02499969]]

Shape of alpha =  (12, 2)
In [9]:
from scipy.optimize import nnls

Creating $\beta$

In [10]:
beta = t[-1,:,:].reshape(N*O, 1)
mask = ~np.isnan(beta).flatten()
beta[mask].reshape(-1, 1)
Out[10]:
array([[ 109.],
       [ 110.],
       [ 111.],
       [ 112.],
       [ 113.],
       [ 114.],
       [ 115.],
       [ 117.],
       [ 118.],
       [ 119.]], dtype=float32)

Learning $X_M$

In [11]:
X_M = nnls(alpha[mask], beta[mask].reshape(-1, ))[0].reshape((1, K))
X_M
Out[11]:
array([[ 389.73825036,    0.        ]])

Comparing X_M with other entries from X

In [12]:
X
Out[12]:
array([[  7.40340055e-01,   7.62705972e-01],
       [  4.14288653e+01,   7.57249713e-01],
       [  8.51282259e+01,   6.56239315e-01],
       [  1.29063811e+02,   5.46019997e-01],
       [  1.73739412e+02,   4.06496594e-01],
       [  2.19798887e+02,   2.11453297e-01],
       [  2.64609697e+02,   6.54705290e-02],
       [  3.01392149e+02,   2.39700484e-01],
       [  3.39963876e+02,   3.41824756e-01]])

It seems that the first column of X captures the increasing trend of values in the tensor.

Predicting missing entries using tensor multiplication

In [13]:
np.round(np.einsum('ir, jr, kr -> ijk', X_M, Y, Z))
Out[13]:
array([[[ 108.,  109.,  110.],
        [ 111.,  112.,  113.],
        [ 114.,  115.,  116.],
        [ 117.,  118.,  119.]]])

Actual entries

In [14]:
t_orig[-1, :, :]
Out[14]:
array([[ 108.,  109.,  110.],
       [ 111.,  112.,  113.],
       [ 114.,  115.,  116.],
       [ 117.,  118.,  119.]], dtype=float32)

Not bad! We're exactly there!