import tensorly
from tensorly.decomposition import parafac, non_negative_parafac
import numpy as np
In a previous post, we looked at predicting for users who weren’t part of the original matrix factorisation. In this post, we’ll do the same for 3-d tensors. If you want to learn more about tensor factorisations, have a look at my earlier post.
General Tensor Factorisation
General tensor factorisation of a 3-d tensor A (M X N X O) produces 3 factors: X (M X K), Y (N X K) and Z (O X K). The \(A_{ijl}\) entry can be recovered (this is the CP/PARAFAC form, with the Khatri-Rao product structure) as:
\[ A_{ijl} = \sum_k{X_{ik}Y_{jk}Z_{lk}}\]
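As a quick illustration of this formula, here is a minimal sketch (with arbitrary random factors, used only to show the shapes and the einsum that implements the sum over k):

# Sketch: rebuild an (m x n x o) tensor from hypothetical rank-k factors
m, n, o, k = 5, 4, 3, 2
Xf, Yf, Zf = np.random.rand(m, k), np.random.rand(n, k), np.random.rand(o, k)
A = np.einsum('ik, jk, lk -> ijl', Xf, Yf, Zf)   # A[i, j, l] = sum_k Xf[i, k] * Yf[j, k] * Zf[l, k]
A.shape   # (5, 4, 3)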
Learning \(X_M\) factor
However, we assume that the \(M^{th}\) entry isn’t part of this decomposition. So, how do we obtain the X factor corresponding to the \(M^{th}\) entry? We learn the Y and Z factors from the tensor A excluding the \(M^{th}\) row, and assume that the learnt Y and Z are shared across all rows of A (1 through M).
The above figure shows the latent factor \(X_{M}\) corresponding to the \(M^{th}\) row of X that we wish to learn. On the LHS, we see the matrix corresponding to \(A_{M}\). The highlighted entry of \(A_{M}\) is created by element-wise multiplication of \(X_M, Y_0, Z_0\) and then summing. Thus, each of the N X O entries of \(A_M\) is created by multiplying \(X_M\) with a row from Y and a row from Z. In general,
\[A_{M, n, o} = \sum_k{X_{M, k} \times Y_{n, k} \times Z_{o, k}}\]
Now, to learn \(X_M\), we plan to use least squares. For that, we need to reduce the problem to the form \(\alpha x = \beta\). We do this as follows:
- We flatten the \(A_M\) matrix into a vector containing N X O entries and call it \(\beta\)
- We create the matrix \(\alpha\), of shape (N X O, K), by element-wise multiplication of each row of Y with each row of Z
We can now write,
\[ \alpha X_M^T \approx \beta \] Thus, \(X_M^T\) = Least Squares(\(\alpha, \beta\))
Of course, \(\beta\) can have missing entries, which we mask out. Thus, we can write:
\(X_M^T\) = Least Squares (\(\alpha [Mask], \beta [Mask]\))
In case we’re doing a non-negative tensor factorisation, we can instead learn \(X_M^T\) as follows: \(X_M^T\) = Non-negative Least Squares (\(\alpha [Mask], \beta [Mask]\))
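Before the worked example, here is a minimal sketch of both estimators, assuming \(\alpha\), \(\beta\) and a boolean mask of observed entries have already been built as described above (the code below constructs exactly these):

from scipy.optimize import nnls

# Plain least squares on the observed entries only
X_M_ls = np.linalg.lstsq(alpha[mask], beta[mask].ravel(), rcond=None)[0]

# Non-negative least squares on the observed entries only
X_M_nnls, _ = nnls(alpha[mask], beta[mask].ravel())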
Code example
Creating the tensor to be decomposed
M, N, O = 10, 4, 3  # user, movie, feature
t = np.arange(M*N*O).reshape(M, N, O).astype('float32')
t[0]  # First entry
array([[ 0., 1., 2.],
[ 3., 4., 5.],
[ 6., 7., 8.],
[ 9., 10., 11.]], dtype=float32)
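One step worth making explicit: the masked regression further down assumes that some entries of the last user’s slice are missing, and the final comparison uses t_orig. The outputs later in the post (108 and 116 absent from \(\beta\)) suggest something along these lines was done here; treat this as a hypothetical reconstruction of the omitted step:

t_orig = t.copy()      # keep an untouched copy for the comparison at the end (assumed)
t[-1, 0, 0] = np.nan   # mark the value 108 as missing (hypothetical)
t[-1, 2, 2] = np.nan   # mark the value 116 as missing (hypothetical)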
Standard Non-negative PARAFAC decomposition
K = 2
# Notice, we factorise a tensor with one less user, thus t[:-1, :, :]
X, Y, Z = non_negative_parafac(t[:-1, :, :], rank=K)
X.shape, Y.shape, Z.shape
((9, 2), (4, 2), (3, 2))
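A version note: unpacking X, Y, Z directly like this assumes an older tensorly API that returned the factor matrices as a list. More recent tensorly versions return a CP tensor (weights plus a list of factors), so the equivalent there would roughly be:

# Recent tensorly versions: the decomposition returns (weights, factors)
weights, factors = non_negative_parafac(t[:-1, :, :], rank=K)
X, Y, Z = factors
# If the returned factors are normalised, the per-component weights can be
# absorbed into one factor, e.g.: X = X * weights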
Y
array([[ 0.48012616, 1.13542261],
[ 0.49409014, 2.98947262],
[ 0.5072998 , 5.03375154],
[ 0.52051081, 7.07682331]])
Z
array([[ 0.57589198, 1.55655956],
[ 0.58183329, 1.7695134 ],
[ 0.58778163, 1.98182137]])
Creating \(\alpha\) by element-wise multiplication of Y, Z and reshaping
alpha = np.einsum('nk, ok -> nok', Y, Z).reshape((N*O, K))
print(alpha)
print("\nShape of alpha = ", alpha.shape)
[[ 0.27650081 1.76735291]
[ 0.27935339 2.00914552]
[ 0.28220934 2.25020479]
[ 0.28454255 4.65329218]
[ 0.28747809 5.28991186]
[ 0.29041711 5.92460074]
[ 0.29214989 7.83533407]
[ 0.29516391 8.9072908 ]
[ 0.29818151 9.9759964 ]
[ 0.299758 11.01549695]
[ 0.30285052 12.52253365]
[ 0.30594669 14.02499969]]
Shape of alpha = (12, 2)
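As a quick sanity check (a sketch reusing the arrays above), multiplying \(\alpha\) by the factor row of any user that was part of the decomposition should roughly reproduce that user’s flattened slice:

# Sketch: alpha.dot(X[i]) ~ t[i].reshape(N*O) for a user i used in the factorisation
i = 3
np.round(alpha.dot(X[i]))   # compare against t[i].reshape(N*O), i.e. 36..47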
from scipy.optimize import nnls
Creating \(\beta\)
beta = t[-1, :, :].reshape(N*O, 1)
mask = ~np.isnan(beta).flatten()
beta[mask].reshape(-1, 1)
array([[ 109.],
[ 110.],
[ 111.],
[ 112.],
[ 113.],
[ 114.],
[ 115.],
[ 117.],
[ 118.],
[ 119.]], dtype=float32)
Learning \(X_M\)
X_M = nnls(alpha[mask], beta[mask].reshape(-1, ))[0].reshape((1, K))
X_M
array([[ 389.73825036, 0. ]])
Comparing X_M with other entries from X
X
array([[ 7.40340055e-01, 7.62705972e-01],
[ 4.14288653e+01, 7.57249713e-01],
[ 8.51282259e+01, 6.56239315e-01],
[ 1.29063811e+02, 5.46019997e-01],
[ 1.73739412e+02, 4.06496594e-01],
[ 2.19798887e+02, 2.11453297e-01],
[ 2.64609697e+02, 6.54705290e-02],
[ 3.01392149e+02, 2.39700484e-01],
[ 3.39963876e+02, 3.41824756e-01]])
It seems that the first column captures the increasing trend of values in the tensor.
Predicting missing entries using tensor multiplication
np.round(np.einsum('ir, jr, kr -> ijk', X_M, Y, Z))
array([[[ 108., 109., 110.],
[ 111., 112., 113.],
[ 114., 115., 116.],
[ 117., 118., 119.]]])
Actual entries
t_orig[-1, :, :]
array([[ 108., 109., 110.],
[ 111., 112., 113.],
[ 114., 115., 116.],
[ 117., 118., 119.]], dtype=float32)
Not bad! We’re exactly there!
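To put a number on this, here is a small sketch of the reconstruction error on the held-out slice (assuming, as above, that t_orig holds the untouched tensor):

# Maximum absolute reconstruction error over the last user's slice,
# measured against the original (unmasked) values
pred = np.einsum('ir, jr, kr -> ijk', X_M, Y, Z)[0]
np.abs(pred - t_orig[-1, :, :]).max()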