What if we want to predict for entries not within the matrix?!
ML
Author: Nipun Batra
Published: April 19, 2017
I have written a bunch of posts on this blog about non-negative matrix factorisation (NNMF). However, all of them required the test user to be a part of the matrix that we factorise to learn the latent factors. Is that always the case? Read on to find out more!
Standard Problem
Our goal is, given a matrix A, to decompose it into two non-negative factors, as follows:
\[
A_{M \times N} \approx W_{M \times K} H_{K \times N}
\]
such that \(W_{M \times K} \ge 0\) and \(H_{K \times N} \ge 0\).
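Equivalently, W and H are found by minimising the reconstruction error under non-negativity constraints (stated here in the usual Frobenius-norm form for completeness):
\[
\min_{W \ge 0,\; H \ge 0} \; \lVert A_{M \times N} - W_{M \times K} H_{K \times N} \rVert_F^2
\]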
Our Problem: Out-of-matrix factorisation
Imagine that we have trained the model for \(M-1\) users on \(N\) movies. Now, the \(M^{th}\) user has rated some movies. Do we retrain the model from scratch to consider the \(M^{th}\) user? This can be a very expensive operation!
Instead, as shown in the figure above, we will learn only the user factor for the \(M^{th}\) user. We can do this because the shared movie factor (H) has already been learnt.
We can formulate this as follows:
\[
A[M,:] = W[M,:]H
\]
Taking the transpose of both sides,
\[
A[M,:]^T = H^T W[M,:]^T
\]
However, \(A[M,:]^T\) will have missing entries. Thus, we can mask those entries from the calculation as shown below.
If we want the factors to be non-negative, we can use non-negative least squares (NNLS) instead of ordinary least squares for this estimation.
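Concretely, if \(\Omega\) denotes the indices of the movies the \(M^{th}\) user has rated (this notation is introduced here just for illustration), the user factor is the solution of a small masked non-negative least-squares problem:
\[
W[M,:]^T = \underset{w \ge 0}{\operatorname{arg\,min}} \; \lVert A[M,\Omega]^T - H[:,\Omega]^T\, w \rVert_2^2
\]
where \(w\) is a \(K \times 1\) vector, \(H[:,\Omega]\) contains only the columns of H corresponding to rated movies, and \(A[M,\Omega]\) contains only the observed ratings.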
Code example
I’ll now present a simple code example to illustrate the procedure.
Defining matrix A
import numpy as np
import pandas as pd

M, N = 20, 10
np.random.seed(0)
A_orig = np.abs(np.random.uniform(low=0.0, high=1.0, size=(M, N)))
pd.DataFrame(A_orig).head()
          0         1         2         3         4         5         6         7         8         9
0  0.548814  0.715189  0.602763  0.544883  0.423655  0.645894  0.437587  0.891773  0.963663  0.383442
1  0.791725  0.528895  0.568045  0.925597  0.071036  0.087129  0.020218  0.832620  0.778157  0.870012
2  0.978618  0.799159  0.461479  0.780529  0.118274  0.639921  0.143353  0.944669  0.521848  0.414662
3  0.264556  0.774234  0.456150  0.568434  0.018790  0.617635  0.612096  0.616934  0.943748  0.681820
4  0.359508  0.437032  0.697631  0.060225  0.666767  0.670638  0.210383  0.128926  0.315428  0.363711
Masking a few entries
A = A_orig.copy()
A[0, 0] = np.NAN
A[3, 1] = np.NAN
A[6, 3] = np.NAN

# Masking for last user.
A[19, 2] = np.NAN
A[19, 7] = np.NAN
We will be using the A2 matrix (the first 19 users) for learning the movie factors and the user factors for those 19 users.
A2 = A[:-1, :]
A2.shape
(19, 10)
A_df = pd.DataFrame(A)
A_df.head()
          0         1         2         3         4         5         6         7         8         9
0       NaN  0.715189  0.602763  0.544883  0.423655  0.645894  0.437587  0.891773  0.963663  0.383442
1  0.791725  0.528895  0.568045  0.925597  0.071036  0.087129  0.020218  0.832620  0.778157  0.870012
2  0.978618  0.799159  0.461479  0.780529  0.118274  0.639921  0.143353  0.944669  0.521848  0.414662
3  0.264556       NaN  0.456150  0.568434  0.018790  0.617635  0.612096  0.616934  0.943748  0.681820
4  0.359508  0.437032  0.697631  0.060225  0.666767  0.670638  0.210383  0.128926  0.315428  0.363711
Defining matrices W and H (learning on M-1 users and N movies)
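W (for the first \(M-1\) users) and H are initialised with small random non-negative values. A minimal sketch follows, assuming \(K = 4\) latent factors; both the choice of K and the scaling below are illustrative, and they determine the cost value printed next.

K = 4  # number of latent factors (assumed value for this sketch)
W = np.abs(np.random.uniform(low=0, high=1, size=(M - 1, K)))
H = np.abs(np.random.uniform(low=0, high=1, size=(K, N)))
# Scale down so that the initial product WH stays small
W = np.divide(W, K * W.max())
H = np.divide(H, K * H.max())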
We also need a cost function to minimise. However, since A has missing entries, we have to define the cost only in terms of the entries present in A:
def cost(A, W, H):
    from numpy import linalg
    mask = pd.DataFrame(A).notnull().values
    WH = np.dot(W, H)
    WH_mask = WH[mask]
    A_mask = A[mask]
    A_WH_mask = A_mask - WH_mask
    # Since A_WH_mask is now a vector, we use the L2 norm instead of the Frobenius norm for a matrix
    return linalg.norm(A_WH_mask, 2)
Let us look at the cost for the initial, randomly assigned values of W and H. Notice that we pass A2!
cost(A2, W, H)
7.2333001567031294
Alternating NNLS procedure
num_iter = 1000
num_display_cost = max(int(num_iter / 10), 1)

from scipy.optimize import nnls

for i in range(num_iter):
    if i % 2 == 0:
        # Learn H, given A and W
        for j in range(N):
            mask_rows = pd.Series(A2[:, j]).notnull()
            H[:, j] = nnls(W[mask_rows], A2[:, j][mask_rows])[0]
    else:
        # Learn W, given A and H
        for j in range(M - 1):
            mask_rows = pd.Series(A2[j, :]).notnull()
            W[j, :] = nnls(H.transpose()[mask_rows], A2[j, :][mask_rows])[0]
    WH = np.dot(W, H)
    c = cost(A2, W, H)
    if i % num_display_cost == 0:
        print(i, c)
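With H learnt from the first 19 users, the factor for the last (20th) user can now be estimated from only the movies they have rated, and used to predict the masked entries. A minimal sketch of this final step, following the masked-NNLS formulation above (the variable names W_new and A_new_pred are illustrative):

A_new = A[-1, :]                          # ratings of the 20th user, NaN for the masked movies (columns 2 and 7)
mask = pd.Series(A_new).notnull().values  # movies this user has actually rated
# Solve A[M,:]^T = H^T W[M,:]^T on the observed entries only, with non-negativity
W_new = nnls(H.transpose()[mask], A_new[mask])[0]
# Predicted ratings for all movies, including the two that were masked out
A_new_pred = np.dot(W_new, H)
print(A_new_pred[[2, 7]])   # predictions for the held-out movies
print(A_orig[-1, [2, 7]])   # true values, for comparison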
There you go, we are able to get ratings for the \(M^{th}\) user for the movies that they have not seen, even though we only trained the model on the other users! Of course, these numbers might not look very impressive. However, this was just a toy example based on random data. On real data, we could expect better results!