What if we want to predict for entries not within the matrix?!
ML
Author: Nipun Batra
Published: April 19, 2017
I have written a bunch of posts on this blog about non-negative matrix factorisation (NNMF). However, all of them required the test user to be a part of the matrix that we factorise to learn the latent factors. Is that always the case? Read on to find out more!
Standard Problem
Our goal is, given a matrix A, to decompose it into two non-negative factors, as follows:
\[
A_{M \times N} \approx W_{M \times K} H_{K \times N}
\]
such that \(W_{M \times K} \ge 0\) and \(H_{K \times N} \ge 0\).
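Equivalently, W and H are found by minimising the reconstruction error under non-negativity constraints (stated here in the usual Frobenius-norm form for completeness):
\[
\min_{W \ge 0,\; H \ge 0} \; \lVert A_{M \times N} - W_{M \times K} H_{K \times N} \rVert_F^2
\]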
Our Problem: Out-of-matrix factorisation
Imagine that we have trained the model for \(M-1\) users on \(N\) movies. Now, the \(M^{th}\) user has rated some movies. Do we retrain the model from scratch to consider the \(M^{th}\) user? This can be a very expensive operation!
Instead, as shown in the figure above, we will learn only the user factor for the \(M^{th}\) user. We can do this because the shared movie factor (H) has already been learnt.
We can formulate this as follows:
\[
A[M,:] = W[M,:]H
\]
Taking the transpose of both sides,
\[
A[M,:]^T = H^T W[M,:]^T
\]
However, \(A[M,:]^T\) will have missing entries. Thus, we can mask those entries from the calculation as shown below.
If we want the factors to be non-negative, we can use non-negative least squares (NNLS) instead of ordinary least squares for this estimation.
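Concretely, if \(\Omega\) denotes the indices of the movies the \(M^{th}\) user has rated (this notation is introduced here just for illustration), the user factor is the solution of a small masked non-negative least-squares problem:
\[
W[M,:]^T = \underset{w \ge 0}{\operatorname{arg\,min}} \; \lVert A[M,\Omega]^T - H[:,\Omega]^T\, w \rVert_2^2
\]
where \(w\) is a \(K \times 1\) vector, \(H[:,\Omega]\) contains only the columns of H corresponding to rated movies, and \(A[M,\Omega]\) contains only the observed ratings.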
Code example
I’ll now present a simple code example to illustrate the procedure.
Defining matrix A
import numpy as np
import pandas as pd

M, N = 20, 10
np.random.seed(0)
A_orig = np.abs(np.random.uniform(low=0.0, high=1.0, size=(M, N)))
pd.DataFrame(A_orig).head()
          0         1         2         3         4         5         6         7         8         9
0  0.548814  0.715189  0.602763  0.544883  0.423655  0.645894  0.437587  0.891773  0.963663  0.383442
1  0.791725  0.528895  0.568045  0.925597  0.071036  0.087129  0.020218  0.832620  0.778157  0.870012
2  0.978618  0.799159  0.461479  0.780529  0.118274  0.639921  0.143353  0.944669  0.521848  0.414662
3  0.264556  0.774234  0.456150  0.568434  0.018790  0.617635  0.612096  0.616934  0.943748  0.681820
4  0.359508  0.437032  0.697631  0.060225  0.666767  0.670638  0.210383  0.128926  0.315428  0.363711
Masking a few entries
A = A_orig.copy()
A[0, 0] = np.NAN
A[3, 1] = np.NAN
A[6, 3] = np.NAN

# Masking for last user.
A[19, 2] = np.NAN
A[19, 7] = np.NAN
We will be using the A2 matrix (the first 19 users) for learning the movie factors and the user factors for those 19 users.
A2 = A[:-1, :]
A2.shape
(19, 10)
A_df = pd.DataFrame(A)
A_df.head()
          0         1         2         3         4         5         6         7         8         9
0       NaN  0.715189  0.602763  0.544883  0.423655  0.645894  0.437587  0.891773  0.963663  0.383442
1  0.791725  0.528895  0.568045  0.925597  0.071036  0.087129  0.020218  0.832620  0.778157  0.870012
2  0.978618  0.799159  0.461479  0.780529  0.118274  0.639921  0.143353  0.944669  0.521848  0.414662
3  0.264556       NaN  0.456150  0.568434  0.018790  0.617635  0.612096  0.616934  0.943748  0.681820
4  0.359508  0.437032  0.697631  0.060225  0.666767  0.670638  0.210383  0.128926  0.315428  0.363711
Defining matrices W and H (learning on M-1 users and N movies)
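W (for the first \(M-1\) users) and H are initialised with small random non-negative values. A minimal sketch follows, assuming \(K = 4\) latent factors; both the choice of K and the scaling below are illustrative, and they determine the cost value printed next.

K = 4  # number of latent factors (assumed value for this sketch)
W = np.abs(np.random.uniform(low=0, high=1, size=(M - 1, K)))
H = np.abs(np.random.uniform(low=0, high=1, size=(K, N)))
# Scale down so that the initial product WH stays small
W = np.divide(W, K * W.max())
H = np.divide(H, K * H.max())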
We also need a cost function to minimise. However, since A has missing entries, we have to define the cost only in terms of the entries present in A:
def cost(A, W, H):
    from numpy import linalg
    mask = pd.DataFrame(A).notnull().values
    WH = np.dot(W, H)
    WH_mask = WH[mask]
    A_mask = A[mask]
    A_WH_mask = A_mask - WH_mask
    # Since A_WH_mask is now a vector, we use the L2 norm instead of the Frobenius norm for a matrix
    return linalg.norm(A_WH_mask, 2)
Let us look at the cost for the initial, randomly assigned values of W and H. Notice that we pass A2!
cost(A2, W, H)
7.2333001567031294
Alternating NNLS procedure
num_iter = 1000
num_display_cost = max(int(num_iter / 10), 1)

from scipy.optimize import nnls

for i in range(num_iter):
    if i % 2 == 0:
        # Learn H, given A and W
        for j in range(N):
            mask_rows = pd.Series(A2[:, j]).notnull()
            H[:, j] = nnls(W[mask_rows], A2[:, j][mask_rows])[0]
    else:
        # Learn W, given A and H
        for j in range(M - 1):
            mask_rows = pd.Series(A2[j, :]).notnull()
            W[j, :] = nnls(H.transpose()[mask_rows], A2[j, :][mask_rows])[0]
    WH = np.dot(W, H)
    c = cost(A2, W, H)
    if i % num_display_cost == 0:
        print(i, c)
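With H learnt from the first 19 users, the factor for the last (20th) user can now be estimated from only the movies they have rated, and used to predict the masked entries. A minimal sketch of this final step, following the masked-NNLS formulation above (the variable names W_new and A_new_pred are illustrative):

A_new = A[-1, :]                          # ratings of the 20th user, NaN for the masked movies (columns 2 and 7)
mask = pd.Series(A_new).notnull().values  # movies this user has actually rated
# Solve A[M,:]^T = H^T W[M,:]^T on the observed entries only, with non-negativity
W_new = nnls(H.transpose()[mask], A_new[mask])[0]
# Predicted ratings for all movies, including the two that were masked out
A_new_pred = np.dot(W_new, H)
print(A_new_pred[[2, 7]])   # predictions for the held-out movies
print(A_orig[-1, [2, 7]])   # true values, for comparison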
There you go, we are able to get ratings for the \(M^{th}\) user for the movies that they have not seen, even though we only trained the model on the other users! Of course, these numbers might not look very impressive. However, this was just a toy example based on random data. On real data, we could expect better results!