Non-negative matrix factorization using TensorFlow

In a previous post, we saw how to perform non-negative matrix factorization (NNMF) using non-negative least squares (NNLS). In this post, we will look at performing NNMF using TensorFlow. As before, we will factorize matrices that may contain missing entries (as in movie recommendation and similar problems). As explained in the previous post, we will use projected gradient descent for this problem: at each iteration we compute the gradient, take a gradient descent step, and then project the factors back onto the non-negative set by clipping any negative entries to zero.
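
In symbols, with learning rate η, each iteration of the training loop below amounts to the entry-wise updates

    W ← max(0, W − η ∂cost/∂W)
    H ← max(0, H − η ∂cost/∂H)

i.e. an ordinary gradient descent step followed by a projection back onto the non-negative orthant.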

Customary imports

In [1]:
import tensorflow as tf
import numpy as np
import pandas as pd
np.random.seed(0)

Creating the matrix to be decomposed

In [2]:
A_orig = np.array([[3, 4, 5, 2],
                   [4, 4, 3, 3],
                   [5, 5, 4, 4]], dtype=np.float32).T

A_orig_df = pd.DataFrame(A_orig)
In [3]:
A_orig_df #(4 users, 3 movies)
Out[3]:
     0    1    2
0  3.0  4.0  5.0
1  4.0  4.0  5.0
2  5.0  3.0  4.0
3  2.0  3.0  4.0

Masking some entries

In [4]:
A_df_masked = A_orig_df.copy()
A_df_masked.iloc[0, 0] = np.nan
In [5]:
np_mask = A_df_masked.notnull()
np_mask
Out[5]:
       0     1     2
0  False  True  True
1   True  True  True
2   True  True  True
3   True  True  True

Basic TensorFlow setup

In [6]:
# Boolean mask for computing cost only on valid (not missing) entries
tf_mask = tf.Variable(np_mask.values)

A = tf.constant(A_df_masked.values)
shape = A_df_masked.values.shape

# Number of latent factors
rank = 3

# Initializing random H and W
temp_H = np.random.randn(rank, shape[1]).astype(np.float32)
temp_H = np.divide(temp_H, temp_H.max())

temp_W = np.random.randn(shape[0], rank).astype(np.float32)
temp_W = np.divide(temp_W, temp_W.max())

H = tf.Variable(temp_H)
W = tf.Variable(temp_W)
WH = tf.matmul(W, H)
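
Note on shapes: with rank = 3, W has shape (4, 3) and H has shape (3, 3), so the product WH has shape (4, 3) and lines up entry-wise with A.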

Cost function

In [7]:
# Squared Frobenius norm of the reconstruction error, computed only over the observed (non-missing) entries
cost = tf.reduce_sum(tf.pow(tf.boolean_mask(A, tf_mask) - tf.boolean_mask(WH, tf_mask), 2))
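
Writing M for the boolean mask, this is the squared Frobenius norm of the reconstruction error restricted to the observed entries:

    cost = Σ_{(i, j) : M_ij = True} (A_ij − (WH)_ij)²

The masked (missing) entries contribute nothing to the cost, and therefore nothing to the gradient.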

Misc. TensorFlow

In [8]:
# Learning rate
lr = 0.001
# Number of steps
steps = 1000
train_step = tf.train.GradientDescentOptimizer(lr).minimize(cost)
init = tf.global_variables_initializer()

Ensuring non-negativity

In [9]:
# Clipping operation. This ensures that the learnt W and H are non-negative
clip_W = W.assign(tf.maximum(tf.zeros_like(W), W))
clip_H = H.assign(tf.maximum(tf.zeros_like(H), H))
clip = tf.group(clip_W, clip_H)
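
Running clip after every gradient step is the projection part of projected gradient descent: any entry of W or H that has gone negative is set back to zero, so the iterates always stay in the non-negative orthant.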

Main TensorFlow routine

In [10]:
steps = 1000
with tf.Session() as sess:
    sess.run(init)
    for i in range(steps):
        sess.run(train_step)
        sess.run(clip)
        if i%100==0:
            print("\nCost: %f" % sess.run(cost))
            print("*"*40)
    learnt_W = sess.run(W)
    learnt_H = sess.run(H)
Cost: 148.859848
****************************************

Cost: 3.930172
****************************************

Cost: 2.068570
****************************************

Cost: 1.418309
****************************************

Cost: 0.819721
****************************************

Cost: 0.399933
****************************************

Cost: 0.176080
****************************************

Cost: 0.079007
****************************************

Cost: 0.041353
****************************************

Cost: 0.027041
****************************************
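
The cost drops from roughly 148.9 at the first print to about 0.027 after 1000 steps, so the learnt factors fit the observed entries closely.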

Computing the prediction

In [11]:
learnt_H
Out[11]:
array([[ 0.86129224,  1.3388027 ,  1.97224879],
       [ 2.16338873,  0.97277433,  1.17212451],
       [ 0.25879648,  1.07861733,  1.09541821]], dtype=float32)
In [12]:
learnt_W
Out[12]:
array([[ 1.15797794,  0.97454673,  1.41825044],
       [ 1.44136858,  1.16967547,  0.79135358],
       [ 0.81640321,  1.98227394,  0.02636297],
       [ 1.38819814,  0.29285902,  0.8031919 ]], dtype=float32)
In [13]:
pred = np.dot(learnt_W, learnt_H)
pred_df = pd.DataFrame(pred)
pred_df.round()
Out[13]:
     0    1    2
0  3.0  4.0  5.0
1  4.0  4.0  5.0
2  5.0  3.0  4.0
3  2.0  3.0  4.0

Not bad! Just to recall, here is our original matrix.

In [14]:
A_orig_df
Out[14]:
     0    1    2
0  3.0  4.0  5.0
1  4.0  4.0  5.0
2  5.0  3.0  4.0
3  2.0  3.0  4.0
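
In particular, the masked entry at position (0, 0) is recovered: the rounded prediction is 3.0, matching the original value. One can inspect it directly with the DataFrames defined above:

# Imputed value for the masked entry, versus its true value
print(pred_df.iloc[0, 0], A_orig_df.iloc[0, 0])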