Constrained linear regression using TensorFlow

In this post, we will perform a constrained linear regression using TensorFlow. In linear regression, we fit a model of the form $Y = WX + b$; here we will additionally constrain $W$ and $b$ to be non-negative. Such constraints are useful in many applications: for example, if we model a household's total energy as the air-conditioner energy plus some additive noise, the contribution of both terms should be non-negative.

We will use gradient descent for our regression problem. We compute the gradients with respect to $W$ and $b$ as usual: $$ \begin{eqnarray} \delta W = \frac{\partial \, \mathrm{cost}}{\partial W} \\ \delta b = \frac{\partial \, \mathrm{cost}}{\partial b} \\ \end{eqnarray} $$
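For the least-squares cost used below, $\mathrm{cost} = \frac{1}{2n}\sum_{i}(W x_i + b - y_i)^2$, these gradients work out to: $$ \begin{eqnarray} \delta W = \frac{1}{n}\sum_{i}(W x_i + b - y_i)\, x_i \\ \delta b = \frac{1}{n}\sum_{i}(W x_i + b - y_i) \\ \end{eqnarray} $$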

The gist of our approach is that instead of the usual update: $$ \begin{eqnarray} W = W - \delta W \times \text{learning rate}\\ b = b - \delta b \times \text{learning rate}\\ \end{eqnarray} $$

we update as:

$$ \begin{eqnarray} W = \max(0, W - \delta W \times \text{learning rate})\\ b = \max(0, b - \delta b \times \text{learning rate})\\ \end{eqnarray} $$
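In plain NumPy, one such projected-gradient step looks roughly like this (a minimal sketch for scalar $W$ and $b$; the function name and signature are only for illustration):

import numpy as np

def projected_gradient_step(W, b, X, Y, lr=0.001):
    # Gradients of cost = (1/(2n)) * sum((W*X + b - Y)**2)
    n = len(X)
    residual = W * X + b - Y
    dW = np.sum(residual * X) / n
    db = np.sum(residual) / n
    # Usual gradient step, followed by projection onto the non-negative orthant
    W = max(0.0, W - lr * dW)
    b = max(0.0, b - lr * db)
    return W, b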

Customary imports

In [1]:
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
In [2]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import tensorflow as tf

Creating sample data

In [3]:
n_samples = 50
train_X = np.linspace(1, 50, n_samples)
# Ground truth: Y = 10X + 6, with additive Gaussian noise
train_Y = 10 * train_X + 6 + 40 * np.random.randn(n_samples)
train_Y = train_Y.reshape(n_samples, 1)
train_X = train_X.reshape(n_samples, 1)
In [4]:
%matplotlib inline

Plotting sample data

In [5]:
plt.plot(train_X, train_Y, 'ro')
Out[5]:
[<matplotlib.lines.Line2D at 0x10bc8e3d0>]

Solution

Creating regression model

In [6]:
# Model linear regression y = Wx + b, where x has shape (num_samples, 1)
x = tf.placeholder(tf.float32, [None, 1])

# Here we initialize W and b to be negative just to illustrate our point!
W = tf.Variable(-100*tf.ones([1, 1]))
b = tf.Variable(-100*tf.ones([1]))

product = tf.matmul(x,W)
y = product + b

Projected gradient, clipping!

In [7]:
# Clipping operation. 
clip_W = W.assign(tf.maximum(0., W))
clip_b = b.assign(tf.maximum(0., b))
clip = tf.group(clip_W, clip_b)
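
tf.group bundles the two assign operations into a single op, so one sess.run(clip) call projects both W and b onto the non-negative orthant.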

Defining the cost

In [8]:
y_ = tf.placeholder(tf.float32, [None, 1])

# Cost function: half the mean squared error, (1/(2n)) * sum((y - y_)^2)
cost = tf.reduce_sum(tf.pow(y-y_, 2))/(2*n_samples)

Defining params for gradient descent and initializing

In [9]:
lr = 0.001
steps = 10
# Training using Gradient Descent to minimize cost
train_step = tf.train.GradientDescentOptimizer(lr).minimize(cost)

init = tf.global_variables_initializer()
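
As an aside, the gradient step and the projection could presumably be fused into a single op with tf.control_dependencies, so that every run of the training op is automatically followed by the clipping (a rough sketch; the notebook below keeps the two steps separate so the intermediate, unprojected values can be printed):

# Sketch: assign ops created inside the dependency block only run after the
# gradient step has been applied, so one sess.run does both update and projection.
with tf.control_dependencies([train_step]):
    train_and_clip = tf.group(W.assign(tf.maximum(0., W)),
                              b.assign(tf.maximum(0., b)))
# Usage: sess.run(train_and_clip, feed_dict=feed)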

TensorFlow session

In [10]:
with tf.Session() as sess:
    sess.run(init)
    for i in range(steps):
        print("*"*40)
        print("Iteration Number %d" %i)
        print("*"*40)
        print("\nBefore gradient computation")
        print("-"*40)
    
        print("W: %f" % sess.run(W))
        print("b: %f" % sess.run(b))
        feed = { x: train_X, y_: train_Y }
        sess.run(train_step, feed_dict=feed)
        print("\nAfter gradient computation")
        print("-"*40)
        print("W: %f" % sess.run(W))
        print("b: %f" % sess.run(b))
        print("\nAfter gradient projection")
        print("-"*40)
        # This line applies the projection step, clipping W and b at zero
        sess.run(clip)
        print("W: %f" % sess.run(W))
        print("b: %f" % sess.run(b))
        print("\nCost: %f" % sess.run(cost, feed_dict=feed))
        print("*"*40)
    learnt_W = sess.run(W)
    learnt_b = sess.run(b)
****************************************
Iteration Number 0
****************************************

Before gradient computation
----------------------------------------
W: -100.000000
b: -100.000000

After gradient computation
----------------------------------------
W: -3.079384
b: -97.098839

After gradient projection
----------------------------------------
W: 0.000000
b: 0.000000

Cost: 43069.238281
****************************************
****************************************
Iteration Number 1
****************************************

Before gradient computation
----------------------------------------
W: 0.000000
b: 0.000000

After gradient computation
----------------------------------------
W: 8.520617
b: 0.251163

After gradient projection
----------------------------------------
W: 8.520617
b: 0.251163

Cost: 1623.781128
****************************************
****************************************
Iteration Number 2
****************************************

Before gradient computation
----------------------------------------
W: 8.520617
b: 0.251163

After gradient computation
----------------------------------------
W: 9.719881
b: 0.284800

After gradient projection
----------------------------------------
W: 9.719881
b: 0.284800

Cost: 802.807739
****************************************
****************************************
Iteration Number 3
****************************************

Before gradient computation
----------------------------------------
W: 9.719881
b: 0.284800

After gradient computation
----------------------------------------
W: 9.888719
b: 0.287822

After gradient projection
----------------------------------------
W: 9.888719
b: 0.287822

Cost: 786.541626
****************************************
****************************************
Iteration Number 4
****************************************

Before gradient computation
----------------------------------------
W: 9.888719
b: 0.287822

After gradient computation
----------------------------------------
W: 9.912533
b: 0.286535

After gradient projection
----------------------------------------
W: 9.912533
b: 0.286535

Cost: 786.215454
****************************************
****************************************
Iteration Number 5
****************************************

Before gradient computation
----------------------------------------
W: 9.912533
b: 0.286535

After gradient computation
----------------------------------------
W: 9.915935
b: 0.284642

After gradient projection
----------------------------------------
W: 9.915935
b: 0.284642

Cost: 786.205078
****************************************
****************************************
Iteration Number 6
****************************************

Before gradient computation
----------------------------------------
W: 9.915935
b: 0.284642

After gradient computation
----------------------------------------
W: 9.916465
b: 0.282664

After gradient projection
----------------------------------------
W: 9.916465
b: 0.282664

Cost: 786.200989
****************************************
****************************************
Iteration Number 7
****************************************

Before gradient computation
----------------------------------------
W: 9.916465
b: 0.282664

After gradient computation
----------------------------------------
W: 9.916590
b: 0.280675

After gradient projection
----------------------------------------
W: 9.916590
b: 0.280675

Cost: 786.197021
****************************************
****************************************
Iteration Number 8
****************************************

Before gradient computation
----------------------------------------
W: 9.916590
b: 0.280675

After gradient computation
----------------------------------------
W: 9.916658
b: 0.278685

After gradient projection
----------------------------------------
W: 9.916658
b: 0.278685

Cost: 786.193115
****************************************
****************************************
Iteration Number 9
****************************************

Before gradient computation
----------------------------------------
W: 9.916658
b: 0.278685

After gradient computation
----------------------------------------
W: 9.916718
b: 0.276695

After gradient projection
----------------------------------------
W: 9.916718
b: 0.276695

Cost: 786.189087
****************************************

As we can see from the output, the clipping operation ensures that W and b remain non-negative.
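
As a side note, more recent TensorFlow versions expose the same idea declaratively through Keras weight constraints, which are applied right after each optimizer update (a rough sketch, not part of the original notebook; the hyperparameters simply mirror the ones above):

model = tf.keras.Sequential([
    tf.keras.layers.Dense(1,
                          kernel_constraint=tf.keras.constraints.NonNeg(),
                          bias_constraint=tf.keras.constraints.NonNeg(),
                          input_shape=(1,))
])
model.compile(optimizer=tf.keras.optimizers.SGD(0.001), loss='mse')
# model.fit(train_X, train_Y, epochs=10) would then keep W and b non-negative.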

Plotting the results

In [17]:
plt.plot(train_X, train_Y, 'ro')
pred_Y = np.multiply(train_X, learnt_W)+learnt_b
plt.plot(train_X, pred_Y, 'b')
plt.title("Y = {:0.2f}X + {:0.2f}".format(learnt_W[0, 0], learnt_b[0]))
Out[17]:
<matplotlib.text.Text at 0x112d846d0>