TensorFlow is an open source machine learning (ML) library from Google. It has become particularly popular because of its support for deep learning. It is also highly scalable and can run on Android. The documentation is well maintained, and several tutorials are available for different expertise levels. To learn more about downloading and installing TensorFlow, visit the official website.
To scratch the surface of this incredible ML library, we will implement Linear Regression to predict house prices in the Boston area. The Boston dataset is available at the UCI Machine Learning Repository. It is also available in R and scikit-learn. If you don't know the maths behind Linear Regression, some great resources are provided at the end of this tutorial.
Let's get started.
First, we will import the required libraries: TensorFlow, NumPy and Matplotlib.
Now, let's define some methods for preparing the dataset for Linear Regression model training. The first method, named
read_dataset, can be used to read a text file (e.g. csv or tsv) into a NumPy array. The second method,
read_boston_data, is more specific to this tutorial. It reads the Boston dataset from the scikit-learn library and returns the features and labels separately. The third method,
feature_normalize, is used to normalize the features of the dataset. Feature normalization is an important pre-processing step, as features on different scales can easily mislead the learning process. The last method,
append_bias_reshape, appends a bias term (a column of 1's) to the normalized features. It corresponds to the
b term in the equation of a straight line. Moreover, this method reshapes the features so that they can be fed to TensorFlow. If you are new to Python, pay close attention to the shape of NumPy arrays. Shapes can easily go wrong, and the resulting bugs sometimes take considerable time to track down.
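To make the normalization and bias steps concrete, here is a minimal NumPy sketch of what feature_normalize and append_bias_reshape might look like. The function bodies are assumptions based on the descriptions above (zero mean and unit standard deviation per column, a leading column of ones), and the toy arrays are made up for illustration.

```python
import numpy as np

def feature_normalize(features):
    """Scale each column to zero mean and unit standard deviation (assumed scheme)."""
    mu = np.mean(features, axis=0)
    sigma = np.std(features, axis=0)
    return (features - mu) / sigma

def append_bias_reshape(features, labels):
    """Prepend a column of ones (the bias term) and make labels a column vector."""
    n_samples = features.shape[0]
    ones = np.ones((n_samples, 1))
    f = np.hstack([ones, features])
    l = np.reshape(labels, (n_samples, 1))
    return f, l

# Toy data (made up): 3 samples, 2 features on very different scales.
raw = np.array([[1.0, 200.0], [2.0, 400.0], [3.0, 600.0]])
prices = np.array([10.0, 20.0, 30.0])

normalized = feature_normalize(raw)
f, l = append_bias_reshape(normalized, prices)
print(f.shape, l.shape)  # (3, 3) (3, 1)
```

Printing the shapes after every transformation, as in the last line, is a cheap way to catch the shape mistakes mentioned above.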
We will now call the methods defined above and divide the dataset into two parts, one for training and one for testing. Cross-validation is also an option, but I am leaving that for you to try out.
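One simple way to do such a split is to shuffle the row indices and cut them at a fixed fraction. The sketch below assumes an 80/20 split and uses random stand-in arrays in place of the Boston features and labels; the names train_x, train_y, test_x and test_y are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in data (random, not the Boston dataset): 100 samples, 4 features.
f = rng.normal(size=(100, 4))
l = rng.normal(size=(100, 1))

# Shuffle row indices, then send the first 80% to training (assumed ratio).
indices = rng.permutation(len(f))
n_train = len(f) * 4 // 5
train_idx, test_idx = indices[:n_train], indices[n_train:]

train_x, train_y = f[train_idx], l[train_idx]
test_x, test_y = f[test_idx], l[test_idx]
print(train_x.shape, test_x.shape)  # (80, 4) (20, 4)
```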
TensorFlow has its own data structures for holding features, labels, weights, etc. Let's define those, along with some variables required to hold important data related to the Linear Regression algorithm.
Linear Regression implementation is pretty straightforward in TensorFlow. Three lines of code are all that is required. The first line multiplies the features matrix by the weights matrix and can be used for prediction. The second line is the cost or loss function (squared error of the regression line). Finally, the third line performs one step of gradient descent optimization to minimize the cost function.
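The same three steps can be written out in plain NumPy, which makes the arithmetic explicit. This is an illustration of the idea rather than the TensorFlow code: the names X, y, W and learning_rate are assumptions, the cost is taken to be the mean squared error, and the gradient that TensorFlow would derive automatically is computed by hand.

```python
import numpy as np

# Toy data (made up): first column of X is the bias term, y = 1 + 2x.
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])
y = np.array([[1.0], [3.0], [5.0]])
W = np.zeros((2, 1))
learning_rate = 0.1

# 1. Prediction: features matrix times weights matrix.
y_pred = X @ W

# 2. Cost: mean squared error between predictions and labels.
cost = np.mean((y_pred - y) ** 2)

# 3. One gradient descent step on the cost.
grad = 2.0 * (X.T @ (y_pred - y)) / len(X)
W_new = W - learning_rate * grad

# The cost after the step should be lower than before it.
new_cost = np.mean((X @ W_new - y) ** 2)
print(new_cost < cost)  # True
```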
We have all the required parts of the code to implement Linear Regression. The one missing detail is training the Linear Regression model. The following code will do this for us:
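As a rough picture of what training involves, the sketch below repeats the gradient descent step in a loop and records the cost at every iteration. It is written in NumPy on synthetic data, not the Boston dataset, and learning_rate, training_epochs and cost_history are assumed names and values chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: y = 4 + 3x plus a little noise; X carries a bias column.
x = rng.normal(size=(200, 1))
X = np.hstack([np.ones((200, 1)), x])
y = 4.0 + 3.0 * x + 0.1 * rng.normal(size=(200, 1))

W = np.zeros((2, 1))
learning_rate = 0.1     # assumed value
training_epochs = 200   # assumed value
cost_history = []

for epoch in range(training_epochs):
    error = X @ W - y
    cost_history.append(float(np.mean(error ** 2)))
    # Gradient of the mean squared error with respect to W.
    W -= learning_rate * 2.0 * (X.T @ error) / len(X)

print(W.ravel())         # close to [4, 3]
print(cost_history[-1])  # close to the noise variance
```

Keeping the per-iteration cost in a list is what makes it possible to plot the learning curve afterwards.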
Now we have a trained Linear Regression model and can make predictions on unseen data. But first, plot the cost as a function of the number of iterations. It's pretty straightforward using Matplotlib.
Make some predictions on the test dataset and calculate the mean squared error (it should be around 22.23). The plot below also shows the original and predicted prices for houses, along with the learned regression line.
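The mean squared error itself is just the average of the squared differences between predicted and actual prices. A minimal NumPy sketch, using made-up numbers rather than real model output:

```python
import numpy as np

# Made-up values for illustration; these are not Boston prices or model output.
test_y = np.array([[20.0], [30.0], [40.0]])
pred_y = np.array([[21.0], [29.0], [42.0]])

mse = float(np.mean((pred_y - test_y) ** 2))
print(f"MSE: {mse:.4f}")  # MSE: 2.0000
```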
When you are done working with the TensorFlow session, close it by calling the close method on the sess object.
I hope you liked this tutorial. If you want to learn more about the maths behind Linear Regression or ML in general, do check out the following resources:
- Linear Regression on Khan Academy
- Machine learning course on Coursera by Andrew Ng
- Introduction to Linear Regression Analysis by Douglas Montgomery, Elizabeth A. Peck, and G. Geoffrey Vining
If you have any questions or feedback, please reach out to me by commenting below.