Hi there!! This post is supplementary to the presentation hosted here, and all the code can be found here.
TensorFlow (TF) is an open-source numerical computation Python library maintained by Google. It has made long-standing, powerful deep learning techniques accessible for both production and research. Owing to its strong community backing and flexible design, it has become a mainstay of the AI community. In this post, I’ll walk you through the nitty-gritty of getting started with TF. The intended audience is complete newcomers or beginners who want to use it either to build a product or to do research. So, let’s get started.
Before getting into the main content, let me first explain how we are going to go about getting acquainted with the nuts and bolts of TensorFlow, and how the write-up is divided.
Every now and then, you’ll be asked to answer a quiz question and/or summarize your understanding in a few lines.
The solutions to these quiz questions are at the end of this blog post. But don’t scroll down to see an answer without first giving your gray cells a stretch. This is for your own benefit, so unless you have a pretty strong reason not to seek the best for yourself, don’t scroll straight away.
Summarizing in a few sentences will help you recall the main takeaways and feed them into your long-term memory. This kind of active recall is a scientifically proven way to learn smartly; you can read more about it here and here. You can either post your summaries here as a comment or discuss them with someone else. This will help you get more out of today’s talk.
We’ll start by looking at the significance of TensorFlow in the ML landscape. We’ll then skim over at least six ways of making use of TensorFlow, either directly or indirectly. We won’t go into detail on most of these, but useful links are provided at the end for those who would like to delve into them. Next, we’ll see the premises upon which this write-up is based. This will help you put everything into perspective. The fourth part will make use of higher-level APIs, which are easier to use but don’t give much control over your Machine Learning (ML) architecture; hence, they are not well suited for research. The fifth part will solve the same problem again, but this time making use of finer-level controls in TF. Despite its steep learning curve, this is an extremely useful part to know as an ML enthusiast. We’ll end by looking at what one should learn next about TensorFlow that we didn’t cover today.
Part 1:: TensorFlow (TF) in today’s ML landscape
TensorFlow is made and maintained by Google. It’s an open-source numerical computation library. It provides such a rich range of features for Machine Learning (Deep Learning especially) production and research that it has garnered attention from one and all. TensorBoard, deployment capabilities, multi-device support, and huge community support are a few of its positive traits.
This graph from Google Trends speaks to how prevalent TensorFlow has become amongst its counterparts.
Part 2:: Ways to use TensorFlow
There are multiple ways to use TensorFlow either directly or indirectly. At least six of the ways that I am aware of are:
- As a Keras backend.
- Using the Keras wrapper to call TF functions as its high-level library.
- Using tensorflow.contrib.keras.
- Using TFLearn as a high-level TF library.
- Using TF’s high-level functions.
- Manually defining TF’s computation graph and setting up a session to run it.
Subjectively speaking, the ease of use decreases from top to bottom, while the flexibility of usage increases. So there is always going to be a trade-off between how deeply you want to play with your models and how quickly and easily you would like to set things up. Usually, ML researchers go with the last option, as it gives them the maximum freedom to introduce well-tailored components into their models.
We’ll only cover the last two ways of using TensorFlow. Links are provided at the end for those who would like to delve into the others as well.
Part 3:: The premises for this tutorial
Demo problem to be tackled:
For the purposes of today’s talk, we’ll solve the task of handwritten digit recognition using DL architectures. The dataset is hosted as MNIST (Modified National Institute of Standards and Technology) on Yann LeCun’s website.
MNIST has served as a classic benchmark dataset for different algorithms, mainly for computer vision tasks, over the last few years. The task is to determine which digit each handwritten image depicts.
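For reference, here is a minimal sketch of how this dataset is typically loaded in TF 1.x. The directory name "MNIST_data/" is just an illustrative choice, and one_hot=True matches the label format used later in this post:

from tensorflow.examples.tutorials.mnist import input_data

# Downloads (if needed) and loads MNIST; labels come back one-hot encoded
mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)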
Abstracting ML model development:
Finding an acceptable ML model for solving any task entails iterating over three steps:
- Making INFERENCE,
- determining LOSS, and
- UPDATING the parameters to make better inferences.
Note that by the term parameters, we mean the weights and biases of a DL architecture.
This holds true invariably for all ML tasks. In the next two parts, we’ll see how our entire focus is on designing ways in TF to realize this three-step abstraction.
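To make this abstraction concrete, here is a toy, framework-free sketch of the loop fitting a 1-D linear model with plain NumPy; every name and number in it is illustrative, not taken from the repo code:

import numpy as np

# Toy sketch of the three abstract steps on the model y = w*x + b
w, b = 0.0, 0.0                              # parameters
xs = np.array([1.0, 2.0, 3.0, 4.0])
ys = 2.0 * xs + 1.0                          # targets generated by y = 2x + 1
lr = 0.01                                    # learning rate
for _ in range(1000):
    preds = w * xs + b                       # 1. INFERENCE
    loss = np.mean((preds - ys) ** 2)        # 2. LOSS (mean squared error)
    grad_w = np.mean(2 * (preds - ys) * xs)  # 3. UPDATE via gradient descent
    grad_b = np.mean(2 * (preds - ys))
    w -= lr * grad_w
    b -= lr * grad_b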
Quiz:
This brings us to our first quiz question. Give it your best shot. Match the rows in column 1 with the rows in column 2.
Summary:
Answer this to the best of your knowledge.
Can the combination of the three abstract steps described above be called OPTIMIZATION?
Part 4:: TensorFlow using a not-too-flexible high-level library
Its code is here.
TF’s high-level API, tf.contrib.learn, helps manage:
- datasets,
- estimators,
- inference, and
- training.
# Build a DNN classifier with two hidden layers using tf.contrib.learn
classifier = tf.contrib.learn.DNNClassifier(feature_columns=feature_columns,
                                            hidden_units=[HIDDEN_LAYER_1, HIDDEN_LAYER_2],
                                            n_classes=NUM_CLASSES,
                                            model_dir="./model_data")
# This single call runs inference, loss, and parameter updates for 2000 steps
classifier.fit(input_fn=get_train_inputs, steps=2000)
These two lines of code encapsulate all three steps that we talked about. Despite its relative ease of use, though, this approach doesn’t give enough flexibility to researchers.
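In case you are wondering where feature_columns and get_train_inputs come from, here is a minimal sketch of how they might be defined; the exact definitions live in the repo code, and mnist is assumed to be loaded with one-hot labels as sketched earlier:

# One real-valued feature column covering all pixels of an image
feature_columns = [tf.contrib.layers.real_valued_column("", dimension=IMAGE_PIXELS)]

def get_train_inputs():
    # An input_fn must return a (features, labels) pair of tensors
    x = tf.constant(mnist.train.images)
    # DNNClassifier expects integer class indices, so undo the one-hot encoding
    y = tf.constant(mnist.train.labels.argmax(axis=1))
    return x, y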
We’ll see in the next part how TF also gives researchers enough power, with ways to tune every single detail of an ML architecture.
Summary:
Answer this to finish up this part.
How would you customize the inference step, as highlighted in the image in the above demonstration, to incorporate a new, never-before-used activation function, say h = y(x, weights, biases)?
Part 5:: Finer controls in TF
To understand the finer controls in TF, we need to understand the mechanism that runs under the hood. In the last demo, TF constructed a highly efficient computational graph, on top of which it performed all its calculations. So, essentially, the TF mechanism follows two steps:
- Building a computational graph
- Running the computational graph
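Here is a minimal sketch of this two-step mechanism; the numbers are illustrative:

import tensorflow as tf

# Step 1: build the graph; nothing is computed at this point
a = tf.constant(2.0)
b = tf.constant(3.0)
total = a + b

# Step 2: run the graph inside a session to get actual values
with tf.Session() as sess:
    print(sess.run(total))  # prints 5.0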
Let’s take the two distinct but intertwined processes of computational graph generation and computation separately, quoting the exact lines from the demonstration MNIST code.
Computational Graph Generation:
Like any other computational graph, this graph is composed of nodes. These nodes make use of the following functional units to accomplish their different goals.
- Tensors:
A tensor is the most fundamental unit of data in TF, shaped as an array. Every node in the TF graph does input-output through tensors. A few examples (taken from https://www.tensorflow.org/get_started/get_started) would be:

3 # a rank 0 tensor; this is a scalar with shape []
[1., 2., 3.] # a rank 1 tensor; this is a vector with shape [3]
[[1., 2., 3.], [4., 5., 6.]] # a rank 2 tensor; a matrix with shape [2, 3]
[[[1., 2., 3.]], [[7., 8., 9.]]] # a rank 3 tensor with shape [2, 1, 3]
- Constants:
A constant is a type of node whose value doesn’t change over time. It can be defined as

constant_node = tf.constant(3, tf.int64)

where the first argument is the value that we want the constant node to hold, and the second argument is its datatype. If no datatype is passed, it is inferred from the value.
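For instance, a quick sketch of how the datatype gets inferred:

node_float = tf.constant(3.0)  # dtype inferred as tf.float32
node_int = tf.constant(3)      # dtype inferred as tf.int32
print(node_float.dtype, node_int.dtype)  # <dtype: 'float32'> <dtype: 'int32'>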
- Placeholders:
Placeholders are a means to accept inputs into the computational graph from the user. They can be defined as
placeholder_node = tf.placeholder(tf.float32, shape=[None, 10])
Note that the second argument, shape, is optional in this example, but I have included it here because it is important. It says the input should be a 2D array that can contain any number of rows, where each row contains exactly ten columns; None here means any number of rows is acceptable as input. Can you think of any reason why this might be useful?
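One answer is batching. Here is a small sketch, assuming a session sess as introduced a little later, showing that the same node accepts batches of different sizes:

# With shape=[None, 10], the same graph accepts any batch size
doubled = placeholder_node * 2                  # any op consuming the placeholder
tiny_batch = [[1.0] * 10 for _ in range(2)]     # 2 rows of 10 columns
big_batch = [[1.0] * 10 for _ in range(500)]    # 500 rows of 10 columns
# Once a session exists, both feeds run against the very same graph:
# sess.run(doubled, {placeholder_node: tiny_batch})
# sess.run(doubled, {placeholder_node: big_batch})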
These are the lines from our MNIST demonstration code.
# Allocating nodes in the computational graph to accept inputs
images = tf.placeholder(tf.float32, shape=(None, IMAGE_PIXELS))
# one-hot labels; float32 so the softmax cross-entropy op can consume them directly
true_labels = tf.placeholder(tf.float32, shape=(None, NUM_CLASSES))
- Variables:
These are the nodes that contain the parametric values of a model. Remember that the parameters of a DL model are its weights and biases. Hence, we define our weight and bias nodes (in fact, any new unconventional parameter that you want to test) like the following (from https://www.tensorflow.org/get_started/get_started):
weight = tf.Variable([0.3], tf.float32)
bias = tf.Variable([-0.3], tf.float32)
Note that every variable must be given an initial value, passed as the first argument in the example. It is also worth noting that all variables are initialized in the computational graph separately, using the tf.global_variables_initializer() function, which we will cover in a few moments.
These are the lines from our MNIST demonstration code.
# Defining the first hidden layer
weights_layer_1 = tf.Variable(tf.truncated_normal([IMAGE_PIXELS, HIDDEN_LAYER_1],
                                                  stddev=1.0 / math.sqrt(float(IMAGE_PIXELS))))
biases_layer_1 = tf.Variable(tf.zeros([HIDDEN_LAYER_1]))
hidden_output_1 = tf.nn.relu(tf.matmul(images, weights_layer_1) + biases_layer_1)

# Defining the second hidden layer
weights_layer_2 = tf.Variable(tf.truncated_normal([HIDDEN_LAYER_1, HIDDEN_LAYER_2],
                                                  stddev=1.0 / math.sqrt(float(HIDDEN_LAYER_1))))
biases_layer_2 = tf.Variable(tf.zeros([HIDDEN_LAYER_2]))
hidden_output_2 = tf.nn.relu(tf.matmul(hidden_output_1, weights_layer_2) + biases_layer_2)

# Defining the outputs
weights_output = tf.Variable(tf.truncated_normal([HIDDEN_LAYER_2, NUM_CLASSES],
                                                 stddev=1.0 / math.sqrt(float(HIDDEN_LAYER_2))))
biases_output = tf.Variable(tf.zeros([NUM_CLASSES]))
prediction = tf.matmul(hidden_output_2, weights_output) + biases_output
Remember the three abstract steps that we claimed comprise the development of any ML model? Here they are for your reference again.
Before moving on to how to run computations on the computational graph, let’s break down our demonstration MNIST code to see how we designed these three steps.
Shown below are the portions of our demonstration MNIST code, mapped to the steps they fulfill as highlighted in the images.
- INFERENCE:
# Defining the first hidden layer
weights_layer_1 = tf.Variable(tf.truncated_normal([IMAGE_PIXELS, HIDDEN_LAYER_1],
                                                  stddev=1.0 / math.sqrt(float(IMAGE_PIXELS))))
biases_layer_1 = tf.Variable(tf.zeros([HIDDEN_LAYER_1]))
hidden_output_1 = tf.nn.relu(tf.matmul(images, weights_layer_1) + biases_layer_1)

# Defining the second hidden layer
weights_layer_2 = tf.Variable(tf.truncated_normal([HIDDEN_LAYER_1, HIDDEN_LAYER_2],
                                                  stddev=1.0 / math.sqrt(float(HIDDEN_LAYER_1))))
biases_layer_2 = tf.Variable(tf.zeros([HIDDEN_LAYER_2]))
hidden_output_2 = tf.nn.relu(tf.matmul(hidden_output_1, weights_layer_2) + biases_layer_2)

# Defining the outputs
weights_output = tf.Variable(tf.truncated_normal([HIDDEN_LAYER_2, NUM_CLASSES],
                                                 stddev=1.0 / math.sqrt(float(HIDDEN_LAYER_2))))
biases_output = tf.Variable(tf.zeros([NUM_CLASSES]))
prediction = tf.matmul(hidden_output_2, weights_output) + biases_output
- LOSS:
# Evaluating the loss function
loss = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(labels=true_labels, logits=prediction))
- UPDATE:
# Updating the parameters
optimizer = tf.train.GradientDescentOptimizer(0.01)
training = optimizer.minimize(loss)
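One more node you would typically build at this stage is an accuracy measure for later evaluation. This sketch is my own addition, not a quoted line from the repo:

# A possible accuracy node: compare predicted classes against one-hot true labels
correct = tf.equal(tf.argmax(prediction, 1), tf.argmax(true_labels, 1))
accuracy = tf.reduce_mean(tf.cast(correct, tf.float32))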
Running the Computational Graph:
Building the graph alone computes nothing. For instance, printing a node only describes it:

print(placeholder_node)

this gives as output

Tensor("Placeholder_1:0", shape=(?, 10), dtype=float32)
# To actually evaluate nodes, we need a session, created either the standard way
sess = tf.Session()
# or as an interactive session, convenient in shells and notebooks
sess = tf.InteractiveSession()
# All variables must be explicitly initialized before they can be used
init = tf.global_variables_initializer()
sess.run(init)
# Running the graph on the constant node evaluates it and prints its value, 3
print(sess.run(constant_node))
# Note: placeholder_node was defined with ten columns per row; for this toy
# linear model we use a fresh single-column placeholder instead
x_input = tf.placeholder(tf.float32, shape=[None, 1])
linear_prediction = weight * x_input + bias
print(sess.run(linear_prediction, {x_input: [[1.], [2.], [3.]]}))

this gives as output (the trailing digits are float32 rounding artifacts)

[[ 0.        ]
 [ 0.30000001]
 [ 0.60000002]]
# Training loop: every step runs inference, loss, and update on a fresh batch
for train_step in range(2000):
    batch = mnist.train.next_batch(BATCH_SIZE)
    loss_value, _ = sess.run([loss, training],
                             {images: batch[0], true_labels: batch[1]})
    # Report the loss every 100 steps
    if train_step % 100 == 0:
        print('Loss at ', str(train_step), ' training step is ', str(loss_value))
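After training, a natural follow-up is to evaluate on the held-out test split; this assumes the hypothetical accuracy node sketched earlier was added to the graph:

# Evaluate the trained model once on the full test set
print('Test accuracy:',
      sess.run(accuracy, {images: mnist.test.images, true_labels: mnist.test.labels}))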
Summary:
Answer this to finish up this part.
What are the possible tuning options you feel you have with this level of granular control over ML architectures using TF?
End Notes:
This guide should help you get started with TensorFlow. However, TF is far richer than what has been covered here. Also, given that Keras will now be merged into the TF core as a high-level API, it makes more sense than ever to take a look at Keras as well.
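To give you a taste, here is a minimal sketch of the same architecture expressed in Keras; it is my own illustration reusing the constants from earlier, not code from the repo:

from keras.models import Sequential
from keras.layers import Dense

# Same two-hidden-layer architecture, expressed in a few lines
model = Sequential()
model.add(Dense(HIDDEN_LAYER_1, activation='relu', input_dim=IMAGE_PIXELS))
model.add(Dense(HIDDEN_LAYER_2, activation='relu'))
model.add(Dense(NUM_CLASSES, activation='softmax'))
model.compile(optimizer='sgd', loss='categorical_crossentropy', metrics=['accuracy'])
# model.fit(mnist.train.images, mnist.train.labels, batch_size=BATCH_SIZE, epochs=5)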
In the end, I would urge you to complete the few-line summaries, as this will help you retain most of what you have gone through. As far as implementing anything in TF is concerned, first take a close look at the code given in the repo. Then try to code as much as you can without looking at it. Of course, do take a look if you can’t remember something at all. Also, don’t hesitate to reach out in case you stumble upon any seemingly unsolvable task, doubt, or concern. I’ve shared all the social media links that I use frequently.
Thank you, everyone, for your time and happy coding!!
- My personally recommended website to start learning DL.
- Softmax
- Stanford’s Course on TensorFlow.
- How to use TensorFlow in object-oriented style.
- TFLearn
- Keras to call TF functions without using it as its backend.
- Email — sttsanjay@gmail.com
- https://sanjaykthakur.wordpress.com/
- https://twitter.com/WanabeMarkovian
- https://www.facebook.com/sttsanjay
Slides and code are available here.
Answers:
Quiz 1:
1 – b, 2 – a, 3 – c