TensorFlow provides an amazing framework to quickly set up, train, and deploy a Machine Learning model. In this article, we will look at what it offers as I walk through one of my basic projects, which uses TensorFlow to classify a face as male or female. Later, I used the trained model to build a deployable system that classifies faces on a live camera feed. You can read more about it here.

Outline

We will look at how to build a classic feed-forward network using TensorFlow’s Estimator API and train it to classify faces as male or female. For the sake of clarity and conciseness, I will not go into the basics of Python programming or Neural Networks. To easily follow the content, you should:

  • be familiar with programming, and know at least the basics of Python
  • know what neural networks are, and what backpropagation is
  • know what Gradient Descent or an optimizer is

I will go through some basics of the setup for anyone who is getting started and wants to follow my code.

Getting Started

For this project, I used a Python virtual environment (Python 2.7) with tensorflow, numpy, and scikit-image (you can also use any other library to read and display images, like scipy, PIL, or OpenCV). It’s a good idea to use virtualenv and have a separate Python setup for each project, to avoid conflicting dependencies and issues caused by system-wide installed packages. You can easily install virtualenv using pip (Python’s package manager).

$ pip install virtualenv

We can create a new virtual python environment using this command:

$ virtualenv ~/mf_classifier/venv

You can use any other directory name and path instead of ~/mf_classifier/venv. If you choose to use some other directory, please change the path in all commands in this article accordingly.

Now we can “activate” the newly created virtualenv. Once you activate the virtualenv in a terminal, the current terminal session or window will start referring to the Python inside ~/mf_classifier/venv instead of the Python installed on the system, effectively giving us a separate Python environment. To activate the virtualenv, we can run:

$ source ~/mf_classifier/venv/bin/activate

Notice that after running the command, we can see (venv) (or the name of whichever directory you chose earlier) in front of the prompt in the terminal, indicating that the virtualenv is currently active. With the virtualenv active, we will install all our requirements:

(venv)$ pip install numpy==1.14.2 tensorflow==1.6.0 scikit-image==0.13.1

Feel free to use newer versions, but you might need to make a few changes to the code in this article if the newer versions are not backward compatible.

I recommend using a jupyter (ipython) notebook to run and experiment with our neural network code. This makes it really easy to tweak parameters and re-run parts of the code without re-running everything. Also, the training code in my git repository is an ipython notebook, and you will need ipython/jupyter if you want to open it. To install jupyter, run:

(venv)$ pip install jupyter

You can then start the notebook server by running:
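
(venv)$ jupyter notebook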

This wraps up our setup. Let’s start working on our project :)

Training and Evaluating the Neural Network (NN)

We are going to train a simple feed-forward Neural Network to classify faces as male or female. This is supervised learning, so the first thing you need is labeled data. You can get the data from my GitHub repository for this project. It is organized into 2 folders, one for male and one for female, which makes it a labeled dataset.

We will go through the code in parts, but you can see the complete notebook here.

Importing requirements

Let’s import the requirements for our NN and data.

import os
import random
import tensorflow as tf
import numpy as np
from skimage.io import imread, imshow

We will be using os to list all files in our dataset directory, random to shuffle the dataset, skimage (scikit-image) for reading and displaying images, and numpy and tensorflow for manipulating data, creating the NN, and training.

Reading images and splitting the dataset

Now we need to read all the images and create the train and test inputs and labels:

random.seed(1234)

X, Y = [], []
# map each data directory to its label: 1 for male, 0 for female
dir_ = {'data/male/': 1, 'data/female/': 0}
for d, y in dir_.items():
    for i in os.listdir(d):
        f = os.path.join(d, i)
        if os.path.isfile(f) and i.endswith(".png"):
            img = imread(f)/255.0  # scale pixel values to [0, 1]
            X.append(img)
            Y.append(y)

# shuffle inputs and labels together, then split them back apart
tmp = list(zip(X, Y))
random.shuffle(tmp)
X, Y = zip(*tmp)
# keep the last 5% of the shuffled data as the test set
test_size = int(0.05 * len(X))
X_test, Y_test = np.array(X[-test_size:], dtype=np.float32), np.array(Y[-test_size:])
X_train, Y_train = np.array(X[0:-test_size], dtype=np.float32), np.array(Y[0:-test_size])

We first set a seed for the random module by calling random.seed(1234). A seed is a number which defines how random numbers are generated. Passing the same seed will always produce the same sequence of random numbers. For example, after setting the seed to 1234, the first call to random.random() always returns 0.4407325991753527, and the second call always returns 0.9109759624491242. Even though we are using random numbers, a constant seed ensures that the output and code behavior will be exactly the same across multiple runs on any system. Here we want random.shuffle(tmp) to shuffle the list in exactly the same way no matter how many times we run it, so we set a constant seed.
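
You can verify this yourself in a Python shell (the expected values in the comments are the ones quoted above):

import random

random.seed(1234)
print(random.random())  # 0.4407325991753527 on every run
print(random.random())  # 0.9109759624491242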

Next, we iterate over each directory (both male and female), read all the images (files ending in .png), store them in X, and store the corresponding labels in Y (1 for male and 0 for female).

Note: Labels need to be numbers. If you want to use strings, you will need to convert them into numbers before using them inside your model (you can use feature columns for the conversion).

Next, we shuffle the data. To shuffle, we first combine the inputs and labels using list(zip(X, Y)), which gives us a list of (input, label) pairs. We shuffle this list using random.shuffle and then split it back into inputs X and labels Y using zip(*tmp). We then split the data into a test set and a train set, converting them into numpy arrays (the test size is only 5% since we have very little data for training; ideally the test set would be more than 20%).
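
If the zip/unzip trick looks unfamiliar, here is a tiny standalone example (with made-up values) showing how it keeps each input paired with its label through the shuffle:

import random

xs, ys = [10, 20, 30], ['a', 'b', 'c']
pairs = list(zip(xs, ys))  # [(10, 'a'), (20, 'b'), (30, 'c')]
random.shuffle(pairs)      # shuffles the pairs, so each label stays with its input
xs, ys = zip(*pairs)       # unzips back into two tuples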

Defining our NN

There are several ways to create an NN or other ML models in TensorFlow. You can use low-level APIs and construct an NN manually, or use high-level APIs like Estimators and Keras. We will be using Estimators for this project. Estimator is a general API to create any model which “estimates” a function and tries to predict a value. For the purpose of learning and understanding, we will build a custom estimator and define our NN, layer by layer, by leveraging TensorFlow’s layers API. Once we know how to make a custom estimator, we can create any type of Neural Network. But for quick implementation of simple or common tasks, TensorFlow provides some pre-made Estimators. The NN in this article can also be built using the pre-made DNNClassifier estimator. You can visit the official TensorFlow website here to learn more about estimators and how to use pre-made estimators. You can also visit here to view an official tutorial on using custom estimators.
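
For comparison, here is a rough sketch (not part of this project’s code) of how a similar one-hidden-layer network could be declared with the pre-made DNNClassifier; the model_dir path is just an example:

feature_cols = [tf.feature_column.numeric_column("x", shape=[25, 25])]
premade_classifier = tf.estimator.DNNClassifier(
    feature_columns=feature_cols,
    hidden_units=[200],  # one hidden layer with 200 ReLU units
    n_classes=2,         # two classes: male and female
    model_dir="/tmp/mf_classifier_premade")  # example path, not the article's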

To define a custom estimator, we need to define a function that describes our NN or ML model, and defines how to train and evaluate the model or predict values. We define our function classifier_model_fn as:

def classifier_model_fn(features, labels, mode):
    # Input Layer
    input_layer = tf.reshape(features["x"], [-1, 25*25])
    layer_1 = tf.layers.dense(inputs=input_layer, units=200, activation=tf.nn.relu)
    layer_output = tf.layers.dense(inputs=layer_1, units=2)
    predictions = {
        "prob": tf.nn.softmax(layer_output),
        "output": tf.argmax(input=layer_output, axis=1)
    }
    if mode == tf.estimator.ModeKeys.PREDICT:
        return tf.estimator.EstimatorSpec(mode=mode, predictions=predictions,
                                          export_outputs={
                                              "prediction": tf.estimator.export.ClassificationOutput(
                                                  scores=predictions["prob"]
                                              )
                                          })

    # Calculate Loss (for both TRAIN and EVAL modes)
    loss = tf.losses.sparse_softmax_cross_entropy(labels=labels, logits=layer_output)

    # Configure the Training Op (for TRAIN mode)
    if mode == tf.estimator.ModeKeys.TRAIN:
        optimizer = tf.train.AdamOptimizer(learning_rate=0.001)
        train_op = optimizer.minimize(
            loss=loss,
            global_step=tf.train.get_global_step())
        return tf.estimator.EstimatorSpec(mode=mode, loss=loss, train_op=train_op)

    # Add evaluation metrics (for EVAL mode)
    eval_metric_ops = {
        "accuracy": tf.metrics.accuracy(
        labels=labels, predictions=predictions["output"])}
    return tf.estimator.EstimatorSpec(
        mode=mode, loss=loss, eval_metric_ops=eval_metric_ops)

Let’s break down the code.

Layers of the neural network:

The code is pretty straightforward. Our input is a 25×25 image, so we first reshape it using tf.reshape(features["x"], [-1, 25*25]). Here -1 means the dimension is calculated automatically. If we send 3 images as input, the input shape becomes [3, 25*25]; if we send a single image, it becomes [1, 25*25].
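
You can see the same behavior with a plain numpy reshape (illustrative shapes only):

import numpy as np

batch = np.zeros((3, 25, 25), dtype=np.float32)   # 3 images of 25x25
print(batch.reshape(-1, 25*25).shape)   # (3, 625): the -1 dim is inferred
single = np.zeros((1, 25, 25), dtype=np.float32)  # a single image
print(single.reshape(-1, 25*25).shape)  # (1, 625)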

Next, we define our first layer using tf.layers.dense(inputs=input_layer, units=200, activation=tf.nn.relu). We define it as a dense layer, which means it’s a fully connected layer, that is, every neuron in this layer is connected to every input node. We pass our reshaped inputs, set the number of neuron units to 200, and set the activation to relu. The activation is the function used to calculate a neuron’s output from the weighted sum of its inputs. Relu can be defined as \(f(x) = max(0, x)\). Instead of relu, you can also try \(tanh(x)\) or the classic sigmoid \(1/(1+e^{-x})\).
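
Swapping the activation is a one-argument change; for example, a hypothetical tanh variant of the same layer would look like this:

# variant with tanh instead of relu (tf.sigmoid would work similarly)
layer_1 = tf.layers.dense(inputs=input_layer, units=200, activation=tf.nn.tanh)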

Next, we define our output layer tf.layers.dense(inputs=layer_1, units=2). We have 2 neurons: one tries to identify males, the other females. Notice how we didn’t specify any activation in this layer. This is because we only need the weighted sums (logits) from the last layer, as we apply the softmax activation later (in the predictions dictionary).

Next, we define the outputs we need in the predictions dictionary. We output the predicted class by checking which node/neuron gives the maximum value, tf.argmax(input=layer_output, axis=1), and the probabilities by applying the softmax activation, tf.nn.softmax(layer_output) (softmax maps the inputs into the range (0, 1) such that the sum over all nodes equals 1, so the outputs can be treated as probabilities).
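
Here is a tiny numpy illustration of what softmax does to the two output logits (the logit values are made up):

import numpy as np

logits = np.array([2.0, 0.5])  # hypothetical outputs of the two neurons
probs = np.exp(logits) / np.sum(np.exp(logits))
print(probs)             # approximately [0.818, 0.182]; the two values sum to 1
print(np.argmax(probs))  # 0, the predicted class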

Pipeline specifications:

Next, we need to define the specifications of our model, that is, how our model works in different modes. There are 3 modes an Estimator can be run in:

  • Train
  • Eval
  • Predict

The mode is automatically passed by TensorFlow as a parameter to our model function. We check what mode the estimator is being run in, and accordingly return an EstimatorSpec instance. This instance is used by TensorFlow to define and run our model. The EstimatorSpec class provides a way to define all the specifications of the model for a specific mode.

If the estimator is running in predict mode (if mode == tf.estimator.ModeKeys.PREDICT), we tell TensorFlow what we want to predict by specifying the predictions parameter in tf.estimator.EstimatorSpec(mode=mode, predictions=predictions, export_outputs={"prediction": tf.estimator.export.ClassificationOutput(scores=predictions["prob"])}), and then we return this instance. The export_outputs are used to tell TensorFlow what outputs we need from our final model once we export it and run it using tensorflow_serving. We will look more into this in another article.

If the estimator is running in train mode (if mode == tf.estimator.ModeKeys.TRAIN:), we tell TensorFlow how to calculate the loss by specifying the loss parameter, and how to optimize or improve our model by specifying the train_op parameter: tf.estimator.EstimatorSpec(mode=mode, loss=loss, train_op=train_op). Here our loss is tf.losses.sparse_softmax_cross_entropy, which is a standard way to compute the loss when the output layer activation is softmax.

Note: In tf.losses.sparse_softmax_cross_entropy we pass our logits instead of the actual predictions (probabilities). The logits are the outputs of the last layer before softmax is applied.

train_op is defined as optimizer.minimize(loss=loss, global_step=tf.train.get_global_step()); that is, we are telling TensorFlow to minimize the loss using this optimizer. The optimizer we are using (tf.train.AdamOptimizer) is a variant of stochastic gradient descent with an adaptive learning rate for each parameter. You can also try out vanilla gradient descent with tf.train.GradientDescentOptimizer.
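
Trying vanilla gradient descent is again a one-line change (the learning rate below simply reuses the example value from above; you would likely need to tune it):

# hypothetical swap: plain gradient descent instead of Adam
optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.001)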

If the estimator is not running in train mode or predict mode, that is, it is running in evaluation mode (tf.estimator.ModeKeys.EVAL), we tell TensorFlow to calculate the accuracy on our input: tf.estimator.EstimatorSpec(mode=mode, loss=loss, eval_metric_ops=eval_metric_ops), where eval_metric_ops holds the values we want calculated: eval_metric_ops = {"accuracy": tf.metrics.accuracy(labels=labels, predictions=predictions["output"])}.

Now we have our model and its specifications defined, so let’s create a model using these specifications.

Creating an Estimator

We can create an estimator from our model function classifier_model_fn by creating an instance of the class tf.estimator.Estimator:

mf_classifier = tf.estimator.Estimator(
    model_fn=classifier_model_fn, model_dir="/tmp/mf_classifier")

We pass our model function as the parameter model_fn. Notice that we have also specified a model directory model_dir. This should be an empty directory (TensorFlow creates it if it doesn’t exist). TensorFlow will save the progress of our estimator and its checkpoints in this directory. This way we have the latest parameters of our model backed up, which makes it easy to continue training from the last checkpoint, or to evaluate or use our model without retraining it every time. We can even split our code across different Python scripts or notebooks, as long as we use the same model function and model directory. One of the main uses of the model directory is tracking progress and analyzing our model with tensorboard (Note: with estimators, you don’t need to merge summaries or create FileWriters for tensorboard, since estimators do this internally).
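
To inspect the training in tensorboard, you can point it at the model directory and open the printed URL in a browser:

(venv)$ tensorboard --logdir=/tmp/mf_classifier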

Note: Since our model and its parameters are saved in the model directory, you will need to delete or clear this directory whenever you change the model or want to run the training from scratch.

Now we have our model, so let’s train and evaluate it.

Training the Estimator

Training the model is straightforward. First, we define our inputs x, expected outputs y, and other training parameters, such as the number of iterations/epochs and the batch size:

train_input_fn = tf.estimator.inputs.numpy_input_fn(
    x={"x": X_train},
    y=Y_train,
    batch_size=100,
    num_epochs=None,
    shuffle=True)

Here we are using tf.estimator.inputs.numpy_input_fn to create inputs directly from numpy arrays, with our train dataset as the input for training. Note that num_epochs=None means the input function can cycle through the data indefinitely; the amount of training is instead bounded by the steps argument we pass to train below.

Next, we train the model from the inputs:

mf_classifier.train(
    input_fn=train_input_fn,
    steps=8000)

As your model is being trained, you will see logs such as INFO:tensorflow:loss = 0.5272835, step = 101 (0.197 sec) indicating the current progress. Once the model is trained, we can evaluate its performance using our test dataset.

Evaluating our Estimator

Evaluation is almost exactly the same as training. The only differences are that we use the test dataset instead of the training dataset, call mf_classifier.evaluate instead of mf_classifier.train, and print the result:

eval_input_fn = tf.estimator.inputs.numpy_input_fn(
    x={"x": X_test},
    y=Y_test,
    shuffle=False)
eval_results = mf_classifier.evaluate(input_fn=eval_input_fn)
print(eval_results)

Since we are evaluating and not training, we don’t need to pass training parameters such as num_epochs and batch_size. Once you run it, you will see output like this: {'loss': 0.30619115, 'global_step': 8000, 'accuracy': 0.8969697}

We got an accuracy of 89.69%. Let’s see what it predicts for a random test image.

Predicting Output

Let’s see what image we have at index 110 of our test data, and which gender our model predicts for it:

check_index = 110
# take a batch containing just one test image (shape [1, 25, 25])
predict_input_fn = tf.estimator.inputs.numpy_input_fn(
    x={"x": X_test[check_index:check_index + 1]}, shuffle=False)
predict_results = mf_classifier.predict(input_fn=predict_input_fn, predict_keys=["output"])
imshow(X_test[check_index])
print(["male" if i["output"] else "female" for i in predict_results])

This will show you an image and print the prediction for that image as shown below:

prediction on a test image

Feel free to check random images.

Next Steps

What we created is a very basic model. Although it gives us decent accuracy, there is a lot of scope for improvement. You can try out the following things and see if they give you better results:

  • Try changing the NN structure by adding/removing neurons or adding layers (for more layers you might need more data; see the sketch after this list)
  • Try a different NN architecture such as a Convolutional NN
  • Try using a bigger dataset
  • Try data augmentation techniques to increase the size of the dataset, such as flipping some images, changing colors, resizing, etc.
  • Try batch normalization, dropout, and other regularization techniques
  • Try transfer learning on an already-trained NN
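
As a starting point for the first and fifth suggestions, here is a rough sketch (my own variation, not from this project’s code) of how the layer definitions inside classifier_model_fn could be extended with a second hidden layer and dropout; the unit counts and dropout rate are arbitrary examples:

input_layer = tf.reshape(features["x"], [-1, 25*25])
layer_1 = tf.layers.dense(inputs=input_layer, units=200, activation=tf.nn.relu)
# dropout is only active while training
dropout_1 = tf.layers.dropout(inputs=layer_1, rate=0.4,
                              training=(mode == tf.estimator.ModeKeys.TRAIN))
layer_2 = tf.layers.dense(inputs=dropout_1, units=100, activation=tf.nn.relu)
layer_output = tf.layers.dense(inputs=layer_2, units=2)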

I also used this model to make a realtime application which detects faces and classifies them as male or female on a live camera feed. You can read more about it here.