In my article Gender Classification, I explained how I trained a basic NN to classify faces by gender using TensorFlow. In this article, we will take that trained model and turn it into a program that classifies every face on a live camera feed. To introduce tensorflow-serving, we will set up a tensorflow-serving server to serve our NN model, and our program will use this server for classification. Complete code can be found here.


We will export the model trained in Gender Classification by adding a few lines to the code. Then, we will set up a tensorflow-serving server to serve this model using Docker (you can also set it up directly on your system instead of using Docker). Next, we will write code to detect faces on a live camera feed using OpenCV and classify them as male or female using our NN. Since the focus of this article is tensorflow-serving and face classification, we won’t cover face detection in depth and will use a basic pre-trained OpenCV classifier instead.

You can find the complete code here. I will not go through the code for training the NN as I already did that in Gender Classification.


To understand this article, you should already know or be familiar with:

  • Python 2.7
  • virtualenv in python
  • tensorflow
  • Neural Network (NN) and backpropagation
  • Estimators and their specifications (EstimatorSpec) in TensorFlow

I also encourage you to have a look at Gender Classification, as most of the above points are covered there.


Apart from requirements covered here, we will need:

  • Docker
  • OpenCV for python (version
  • tensorflow-serving-api for python (version 1.6.0)
  • grpcio for python (version 1.10.1)

To install Docker please visit Docker’s website and follow the instructions for your system. Make sure docker is running on your system.

To install the python requirements, please run the following command inside your python virtualenv:

(venv)$ pip install opencv== tensorflow-serving-api==1.6.0 grpcio==1.10.1

Exporting the model

For exporting the model, we need to define what output our final model needs to return. This has to be done while defining the model. Then we can train our model, and export it into a directory. This directory will be used by tensorflow-serving to serve our model.

Defining export outputs

We already defined the export outputs when we defined our model function by defining export_outputs in prediction specifications:

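if mode == tf.estimator.ModeKeys.PREDICT:
    # export_outputs tells tensorflow-serving what the served model returns
    return tf.estimator.EstimatorSpec(mode=mode, predictions=predictions,
                                      export_outputs={
                                          "prediction": tf.estimator.export.ClassificationOutput(
                                              scores=predictions["prob"])})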

TensorFlow provides following classes for defining export outputs:

  • tf.estimator.export.ClassificationOutput
  • tf.estimator.export.PredictOutput
  • tf.estimator.export.RegressionOutput

All of the above are subclasses of tf.estimator.export.ExportOutput. I recommend using ClassificationOutput or RegressionOutput rather than the more general PredictOutput class wherever possible.

The export outputs have to be defined using one of the above classes. You can return multiple outputs if required.

Since our model performs classification, we returned tf.estimator.export.ClassificationOutput and named the output prediction. We passed predictions["prob"] as scores, which contains the probabilities/scores of each class (male and female).

Exporting the model into a directory

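Once we have trained our model and we are satisfied with its performance, we can export it by running the following code:

# "classifier" stands for the trained Estimator; the variable name is assumed
classifier.export_savedmodel("models/NN",
                             tf.estimator.export.build_parsing_serving_input_receiver_fn({
                                    "x": tf.FixedLenFeature(shape=[25,25], dtype=tf.float32)
                             }))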

This will export the model into the directory models/NN/. Each export call saves the model with a timestamp. When we serve a model, tensorflow_serving picks the latest model based on this timestamp. We also need to define what inputs our model accepts by providing tf.estimator.export.build_parsing_serving_input_receiver_fn({"x": tf.FixedLenFeature(shape=[25,25], dtype=tf.float32)}) as a parameter. Here we are telling TensorFlow that we will send an input named x, with a shape of [25,25] and a datatype of float32. tf.estimator.export.build_parsing_serving_input_receiver_fn tells TensorFlow that we will send the input as a serialized string following TensorFlow’s Example protocol. This might seem confusing, but it is the easiest way to set up our model.
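
To make the timestamp idea concrete, here is a minimal sketch of how the latest export could be picked from a list of directory names. The helper latest_export and the sample names are hypothetical. tensorflow-serving does this selection internally, so this only illustrates the rule that the largest timestamp wins.

```python
def latest_export(dir_names):
    """Return the name with the largest numeric timestamp, or None."""
    # Keep only purely numeric names, like the timestamped
    # sub-directories that each export call creates.
    versions = [name for name in dir_names if name.isdigit()]
    if not versions:
        return None
    # Compare as integers so that "10" is newer than "9".
    return max(versions, key=int)


# Hypothetical contents of models/NN/ after two exports
exports = ["1521000000", "1522000000", "temp-123"]
print(latest_export(exports))
```

Comparing the names as integers rather than as strings keeps the ordering correct even if two timestamps have a different number of digits.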

Serving the model

For serving the model, we need tensorflow-serving. Instead of setting it up, we will be using a docker image jagpreets/tensorflow-serving. Please ensure that you have docker running on your system. This docker image has everything ready and you just need to specify the port number and exported model directory while running it:

docker run -d -p <port>:9000 -v <exported model directory>:/model jagpreets/tensorflow-serving:v0

This will download the docker image if it doesn’t exist and run it. This might take time depending on your internet speed. If you want to see the corresponding dockerfile, you can find it in my GitHub repository here.

Replace <port> with the port number, and <exported model directory> with the complete path to your exported model directory. For example, I wanted to run it at port number 9000, so for me the command was docker run -d -p 9000:9000 -v /Users/jagpreet/live_gender_classifier/models/NN:/model jagpreets/tensorflow-serving:v0.

Note: If you use a port number other than 9000, please update the code in the next section accordingly.

After running this container, we will have the model being served at localhost:9000, or at any other port that you chose. It is a grpc server, and we can use this service using grpcio.
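
Before moving on, it can help to confirm that something is listening on the chosen port. The following is a minimal sketch using only the standard library. The helper is_port_open is hypothetical, and it only checks that a TCP connection can be opened. It does not verify that the model itself loaded correctly.

```python
import socket


def is_port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        conn = socket.create_connection((host, port), timeout)
    except (socket.error, socket.timeout):
        # Connection refused, unreachable host, or timeout
        return False
    conn.close()
    return True


# 9000 is the port used in this article; change it if you mapped another one.
print(is_port_open("localhost", 9000))
```

If this prints False right after starting the container, waiting a few seconds and trying again is a reasonable first step, since the server may still be starting.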

Creating the live classifier

We will be using OpenCV to take frames/snapshots from the camera and to detect faces. We will then resize detected faces and send them to our NN which is being served by tensorflow-serving. Once we get our predictions, we will add them to the image as boxes and text, and display the image ( for example, see the image on the top ). We will keep repeating this to get a live camera feed with our predictions. The complete code for this can be found here.

A note on code design

Since it might take time for the NN to return predictions (depending on your system and the complexity of the NN), running everything sequentially might result in a laggy, stuttering camera feed. To fix this, we will use multiprocessing to create 2 separate processes. One process will be responsible for taking snapshots and displaying them; the other will be responsible for detecting gender using our NN. We will use shared Queues to communicate between them. We will send a new image for detection only once we have received the predictions for the previous image; until then, we keep displaying the last predictions. This way, even if our NN takes time, the camera feed will appear smooth, and most of the time people won’t even notice the lag.

Note: We are using multiprocessing instead of threads because in CPython threads do not give real parallelism for CPU-bound Python code: the GIL lets only one thread execute Python bytecode at a time (as explained here). multiprocessing uses separate processes instead of threads, effectively bypassing this restriction. As a side effect, we need to use special data structures, such as a shared queue, to share data between processes.
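
The two-process design can be sketched in isolation, without a camera or a NN. In the sketch below, slow_double is a hypothetical stand-in for the slow prediction call, and worker and run_demo are illustrative names, not part of the article's code. The pattern is the same one we will use later: one queue for inputs, one for results, and None as the signal to stop.

```python
from multiprocessing import Process, Queue


def slow_double(x):
    """Hypothetical stand-in for a slow NN call."""
    return x * 2


def worker(inp_q, out_q):
    """Read items until None arrives, pushing one result per item."""
    while True:
        item = inp_q.get()
        if item is None:  # sentinel: stop the worker
            break
        out_q.put(slow_double(item))


def run_demo():
    """Run the worker in a separate process and collect its results."""
    inp_q, out_q = Queue(2), Queue(2)
    proc = Process(target=worker, args=(inp_q, out_q))
    proc.start()
    results = []
    for item in [1, 2, 3]:
        inp_q.put(item)              # send one item...
        results.append(out_q.get())  # ...and wait for its result
    inp_q.put(None)                  # ask the worker to stop
    proc.join()
    return results
```

To try it, call run_demo() from under an if __name__ == "__main__": guard, which multiprocessing needs on some platforms. Unlike this demo, the live classifier will not block while waiting for a result; it will keep showing frames and only check whether a result has arrived.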

Importing requirements

We will be using OpenCV cv2 for taking and manipulating images, multiprocessing for running 2 processes as explained above, tensorflow_serving.apis for creating a request for our NN served by tensorflow_serving, and grpc for requesting output from tensorflow_serving.

import cv2
from multiprocessing import Process, Queue

from tensorflow_serving.apis import prediction_service_pb2
from tensorflow_serving.apis import classification_pb2
from grpc.beta import implementations

Defining the model for face detection

We will be using a pre-trained cascade classifier model, which is provided by OpenCV, to detect faces. The pre-trained classifier is provided as an XML file; you can get it from my GitHub repository here.

To load the classifier we run:

face_cascade = cv2.CascadeClassifier('./models/opencv/haarcascade_frontalface_default.xml')

Make sure the XML file is on your system. Change ./models/opencv/haarcascade_frontalface_default.xml to the location of the XML file if required.

Note: As the name of the XML file suggests, this model only detects the front view of a face and will not detect a face if you are looking even slightly left/right or up/down. We are not looking at more robust face detection as the NN we are using to classify faces is trained only on front view of faces, so this model perfectly satisfies our requirements.

Requesting a prediction from tensorflow_serving

We define a function which takes an image of a face as input (size should be 25X25), and returns the prediction classifying the image as male or female.

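def get_prediction(img):
    # Connect to tensorflow-serving and build a classification request
    channel = implementations.insecure_channel("localhost", 9000)
    stub = prediction_service_pb2.beta_create_PredictionService_stub(channel)
    request = classification_pb2.ClassificationRequest()
    request.model_spec.name = "default"

    # Send the flattened image as the float feature "x" of a single Example
    example = request.input.example_list.examples.add()
    example.features.feature['x'].float_list.value.extend(img.flatten())

    result = stub.Classify(request, 5.0).result.classifications[0]  # 5 secs timeout

    return "male" if result.classes[0].score < result.classes[1].score else "female"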

The code is straightforward. The first few lines create a connection to tensorflow_serving, which is running at port 9000 (inside our docker container), and create a request object. We also specify which model to use for prediction by setting request.model_spec.name. Our tensorflow_serving is serving only one model, and it is named default (tensorflow_serving names the model default unless we specify a different name).

Next, we define our input in the form of an example object (example is a protocol defined by tensorflow for serializing data). We add a new example object with request.input.example_list.examples.add(), and define our input x as a list of floats by flattening the image array (converting it into a 1D array): example.features.feature['x'].float_list.value.extend(img.flatten()).
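
To see what flattening does, here is a minimal pure-Python sketch. The article's code relies on the image array's own flatten() method; the helper flatten_rows below is a hypothetical stand-in that walks the image row by row, which is also NumPy's default order. It is only meant to show how a 25X25 image becomes a flat list of 625 floats.

```python
def flatten_rows(image):
    """Flatten a 2D grid into a 1D list of floats, row by row."""
    return [float(value) for row in image for value in row]


# A made-up 25x25 "image" where each pixel stores its own position
image = [[r * 25 + c for c in range(25)] for r in range(25)]
flat = flatten_rows(image)
print(len(flat))
```

Because the order is row by row, the pixel at row r and column c ends up at index r * 25 + c of the flat list.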

Then, we simply classify the input by calling stub.Classify. This returns a response object. Our result will be available in result.classifications in the response, which is an array of which we need only the first element. We can check the score/probability of each class i from result.classes[i].score. Our function returns male or female depending on which probability is greater.
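
The final comparison can be tried on its own, without a running server. The sketch below uses made-up scores, and the helper pick_label is hypothetical. It assumes that index 0 holds the female score and index 1 the male score, which is what the return statement of get_prediction implies.

```python
def pick_label(scores):
    """Map a pair of class scores to a label.

    Assumes scores[0] belongs to "female" and scores[1] to "male",
    mirroring the comparison in get_prediction.
    """
    return "male" if scores[0] < scores[1] else "female"


print(pick_label([0.2, 0.8]))
print(pick_label([0.7, 0.3]))
```

Note that a tie falls through to female with this comparison, so the choice of < over <= decides how ties are labelled.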

Detecting faces in an image

We define a function which takes a single channel gray scale image, and returns coordinates of all detected faces as a tuple ((top_left_x, top_left_y), (bottom_right_x, bottom_right_y)).

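def get_faces(img_gray):
    faces = face_cascade.detectMultiScale(img_gray, 1.3, 5)
    detected_rect = []
    for (x, y, w, h) in faces:
        # Convert (x, y, width, height) to (top-left, bottom-right) corners
        detected_rect.append(((x, y), (x + w, y + h)))
    return detected_rect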

The code is straightforward. We call face_cascade.detectMultiScale (face_cascade was defined above), which returns all faces as (top_left_x, top_left_y, width, height). We then convert them into the required format, and return all coordinates.
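
The format conversion is easy to check by hand. Here is a minimal sketch with a made-up rectangle; to_corners is a hypothetical helper that does only the conversion step, without calling OpenCV.

```python
def to_corners(rects):
    """Convert (x, y, w, h) boxes to ((x1, y1), (x2, y2)) corner pairs."""
    return [((x, y), (x + w, y + h)) for (x, y, w, h) in rects]


# One made-up face: top-left at (10, 20), 30 wide and 40 high
sample = [(10, 20, 30, 40)]
print(to_corners(sample))
```

The bottom-right corner is simply the top-left corner shifted by the width and height, which is the form that cv2.rectangle expects later on.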

Prediction process

Now we define a function which will run in our prediction process (created using multiprocessing).

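def prediction_thread(inp_q, pred_q):
    while True:
        inp = inp_q.get()
        if inp is None:
            break  # None signals the process to stop
        img, faces = inp
        predictions = []
        for i in faces:
            # Crop the face (rows are y, columns are x) and resize it to 25x25
            inp = cv2.resize(img[i[0][1]:i[1][1], i[0][0]:i[1][0]], dsize=(25,25))
            predictions.append((i, get_prediction(inp)))
        pred_q.put(predictions)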

This function takes 2 parameters:

  • inp_q: The shared queue that will be used to send input to this process. Each element is expected to be a tuple containing an image and the coordinates of detected faces.
  • pred_q: The shared queue where we will send our predictions. The main process will read predictions from this queue.

We then read inputs img, faces from inp_q and push predictions to pred_q in a loop, until we get None as input, which signals the process to stop. img is a single channel gray scale image. Each element of faces is a tuple containing 2 tuples ((top_left_x, top_left_y), (bottom_right_x, bottom_right_y)) (the same format as returned by the function get_faces in the section above).

We iterate over all the face coordinates that we got as input in faces. For each face, we crop the image of the face using those coordinates img[i[0][1]:i[1][1], i[0][0]:i[1][0]], resize it to 25X25 with cv2.resize, get the prediction with get_prediction(inp), and append it to the list with predictions.append along with the coordinates. Then we push all predictions to pred_q. These will be read by the main process, which will then send another input into inp_q.
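
The order of the indices in the crop is easy to get wrong, so here is a minimal sketch using plain nested lists instead of an OpenCV image. The helper crop and the tiny grid are hypothetical. The point is that rows are selected by y and columns by x, which is why the y values come first in the slicing above.

```python
def crop(img, box):
    """Cut the region described by ((x1, y1), (x2, y2)) out of a 2D grid."""
    (x1, y1), (x2, y2) = box
    # Rows are indexed by y and columns by x, hence the y-first order.
    return [row[x1:x2] for row in img[y1:y2]]


# A made-up 4x5 grid where each value encodes its row and column
img = [[10 * r + c for c in range(5)] for r in range(4)]
print(crop(img, ((1, 2), (3, 4))))
```

With this grid, a value such as 21 means row 2, column 1, so the printed values show directly which rows and columns were kept.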

Main process

Now the last part of the code, our main function:

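def main():
    cap = cv2.VideoCapture(0)
    inp_q = Queue(2)
    pred_q = Queue(2)
    pred_t = Process(target=prediction_thread, args=(inp_q, pred_q))
    pred_t.start()
    last_pred = []
    # Added so that the first frame gets sent: the loop only sends a frame after reading pred_q
    pred_q.put(last_pred)
    while True:
        _, img = cap.read()
        img = cv2.resize(img, fx=0.5, fy=0.5, dsize=(0, 0))
        if not pred_q.empty():
            # Previous predictions are ready: fetch them and send the current frame
            last_pred = pred_q.get()
            img_gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
            faces = get_faces(img)
            inp_q.put((img_gray, faces))
        for i in last_pred:
            pos = i[0]
            pred = i[1]
            cv2.rectangle(img, pos[0], pos[1], color=(255,0,0), thickness=2)
            text_x = (pos[0][0] + pos[1][0]) // 2
            text_y = (pos[0][1] + pos[1][1]) // 2
            cv2.putText(img, pred, (text_x, text_y), cv2.FONT_HERSHEY_SCRIPT_SIMPLEX, 0.8, (0,0,255), thickness=2)
        cv2.imshow("camera", img)
        if cv2.waitKey(1) == 27:  # escape key
            break
    # Stop the prediction process and release the camera and the window
    inp_q.put(None)
    pred_t.join()
    cap.release()
    cv2.destroyAllWindows()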

We first defined our video source as camera cap = cv2.VideoCapture(0). Then we defined our shared queues inp_q and pred_q. Next, we defined a separate process pred_t = Process(target=prediction_thread, args=(inp_q, pred_q)) and started this process pred_t.start(). This process will run the function prediction_thread that we defined above. Next we start a loop while True: which will take live feed from camera and show our predictions.

In the loop, we take an image from the camera with _, img = cap.read() and resize it with img = cv2.resize(img, fx=0.5, fy=0.5, dsize=(0, 0)) (since we don’t need to process and show a large image). If there are predictions in pred_q, we fetch them into last_pred with last_pred = pred_q.get(), and send the current image (as single channel gray scale, cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)) together with the faces in this image, faces = get_faces(img), to the prediction process through inp_q: inp_q.put((img_gray,faces)).

We then take our last predictions last_pred and draw them: a rectangle for each face, cv2.rectangle(img, pos[0], pos[1], color=(255,0,0), thickness=2), and text for the predicted gender, cv2.putText(img, pred, (text_x, text_y), cv2.FONT_HERSHEY_SCRIPT_SIMPLEX, 0.8, (0,0,255), thickness=2). We show the result on screen using cv2.imshow("camera", img), and if the user presses escape, if cv2.waitKey(1) == 27, we break the loop.

Before exiting, we tell prediction process to stop by sending None inp_q.put(None), wait for it to stop pred_t.join(), release the camera cap.release() and close the display window cv2.destroyAllWindows().

Now the only thing left is to run our main function:

if __name__ == "__main__":
    main()

This wraps up our code. You can now run this python script and see it identify the faces as male or female on live camera input.

Next Steps

To identify faces, we used a basic OpenCV model which can only detect the front view of faces. Since we need to detect faces before classifying them as male or female, a better and more robust face detection method should significantly improve the performance of this application. We can also use a NN trained to detect faces instead of the OpenCV model. Also, instead of using 2 different models to detect faces and then classify them, we can train a single model which directly detects male faces and female faces. YOLO NN is state of the art when it comes to detection. Feel free to experiment with it.