Using pre-trained models
Let's now look at the main components of using a pre-trained TensorFlow object detection model for inference in the Python notebook. First, some key constants are defined:
import os

# Model to download from the TensorFlow detection model zoo
MODEL_NAME = 'ssd_mobilenet_v1_coco_2017_11_17'
MODEL_FILE = MODEL_NAME + '.tar.gz'
DOWNLOAD_BASE = 'http://download.tensorflow.org/models/object_detection/'

# Path to the frozen detection graph used for inference
PATH_TO_CKPT = MODEL_NAME + '/frozen_inference_graph.pb'

# Label map that maps integer class IDs to human-readable names
PATH_TO_LABELS = os.path.join('data', 'mscoco_label_map.pbtxt')
NUM_CLASSES = 90
The notebook code downloads and uses a pre-trained object detection model, ssd_mobilenet_v1_coco_2017_11_17 (built with the SSD method, which we discussed briefly in the previous section, on top of the MobileNet CNN model covered in the previous chapter). A complete list of pre-trained models supported by the TensorFlow Object Detection API is at the TensorFlow detection model zoo: https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/detection_model_zoo.md, and most of them are trained on the MS COCO dataset. The exact model used for inference is the frozen_inference_graph.pb file (in the downloaded ssd_mobilenet_v1_coco_2017_11_17.tar.gz file), which can be used for off-the-shelf inference as well as retraining.
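For reference, the download-and-extract step follows a common pattern; here is a minimal sketch using only the Python standard library (the notebook itself uses equivalent urllib and tarfile calls, so the exact code may differ slightly):

import tarfile
import urllib.request

# Download the tar.gz archive from the model zoo, then extract only the
# frozen graph file we need for inference
urllib.request.urlretrieve(DOWNLOAD_BASE + MODEL_FILE, MODEL_FILE)
with tarfile.open(MODEL_FILE) as tar:
    for member in tar.getmembers():
        if 'frozen_inference_graph.pb' in member.name:
            tar.extract(member)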
The mscoco_label_map.pbtxt label file, located in models/research/object_detection/data/mscoco_label_map.pbtxt, has 90 (NUM_CLASSES) items for the types of objects that the ssd_mobilenet_v1_coco_2017_11_17 model can detect. Its first two and last two items are:
item {
  name: "/m/01g317"
  id: 1
  display_name: "person"
}
item {
  name: "/m/0199g"
  id: 2
  display_name: "bicycle"
}
…
item {
  name: "/m/03wvsk"
  id: 89
  display_name: "hair drier"
}
item {
  name: "/m/012xff"
  id: 90
  display_name: "toothbrush"
}
We talked about Protobuf in step 3 earlier, and the proto file that describes the data in mscoco_label_map.pbtxt is string_int_label_map.proto, located in models/research/object_detection/protos, with the following content:
syntax = "proto2";
package object_detection.protos;
message StringIntLabelMapItem {
optional string name = 1;
optional int32 id = 2;
optional string display_name = 3;
};
message StringIntLabelMap {
repeated StringIntLabelMapItem item = 1;
};
So basically, the protoc compiler generates code from string_int_label_map.proto, and that code can then be used to efficiently serialize and parse the data in mscoco_label_map.pbtxt. Later, when the CNN detects an object and returns an integer class ID, it can be converted to the name or display_name for humans to read.
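To see this in action, here is a minimal sketch that parses the label map with the protoc-generated Python module (string_int_label_map_pb2, produced by the protoc compile step mentioned earlier) and builds an ID-to-name dictionary; the notebook wraps the same idea in its label_map_util helper:

from google.protobuf import text_format
from object_detection.protos import string_int_label_map_pb2

# Parse the human-readable .pbtxt file into a StringIntLabelMap message
with open(PATH_TO_LABELS) as f:
    label_map = text_format.Parse(
        f.read(), string_int_label_map_pb2.StringIntLabelMap())

# Map the integer class IDs returned by the model to readable names
id_to_name = {item.id: item.display_name for item in label_map.item}
print(id_to_name[1])   # 'person'
print(id_to_name[90])  # 'toothbrush'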
After the model is downloaded and unzipped, its frozen graph is loaded into memory and the label map file is loaded as well. Some test images, located in models/research/object_detection/test_images (where you can add your own images for detection testing), are then ready to be used.
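The graph-loading step looks roughly like the following sketch, which follows the standard TensorFlow 1.x frozen-graph pattern used by the tutorial notebook:

detection_graph = tf.Graph()
with detection_graph.as_default():
    od_graph_def = tf.GraphDef()
    # Read the serialized frozen graph and import it into detection_graph
    with tf.gfile.GFile(PATH_TO_CKPT, 'rb') as fid:
        od_graph_def.ParseFromString(fid.read())
        tf.import_graph_def(od_graph_def, name='')

Next, with the graph loaded, the appropriate input and output tensors are defined: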
with detection_graph.as_default():
    with tf.Session(graph=detection_graph) as sess:
        # Input tensor: a batch of uint8 RGB images
        image_tensor = detection_graph.get_tensor_by_name('image_tensor:0')
        # Output tensors: bounding boxes, confidence scores, class IDs,
        # and the number of detections
        detection_boxes = detection_graph.get_tensor_by_name('detection_boxes:0')
        detection_scores = detection_graph.get_tensor_by_name('detection_scores:0')
        detection_classes = detection_graph.get_tensor_by_name('detection_classes:0')
        num_detections = detection_graph.get_tensor_by_name('num_detections:0')
Again, if you're wondering where those input and output tensor names come from in the SSD model downloaded and saved as models/research/object_detection/ssd_mobilenet_v1_coco_2017_11_17/frozen_inference_graph.pb, you can use the following code in IPython to find out:
import tensorflow as tf

# Parse the frozen graph and list its node names
g = tf.GraphDef()
g.ParseFromString(open("object_detection/ssd_mobilenet_v1_coco_2017_11_17/frozen_inference_graph.pb", "rb").read())
x = [n.name for n in g.node]
x[-4:]  # the last four nodes are the outputs
x[:5]   # the first few nodes include the input
The last two statements will return:
[u'detection_boxes',
u'detection_scores',
u'detection_classes',
u'num_detections']
and
[u'Const', u'Const_1', u'Const_2', u'image_tensor', u'ToFloat']
Another way is to use the summarize_graph tool we described in the previous chapter:
bazel-bin/tensorflow/tools/graph_transforms/summarize_graph --in_graph=models/research/object_detection/ssd_mobilenet_v1_coco_2017_11_17/frozen_inference_graph.pb
This will generate the following output:
Found 1 possible inputs: (name=image_tensor, type=uint8(4), shape=[?,?,?,3])
No variables spotted.
Found 4 possible outputs: (name=detection_boxes, op=Identity) (name=detection_scores, op=Identity) (name=detection_classes, op=Identity) (name=num_detections, op=Identity)
After each test image is loaded, the actual detection runs:
image = Image.open(image_path)
# Convert the PIL image to a numpy array of shape (height, width, 3)
image_np = load_image_into_numpy_array(image)
# Add a batch dimension: the model expects shape (1, height, width, 3)
image_np_expanded = np.expand_dims(image_np, axis=0)
(boxes, scores, classes, num) = sess.run(
    [detection_boxes, detection_scores, detection_classes, num_detections],
    feed_dict={image_tensor: image_np_expanded})
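The load_image_into_numpy_array helper used above is defined in the notebook; it is roughly the following:

def load_image_into_numpy_array(image):
    # Flatten the PIL image data and reshape it to (height, width, 3)
    (im_width, im_height) = image.size
    return np.array(image.getdata()).reshape(
        (im_height, im_width, 3)).astype(np.uint8)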
Finally, the detected results are visualized using the matplotlib library. If you use the default two test images that come with the tensorflow/models repo, you’ll see the results in Figure 3.1:
Figure 3.1 Detected objects with bounding boxes and confidence scores
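For reference, the visualization step in the notebook looks roughly like this, where vis_util is the object_detection.utils.visualization_utils module and category_index is the category dictionary the notebook builds from the label map with its label_map_util helper:

from object_detection.utils import visualization_utils as vis_util
import matplotlib.pyplot as plt

# Draw the boxes, class names, and scores onto the image array
vis_util.visualize_boxes_and_labels_on_image_array(
    image_np,
    np.squeeze(boxes),
    np.squeeze(scores),
    np.squeeze(classes).astype(np.int32),
    category_index,
    use_normalized_coordinates=True,  # boxes are in [0, 1] coordinates
    line_thickness=8)
plt.imshow(image_np)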
In the Using object detection models in iOS section, we’ll see how to use the same model and draw the same detected result on an iOS device.
You can also test other pre-trained models in the TensorFlow detection model zoo mentioned earlier. For example, if you replace MODEL_NAME = 'ssd_mobilenet_v1_coco_2017_11_17' in the object_detection_tutorial.ipynb notebook with MODEL_NAME = 'faster_rcnn_inception_v2_coco_2017_11_08' or MODEL_NAME = 'faster_rcnn_resnet101_coco_2017_11_08' (both names come from the URLs on the TensorFlow detection model zoo page), you'll see similar detection results with two other Faster R-CNN-based models, but they take longer to run.
Also, running the summarize_graph tool on the two Faster R-CNN models generates the same input and output info:
Found 1 possible inputs: (name=image_tensor, type=uint8(4), shape=[?,?,?,3])
Found 4 possible outputs: (name=detection_boxes, op=Identity) (name=detection_scores, op=Identity) (name=detection_classes, op=Identity) (name=num_detections, op=Identity)
Generally, MobileNet-based models are the fastest but less accurate (they have smaller mAP values) than larger Inception- or ResNet-based models. By the way, the downloaded ssd_mobilenet_v1_coco, faster_rcnn_inception_v2_coco_2017_11_08, and faster_rcnn_resnet101_coco_2017_11_08 files are 76 MB, 149 MB, and 593 MB, respectively. As we'll see later, MobileNet-based models such as ssd_mobilenet_v1_coco run a lot faster on mobile devices, while a large model such as faster_rcnn_resnet101_coco_2017_11_08 can simply crash on an older iPhone. Hopefully, the problems you have can be solved with a MobileNet-based model, a retrained MobileNet model, or a future version of ssd_mobilenet offering even better accuracy, although v1 of ssd_mobilenet is already good enough for many use cases.