How to build an age and gender multi-task predictor with deep learning in TensorFlow

Cole Murray · Published in freeCodeCamp.org/news · Dec 13, 2018

Source: https://www.governmentciomedia.com/ai-takes-face-recognition-new-frontiers

In my last tutorial, you learned how to combine a convolutional neural network with a Long Short-Term Memory (LSTM) network to generate captions for an image. In this tutorial, you'll learn how to build and train a multi-task machine learning model that predicts the age and gender of a subject in an image.

Overview

Introduction to the age and gender model
Building a multi-task TensorFlow Estimator
Training

Prerequisites

basic understanding of convolutional neural networks (CNN)
basic understanding of TensorFlow
GPU (optional)

Introduction to the Age and Gender Model

In 2015, researchers from the Computer Vision Lab, D-ITET, published the DEX paper and released IMDB-WIKI, a dataset of 500K+ face images labeled with age and gender.

IMDB-WIKI Dataset source: https://data.vision.ee.ethz.ch/cvl/rrothe/imdb-wiki/

DEX outlines a neural network architecture built on an ImageNet-pretrained VGG16 model that estimates the apparent age in face images. DEX placed first in the ChaLearn LAP 2015 apparent-age estimation challenge, outperforming the human reference.

Age as a classification problem

A conventional way to tackle age estimation from an image is a regression model with mean squared error as the loss function. DEX instead models the problem as a classification task, using a softmax classifier with a unique class for each age from 0 to 100 (101 classes in total) and cross-entropy as the loss function (a short sketch contrasting the two formulations appears at the end of this section).

Multi-task learning

Multi-task learning is a technique for training on multiple tasks through a shared architecture. Layers at the beginning of the network learn a joint, generalized representation, which helps prevent overfitting to noise in any single task. Training one multi-task network also means both tasks are trained in parallel, which reduces the infrastructure to a single training pipeline and reduces the total computation, since the shared layers are computed only once for both tasks.

Multi-task CNN source: https://murraycole.com
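To make the classification framing concrete, here is a minimal, throwaway sketch (not from the DEX paper or this tutorial's repo) contrasting the two loss formulations on dummy values, using the same TensorFlow 1.x API the rest of the post uses:

```python
import tensorflow as tf

# Hypothetical sketch: the same age labels treated as regression vs. classification.
# Shapes and values are illustrative only.
age_labels = tf.constant([34, 61], dtype=tf.int64)  # ground-truth ages for 2 images

# Regression view: one scalar output per image, mean squared error.
age_prediction = tf.constant([[30.0], [70.0]])
mse_loss = tf.losses.mean_squared_error(
    labels=tf.cast(tf.expand_dims(age_labels, -1), tf.float32),
    predictions=age_prediction)

# DEX-style classification view: 101 logits per image (one class per year),
# trained with cross-entropy against the age used as a class index.
age_logits = tf.random_normal([2, 101])
ce_loss = tf.losses.sparse_softmax_cross_entropy(labels=age_labels, logits=age_logits)

with tf.Session() as sess:
    print(sess.run([mse_loss, ce_loss]))
```

The classification view gives the model a full probability distribution over ages rather than a single point estimate, which is what DEX exploits.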
Building a multi-task network in TensorFlow

Below, you'll use TensorFlow's estimator abstraction to create the model. The model will be trained from raw image input to predict the age and gender of the face in the image.

Project Structure

.
├── Dockerfile
├── age_gender_estimation_tutorial
│   ├── cnn_estimator.py
│   ├── cnn_model.py
│   └── dataset.py
├── bin
│   ├── download-imdb-crop.sh
│   ├── predict.py
│   ├── preprocess_imdb.py
│   └── train.py
└── requirements.txt

Environment

For the environment, you'll use Docker to install dependencies. A GPU version is also provided for convenience.

Dockerfile (CPU version):

```dockerfile
FROM tensorflow/tensorflow:1.12.0-py3

RUN apt-get update \
    && apt-get install -y libsm6 libxrender-dev libxext6

ADD $PWD/requirements.txt /requirements.txt
RUN pip3 install -r /requirements.txt

CMD ["/bin/bash"]
```

Dockerfile.gpu (GPU version):

```dockerfile
FROM tensorflow/tensorflow:1.12.0-gpu-py3

RUN apt-get update \
    && apt-get install -y libsm6 libxrender-dev libxext6

ADD $PWD/requirements.txt /requirements.txt
RUN pip3 install -r /requirements.txt

CMD ["/bin/bash"]
```

requirements.txt:

```
scipy==1.1.0
numpy==1.15.4
opencv-python==3.4.4.19
tqdm==4.28.1
```

Build the image:

```bash
docker build -t colemurray/age-gender-estimation-tutorial -f Dockerfile .
```

If you have a GPU, build the GPU variant from Dockerfile.gpu the same way, tagged colemurray/age-gender-estimation-tutorial:gpu — the training command later in this post uses that tag.

Data

To train this model, you'll use the IMDB-WIKI dataset, consisting of 500K+ images. For simplicity, you'll download the pre-cropped imdb images (7GB). Run the script below to download the data.

download-imdb-crop.sh:

```bash
#!/usr/bin/env bash

if [[ ! -d "data" ]]
then
    mkdir "data"
fi

curl https://data.vision.ee.ethz.ch/cvl/rrothe/imdb-wiki/static/imdb_crop.tar -O
tar -xvf imdb_crop.tar -C data
```

```bash
chmod +x bin/download-imdb-crop.sh
./bin/download-imdb-crop.sh
```

Preprocessing

You'll now process the dataset to clean out low-quality images and crop the input to a fixed image size. Additionally, you'll format the data as CSV to simplify reading into TensorFlow.

preprocess_imdb.py:

```python
import argparse
import csv
import os
import random
from datetime import datetime

import cv2
import numpy as np
from scipy.io import loadmat
from tqdm import tqdm

headers = ['filename', 'age', 'gender']


def calc_age(taken, dob):
    birth = datetime.fromordinal(max(int(dob) - 366, 1))

    # assume the photo was taken in the middle of the year
    if birth.month < 7:
        return taken - birth.year
    else:
        return taken - birth.year - 1


def load_db(mat_path):
    db = loadmat(mat_path)['imdb'][0, 0]
    num_records = len(db["face_score"][0])

    return db, num_records


def get_meta(db):
    full_path = db["full_path"][0]
    dob = db["dob"][0]  # Matlab serial date number
    gender = db["gender"][0]
    photo_taken = db["photo_taken"][0]  # year
    face_score = db["face_score"][0]
    second_face_score = db["second_face_score"][0]
    age = [calc_age(photo_taken[i], dob[i]) for i in range(len(dob))]

    return full_path, dob, gender, photo_taken, face_score, second_face_score, age


def main(input_db, photo_dir, output_dir, min_score=1.0, img_size=165, split_ratio=0.8):
    """
    Takes the imdb dataset db and performs processing such as cropping and quality checks,
    writing the output to CSV

    :param split_ratio: train/validation split ratio
    :param input_db: Path to imdb db
    :param photo_dir: Path to photo's directory
    :param output_dir: Directory to write output to
    :param min_score: minimum score to filter face quality, range [0, 1.0]
    :param img_size: size to crop images to
    """
    crop_dir = os.path.join(output_dir, 'crop')

    if not os.path.exists(output_dir):
        os.makedirs(output_dir)

    if not os.path.exists(crop_dir):
        os.makedirs(crop_dir)

    db, num_records = load_db(input_db)

    indices = list(range(num_records))
    random.shuffle(indices)

    train_indices = indices[:int(len(indices) * split_ratio)]
    test_indices = indices[int(len(indices) * split_ratio):]

    train_csv = open(os.path.join(output_dir, 'train.csv'), 'w')
    train_writer = csv.writer(train_csv, delimiter=',')
    train_writer.writerow(headers)

    val_csv = open(os.path.join(output_dir, 'val.csv'), 'w')
    val_writer = csv.writer(val_csv, delimiter=',')
    val_writer.writerow(headers)

    clean_and_resize(db, photo_dir, train_indices, min_score, img_size, train_writer, crop_dir)

    clean_and_resize(db, photo_dir, test_indices, min_score, img_size, val_writer, crop_dir)


def clean_and_resize(db, photo_dir, indices, min_score, img_size, writer, crop_dir):
    """
    Cleans records and writes output to :param writer
    :param db:
    :param photo_dir:
    :param indices:
    :param min_score:
    :param img_size:
    :param crop_dir:
    :param writer:
    :return:
    """
    full_path, dob, gender, photo_taken, face_score, second_face_score, age = get_meta(db)

    for i in tqdm(indices):
        filename = str(full_path[i][0])

        if not os.path.exists(os.path.join(crop_dir, os.path.dirname(filename))):
            os.makedirs(os.path.join(crop_dir, os.path.dirname(filename)))

        img_path = os.path.join(photo_dir, filename)

        if float(face_score[i]) < min_score:
            continue

        if (~np.isnan(second_face_score[i])) and second_face_score[i] > 0.0:
            continue

        if ~(0 <= age[i] <= 100):
            continue

        if np.isnan(gender[i]):
            continue

        img_gender = int(gender[i])
        img_age = int(age[i])

        img = cv2.imread(img_path)
        crop = cv2.resize(img, (img_size, img_size))
        crop_filepath = os.path.join(crop_dir, filename)
        cv2.imwrite(crop_filepath, crop)

        writer.writerow([filename, img_age, img_gender])


if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument('--db-path', required=True)
    parser.add_argument('--photo-dir', required=True)
    parser.add_argument('--output-dir', required=True)
    parser.add_argument('--min-score', required=False, type=float, default=1.0)
    parser.add_argument('--img-size', type=int, required=False, default=224)
    parser.add_argument('--split-ratio', type=float, required=False, default=0.8)

    args = parser.parse_args()

    main(input_db=args.db_path, photo_dir=args.photo_dir, output_dir=args.output_dir,
         min_score=args.min_score, img_size=args.img_size, split_ratio=args.split_ratio)
```

Run it inside the container:

```bash
docker run -v $PWD:/opt/app \
-e PYTHONPATH=$PYTHONPATH:/opt/app \
-it colemurray/age-gender-estimation-tutorial \
python3 /opt/app/bin/preprocess_imdb.py \
--db-path /opt/app/data/imdb_crop/imdb.mat \
--photo-dir /opt/app/data/imdb_crop \
--output-dir /opt/app/var \
--min-score 1.0 \
--img-size 224
```

After approximately 20 minutes, you'll have a processed dataset.
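Each row of the generated CSVs is filename,age,gender (the header row comes from the headers list above). If you want to sanity-check the output before moving on, a small optional snippet like this works; the var/ path assumes you ran the docker command above from the repo root:

```python
import csv
from collections import Counter

# Optional sanity check on the preprocessed output (not part of the tutorial code).
# 'var/train.csv' assumes the preprocessing command above wrote to ./var on the host.
with open('var/train.csv') as f:
    rows = list(csv.DictReader(f))

print(len(rows), 'training records')
print(rows[0])  # a dict with 'filename', 'age', 'gender'
print(Counter(row['gender'] for row in rows))  # IMDB-WIKI encodes 1 = male, 0 = female
```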
Next, you'll use TensorFlow's data pipeline module, tf.data, to provide data to the estimator. tf.data is an abstraction for reading and manipulating a dataset in parallel, utilizing C++ threads for performance.

Here, you'll use TensorFlow's CSV reader to parse the data, preprocess the images, create batches, and shuffle.

dataset.py:

```python
import os

import tensorflow as tf


def csv_record_input_fn(img_dir, filenames, img_size=150, repeat_count=-1, shuffle=True,
                        batch_size=16, random=True):
    """
    Creates tensorflow dataset iterator over records from :param{filenames}.

    :param img_dir: Path to directory of cropped images
    :param filenames: array of file paths to load rows from
    :param img_size: size of image
    :param repeat_count: number of times for iterator to repeat
    :param shuffle: flag for shuffling dataset
    :param batch_size: number of examples in batch
    :param random: flag for random distortion to the image
    :return: Iterator of dataset
    """

    def parse_csv_row(line):
        defaults = [[""], [0], [0]]
        filename, age, gender = tf.decode_csv(line, defaults)
        filename = os.path.join(img_dir) + '/' + filename

        image_string = tf.read_file(filename)
        image = tf.image.decode_image(image_string, channels=3)
        image = tf.cast(image, tf.float32)
        image = tf.image.per_image_standardization(image)
        image.set_shape([img_size, img_size, 3])

        age = tf.cast(age, tf.int64)
        gender = tf.cast(gender, tf.int64)

        if random:
            image = tf.image.random_flip_left_right(image)

        return {'image': image}, dict(gender=gender, age=age)

    dataset = tf.data.TextLineDataset(filenames).skip(1)
    dataset = dataset.map(parse_csv_row)

    if shuffle:
        dataset = dataset.shuffle(buffer_size=2000)

    dataset = dataset.batch(batch_size)
    dataset = dataset.repeat(repeat_count)
    dataset = dataset.prefetch(batch_size * 10)

    iterator = dataset.make_one_shot_iterator()

    return iterator.get_next()
```
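If you want to check the pipeline in isolation before training, you can pull a single batch outside the estimator. This is a throwaway debugging snippet (not part of the repo), with placeholder paths pointing at the preprocessing output:

```python
import tensorflow as tf

from age_gender_estimation_tutorial.dataset import csv_record_input_fn

# Throwaway check: pull one batch to verify image shapes and label types.
# Paths are placeholders for the output of the preprocessing step above.
features, labels = csv_record_input_fn('var/crop', ['var/train.csv'],
                                       img_size=224, batch_size=4)

with tf.Session() as sess:
    images, targets = sess.run([features['image'], labels])

print(images.shape)                        # (4, 224, 224, 3)
print(targets['age'], targets['gender'])   # int64 arrays of length 4
```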
Model

Below, you'll create a basic CNN model. The model consists of three convolution blocks and two fully connected layers, with a softmax classifier head for each task.

cnn_model.py:

```python
import tensorflow as tf


def network(feature_input, labels, mode):
    """
    Creates a simple multi-layer convolutional neural network

    :param feature_input:
    :param labels:
    :param mode:
    :return:
    """
    filters = [32, 64, 128]
    dropout_rates = [0.2, 0.4, 0.7]
    conv_layer = feature_input

    for filter_num, dropout_rate in zip(filters, dropout_rates):
        conv_layer = conv_block(conv_layer, mode, filters=filter_num, dropout=dropout_rate)

    # Dense Layer
    pool4_flat = tf.layers.flatten(conv_layer)
    dense = tf.layers.dense(inputs=pool4_flat, units=1024, activation=tf.nn.relu)
    dropout = tf.layers.dropout(
        inputs=dense, rate=0.4, training=mode == tf.estimator.ModeKeys.TRAIN)

    # Age Head
    age_dense = tf.layers.dense(inputs=dropout, units=1024)
    age_logits = tf.layers.dense(inputs=age_dense, units=101)

    # Gender head
    gender_dense = tf.layers.dense(inputs=dropout, units=1024)
    gender_logits = tf.layers.dense(inputs=gender_dense, units=2)

    return age_logits, gender_logits


def conv_block(input_layer, mode, filters=64, dropout=0.0):
    conv = tf.layers.conv2d(
        inputs=input_layer,
        filters=filters,
        kernel_size=[5, 5],
        padding="same",
        activation=tf.nn.relu)

    pool = tf.layers.max_pooling2d(inputs=conv, pool_size=[2, 2], strides=2)

    dropout_layer = tf.layers.dropout(
        inputs=pool, rate=dropout, training=mode == tf.estimator.ModeKeys.TRAIN)

    return dropout_layer
```

Joint loss function

For the training operation, you'll use the Adam optimizer. For the loss function, you'll sum the cross-entropy error of each head, creating a single shared loss for the two tasks.
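Before wiring the network into an estimator, it can help to confirm that the two heads produce the expected shapes. This is a throwaway snippet (not part of the repo) that runs a dummy batch through the network defined above:

```python
import tensorflow as tf

from age_gender_estimation_tutorial.cnn_model import network

# Throwaway sanity check: run a zero batch through the network and confirm
# each head produces the expected number of classes.
images = tf.zeros([4, 224, 224, 3])
age_logits, gender_logits = network(images, labels=None, mode=tf.estimator.ModeKeys.PREDICT)

print(age_logits.shape)     # (4, 101) - one class per year of age
print(gender_logits.shape)  # (4, 2)   - male / female
```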
TensorFlow estimator

TensorFlow estimators provide a simple abstraction for graph creation and runtime processing. TensorFlow defines an interface, model_fn, that can be used to create custom estimators. Below, you'll take the network created above and build the training, eval, and predict specifications. These specifications are used by TensorFlow's estimator class to alter the behavior of the graph for each mode.

cnn_estimator.py:

```python
import tensorflow as tf

from age_gender_estimation_tutorial.cnn_model import network


def model_fn(features, labels, mode, params):
    """
    Creates model_fn for Tensorflow estimator. This function takes features and labels as input and
    is responsible for the creation and processing of the Tensorflow graph for training, evaluation
    and prediction.

    Expected feature: {'image': image tensor }

    :param features: dictionary of input features
    :param labels: dictionary of ground truth labels
    :param mode: graph mode
    :param params: params to configure model
    :return: Estimator spec dependent on mode
    """
    learning_rate = params['learning_rate']
    image_input = features['image']

    age_logits, logits = network(image_input, labels, mode)

    if mode == tf.estimator.ModeKeys.PREDICT:
        return get_prediction_spec(age_logits, logits)

    joint_loss = get_loss(age_logits, logits, labels)

    if mode == tf.estimator.ModeKeys.TRAIN:
        return get_training_spec(learning_rate, joint_loss)

    else:
        return get_eval_spec(logits, age_logits, labels, joint_loss)


def get_prediction_spec(age_logits, logits):
    """
    Creates estimator spec for prediction

    :param age_logits: logits of age task
    :param logits: logits of gender task
    :return: Estimator spec
    """
    predictions = {
        "classes": tf.argmax(input=logits, axis=1),
        "age_class": tf.argmax(input=age_logits, name='age_class', axis=1),
        "age_prob": tf.nn.softmax(age_logits, name='age_prob'),
        "probabilities": tf.nn.softmax(logits, name="softmax_tensor")
    }
    return tf.estimator.EstimatorSpec(mode=tf.estimator.ModeKeys.PREDICT, predictions=predictions)


def get_loss(age_logits, gender_logits, labels):
    """
    Creates joint loss function

    :param age_logits: logits of age task
    :param gender_logits: logits of gender task
    :param labels: ground-truth labels of age and gender
    :return: joint loss of age and gender
    """
    gender_loss = tf.losses.sparse_softmax_cross_entropy(labels=labels['gender'], logits=gender_logits)
    age_loss = tf.losses.sparse_softmax_cross_entropy(labels=labels['age'], logits=age_logits)
    joint_loss = gender_loss + age_loss
    return joint_loss


def get_eval_spec(gender_logits, age_logits, labels, loss):
    """
    Creates eval spec for tensorflow estimator
    :param gender_logits: logits of gender task
    :param age_logits: logits of age task
    :param labels: ground truth labels for age and gender
    :param loss: loss op
    :return: Eval estimator spec
    """
    eval_metric_ops = {
        "gender_accuracy": tf.metrics.accuracy(
            labels=labels['gender'], predictions=tf.argmax(gender_logits, axis=1)),
        'age_accuracy': tf.metrics.accuracy(labels=labels['age'],
                                            predictions=tf.argmax(age_logits, axis=1)),
        'age_precision': tf.metrics.sparse_precision_at_k(labels=labels['age'],
                                                          predictions=age_logits, k=10)
    }
    return tf.estimator.EstimatorSpec(
        mode=tf.estimator.ModeKeys.EVAL, loss=loss, eval_metric_ops=eval_metric_ops)


def get_training_spec(learning_rate, joint_loss):
    """
    Creates training estimator spec

    :param learning_rate: learning rate for optimizer
    :param joint_loss: loss op
    :return: Training estimator spec
    """
    optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate)
    gender_train_op = optimizer.minimize(
        loss=joint_loss,
        global_step=tf.train.get_global_step())
    return tf.estimator.EstimatorSpec(mode=tf.estimator.ModeKeys.TRAIN, loss=joint_loss,
                                      train_op=gender_train_op)


def serving_fn():
    receiver_tensor = {
        'image': tf.placeholder(dtype=tf.float32, shape=[None, None, None, 3])
    }

    features = {
        'image': tf.image.resize_images(receiver_tensor['image'], [224, 224])
    }

    return tf.estimator.export.ServingInputReceiver(features, receiver_tensor)
```
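To see how this model_fn plugs into the Estimator API before looking at the full training script, here is an illustrative snippet (not part of the repo) that evaluates an existing checkpoint; paths are placeholders, and it assumes training has already written a checkpoint to the model directory:

```python
import tensorflow as tf

from age_gender_estimation_tutorial.cnn_estimator import model_fn
from age_gender_estimation_tutorial.dataset import csv_record_input_fn

# Illustrative only: the training script in the next section wires this up for you.
# A trained checkpoint must already exist under model_dir.
estimator = tf.estimator.Estimator(
    model_fn=model_fn,
    model_dir='var/cnn-model',
    params={'learning_rate': 0.0001})

metrics = estimator.evaluate(
    input_fn=lambda: csv_record_input_fn('var/crop', ['var/val.csv'], img_size=224,
                                         shuffle=False, random=False),
    steps=50)
print(metrics)  # loss, gender_accuracy, age_accuracy, age_precision
```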
Train

Now that you've preprocessed the data and created the model architecture and data pipeline, you'll begin training the model.

train.py:

```python
import argparse

import tensorflow as tf

from age_gender_estimation_tutorial.cnn_estimator import model_fn, serving_fn
from age_gender_estimation_tutorial.dataset import csv_record_input_fn

tf.logging.set_verbosity(tf.logging.INFO)

if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument('--img-dir')
    parser.add_argument('--train-csv')
    parser.add_argument('--val-csv')
    parser.add_argument('--model-dir')
    parser.add_argument('--img-size', type=int, default=160)
    parser.add_argument('--num-steps', type=int, default=200000)

    args = parser.parse_args()

    config = tf.estimator.RunConfig(model_dir=args.model_dir,
                                    save_checkpoints_steps=1500,
                                    )

    estimator = tf.estimator.Estimator(
        model_fn=model_fn, config=config, params={
            'learning_rate': 0.0001
        })

    train_spec = tf.estimator.TrainSpec(
        input_fn=lambda: csv_record_input_fn(args.img_dir, args.train_csv, args.img_size),
        max_steps=args.num_steps,
    )
    eval_spec = tf.estimator.EvalSpec(
        input_fn=lambda: csv_record_input_fn(args.img_dir, args.val_csv, args.img_size,
                                             random=False),
    )

    tf.estimator.train_and_evaluate(estimator, train_spec, eval_spec)

    estimator.export_savedmodel(export_dir_base='{}/serving'.format(args.model_dir),
                                serving_input_receiver_fn=serving_fn,
                                as_text=True)
```

Kick off training inside the GPU container:
```bash
docker run -v $PWD:/opt/app \
-e PYTHONPATH=$PYTHONPATH:/opt/app \
-it colemurray/age-gender-estimation-tutorial:gpu \
python3 /opt/app/bin/train.py \
--img-dir /opt/app/var/crop \
--train-csv /opt/app/var/train.csv \
--val-csv /opt/app/var/val.csv \
--model-dir /opt/app/var/cnn-model \
--img-size 224 \
--num-steps 200000
```

Predict

Below, you'll load your age and gender TensorFlow model. The model is loaded from disk and used to predict on the provided image.

predict.py:

```python
import logging
from argparse import ArgumentParser

import tensorflow as tf
from scipy.misc import imread
from tensorflow.contrib import predictor

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

tf.logging.set_verbosity(tf.logging.INFO)

if __name__ == '__main__':
    parser = ArgumentParser(add_help=True)
    parser.add_argument('--model-dir', required=True)
    parser.add_argument('--image-path', required=True)

    args = parser.parse_args()

    prediction_fn = predictor.from_saved_model(export_dir=args.model_dir,
                                               signature_def_key='serving_default')

    image = imread(args.image_path)
    output = prediction_fn({
        'image': [image]
    })
    print(output)
```

```bash
# Update the model path below with your model
docker run -v $PWD:/opt/app \
-e PYTHONPATH=$PYTHONPATH:/opt/app \
-it colemurray/age-gender-estimation-tutorial \
python3 /opt/app/bin/predict.py \
--image-path /opt/app/var/crop/25/nm0000325_rm2755562752_1956-17_2002.jpg \
--model-dir /opt/app/var/cnn-model-3/serving/<TIMESTAMP>
```

Predicted: M/46
Actual: M/46
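The script prints the raw prediction dictionary; the readable "Predicted: M/46" line above comes from interpreting it. A minimal post-processing sketch, appended after print(output) in predict.py (the keys come from get_prediction_spec, and IMDB-WIKI encodes gender as 1 for male, 0 for female):

```python
    # Optional post-processing of the prediction dict returned by prediction_fn.
    # 'classes' is the gender prediction, 'age_class' the argmax over 101 age classes.
    gender = 'M' if output['classes'][0] == 1 else 'F'
    age = output['age_class'][0]
    print('Predicted: {}/{}'.format(gender, age))
```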
Conclusion

In this tutorial, you learned how to build and train a multi-task network for predicting a subject's age and gender. By using a shared architecture, both targets can be trained and predicted simultaneously.

Next Steps:

- Evaluate on your own dataset
- Try a different network architecture
- Experiment with different hyperparameters

Questions/issues? Open an issue here on GitHub.

Complete code here.

Call to Action

If you enjoyed this tutorial, follow and recommend!

Interested in learning more about Deep Learning / Machine Learning? Check out my other tutorials:

- Building an image caption generator with Deep Learning in Tensorflow
- Building a Facial Recognition Pipeline with Deep Learning in Tensorflow
- Deep Learning CNN's in Tensorflow with GPUs
- Deep Learning with Keras on Google Compute Engine
- Recommendation Systems with Apache Spark on Google Compute Engine

Other places you can find me:

- Cole Murray (@_ColeMurray) on Twitter