
How to build an age and gender multi-task
predictor with deep learning in TensorFlow
Cole Murray
Published in We’ve moved to freeCodeCamp.org/news
5 min read · Dec 13, 2018
Source: https://www.governmentciomedia.com/ai-takes-face-recognition-new-frontiers
In my last tutorial, you learned how to combine a convolutional neural network
and a long short-term memory (LSTM) network to create captions given an image. In this
tutorial, you’ll learn how to build and train a multi-task machine learning model to
predict the age and gender of a subject in an image.
Overview
Introduction to age and gender model
Building a Multi-task Tensorflow Estimator
Training
Prerequisites
basic understanding of convolutional neural networks (CNN)
basic understanding of TensorFlow
GPU (optional)
Introduction to Age and Gender Model
In 2015, researchers from the Computer Vision Lab at D-ITET (ETH Zurich) published the
DEX paper and publicly released the IMDB-WIKI dataset, which consists of 500K+ face
images with age and gender labels.
IMDB-WIKI Dataset source: https://data.vision.ee.ethz.ch/cvl/rrothe/imdb-wiki/
DEX outlines a neural network architecture that uses a VGG-16 model pretrained on
ImageNet to estimate the apparent age in face images. DEX placed first in ChaLearn
LAP 2015, a competition that deals with recognizing people in an image,
outperforming the human reference.
Age as a classification problem
A conventional way of tackling an age estimation problem with an image as input
would be to use a regression-based model with mean-squared error as the loss
function. DEX instead models the problem as a classification task, using a softmax
classifier with each age represented as a unique class (101 classes in total, covering
ages 0 to 100) and cross-entropy as the loss function.
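As a rough illustration of this classification framing (a minimal sketch in plain Python, not the DEX implementation), the snippet below treats each integer age as a class index, turns a vector of 101 logits into probabilities with a softmax, and scores a prediction with cross-entropy. The helper names `softmax` and `cross_entropy` and the toy logits are assumptions made for this example.

```python
import math

NUM_AGE_CLASSES = 101  # one class per age, matching the 101-unit age head


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probs, true_class):
    """Negative log-probability assigned to the correct class."""
    return -math.log(probs[true_class])


# Toy logits: the network is fairly sure the subject is 30 years old.
logits = [0.0] * NUM_AGE_CLASSES
logits[30] = 5.0

probs = softmax(logits)
predicted_age = max(range(NUM_AGE_CLASSES), key=lambda age: probs[age])

loss_if_30 = cross_entropy(probs, true_class=30)  # small: prediction is right
loss_if_60 = cross_entropy(probs, true_class=60)  # large: prediction is wrong

print(predicted_age, round(loss_if_30, 3), round(loss_if_60, 3))
```

Note that the loss treats every wrong age the same way: predicting 31 for a 30-year-old costs as much as predicting 60. That is the trade-off of framing age as classification rather than regression.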
Multi-task learning
Multi-task learning is a technique of training on multiple tasks through a shared
architecture. Layers at the beginning of the network will learn a joint generalized
representation, preventing overfitting to a specific task that may contain noise.
By training with a multi-task network, the network can be trained in parallel on both
tasks. This reduces the infrastructure complexity to only one training pipeline.
Additionally, the computation required for training is reduced as both tasks are trained
simultaneously.
Multi-task CNN source: https://murraycole.com
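The shared-architecture idea can be sketched in a few lines of plain Python. This is a toy illustration under assumed names (`shared_trunk`, `linear_head`) with made-up weights, not the network built later in this tutorial: one set of shared features is computed once and then fed to a separate output head per task.

```python
def shared_trunk(pixels):
    """Stand-in for the shared layers: computes features used by every task."""
    mean = sum(pixels) / len(pixels)
    spread = max(pixels) - min(pixels)
    return [mean, spread]


def linear_head(features, weights):
    """One task-specific head: a weight row per output class."""
    return [sum(w * f for w, f in zip(row, features)) for row in weights]


# Hypothetical weights: 3 age buckets and 2 gender classes for brevity.
age_weights = [[0.1, 0.2], [0.3, -0.1], [-0.2, 0.4]]
gender_weights = [[0.5, -0.5], [-0.5, 0.5]]

pixels = [0.0, 0.5, 1.0, 0.5]
features = shared_trunk(pixels)  # computed once for both tasks
age_logits = linear_head(features, age_weights)        # age head
gender_logits = linear_head(features, gender_weights)  # gender head

print(features, age_logits, gender_logits)
```

During training, the gradients from both heads flow back into the shared layers, which is why the shared representation has to serve both tasks at once.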
Building a multi-task network in TensorFlow
Below you’ll use TensorFlow’s estimator abstraction to create the model. The model
will be trained from raw image input to predict the age and gender of the face image.
Project Structure
.
├── Dockerfile
├── age_gender_estimation_tutorial
│   ├── cnn_estimator.py
│   ├── cnn_model.py
│   └── dataset.py
├── bin
│   ├── download-imdb.sh
│   ├── predict.py
│   ├── preprocess_imdb.py
│   └── train.py
└── requirements.txt
Environment
For the environment, you’ll use Docker to install dependencies. A GPU version is also
provided for convenience.
FROM tensorflow/tensorflow:1.12.0-py3

RUN apt-get update \
    && apt-get install -y libsm6 libxrender-dev libxext6

ADD $PWD/requirements.txt /requirements.txt
RUN pip3 install -r /requirements.txt

CMD ["/bin/bash"]

Dockerfile (CPU version)
FROM tensorflow/tensorflow:1.12.0-gpu-py3

RUN apt-get update \
    && apt-get install -y libsm6 libxrender-dev libxext6

ADD $PWD/requirements.txt /requirements.txt
RUN pip3 install -r /requirements.txt

CMD ["/bin/bash"]

Dockerfile.gpu (GPU version)

scipy==1.1.0
numpy==1.15.4
opencv-python==3.4.4.19
tqdm==4.28.1

requirements.txt
docker build -t colemurray/age-gender-estimation-tutorial -f Dockerfile .
Data
To train this model, you’ll use the IMDB-WIKI dataset, consisting of 500K+ images. For
simplicity, you’ll download the pre-cropped IMDB images (7GB). Run the script below to
download the data.
#!/usr/bin/env bash

# Create the target directory on first run.
if [[ ! -d "data" ]]
then
    mkdir "data"
fi

# Download the pre-cropped IMDB faces and unpack them into data/.
curl https://data.vision.ee.ethz.ch/cvl/rrothe/imdb-wiki/static/imdb_crop.tar -O
tar -xvf imdb_crop.tar -C data

download-imdb-crop.sh
chmod +x bin/download-imdb-crop.sh
./bin/download-imdb-crop.sh
Preprocessing
You’ll now process the dataset to clean out low-quality images and crop the input to a
fixed image size. Additionally, you’ll format the data as a CSV to simplify reading into
TensorFlow.
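One detail of the cleaning step is easy to miss: the dataset stores dates of birth as MATLAB serial date numbers rather than calendar dates. The sketch below isolates that conversion with a worked example. It mirrors the `calc_age` helper in the preprocessing script; the example dates and the helper `matlab_datenum` are assumptions added for illustration.

```python
from datetime import date, datetime


def matlab_datenum(day):
    """Hypothetical helper: MATLAB serial date number for a calendar date.

    MATLAB counts days from year 0, while Python ordinals start at year 1,
    so the two differ by 366 days.
    """
    return day.toordinal() + 366


def calc_age(taken, dob):
    """Age in whole years, given the photo year and a MATLAB birth datenum."""
    birth = datetime.fromordinal(max(int(dob) - 366, 1))

    # Only the year of the photo is known, so assume it was taken mid-year.
    if birth.month < 7:
        return taken - birth.year
    return taken - birth.year - 1


born_in_january = matlab_datenum(date(1970, 1, 1))
born_in_september = matlab_datenum(date(1970, 9, 1))

print(calc_age(2000, born_in_january))    # birthday already passed by mid-2000
print(calc_age(2000, born_in_september))  # birthday still to come in mid-2000
```

Because only the year of the photo is recorded, the resulting age can be off by up to a year for any individual image.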
import argparse
import csv
import os
import random
from datetime import datetime

import cv2
import numpy as np
from scipy.io import loadmat
from tqdm import tqdm

headers = ['filename', 'age', 'gender']


def calc_age(taken, dob):
    birth = datetime.fromordinal(max(int(dob) - 366, 1))

    # assume the photo was taken in the middle of the year
    if birth.month < 7:
        return taken - birth.year
    else:
        return taken - birth.year - 1


def load_db(mat_path):
    db = loadmat(mat_path)['imdb'][0, 0]
    num_records = len(db["face_score"][0])

    return db, num_records


def get_meta(db):
    full_path = db["full_path"][0]
    dob = db["dob"][0]  # Matlab serial date number
    gender = db["gender"][0]
    photo_taken = db["photo_taken"][0]  # year
    face_score = db["face_score"][0]
    second_face_score = db["second_face_score"][0]
    age = [calc_age(photo_taken[i], dob[i]) for i in range(len(dob))]

    return full_path, dob, gender, photo_taken, face_score, second_face_score, age


def main(input_db, photo_dir, output_dir, min_score=1.0, img_size=165, split_ratio=0.8):
    """
    Takes imdb dataset db and performs processing such as cropping and quality checks, wr...

    :param split_ratio:
    :param input_db: Path to imdb db
    :param photo_dir: Path to photo's directory
    :param output_dir: Directory to write output to
    :param min_score: minimum score to filter face quality, range [0, 1.0]
    :param img_size: size to crop images to
    """
    crop_dir = os.path.join(output_dir, 'crop')

    if not os.path.exists(output_dir):
        os.makedirs(output_dir)

    if not os.path.exists(crop_dir):
        os.makedirs(crop_dir)

    db, num_records = load_db(input_db)

    # Shuffle the record indices, then split them into train and validation sets.
    indices = list(range(num_records))
    random.shuffle(indices)

    train_indices = indices[:int(len(indices) * split_ratio)]
    test_indices = indices[int(len(indices) * split_ratio):]

    train_csv = open(os.path.join(output_dir, 'train.csv'), 'w')
    train_writer = csv.writer(train_csv, delimiter=',', )
    train_writer.writerow(headers)

    val_csv = open(os.path.join(output_dir, 'val.csv'), 'w')
    val_writer = csv.writer(val_csv, delimiter=',')
    val_writer.writerow(headers)

    clean_and_resize(db, photo_dir, train_indices, min_score, img_size, train_writer, crop_dir)

    clean_and_resize(db, photo_dir, test_indices, min_score, img_size, val_writer, crop_dir)


def clean_and_resize(db, photo_dir, indices, min_score, img_size, writer, crop_dir):
    """
    Cleans records and writes output to :param writer
    :param db:
    :param photo_dir:
    :param indices:
    :param min_score:
    :param img_size:
    :param crop_dir:
    :param writer:
    :return:
    """
    full_path, dob, gender, photo_taken, face_score, second_face_score, age = get_meta(db)

    for i in tqdm(indices):
        filename = str(full_path[i][0])
        if not os.path.exists(os.path.join(crop_dir, os.path.dirname(filename))):
            os.makedirs(os.path.join(crop_dir, os.path.dirname(filename)))

        img_path = os.path.join(photo_dir, filename)

        # Skip low-quality face detections.
        if float(face_score[i]) < min_score:
            continue

        # Skip images in which a second face was detected.
        if not np.isnan(second_face_score[i]) and second_face_score[i] > 0.0:
            continue

        # Skip records whose computed age falls outside the 101 age classes.
        if not (0 <= age[i] <= 100):
            continue

        # Skip records with an unknown gender label.
        if np.isnan(gender[i]):
            continue

        img_gender = int(gender[i])
        img_age = int(age[i])

        img = cv2.imread(img_path)
        crop = cv2.resize(img, (img_size, img_size))
        crop_filepath = os.path.join(crop_dir, filename)
        cv2.imwrite(crop_filepath, crop)
        writer.writerow([filename, img_age, img_gender])


if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument('--db-path', required=True)
    parser.add_argument('--photo-dir', required=True)
    parser.add_argument('--output-dir', required=True)
    parser.add_argument('--min-score', required=False, type=float, default=1.0)
    parser.add_argument('--img-size', type=int, required=False, default=224)
    parser.add_argument('--split-ratio', type=float, required=False, default=0.8)

    args = parser.parse_args()

    main(input_db=args.db_path, photo_dir=args.photo_dir, output_dir=args.output_dir,
         min_score=args.min_score, img_size=args.img_size, split_ratio=args.split_ratio)

preprocess_imdb.py

docker run -v $PWD:/opt/app \
    -e PYTHONPATH=$PYTHONPATH:/opt/app \
    -it colemurray/age-gender-estimation-tutorial \
    python3 /opt/app/bin/preprocess_imdb.py \
    --db-path /opt/app/data/imdb_crop/imdb.mat \
    --photo-dir /opt/app/data/imdb_crop \
    --output-dir /opt/app/var \
    --min-score 1.0 \
    --img-size 224

After approximately 20 minutes, you’ll have a processed dataset.
Next, you’ll use TensorFlow’s data pipeline module tf.data to provide data to the
estimator. tf.data is an abstraction to read and manipulate a dataset in parallel,
utilizing C++ threads for performance.
Here, you’ll utilize TensorFlow’s CSV Reader to parse the data, preprocess the images,
create batches, and shuffle.

import os

import tensorflow as tf


def csv_record_input_fn(img_dir, filenames, img_size=150, repeat_count=-1, shuffle=True,
                        batch_size=16, random=True):
    """
    Creates tensorflow dataset iterator over records from :param{filenames}.

    :param img_dir: Path to directory of cropped images
    :param filenames: array of file paths to load rows from
    :param img_size: size of image
    :param repeat_count: number of times for iterator to repeat
    :param shuffle: flag for shuffling dataset
    :param batch_size: number of examples in batch
    :param random: flag for random distortion to the image
    :return: Iterator of dataset
    """

    def parse_csv_row(line):
        # Each row holds: filename, age, gender.
        defaults = [[""], [0], [0]]
        filename, age, gender = tf.decode_csv(line, defaults)
        filename = os.path.join(img_dir) + '/' + filename

        image_string = tf.read_file(filename)
        image = tf.image.decode_image(image_string, channels=3)
        image = tf.cast(image, tf.float32)
        image = tf.image.per_image_standardization(image)
        image.set_shape([img_size, img_size, 3])

        age = tf.cast(age, tf.int64)
        gender = tf.cast(gender, tf.int64)

        if random:
            image = tf.image.random_flip_left_right(image)

        return {'image': image}, dict(gender=gender, age=age)

    # Skip the CSV header row before parsing.
    dataset = tf.data.TextLineDataset(filenames).skip(1)
    dataset = dataset.map(parse_csv_row)

    if shuffle:
        dataset = dataset.shuffle(buffer_size=2000)

    dataset = dataset.batch(batch_size)
    dataset = dataset.repeat(repeat_count)
    dataset = dataset.prefetch(batch_size * 10)

    iterator = dataset.make_one_shot_iterator()

    return iterator.get_next()

dataset.py

Model
Below, you’ll create a basic CNN model. The model consists of three convolutions and
two fully connected layers, with a softmax classifier head for each task.
import tensorflow as tf


def network(feature_input, labels, mode):
    """
    Creates a simple multi-layer convolutional neural network

    :param feature_input:
    :param labels:
    :param mode:
    :return:
    """
    filters = [32, 64, 128]
    dropout_rates = [0.2, 0.4, 0.7]

    # Three convolutional blocks shared by both tasks.
    conv_layer = feature_input
    for filter_num, dropout_rate in zip(filters, dropout_rates):
        conv_layer = conv_block(conv_layer, mode, filters=filter_num, dropout=dropout_rate)

    # Dense Layer
    pool4_flat = tf.layers.flatten(conv_layer)
    dense = tf.layers.dense(inputs=pool4_flat, units=1024, activation=tf.nn.relu)
    dropout = tf.layers.dropout(
        inputs=dense, rate=0.4, training=mode == tf.estimator.ModeKeys.TRAIN)

    # Age Head
    age_dense = tf.layers.dense(inputs=dropout, units=1024)
    age_logits = tf.layers.dense(inputs=age_dense, units=101)

    # Gender head
    gender_dense = tf.layers.dense(inputs=dropout, units=1024)
    gender_logits = tf.layers.dense(inputs=gender_dense, units=2)

    return age_logits, gender_logits


def conv_block(input_layer, mode, filters=64, dropout=0.0):
    conv = tf.layers.conv2d(
        inputs=input_layer,
        filters=filters,
        kernel_size=[5, 5],
        padding="same",
        activation=tf.nn.relu)

    pool = tf.layers.max_pooling2d(inputs=conv, pool_size=[2, 2], strides=2)
    dropout_layer = tf.layers.dropout(
        inputs=pool, rate=dropout, training=mode == tf.estimator.ModeKeys.TRAIN)

    return dropout_layer

cnn_model.py

Joint loss function
For the training operation, you’ll use the Adam Optimizer. For a loss function, you’ll
combine the cross-entropy error of each head, creating a shared loss function between
the heads. The get_loss function in cnn_estimator.py does this by summing the two
losses, which differs from an average only by a constant factor.
age and gender joint loss function
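To make the joint loss concrete, here is a small plain-Python sketch (an illustration with made-up logits, not the tutorial's TensorFlow code). It computes a softmax cross-entropy for each head and then combines them, either as a sum or as an average. The function names and the three-class age head are assumptions chosen for brevity.

```python
import math


def softmax_cross_entropy(logits, label):
    """Cross-entropy between softmax(logits) and an integer class label."""
    peak = max(logits)  # log-sum-exp trick for numerical stability
    log_sum_exp = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return log_sum_exp - logits[label]


def joint_loss(age_logits, gender_logits, age_label, gender_label, average=False):
    """Combine the per-head losses into one scalar for the optimizer."""
    age_loss = softmax_cross_entropy(age_logits, age_label)
    gender_loss = softmax_cross_entropy(gender_logits, gender_label)
    total = age_loss + gender_loss
    return total / 2 if average else total


# Toy example: 3 age classes instead of 101 to keep the numbers readable.
age_logits = [2.0, 0.5, 0.1]
gender_logits = [0.2, 1.5]

summed = joint_loss(age_logits, gender_logits, age_label=0, gender_label=1)
averaged = joint_loss(age_logits, gender_logits, age_label=0, gender_label=1,
                      average=True)
print(round(summed, 4), round(averaged, 4))
```

The sum and the average differ only by a factor of two, so they share the same minimizer. Weighting the two heads unequally would be a further variation on the same pattern.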
TensorFlow estimator
TensorFlow estimators provide a simple abstraction for graph creation and runtime
processing. TensorFlow has specified an interface, model_fn, that can be used to create
custom estimators.
Below, you’ll take the network created above and create training, eval, and predict
specifications. These specifications will be used by TensorFlow’s estimator class to
alter the behavior of the graph.
import tensorflow as tf

from age_gender_estimation_tutorial.cnn_model import network


def model_fn(features, labels, mode, params):
    """
    Creates model_fn for Tensorflow estimator. This function takes features and input, and
    is responsible for the creation and processing of the Tensorflow graph for training, ...

    Expected feature: {'image': image tensor }

    :param features: dictionary of input features
    :param labels: dictionary of ground truth labels
    :param mode: graph mode
    :param params: params to configure model
    :return: Estimator spec dependent on mode
    """
    learning_rate = params['learning_rate']
    image_input = features['image']

    # `logits` holds the gender logits.
    age_logits, logits = network(image_input, labels, mode)

    if mode == tf.estimator.ModeKeys.PREDICT:
        return get_prediction_spec(age_logits, logits)

    joint_loss = get_loss(age_logits, logits, labels)

    if mode == tf.estimator.ModeKeys.TRAIN:
        return get_training_spec(learning_rate, joint_loss)

    else:
        return get_eval_spec(logits, age_logits, labels, joint_loss)


def get_prediction_spec(age_logits, logits):
    """
    Creates estimator spec for prediction

    :param age_logits: logits of age task
    :param logits: logits of gender task
    :return: Estimator spec
    """
    predictions = {
        "classes": tf.argmax(input=logits, axis=1),
        "age_class": tf.argmax(input=age_logits, name='age_class', axis=1),
        "age_prob": tf.nn.softmax(age_logits, name='age_prob'),
        "probabilities": tf.nn.softmax(logits, name="softmax_tensor")
    }

    return tf.estimator.EstimatorSpec(mode=tf.estimator.ModeKeys.PREDICT, predictions=predictions)


def get_loss(age_logits, gender_logits, labels):
    """
    Creates joint loss function

    :param age_logits: logits of age
    :param gender_logits: logits of gender task
    :param labels: ground-truth labels of age and gender
    :return: joint loss of age and gender
    """
    gender_loss = tf.losses.sparse_softmax_cross_entropy(labels=labels['gender'], logits=gender_logits)
    age_loss = tf.losses.sparse_softmax_cross_entropy(labels=labels['age'], logits=age_logits)
    joint_loss = gender_loss + age_loss
    return joint_loss


def get_eval_spec(gender_logits, age_logits, labels, loss):
    """
    Creates eval spec for tensorflow estimator
    :param gender_logits: logits of gender task
    :param age_logits: logits of age task
    :param labels: ground truth labels for age and gender
    :param loss: loss op
    :return: Eval estimator spec
    """
    eval_metric_ops = {
        "gender_accuracy": tf.metrics.accuracy(
            labels=labels['gender'], predictions=tf.argmax(gender_logits, axis=1)),
        'age_accuracy': tf.metrics.accuracy(labels=labels['age'], predictions=tf.argmax(age_logits, axis=1)),
        'age_precision': tf.metrics.sparse_precision_at_k(labels=labels['age'],
                                                          predictions=age_logits, k=10)
    }
    return tf.estimator.EstimatorSpec(
        mode=tf.estimator.ModeKeys.EVAL, loss=loss, eval_metric_ops=eval_metric_ops)


def get_training_spec(learning_rate, joint_loss):
    """
    Creates training estimator spec

    :param learning_rate: learning rate for optimizer
    :param joint_loss: loss op
    :return: Training estimator spec
    """
    optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate)
    gender_train_op = optimizer.minimize(
        loss=joint_loss,
        global_step=tf.train.get_global_step())
    return tf.estimator.EstimatorSpec(mode=tf.estimator.ModeKeys.TRAIN, loss=joint_loss,
                                      train_op=gender_train_op)


def serving_fn():
    receiver_tensor = {
        'image': tf.placeholder(dtype=tf.float32, shape=[None, None, None, 3])
    }

    features = {
        'image': tf.image.resize_images(receiver_tensor['image'], [224, 224])
    }

    return tf.estimator.export.ServingInputReceiver(features, receiver_tensor)

cnn_estimator.py

Train
Now that you’ve preprocessed the data and created the model architecture and data
pipeline, you’ll begin training the model.

import argparse

import tensorflow as tf

from age_gender_estimation_tutorial.cnn_estimator import model_fn, serving_fn
from age_gender_estimation_tutorial.dataset import csv_record_input_fn

tf.logging.set_verbosity(tf.logging.INFO)

if __name__ == '__main__':
    parser = argparse.ArgumentParser()

    parser.add_argument('--img-dir')
    parser.add_argument('--train-csv')
    parser.add_argument('--val-csv')
    parser.add_argument('--model-dir')
    parser.add_argument('--img-size', type=int, default=160)
    parser.add_argument('--num-steps', type=int, default=200000)

    args = parser.parse_args()

    config = tf.estimator.RunConfig(model_dir=args.model_dir,
                                    save_checkpoints_steps=1500,
                                    )

    estimator = tf.estimator.Estimator(
        model_fn=model_fn, config=config, params={
            'learning_rate': 0.0001
        })

    train_spec = tf.estimator.TrainSpec(
        # Any arguments after args.img_size are cut off in the source listing.
        input_fn=lambda: csv_record_input_fn(args.img_dir, args.train_csv, args.img_size),
        max_steps=args.num_steps,
    )

    eval_spec = tf.estimator.EvalSpec(
        # An argument starting with "ba" is cut off in the source listing here.
        input_fn=lambda: csv_record_input_fn(args.img_dir, args.val_csv, args.img_size,
                                             random=False),
    )

    tf.estimator.train_and_evaluate(estimator, train_spec, eval_spec)

    estimator.export_savedmodel(export_dir_base='{}/serving'.format(args.model_dir),
                                serving_input_receiver_fn=serving_fn,
                                as_text=True)

train.py

docker run -v $PWD:/opt/app \
    -e PYTHONPATH=$PYTHONPATH:/opt/app \
    -it colemurray/age-gender-estimation-tutorial:gpu \
    python3 /opt/app/bin/train.py \
    --img-dir /opt/app/var/crop \
    --train-csv /opt/app/var/train.csv \
    --val-csv /opt/app/var/val.csv \
    --model-dir /opt/app/var/cnn-model \
    --img-size 224 \
    --num-steps 200000
Predict
Below, you’ll load your age and gender TensorFlow model. The model will be loaded
from disk and predict on the provided image.
import logging
from argparse import ArgumentParser

import tensorflow as tf
from scipy.misc import imread
from tensorflow.contrib import predictor

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

tf.logging.set_verbosity(tf.logging.INFO)

if __name__ == '__main__':
    parser = ArgumentParser(add_help=True)
    parser.add_argument('--model-dir', required=True)
    parser.add_argument('--image-path', required=True)

    args = parser.parse_args()

    # The source listing is cut off after "signature_def_ke" in this call,
    # so the signature key is left at the library default here.
    prediction_fn = predictor.from_saved_model(export_dir=args.model_dir)

    batch = []

    # The image is wrapped in a list to form a batch of one.
    image = imread(args.image_path)
    output = prediction_fn({
        'image': [image]
    })
    print(output)

predict.py
# Update the model path below with your model
docker run -v $PWD:/opt/app \
-e PYTHONPATH=$PYTHONPATH:/opt/app \
-it colemurray/age-gender-estimation-tutorial \
python3 /opt/app/bin/predict.py \
--image-path /opt/app/var/crop/25/nm0000325_rm2755562752_1956-17_2002.jpg \
--model-dir /opt/app/var/cnn-model-3/serving/<TIMESTAMP>
Predicted: M/46 Actual: M/46
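The raw output of the predictor is a dictionary of arrays, so turning it into a label like the one above takes a small decoding step. The sketch below assumes the output keys defined in `get_prediction_spec` (`classes` for gender, `age_class` for age) and the IMDB-WIKI convention that gender 0 is female and 1 is male. The function `format_prediction` and the sample values are hypothetical.

```python
GENDER_LABELS = {0: "F", 1: "M"}  # assumed IMDB-WIKI encoding: 0 female, 1 male


def format_prediction(output, index=0):
    """Render one prediction from the batch as 'gender/age', e.g. 'M/46'."""
    gender_class = int(output["classes"][index])
    age_class = int(output["age_class"][index])
    return "{}/{}".format(GENDER_LABELS[gender_class], age_class)


# Hypothetical predictor output for a batch containing a single image.
sample_output = {"classes": [1], "age_class": [46]}
print("Predicted:", format_prediction(sample_output))
```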
Conclusion
In this tutorial, you learned how to build and train a multi-task network for predicting
a subject’s age and gender. By using a shared architecture, both targets can be trained
and predicted simultaneously.
Next Steps:
Evaluate on Your Own Dataset
Try a different network architecture
Experiment with Different Hyperparameters
Questions/issues? Open an issue here on GitHub
Complete code here.
Call to Action
If you enjoyed this tutorial, follow and recommend!
Interested in learning more about Deep Learning / Machine Learning? Check out my
other tutorials:
- Building an image caption generator with Deep Learning in Tensorflow
- Building a Facial Recognition Pipeline with Deep Learning in Tensorflow
- Deep Learning CNN’s in Tensorflow with GPUs
- Deep Learning with Keras on Google Compute Engine
- Recommendation Systems with Apache Spark on Google Compute Engine
Other places you can find me:
Cole Murray (@_ColeMurray) | Twitter