Binary Classifier for Melanoma Using MobileNet

Source: NHGRI news feature.

I recently read an article about a team of researchers (led by Dr. Andre Esteva) who fine-tuned a deep learning model (InceptionV3) to detect melanoma, with promising results for automated medical diagnosis. The CDC reports that in 2014, 76,665 Americans were diagnosed with melanoma, resulting in 9,324 deaths. Currently, dermatologists recognize advanced melanoma using standard criteria such as Asymmetry, Border irregularity, Color variation, Diameter and Evolving shape (aka ABCDE) [1]. In case you are wondering how well your own eyes can detect a malignant lesion, here's a sample collage of pictures (I will admit that when I tried, I did about as well as tossing a coin).

January 10th, 2018 - 10 minute read -
Python, Image Classification, MobileNet, DeepLearning
skin lesions
Fig.1. Can you guess which ones are malignant and which ones are not? Answers are at the end of this post. (Source:

UPDATE: After experimenting with MobileNet and DenseNet, I've concluded that DenseNet gives better and more robust performance in terms of log-loss, AUC (90% val / 92% test), and F1 score (86% val / 88% test). I updated my GitHub repo with the code for training a 3-class classification model. The final metrics were calculated on the melanoma class only, as if the model were a binary classifier.

The largest public collection of dermoscopic images of skin lesions is maintained by the International Skin Imaging Collaboration (ISIC). The images were collected from established clinical centers worldwide and were captured with a variety of devices within each center. The goal of ISIC's collaborative effort can be summed up in this statement:

"As inexpensive consumer dermatoscope attachments for smart phones are beginning to reach the market, the opportunity for automated dermoscopic assessment algorithms to positively influence patient care increases."

MobileNet - one channel at a time

With that in mind, I want to train a lightweight deep neural net called MobileNet on dermoscopic images and see how it performs as a binary classifier (see complete training code). What makes MobileNet special is its small, elegant architecture optimized for mobile devices, which does not give up much accuracy, as shown by its ImageNet performance. Instead of using regular convolution layers to learn image features (e.g., shape, area), it uses a depthwise convolution followed by a pointwise convolution to dramatically reduce computational cost (Fig.2). MobileNet has 4.3M parameters, compared to 23.9M for InceptionV3.

illustration of depthwise conv
model component
Fig.2. The top drawing illustrates the concept of depthwise + pointwise convolution. The bottom schematic shows the building blocks of MobileNet, which consists of repeated depthwise separable convolutions with fully connected layers at the end.

A regular convolution both filters and combines inputs into a feature map in one step. In contrast, a depthwise separable convolution first performs a convolution on each channel separately, which filters and selects features within that channel, and then applies a pointwise (1x1) convolution to merge those features into a new feature map. According to the MobileNet paper [2], a depthwise separable convolution with a 3×3 kernel needs about 8 to 9 times less computation than a regular convolution. MobileNet has 28 layers, most of which are repeats of the depthwise separable convolution, and about 95% of the total computation is spent on the 1x1 pointwise convolutions [2].
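To see where the savings come from, the cost comparison can be written out as a minimal sketch. It counts multiply-adds for a single layer under the cost model described in [2], assuming stride 1 and an output the same size as the input. The function names and the example layer size are illustrative assumptions, not taken from the training code.

```python
def standard_conv_cost(kernel, in_channels, out_channels, feature_size):
    """Multiply-adds for a regular convolution (stride 1, same-size output)."""
    return kernel * kernel * in_channels * out_channels * feature_size * feature_size


def separable_conv_cost(kernel, in_channels, out_channels, feature_size):
    """Multiply-adds for a depthwise convolution followed by a 1x1 pointwise one."""
    # Depthwise: one kernel x kernel filter per input channel
    depthwise = kernel * kernel * in_channels * feature_size * feature_size
    # Pointwise: a 1x1 convolution that mixes channels
    pointwise = in_channels * out_channels * feature_size * feature_size
    return depthwise + pointwise


# Illustrative layer (assumed sizes): 3x3 kernel, 512 -> 512 channels, 14x14 feature map
std = standard_conv_cost(3, 512, 512, 14)
sep = separable_conv_cost(3, 512, 512, 14)
reduction = std / sep
print("reduction factor: %.2f" % reduction)
```

The ratio of the two costs works out to 1/N + 1/K² for N output channels and a K×K kernel, so with a 3×3 kernel the reduction approaches 9 as the number of output channels grows.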

Data Processing

The dermoscopic images had many distracting artifacts (e.g., vignette borders, measurement scales, hair) that could interfere with model learning. Instead of writing separate code to clean up each type of artifact, I realized it was more efficient to apply a binary segmentation mask that selects the general lesion area. For segmentation, I used a SegNet autoencoder model, which achieved a Dice coefficient (a measure of overlap) of 78% on validation data.
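For readers unfamiliar with the Dice coefficient, here is a minimal NumPy sketch of how it can be computed for two binary masks. The function name, the smoothing term `eps`, and the toy masks are assumptions for illustration; this is not the metric code used to train the segmentation model.

```python
import numpy as np


def dice_coefficient(mask_true, mask_pred, eps=1e-7):
    """Dice = 2 * |A and B| / (|A| + |B|) for two binary masks of equal shape."""
    a = np.asarray(mask_true).astype(bool)
    b = np.asarray(mask_pred).astype(bool)
    intersection = np.logical_and(a, b).sum()
    # eps avoids division by zero when both masks are empty
    return (2.0 * intersection + eps) / (a.sum() + b.sum() + eps)


# Toy example: the true lesion covers 4 pixels, the prediction covers 6, and 4 overlap
truth = np.zeros((4, 4), dtype=int)
truth[1:3, 1:3] = 1
pred = np.zeros((4, 4), dtype=int)
pred[1:3, 1:4] = 1

score = dice_coefficient(truth, pred)
print("Dice: %.2f" % score)
```

A score of 1 means the predicted mask matches the ground truth exactly, and 0 means no overlap at all.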

Another issue was that the malignant category was underrepresented (20%), and it's difficult to train a deep neural net with fewer than 1,000 pictures per class (yes, I did try). I used augmentation (rotation and translation) to produce copies of the original malignant images to mitigate the under-representation. No other augmentation methods were used at this stage.
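The idea of oversampling by augmentation can be sketched with plain NumPy. This is a simplified stand-in, not the augmentation code used for this post: it only rotates by multiples of 90 degrees and shifts with wrap-around, and the function name and parameters are assumptions.

```python
import numpy as np


def augment_copies(image, n_copies, max_shift=10, seed=0):
    """Return n_copies rotated and shifted versions of an H x W x C image."""
    rng = np.random.RandomState(seed)
    copies = []
    for _ in range(n_copies):
        k = rng.randint(0, 4)  # number of 90-degree turns
        dy, dx = rng.randint(-max_shift, max_shift + 1, size=2)
        out = np.rot90(image, k=k, axes=(0, 1))
        # np.roll wraps pixels around the border; a real pipeline would pad or fill instead
        out = np.roll(out, shift=(int(dy), int(dx)), axis=(0, 1))
        copies.append(out)
    return copies


# Dummy square "image" standing in for a malignant sample
image = np.arange(192 * 192 * 3).reshape(192, 192, 3)
copies = augment_copies(image, n_copies=3)
print(len(copies), copies[0].shape)
```

As a rough guide, with a 20/80 class split, three extra copies per malignant image would bring the two classes to about the same size.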

Transfer the learning with Bottleneck

There's a wonderful tutorial by the creator of Keras illustrating transfer learning for deep neural nets. The idea is to take a model that has already been trained on a large dataset and start training on new data from its pre-trained weights, which saves time and computational cost. One common way to do that is to use 'bottleneck' features: the output of the 'bottom' part of a deep learning model, which is essentially all the convolutional layers (everything up to the fully-connected layers). I saved the bottleneck features as a NumPy array, then trained a customized 'top' part of the model as a binary classifier (see code below). Training the top part separately gives it sensible starting weights, so that training is more consistent and smooth when it's attached to the bottom model later on.

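from keras.applications.mobilenet import MobileNet
from keras.models import Sequential, Model
from keras.layers import Flatten, Dense
from keras.optimizers import RMSprop, SGD

# Convolutional base pre-trained on ImageNet, without the classifier head
base_model = MobileNet(input_shape=(192, 192, 3), include_top=False, weights='imagenet')

def Top_model(input_dim):
    # Small fully-connected head for binary classification
    top_model = Sequential()
    # Flatten is assumed here so that input_dim is used and Dense receives a vector
    top_model.add(Flatten(input_shape=input_dim))
    top_model.add(Dense(512, activation='relu'))
    top_model.add(Dense(512, activation='relu'))
    top_model.add(Dense(1, activation='sigmoid'))
    return top_model

# Train the top model on the saved bottleneck features.
# train_data is an assumed name for the bottleneck array; f1 is a custom
# metric defined in the complete training code.
model2 = Top_model(train_data.shape[1:])
model2.compile(optimizer=RMSprop(lr=0.0001), loss='binary_crossentropy', metrics=['accuracy', f1])
model2.fit(train_data, train_labels,
           epochs=5, batch_size=32, verbose=1,
           validation_data=(validation_data, validation_labels))

################# Use functional API to create the full model #############
top_model2 = Top_model(base_model.output_shape[1:])
model = Model(inputs=base_model.input, outputs=top_model2(base_model.output))

# Freeze the first few layers (loop body and compile call are assumed;
# only the optimizer argument survives in this excerpt)
for layer in model.layers[:4]:
    layer.trainable = False
model.compile(loss='binary_crossentropy',
              optimizer=SGD(lr=1e-3, momentum=0.9),
              metrics=['accuracy', f1])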

After training the top model, I augmented the image files with ImageDataGenerator(), using the default preprocessing function to scale the pixel values. I chose stochastic gradient descent (SGD) as the optimizer because that was the optimizer used on the original MobileNet.

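import numpy as np
from keras.preprocessing.image import ImageDataGenerator
from keras.callbacks import EarlyStopping, ModelCheckpoint
from sklearn.utils.class_weight import compute_class_weight

# Excerpt from the training script; arguments not shown here are marked "# ..."
train_datagen = ImageDataGenerator(
            # ...
)

train_generator = train_datagen.flow_from_directory(
            # ...
            target_size=(ROWS, COLS),
            # ...
)

validation_generator = validation_datagen.flow_from_directory(
            # ...
            target_size=(ROWS, COLS),
            # ...
)

# Weight each class inversely to its frequency to offset the class imbalance
class_weights = compute_class_weight(class_weight='balanced',
                                     classes=np.unique(train_labels), y=train_labels)

# Stop when the validation loss has not improved for 4 epochs; keep only the best weights
early_stopping = EarlyStopping(monitor='val_loss', patience=4)
model_checkpoint = ModelCheckpoint('training_weights_mobilenet_bottlneck.h5',
                                    monitor='val_loss', save_best_only=True, save_weights_only=True)

# Fragment of the training call (the rest is not shown):
#                            callbacks=[early_stopping, model_checkpoint],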
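To make the 'balanced' option of compute_class_weight less of a black box, here is a minimal pure-Python sketch of the formula it applies, n_samples / (n_classes * count_per_class). The function name and the toy labels (an 80/20 split mirroring the imbalance described earlier) are assumptions for illustration.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / float(n_classes * count) for cls, count in counts.items()}


# Toy labels: 80 benign (0) and 20 malignant (1)
labels = [0] * 80 + [1] * 20
weights = balanced_class_weights(labels)
print(weights)
```

The minority class ends up with the larger weight, so each malignant example contributes more to the loss. The sketch returns a dictionary keyed by class label, which is the form Keras documents for its class_weight argument.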

Binary cross entropy was the loss function used to optimize the model (the lower the score, the better). It takes into account the confidence of each prediction, not just whether it is correct. The validation binary cross entropy was 0.43, the F1 score (the harmonic mean of precision and recall, which is more informative than plain accuracy when classes are imbalanced) was 86%, and the AUC for the ROC curve was 0.89 (1 being the best).
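The difference between these metrics is easier to see on a toy example. The sketch below is a plain-Python illustration with made-up predictions; the function names are assumptions, and this is not the custom f1 metric used in the training code.

```python
import math


def binary_cross_entropy(y_true, y_prob, eps=1e-7):
    """Mean negative log-likelihood of the true labels under the predicted probabilities."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)


def f1_from_probs(y_true, y_prob, threshold=0.5):
    """F1 = harmonic mean of precision and recall at the given threshold."""
    y_pred = [1 if p >= threshold else 0 for p in y_prob]
    tp = sum(1 for y, yp in zip(y_true, y_pred) if y == 1 and yp == 1)
    fp = sum(1 for y, yp in zip(y_true, y_pred) if y == 0 and yp == 1)
    fn = sum(1 for y, yp in zip(y_true, y_pred) if y == 1 and yp == 0)
    if tp == 0:
        return 0.0
    precision = tp / float(tp + fp)
    recall = tp / float(tp + fn)
    return 2 * precision * recall / (precision + recall)


y_true = [1, 0, 1, 0]
confident = [0.9, 0.1, 0.8, 0.2]  # correct and sure
hesitant = [0.6, 0.4, 0.6, 0.4]   # correct but barely

loss_confident = binary_cross_entropy(y_true, confident)
loss_hesitant = binary_cross_entropy(y_true, hesitant)
print(loss_confident, loss_hesitant)
print(f1_from_probs(y_true, confident), f1_from_probs(y_true, hesitant))
```

Both sets of predictions classify every example correctly, so they share the same F1 score, but the hesitant predictions are penalized with a higher cross entropy.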

ROC of validation
Fig.3. ROC curve of the validation dataset, showing the trade-off between the true positive rate (sensitivity) and the false positive rate (1 - specificity).
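As a minimal sketch of what the ROC curve and its AUC measure, the snippet below computes one point of the curve at a chosen threshold, and the AUC as the fraction of (malignant, benign) pairs in which the malignant example gets the higher score. The function names and toy scores are assumptions for illustration, not the validation data behind Fig.3.

```python
def roc_point(y_true, scores, threshold):
    """Return (false positive rate, true positive rate) at one threshold."""
    tp = sum(1 for y, s in zip(y_true, scores) if y == 1 and s >= threshold)
    fn = sum(1 for y, s in zip(y_true, scores) if y == 1 and s < threshold)
    fp = sum(1 for y, s in zip(y_true, scores) if y == 0 and s >= threshold)
    tn = sum(1 for y, s in zip(y_true, scores) if y == 0 and s < threshold)
    return fp / float(fp + tn), tp / float(tp + fn)


def roc_auc(y_true, scores):
    """AUC as the probability that a positive outranks a negative (ties count half)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


y_true = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.7, 0.6, 0.3, 0.4, 0.2]

fpr, tpr = roc_point(y_true, scores, threshold=0.5)
auc = roc_auc(y_true, scores)
print(fpr, tpr, auc)
```

Sweeping the threshold from 1 down to 0 and plotting each (false positive rate, true positive rate) pair traces out the full curve.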

While the AUC score was lower than the ones reported by Esteva et al. [4], I should point out that this MobileNet model was trained on the smaller training dataset provided by the ISIC 2017 Challenge. I expect that better performance (>90% balanced accuracy) can be achieved with a larger training set as well as larger input images. For my own future reference, here are some things to keep in mind when training a deep learning model for image classification:

  • Pay attention to the preprocessing step expected by a given deep neural net model. It's NOT always mean-centered or normalized to a standard deviation of 1.
  • Sanity check: make sure the images in the training and test sets have more or less the same distribution.
  • Explore augmentation options such as contrast adjustment or adaptive histogram equalization. These are handy tools, especially when poor lighting or low resolution of the digital image becomes an issue (Note to self: need to explore this further).
  • Obviously there's room for improvement. For example, the number of filters in each layer and the dropout rate can be tuned.
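On the first point above, here is a minimal NumPy sketch contrasting two common preprocessing schemes. As far as I know, the Keras MobileNet preprocessing function scales pixel values to the range [-1, 1], which is what the first function mimics; the function names are assumptions, and it's worth checking the preprocessing function of whichever model you load.

```python
import numpy as np


def scale_to_minus_one_one(pixels):
    """Map 8-bit pixel values in [0, 255] to [-1, 1]."""
    return np.asarray(pixels, dtype=np.float32) / 127.5 - 1.0


def standardize(pixels, eps=1e-7):
    """Mean-center and scale to unit standard deviation (a different convention)."""
    x = np.asarray(pixels, dtype=np.float32)
    return (x - x.mean()) / (x.std() + eps)


pixels = np.array([0, 64, 128, 255], dtype=np.uint8)
scaled = scale_to_minus_one_one(pixels)
standardized = standardize(pixels)
print(scaled)
print(standardized)
```

Feeding a model inputs prepared with the wrong scheme usually doesn't raise an error; it just quietly hurts performance, which is why this step deserves a check.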

Note: For Fig.1, the center 4 images are melanoma, and the rest are benign.


[1] Herman, C. Emerging Technologies for the Detection of Melanoma: Achieving Better Outcomes. Clinical, Cosmetic and Investigational Dermatology 5 (2012): 195–212.

[2] MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications (

[3] Keras Documentation:

[4] Esteva, A., Kuprel, B., et al. Dermatologist-level classification of skin cancer with deep neural networks. Nature 542 (2017): 115–118.
