Classifying images in the Oxford 102 flower dataset with CNNs
I’ve added some code on GitHub for training deep convolutional neural networks to classify images in the Oxford 102 category flower dataset, using the lovely Caffe framework. The prototxt files for fine-tuning AlexNet and VGG_S models are included, and both start from weights pre-trained on the ILSVRC 2012 (ImageNet) data.
You can get the code via:

```
git clone https://github.com/jimgoo/caffe-oxford102
```
To download the Oxford 102 dataset, prepare the Caffe image files, and download pre-trained model weights for AlexNet and VGG_S, run the `bootstrap.py` script.
This will give you some pretty flower pictures:
The categories are split into training, validation, and testing sets. It seems odd that there are more testing images than training images, but we’ll nonetheless train on the provided training set and evaluate on the test set.
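The splits ship as MATLAB arrays of 1-indexed image IDs; a sketch of how one might inspect the split sizes, using a plain-dict stand-in (with made-up IDs) in place of `scipy.io.loadmat("setid.mat")` — the field names `trnid`/`valid`/`tstid` are the ones used in the Oxford release:

```python
# Stand-in for scipy.io.loadmat("setid.mat"); the real IDs index the
# full set of dataset images. The counts here are made up, but they
# mirror the oddity noted above: the test split dwarfs the train split.
splits = {
    "trnid": list(range(1, 11)),    # hypothetical training IDs
    "valid": list(range(11, 21)),   # hypothetical validation IDs
    "tstid": list(range(21, 81)),   # hypothetical test IDs
}

sizes = {name: len(ids) for name, ids in splits.items()}
print(sizes)  # note the "test" split is larger than "train" here
```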
AlexNet

This model is a slightly modified version of the ILSVRC 2012 winning AlexNet. The number of outputs in the final inner product layer has been set to 102 to match the number of flower categories. The hyperparameter choices in `AlexNet/solver.prototxt` follow those in Fine-tuning CaffeNet for Style Recognition on “Flickr Style” Data: the global learning rate is reduced, while the learning rate for the final fully connected layer is increased relative to the other layers.
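The boosted final-layer rate lives in the layer definition itself via `lr_mult`; a hedged sketch of what the relevant part of the train/val prototxt might look like (the layer name and exact multipliers here are illustrative, not copied from the repo):

```
# Hypothetical fragment: the re-sized final layer with boosted learning rates.
layer {
  name: "fc8_oxford102"       # renamed so the pre-trained fc8 weights are not loaded
  type: "InnerProduct"
  bottom: "fc7"
  top: "fc8_oxford102"
  param { lr_mult: 10  decay_mult: 1 }  # weights learn 10x the global rate
  param { lr_mult: 20  decay_mult: 0 }  # biases
  inner_product_param {
    num_output: 102           # one output per flower category
  }
}
```

Renaming the layer matters: Caffe copies pre-trained weights by layer name, so a fresh name gets a freshly initialized layer while everything else starts from ImageNet weights.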
Once you’ve run the `bootstrap.py` script, you can begin training from this directory with:
```
cd AlexNet
$CAFFE_HOME/build/tools/caffe train -solver solver.prototxt -weights pretrained-weights.caffemodel -gpu 0
```
After 14,000 iterations, the test accuracy is 80%:
```
I0918 00:02:19.772845 67440 solver.cpp:266] Iteration 14000, Testing net (#0)
I0918 00:02:45.828433 67440 solver.cpp:315] Test net output #0: accuracy = 0.8
I0918 00:02:45.978008 67440 solver.cpp:189] Iteration 14000, loss = 0.000117275
```
VGG_S

This is another popular CNN from the University of Oxford’s Visual Geometry Group (VGG). On ILSVRC 2012 it has a top-5 error rate of 13.1%, compared to 15.3% for AlexNet. The pre-trained weights can be downloaded via:

```
cd $CAFFE_HOME
./scripts/download_model_from_gist.sh 0179e52305ca768a601f <dirname>
```
Getting the prototxt files set up for training took a little more work because only the `deploy.prototxt` file was provided. I added the same per-layer learning rate multipliers as in the AlexNet configuration, along with the same weight initialization schemes, although the latter is redundant when starting from pre-trained weights. The same random cropping and mirroring are also used.
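In Caffe, the cropping and mirroring live in the data layer’s `transform_param`; a hedged sketch of what that could look like for VGG_S (the paths, batch size, and layer names here are illustrative, not taken from the repo):

```
# Hypothetical data layer: random 224x224 crops and horizontal mirroring,
# applied only during the TRAIN phase.
layer {
  name: "data"
  type: "Data"
  top: "data"
  top: "label"
  include { phase: TRAIN }
  transform_param {
    crop_size: 224          # VGG_S input size (AlexNet uses 227)
    mirror: true            # random horizontal flips
    mean_file: "imagenet_mean.binaryproto"
  }
  data_param {
    source: "train_lmdb"
    batch_size: 64
    backend: LMDB
  }
}
```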
```
cd VGG_S
$CAFFE_HOME/build/tools/caffe train -solver solver.prototxt -weights pretrained-weights.caffemodel -gpu 1
```
After 14,000 iterations, this model does a little better, with a test accuracy of 82%:
```
I0918 03:49:00.571482 68176 solver.cpp:266] Iteration 14000, Testing net (#0)
I0918 03:49:59.285096 68176 solver.cpp:315] Test net output #0: accuracy = 0.824516
I0918 03:49:59.753748 68176 solver.cpp:189] Iteration 14000, loss = 0.000275362
```
AlexNet uses a crop size of 227 x 227, while VGG_S uses 224 x 224, so it’s not an exact comparison. Accuracy on the test set evolves as follows:
I ran both of the above trainings at the same time on two GPUs and monitored GPU usage as in my last post with Keras. AlexNet was on GPU 1 and VGG_S on GPU 2. Notice how GPU utilization stays maxed out, apart from dips during testing:
I’ve yet to get my Python implementations of these models to be as efficient.
Caffe on AWS
Installing Caffe and the latest cuDNN libraries is no trivial matter. Luckily, there is an Amazon EC2 image ready to go with Caffe, CUDA 7, and cuDNN (ami-763a311e). It works with both g2.2xlarge (1 x K520) and g2.8xlarge (4 x K520) GPU instances. In this case, `CAFFE_HOME` is `/home/ubuntu/caffe`.
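The training commands above assume `CAFFE_HOME` is set; on that AMI you would point it at the bundled Caffe checkout, for example:

```shell
# Point CAFFE_HOME at the Caffe checkout bundled with the AMI.
export CAFFE_HOME=/home/ubuntu/caffe

# This is the binary path the training commands above resolve to.
echo "$CAFFE_HOME/build/tools/caffe"
```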
The class labels for each species were deduced by GitHub user m-co and can be found in the file `class-labels.py`. They are listed in order from class 1 to class 102, as used in the .mat files.
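Since Caffe’s softmax outputs are 0-indexed while the .mat files number classes from 1, mapping a prediction back to a species name involves an off-by-one shift; a sketch (the two example names are illustrative stand-ins, not taken from `class-labels.py`):

```python
# Hypothetical stand-in for the list in class-labels.py, ordered from
# class 1 to class 102 in the .mat-file numbering (only 2 shown here).
labels = ["species one", "species two"]

def label_for_prediction(pred_index, labels):
    """Map a 0-indexed Caffe softmax argmax to its class name."""
    return labels[pred_index]          # position i holds class i+1

def label_for_mat_class(mat_class, labels):
    """Map a 1-indexed class number from the .mat files to its name."""
    return labels[mat_class - 1]

print(label_for_prediction(0, labels))  # → species one (class 1)
print(label_for_mat_class(2, labels))   # → species two
```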
These models were trained using the mean image from ILSVRC 2012 rather than the mean of the actual Oxford dataset. This was more out of laziness than anything else.
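Computing the dataset’s own mean image is straightforward if you want to redo this properly; a sketch with random stand-in data (in practice you would iterate over the Oxford 102 images, and Caffe also ships a `compute_image_mean` tool for LMDBs):

```python
import numpy as np

def mean_image(images):
    """Average a sequence of HxWxC uint8 images into one float32 mean image."""
    acc = np.zeros(images[0].shape, dtype=np.float64)
    for img in images:
        acc += img                      # accumulate in float64 to avoid overflow
    return (acc / len(images)).astype(np.float32)

# Stand-in data: three random 256x256 RGB "images".
rng = np.random.default_rng(0)
imgs = [rng.integers(0, 256, size=(256, 256, 3), dtype=np.uint8) for _ in range(3)]

mean = mean_image(imgs)
print(mean.shape, mean.dtype)  # → (256, 256, 3) float32
```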