Traffic signs are the first element our network will detect. For this task, we need a large dataset containing images with annotated traffic signs. As we saw in the third article, 3º- Datasets for Traffic Signs detection, we will start by using the German Traffic Signs Detection Benchmark (GTSDB).
We will make use of Darknet, so be sure to have it installed with CUDA and OpenCV before following this tutorial. Everything you need is explained in my previous tutorial: 4º- How to install YOLO Darknet with CUDA and OpenCV in Ubuntu.
The aim of the detection stage is to detect and classify five different traffic sign categories: prohibitory, danger, mandatory, stop and yield.
German Traffic Signs Detection Benchmark
The features of this first dataset are:
- Size: 900 images (divided into 600 training images and 300 evaluation images).
- Classes: It contains 43 different categories (speed limit 20, speed limit 30, yield, bend left, stop…). However, they can be grouped into 3 main categories: prohibitory, danger and mandatory.
- Format: Images are stored in PPM format.
To provide the GTSDB as input for darknet, I developed a Python program that parses the annotations into the darknet format, converts the PPM images to JPG files and groups them into a training set and a testing set. You can see the program in the SaferAuto GitHub repository.
Regarding the training/testing split, the Python program lets you choose the percentage of images that will be used for training and the percentage used for testing.
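The core of that conversion step can be sketched as follows. This is a minimal, hedged illustration (not the actual SaferAuto code), assuming the standard GTSDB `gt.txt` annotation layout, where each line is `file.ppm;left;top;right;bottom;classID` and images are 1360×800 pixels:

```python
# Sketch: convert one GTSDB annotation line to the darknet label format.
# Darknet expects, for each image, a text file with one
# "class x_center y_center width height" line per box, all values
# normalized to [0, 1] by the image dimensions.

IMG_W, IMG_H = 1360, 800  # GTSDB image dimensions

def gtsdb_to_darknet(line, img_w=IMG_W, img_h=IMG_H):
    """Return (filename, 'class cx cy w h') for one gt.txt line."""
    name, left, top, right, bottom, class_id = line.strip().split(";")
    left, top, right, bottom = map(float, (left, top, right, bottom))
    cx = (left + right) / 2 / img_w   # box center x, normalized
    cy = (top + bottom) / 2 / img_h   # box center y, normalized
    w = (right - left) / img_w        # box width, normalized
    h = (bottom - top) / img_h        # box height, normalized
    return name, f"{class_id} {cx:.6f} {cy:.6f} {w:.6f} {h:.6f}"
```

The real program additionally maps the original class IDs onto the grouped categories and writes one label file per output image.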
As a result of the program, we will have four main components:
- training-set.txt: Text file that contains the path to all the images that will be used for training our model.
- testing-set.txt: Text file that contains the path to all the images that will be used for testing or validating our model.
- output-img-train/: This folder contains all the output images that will be used for training our model. As stated on the YOLO website, each of these images must be accompanied by a text file listing the traffic signs that appear in it.
- output-img-test/: This folder contains all the output images that will be used for testing or validating our model.
In the same folder as these four components, we will need to create a .names file, with one object name per line. In our case, this file will contain the five class names.
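Assuming the class IDs assigned by the conversion script follow the same order used throughout this article, a gtsdb.names file would look like this:

```
prohibitory
danger
mandatory
stop
yield
```

The line order matters: darknet maps class ID 0 to the first line, 1 to the second, and so on, so it must match the IDs written in the label files.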
For training custom objects in darknet, we must have a configuration file with the layer specification of our net. Following the YOLO layer specification, we will use the YOLOv3-spp configuration file because, as we can see in the next picture, it achieves a great mAP at the 0.5 IoU metric. From now on we will refer to this file as yolov3-spp.cfg.
However, in this yolov3-spp.cfg file, we need to update some parameters:
- Change line batch to batch=64
- Change line subdivisions to subdivisions=8. If you get an error while training, try subdivisions=16, 32 or 64. In my case it only worked with 64 due to my GPU.
- In each of the [yolo] layers, change classes=80 to your number of objects. In our case, as we want to detect prohibitory, danger, mandatory, stop and yield, we need classes=5.
- Besides, immediately before each of these [yolo] layers (and nowhere else) there is a convolutional layer. There, we need to change filters=255 to filters=(classes + 5) x 3. As we want to detect 5 different classes, we will set filters=30.
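The filters arithmetic can be sanity-checked with a tiny helper. A sketch, with the usual interpretation of the formula (the function name is my own, not part of darknet):

```python
# Each of the 3 anchors per [yolo] scale predicts 4 box coordinates
# plus 1 objectness score, plus one score per class - hence
# filters = (classes + 5) * 3 in the preceding convolutional layer.
def yolo_filters(num_classes, anchors_per_scale=3, box_params=5):
    return (num_classes + box_params) * anchors_per_scale

print(yolo_filters(5))   # our 5 classes  -> filters=30
print(yolo_filters(80))  # COCO's 80 classes -> the default filters=255
```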
Finally, we only need to create one more file, which specifies the important paths that our network has to access. We will call it gtsdb.data and it will contain the following: [Note: You have to change gtsdb-path]
classes = 5
train = [gtsdb-path]/train.txt
valid = [gtsdb-path]/test.txt
names = [gtsdb-path]/gtsdb.names
backup = backup/
In my repository, you have the yolov3-spp.cfg and gtsdb.data files. I created them in the cfg/gtsdb/ folder.
Training GTSDB dataset
Now that we have all the necessary files for starting the training of our YOLO model, we need to open a terminal and go to the darknet root path. We will also use pre-trained weights for the convolutional layers, which you can download here. Copy them to the weights folder.
Once this is done, we have to write:
./darknet detector train -map cfg/gtsdb/gtsdb.data cfg/gtsdb/yolov3-spp.cfg weights/darknet53.conv.74
The first results with this configuration were not bad: the net achieved a mean average precision (mAP) of 42%. However, after 1500 iterations it started to decrease, so I decided to stop the training.
To understand the mAP and average loss metrics I recommend this article.
The next thing I tried was to set the input images to black and white instead of color, but unfortunately the results were worse.
Then I changed the number of classes: instead of grouping them into 5 categories, I used the original 43 different classes. Here, the accuracy was very low.
I also tried classifying only the speed limit traffic signs, using the first 8 categories. However, the net struggled to differentiate between them, achieving a maximum mAP of 14%.
As we can see, the best configuration for the GTSDB dataset was to use 5 different categories. However, the net only achieved a mAP of 42% and produced a lot of false negatives. The bottleneck here is the small amount of input data, so I decided to look for a more complete dataset containing European traffic signs. The most complete one I found was the Belgium Traffic Sign Dataset.
Belgium Traffic Sign Dataset
The features of this dataset are:
- Size: 13,480 images (divided into 8,851 training images and 4,629 evaluation images). Additionally, there are 16,045 background images.
- Classes: It contains 62 different traffic sign classes. However, there are 13 main categories: undefined, other, triangles, red-circles, blue-circles, red-blue-circles, diamonds, yield, stop, forbidden, squares, rectangles-up, rectangles-down.
- Format: Images are stored in JP2 format.
The first task for processing this dataset was to convert the JP2 files to JPG format. I tried writing a Python script with pgmagick and using ImageMagick's convert command on Linux, but after a long time neither of those methods worked. I also tried some online APIs, but all of them required payment. In the end, the way I converted them was with IrfanView on Windows.
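For anyone attempting this step today, a Pillow-based batch converter is another option worth trying. This is a hedged sketch, not what I used: Pillow only opens .jp2 files when it was built against the OpenJPEG library, which may be exactly why the other tools failed here, so treat it as something to test rather than a guarantee.

```python
# Sketch: batch-convert images (e.g. JP2) to JPEG with Pillow.
# Requires Pillow ("pip install pillow"); JP2 support additionally
# requires a Pillow build with the OpenJPEG codec.
from pathlib import Path

from PIL import Image


def convert_images(src_dir, dst_dir, src_ext=".jp2"):
    """Convert every src_ext image in src_dir to JPEG in dst_dir."""
    dst = Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    converted = []
    for path in sorted(Path(src_dir).glob(f"*{src_ext}")):
        out = dst / (path.stem + ".jpg")
        # JPEG has no alpha channel, so normalize the mode first.
        Image.open(path).convert("RGB").save(out, "JPEG", quality=95)
        converted.append(out)
    return converted
```

The function is format-agnostic (`src_ext` is just a glob suffix), so it can be smoke-tested with any format your Pillow build supports before pointing it at the JP2 files.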
After that, I developed a Python program to parse the BTSDB annotations and convert them to the darknet format. You can see it in the SaferAuto GitHub repository.
Apart from that, I had to adapt the configuration, names and data files, as we did for the previous dataset. I started by classifying 8 categories: prohibitory, danger, mandatory, red-blue-circles, diamonds, yield, stop and other. This training achieved a mAP of 45%, the highest so far.
The next configuration used the same 5 classes as in the GTSDB: prohibitory, danger, mandatory, yield and stop. That way, we could merge the datasets in the future. The result of this configuration was…
[work in progress] Don’t forget to follow me on Twitter to stay updated!