CS 385 Final Project

The goal of our project is to detect traffic signs in real time. To do this, we combined several techniques. We built our detector around the idea of a car dash-cam, meaning that signs would almost always be in motion relative to the camera (since you would be driving past them). We also decided that, while sign identification would be great, we did not have time to implement it properly. Therefore, we focused purely on identifying a sign's location and tracking it for the user/driver.

Our code consists of two parts: an object detection portion and a motion tracking portion. The object detection portion's job is to detect the sign in the first place. The motion tracking portion then attempts to predict where the sign will be in the next frame, so that tracking stays as accurate as possible.

Approach/Algorithm

Sign Detection:

For the sign detection portion of our code, we originally planned to use the HOG (Histogram of Oriented Gradients) method. However, after multiple problems getting VLFeat to work, and after finding that this method did not work well with the way we were reading in video, we switched to an ACF (Aggregate Channel Features) object detector. ACF is a fast and effective sliding-window detector. It is an evolution of the Viola & Jones (VJ) detector, but with a roughly 1000-fold decrease in false positives at the same detection rate. ACF is best suited to quasi-rigid object detection (e.g. faces, pedestrians, cars), which made it a good fit for detecting signs.

To perform the ACF object detection, we used MATLAB's built-in function 'trainACFObjectDetector'. We pass in training data consisting of the file names and bounding boxes of multiple images, stored in a .mat file. For this program, we used the LISA (Laboratory for Intelligent and Safe Automobiles) Traffic Sign Data Set to generate our .mat file. This data set contains 47 US sign types, with 7,855 annotations on 6,610 frames. After training our detector on this data, we then check every frame of the video for signs.


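% Loading video and training ACF detector
videoFile       = 'video.mp4';
scaleDataFile   = 'pedScaleTable.mat';

% signData.mat provides the variable signData (image paths and bounding boxes).
load('signData.mat');
stopSigns = signData(1:250,1:2);   % first 250 rows, first two columns

% Prepend the image folder to each file name.
stopSigns.path = fullfile(toolboxdir('vision'),...
	'visiondata',stopSigns.path);

% Train the detector on the selected images.
acfDetector = trainACFObjectDetector(stopSigns,'NumStages',3);

obj = setupSystemObjects(videoFile, scaleDataFile);

detector = acfDetector;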
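
The training data described above pairs each image file name with its bounding boxes in the [x y width height] layout. As a minimal sketch of how such records might be assembled, the snippet below assumes the raw annotations give each sign as upper-left and lower-right corner coordinates; the file names, coordinates, and the field name 'signs' are made up for illustration and are not taken from our actual signData.mat.

```matlab
% Hypothetical annotations: one row per sign, given as corner coordinates
% [x1 y1 x2 y2] (upper-left and lower-right corners, in pixels).
corners = [862 104 916 158;
           425 197 438 213];
files = {'frame_0001.png'; 'frame_0002.png'};   % made-up file names

% Convert corners to the [x y width height] layout used for bounding boxes.
% Width and height are taken as simple coordinate differences here.
cornersToBbox = @(c) [c(:,1), c(:,2), c(:,3) - c(:,1), c(:,4) - c(:,2)];
bboxes = cornersToBbox(corners);

% Pair each file name with its box. A struct array keeps this sketch free of
% toolbox dependencies; the field name 'signs' is an assumption.
trainingData = struct('path', files, 'signs', num2cell(bboxes, 2));

disp(trainingData(1).signs);   % 862 104 54 54
```

The struct array is only a stand-in: the records would still need to be saved in whatever form the training function expects before being passed to it.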

Some sample test images from the LISA data-set:

Motion Tracking:

For the motion tracking portion of our code, a struct is created every time a street sign is detected. This struct is called a 'track', and it represents the moving sign object. Each track contains a unique id, the bounding box around the object, its classification score, its age (number of frames since the sign was first detected), the total visible count (number of frames in which the object was visible), a confidence rating, a predicted position (where the program expects the next bounding box to be), and a Kalman filter object. The Kalman filter is one of the most important parts of the tracking/prediction process: it is an optimal estimation algorithm that combines uncertain information (here, noisy detections and a motion model) to produce an educated guess about the state of a system (in our case, where a sign is going). It is also well suited to continuously changing systems, which is why it is so useful for real-time detection.
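
To make the predict/correct idea concrete, here is a minimal sketch of a constant-velocity Kalman filter written with plain matrices. It is not the filter object our code uses; the motion model, the noise values, and the measurements are all illustrative assumptions.

```matlab
% State: [x; y; vx; vy] (centroid position and per-frame velocity).
% All noise values below are illustrative assumptions, not tuned settings.
F = [1 0 1 0; 0 1 0 1; 0 0 1 0; 0 0 0 1];   % constant-velocity motion model
H = [1 0 0 0; 0 1 0 0];                     % only the position is measured
Q = 0.01 * eye(4);                          % process noise covariance
R = 4 * eye(2);                             % measurement noise covariance

x = [100; 50; 0; 0];                        % initial state from the first detection
P = 100 * eye(4);                           % initial uncertainty

% Made-up centroids of a sign drifting right and slightly up across frames.
measurements = [105 49; 110 48; 115 47; 120 46; 125 45]';

for k = 1:size(measurements, 2)
    % Predict: advance the state one frame using the motion model.
    x = F * x;
    P = F * P * F' + Q;

    % Correct: blend the prediction with the new detection.
    z = measurements(:, k);
    K = P * H' / (H * P * H' + R);          % Kalman gain
    x = x + K * (z - H * x);
    P = (eye(4) - K * H) * P;
end

predictedNext = H * (F * x);                % where the sign is expected next frame
fprintf('Predicted next centroid: (%.1f, %.1f)\n', predictedNext(1), predictedNext(2));
```

Because the velocity is part of the state, the filter learns how fast the sign moves across the image and can place the next bounding box even in a frame where the detector misses the sign.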

In each frame of the video, a function is called to display the tracking results. This function loops through the array of tracks and displays them as bounding boxes superimposed on the frame. Other functions called along the way update the array of tracks based on global thresholds for age, confidence, visibility, and estimated error tolerance. We chose these global parameters by experimenting until we were satisfied with the detector's accuracy.

 
% Code for updating the tracking.
% Written as a nested function: assignments, centroids, bboxes, scores,
% option, and tracks all come from the enclosing function's workspace.
function updateAssignedTracks()
    numAssignedTracks = size(assignments, 1);
    for i = 1:numAssignedTracks
        trackIdx = assignments(i, 1);
        detectionIdx = assignments(i, 2);

        centroid = centroids(detectionIdx, :);
        bbox = bboxes(detectionIdx, :);

        % Correct the estimate of the object's location
        % using the new detection.
        correct(tracks(trackIdx).kalmanFilter, centroid);

        % Stabilize the bounding box by taking the average of the size
        % of recent (up to) 4 boxes on the track.
        T = min(size(tracks(trackIdx).bboxes,1), 4);
        w = mean([tracks(trackIdx).bboxes(end-T+1:end, 3); bbox(3)]);
        h = mean([tracks(trackIdx).bboxes(end-T+1:end, 4); bbox(4)]);
        tracks(trackIdx).bboxes(end+1, :) = [centroid - [w, h]/2, w, h];

        % Update track's age.
        tracks(trackIdx).age = tracks(trackIdx).age + 1;

        % Update track's score history.
        tracks(trackIdx).scores = [tracks(trackIdx).scores; scores(detectionIdx)];

        % Update visibility.
        tracks(trackIdx).totalVisibleCount = ...
            tracks(trackIdx).totalVisibleCount + 1;

        % Adjust track confidence score based on the maximum detection
        % score in the past 'timeWindowSize' frames.
        T = min(option.timeWindowSize, length(tracks(trackIdx).scores));
        score = tracks(trackIdx).scores(end-T+1:end);
        tracks(trackIdx).confidence = [max(score), mean(score)];
    end
end
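
The threshold-based pruning mentioned earlier can be sketched in a few lines. The field names age, totalVisibleCount, and confidence match the track struct above; the threshold names, the threshold values, and the three sample tracks are hypothetical, since the values we actually settled on were found by trial and error and are not listed in this report.

```matlab
% Hypothetical thresholds; the values used in the project were tuned by hand
% and are not stated in the report.
option.ageThresh        = 8;     % tracks this young must be seen often to survive
option.visThresh        = 0.6;   % minimum fraction of frames with a detection
option.confidenceThresh = 2;     % minimum peak detection score

% Three made-up tracks: a stable one, a flickering young one, and a weak one.
tracks = struct( ...
    'id',                {1, 2, 3}, ...
    'age',               {20, 5, 30}, ...
    'totalVisibleCount', {18, 2, 25}, ...
    'confidence',        {[9.5 7.1], [4.0 3.2], [1.5 1.1]});   % [max, mean] score

ages       = [tracks.age];
visibility = [tracks.totalVisibleCount] ./ ages;
conf       = reshape([tracks.confidence], 2, []);   % row 1 holds the max scores
maxConf    = conf(1, :);

% A track is dropped if it is young and rarely visible, or if its best recent
% detection score is too low.
lost = (ages <= option.ageThresh & visibility <= option.visThresh) ...
     | (maxConf <= option.confidenceThresh);

tracks = tracks(~lost);
disp([tracks.id]);   % 1
```

Lowering ageThresh or confidenceThresh in a sketch like this keeps more tracks alive, which is the trade-off between missed signs and false positives that the thresholds control.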

Results

Sign Detection:

When run, our program did a reasonably good job of detecting the signs in the video it was given. While we were unable to get it to identify each sign's type (due to both time constraints and multiple crippling errors), we did get it to recognize most types of signs as signs and to detect them with fair accuracy. There were some issues when signs appeared distorted or were badly lit, but overall its detection method seemed to yield relatively robust results.

Motion Tracking:

In terms of motion tracking, our program did a reasonably good job as well. It did have trouble detecting signs that were not moving (since the input is supposed to come from a car dash-cam, it expects signs to be moving past the window), but it was good at tracking signs in motion and approximating where they would be from frame to frame. We found we could reduce the stationary-sign problem to some extent by decreasing our global age and confidence thresholds, but we decided this was not worth it because of the large number of false positives it produced.

Positive results after running the code:

Negative results (primarily from bad age/confidence values):

Conclusions

We encountered quite a few problems throughout this project. From VLFeat causing us major headaches in the beginning to having trouble formatting the training data from the LISA data set into a .mat file, it was a challenge from beginning to end. However, in the process we had the chance to explore and discover a lot about both MATLAB and object detection in general. When something broke, we had the opportunity either to fix it or to try another method; often we did both simultaneously. This yielded a very rich learning experience that gave us practice in areas of detection that we had not originally considered (such as our methods for tracking moving objects). We also gained experience processing video and working in a real-time environment, neither of which we had in mind when starting.

Given more time, we would have liked to expand this program to also perform sign identification. That would have rounded out the program and given it more practical application. However, given the obstacles we had to overcome, we were pleased with the end result and may come back to this project in the future to take it further.

Sources:
https://pdollar.github.io/toolbox/detector/acfReadme.html
http://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=6996284
http://cvrr.ucsd.edu/LISA/lisa-traffic-sign-dataset.html
http://www.bzarg.com/p/how-a-kalman-filter-works-in-pictures/

Sign-Language (John Eggers, Sean Moloney, Luke Rosenberg)

CS 385 : Real-Time Sign Detection
