CS 385 : TextonBoost - Parking Lots


Introduction

TextonBoost is an approach to multi-class classification that segments photographs into regions belonging to different classes. The model uses texture-layout filters, features based on textons.
Classification and feature selection are achieved using boosting, which gives an efficient classifier that can be applied to a large number of classes.


Approach/Algorithm

A simplified explanation of TextonBoost is:

Learn a boosting classifier based on relative texture locations for each class. Boosting combines many weak classifiers into a single strong classifier whose accuracy is higher than any individual weak classifier would achieve on its own.
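
As a rough illustration of this boosting step (a minimal sketch, not the joint boosting actually used in the TextonBoost framework), the example below combines default depth-one decision stumps with AdaBoost on made-up per-pixel feature vectors; the feature matrix X and the labels y are placeholders for texture-layout responses and per-pixel class labels.

    import numpy as np
    from sklearn.ensemble import AdaBoostClassifier

    # Hypothetical stand-ins: rows are pixels, columns are texture-layout
    # feature responses; labels are class indices (e.g. 0 = void,
    # 1 = unoccupied, 2 = occupied).
    rng = np.random.default_rng(0)
    X = rng.normal(size=(5000, 40))    # placeholder feature responses
    y = rng.integers(0, 3, size=5000)  # placeholder per-pixel labels

    # AdaBoost with its default depth-1 decision stumps: each stump is a weak
    # classifier, and boosting re-weights the training pixels so that later
    # stumps focus on the pixels earlier stumps got wrong.
    booster = AdaBoostClassifier(n_estimators=100, random_state=0)
    booster.fit(X, y)

    # Per-pixel class probabilities, analogous to the boosting output that
    # the model later combines with shape, location, color and edge cues.
    probs = booster.predict_proba(X[:10])
    print(probs.shape)  # (10, 3)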

Pre-existing Algorithm

    Given an image, for each pixel:
  1. Texture-layout features are calculated.
  2. A boosting classifier gives the probability of the pixel belonging to each class.
  3. The model combines the boosting output with shape, location, color and edge information.
  4. The pixel receives its final label.
  5. Once every pixel is labeled, the image receives its final segmentation.
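
The sketch below walks through these steps with placeholder helper functions (none of these functions come from the MSRC framework; they only stand in for the real texture-layout, boosting, and cue-combination stages):

    import numpy as np

    NUM_CLASSES = 3  # e.g. void, unoccupied, occupied

    def texture_layout_features(image):
        # Placeholder: would return an (H, W, D) array of texture-layout responses.
        h, w = image.shape[:2]
        return np.random.default_rng(0).normal(size=(h, w, 16))

    def boosting_probabilities(features):
        # Placeholder: would apply the learned boosted classifier to each pixel.
        h, w, _ = features.shape
        scores = np.random.default_rng(1).random(size=(h, w, NUM_CLASSES))
        return scores / scores.sum(axis=2, keepdims=True)

    def cue_potentials(image):
        # Placeholder for the shape, location, color and edge terms that are
        # combined with the boosting output.
        h, w = image.shape[:2]
        return np.ones((h, w, NUM_CLASSES)) / NUM_CLASSES

    def segment(image):
        feats = texture_layout_features(image)
        boost = boosting_probabilities(feats)
        cues = cue_potentials(image)
        # A real implementation would combine these in a CRF; here we simply
        # multiply the terms and take the per-pixel argmax as the final label.
        return (boost * cues).argmax(axis=2)  # (H, W) label image

    labels = segment(np.zeros((60, 80, 3)))
    print(labels.shape)  # (60, 80)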

Extensions to the existing TextonBoost framework

Tweaking the Parameters


Results

  1. Problems with Input Images
    • All of the input images in the PKLot data set span a large parking lot. Because the images are not taken from directly above the center of the lot, objects appear at very different scales: parking spaces at one end of the lot are much smaller than spaces at the other end, and objects in the foreground (e.g. cars, unoccupied spots) are much larger and better defined than objects in the background. To account for this problem we cropped the original images to a small region where object sizes vary much less.
  2. Problems with Ground Truth Images
    • In a perfect world, every pixel of a ground truth image would be labeled as void, car, or unoccupied. Unfortunately, the PKLot metadata does not describe complex polygons that exactly match the contours of a car in a parking spot; it only provides rectangles that approximately cover each spot. These rectangles include extraneous pixels in each class. If, for example, the training set contains a crowded lot and a given spot is labeled empty, the ground truth for that spot will include shadows from adjacent cars, and the model will learn that these shadows characterize an empty spot.
    • In addition, a small subset of the rectangles in the PKLot metadata overlap one another. This causes problems in the training phase because pixels can be assigned to the wrong class. For example, consider an empty parking spot with a car on either side of it: if the rectangle for the empty spot also covers part of the two cars, unwanted car textures are introduced into the unoccupied class. (A sketch of how such rectangle metadata can be rasterized into label images follows this list.)
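
A minimal sketch of how rectangle metadata like PKLot's can be rasterized into ground truth label images; the corner coordinates, occupancy flags, and class indices here are made up for illustration and do not follow PKLot's actual XML format:

    import numpy as np
    from PIL import Image, ImageDraw

    # Hypothetical class indices; the actual colors used in our ground truth
    # images are not reproduced here.
    VOID, UNOCCUPIED, OCCUPIED = 0, 1, 2

    def rasterize_spots(image_size, spots):
        # spots is a list of (corners, occupied) pairs, where corners is a
        # list of four (x, y) tuples -- a stand-in for the rectangles in the
        # PKLot metadata. Overlapping rectangles are painted in order, so
        # later spots overwrite earlier ones (one source of the mislabeled
        # pixels discussed above).
        width, height = image_size
        mask = Image.new("L", (width, height), VOID)
        draw = ImageDraw.Draw(mask)
        for corners, occupied in spots:
            draw.polygon(corners, fill=OCCUPIED if occupied else UNOCCUPIED)
        return np.array(mask)

    # Toy usage with made-up coordinates for two overlapping spots.
    labels = rasterize_spots(
        (200, 100),
        [([(10, 10), (60, 10), (60, 50), (10, 50)], True),
         ([(55, 10), (110, 10), (110, 50), (55, 50)], False)],
    )
    print(np.unique(labels))  # [0 1 2]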

Our new color classes

Output Images

  1. First pipeline with only SIFT features
  2. First pipeline with both SIFT and filter-bank features
  3. UIUC data set with only filter-bank features
  4. UIUC data set with only SIFT features
  5. PKLot data set trained with segmented images with only SIFT features
  6. PKLot data set trained with segmented images with only filter-bank features
  7. UIUC data set trained with segmented images with both filter-bank and SIFT features and 5 clusters in k-means clustering
  8. UIUC data set trained with segmented images with both filter-bank and SIFT features and 10 clusters in k-means clustering

Some figures

  1. Graph plotting False Negative vs. True Positive in varying outputs for unoccupied parking spots
  2. RPC curve for unoccupied classification when varying the number of clusters
  3. RPC curve for occupied classification when varying the number of clusters
Legend for the RPC curves: each point is labeled with its number of k-means clusters (3, 5, 10, or 20).

Discussion and Conclusions

In hindsight, TextonBoost was limited by our data set, the code base, and the similarity of the classes.
  • The codebase seemed to run k-means clustering at an odd stage. We posited that k-means should be run once over the data from all training images, so that every image shares a single texton dictionary; instead, the codebase seemed to cluster each image independently of the others (a sketch of the single-dictionary approach follows this list).
  • Understanding the various pieces that made the code run, from textonization to training of the model, involved working through multiple files and assumed the user of the framework had an intimate understanding of the approach.
  • Introducing dense SIFT was a challenge because the filter bank initially applied to each pixel operated on all pixels, whereas dense SIFT cannot operate on pixels too close to the image boundary. Our team overcame this by cropping the filter-bank responses to the same region covered by the dense-SIFT descriptors (the margin-alignment sketch after this list illustrates the idea).
  • Trying to find a good data set was difficult. While testing our code we ran into many problems caused by the input images. With the PKLot data set, the most obvious problem was that object size varied with location in the image. To obtain input images with objects of relatively similar size, we cropped a small area out of the original PKLot images. Cropping also sped up the code because there were fewer pixels to train on.
  • We also tried cropping individual objects of interest out of the PKLot images. These segmented images provided an additional data set where we could train on either cars or empty parking spots. Unlike the original images, which show an entire parking lot along with other objects like grass, roads, and people that can interfere with training, the segmented images contain far less noise.
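
A minimal sketch of the single-dictionary textonization we had in mind, assuming per-pixel filter-bank responses are already computed (the filter_responses arrays are random stand-ins, and this is not the MSRC framework's code):

    import numpy as np
    from sklearn.cluster import MiniBatchKMeans

    def build_texton_dictionary(filter_responses, n_textons=20):
        # Pool filter responses from *all* training images before clustering,
        # so every image shares the same texton vocabulary (rather than
        # clustering each image independently).
        pooled = np.vstack([r.reshape(-1, r.shape[-1]) for r in filter_responses])
        kmeans = MiniBatchKMeans(n_clusters=n_textons, random_state=0, n_init=3)
        kmeans.fit(pooled)
        return kmeans

    def textonize(kmeans, response):
        # Assign every pixel of one image to its nearest shared texton.
        h, w, d = response.shape
        return kmeans.predict(response.reshape(-1, d)).reshape(h, w)

    # Toy usage with random stand-in responses for three images.
    rng = np.random.default_rng(0)
    responses = [rng.normal(size=(40, 60, 17)) for _ in range(3)]
    dictionary = build_texton_dictionary(responses, n_textons=10)
    texton_map = textonize(dictionary, responses[0])  # (40, 60) texton ids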
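
And a sketch of the boundary alignment used to combine filter-bank and dense-SIFT features, assuming a descriptor margin of m pixels on every side (the margin, array shapes, and channel counts are illustrative, not the framework's actual settings):

    import numpy as np

    def align_features(filter_bank, dense_sift, margin):
        # filter_bank has shape (H, W, D1) and covers every pixel; dense_sift
        # has shape (H - 2*margin, W - 2*margin, D2) because dense SIFT cannot
        # be computed too close to the image boundary. Cropping the filter-bank
        # responses by the same margin lets the two be stacked per pixel.
        cropped = filter_bank[margin:-margin, margin:-margin, :]
        return np.concatenate([cropped, dense_sift], axis=2)

    # Toy usage: a 100x120 image, 17 filter-bank channels, 128-D SIFT, margin 8.
    fb = np.zeros((100, 120, 17))
    ds = np.zeros((100 - 16, 120 - 16, 128))
    combined = align_features(fb, ds, margin=8)
    print(combined.shape)  # (84, 104, 145)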


References

MSRC framework from Microsoft
Research paper from Microsoft
TextonBoost framework in Matlab documentation