
Gradient Based Learning
Akash Kushal, Computer Science

Neural Networks Review
- Network of interconnected neuron units
- Computes a function of the inputs

Single Neuron

Sigmoid function

Neural Networks Review
Single Neuron Neural Network

Training
- Search for the parameters (weights and thresholds) of all the units
- Use the back-propagation algorithm (gradient descent)
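The training step above can be sketched for a single sigmoid neuron. A minimal NumPy example, assuming a cross-entropy loss and a toy AND dataset (both illustrative choices, not from the slides):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy dataset: a single sigmoid neuron can learn the (linearly separable)
# AND function; the data and learning rate are illustrative choices.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0.0, 0.0, 0.0, 1.0])

rng = np.random.default_rng(0)
w = rng.normal(scale=0.1, size=2)  # weights
b = 0.0                            # threshold (bias)
lr = 1.0

for _ in range(5000):
    a = sigmoid(X @ w + b)           # forward pass
    grad = a - y                     # dLoss/dz for cross-entropy loss
    w -= lr * (X.T @ grad) / len(y)  # gradient-descent update (back-propagation
    b -= lr * grad.mean()            # is trivial here: there is only one layer)

pred = (sigmoid(X @ w + b) > 0.5).astype(int)
print(pred)
```

With more layers, the same gradient is propagated backwards through each unit via the chain rule; that is all back-propagation adds.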

In this talk…
- Convolutional Neural Networks
  - structure and training
  - application to single-character recognition
- Graph Transformer Networks
  - structure and training
  - application to check amount recognition
- Other Applications
  - face detection
  - generic object recognition
  - robot navigation
References:
- Gradient-Based Learning Applied to Document Recognition [LeCun et al., 1998]
- Neural Network-Based Face Detection [Rowley et al., 1998]
- Learning Methods for Generic Object Recognition with Invariance to Pose and Lighting [LeCun et al., 2004]

Convolutional Neural Networks

- Multilayer structure: each layer consists of parallel planes of neurons
- Dense features: no interest points
- The whole system is trained "end-to-end" with a gradient-based method

LeNet-5

Convolution layers: feature detectors
Subsampling layers: local feature pooling for invariance to small distortions
Weight sharing
- Fewer parameters (60,000 free parameters, 400,000 connections), so less training data is needed
- Efficient when used to search for characters over a large image region
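The two layer types can be sketched in a few lines of NumPy. A minimal illustration, assuming an invented 2x2 kernel: one shared kernel is convolved over the whole image (weight sharing), then 2x2 average pooling implements the subsampling:

```python
import numpy as np

def conv2d_valid(img, kernel):
    """'Valid' 2-D convolution: ONE shared kernel slid over the whole image,
    so the layer has only kernel.size free parameters (weight sharing)."""
    H, W = img.shape
    kH, kW = kernel.shape
    out = np.zeros((H - kH + 1, W - kW + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kH, j:j + kW] * kernel)
    return out

def subsample2x2(fmap):
    """2x2 average pooling: local feature pooling for small-shift invariance."""
    H, W = fmap.shape
    return fmap[:H // 2 * 2, :W // 2 * 2].reshape(H // 2, 2, W // 2, 2).mean(axis=(1, 3))

img = np.random.default_rng(0).random((8, 8))
edge = np.array([[1.0, -1.0], [1.0, -1.0]])  # invented vertical-edge kernel
fmap = conv2d_valid(img, edge)   # 7x7 feature map from only 4 shared weights
pooled = subsample2x2(fmap)      # pooled down to 3x3
print(fmap.shape, pooled.shape)
```

Because the kernel is shared across positions, evaluating the detector over a larger image just enlarges the feature map; no weights are added.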

Results: MNIST Dataset
MNIST
- 60,000 size-normalized training images
- 10,000 size-normalized test images
- 0.95% test error

Adding distorted characters
- 60,000 + 540,000 = 600,000 training images
- 0.80% test error

Results: Invariance to distortion, noise

Recognizing a Character String
Heuristic over-segmentation
- Partition the ink conservatively into segments
- Form a segmentation graph whose edges correspond to single blocks or consecutive blocks of ink
Segmentation Graph
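The graph construction can be sketched as follows; the number of cuts and the maximum span are invented for illustration:

```python
# Cut positions around 4 conservatively segmented blocks of ink; an edge
# (a, b) is a candidate character made of blocks a..b-1. The cut positions
# and maximum span here are invented for illustration.
cuts = [0, 1, 2, 3, 4]
max_span = 3  # allow a character to cover at most 3 consecutive blocks

edges = [(a, b) for a in cuts for b in cuts if a < b <= a + max_span]
print(len(edges))  # number of candidate character hypotheses
```

Each edge is then scored by the character recognizer, and recognizing the string becomes a search over paths through this graph.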

Recognizing a character string

Global Training for GTNs
- Viterbi Training
  - Loss = penalty of the best correct path
  - Collapse problem: fix the RBF centers
  - Not a reliable measure of confidence: does not consider other low-scoring paths

- Discriminative Viterbi Training
  - Loss = (penalty of the best correct path) - (penalty of the best path overall)
  - Does not build a margin

Global Training for GTNs
- Forward Training
  - Assume the penalty is a negative log-likelihood
  - Use the total likelihood of the label sequence (instead of the max), but with no normalization
  - Loss = -log(total likelihood)
  - Can be computed efficiently
  - Again, does not penalize wrong low-penalty paths

- Discriminative Forward Training
  - Use the normalized negative log-likelihood
  - Loss = -log(total normalized likelihood)
  - Larger gradient for important errors
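The contrast between the Viterbi and discriminative-forward losses can be sketched with made-up path penalties; `logadd` below is the "soft minimum" computed by the forward algorithm:

```python
import numpy as np

# Path penalties (negative log-likelihoods) through a toy interpretation
# graph; all numbers are invented for illustration.
correct_paths = np.array([2.0, 5.0])    # paths carrying the correct labels
all_paths = np.array([2.0, 5.0, 1.5])   # every path, including wrong ones

def logadd(penalties):
    """Forward penalty: -log(sum(exp(-p))), a 'soft minimum' over paths."""
    return -np.logaddexp.reduce(-penalties)

# Viterbi training looks only at the single best correct path...
viterbi_loss = correct_paths.min()
# ...while discriminative forward training uses all paths, normalized:
# loss = -log(total correct likelihood / total likelihood).
dforward_loss = logadd(correct_paths) - logadd(all_paths)
print(viterbi_loss, dforward_loss)
```

The discriminative loss is always positive (the correct paths are a subset of all paths), and it grows when wrong paths carry low penalties, which is exactly what Viterbi training ignores.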

Application: Check Reading System
Layers
- Field graph: possible check amounts
- Segment the fields
- Recognize the segments
- Compose with the grammar graph to form the interpretation graph
- Select the best interpretation
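Selecting the best interpretation amounts to a shortest-path search over the interpretation graph. A toy sketch with an invented graph and penalties (Dijkstra stands in here for the Viterbi search):

```python
import heapq

# Toy interpretation graph: node -> list of (dst, penalty, label); the
# topology and penalties are invented for illustration.
edges = {
    0: [(1, 0.2, "4"), (1, 1.5, "9")],
    1: [(2, 0.1, "5"), (2, 2.0, "6")],
    2: [],
}

def best_interpretation(start=0, goal=2):
    """Dijkstra over penalties: the lowest-total-penalty path is the best
    interpretation of the check amount."""
    heap = [(0.0, start, "")]
    seen = set()
    while heap:
        cost, node, labels = heapq.heappop(heap)
        if node == goal:
            return labels, cost
        if node in seen:
            continue
        seen.add(node)
        for dst, pen, lab in edges[node]:
            heapq.heappush(heap, (cost + pen, dst, labels + lab))
    return None

print(best_interpretation())
```

In the full system every stage (segmenter, recognizer, grammar composition, path selection) is a graph transformer, and gradients flow back through the whole chain.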

SDNN: Space Displacement Neural Network

- Convolutional Neural Networks are invariant to shifts, noise, and extraneous marks on the input
- Efficient compared to sweeping a detector over the entire image
- An alternative to heuristic over-segmentation

SDNN: Recognition Examples

Application: Generic Object Recognition (LeCun et. al CVPR 04)
Task: classify objects into one of 5 categories (animal, human, plane, truck, car) using stereo images

NORB Dataset
- 50 toys from 5 categories, 10 instances per category (5 training, 5 test)
- 972 stereo pairs per object instance: 18 azimuths x 9 elevations x 6 illuminations

Application: Generic Object Recognition

Convolutional net: 90,857 parameters, 3,901,162 connections

Method                                   Test error
K-NN (K=1), 2x96x96 images               18.4%
SVM (Gaussian kernel), 2x96x96 images    14.1%
Convolutional Net, 2x96x96 images         6.6%

Face Detection (Rowley et. al, 98)
20x20 face images, 1050 faces x 15 distortions
Preprocessing
- hand-labeled facial parts used for alignment
- lighting correction + histogram equalization
Training
- start with roughly 16K face examples and 1000 random non-face images
- add the detector's mistakes to the training set as learning progresses
Testing (detection)
- move the NN detector over the image
- shrink the image by a factor of 1.2 and repeat
- merge nearby detections
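The multi-scale scanning loop can be sketched as follows; `detect_20x20` is a hypothetical placeholder for the trained network, and the blob "face" is invented test data:

```python
import numpy as np

def detect_20x20(window):
    # Hypothetical placeholder: the real system applies the trained
    # neural network to this 20x20 window.
    return float(window.mean() > 0.9)

def scan_pyramid(image, step=2, scale=1.2, min_size=20):
    """Slide the 20x20 detector over the image, shrink by 1.2, repeat."""
    detections = []
    level = 0
    while min(image.shape) >= min_size:
        for i in range(0, image.shape[0] - 20 + 1, step):
            for j in range(0, image.shape[1] - 20 + 1, step):
                if detect_20x20(image[i:i + 20, j:j + 20]) > 0.5:
                    s = scale ** level  # map back to original coordinates
                    detections.append((int(i * s), int(j * s), int(20 * s)))
        # Nearest-neighbor downsample by the scale factor.
        idx_i = (np.arange(int(image.shape[0] / scale)) * scale).astype(int)
        idx_j = (np.arange(int(image.shape[1] / scale)) * scale).astype(int)
        image = image[np.ix_(idx_i, idx_j)]
        level += 1
    return detections  # nearby detections would be merged in a final step

img = np.zeros((60, 60))
img[10:34, 10:34] = 1.0  # one bright square standing in for a face
print(len(scan_pyramid(img)))
```

Because the fixed-size detector is rerun on ever-smaller copies of the image, faces larger than 20x20 pixels are still found at some pyramid level.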

Face Detection (Rowley et. al, 98)
Arbitration among different networks
- train 2-4 separate neural networks with different initializations
- combine them by simple ANDing / ORing / voting, or with another neural network
Speed
- use a 30x30 window
- train on a dataset that allows the face to be off-center by up to 5 pixels
- the detector can then be moved in steps of 10 pixels
- this yields too many false positives: use the original network to verify candidates
- 7.2 seconds for a 320x240 image (as opposed to 383 seconds)
Results
- Test dataset 1: 130 images, 507 faces
- Net 1 with 2 copies of hidden units, Net 2 with 3 copies
- Threshold(2,2) + AND(2): 86.2% recognition, 1/3,613,009 false detects

Cool Application: Visual navigation for a mobile robot
- Robot with stereo cameras
- A convolutional net is trained to emulate a human driver
- The network maps stereo images to steering angles

