A couple of days ago, I wrote an article showing you how to train your own Haar classifiers for use with OpenCV. Since I posted that article, I have received several emails asking what a Haar classifier is and why it’s needed for OpenCV. Apparently, I didn’t do a good job of explaining what those classifiers are or what they do. I guess I just took it for granted that if you’re working with OpenCV, you already know this. But that wasn’t the case, especially for readers who have begun playing with OpenCV only because of the articles I’ve posted on this site. So, I want to take a minute to explain what a Haar cascade is and what it does, in hopes that OpenCV developers will understand why these classifiers are so important.
Haar cascades are classifiers, stored as XML files, that tell the computer what an object “looks” like. For example, the “haarcascade_frontalface_alt2.xml” file that comes with OpenCV tells the computer what a face “looks” like from the front. Since a computer doesn’t “see” objects the way humans do, it relies on numbers instead: patterns of light and dark regions and where they sit relative to each other. These numbers are generated by showing OpenCV many sample images of the object, taken from several different angles and in different lighting. As each sample image is presented to OpenCV, the Haar trainer measures the contrast between regions of the image and compares the results across all of the samples until it has a good “guess” at what the object “looks” like.
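To make “measuring contrast” a little more concrete: a Haar-like feature is the difference between the summed brightness of adjacent rectangles in an image. The sketch below is not OpenCV’s implementation, just a minimal illustration in plain C#. The names HaarSketch, BuildIntegral, RectSum and TwoRectFeature are made up for this example.

```csharp
using System;

public static class HaarSketch
{
    // Integral image: ii[y, x] holds the sum of every pixel above and to the
    // left of (x, y). The extra zero row and column avoid bounds checks.
    public static long[,] BuildIntegral(byte[,] gray)
    {
        int h = gray.GetLength(0), w = gray.GetLength(1);
        var ii = new long[h + 1, w + 1];
        for (int y = 0; y < h; y++)
            for (int x = 0; x < w; x++)
                ii[y + 1, x + 1] = gray[y, x] + ii[y, x + 1] + ii[y + 1, x] - ii[y, x];
        return ii;
    }

    // Sum of the pixels inside the rectangle with top-left corner (x, y).
    public static long RectSum(long[,] ii, int x, int y, int w, int h)
    {
        return ii[y + h, x + w] - ii[y, x + w] - ii[y + h, x] + ii[y, x];
    }

    // Two-rectangle feature: brightness of the left half minus the right half.
    public static long TwoRectFeature(long[,] ii, int x, int y, int w, int h)
    {
        int half = w / 2;
        return RectSum(ii, x, y, half, h) - RectSum(ii, x + half, y, half, h);
    }

    public static void Main()
    {
        // A tiny 4x2 patch: bright on the left, dark on the right.
        var patch = new byte[,] { { 200, 200, 10, 10 }, { 200, 200, 10, 10 } };
        long[,] ii = BuildIntegral(patch);
        Console.WriteLine(TwoRectFeature(ii, 0, 0, 4, 2)); // prints 760
    }
}
```

A large value means strong contrast between the two halves, and a value near zero means the patch is roughly uniform. A trained cascade combines many such measurements, each with its own threshold.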
To help improve accuracy, OpenCV needs input from a human to tell it where the object of interest is located within each of the sample images. That’s where the cropping app comes into play. It has a human draw a box around the region of interest (ROI) and then writes the coordinates of that ROI to a text file. This file, along with the images, is then passed through the haartraining.exe app, where OpenCV analyzes the input and outputs your haarcascade.xml file.
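As a rough illustration of what a line in such a text file can look like, here is a sketch that assumes the layout OpenCV’s sample-creation tool reads for positive images: the image path, the number of objects, then x, y, width and height for each box. Whether the cropping app writes exactly this layout is an assumption, and the names Roi, AnnotationSketch and FormatLine are made up for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Text;

public struct Roi
{
    public int X, Y, Width, Height;
    public Roi(int x, int y, int width, int height)
    {
        X = x; Y = y; Width = width; Height = height;
    }
}

public static class AnnotationSketch
{
    // Builds one line: "<path> <count> <x> <y> <w> <h> [<x> <y> <w> <h> ...]"
    public static string FormatLine(string imagePath, IList<Roi> rois)
    {
        var sb = new StringBuilder();
        sb.Append(imagePath).Append(' ');
        sb.Append(rois.Count.ToString(CultureInfo.InvariantCulture));
        foreach (Roi r in rois)
        {
            sb.AppendFormat(CultureInfo.InvariantCulture,
                " {0} {1} {2} {3}", r.X, r.Y, r.Width, r.Height);
        }
        return sb.ToString();
    }

    public static void Main()
    {
        // Hypothetical image path and box, for illustration only.
        var boxes = new List<Roi> { new Roi(140, 100, 45, 45) };
        Console.WriteLine(FormatLine("positives/img001.jpg", boxes));
        // prints: positives/img001.jpg 1 140 100 45 45
    }
}
```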
The trainer requires 2 image sets. The first set is your “positive” set. Every image in this group contains the object you want OpenCV to detect. The second set is your “negative” set. No image in this group contains the object you want OpenCV to detect. It’s recommended that you use at least 1,000 to 2,000 images in each set. Generally, the more images you train your classifier on, the higher your detection accuracy will be. However, adding more images to your training sets also makes the trainer take longer to complete. As mentioned in the Haar trainer article, it took about 5 days for the training to complete when working with 2,000 images in both the positive and negative image sets. Gathering all of your training images can also take a while. But, I’m in the process of creating an app that will greatly speed up this process.
So, if you want OpenCV to detect your own objects, you will need to go through the whole Haar classifier training process and build a Haar cascade XML file that contains the information about your object. Once you learn how to do this, the possibilities of what OpenCV can detect are endless. For example, a few months ago, I helped one of my readers with an automatic pet door that used OpenCV. The idea was to build a doggy-door that would open only for his dog. To do that, he recorded several videos of his dog from different angles (side view mostly) and in different lighting. Then, he placed a sticker just above where the doggy-door was installed. With a little bit of work, he taught his dog to understand that every time he pressed the sticker the door would open. Now, he could’ve easily put a button there that opened the door. But, this would’ve also opened the door to a lot of security issues. Any burglar could gain access to his house by simply pressing the button and reaching through the doggy-door to unlock the deadbolt above.
Now, I’m sure you’re probably asking why he added the sticker if it didn’t really do anything. Well, the purpose behind the sticker was to get the dog to pose the same way every time. Just to the left of the door was a floor-to-ceiling window where he positioned his camera to take snapshots from a side perspective. To get the best performance out of OpenCV, he needed the dog to approach the door in the same pose every time. So, when the dog sniffed the sticker (or pushed it with his nose, as he thought he was doing), the dog would be in the perfect position for the camera to see and OpenCV could work its magic. Once OpenCV recognized that the dog in the video feed was in fact the dog it had been trained to watch for, his computer sent a signal to an Arduino which was hooked to a servo that opened the door. Since OpenCV never knew what his dog “looked” like before, he had to teach it, and he did that by training his very own Haar classifier.
A while back, I also helped a reader create an app that tracked product placement within television shows and commercials. To do that, we created Haar classifiers for several different products. Then, we positioned a camera in front of a TV and waited for the products to appear. Once a specific product appeared either in a show or commercial, the image was saved to the filesystem and was named using the timestamp from when the object was detected.
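The file naming part of that is easy to sketch. The exact format we used isn’t shown here, so the timestamp pattern, folder, product label and the names SnapshotNaming and BuildPath below are assumptions made up for the example.

```csharp
using System;
using System.Globalization;
using System.IO;

public static class SnapshotNaming
{
    // Builds a file path such as "<folder>/<product>_<yyyyMMdd_HHmmss_fff>.jpg".
    // Milliseconds are included so two detections in the same second
    // don't overwrite each other.
    public static string BuildPath(string folder, string product, DateTime detectedAt)
    {
        string stamp = detectedAt.ToString("yyyyMMdd_HHmmss_fff", CultureInfo.InvariantCulture);
        return Path.Combine(folder, product + "_" + stamp + ".jpg");
    }

    public static void Main()
    {
        // In a real app you would pass DateTime.Now at the moment of detection.
        var when = new DateTime(2012, 5, 14, 20, 31, 7, 250);
        Console.WriteLine(BuildPath("captures", "facebook", when));
    }
}
```

Putting the product name in front of the timestamp keeps the saved images for each product grouped together when the folder is sorted by name.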
To test the idea for this application, we began by creating a cascade for the Facebook logo. Since pretty much every commercial you see on TV these days includes the “follow us on Twitter or friend us on Facebook” messages and logos, we thought this would be a great place to start. For this test, the trainer only ran for about a day and a half, but the results were amazing! As soon as we placed a camera in front of the TV, OpenCV almost instantly began detecting the Facebook logo in commercials and the images started rolling in. After we did the proof-of-concept, it was only a matter of creating cascades for the other things the reader wanted to detect.
As you can see, there are all kinds of cool things you can do once you know how to detect your own objects using OpenCV. So, if you have an interest in using OpenCV to detect your own objects, be sure to check out my article “How to Train OpenCV Haar Classifiers”. The app I provide in that article is written in C# and depends on OpenCvSharp, which is included in the bin\Debug folder. When time permits, I plan on adding a few more articles that will teach you some other parameters you can use to customize the way your trainer works. Until then, HAPPY CODING!
10 Responses to Detect Your Own Objects with OpenCV and C#

Mr. Lucus =) how are you gentleman, i hope everything is going great with you =), sorry for being late in replying but i was getting through long weeks of exams, end of term exams =) however, i have seen your magnificent program about hand detect, its amazing and full of functions, helped me so much, thank you. as ya know i’m working on a hand gesture recognition project, so i had a thought about making a region of interest box that appears on my hand once i expose it to the camera, then a window of the hand only appears to identify the hand movements including the gestures and postures of the hand, meanwhile the classifier classifies the gesture from the window and then displays the letter that gesture indicates from the American sign language. we can now just concentrate on the region of interest and how can i open it up on the hand when it appears on the screen, thank you in advance mr lucus i know that i’m kinda disturbing you =))
You’re not disturbing me. That’s what I’m here for. I created this site to help others learn the same things that I’ve learned over the years. It’s people like you that will be creating our technologies of tomorrow. I’m just passing the torch.
Once you’ve detected the hand, it’s very easy to display that instead of the original image. To do that, you will need to detect the hands like you normally do. Once you’ve located your hands, you’ll need to set the ROI of your image to be the size and location of the rectangle that contains the hand you want to track. Here’s a quick code snippet of how to do all of that.
// detect your hands
CvSeq<CvAvgComp> hands = ...
// set the ROI of "img" to the
// rectangle that contains your hand
Cv.SetImageROI(img, hands[0].Value.Rect);
// show the img with the new ROI
w.Image = img;
// reset the img ROI
Cv.ResetImageROI(img);
If you’ve detected more than 1 hand, you can switch between hands by changing the index of “hands[0]” to “hands[1]” and so on. You also need to make sure to reset the ROI of your img before repeating your while-loop. Otherwise, your app will focus on that particular area in your camera feed and will not recognize the hand once it moves outside of that area.
wo0w wo0w slow down hot shott i can’t keep up with you your kinda ages ahead of me =D i’v seen that CvSeq before and CvPoint too what does it do ?
CvPoint is a location on an image denoted by X and Y coordinates. For example, CvPoint(0, 0) would refer to the top left corner of an image. CvPoint(10, 10) would refer to 10 pixels from the left and 10 pixels down from the top. However, there are ways to set your origin (0,0) to be at the bottom left if desired. CvSeq is a collection; think of it as a sort of list of objects for OpenCV. In this case, if we were using the Cv.HaarDetectObjects method to locate our hands, our code would look something like:
CvSeq<CvAvgComp> hands = Cv.HaarDetectObjects(small_img, hand_cascade, storage, 2.5, 1, 0, new CvSize(30, 30));
… or something along those lines.
you mean than i should detect the hand first by Cv.HaarDetectObjects then setting the points of the region of interest rectangle by something like this Cv.Rectangle(img, Cv.Point(minloc.X, minloc.Y), Cv.Point(minloc.X + tpl.Width, minloc.Y + tpl.Height), CvColor.Red, 1, 0, 0) <– this line is from a program of your's template matching and i think this line sets and creates the rectangle of the region of interest, how can i detect the hand by Cv.HaarDetectObjects?
No. Since the hand will almost always be in a different pose and at a different angle, it’s better to detect the hand using my hand tracking code. Once you have the contours of your hand, you can get the outer most points and draw your ROI rectangle from those outer most points.
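Finding the outermost points boils down to taking the smallest and largest X and Y over all of the contour points. Here’s a minimal sketch in plain C#. It uses no OpenCvSharp types and assumes you have already copied the contour points into a list; Pt, Box and BoundingBox are names made up for the example.

```csharp
using System;
using System.Collections.Generic;

public struct Pt
{
    public int X, Y;
    public Pt(int x, int y) { X = x; Y = y; }
}

public struct Box
{
    public int X, Y, Width, Height;
    public Box(int x, int y, int width, int height)
    {
        X = x; Y = y; Width = width; Height = height;
    }
}

public static class BoundsSketch
{
    // Returns the smallest upright rectangle that contains every contour point.
    public static Box BoundingBox(IEnumerable<Pt> contour)
    {
        int minX = int.MaxValue, minY = int.MaxValue;
        int maxX = int.MinValue, maxY = int.MinValue;
        bool any = false;
        foreach (Pt p in contour)
        {
            any = true;
            if (p.X < minX) minX = p.X;
            if (p.Y < minY) minY = p.Y;
            if (p.X > maxX) maxX = p.X;
            if (p.Y > maxY) maxY = p.Y;
        }
        if (!any) throw new ArgumentException("contour has no points");
        // +1 because the leftmost and rightmost pixels are both inside the box.
        return new Box(minX, minY, maxX - minX + 1, maxY - minY + 1);
    }

    public static void Main()
    {
        var contour = new List<Pt> { new Pt(40, 30), new Pt(90, 25), new Pt(120, 80), new Pt(60, 110) };
        Box b = BoundingBox(contour);
        Console.WriteLine("{0} {1} {2} {3}", b.X, b.Y, b.Width, b.Height); // prints 40 25 81 86
    }
}
```

The resulting rectangle plays the same role as hands[0].Value.Rect in the earlier snippet: it is the area you would set as the ROI before showing the image.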
CvSeq contours = FindContours(imgFlesh, storage); <— thats how i can get the contours of the hand to track the hand ?, how can i get the most outer points and draw the rectangle from those outer most points ?
http://www.youtube.com/watch?v=bUrjFGMfwas <– i need to do something exactly like this (Y) , I've seen your program HandDetect and its amazing i can tell, but i can't understand how to use it on a video feed to detect the contours then get the outer most points to draw the ROI rectangle, thanks for being here always, you are my savior !!
i need to detect shape present in an x-ray image for my project. by detecting shape i mean find out whether there is an arm, skull or chest present in the x-ray image. Can you please suggest me some approach or way i can do this. I thought of edge based template matching i.e. detecting the edges of object in template and matching them in current image… but i am not sure about it…
please i request you to suggest the technique
There are 2 ways you can do that.
The first way is to create your own Haar classifiers as shown in this article: http://www.prodigyproductionsllc.com/articles/programming/how-to-train-opencv-haar-classifiers/. You will need to create a classifier for each object you want to detect. Then, you can use the same code from my head tracking article (http://www.prodigyproductionsllc.com/articles/programming/opencv-head-tracking-with-c/) to actually detect the object from your trained classifiers. However, if using the code from the head tracking article, you will have to add code to loop through each of the classifiers you want to search with.
The second way is to use blob detection as demonstrated in my hand tracking demo: http://www.prodigyproductionsllc.com/downloads/HandDetect.zip.
Since you will be working with x-rays, you won’t need to worry so much about doing detection in real-time like you would when working with live video feeds. Having a program that takes a second or two to detect your objects should work. So, I would probably recommend using the first technique. The hardest part about using that technique though is that creating each of your classifiers will take some time. But, in the end, it should be the most accurate.
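If you go with the first technique, the loop over classifiers could be sketched roughly like this. The actual detection is left as a placeholder delegate, since it depends on the code from the head tracking article. The cascade file names and the names MultiCascadeSketch and FindObjects are hypothetical.

```csharp
using System;
using System.Collections.Generic;

public static class MultiCascadeSketch
{
    // 'detect' stands in for your real detection routine: given a cascade file
    // path, it should load that cascade, run it on the current image, and
    // return true if at least one object was found.
    public static List<string> FindObjects(
        IDictionary<string, string> cascades, Func<string, bool> detect)
    {
        var found = new List<string>();
        foreach (KeyValuePair<string, string> entry in cascades)
        {
            if (detect(entry.Value))
                found.Add(entry.Key);
        }
        return found;
    }

    public static void Main()
    {
        // One trained cascade per body part (file names are made up).
        var cascades = new Dictionary<string, string>
        {
            { "arm", "cascades/arm.xml" },
            { "skull", "cascades/skull.xml" },
            { "chest", "cascades/chest.xml" }
        };

        // Fake detector for the demo: pretends only the skull cascade matches.
        Func<string, bool> fakeDetect = path => path.Contains("skull");

        Console.WriteLine(string.Join(", ", FindObjects(cascades, fakeDetect))); // prints skull
    }
}
```

Keeping the label next to each cascade path means the result tells you which object was found, not just that something was detected.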