Computer Vision
object detection
Haar Cascade
https://medium.com/data-science/face-detection-with-haar-cascade-727f68dafd08
Haar Feature

then do sliding window

Integral Image(Summed Area Table)
like prefix sum but 2D to speed up calculate sum of rectangle area

Cascade Classifier

AdaBoost
find the feature which best separate positive and negative samples

HOG + SVM
https://medium.com/lifes-a-struggle/hog-svm-c2fb01304c0
Histogram of Oriented Gradients (HOG)
skip pool and normalization part
Support Vector Machine (SVM)
- after Pooling and Normalization HOG feature we can see train SVM by those feature
- SVM try to find the hyperplane that best separate positive and negative samples based on those feature
R-CNN
two stage object detection : find object bbox then classify object in bbox
- region proposal :selective search
- CNN (need to train) :extract feature
- bbox regression (need to train): by input bbox output adjust bbox
- SVM classification (need to train): classify object in bbox by extracted feature
- optional:
- NMS(non-maximum suppression): remove duplicate bbox by IoU(Intersection over Union) and confidence score
selective search
bbox regression
train function to adjust bbox target
NMS(non-maximum suppression)
Fast R-CNN
- Input: entire image + object proposals(selective search)
- Feature extraction: CNN on the whole image once
- ROI pooling: extract fixed-size feature map for each proposal
- Fully connected layers: classify (by softmax) and predict bbox offsets
- Advantages over R-CNN:
- Single CNN pass for entire image → faster
- End-to-end training for classification(softmax) + bbox regression
ROI Pooling
unfixsized feature map to fixed size by max pooling

Faster R-CNN
- Replaces selective search with Region Proposal Network (RPN):
- Fully convolutional network that predicts objectness scores and bbox adjustments
- Shares convolutional features with Fast R-CNN
- Pipeline:
- CNN feature map from entire image
- RPN proposes regions (anchors at multiple scales/aspect ratios)
- ROI pooling → fixed-size features for each proposal
- Classification + bbox regression (Fast R-CNN style)
- Advantages:
- Nearly real-time detection
- End-to-end trainable
- Eliminates slow selective search
YOLO
- Single-stage detector: divides image into grid cells, each predicts bounding boxes and class probabilities
- Fast inference speed, suitable for real-time applications
- Less accurate for small objects compared to two-stage detectors
- Loss function combines localization, confidence, and classification losses