Real-Time Object Detection with Python and YOLOv5

Object detection has wide applications: autonomous vehicles, retail analytics, surveillance, and even wildlife monitoring. In this article, we'll walk through how to use YOLOv5 (You Only Look Once v5), one of the most popular object detection models, to detect objects in real time using a webcam, all in Python.
This is a practical guide that includes installation, setup, and a complete script to get you up and running.
Prerequisites
Python 3.8+
A webcam
Git & pip
Install the required packages:
pip install torch torchvision opencv-python matplotlib
git clone https://github.com/ultralytics/yolov5 # Clone YOLOv5 repo
cd yolov5
pip install -r requirements.txt
How It Works
YOLOv5 is a family of models pre-trained on the COCO dataset (80 classes). We'll use yolov5s.pt, the small and fast variant, to process webcam frames and draw bounding boxes around detected objects in real time.
What is Object Detection, Really?
Object detection is more than just classification (what is in the image?) — it’s classification + localization (what and where is it in the image?). It returns:
Bounding boxes (x, y, width, height)
Class labels (like "car", "person")
Confidence scores (e.g., 0.87)
This turns a passive system into a perception engine for automation, tracking, and decision-making.
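To make the three outputs concrete, here is a minimal sketch of how a single detection could be represented in Python. The Detection class and its field names are hypothetical, chosen for illustration; they are not part of YOLOv5.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Detection:
    """One detected object: where it is, what it is, and how sure the model is."""
    x: float           # left edge of the box, in pixels
    y: float           # top edge of the box, in pixels
    width: float       # box width, in pixels
    height: float      # box height, in pixels
    label: str         # class label, e.g. "car"
    confidence: float  # score between 0.0 and 1.0

    def to_corners(self) -> Tuple[float, float, float, float]:
        """Return the box as (x_min, y_min, x_max, y_max)."""
        return (self.x, self.y, self.x + self.width, self.y + self.height)


det = Detection(x=40, y=60, width=200, height=120, label="car", confidence=0.87)
print(det.label, det.confidence, det.to_corners())
```

The corner form returned by to_corners() is the one most drawing functions expect, which is why both representations show up in practice.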
Inside YOLOv5
YOLO stands for You Only Look Once. Instead of using separate region proposal and classification steps (like R-CNN), it performs detection in a single forward pass through a neural network.
Architecture Summary:
Backbone: CSPNet (Cross Stage Partial Network) for feature extraction.
Neck: PANet-like layer for fusing features at different scales.
Head: Predicts bounding boxes, class probabilities, and objectness scores.
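As a rough illustration of how the head's outputs combine, the sketch below assumes the final confidence for a candidate box is the objectness score multiplied by the highest class probability, which is a common way YOLO-style predictions are post-processed. The function name and the sample numbers are made up for the example.

```python
from typing import List, Tuple


def best_class_score(objectness: float, class_probs: List[float]) -> Tuple[int, float]:
    """Return (class_id, score), where score = objectness * highest class probability."""
    class_id = max(range(len(class_probs)), key=lambda i: class_probs[i])
    return class_id, objectness * class_probs[class_id]


# Hypothetical head output for one candidate box over three classes
class_id, score = best_class_score(0.9, [0.1, 0.8, 0.1])
print(class_id, round(score, 2))
```

A box with high objectness but no dominant class, or the reverse, ends up with a low combined score and is usually dropped by the confidence threshold.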
Output:
For each frame, YOLOv5 returns a tensor of shape [num_detections, 6]:
[x_min, y_min, x_max, y_max, confidence, class_id]
Each of these is then used to draw the boxes and labels.
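The sketch below shows one way those rows could be turned into readable records. It works on plain Python lists so it runs without PyTorch; with the hub model you would typically obtain the rows from results.xyxy[0].tolist() and the names from model.names. The parse_detections helper and the sample rows are illustrative assumptions.

```python
from typing import Dict, List


def parse_detections(rows: List[List[float]], names: Dict[int, str]) -> List[dict]:
    """Convert [x_min, y_min, x_max, y_max, confidence, class_id] rows into dicts."""
    parsed = []
    for x_min, y_min, x_max, y_max, conf, class_id in rows:
        class_id = int(class_id)
        parsed.append({
            "label": names.get(class_id, str(class_id)),  # fall back to the raw id
            "confidence": round(conf, 2),
            "box": (int(x_min), int(y_min), int(x_max), int(y_max)),
        })
    return parsed


# Made-up rows in the layout described above
rows = [
    [34.0, 50.0, 410.0, 470.0, 0.91, 0.0],
    [300.0, 220.0, 620.0, 460.0, 0.42, 63.0],
]
detections = parse_detections(rows, names={0: "person", 63: "laptop"})
for d in detections:
    print(d["label"], d["confidence"], d["box"])
```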
Why YOLOv5?
YOLOv5, developed by Ultralytics, is written in pure PyTorch, making it easier to extend than YOLOv4 (which uses Darknet/C++). Key features include:
Real-time performance: Fast enough for webcam, drones, robotics.
Easy fine-tuning: Custom training with your data in hours, not days.
Model variants:
yolov5n (nano): super fast, less accurate
yolov5s (small): good balance
yolov5m/l/x: larger, more accurate, slower
Python Script: Real-Time Detection with Webcam
import cv2
import torch
# Load pre-trained YOLOv5s model
model = torch.hub.load('ultralytics/yolov5', 'yolov5s', trust_repo=True)
# Initialize webcam (0 for default camera)
cap = cv2.VideoCapture(0)
# Set resolution (optional)
cap.set(cv2.CAP_PROP_FRAME_WIDTH, 1280)
cap.set(cv2.CAP_PROP_FRAME_HEIGHT, 720)
print("Starting webcam detection... Press 'q' to quit.")
while cap.isOpened():
    ret, frame = cap.read()
    if not ret:
        break
    # OpenCV delivers BGR frames; the model expects RGB
    rgb_frame = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
    # Inference
    results = model(rgb_frame)
    # Render results, then convert back to BGR for display
    annotated_frame = cv2.cvtColor(results.render()[0], cv2.COLOR_RGB2BGR)
    # Display the frame
    cv2.imshow('YOLOv5 Webcam Detection', annotated_frame)
    # Exit on pressing 'q'
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break
# Release resources
cap.release()
cv2.destroyAllWindows()
What You’ll See
As soon as the script runs, your webcam feed will pop up with live bounding boxes labeled with detected object names and confidence scores, such as person, laptop, or cup.
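To check whether the feed really keeps up in real time, you can measure frames per second inside the loop. The sketch below is one possible approach: FpsMeter is a hypothetical helper (not part of YOLOv5 or OpenCV) that averages over the most recent frames.

```python
import time
from collections import deque
from typing import Optional


class FpsMeter:
    """Average frames per second over the most recent `window` frames."""

    def __init__(self, window: int = 30):
        self.timestamps = deque(maxlen=window)

    def tick(self, now: Optional[float] = None) -> float:
        """Record one frame and return the current FPS estimate (0.0 until two frames)."""
        self.timestamps.append(time.perf_counter() if now is None else now)
        if len(self.timestamps) < 2:
            return 0.0
        elapsed = self.timestamps[-1] - self.timestamps[0]
        return (len(self.timestamps) - 1) / elapsed if elapsed > 0 else 0.0


meter = FpsMeter(window=30)
# Inside the webcam loop, call once per frame:
#     fps = meter.tick()
#     cv2.putText(annotated_frame, f"{fps:.1f} FPS", (10, 30),
#                 cv2.FONT_HERSHEY_SIMPLEX, 1.0, (0, 255, 0), 2)
```

Averaging over a window smooths out the jitter you would see if you timed each frame on its own.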
Tips & Customizations
Improve speed: Use yolov5n (nano model) for faster but less accurate results.
Train on your own data: YOLOv5 lets you fine-tune on a custom dataset with just a few lines of code.
Save results: Use results.save() to save annotated frames.
Use Cases in Real Life
Smart doorbell: Detect people, pets, or packages.
Inventory system: Detect product presence in warehouses.
Fitness app: Count reps by detecting body poses or gym equipment.
Performance Tradeoffs
Model | Size (MB) | Speed (ms, GPU) | mAP@0.5:0.95 | Best For |
---|---|---|---|---|
yolov5n | ~3.9 | ~6.3 | ~28.0 | Mobile, Raspberry Pi |
yolov5s | ~14.2 | ~6.4 | ~37.4 | Real-time on CPU |
yolov5m | ~40.0 | ~10.0 | ~45.4 | Balanced inference |
yolov5l/x | ~80-170 | ~15-20 | ~50+ | High-end GPUs |
Tip: Use yolov5s for most real-time applications unless you’re deploying to edge devices, in which case yolov5n is a better fit.
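The tradeoff in the table can be expressed as a simple selection rule: pick the most accurate model whose latency fits your budget. The sketch below hard-codes the approximate figures from the table above; pick_model is a hypothetical helper, and the numbers should be replaced with measurements from your own hardware.

```python
from typing import Optional

# (name, approximate latency in ms, approximate mAP) copied from the table above
MODELS = [
    ("yolov5n", 6.3, 28.0),
    ("yolov5s", 6.4, 37.4),
    ("yolov5m", 10.0, 45.4),
]


def pick_model(latency_budget_ms: float) -> Optional[str]:
    """Return the most accurate model that fits the budget, or None if none does."""
    candidates = [m for m in MODELS if m[1] <= latency_budget_ms]
    if not candidates:
        return None
    return max(candidates, key=lambda m: m[2])[0]


print(pick_model(8.0))
print(pick_model(20.0))
```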
Custom Training (DIY Detection)
Want to detect custom objects like your brand logo, insects, tools, or machinery parts?
Steps:
Label images using Roboflow, LabelImg, or CVAT.
Organize your dataset:
/dataset
  /images
    /train
    /val
  /labels
    /train
    /val
Define a config YAML:
train: ./images/train
val: ./images/val
nc: 2
names: ['helmet', 'vest']
Fine-tune the model:
python train.py --img 640 --batch 16 --epochs 50 --data your_config.yaml --weights yolov5s.pt
This gives you a model ready for industrial inspection, retail analysis, security, etc.
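Labeling tools usually export YOLO-format label files for you, but it helps to know what they contain: one .txt file per image, one line per object, written as class_id x_center y_center width height with coordinates normalized to the 0 to 1 range. The sketch below converts a pixel-space box into such a line; to_yolo_label is a hypothetical helper and the sample box is made up.

```python
def to_yolo_label(class_id: int, x_min: float, y_min: float,
                  x_max: float, y_max: float, img_w: int, img_h: int) -> str:
    """Convert a pixel-space corner box into one YOLO label line (normalized)."""
    x_center = (x_min + x_max) / 2 / img_w
    y_center = (y_min + y_max) / 2 / img_h
    width = (x_max - x_min) / img_w
    height = (y_max - y_min) / img_h
    return f"{class_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"


# Class 0 ('helmet' in the config above) centered in a 640x480 image
line = to_yolo_label(0, 160, 120, 480, 360, img_w=640, img_h=480)
print(line)
```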
Real-Time Use Cases & Architecture
Here’s how you’d plug this into larger systems:
Factory Automation
Camera → YOLOv5 → Logic Controller
Auto-stops line if defect detected
Security System
CCTV stream → Object detection (e.g., “person”, “weapon”)
Trigger alarm, notify personnel
Robotics
Drone stream → YOLOv5 → Navigate to detected objects
Infrastructure Overview:
[Webcam/Camera]
|
[Frame Capture]
|
[YOLOv5 Inference]
|
[Bounding Boxes + Labels]
|
[Trigger System / UI / Store DB]
At roughly 30–60 ms per frame end to end (about 17 to 33 frames per second), the pipeline is fast enough for real-time alerts.
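The last stage of the pipeline, the trigger, often needs a little logic so that a single noisy frame does not set off an alarm. The sketch below is one possible approach, assuming an alert should fire once after the target label has appeared in several consecutive frames. ConsecutiveFrameTrigger is a hypothetical class, not part of YOLOv5.

```python
from typing import Iterable


class ConsecutiveFrameTrigger:
    """Fire once when a label has been seen in N consecutive frames."""

    def __init__(self, target_label: str, frames_required: int = 3):
        self.target_label = target_label
        self.frames_required = frames_required
        self.streak = 0

    def update(self, labels_in_frame: Iterable[str]) -> bool:
        """Feed the labels seen in one frame; return True when the alert should fire."""
        if self.target_label in labels_in_frame:
            self.streak += 1
        else:
            self.streak = 0  # any miss resets the count
        return self.streak == self.frames_required


trigger = ConsecutiveFrameTrigger("person", frames_required=3)
frames = [["person"], ["person", "cup"], [], ["person"], ["person"], ["person"], ["person"]]
fired = [trigger.update(labels) for labels in frames]
print(fired)
```

Because the check is an exact match on the streak length, the alert fires once per uninterrupted appearance and does not repeat on every following frame.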
Considerations for Production
Latency: GPU preferred; CPU possible with yolov5n.
False positives: Apply a confidence threshold (e.g., 0.5).
Privacy: Mask identities, store only metadata.
Scaling: Use a model inference runtime or server (e.g., ONNX Runtime, TorchServe).
Edge devices: Convert to TFLite, CoreML, or ONNX for deployment.
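Two of these considerations, thresholding and privacy, can be combined in a few lines. The sketch below drops low-confidence detections and keeps only per-label counts for each frame, so no pixels or box coordinates are stored. The frame_metadata function, its record layout, and the sample values are assumptions for illustration.

```python
import json
from collections import Counter
from typing import List, Tuple


def frame_metadata(detections: List[Tuple[str, float]], frame_index: int,
                   min_conf: float = 0.5) -> str:
    """Summarize one frame as a JSON string: label counts only, no image data."""
    kept = [label for label, conf in detections if conf >= min_conf]
    return json.dumps(
        {"frame": frame_index, "counts": dict(Counter(kept))},
        sort_keys=True,
    )


# Made-up (label, confidence) pairs for a single frame
record = frame_metadata([("person", 0.91), ("person", 0.48), ("cup", 0.77)], frame_index=12)
print(record)
```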
What’s Next?
Try YOLOv8, a newer Ultralytics model family that reports higher accuracy than YOLOv5 at comparable model sizes.
Use the OpenCV DNN module with an ONNX export of the model if you want to avoid PyTorch at inference time.
Integrate with Flask or FastAPI to serve detection over REST.
Log results with Elasticsearch, send alerts with Twilio, or visualize with Streamlit.
Conclusion
Real-time object detection with Python and YOLOv5 brings cutting-edge AI to your fingertips. Whether for hobby, research, or enterprise-grade applications, it combines speed, accuracy, and ease of use like few other tools.