Real-Time Object Detection with Python and YOLOv5

By Charles LAZIOSI

Object detection has wide applications: autonomous vehicles, retail analytics, surveillance, and even wildlife monitoring. In this article, we'll walk through how to use YOLOv5 (You Only Look Once v5), one of the most popular object detection models, to detect objects in real time using a webcam, all in Python.

This is a practical guide that includes installation, setup, and a complete script to get you up and running.

Prerequisites

  1. Python 3.8+

  2. A webcam

  3. Git & pip

Install the required packages:

pip install torch torchvision opencv-python matplotlib
git clone https://github.com/ultralytics/yolov5  # Clone YOLOv5 repo
cd yolov5
pip install -r requirements.txt

How It Works

YOLOv5 is a family of models pre-trained on the COCO dataset (80 classes). We'll use yolov5s.pt, the small and fast variant, to process webcam frames and draw bounding boxes around detected objects in real time.

What is Object Detection, Really?

Object detection is more than just classification (what is in the image?): it’s classification plus localization (what is in the image, and where?). It returns:

  • Bounding boxes (e.g., x, y, width, height, or corner coordinates)

  • Class labels (like "car", "person")

  • Confidence scores (e.g., 0.87)

This turns a passive system into a perception engine for automation, tracking, and decision-making.
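To make those three outputs concrete, here is a minimal sketch of how a single detection could be represented in Python. The Detection class and its field names are hypothetical, chosen for illustration, and the sketch assumes (x, y) is the top-left corner of the box; some formats use the box center instead.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    """One detected object: a box, a class label, and a confidence score."""
    x: float          # left edge of the box, in pixels (assumed top-left origin)
    y: float          # top edge of the box, in pixels
    width: float
    height: float
    label: str
    confidence: float

    def to_corners(self):
        """Return the box as (x_min, y_min, x_max, y_max)."""
        return (self.x, self.y, self.x + self.width, self.y + self.height)


det = Detection(x=40, y=60, width=100, height=200, label="person", confidence=0.87)
print(det.label, det.confidence, det.to_corners())
```

The corner form returned by to_corners() is handy because drawing functions usually take two opposite corners of a rectangle.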

Inside YOLOv5

YOLO stands for You Only Look Once. Instead of using separate region proposal and classification steps (like R-CNN), it performs detection in a single forward pass through a neural network.

Architecture Summary:

  • Backbone: CSPNet (Cross Stage Partial Network) for feature extraction.

  • Neck: PANet-like layer for fusing features at different scales.

  • Head: Predicts bounding boxes, class probabilities, and objectness scores.

Output:

For each frame, YOLOv5 produces one row per detection. With the PyTorch Hub model used in the script below, results.xyxy[0] is a tensor of shape [num_detections, 6]:

[x_min, y_min, x_max, y_max, confidence, class_id]

Each of these is then used to draw the boxes and labels.
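As a rough sketch of what "using" those rows looks like, the snippet below turns them into labeled dictionaries. It works on plain Python lists so it runs without PyTorch; with the hub model, rows in this layout can be obtained with results.xyxy[0].tolist(). The parse_detections helper and the small NAMES mapping are assumptions made for illustration (the real model carries the full 80-class COCO list).

```python
# Hypothetical subset of the COCO class-id mapping, for illustration only.
NAMES = {0: "person", 41: "cup", 63: "laptop"}


def parse_detections(rows, names):
    """Convert [x_min, y_min, x_max, y_max, confidence, class_id] rows to dicts."""
    parsed = []
    for x_min, y_min, x_max, y_max, confidence, class_id in rows:
        parsed.append({
            # Pixel coordinates are truncated to integers for drawing.
            "box": (int(x_min), int(y_min), int(x_max), int(y_max)),
            "confidence": round(float(confidence), 2),
            # Fall back to a generic name for ids missing from the mapping.
            "label": names.get(int(class_id), f"class_{int(class_id)}"),
        })
    return parsed


rows = [
    [34.0, 50.5, 310.2, 470.9, 0.91, 0.0],
    [400.0, 220.0, 520.0, 330.0, 0.64, 41.0],
]
for det in parse_detections(rows, NAMES):
    print(det)
```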

Why YOLOv5?

YOLOv5, developed by Ultralytics, is written in pure PyTorch, making it easier to extend than YOLOv4 (which is built on Darknet, a framework written in C and CUDA). Key features include:

  • Real-time performance: Fast enough for webcam, drones, robotics.

  • Easy fine-tuning: Custom training with your data in hours, not days.

  • Model variants:

    • yolov5n (nano): super fast, less accurate

    • yolov5s (small): good balance

    • yolov5m / l / x: larger, more accurate, slower

Python Script: Real-Time Detection with Webcam

import torch
import cv2
import time

# Load pre-trained YOLOv5s model
model = torch.hub.load('ultralytics/yolov5', 'yolov5s', trust_repo=True)

# Initialize webcam (0 for default camera)
cap = cv2.VideoCapture(0)

# Set resolution (optional)
cap.set(cv2.CAP_PROP_FRAME_WIDTH, 1280)
cap.set(cv2.CAP_PROP_FRAME_HEIGHT, 720)

print("Starting webcam detection... Press 'q' to quit.")
while cap.isOpened():
    ret, frame = cap.read()
    if not ret:
        break

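    # Inference (OpenCV delivers BGR frames; the hub model expects RGB arrays)
    rgb_frame = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
    results = model(rgb_frame)

    # Render results, then convert back to BGR for display with OpenCV
    annotated_frame = cv2.cvtColor(results.render()[0], cv2.COLOR_RGB2BGR)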

    # Display the frame
    cv2.imshow('YOLOv5 Webcam Detection', annotated_frame)

    # Exit on pressing 'q'
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

# Release resources
cap.release()
cv2.destroyAllWindows()
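If you want to know how close to real time the loop actually runs, a small frame-rate counter helps. The sketch below is one way to do it; the FPSCounter class is a hypothetical helper, not part of YOLOv5 or OpenCV, and the timestamps are simulated so the example runs without a webcam.

```python
import time
from collections import deque


class FPSCounter:
    """Rolling frames-per-second estimate over the last `window` frames."""

    def __init__(self, window=30):
        self.timestamps = deque(maxlen=window)

    def tick(self, now=None):
        """Record one frame; return the current FPS estimate (0.0 until two frames)."""
        self.timestamps.append(time.perf_counter() if now is None else now)
        if len(self.timestamps) < 2:
            return 0.0
        elapsed = self.timestamps[-1] - self.timestamps[0]
        return (len(self.timestamps) - 1) / elapsed if elapsed > 0 else 0.0


# Simulated timestamps 0.05 s apart, i.e. 20 frames per second.
counter = FPSCounter(window=10)
for i in range(5):
    fps = counter.tick(now=i * 0.05)
print(f"{fps:.1f} FPS")
```

In the webcam loop, you would call counter.tick() once per frame with no argument and, for example, draw the value on the frame with cv2.putText before displaying it.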

What You’ll See

As soon as the script runs, your webcam feed will pop up with live bounding boxes labeled with detected object names and confidence scores, such as person, laptop, or cup.

Tips & Customizations

  • Improve speed: Use yolov5n (nano model) for faster but less accurate results.

  • Train on your own data: YOLOv5 lets you fine-tune on a custom dataset with just a few lines of code.

  • Save results: Use results.save() to save annotated frames.

Use Cases in Real Life

  • Smart doorbell: Detect people, pets, or packages.

  • Inventory system: Detect product presence in warehouses.

  • Fitness app: Count reps by detecting body poses or gym equipment.

Performance Tradeoffs

Model       Size (MB)   Speed (ms)   mAP@0.5:0.95   Best For
yolov5n     ~3.9        ~6.3         ~28.0          Mobile, Raspberry Pi
yolov5s     ~14.2       ~6.4         ~37.4          Real-time on CPU
yolov5m     ~40.0       ~10.0        ~45.4          Balanced inference
yolov5l/x   ~80-170     ~15-20       ~50+           High-end GPUs

Tip: Use yolov5s for most real-time applications unless you’re deploying to edge devices, where yolov5n is a better fit.
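The tradeoff can also be expressed as a small selection rule: given a per-frame latency budget, pick the most accurate variant that fits. The sketch below uses the approximate figures from the table above for the n, s, and m variants; the pick_variant and max_fps helpers are hypothetical, and real latency depends heavily on your hardware, so treat the numbers as placeholders to replace with your own measurements.

```python
# Approximate per-frame latencies (ms) and accuracy scores copied from the table above.
VARIANTS = {
    "yolov5n": {"latency_ms": 6.3, "accuracy": 28.0},
    "yolov5s": {"latency_ms": 6.4, "accuracy": 37.4},
    "yolov5m": {"latency_ms": 10.0, "accuracy": 45.4},
}


def max_fps(latency_ms):
    """Upper bound on frames per second if inference is the only cost."""
    return 1000.0 / latency_ms


def pick_variant(budget_ms, variants=VARIANTS):
    """Return the most accurate variant whose latency fits the budget, or None."""
    fitting = {n: v for n, v in variants.items() if v["latency_ms"] <= budget_ms}
    if not fitting:
        return None
    return max(fitting, key=lambda n: fitting[n]["accuracy"])


print(pick_variant(8.0))
print(round(max_fps(10.0)))
```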

Custom Training (DIY Detection)

Want to detect custom objects like your brand logo, insects, tools, or machinery parts?

Steps:

  1. Label images using Roboflow, LabelImg, or CVAT.

  2. Organize your dataset:

/dataset
  /images
    /train
    /val
  /labels
    /train
    /val

  3. Define a config YAML:

train: ./images/train
val: ./images/val
nc: 2
names: ['helmet', 'vest']

  4. Fine-tune the model:

python train.py --img 640 --batch 16 --epochs 50 --data your_config.yaml --weights yolov5s.pt

This gives you a model ready for industrial inspection, retail analysis, security, etc.
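Labeling tools normally export the label files for you, but it helps to know what is inside them. YOLO label files hold one line per object in the form class_id x_center y_center width height, with the box values normalized to the 0-1 range. The sketch below converts a pixel-space box into such a line; the to_yolo_label helper is hypothetical, and the class id 0 assumes the 'helmet' / 'vest' ordering from the config above.

```python
def to_yolo_label(class_id, x_min, y_min, x_max, y_max, img_w, img_h):
    """Format one pixel-space box as a YOLO label line.

    The output is: class_id x_center y_center width height,
    with all four box values divided by the image size.
    """
    x_center = (x_min + x_max) / 2 / img_w
    y_center = (y_min + y_max) / 2 / img_h
    width = (x_max - x_min) / img_w
    height = (y_max - y_min) / img_h
    return f"{class_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"


# A 'helmet' (class 0) box on a 640x480 image.
line = to_yolo_label(0, 160, 120, 480, 360, img_w=640, img_h=480)
print(line)
```

Each image in /images/train gets a text file with the same base name in /labels/train, containing one such line per labeled object.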

Real-Time Use Cases & Architecture

Here’s how you’d plug this into larger systems:

Factory Automation

  • Camera → YOLOv5 → Logic Controller

  • Auto-stops line if defect detected

Security System

  • CCTV stream → Object detection (e.g., “person”, “weapon”)

  • Trigger alarm, notify personnel

Robotics

  • Drone stream → YOLOv5 → Navigate to detected objects

Infrastructure Overview:

[Webcam/Camera]
      |
   [Frame Capture]
      |
   [YOLOv5 Inference]
      |
[Bounding Boxes + Labels]
      |
[Trigger System / UI / Store DB]

At roughly 30–60 ms per frame (about 16–33 frames per second), it’s fast enough for real-time alerts.
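The last box in that pipeline, the trigger, is where a little logic goes a long way. A single noisy frame should not set off an alarm, so one common approach is to require the object to appear in several consecutive frames. Here is a minimal sketch of that idea; the AlertTrigger class and its parameters are hypothetical, and each frame is assumed to arrive as a list of (label, confidence) pairs.

```python
class AlertTrigger:
    """Fire once a watched label has been seen in N consecutive frames."""

    def __init__(self, watch_labels, min_confidence=0.5, frames_required=3):
        self.watch_labels = set(watch_labels)
        self.min_confidence = min_confidence
        self.frames_required = frames_required
        self.streak = 0  # consecutive frames containing a watched label

    def update(self, detections):
        """detections: list of (label, confidence) for one frame. Returns True to alert."""
        hit = any(
            label in self.watch_labels and conf >= self.min_confidence
            for label, conf in detections
        )
        self.streak = self.streak + 1 if hit else 0
        # Alert exactly once per streak, on the frame that completes it.
        return self.streak == self.frames_required


trigger = AlertTrigger({"person"}, frames_required=3)
frames = [
    [("person", 0.8)],
    [("person", 0.9), ("cup", 0.6)],
    [("cup", 0.7)],            # streak resets here
    [("person", 0.7)],
    [("person", 0.75)],
    [("person", 0.9)],         # third consecutive hit, so this frame alerts
    [("person", 0.9)],         # already alerted for this streak
]
alerts = [trigger.update(f) for f in frames]
print(alerts)
```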

Considerations for Production

  • Latency: GPU preferred; CPU possible with yolov5n.

  • False positives: Apply confidence threshold (e.g., 0.5).

  • Privacy: Mask identities, store only metadata.

  • Scaling: Use model inference servers (e.g., TorchServe, or a server built on ONNX Runtime).

  • Edge devices: Convert to TFLite, CoreML, or ONNX for deployment.
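Two of these points, confidence thresholds and storing only metadata, are easy to combine in code. The sketch below filters out low-confidence detections and builds a JSON-serializable record that contains no image data. The to_metadata helper and the record layout are assumptions for illustration; the hub model also exposes a conf attribute (for example, model.conf = 0.5) if you prefer to filter inside the model.

```python
import json


def to_metadata(detections, timestamp, min_confidence=0.5):
    """Keep confident detections and return a JSON-serializable record without pixels."""
    kept = [d for d in detections if d["confidence"] >= min_confidence]
    return {
        "timestamp": timestamp,
        "count": len(kept),
        "objects": [
            {"label": d["label"], "confidence": d["confidence"], "box": list(d["box"])}
            for d in kept
        ],
    }


detections = [
    {"label": "person", "confidence": 0.91, "box": (34, 50, 310, 470)},
    {"label": "cup", "confidence": 0.32, "box": (400, 220, 520, 330)},
]
record = to_metadata(detections, timestamp="2024-01-01T12:00:00Z")
print(json.dumps(record))
```

Only the record is written to storage; the frame itself can be discarded as soon as inference is done.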

What’s Next?

  • Try YOLOv8, which has even more performance improvements.

  • Use OpenCV DNN module if you want to avoid PyTorch altogether.

  • Integrate with Flask or FastAPI to serve detection over REST.

  • Log results with Elasticsearch, send alerts with Twilio, or visualize with Streamlit.

Conclusion

Real-time object detection with Python and YOLOv5 brings cutting-edge AI to your fingertips. Whether for hobby, research, or enterprise-grade applications, it combines speed, accuracy, and ease-of-use like few tools out there.
