Computer Vision

Intro

The Computer Vision group on the Computer Science subteam is responsible for choosing and training an object detection model that detects the components of the competition course from a live video feed. Our tasks consist of uploading data, labeling objects, and using the results to evaluate and then improve our model.

Annotation

Roboflow was used to create a custom dataset. We uploaded videos of the buoys in different environments and from a variety of angles. Each video is parsed into images at a frame rate we choose, and the resulting images are annotated in Roboflow by manually drawing bounding boxes around the objects we want to label (red, blue, green, and yellow buoys). These labeled images are used to train the model.

Model Architecture

YOLOv5 is a single-stage object detection model that comes in five sizes (nano, small, medium, large, and xlarge). We compared the performance of these versions, which trade off detection accuracy against training and inference time.

This past semester (fall 2022), we specifically tested the nano, small, and medium models, each of which had its own advantages and tradeoffs. Most notably, as the model gets larger, its potential accuracy improves, but its real-time detection speed slows, and training slows significantly. With a GPU, however, as we used through Google Colaboratory, training time is quite reasonable even for the larger models.
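Training on Colab follows YOLOv5's standard command-line interface; a typical run with the medium weights might look like the following, where `buoys.yaml` is a placeholder name for our Roboflow dataset config:

```shell
# Clone the YOLOv5 repository and install its dependencies
git clone https://github.com/ultralytics/yolov5
cd yolov5
pip install -r requirements.txt

# Fine-tune the medium model on our dataset for 50 epochs
# (`buoys.yaml` is a placeholder for the exported dataset config)
python train.py --img 640 --batch 16 --epochs 50 --data buoys.yaml --weights yolov5m.pt
```

On a Colab GPU runtime, CUDA is picked up automatically, which is what keeps training time reasonable even for the larger variants.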

Having tried several models, our most up-to-date version uses YOLOv5m (medium) and is quite effective at recognizing buoys. It was trained on over 1,000 images for 50 epochs; its precision for object detection is around 99%, and confidence scores on detections are similarly close to 1.
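Once trained, the exported weights (e.g. a hypothetical `best.pt`) can be loaded through PyTorch Hub with `torch.hub.load('ultralytics/yolov5', 'custom', path='best.pt')`, and the raw detections post-processed before use. A minimal sketch of that post-processing, assuming YOLOv5's `(x1, y1, x2, y2, confidence, class_index)` row format and an assumed ordering of our four buoy classes:

```python
# Assumed label order for our custom buoy dataset (illustrative only).
BUOY_CLASSES = ["red_buoy", "blue_buoy", "green_buoy", "yellow_buoy"]


def filter_detections(rows, conf_threshold=0.5):
    """Keep detections at or above the confidence threshold, with class names.

    Each row follows YOLOv5's (x1, y1, x2, y2, confidence, class_index)
    output format.
    """
    kept = []
    for x1, y1, x2, y2, conf, cls in rows:
        if conf >= conf_threshold:
            kept.append((BUOY_CLASSES[int(cls)], conf, (x1, y1, x2, y2)))
    return kept
```

Thresholding like this is what turns the model's near-1 confidence scores into a clean list of buoys for the rest of the pipeline.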

Concepts and Techniques Used

  • ZED SDK & Python API
  • Python 3
  • PyTorch
  • Google Colab
  • OpenCV
  • CUDA
  • Apex

Future Goals

  • Integration with ZED camera
  • Testing with newer YOLO models – version 8 released in 2023
  • Training on additional relevant objects (die faces, humans, etc.)
  • Additional image annotation (more data will lead to a better model) and automation of process