Occluded persons detection - Vision Transformer

Project information

  • Category: Machine learning
  • Client: University of Kelaniya
  • Project URL: project link

This project focused on implementing recent Vision Transformers for object detection. Traditionally, CNNs have been used for object detection. With the rapid rise in popularity of Transformers, Vision Transformers are now used for many of the tasks that were initially done by CNNs.

This project attempts to replicate the process shown in the research paper by Meta on its object detection model, DETR (DEtection TRansformer). Unlike traditional computer vision techniques, DETR approaches object detection as a direct set prediction problem. It consists of a set-based global loss, which forces unique predictions via bipartite matching, and a Transformer encoder-decoder architecture. Given a small, fixed set of learned object queries, DETR reasons about the relations between the objects and the global image context to directly output the final set of predictions in parallel. Because all predictions are produced in parallel rather than one after another, DETR is fast and efficient at prediction time.
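The bipartite matching step can be illustrated with a small sketch. This is not the code used in the project or in the DETR paper: it is a simplified, pure-Python illustration in which the names `match_cost` and `bipartite_match`, the box weight of 5.0, and the example numbers are all assumptions. It finds the optimal one-to-one assignment by brute force over permutations, which is only practical for a handful of boxes (DETR uses the Hungarian algorithm for the same purpose), and the cost leaves out the generalized IoU term that DETR also includes.

```python
from itertools import permutations

def l1_box_distance(box_a, box_b):
    """Sum of absolute differences between two (cx, cy, w, h) boxes."""
    return sum(abs(a - b) for a, b in zip(box_a, box_b))

def match_cost(pred, target, box_weight=5.0):
    """Cost of pairing one prediction with one ground-truth object.

    Lower is better: a high probability for the correct class and a
    small box distance both reduce the cost. box_weight is an
    illustrative value, not a tuned setting.
    """
    class_cost = -pred["probs"][target["label"]]
    box_cost = l1_box_distance(pred["box"], target["box"])
    return class_cost + box_weight * box_cost

def bipartite_match(preds, targets):
    """Return (assignment, total_cost) for the cheapest one-to-one matching.

    assignment[j] is the index of the prediction matched to target j.
    Brute force: every ordered choice of len(targets) predictions is tried.
    """
    best_assignment, best_cost = None, float("inf")
    for perm in permutations(range(len(preds)), len(targets)):
        cost = sum(match_cost(preds[i], targets[j]) for j, i in enumerate(perm))
        if cost < best_cost:
            best_assignment, best_cost = perm, cost
    return best_assignment, best_cost

# Hypothetical example: 3 object queries, 2 ground-truth persons.
# Class 0 = person, class 1 = "no object". Boxes are (cx, cy, w, h) in [0, 1].
preds = [
    {"probs": [0.1, 0.9], "box": (0.50, 0.50, 0.20, 0.20)},
    {"probs": [0.9, 0.1], "box": (0.30, 0.40, 0.20, 0.60)},
    {"probs": [0.8, 0.2], "box": (0.70, 0.45, 0.25, 0.55)},
]
targets = [
    {"label": 0, "box": (0.72, 0.45, 0.25, 0.50)},
    {"label": 0, "box": (0.30, 0.42, 0.20, 0.60)},
]

assignment, total_cost = bipartite_match(preds, targets)
# Queries left unmatched are trained to predict "no object".
unmatched = sorted(set(range(len(preds))) - set(assignment))
print(assignment, unmatched)
```

Because each ground-truth person is matched to exactly one query, two queries cannot both be rewarded for the same person, which is what removes the need for a separate duplicate-suppression step.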

In this project, the main focus is to detect occluded persons in an image. This has been a difficult task for traditional CNN algorithms. The DETR model seems to perform very well compared to CNN algorithms. However, training the model takes considerably longer.