Modeling Collaborator: Enabling Subjective Vision Classification With Minimal Human Effort via LLM Tool-Use
Authors:
Imad Eddine Toubal,
Aditya Avinash,
Neil Gordon Alldrin,
Jan Dlabal,
Wenlei Zhou,
Enming Luo,
Otilia Stretcu,
Hao Xiong,
Chun-Ta Lu,
Howard Zhou,
Ranjay Krishna,
Ariel Fuxman,
Tom Duerig
Abstract:
From content moderation to wildlife conservation, the number of applications that require models to recognize nuanced or subjective visual concepts is growing. Traditionally, developing classifiers for such concepts requires substantial manual effort measured in hours, days, or even months to identify and annotate data needed for training. Even with recently proposed Agile Modeling techniques, which enable rapid bootstrapping of image classifiers, users are still required to spend 30 minutes or more of monotonous, repetitive data labeling just to train a single classifier. Drawing on Fiske's Cognitive Miser theory, we propose a new framework that alleviates manual effort by replacing human labeling with natural language interactions, reducing the total effort required to define a concept by an order of magnitude: from labeling 2,000 images to only 100 plus some natural language interactions. Our framework leverages recent advances in foundation models, both large language models and vision-language models, to carve out the concept space through conversation and by automatically labeling training data points. Most importantly, our framework eliminates the need for crowd-sourced annotations. Moreover, our framework ultimately produces lightweight classification models that are deployable in cost-sensitive scenarios. Across 15 subjective concepts and across 2 public image classification datasets, our trained models outperform traditional Agile Modeling as well as state-of-the-art zero-shot classification models like ALIGN, CLIP, CuPL, and large visual question-answering models like PaLI-X.
Submitted 19 March, 2024; v1 submitted 4 March, 2024;
originally announced March 2024.
Real-Time Helmet Violation Detection in AI City Challenge 2023 with Genetic Algorithm-Enhanced YOLOv5
Authors:
Elham Soltanikazemi,
Ashwin Dhakal,
Bijaya Kumar Hatuwal,
Imad Eddine Toubal,
Armstrong Aboah,
Kannappan Palaniappan
Abstract:
This research focuses on real-time surveillance systems as a means of tackling non-compliance with helmet regulations, a practice that considerably increases the risk for motorcycle drivers and riders. Despite the well-established advantages of helmet usage, achieving widespread compliance remains challenging due to diverse contributing factors. To address this concern, real-time monitoring and enforcement of helmet laws have been proposed as a plausible solution. However, previous attempts at helmet violation detection have been hindered by their limited ability to operate in real time. To overcome this limitation, this paper introduces a novel real-time helmet violation detection system that uses the YOLOv5 single-stage object detection model. The model is trained on the 2023 NVIDIA AI City Challenge Track 5 dataset. The optimal hyperparameters for training the model are determined using genetic algorithms. Additionally, data augmentation and various sampling techniques are implemented to enhance the model's performance. The efficacy of the model is evaluated using precision, recall, and mean Average Precision (mAP) metrics. The results demonstrate precision, recall, and mAP scores of 0.848, 0.599, and 0.641, respectively, on the training data. Furthermore, the model achieves a notable mAP score of 0.6667 on the test data, leading to a 4th place rank on the public leaderboard. This innovative approach represents a notable breakthrough in the field and holds immense potential to substantially enhance motorcycle safety. By enabling real-time monitoring and enforcement, this system can contribute to increased compliance with helmet laws, thereby reducing the risks faced by motorcycle riders and passengers.
Submitted 20 November, 2023; v1 submitted 13 April, 2023;
originally announced April 2023.