-
Efficient auto-labeling of large-scale poultry datasets (ALPD) using an ensemble model with self- and active-learning approaches
Authors:
Ramesh Bahadur Bist,
Lilong Chai,
Shawna Weimer,
Hannah Atungulua,
Chantel Pennicott,
Xiao Yang,
Sachin Subedi,
Chaitanya Pallerla,
Yang Tian,
Dongyi Wang
Abstract:
The rapid growth of artificial intelligence in poultry farming has highlighted the challenge of efficiently labeling large, diverse datasets. Manual annotation is time-consuming and costly, making it impractical for modern systems that continuously generate data. This study addresses this challenge by exploring semi-supervised auto-labeling methods, integrating self- and active-learning approaches to develop an efficient, label-scarce framework for auto-labeling large poultry datasets (ALPD). For this study, video data were collected from housed broilers and laying hens. Various machine learning models, including zero-shot and supervised models, were used for broiler and hen detection. The results showed that, under supervised learning, YOLOv8s-World and YOLOv9s performed best on the performance metrics for broiler and hen detection, while among the semi-supervised models, YOLOv8s-ALPD achieved the highest precision (96.1%) and recall (99%), with an RMSE of 1.87. The hybrid YOLO-World model, which combines the optimal YOLOv8s backbone with zero-shot models, demonstrated the highest overall performance, achieving a precision of 99.2%, a recall of 99.4%, and an F1 score of 98.7% for detection. In addition, the semi-supervised models with minimal human intervention (active learning) reduced annotation time by over 80% compared to full manual labeling. Moreover, integrating zero-shot models with the best-performing models enhanced broiler and hen detection, achieving results comparable to those of supervised models while significantly increasing speed. In conclusion, integrating semi-supervised auto-labeling and zero-shot models significantly improves detection accuracy and reduces manual annotation effort, offering a promising solution for optimizing AI-driven systems in poultry farming, advancing precision livestock management, and promoting more sustainable practices.
Submitted 21 February, 2025; v1 submitted 18 January, 2025;
originally announced January 2025.
-
Tabular Embedding Model (TEM): Finetuning Embedding Models For Tabular RAG Applications
Authors:
Sujit Khanna,
Shishir Subedi
Abstract:
In recent times, large language models (LLMs) have exhibited tremendous capabilities, especially in mathematics, code generation, and general-purpose reasoning. However, in specialized domains, especially in applications that require parsing and analyzing large chunks of numeric or tabular data, even state-of-the-art (SOTA) models struggle. In this paper, we introduce a new approach to solving domain-specific tabular data analysis tasks by presenting a unique retrieval-augmented generation (RAG) workflow that mitigates the scalability issues of existing tabular LLM solutions. Specifically, we present the Tabular Embedding Model (TEM), a novel approach to fine-tuning embedding models for tabular RAG applications. Embedding models are a crucial component of the RAG workflow, yet even current SOTA embedding models struggle because they are trained predominantly on textual datasets and thus underperform in scenarios involving complex tabular data. The evaluation results show that our approach not only outperforms current SOTA embedding models in this domain but also does so with a notably smaller and more efficient model structure.
Submitted 28 April, 2024;
originally announced May 2024.
-
SAM for Poultry Science
Authors:
Xiao Yang,
Haixing Dai,
Zihao Wu,
Ramesh Bist,
Sachin Subedi,
Jin Sun,
Guoyu Lu,
Changying Li,
Tianming Liu,
Lilong Chai
Abstract:
In recent years, the agricultural industry has witnessed significant advancements in artificial intelligence (AI), particularly with the development of large-scale foundation models. Among these foundation models, the Segment Anything Model (SAM), introduced by Meta AI Research, stands out as a groundbreaking solution for object segmentation tasks. While SAM has shown success in various agricultural applications, its potential in the poultry industry, specifically in the context of cage-free hens, remains relatively unexplored. This study aims to assess the zero-shot segmentation performance of SAM on representative chicken segmentation tasks, including part-based segmentation and segmentation of infrared thermal images, and to explore chicken-tracking tasks using SAM as the segmentation tool. The results show that SAM outperforms SegFormer and SETR in both whole-chicken and part-based chicken segmentation. SAM-based object tracking also provides valuable data on the behavior and movement patterns of broiler birds. The findings of this study contribute to a better understanding of SAM's potential in poultry science and lay the foundation for future advancements in chicken segmentation and tracking.
Submitted 17 May, 2023;
originally announced May 2023.
-
Federated Learning Attacks Revisited: A Critical Discussion of Gaps, Assumptions, and Evaluation Setups
Authors:
Aidmar Wainakh,
Ephraim Zimmer,
Sandeep Subedi,
Jens Keim,
Tim Grube,
Shankar Karuppayah,
Alejandro Sanchez Guinea,
Max Mühlhäuser
Abstract:
Federated learning (FL) enables a set of entities to collaboratively train a machine learning (ML) model without sharing their sensitive data, thus mitigating some privacy concerns. However, an increasing number of works in the literature propose attacks that can manipulate the model and disclose information about the training data in FL. As a result, there has been a growing belief in the research community that FL is highly vulnerable to a variety of severe attacks. Although these attacks do indeed highlight security and privacy risks in FL, some of them may not be as effective in production deployment because they are feasible only under special (sometimes impractical) assumptions. Furthermore, some attacks are evaluated under limited setups that may not match real-world scenarios. In this paper, we investigate this issue by conducting a systematic mapping study of attacks against FL, covering 48 relevant papers from 2016 to the third quarter of 2021. On the basis of this study, we provide a quantitative analysis of the proposed attacks and their evaluation settings. This analysis reveals several research gaps with regard to the type of target ML models and their architectures. Additionally, we highlight unrealistic assumptions in the problem settings of some attacks, related to the hyperparameters of the ML model and the data distribution among clients. Furthermore, we identify and discuss several fallacies in the evaluation of attacks, which raise questions about the generalizability of the conclusions. As a remedy, we propose a set of recommendations to avoid these fallacies and to promote adequate evaluations.
Submitted 3 January, 2022; v1 submitted 5 November, 2021;
originally announced November 2021.