-
SceneGraMMi: Scene Graph-boosted Hybrid-fusion for Multi-Modal Misinformation Veracity Prediction
Authors:
Swarang Joshi,
Siddharth Mavani,
Joel Alex,
Arnav Negi,
Rahul Mishra,
Ponnurangam Kumaraguru
Abstract:
Misinformation undermines individual knowledge and affects broader societal narratives. Despite growing interest in the research community in multi-modal misinformation detection, existing methods exhibit limitations in capturing semantic cues, key regions, and cross-modal similarities within multi-modal datasets. We propose SceneGraMMi, a Scene Graph-boosted Hybrid-fusion approach for Multi-modal Misinformation veracity prediction, which integrates scene graphs across different modalities to improve detection performance. Experimental results across four benchmark datasets show that SceneGraMMi consistently outperforms state-of-the-art methods. In a comprehensive ablation study, we highlight the contribution of each component, while Shapley values are employed to examine the explainability of the model's decision-making process.
Submitted 20 October, 2024;
originally announced October 2024.
-
Enhancing Pollinator Conservation towards Agriculture 4.0: Monitoring of Bees through Object Recognition
Authors:
Ajay John Alex,
Chloe M. Barnes,
Pedro Machado,
Isibor Ihianle,
Gábor Markó,
Martin Bencsik,
Jordan J. Bird
Abstract:
In an era of rapid climate change and its adverse effects on food production, technological intervention to monitor pollinator conservation is of paramount importance for environmental monitoring and conservation for global food security. The survival of the human species depends on the conservation of pollinators. This article explores the use of Computer Vision and Object Recognition to autonomously track and report bee behaviour from images. A novel dataset of 9664 images containing bees is extracted from video streams and annotated with bounding boxes. With training, validation and testing sets (6722, 1915, and 997 images, respectively), the results of the COCO-based YOLO model fine-tuning approaches show that YOLOv5m is the most effective approach in terms of recognition accuracy. However, YOLOv5s was shown to be the best suited to real-time bee detection, with an average processing and inference time of 5.1 ms per video frame at the cost of slightly lower recognition accuracy. The trained model is then packaged within an explainable AI interface, which converts detection events into timestamped reports and charts, with the aim of facilitating use by non-technical users such as expert stakeholders from the apiculture industry towards informing responsible consumption and production.
Submitted 24 May, 2024;
originally announced May 2024.
-
Thread Detection and Response Generation using Transformers with Prompt Optimisation
Authors:
Kevin Joshua T,
Arnav Agarwal,
Shriya Sanjay,
Yash Sarda,
John Sahaya Rani Alex,
Saurav Gupta,
Sushant Kumar,
Vishwanath Kamath
Abstract:
Conversational systems are crucial for human-computer interaction, managing complex dialogues by identifying threads and prioritising responses. This is especially vital in multi-party conversations, where precise identification of threads and strategic response prioritisation ensure efficient dialogue management. To address these challenges, an end-to-end model that identifies threads and prioritises their response generation based on importance was developed. The problem was systematically decomposed into discrete components (thread detection, prioritisation, and performance optimisation), each of which was analysed and optimised. These refined components integrate into a unified framework for conversational systems. Llama2 7b is used due to its high level of generalisation, but the system can be updated with any open-source Large Language Model (LLM). The capabilities of the Llama2 model were augmented with fine-tuning methods and strategic prompting techniques to optimise the model's performance, reducing computation time and increasing accuracy. The model achieves up to a 10x speed improvement while generating more coherent results than existing models.
Submitted 9 March, 2024;
originally announced March 2024.
-
Early Detection of Parkinson's Disease using Motor Symptoms and Machine Learning
Authors:
Poojaa C,
John Sahaya Rani Alex
Abstract:
Parkinson's disease (PD) has been found to affect 1 out of every 1000 people and is more prevalent among people over 60 years of age. Leveraging wearable systems to find accurate biomarkers for diagnosis has become the need of the hour, especially for a neurodegenerative condition like Parkinson's. This work focuses on early-occurring, common symptoms, such as motor and gait-related parameters, to arrive at a quantitative analysis of the feasibility of an economical and robust wearable device. A subset of the Parkinson's Progression Markers Initiative (PPMI) data, the PPMI Gait dataset, has been utilised for feature selection after a thorough analysis with various Machine Learning algorithms. The influential features identified have then been used to test real-time data for early detection of Parkinson Syndrome, with a model accuracy of 91.9%.
Submitted 18 April, 2023;
originally announced April 2023.
-
Agent-Based Model of Crowd Dynamics in Emergency Situations: A Focus on People With Disabilities
Authors:
Janey Alex,
Jason Stillerman,
Noah Fritzhand,
Tucker Paron
Abstract:
Collective behavior of people in large groups and emergent crowd dynamics can have dangerous and disastrous results when panic is introduced. These events can be caused by emergency situations such as fires in a large building or a stampeding effect when people are rushing in a densely packed area. In this paper, we will use an agent-based modeling approach to simulate different evacuation events in an attempt to understand which scenario is most efficient. Specifically, we will focus on how people with disabilities are impacted by chosen parameters during an emergency evacuation. We chose an agent-based model (ABM) for this simulation because we want to assign specific roles to different "agents" in our model. We will also examine the influence of people with disabilities on crowd dynamics and the optimal exits. Does the placement of seating for people with disabilities affect the time it takes for the last person to exit the building? What effect does poor signage have on the time it takes for able-bodied people and people with disabilities to exit safely? What happens if some people do not know about alternative exits in their panicked state? Using our agent-based model, we will investigate these questions while also adjusting other outside effects such as the density of the crowd, the speed at which people exit, and the location of people at the start of the simulation.
Submitted 15 December, 2022;
originally announced December 2022.
-
Atomized Search Length: Beyond User Models
Authors:
John Alex,
Keith Hall,
Donald Metzler
Abstract:
We argue that current IR metrics, modeled on optimizing user experience, measure too narrow a portion of the IR space. If IR systems are weak, these metrics undersample or completely filter out the deeper documents that need improvement. If IR systems are relatively strong, these metrics undersample deeper relevant documents that could underpin even stronger IR systems, ones that could present content from tens or hundreds of relevant documents in a user-digestible hierarchy or text summary. We reanalyze over 70 TREC tracks from the past 28 years, showing that roughly half undersample top ranked documents and nearly all undersample tail documents. We show that in the 2020 Deep Learning tracks, neural systems were actually near-optimal at top-ranked documents, compared to only modest gains over BM25 on tail documents. Our analysis is based on a simple new systems-oriented metric, 'atomized search length', which is capable of accurately and evenly measuring all relevant documents at any depth.
Submitted 5 January, 2022;
originally announced January 2022.
-
Learning To Split and Rephrase From Wikipedia Edit History
Authors:
Jan A. Botha,
Manaal Faruqui,
John Alex,
Jason Baldridge,
Dipanjan Das
Abstract:
Split and rephrase is the task of breaking down a sentence into shorter ones that together convey the same meaning. We extract a rich new dataset for this task by mining Wikipedia's edit history: WikiSplit contains one million naturally occurring sentence rewrites, providing sixty times more distinct split examples and a ninety times larger vocabulary than the WebSplit corpus introduced by Narayan et al. (2017) as a benchmark for this task. Incorporating WikiSplit as training data produces a model with qualitatively better predictions that score 32 BLEU points above the prior best result on the WebSplit benchmark.
Submitted 28 August, 2018;
originally announced August 2018.