Search | arXiv e-print repository

Leveraging OpenFlamingo for Multimodal Embedding Analysis of C2C Car Parts Data

Authors: Maisha Binte Rashid, Pablo Rivas

Abstract: In this paper, we aim to investigate the capabilities of multimodal machine learning models, particularly the OpenFlamingo model, in processing a large-scale dataset of consumer-to-consumer (C2C) online posts related to car parts. We have collected data from two platforms, OfferUp and Craigslist, resulting in a dataset of over 1.2 million posts with their corresponding images. The OpenFlamingo mod… ▽ More In this paper, we aim to investigate the capabilities of multimodal machine learning models, particularly the OpenFlamingo model, in processing a large-scale dataset of consumer-to-consumer (C2C) online posts related to car parts. We have collected data from two platforms, OfferUp and Craigslist, resulting in a dataset of over 1.2 million posts with their corresponding images. The OpenFlamingo model was used to extract embeddings for the text and image of each post. We used $k$-means clustering on the joint embeddings to identify underlying patterns and commonalities among the posts. We have found that most clusters contain a pattern, but some clusters showed no internal patterns. The results provide insight into the fact that OpenFlamingo can be used for finding patterns in large datasets but needs some modification in the architecture according to the dataset. △ Less

Submitted 20 March, 2025; originally announced March 2025.

Comments: The 26th International Conference on Artificial Intelligence (ICAI'24: July 22-25, 2024; Las Vegas, USA)

ACM Class: I.2.10; I.2.7; I.5; H.3.3; H.3.1

arXiv:2502.16361 [pdf, other]

A Framework for Evaluating Vision-Language Model Safety: Building Trust in AI for Public Sector Applications

Authors: Maisha Binte Rashid, Pablo Rivas

Abstract: Vision-Language Models (VLMs) are increasingly deployed in public sector missions, necessitating robust evaluation of their safety and vulnerability to adversarial attacks. This paper introduces a novel framework to quantify adversarial risks in VLMs. We analyze model performance under Gaussian, salt-and-pepper, and uniform noise, identifying misclassification thresholds and deriving composite noi… ▽ More Vision-Language Models (VLMs) are increasingly deployed in public sector missions, necessitating robust evaluation of their safety and vulnerability to adversarial attacks. This paper introduces a novel framework to quantify adversarial risks in VLMs. We analyze model performance under Gaussian, salt-and-pepper, and uniform noise, identifying misclassification thresholds and deriving composite noise patches and saliency patterns that highlight vulnerable regions. These patterns are compared against the Fast Gradient Sign Method (FGSM) to assess their adversarial effectiveness. We propose a new Vulnerability Score that combines the impact of random noise and adversarial attacks, providing a comprehensive metric for evaluating model robustness. △ Less

Submitted 22 February, 2025; originally announced February 2025.

Comments: AAAI 2025 Workshop on AI for Social Impact: Bridging Innovations in Finance, Social Media, and Crime Prevention

ACM Class: I.2.10; I.4.9; K.4.1

arXiv:2410.20664

Embedding with Large Language Models for Classification of HIPAA Safeguard Compliance Rules

Authors: Md Abdur Rahman, Md Abdul Barek, ABM Kamrul Islam Riad, Md Mostafizur Rahman, Md Bajlur Rashid, Smita Ambedkar, Md Raihan Miaa, Fan Wu, Alfredo Cuzzocrea, Sheikh Iqbal Ahamed

Abstract: Although software developers of mHealth apps are responsible for protecting patient data and adhering to strict privacy and security requirements, many of them lack awareness of HIPAA regulations and struggle to distinguish between HIPAA rules categories. Therefore, providing guidance of HIPAA rules patterns classification is essential for developing secured applications for Google Play Store. In… ▽ More Although software developers of mHealth apps are responsible for protecting patient data and adhering to strict privacy and security requirements, many of them lack awareness of HIPAA regulations and struggle to distinguish between HIPAA rules categories. Therefore, providing guidance of HIPAA rules patterns classification is essential for developing secured applications for Google Play Store. In this work, we identified the limitations of traditional Word2Vec embeddings in processing code patterns. To address this, we adopt multilingual BERT (Bidirectional Encoder Representations from Transformers) which offers contextualized embeddings to the attributes of dataset to overcome the issues. Therefore, we applied this BERT to our dataset for embedding code patterns and then uses these embedded code to various machine learning approaches. Our results demonstrate that the models significantly enhances classification performance, with Logistic Regression achieving a remarkable accuracy of 99.95\%. Additionally, we obtained high accuracy from Support Vector Machine (99.79\%), Random Forest (99.73\%), and Naive Bayes (95.93\%), outperforming existing approaches. This work underscores the effectiveness and showcases its potential for secure application development. △ Less

Submitted 7 November, 2024; v1 submitted 27 October, 2024; originally announced October 2024.

Comments: I am requesting the withdrawal of my paper due to critical issues identified in the methodology/results that may impact its accuracy and reliability. I also plan to make substantial revisions that go beyond minor corrections

arXiv:2407.21174 [pdf, other]

AI Safety in Practice: Enhancing Adversarial Robustness in Multimodal Image Captioning

Authors: Maisha Binte Rashid, Pablo Rivas

Abstract: Multimodal machine learning models that combine visual and textual data are increasingly being deployed in critical applications, raising significant safety and security concerns due to their vulnerability to adversarial attacks. This paper presents an effective strategy to enhance the robustness of multimodal image captioning models against such attacks. By leveraging the Fast Gradient Sign Metho… ▽ More Multimodal machine learning models that combine visual and textual data are increasingly being deployed in critical applications, raising significant safety and security concerns due to their vulnerability to adversarial attacks. This paper presents an effective strategy to enhance the robustness of multimodal image captioning models against such attacks. By leveraging the Fast Gradient Sign Method (FGSM) to generate adversarial examples and incorporating adversarial training techniques, we demonstrate improved model robustness on two benchmark datasets: Flickr8k and COCO. Our findings indicate that selectively training only the text decoder of the multimodal architecture shows performance comparable to full adversarial training while offering increased computational efficiency. This targeted approach suggests a balance between robustness and training costs, facilitating the ethical deployment of multimodal AI systems across various domains. △ Less

Submitted 30 July, 2024; originally announced July 2024.

Comments: Accepted into KDD 2024 workshop on Ethical AI

ACM Class: I.2.7

Showing 1–4 of 4 results for author: Rashid, M B