-
AWARE-NET: Adaptive Weighted Averaging for Robust Ensemble Network in Deepfake Detection
Authors:
Muhammad Salman,
Iqra Tariq,
Mishal Zulfiqar,
Muqadas Jalal,
Sami Aujla,
Sumbal Fatima
Abstract:
Deepfake detection has become increasingly important due to the rise of synthetic media, which poses significant risks to digital identity and cyber presence for security and trust. While multiple approaches have improved detection accuracy, challenges remain in achieving consistent performance across diverse datasets and manipulation types. In response, we propose a novel two-tier ensemble framew…
▽ More
Deepfake detection has become increasingly important due to the rise of synthetic media, which poses significant risks to digital identity and cyber presence for security and trust. While multiple approaches have improved detection accuracy, challenges remain in achieving consistent performance across diverse datasets and manipulation types. In response, we propose a novel two-tier ensemble framework for deepfake detection based on deep learning that hierarchically combines multiple instances of three state-of-the-art architectures: Xception, Res2Net101, and EfficientNet-B7. Our framework employs a unique approach where each architecture is instantiated three times with different initializations to enhance model diversity, followed by a learnable weighting mechanism that dynamically combines their predictions. Unlike traditional fixed-weight ensembles, our first-tier averages predictions within each architecture family to reduce model variance, while the second tier learns optimal contribution weights through backpropagation, automatically adjusting each architecture's influence based on their detection reliability. Our experiments achieved state-of-the-art intra-dataset performance with AUC scores of 99.22% (FF++) and 100.00% (CelebDF-v2), and F1 scores of 98.06% (FF++) and 99.94% (CelebDF-v2) without augmentation. With augmentation, we achieve AUC scores of 99.47% (FF++) and 100.00% (CelebDF-v2), and F1 scores of 98.43% (FF++) and 99.95% (CelebDF-v2). The framework demonstrates robust cross-dataset generalization, achieving AUC scores of 88.20% and 72.52%, and F1 scores of 93.16% and 80.62% in cross-dataset evaluations.
△ Less
Submitted 1 May, 2025;
originally announced May 2025.
-
Brick Kiln Dataset for Pakistan's IGP Region Using AI
Authors:
Muhammad Suleman Ali Hamdani,
Khizer Zakir,
Neetu Kushwaha,
Syeda Eman Fatima,
Hassan Aftab Sheikh
Abstract:
Brick kilns are a major source of air pollution in Pakistan, with many operating without regulation. A key challenge in Pakistan and across the Indo-Gangetic Plain is the limited air quality monitoring and lack of transparent data on pollution sources. To address this, we present a two-fold AI approach that combines low-resolution Sentinel-2 and high-resolution imagery to map brick kiln locations.…
▽ More
Brick kilns are a major source of air pollution in Pakistan, with many operating without regulation. A key challenge in Pakistan and across the Indo-Gangetic Plain is the limited air quality monitoring and lack of transparent data on pollution sources. To address this, we present a two-fold AI approach that combines low-resolution Sentinel-2 and high-resolution imagery to map brick kiln locations. Our process begins with a low-resolution analysis, followed by a post-processing step to reduce false positives, minimizing the need for extensive high-resolution imagery. This analysis initially identified 20,000 potential brick kilns, with high-resolution validation confirming around 11,000 kilns. The dataset also distinguishes between Fixed Chimney and Zigzag kilns, enabling more accurate pollution estimates for each type. Our approach demonstrates how combining satellite imagery with AI can effectively detect specific polluting sources. This dataset provides regulators with insights into brick kiln pollution, supporting interventions for unregistered kilns and actions during high pollution episodes.
△ Less
Submitted 24 November, 2024;
originally announced December 2024.
-
Towards Neural Synthesis for SMT-Assisted Proof-Oriented Programming
Authors:
Saikat Chakraborty,
Gabriel Ebner,
Siddharth Bhat,
Sarah Fakhoury,
Sakina Fatima,
Shuvendu Lahiri,
Nikhil Swamy
Abstract:
Proof-oriented programs mix computational content with proofs of program correctness. However, the human effort involved in programming and proving is still substantial, despite the use of Satisfiability Modulo Theories (SMT) solvers to automate proofs in languages such as F*. Seeking to spur research on using AI to automate the construction of proof-oriented programs, we curate a dataset of 600K…
▽ More
Proof-oriented programs mix computational content with proofs of program correctness. However, the human effort involved in programming and proving is still substantial, despite the use of Satisfiability Modulo Theories (SMT) solvers to automate proofs in languages such as F*. Seeking to spur research on using AI to automate the construction of proof-oriented programs, we curate a dataset of 600K lines of open-source F* programs and proofs, including software used in production systems ranging from Windows and Linux to Python and Firefox. Our dataset includes around 32K top-level F* definitions, each representing a type-directed program and proof synthesis problem producing a definition given a formal specification expressed as an F* type. We provide a program fragment checker that queries F* to check the correctness of candidate solutions. We also report on an extended version of our dataset containing a total of 940K lines of programs and proofs, with a total of 54k top-level F* definitions. We believe this is the largest corpus of SMT-assisted program proofs coupled with a reproducible program-fragment checker. Grounded in this dataset, we investigate the use of AI to synthesize programs and their proofs in F*, with promising results. Our main finding in that the performance of fine-tuned smaller language models (such as Phi-2 or StarCoder) compare favorably with large language models (such as GPT-4), at a much lower computational cost. We also identify various type-based retrieval augmentation techniques and find that they boost performance significantly. With detailed error analysis and case studies, we identify potential strengths and weaknesses of models and techniques and suggest directions for future improvements.
△ Less
Submitted 4 September, 2024; v1 submitted 2 May, 2024;
originally announced May 2024.
-
FlakyFix: Using Large Language Models for Predicting Flaky Test Fix Categories and Test Code Repair
Authors:
Sakina Fatima,
Hadi Hemmati,
Lionel Briand
Abstract:
Flaky tests are problematic because they non-deterministically pass or fail for the same software version under test, causing confusion and wasting development effort. While machine learning models have been used to predict flakiness and its root causes, there is much less work on providing support to fix the problem. To address this gap, in this paper, we focus on predicting the type of fix that…
▽ More
Flaky tests are problematic because they non-deterministically pass or fail for the same software version under test, causing confusion and wasting development effort. While machine learning models have been used to predict flakiness and its root causes, there is much less work on providing support to fix the problem. To address this gap, in this paper, we focus on predicting the type of fix that is required to remove flakiness and then repair the test code on that basis. We do this for a subset of flaky tests where the root cause of flakiness is in the test itself and not in the production code. One key idea is to guide the repair process with additional knowledge about the test's flakiness in the form of its predicted fix category. Thus, we first propose a framework that automatically generates labeled datasets for 13 fix categories and trains models to predict the fix category of a flaky test by analyzing the test code only. Our experimental results using code models and few-shot learning show that we can correctly predict most of the fix categories. To show the usefulness of such fix category labels for automatically repairing flakiness, we augment the prompts of GPT-3.5 Turbo, a Large Language Model (LLM), with such extra knowledge to request repair suggestions. The results show that our suggested fix category labels, complemented with in-context learning, significantly enhance the capability of GPT-3.5 Turbo in generating fixes for flaky tests. Based on the execution and analysis of a sample of GPT-repaired flaky tests, we estimate that a large percentage of such repairs (roughly between 51% and 83%) can be expected to pass. For the failing repaired tests, on average, 16% of the test code needs to be further changed for them to pass.
△ Less
Submitted 30 August, 2024; v1 submitted 21 June, 2023;
originally announced July 2023.
-
An Extended Model of Software Configuration
Authors:
Rezvan Mahdavi-Hezaveh,
Sameeha Fatima,
Laurie Williams
Abstract:
Feature toggles and configuration options are modern programmatic techniques to easily include or exclude functionality in a software product. The research contributions to these two techniques have most often been focused on either one of them separately. However, focusing on the similarities of these two techniques may enable a more fruitful combined family of research on software configuration,…
▽ More
Feature toggles and configuration options are modern programmatic techniques to easily include or exclude functionality in a software product. The research contributions to these two techniques have most often been focused on either one of them separately. However, focusing on the similarities of these two techniques may enable a more fruitful combined family of research on software configuration, a term we use to encompass both techniques. Also, a common terminology may have enabled meta-analysis, a more practical application of the research on the two techniques, and prevented duplication of research effort. The goal of this research study is to aid researchers in conducting a family of research on software configuration by extending an existing model of software configuration that provides terminology for research studies. To achieve our goal, we started with Seigmund et al. Model of Software Configuration (MSC) which was developed based on interviews and publications on configuration options. We explicitly extend the MSC to include feature toggles and to add qualitative analysis of feature toggle-related resources. From our analysis, we proposed MSCv2 as an extended version of MSC and evaluated it through its application on five academic publications and the Chrome system. Our results indicate that multiple researchers studying the same system may provide different definitions of software configuration in their publications. Also, similar research questions may be answered on feature toggles and configuration options repeatedly because of a lack of a clear definition of software configuration. These observations indicate that having a model for defining software configuration may enable more clear and generalized research on the software configuration family of research. Practitioners benefit MSCv2 in their systems to better knowledge transfer to other practitioners and researchers.
△ Less
Submitted 1 December, 2022;
originally announced December 2022.
-
Flakify: A Black-Box, Language Model-based Predictor for Flaky Tests
Authors:
Sakina Fatima,
Taher A. Ghaleb,
Lionel Briand
Abstract:
Software testing assures that code changes do not adversely affect existing functionality. However, a test case can be flaky, i.e., passing and failing across executions, even for the same version of the source code. Flaky test cases introduce overhead to software development as they can lead to unnecessary attempts to debug production or testing code. The state-of-the-art ML-based flaky test case…
▽ More
Software testing assures that code changes do not adversely affect existing functionality. However, a test case can be flaky, i.e., passing and failing across executions, even for the same version of the source code. Flaky test cases introduce overhead to software development as they can lead to unnecessary attempts to debug production or testing code. The state-of-the-art ML-based flaky test case predictors rely on pre-defined sets of features that are either project-specific, require access to production code, which is not always available to software test engineers. Therefore, in this paper, we propose Flakify, a black-box, language model-based predictor for flaky test cases. Flakify relies exclusively on the source code of test cases, thus not requiring to (a) access to production code (black-box), (b) rerun test cases, (c) pre-define features. To this end, we employed CodeBERT, a pre-trained language model, and fine-tuned it to predict flaky test cases using the source code of test cases. We evaluated Flakify on two publicly available datasets (FlakeFlagger and IDoFT) for flaky test cases and compared our technique with the FlakeFlagger approach using two different evaluation procedures: cross-validation and per-project validation. Flakify achieved high F1-scores on both datasets using cross-validation and per-project validation, and surpassed FlakeFlagger by 10 and 18 percentage points in terms of precision and recall, respectively, when evaluated on the FlakeFlagger dataset, thus reducing the cost bound to be wasted on unnecessarily debugging test cases and production code by the same percentages. Flakify also achieved significantly higher prediction results when used to predict test cases on new projects, suggesting better generalizability over FlakeFlagger. Our results further show that a black-box version of FlakeFlagger is not a viable option for predicting flaky test cases.
△ Less
Submitted 16 August, 2022; v1 submitted 22 December, 2021;
originally announced December 2021.
-
Arabic Interface Analysis Based on Cultural Markers
Authors:
Mohammadi Akheela Khanum,
Shameem Fatima,
Mousmi A. Chaurasia
Abstract:
This study examines the Arabic interface design elements that are largely influenced by the cultural values. Cultural markers are examined in websites from educational, business, and media. Cultural values analysis is based on Geert Hofstede's cultural dimensions. The findings show that there are cultural markers which are largely influenced by the culture and that the Hofstede's score for Arab co…
▽ More
This study examines the Arabic interface design elements that are largely influenced by the cultural values. Cultural markers are examined in websites from educational, business, and media. Cultural values analysis is based on Geert Hofstede's cultural dimensions. The findings show that there are cultural markers which are largely influenced by the culture and that the Hofstede's score for Arab countries is partially supported by the website design components examined in this study. Moderate support was also found for the long term orientation, for which Hoftsede has no score.
△ Less
Submitted 16 March, 2012;
originally announced March 2012.
-
Multi-Issue Negotiation with Deadlines
Authors:
S. S. Fatima,
N. R. Jennings,
M. J. Wooldridge
Abstract:
This paper studies bilateral multi-issue negotiation between self-interested autonomous agents. Now, there are a number of different procedures that can be used for this process; the three main ones being the package deal procedure in which all the issues are bundled and discussed together, the simultaneous procedure in which the issues are discussed simultaneously but independently of each other,…
▽ More
This paper studies bilateral multi-issue negotiation between self-interested autonomous agents. Now, there are a number of different procedures that can be used for this process; the three main ones being the package deal procedure in which all the issues are bundled and discussed together, the simultaneous procedure in which the issues are discussed simultaneously but independently of each other, and the sequential procedure in which the issues are discussed one after another. Since each of them yields a different outcome, a key problem is to decide which one to use in which circumstances. Specifically, we consider this question for a model in which the agents have time constraints (in the form of both deadlines and discount factors) and information uncertainty (in that the agents do not know the opponents utility function). For this model, we consider issues that are both independent and those that are interdependent and determine equilibria for each case for each procedure. In so doing, we show that the package deal is in fact the optimal procedure for each party. We then go on to show that, although the package deal may be computationally more complex than the other two procedures, it generates Pareto optimal outcomes (unlike the other two), it has similar earliest and latest possible times of agreement to the simultaneous procedure (which is better than the sequential procedure), and that it (like the other two procedures) generates a unique outcome only under certain conditions (which we define).
△ Less
Submitted 12 October, 2011;
originally announced October 2011.