
doi 10.1145/3715275.3732038

Position is Power: System Prompts as a Mechanism of Bias in Large Language Models (LLMs)

Authors: Anna Neumann, Elisabeth Kirsten, Muhammad Bilal Zafar, Jatinder Singh

Abstract: System prompts in Large Language Models (LLMs) are predefined directives that guide model behaviour, taking precedence over user inputs in text processing and generation. LLM deployers increasingly use them to ensure consistent responses across contexts. While model providers set a foundation of system prompts, deployers and third-party developers can append additional prompts without visibility into others' additions, while this layered implementation remains entirely hidden from end-users. As system prompts become more complex, they can directly or indirectly introduce unaccounted for side effects. This lack of transparency raises fundamental questions about how the position of information in different directives shapes model outputs. As such, this work examines how the placement of information affects model behaviour. To this end, we compare how models process demographic information in system versus user prompts across six commercially available LLMs and 50 demographic groups. Our analysis reveals significant biases, manifesting in differences in user representation and decision-making scenarios. Since these variations stem from inaccessible and opaque system-level configurations, they risk representational, allocative and potential other biases and downstream harms beyond the user's ability to detect or correct. Our findings draw attention to these critical issues, which have the potential to perpetuate harms if left unexamined. Further, we argue that system prompt analysis must be incorporated into AI auditing processes, particularly as customisable system prompts become increasingly prevalent in commercial AI deployments.

Submitted 5 June, 2025; v1 submitted 27 May, 2025; originally announced May 2025.

Comments: Forthcoming in Proceedings of ACM FAccT 2025

arXiv:2503.13577 [pdf, other]

When Should We Orchestrate Multiple Agents?

Authors: Umang Bhatt, Sanyam Kapoor, Mihir Upadhyay, Ilia Sucholutsky, Francesco Quinzan, Katherine M. Collins, Adrian Weller, Andrew Gordon Wilson, Muhammad Bilal Zafar

Abstract: Strategies for orchestrating the interactions between multiple agents, both human and artificial, can wildly overestimate performance and underestimate the cost of orchestration. We design a framework to orchestrate agents under realistic conditions, such as inference costs or availability constraints. We show theoretically that orchestration is only effective if there are performance or cost differentials between agents. We then empirically demonstrate how orchestration between multiple agents can be helpful for selecting agents in a simulated environment, picking a learning strategy in the infamous Rogers' Paradox from social science, and outsourcing tasks to other agents during a question-answer task in a user study.

Submitted 17 March, 2025; originally announced March 2025.

arXiv:2503.02723 [pdf, other]

ImpedanceGPT: VLM-driven Impedance Control of Swarm of Mini-drones for Intelligent Navigation in Dynamic Environment

Authors: Faryal Batool, Malaika Zafar, Yasheerah Yaqoot, Roohan Ahmed Khan, Muhammad Haris Khan, Aleksey Fedoseev, Dzmitry Tsetserukou

Abstract: Swarm robotics plays a crucial role in enabling autonomous operations in dynamic and unpredictable environments. However, a major challenge remains: ensuring safe and efficient navigation in environments filled with both dynamic living (e.g., humans) and dynamic inanimate (e.g., non-living objects) obstacles. In this paper, we propose ImpedanceGPT, a novel system that combines a Vision-Language Model (VLM) with retrieval-augmented generation (RAG) to enable real-time reasoning for adaptive navigation of mini-drone swarms in complex environments. The key innovation of ImpedanceGPT lies in the integration of VLM and RAG, which provides the drones with enhanced semantic understanding of their surroundings. This enables the system to dynamically adjust impedance control parameters in response to obstacle types and environmental conditions. Our approach not only ensures safe and precise navigation but also improves coordination between drones in the swarm. Experimental evaluations demonstrate the effectiveness of the system. The VLM-RAG framework achieved an obstacle detection and retrieval accuracy of 80% under optimal lighting. In static environments, drones navigated dynamic inanimate obstacles at 1.4 m/s but slowed to 0.7 m/s with increased separation around humans. In dynamic environments, speed adjusted to 1.0 m/s near hard obstacles, while reducing to 0.6 m/s with higher deflection to safely avoid moving humans.

Submitted 4 March, 2025; originally announced March 2025.

Comments: Submitted to IROS 2025

arXiv:2502.18156 [pdf, other]

Can LLMs Explain Themselves Counterfactually?

Authors: Zahra Dehghanighobadi, Asja Fischer, Muhammad Bilal Zafar

Abstract: Explanations are an important tool for gaining insights into the behavior of ML models, calibrating user trust and ensuring regulatory compliance. The past few years have seen a flurry of post-hoc methods for generating model explanations, many of which involve computing model gradients or solving specially designed optimization problems. However, owing to the remarkable reasoning abilities of Large Language Models (LLMs), self-explanation, that is, prompting the model to explain its outputs, has recently emerged as a new paradigm. In this work, we study a specific type of self-explanation, self-generated counterfactual explanations (SCEs). We design tests for measuring the efficacy of LLMs in generating SCEs. Analysis over various LLM families, model sizes, temperature settings, and datasets reveals that LLMs sometimes struggle to generate SCEs. Even when they do, their prediction often does not agree with their own counterfactual reasoning.

Submitted 25 February, 2025; originally announced February 2025.

arXiv:2502.06722 [pdf, other]

HetSwarm: Cooperative Navigation of Heterogeneous Swarm in Dynamic and Dense Environments through Impedance-based Guidance

Authors: Malaika Zafar, Roohan Ahmed Khan, Aleksey Fedoseev, Kumar Katyayan Jaiswal, Dzmitry Tsetserukou

Abstract: With the growing demand for efficient logistics and warehouse management, unmanned aerial vehicles (UAVs) are emerging as a valuable complement to automated guided vehicles (AGVs). UAVs enhance efficiency by navigating dense environments and operating at varying altitudes. However, their limited flight time, battery life, and payload capacity necessitate a supporting ground station. To address these challenges, we propose HetSwarm, a heterogeneous multi-robot system that combines a UAV and a mobile ground robot for collaborative navigation in cluttered and dynamic conditions. Our approach employs an artificial potential field (APF)-based path planner for the UAV, allowing it to dynamically adjust its trajectory in real time. The ground robot follows this path while maintaining connectivity through impedance links, ensuring stable coordination. Additionally, the ground robot establishes temporal impedance links with low-height ground obstacles to avoid local collisions, as these obstacles do not interfere with the UAV's flight. Experimental validation of HetSwarm in diverse environmental conditions demonstrated a 90% success rate across 30 test cases. The ground robot exhibited an average deviation of 45 cm near obstacles, confirming effective collision avoidance. Extensive simulations in the Gym PyBullet environment further validated the robustness of our system for real-world applications, demonstrating its potential for dynamic, real-time task execution in cluttered environments.

Submitted 10 February, 2025; originally announced February 2025.

Comments: Manuscript has been submitted to ICUAS-2025

arXiv:2501.18403 [pdf, other]

Efficient Transformer for High Resolution Image Motion Deblurring

Authors: Amanturdieva Akmaral, Muhammad Hamza Zafar

Abstract: This paper presents a comprehensive study and improvement of the Restormer architecture for high-resolution image motion deblurring. We introduce architectural modifications that reduce model complexity by 18.4% while maintaining or improving performance through optimized attention mechanisms. Our enhanced training pipeline incorporates additional transformations, including color jitter, Gaussian blur, and perspective transforms, to improve model robustness, as well as a new frequency loss term. Extensive experiments on the RealBlur-R, RealBlur-J, and Ultra-High-Definition Motion blurred (UHDM) datasets demonstrate the effectiveness of our approach. The improved architecture shows better convergence behavior and reduced training time while maintaining competitive performance across challenging scenarios. We also provide detailed ablation studies analyzing the impact of our modifications on model behavior and performance. Our results suggest that thoughtful architectural simplification combined with enhanced training strategies can yield more efficient yet equally capable models for motion deblurring tasks. Code and data are available at: https://github.com/hamzafer/image-deblurring

Submitted 30 January, 2025; originally announced January 2025.

Comments: 14 pages, 18 figures. Submitted as a preprint; no prior journal/conference submission

arXiv:2410.22118 [pdf, ps, other]

The Impact of Inference Acceleration on Bias of LLMs

Authors: Elisabeth Kirsten, Ivan Habernal, Vedant Nanda, Muhammad Bilal Zafar

Abstract: The last few years have seen unprecedented advances in the capabilities of Large Language Models (LLMs). These advancements promise to benefit a vast array of application domains. However, due to their immense size, performing inference with LLMs is both costly and slow. Consequently, a plethora of recent work has proposed strategies to enhance inference efficiency, e.g., quantization, pruning, and caching. These acceleration strategies reduce the inference cost and latency, often by several factors, while maintaining much of the predictive performance measured via common benchmarks. In this work, we explore another critical aspect of LLM performance: demographic bias in model generations due to inference acceleration optimizations. Using a wide range of metrics, we probe bias in model outputs from a number of angles. Analysis of outputs before and after inference acceleration shows a significant change in bias. Worryingly, these bias effects are complex and unpredictable. A combination of an acceleration strategy and bias type may show little bias change in one model but may lead to a large effect in another. Our results highlight a need for in-depth and case-by-case evaluation of model bias after it has been modified to accelerate inference.

Submitted 5 June, 2025; v1 submitted 29 October, 2024; originally announced October 2024.

arXiv:2410.07848 [pdf, other]

doi 10.1109/ROBIO64047.2024.10907517

SwarmPath: Drone Swarm Navigation through Cluttered Environments Leveraging Artificial Potential Field and Impedance Control

Authors: Roohan Ahmed Khan, Malaika Zafar, Amber Batool, Aleksey Fedoseev, Dzmitry Tsetserukou

Abstract: In the area of multi-drone systems, navigating through dynamic environments from start to goal while providing a collision-free trajectory and efficient path planning is a significant challenge. To solve this problem, we propose a novel SwarmPath technology that integrates an Artificial Potential Field (APF) with an Impedance Controller. The proposed approach provides a solution based on collision-free leader-follower behaviour in which drones are able to adapt themselves to the environment. Moreover, the leader is virtual, while the drones are physical followers leveraging the APF path planning approach to find the shortest possible path to the target. Simultaneously, the drones dynamically adjust impedance links, allowing them to create virtual links with obstacles to avoid them. Compared to conventional APF, the proposed SwarmPath system not only provides smooth collision avoidance but also enables agents to efficiently pass through narrow passages, reducing the total travel time by 30% while ensuring safety in terms of drone connectivity. Lastly, the results also illustrate that the discrepancies between the simulated and real environments exhibit an average absolute percentage error (APE) of 6% in drone trajectories. This underscores the reliability of our solution in real-world scenarios.

Submitted 10 October, 2024; originally announced October 2024.

Comments: Manuscript accepted in IEEE International Conference on Robotics and Biomimetics (IEEE ROBIO 2024)

arXiv:2407.12872 [pdf, other]

Evaluating Large Language Models with fmeval

Authors: Pola Schwöbel, Luca Franceschi, Muhammad Bilal Zafar, Keerthan Vasist, Aman Malhotra, Tomer Shenhar, Pinal Tailor, Pinar Yilmaz, Michael Diamond, Michele Donini

Abstract: fmeval is an open source library to evaluate large language models (LLMs) in a range of tasks. It helps practitioners evaluate their model for task performance and along multiple responsible AI dimensions. This paper presents the library and exposes its underlying design principles: simplicity, coverage, extensibility and performance. We then present how these were implemented in the scientific and engineering choices taken when developing fmeval. A case study demonstrates a typical use case for the library: picking a suitable model for a question answering task. We close by discussing limitations and further work in the development of the library. fmeval can be found at https://github.com/aws/fmeval.

Submitted 15 July, 2024; originally announced July 2024.

arXiv:2406.17446 [pdf, other]

Data-Driven Turbulence Modeling Approach for Cold-Wall Hypersonic Boundary Layers

Authors: Muhammad I. Zafar, Xuhui Zhou, Christopher J. Roy, David Stelter, Heng Xiao

Abstract: The wall-cooling effect in hypersonic boundary layers can significantly alter the near-wall turbulence behavior, which is not accurately modeled by traditional RANS turbulence models. To address this shortcoming, this paper presents a turbulence modeling approach for hypersonic flows with cold-wall conditions using an iterative ensemble Kalman method. Specifically, a neural-network-based turbulence model is used to provide closure mapping from mean flow quantities to Reynolds stress as well as a variable turbulent Prandtl number. Sparse observation data of velocity and temperature are used to train the turbulence model. This approach is analyzed using a direct numerical simulation database for zero-pressure gradient (ZPG) boundary layer flows over a flat plate with a Mach number between 6 and 14 and wall-to-recovery temperature ratios ranging from 0.18 to 0.76. Two training cases are conducted: 1) a single training case with observation data from one flow case, 2) a joint training case where data from two flow cases are simultaneously used for training. Trained models are also tested for generalizability on the remaining flow cases in each of the training cases. The results are also analyzed for insights to inform future work towards enhancing the generalizability of the learned turbulence model.

Submitted 16 April, 2025; v1 submitted 25 June, 2024; originally announced June 2024.

arXiv:2406.12318 [pdf, other]

Riemann problem for Aw-Rascle model with more realistic version of extended Chaplygin gas

Authors: Priyanka, M. Zafar

Abstract: The motivation of this study is to find the Riemann solutions of the Aw-Rascle model with friction for a more realistic version of extended Chaplygin gas. Firstly, we establish the $\delta$-shock wave in its solutions; indeed, by using generalized Rankine-Hugoniot jump conditions, the position, strength, and velocity of the $\delta$-shock are obtained. Further, by analyzing the limiting behavior, it is found that one of the Riemann solutions converges to the $\delta$-shock solution as the pressure approaches the generalized Chaplygin gas pressure. Moreover, we show that our Riemann solutions converge to the corresponding solutions of the transport equations as the pressure tends to zero. Furthermore, we explicitly construct the Riemann solutions of the inhomogeneous Aw-Rascle model.

Submitted 18 June, 2024; originally announced June 2024.

MSC Class: 35L03; 35L40; 35L65; 35L67

arXiv:2402.03129 [pdf, ps, other]

Clairaut anti-invariant Riemannian maps to Sasakian manifolds

Authors: Md Nadim Zafar, Adeeba Zaidi, Gauree Shanker

Abstract: In this paper, we investigate the geometry of Clairaut anti-invariant Riemannian maps whose base spaces are Sasakian manifolds. We obtain the necessary and sufficient conditions for a curve on a base manifold to be geodesic. We obtain conditions for an anti-invariant Riemannian map to be Clairaut. Further, we discuss the biharmonicity of such maps and construct some illustrative examples.

Submitted 30 August, 2024; v1 submitted 5 February, 2024; originally announced February 2024.

Comments: 13 pages

MSC Class: 53C15 (Primary); 53C25; 54C05 (Secondary)

arXiv:2312.14183 [pdf, other]

doi 10.1145/3637528.3671796

On Early Detection of Hallucinations in Factual Question Answering

Authors: Ben Snyder, Marius Moisescu, Muhammad Bilal Zafar

Abstract: While large language models (LLMs) have taken great strides towards helping humans with a plethora of tasks, hallucinations remain a major impediment towards gaining user trust. The fluency and coherence of model generations, even when hallucinating, make detection a difficult task. In this work, we explore whether the artifacts associated with the model generations can provide hints that the generation will contain hallucinations. Specifically, we probe LLMs at 1) the inputs via Integrated Gradients based token attribution, 2) the outputs via the Softmax probabilities, and 3) the internal state via self-attention and fully-connected layer activations for signs of hallucinations on open-ended question answering tasks. Our results show that the distributions of these artifacts tend to differ between hallucinated and non-hallucinated generations. Building on this insight, we train binary classifiers that use these artifacts as input features to classify model generations into hallucinations and non-hallucinations. These hallucination classifiers achieve up to $0.80$ AUROC. We also show that tokens preceding a hallucination can already predict the subsequent hallucination even before it occurs.

Submitted 22 August, 2024; v1 submitted 19 December, 2023; originally announced December 2023.

Comments: KDD 2024

arXiv:2312.11842 [pdf, other]

Neural operator-based super-fidelity: A warm-start approach for accelerating steady-state simulations

Authors: Xu-Hui Zhou, Jiequn Han, Muhammad I. Zafar, Eric M. Wolf, Christopher R. Schrock, Christopher J. Roy, Heng Xiao

Abstract: Recently, the use of neural networks to accelerate the solving of partial differential equations (PDEs) has gained significant traction in both academia and industry. However, employing neural networks as standalone surrogate models raises concerns about solution reliability, especially in precision-critical scientific tasks. This study introduces a novel "super-fidelity" method that leverages neural networks for warm-starting steady-state PDE solvers, ensuring both efficiency and accuracy. Inspired by super-resolution techniques in computer vision, this method maps low-fidelity solutions to high-fidelity targets using a vector-cloud neural network with equivariance (VCNN-e), a neural operator that preserves all necessary invariance and equivariance properties for scalar and vector predictions while seamlessly adapting to different spatial discretizations. We evaluated this approach in three scenarios: (1) a weakly nonlinear case involving low Reynolds number flows around elliptical cylinders, (2) a strongly nonlinear case with high Reynolds number flows over airfoils, and (3) a practical case with high Reynolds number flows over a wing. In all cases, the neural operator-based initialization accelerated convergence by at least two-fold compared to traditional methods, without sacrificing accuracy. The method's robustness and scalability are further demonstrated across different linear equation solvers and multi-process computing configurations. It also achieves overall time savings in scenarios with multiple simulations, even when accounting for model development time. Overall, our approach provides an effective means to accelerate steady-state PDE solutions using neural operators, maintaining high accuracy while significantly improving computational efficiency, particularly in precision-driven scientific applications.

Submitted 26 February, 2025; v1 submitted 18 December, 2023; originally announced December 2023.

arXiv:2311.10268 [pdf, other]

The MAGPI Survey: Drivers of kinematic asymmetries in the ionised gas of $z\sim0.3$ star-forming galaxies

Authors: R. S. Bagge, C. Foster, A. Battisti, S. Bellstedt, M. Mun, K. Harborne, S. Barsanti, T. Mendel, S. Brough, S. M. Croom, C. D. P. Lagos, T. Mukherjee, Y. Peng, R-S. Remus, G. Santucci, P. Sharda, S. Thater, J. van de Sande, L. M. Valenzuela, E. Wisnioski, T. Zafar, B. Ziegler

Abstract: Galaxy gas kinematics are sensitive to the physical processes that contribute to a galaxy's evolution. It is expected that external processes will cause more significant kinematic disturbances in the outer regions, while internal processes will cause more disturbances for the inner regions. Using a subsample of 47 galaxies ($0.27<z<0.36$) from the Middle Ages Galaxy Properties with Integral Field Spectroscopy (MAGPI) survey, we conduct a study into the source of kinematic disturbances by measuring the asymmetry present in the ionised gas line-of-sight velocity maps at the $0.5R_e$ (inner regions) and $1.5R_e$ (outer regions) elliptical annuli. By comparing the inner and outer kinematic asymmetries, we aim to better understand what physical processes are driving the asymmetries in galaxies. We find the local environment plays a role in kinematic disturbance, in agreement with other integral field spectroscopy studies of the local universe, with most asymmetric systems being in close proximity to a more massive neighbour. We do not find evidence suggesting that hosting an Active Galactic Nucleus (AGN) contributes to asymmetry within the inner regions, with some caveats due to emission line modelling. In contrast to previous studies, we do not find evidence that processes leading to asymmetry also enhance star formation in MAGPI galaxies. Finally, we find a weak anti-correlation between stellar mass and asymmetry (i.e., high stellar mass galaxies are less asymmetric). We conclude by discussing possible sources driving the asymmetry in the ionised gas, such as disturbances being present in the colder gas phase (either molecular or atomic) prior to the gas being ionised, and non-axisymmetric features (e.g., a bar) being present in the galactic disk. Our results highlight the complex interplay between ionised gas kinematic disturbances and physical processes involved in galaxy evolution.

Submitted 28 November, 2023; v1 submitted 16 November, 2023; originally announced November 2023.

Comments: 20 pages, 19 figures

arXiv:2310.17989 [pdf]

Uncovering a Paleotsunami Triggered by Mass-Movement in an Alpine Lake

Authors: Muhammad Naveed Zafar, Denys Dutykh, Pierre Sabatier, Mathilde Banjan, Jihwan Kim

Abstract: Mass movements and delta collapses are significant sources of tsunamis in lacustrine environments, impacting human societies enormously. Palaeotsunamis play an essential role in understanding historical events and their consequences along with their return periods. Here, we focus on a palaeo event that occurred during the Younger Dryas to Early Holocene climatic transition, ca. 12,000 years ago, in Lake Aiguebelette (NW Alps, France). Based on high-resolution seismic and bathymetric surveys and sedimentological, geochemical, and magnetic analyses, a seismically induced large mass transport deposit with an initial volume of 767,172 m³ was identified, dated and mapped. To investigate whether this underwater mass transport produced a palaeotsunami in Lake Aiguebelette, this research combines sedimentary records and numerical models. Numerical simulations of tsunamis are performed using a viscoplastic landslide model for tsunami source generation and two-dimensional depth-averaged nonlinear shallow water equations for tsunami wave propagation and inundation modelling. Our simulations conclude that this sublacustrine landslide produced a tsunami wave with a maximum amplitude of approximately 2 m and run-up heights of up to 3.6 m. The modelled sediment thickness resulting from this mass transport corroborates well with the event deposits mapped in the lake. Based on our results, we suggest that this sublacustrine mass transport generated a significant tsunami wave that, to the best of our knowledge, has not been reported previously.

Submitted 27 October, 2023; originally announced October 2023.

Comments: Advances in Hydroinformatics, O. Delestre (Polytech Nice Sophia -- University Côte d'Azur, France), Nov 2023, Chatou, France

arXiv:2302.13319 [pdf, other]

Efficient fair PCA for fair representation learning

Authors: Matthäus Kleindessner, Michele Donini, Chris Russell, Muhammad Bilal Zafar

Abstract: We revisit the problem of fair principal component analysis (PCA), where the goal is to learn the best low-rank linear approximation of the data that obfuscates demographic information. We propose a conceptually simple approach that allows for an analytic solution similar to standard PCA and can be kernelized. Our methods have the same complexity as standard PCA, or kernel PCA, and run much faster than existing methods for fair PCA based on semidefinite programming or manifold optimization, while achieving similar results.

Submitted 26 February, 2023; originally announced February 2023.

arXiv:2302.09394 [pdf]

Deep Neural Networks based Meta-Learning for Network Intrusion Detection

Authors: Anabia Sohail, Bibi Ayisha, Irfan Hameed, Muhammad Mohsin Zafar, Hani Alquhayz, Asifullah Khan

Abstract: The digitization of different components of industry and inter-connectivity among indigenous networks have increased the risk of network attacks. Designing an intrusion detection system to ensure security of the industrial ecosystem is difficult as network traffic encompasses various attack types, including new and evolving ones with minor changes. The data used to construct a predictive model for computer networks has a skewed class distribution and limited representation of attack types, which differ from real network traffic. These limitations result in dataset shift, negatively impacting the machine learning models' predictive abilities and reducing the detection rate against novel attacks. To address these challenges, we propose a novel deep neural network based Meta-Learning framework, INformation FUsion and Stacking Ensemble (INFUSE), for network intrusion detection. First, a hybrid feature space is created by integrating decision and feature spaces. Five different classifiers are utilized to generate a pool of decision spaces. The feature space is then enriched through a deep sparse autoencoder that learns the semantic relationships between attacks. Finally, the deep Meta-Learner acts as an ensemble combiner to analyze the hybrid feature space and make a final decision.
Our evaluation on stringent benchmark datasets and comparison to existing techniques showed the effectiveness of INFUSE with an F-Score of 0.91, Accuracy of 91.6%, and Recall of 0.94 on the Test+ dataset, and an F-Score of 0.91, Accuracy of 85.6%, and Recall of 0.87 on the stringent Test-21 dataset. These promising results indicate the strong generalization capability and the potential to detect network attacks.

Submitted 28 July, 2023; v1 submitted 18 February, 2023; originally announced February 2023.

Comments: Pages: 15, Figures: 10 and Tables: 9

arXiv:2212.13897 [pdf, other]

What You Like: Generating Explainable Topical Recommendations for Twitter Using Social Annotations

Authors: Parantapa Bhattacharya, Saptarshi Ghosh, Muhammad Bilal Zafar, Soumya K. Ghosh, Niloy Ganguly

Abstract: With over 500 million tweets posted per day on Twitter, it is difficult for users to discover interesting content in the deluge of uninteresting posts. In this work, we present a novel, explainable, topical recommendation system that utilizes social annotations to help Twitter users discover tweets on topics of their interest. A major challenge in using traditional rating-dependent recommendation systems, like collaborative filtering and content-based systems, in high-volume social networks is that, due to attention scarcity, most items do not get any ratings. Additionally, the fact that most Twitter users are passive consumers, with 44% of users never tweeting, makes it very difficult to use user ratings for generating recommendations. Further, a key challenge in developing recommendation systems is that in many cases users reject relevant recommendations if they are totally unfamiliar with the recommended item. Providing a suitable explanation for why the item is recommended significantly improves the acceptability of the recommendation. By virtue of being a topical recommendation system, our method is able to present simple topical explanations for the generated recommendations. Comparisons with state-of-the-art matrix factorization based collaborative filtering, content-based and social recommendations demonstrate the efficacy of the proposed approach.

Submitted 23 December, 2022; originally announced December 2022.

arXiv:2211.07712 [pdf, other]

Cloning Ideology and Style using Deep Learning

Authors: Omer Beg, Muhammad Nasir Zafar, Waleed Anjum

Abstract: Text generation tasks have gotten the attention of researchers in the last few years because of their applications on a large scale. In the past, many researchers focused on task-based text generation. Our research focuses on text generation based on the ideology and style of a specific author, and text generation on a topic that was not written about by the same author in the past. Our trained model requires an input prompt containing the initial few words of text to produce a few paragraphs of text based on the ideology and style of the author on which the model is trained. Our methodology to accomplish this task is based on Bi-LSTM. The Bi-LSTM model is used to make predictions at the character level; during training, the corpus of a specific author is used along with the ground truth corpus. A pre-trained model is used to identify the sentences of the ground truth that contradict the author's corpus, to make our language model inclined. During training, we have achieved a perplexity score of 2.23 at the character level. The experiments show a perplexity score of around 3 over the test dataset.

Submitted 25 October, 2022; originally announced November 2022.

Comments: 11 pages, 7 figures, 3 tables

arXiv:2209.06624 [pdf]

Digital 'nudges' to increase childhood vaccination compliance: Evidence from Pakistan

Authors: Shehryar Munir, Farah Said, Umar Taj, Maida Zafar

Abstract: Pakistan has one of the lowest rates of routine childhood immunization worldwide, with only two-thirds of infants 2 years or younger being fully immunized (Pakistan Demographic and Health Survey 2019). Government-led, routine information campaigns have been disrupted over the last few years due to the on-going COVID-19 pandemic. We use data from a mobile-based campaign that involved sending out short audio dramas emphasizing the importance of vaccines and parental responsibilities in Quetta, Pakistan. Five out of eleven areas designated by the provincial government were randomly selected to receive the audio calls with a lag of 3 months and form the comparison group in our analysis. We conduct a difference-in-difference analysis on data collected by the provincial Department of Health in the 3-month study and find a significant 30% increase over the comparison mean in the number of fully vaccinated children in campaign areas on average. We find evidence that suggests vaccination increased in UCs where vaccination centers were within a short 30-minute travel distance, and that the campaign was successful in changing perceptions about vaccination and reliable sources of advice. Results highlight the need for careful design and targeting of similar soft behavioral change campaigns, catering to the constraints and abilities of the context.

Submitted 14 September, 2022; originally announced September 2022.

arXiv:2205.05161 [pdf, ps, other]

Numerical Solution of the Savage-Hutter Equations for Granular Avalanche Flow using the Discontinuous Galerkin Method

Authors: Abdullah Shah, Muhammad Naveed Zafar, Yulong Du, Li Yuan

Abstract: The Savage-Hutter (SH) equations are a hyperbolic system of nonlinear partial differential equations describing the temporal evolution of the depth and depth-averaged velocity for modelling the avalanche of a shallow layer of granular materials on an inclined surface. These equations admit the occurrence of shock waves and vacuum fronts as in the shallow-water equations while possessing the special reposing state of granular material. In this paper, we develop a third-order Runge-Kutta discontinuous Galerkin (RKDG) method for the numerical solution of the one-dimensional SH equations. We adopt a TVD slope limiter to suppress numerical oscillations near discontinuities. We also give numerical treatments for the avalanche front and for the bed friction to achieve the well-balanced reposing property of granular materials. Numerical results for the avalanche of cohesionless dry granular materials down an inclined plane that transitions smoothly to a horizontal plane, under various internal and bed friction angles and slope angles, are given to show the performance of the present numerical scheme.

Submitted 29 April, 2022; originally announced May 2022.

Comments: 28 pages, 9 figures

arXiv:2203.11103 [pdf, other]

Diverse Counterfactual Explanations for Anomaly Detection in Time Series

Authors: Deborah Sulem, Michele Donini, Muhammad Bilal Zafar, Francois-Xavier Aubet, Jan Gasthaus, Tim Januschowski, Sanjiv Das, Krishnaram Kenthapadi, Cedric Archambeau

Abstract: Data-driven methods that detect anomalies in time series data are ubiquitous in practice, but they are in general unable to provide helpful explanations for the predictions they make. In this work we propose a model-agnostic algorithm that generates counterfactual ensemble explanations for time series anomaly detection models. Our method generates a set of diverse counterfactual examples, i.e., multiple perturbed versions of the original time series that are not considered anomalous by the detection model. Since the magnitude of the perturbations is limited, these counterfactuals represent an ensemble of inputs similar to the original time series that the model would deem normal. Our algorithm is applicable to any differentiable anomaly detection model. We investigate the value of our method on univariate and multivariate real-world datasets and two deep-learning-based anomaly detection models, under several explainability criteria previously proposed in other data domains such as Validity, Plausibility, Closeness and Diversity. We show that our algorithm can produce ensembles of counterfactual examples that satisfy these criteria and, thanks to a novel type of visualisation, can convey a richer interpretation of a model's internal mechanism than existing methods. Moreover, we design a sparse variant of our method to improve the interpretability of counterfactual explanations for high-dimensional time series anomalies.
In this setting, our explanation is localised on only a few dimensions and can therefore be communicated more efficiently to the model's user.

Submitted 21 March, 2022; originally announced March 2022.

Comments: 24 pages, 11 figures

arXiv:2112.14769 [pdf, other]

doi 10.4208/cicp.OA-2021-0256

Frame invariance and scalability of neural operators for partial differential equations

Authors: Muhammad I. Zafar, Jiequn Han, Xu-Hui Zhou, Heng Xiao

Abstract: Partial differential equations (PDEs) play a dominant role in the mathematical modeling of many complex dynamical processes. Solving these PDEs often requires prohibitively high computational costs, especially when multiple evaluations must be made for different parameters or conditions. After training, neural operators can provide PDE solutions significantly faster than traditional PDE solvers. In this work, invariance properties and computational complexity of two neural operators are examined for the transport PDE of a scalar quantity. The neural operator based on graph kernel network (GKN) operates on graph-structured data to incorporate nonlocal dependencies. Here we propose a modified formulation of GKN to achieve frame invariance. Vector cloud neural network (VCNN) is an alternate neural operator with embedded frame invariance which operates on point cloud data. The GKN-based neural operator demonstrates slightly better predictive performance compared to VCNN. However, GKN requires an excessively high computational cost that increases quadratically with the number of discretized objects, as compared to a linear increase for VCNN.

Submitted 27 December, 2021; originally announced December 2021.

arXiv:2112.12444 [pdf, other]

More Than Words: Towards Better Quality Interpretations of Text Classifiers

Authors: Muhammad Bilal Zafar, Philipp Schmidt, Michele Donini, Cédric Archambeau, Felix Biessmann, Sanjiv Ranjan Das, Krishnaram Kenthapadi

Abstract: The large size and complex decision mechanisms of state-of-the-art text classifiers make it difficult for humans to understand their predictions, leading to a potential lack of trust by the users. These issues have led to the adoption of methods like SHAP and Integrated Gradients to explain classification decisions by assigning importance scores to input tokens. However, prior work, using different randomization tests, has shown that interpretations generated by these methods may not be robust. For instance, models making the same predictions on the test set may still lead to different feature importance rankings. In order to address the lack of robustness of token-based interpretability, we explore explanations at higher semantic levels like sentences. We use computational metrics and human subject studies to compare the quality of sentence-based interpretations against token-based ones. Our experiments show that higher-level feature attributions offer several advantages: 1) they are more robust as measured by the randomization tests, 2) they lead to lower variability when using approximation-based methods like SHAP, and 3) they are more intelligible to humans in situations where the linguistic coherence resides at a higher granularity level. Based on these findings, we show that token-based interpretability, while being a convenient first choice given the input interfaces of the ML models, is not the most effective one in all situations.

Submitted 23 December, 2021; originally announced December 2021.

arXiv:2111.13657 [pdf, other]

Amazon SageMaker Model Monitor: A System for Real-Time Insights into Deployed Machine Learning Models

Authors: David Nigenda, Zohar Karnin, Muhammad Bilal Zafar, Raghu Ramesha, Alan Tan, Michele Donini, Krishnaram Kenthapadi

Abstract: With the increasing adoption of machine learning (ML) models and systems in high-stakes settings across different industries, guaranteeing a model's performance after deployment has become crucial. Monitoring models in production is a critical aspect of ensuring their continued performance and reliability. We present Amazon SageMaker Model Monitor, a fully managed service that continuously monitors the quality of machine learning models hosted on Amazon SageMaker. Our system automatically detects data, concept, bias, and feature attribution drift in models in real-time and provides alerts so that model owners can take corrective actions and thereby maintain high quality models. We describe the key requirements obtained from customers, system design and architecture, and methodology for detecting different types of drift. Further, we provide quantitative evaluations followed by use cases, insights, and lessons learned from more than two years of production deployment.

Submitted 5 August, 2022; v1 submitted 26 November, 2021; originally announced November 2021.

arXiv:2109.03285 [pdf, other]

doi 10.1145/3447548.3467177

Amazon SageMaker Clarify: Machine Learning Bias Detection and Explainability in the Cloud

Authors: Michaela Hardt, Xiaoguang Chen, Xiaoyi Cheng, Michele Donini, Jason Gelman, Satish Gollaprolu, John He, Pedro Larroy, Xinyu Liu, Nick McCarthy, Ashish Rathi, Scott Rees, Ankit Siva, ErhYuan Tsai, Keerthan Vasist, Pinar Yilmaz, Muhammad Bilal Zafar, Sanjiv Das, Kevin Haas, Tyler Hill, Krishnaram Kenthapadi

Abstract: Understanding the predictions made by machine learning (ML) models and their potential biases remains a challenging and labor-intensive task that depends on the application, the dataset, and the specific model. We present Amazon SageMaker Clarify, an explainability feature for Amazon SageMaker that launched in December 2020, providing insights into data and ML models by identifying biases and explaining predictions. It is deeply integrated into Amazon SageMaker, a fully managed service that enables data scientists and developers to build, train, and deploy ML models at any scale. Clarify supports bias detection and feature importance computation across the ML lifecycle, during data preparation, model evaluation, and post-deployment monitoring. We outline the desiderata derived from customer input, the modular architecture, and the methodology for bias and explanation computations. Further, we describe the technical challenges encountered and the tradeoffs we had to make. For illustration, we discuss two customer use cases. We present our deployment results including qualitative customer feedback and a quantitative evaluation. Finally, we summarize lessons learned, and discuss best practices for the successful adoption of fairness and explanation tools in practice.

Submitted 7 September, 2021; originally announced September 2021.

Journal ref: In Proc. ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD), 2974-2983 (2021)

arXiv:2107.05978 [pdf, other]

DIVINE: Diverse Influential Training Points for Data Visualization and Model Refinement

Authors: Umang Bhatt, Isabel Chien, Muhammad Bilal Zafar, Adrian Weller

Abstract: As the complexity of machine learning (ML) models increases, resulting in a lack of prediction explainability, several methods have been developed to explain a model's behavior in terms of the training data points that most influence the model. However, these methods tend to mark outliers as highly influential points, limiting the insights that practitioners can draw from points that are not representative of the training data. In this work, we take a step towards finding influential training points that also represent the training data well. We first review methods for assigning importance scores to training points. Given importance scores, we propose a method to select a set of DIVerse INfluEntial (DIVINE) training points as a useful explanation of model behavior. As practitioners might not only be interested in finding data points influential with respect to model accuracy, but also with respect to other important metrics, we show how to evaluate training data points on the basis of group fairness. Our method can identify unfairness-inducing training points, which can be removed to improve fairness outcomes. Our quantitative experiments and user studies show that visualizing DIVINE points helps practitioners understand and explain model behavior better than earlier approaches.

Submitted 13 July, 2021; originally announced July 2021.

Comments: 30 pages, 32 figures

arXiv:2107.04566 [pdf]

Multi-level Stress Assessment from ECG in a Virtual Reality Environment using Multimodal Fusion

Authors: Zeeshan Ahmad, Suha Rabbani, Muhammad Rehman Zafar, Syem Ishaque, Sridhar Krishnan, Naimul Khan

Abstract: ECG is an attractive option to assess stress in serious Virtual Reality (VR) applications due to its non-invasive nature. However, the existing Machine Learning (ML) models perform poorly. Moreover, existing studies only perform a binary stress assessment, while to develop a more engaging biofeedback-based application, multi-level assessment is necessary. Existing studies annotate and classify a single experience (e.g. watching a VR video) to a single stress level, which again prevents design of dynamic experiences where real-time in-game stress assessment can be utilized. In this paper, we report our findings from a new study on VR stress assessment, where three stress levels are assessed. ECG data was collected from 9 users experiencing a VR roller coaster. The VR experience was then manually labeled in 10-second segments with one of three stress levels by three raters. We then propose a novel multimodal deep fusion model utilizing spectrogram and 1D ECG that can provide a stress prediction from just a 1-second window. Experimental results demonstrate that the proposed model outperforms the classical HRV-based ML models (9% increase in accuracy) and baseline deep learning models (2.5% increase in accuracy). We also report results on the benchmark WESAD dataset to show the superiority of the model.

Submitted 9 July, 2021; originally announced July 2021.

Comments: Under review

arXiv:2106.12639 [pdf, other]

Multi-objective Asynchronous Successive Halving

Authors: Robin Schmucker, Michele Donini, Muhammad Bilal Zafar, David Salinas, Cédric Archambeau

Abstract: Hyperparameter optimization (HPO) is increasingly used to automatically tune the predictive performance (e.g., accuracy) of machine learning models. However, in a plethora of real-world applications, accuracy is only one of the multiple -- often conflicting -- performance criteria, necessitating the adoption of a multi-objective (MO) perspective. While the literature on MO optimization is rich, few prior studies have focused on HPO. In this paper, we propose algorithms that extend asynchronous successive halving (ASHA) to the MO setting. Considering multiple evaluation metrics, we assess the performance of these methods on three real-world tasks: (i) neural architecture search, (ii) algorithmic fairness and (iii) language model optimization. Our empirical analysis shows that MO ASHA enables MO HPO at scale. Further, we observe that taking the entire Pareto front into account for candidate selection consistently outperforms multi-fidelity HPO based on MO scalarization in terms of wall-clock time. Our algorithms (to be open-sourced) establish new baselines for future research in the area.

Submitted 23 June, 2021; originally announced June 2021.

arXiv:2106.07359 [pdf, other]

MexPub: Deep Transfer Learning for Metadata Extraction from German Publications

Authors: Zeyd Boukhers, Nada Beili, Timo Hartmann, Prantik Goswami, Muhammad Arslan Zafar

Abstract: Extracting metadata from scientific papers can be considered a solved problem in NLP due to the high accuracy of state-of-the-art methods. However, this does not apply to German scientific publications, which have a variety of styles and layouts. In contrast to most English scientific publications, which follow standard and simple layouts, the order, content, position and size of metadata in German publications vary greatly among publications. This variety makes traditional NLP methods fail to accurately extract metadata from these publications. In this paper, we present a method that extracts metadata from PDF documents with different layouts and styles by viewing the document as an image. We used Mask R-CNN trained on the COCO dataset and fine-tuned on the PubLayNet dataset, which consists of ~200K PDF snapshots with five basic classes (e.g. text, figure, etc). We further fine-tuned the model on our proposed synthetic dataset consisting of ~30K article snapshots to extract nine patterns (i.e. author, title, etc). Our synthetic dataset is generated using contents in both German and English and a finite set of challenging templates obtained from German publications. Our method achieved an average accuracy of around 90%, which validates its capability to accurately extract metadata from a variety of PDF documents with challenging templates.

Submitted 4 June, 2021; originally announced June 2021.

Comments: A long version of an accepted paper @ JCDL 2021

arXiv:2106.04631 [pdf, other]

On the Lack of Robust Interpretability of Neural Text Classifiers

Authors: Muhammad Bilal Zafar, Michele Donini, Dylan Slack, Cédric Archambeau, Sanjiv Das, Krishnaram Kenthapadi

Abstract: With the ever-increasing complexity of neural language models, practitioners have turned to methods for understanding the predictions of these models. One of the most well-adopted approaches for model interpretability is feature-based interpretability, i.e., ranking the features in terms of their impact on model predictions. Several prior studies have focused on assessing the fidelity of feature-based interpretability methods, i.e., measuring the impact of dropping the top-ranked features on the model output. However, relatively little work has been conducted on quantifying the robustness of interpretations. In this work, we assess the robustness of interpretations of neural text classifiers, specifically, those based on pretrained Transformer encoders, using two randomization tests. The first compares the interpretations of two models that are identical except for their initializations. The second measures whether the interpretations differ between a model with trained parameters and a model with random parameters. Both tests show surprising deviations from expected behavior, raising questions about the extent of insights that practitioners may draw from interpretations.

Submitted 8 June, 2021; originally announced June 2021.

Comments: Appearing at ACL Findings 2021

arXiv:2105.04273 [pdf, other]

doi 10.1145/3461702.3462630

Loss-Aversively Fair Classification

Authors: Junaid Ali, Muhammad Bilal Zafar, Adish Singla, Krishna P. Gummadi

Abstract: The use of algorithmic (learning-based) decision making in scenarios that affect human lives has motivated a number of recent studies to investigate such decision making systems for potential unfairness, such as discrimination against subjects based on their sensitive features like gender or race. However, when judging the fairness of a newly designed decision making system, these studies have overlooked an important influence on people's perceptions of fairness, which is how the new algorithm changes the status quo, i.e., decisions of the existing decision making system. Motivated by extensive literature in behavioral economics and behavioral psychology (prospect theory), we propose a notion of fair updates that we refer to as loss-averse updates. Loss-averse updates constrain the updates to yield improved (more beneficial) outcomes to subjects compared to the status quo. We propose tractable proxy measures that would allow this notion to be incorporated in the training of a variety of linear and non-linear classifiers. We show how our proxy measures can be combined with existing measures for training nondiscriminatory classifiers. Our evaluation using synthetic and real-world datasets demonstrates that the proposed proxy measures are effective for their desired tasks.

Submitted 10 May, 2021; originally announced May 2021.

Comments: 8 pages, Accepted at AIES 2019

Journal ref: In AAAI/ACM Conference on AI, Ethics, and Society (AIES 2019), January 27-28 2019 Honolulu, HI, USA

arXiv:2105.03153 [pdf, other]

Pairwise Fairness for Ordinal Regression

Authors: Matthäus Kleindessner, Samira Samadi, Muhammad Bilal Zafar, Krishnaram Kenthapadi, Chris Russell

Abstract: We initiate the study of fairness for ordinal regression. We adapt two fairness notions previously considered in fair ranking and propose a strategy for training a predictor that is approximately fair according to either notion. Our predictor has the form of a threshold model, composed of a scoring function and a set of thresholds, and our strategy is based on a reduction to fair binary classification for learning the scoring function and local search for choosing the thresholds. We provide generalization guarantees on the error and fairness violation of our predictor, and we illustrate the effectiveness of our approach in extensive experiments.

Submitted 11 February, 2022; v1 submitted 7 May, 2021; originally announced May 2021.

arXiv:2103.13898 [pdf, other]

Recurrent Neural Network for End-to-End Modeling of Laminar-Turbulent Transition

Authors: Muhammad I. Zafar, Meelan M. Choudhari, Pedro Paredes, Heng Xiao

Abstract: Accurate prediction of laminar-turbulent transition is a critical element of computational fluid dynamics simulations for aerodynamic design across multiple flow regimes. Traditional methods of transition prediction cannot be easily extended to flow configurations where the transition process depends on a large set of parameters. In comparison, neural network methods allow higher dimensional input features to be considered without compromising the efficiency and accuracy of the traditional data driven models. Neural network methods proposed earlier follow a cumbersome methodology of predicting instability growth rates over a broad range of frequencies, which are then processed to obtain the N-factor envelope, and then, the transition location based on the correlating N-factor. This paper presents an end-to-end transition model based on a recurrent neural network, which sequentially processes the mean boundary-layer profiles along the surface of the aerodynamic body to directly predict the N-factor envelope and the transition locations over a two-dimensional airfoil. The proposed transition model has been developed and assessed using a large database of 53 airfoils over a wide range of chord Reynolds numbers and angles of attack. The sequence-to-sequence transduction model proposed herein provides a more direct approach for accurate predictions of the transition location than the earlier neural network methods, which predict the local amplification rate of a single instability mode at a fixed location along the airfoil. The large universe of airfoils encountered in various applications causes additional difficulties. As such, we provide further insights on selecting training datasets from large amounts of available data.

Submitted 15 June, 2021; v1 submitted 25 March, 2021; originally announced March 2021.

Comments: Submitted to Data-Centric Engineering journal

arXiv:2012.07955 [pdf]

Importance of Signal and Image Processing in Photoacoustic Imaging

Authors: Mohsin Zafar, Qiuyun Xu, Rayyan Manwar

Abstract: Photoacoustic imaging (PAI) is a powerful imaging modality that relies on the PA effect. PAI works on the principle of electromagnetic energy absorption by the exogenous contrast agents and/or endogenous molecules present in the biological tissue, consequently generating ultrasound waves. PAI combines a high optical contrast with a high acoustic spatiotemporal resolution, allowing the non-invasive visualization of absorbers at deep structures. However, due to the optical diffusion and ultrasound attenuation in heterogeneous turbid biological tissue, the quality of the PA images is deteriorated. Therefore, signal and image processing techniques are imperative in PAI to provide high quality images with detailed structural and functional information in deep tissues. Here, we review various signal and image processing techniques that have been developed/implemented in PAI. Our goal is to highlight the importance of image computing in photoacoustic imaging.

Submitted 14 December, 2020; originally announced December 2020.

Comments: 20 pages, 6 figures

arXiv:2007.00251 [pdf, other]

Unifying Model Explainability and Robustness via Machine-Checkable Concepts

Authors: Vedant Nanda, Till Speicher, John P. Dickerson, Krishna P. Gummadi, Muhammad Bilal Zafar

Abstract: As deep neural networks (DNNs) get adopted in an ever-increasing number of applications, explainability has emerged as a crucial desideratum for these models. In many real-world tasks, one of the principal reasons for requiring explainability is to in turn assess prediction robustness, where predictions (i.e., class labels) that do not conform to their respective explanations (e.g., presence or absence of a concept in the input) are deemed to be unreliable. However, most, if not all, prior methods for checking explanation-conformity (e.g., LIME, TCAV, saliency maps) require significant manual intervention, which hinders their large-scale deployability. In this paper, we propose a robustness-assessment framework, at the core of which is the idea of using machine-checkable concepts. Our framework defines a large number of concepts that the DNN explanations could be based on and performs the explanation-conformity check at test time to assess prediction robustness. Both steps are executed in an automated manner without requiring any human intervention and are easily scaled to datasets with a very large number of classes. Experiments on real-world datasets and human surveys show that our framework is able to enhance prediction robustness significantly: the predictions marked to be robust by our framework have significantly higher accuracy and are more robust to adversarial perturbations.

Submitted 2 July, 2020; v1 submitted 1 July, 2020; originally announced July 2020.

Comments: 22 pages, 12 figures, 11 tables

arXiv:2006.05109 [pdf, other]

Fair Bayesian Optimization

Authors: Valerio Perrone, Michele Donini, Muhammad Bilal Zafar, Robin Schmucker, Krishnaram Kenthapadi, Cédric Archambeau

Abstract: Given the increasing importance of machine learning (ML) in our lives, several algorithmic fairness techniques have been proposed to mitigate biases in the outcomes of the ML models. However, most of these techniques are specialized to cater to a single family of ML models and a specific definition of fairness, limiting their adaptability in practice. We introduce a general constrained Bayesian optimization (BO) framework to optimize the performance of any ML model while enforcing one or multiple fairness constraints. BO is a model-agnostic optimization method that has been successfully applied to automatically tune the hyperparameters of ML models. We apply BO with fairness constraints to a range of popular models, including random forests, gradient boosting, and neural networks, showing that we can obtain accurate and fair solutions by acting solely on the hyperparameters. We also show empirically that our approach is competitive with specialized techniques that enforce model-specific fairness constraints, and outperforms preprocessing methods that learn fair representations of the input data. Moreover, our method can be used in synergy with such specialized fairness techniques to tune their hyperparameters. Finally, we study the relationship between fairness and the hyperparameters selected by BO. We observe a correlation between regularization and unbiased models, explaining why acting on the hyperparameters leads to ML models that generalize well and are fair.

Submitted 18 June, 2021; v1 submitted 9 June, 2020; originally announced June 2020.

arXiv:2005.02599 [pdf, other]

Convolutional Neural Network for Transition Modeling Based on Linear Stability Theory

Authors: Muhammad I. Zafar, Heng Xiao, Meelan M. Choudhari, Fei Li, Chau-Lyan Chang, Pedro Paredes, Balaji Venkatachari

Abstract: Transition prediction is an important aspect of aerodynamic design because of its impact on skin friction and potential coupling with flow separation characteristics. Traditionally, the modeling of transition has relied on correlation-based empirical formulas based on integral quantities such as the shape factor of the boundary layer. However, in many applications of computational fluid dynamics, the shape factor is not straightforwardly available or not well-defined. We propose using the complete velocity profile along with other quantities (e.g., frequency, Reynolds number) to predict the perturbation amplification factor. While this can be achieved with regression models based on a classical fully connected neural network, such a model can be computationally more demanding. We propose a novel convolutional neural network inspired by the underlying physics as described by the stability equations. Specifically, convolutional layers are first used to extract integral quantities from the velocity profiles, and then fully connected layers are used to map the extracted integral quantities, along with frequency and Reynolds number, to the output (amplification ratio). Numerical tests on classical boundary layers clearly demonstrate the merits of the proposed method. More importantly, we demonstrate that, for Tollmien-Schlichting instabilities in two-dimensional, low-speed boundary layers, the proposed network encodes information in the boundary layer profiles into an integral quantity that is strongly correlated to a well-known, physically defined parameter -- the shape factor.

Submitted 6 May, 2020; originally announced May 2020.

Comments: 15 pages, 7 figures, submitted to Physical Review Fluids journal

arXiv:2003.08803 [pdf]

Deep Object Detection based Mitosis Analysis in Breast Cancer Histopathological Images

Authors: Anabia Sohail, Muhammad Ahsan Mukhtar, Asifullah Khan, Muhammad Mohsin Zafar, Aneela Zameer, Saranjam Khan

Abstract: Empirical evaluation of breast tissue biopsies for mitotic nuclei detection is considered an important prognostic biomarker in tumor grading and cancer progression. However, automated mitotic nuclei detection poses several challenges because of the unavailability of pixel-level annotations, different morphological configurations of mitotic nuclei, their sparse representation, and close resemblance with non-mitotic nuclei. These challenges undermine the precision of the automated detection model and thus make detection difficult in a single phase. This work proposes an end-to-end detection system for mitotic nuclei identification in breast cancer histopathological images. Deep object detection-based Mask R-CNN is adapted for mitotic nuclei detection and initially selects the candidate mitotic regions with maximum recall. In the second phase, these candidate regions are refined by a multi-object loss function to improve the precision. The performance of the proposed detection model shows improved discrimination ability (F-score of 0.86) for mitotic nuclei with significant precision (0.86) as compared to the two-stage detection models (F-score of 0.701) on the TUPAC16 dataset. Promising results suggest that the deep object detection-based model has the potential to learn the characteristic features of mitotic nuclei from weakly annotated data and that it can be adapted for the identification of other nuclear bodies in histopathological images.

Submitted 16 March, 2020; originally announced March 2020.

Comments: Tables: 4, Figures 11, Pages: 21

arXiv:1906.10263 [pdf, other]

DLIME: A Deterministic Local Interpretable Model-Agnostic Explanations Approach for Computer-Aided Diagnosis Systems

Authors: Muhammad Rehman Zafar, Naimul Mefraz Khan

Abstract: Local Interpretable Model-Agnostic Explanations (LIME) is a popular technique used to increase the interpretability and explainability of black box Machine Learning (ML) algorithms. LIME typically generates an explanation for a single prediction by any ML model by learning a simpler interpretable model (e.g. linear classifier) around the prediction through generating simulated data around the instance by random perturbation, and obtaining feature importance through applying some form of feature selection. While LIME and similar local algorithms have gained popularity due to their simplicity, the random perturbation and feature selection methods result in "instability" in the generated explanations, where for the same prediction, different explanations can be generated. This is a critical issue that can prevent deployment of LIME in a Computer-Aided Diagnosis (CAD) system, where stability is of utmost importance to earn the trust of medical professionals. In this paper, we propose a deterministic version of LIME. Instead of random perturbation, we utilize agglomerative Hierarchical Clustering (HC) to group the training data together and K-Nearest Neighbour (KNN) to select the relevant cluster of the new instance that is being explained. After finding the relevant cluster, a linear model is trained over the selected cluster to generate the explanations. Experimental results on three different medical datasets show the superiority of Deterministic Local Interpretable Model-Agnostic Explanations (DLIME), where we quantitatively determine the stability of DLIME compared to LIME utilizing the Jaccard similarity among multiple generated explanations.

Submitted 24 June, 2019; originally announced June 2019.

arXiv:1903.09711 [pdf, other]

doi 10.23919/ACC45564.2020.9147864

Barrier Functions in Cascaded Controller: Safe Quadrotor Control

Authors: Mouhyemen Khan, Munzir Zafar, Abhijit Chatterjee

Abstract: Safe control for inherently unstable systems such as quadrotors is crucial. Imposing multiple dynamic constraints simultaneously on the states for safety regulation can be a challenging problem. In this paper, we propose a quadratic programming (QP) based approach on a cascaded control architecture for quadrotors to enforce safety. Safety regions are constructed using control barrier functions (CBF) while explicitly considering the nonlinear underactuated dynamics of the quadrotor. The safety regions constructed using CBFs establish a non-conservative forward invariant safe region for quadrotor navigation. Barriers imposed across the cascaded architecture allow independent safety regulation in the quadrotor's altitude and lateral domains. Despite barriers appearing in a cascaded fashion, we show preservation of safety for quadrotor motion in SE(3). We demonstrate the feasibility of our method on a quadrotor in simulation with static and dynamic constraints enforced on position and velocity spaces simultaneously.

Submitted 17 February, 2020; v1 submitted 22 March, 2019; originally announced March 2019.

Comments: Submitted to ACC 2020, 8 pages, 7 figures

arXiv:1902.09987 [pdf]

Review of Cost Reduction Methods in Photoacoustic Computed Tomography

Authors: Afreen Fatima, Karl Kratkiewicz, Rayyan Manwar, Mohsin Zafar, Ruiying Zhang, Bin Huang, Neda Dadashzadesh, Jun Xia, Mohammad Avanaki

Abstract: Photoacoustic Computed Tomography (PACT) is a major configuration of photoacoustic imaging, a hybrid noninvasive modality for both functional and molecular imaging. PACT has rapidly gained importance in the field of biomedical imaging due to superior performance as compared to conventional optical imaging counterparts. However, the overall cost of developing a PACT system is one of the challenges towards clinical translation of this novel technique. The cost of a typical commercial PACT system originates from the optical source, ultrasound detector, and data acquisition unit. With growing applications of photoacoustic imaging, there is a tremendous demand for reducing its cost. In this review article, we discuss various approaches to reduce the overall cost of a PACT system and provide a cost estimation to build a low-cost PACT system.

Submitted 29 May, 2019; v1 submitted 26 February, 2019; originally announced February 2019.

Comments: 33 pages, 10 figures, 5 tables

arXiv:1810.03076 [pdf, ps, other]

Online Center of Mass Estimation for a Humanoid Wheeled Inverted Pendulum Robot

Authors: Munzir Zafar, Akash Patel, Bogdan Vlahov, Nathaniel Glaser, Sergio Aguillera, Seth Hutchinson

Abstract: We present a novel application of robust control and online learning for the balancing of an n Degree of Freedom (DoF), Wheeled Inverted Pendulum (WIP) humanoid robot. Our technique condenses the inaccuracies of a mass model into a Center of Mass (CoM) error, balances despite this error, and uses online learning to update the mass model for a better CoM estimate. Using a simulated model of our robot, we meta-learn a set of excitatory joint poses that makes our gradient descent algorithm quickly converge to an accurate CoM estimate. This simulated pipeline executes in a fully online fashion, using active disturbance rejection to address the mass errors that result from a steadily evolving mass model. Experiments were performed on a 19 DoF WIP, in which we manually acquired the data for the learned set of poses and show that the mass model produced by gradient descent yields a CoM estimate that improves overall control and efficiency. This work contributes to a greater corpus of whole body control on the Golem Krang humanoid robot.

Submitted 14 May, 2019; v1 submitted 6 October, 2018; originally announced October 2018.

arXiv:1810.03074 [pdf, other]

Hierarchical Optimization for Whole-Body Control of Wheeled Inverted Pendulum Humanoids

Authors: Munzir Zafar, Seth Hutchinson, Evangelos A. Theodorou

Abstract: In this paper, we present a whole-body control framework for Wheeled Inverted Pendulum (WIP) Humanoids. WIP Humanoids are redundant manipulators dynamically balancing themselves on wheels. Characterized by several degrees of freedom, they have the ability to perform several tasks simultaneously, such as balancing, maintaining a body pose, controlling the gaze, lifting a load or maintaining end-effector configuration in operation space. The problem of whole-body control is to enable simultaneous performance of these tasks with optimal participation of all degrees of freedom at specified priorities for each objective. The control also has to obey angle and torque limit constraints on each joint. The proposed approach is hierarchical, with a low-level controller for body joint manipulation and a high-level controller that defines center of mass (CoM) targets for the low-level controller to control zero dynamics of the system driving the wheels. The low-level controller plans for shorter horizons while considering more complete dynamics of the system, while the high-level controller plans for longer horizons based on an approximate model of the robot for computational efficiency.

Submitted 6 October, 2018; originally announced October 2018.

arXiv:1807.00787 [pdf, other]

doi 10.1145/3219819.3220046

A Unified Approach to Quantifying Algorithmic Unfairness: Measuring Individual & Group Unfairness via Inequality Indices

Authors: Till Speicher, Hoda Heidari, Nina Grgic-Hlaca, Krishna P. Gummadi, Adish Singla, Adrian Weller, Muhammad Bilal Zafar

Abstract: Discrimination via algorithmic decision making has received considerable attention. Prior work largely focuses on defining conditions for fairness, but does not define satisfactory measures of algorithmic unfairness. In this paper, we focus on the following question: Given two unfair algorithms, how should we determine which of the two is more unfair? Our core idea is to use existing inequality indices from economics to measure how unequally the outcomes of an algorithm benefit different individuals or groups in a population. Our work offers a justified and general framework to compare and contrast the (un)fairness of algorithmic predictors. This unifying approach enables us to quantify unfairness both at the individual and the group level. Further, our work reveals overlooked tradeoffs between different fairness notions: using our proposed measures, the overall individual-level unfairness of an algorithm can be decomposed into a between-group and a within-group component. Earlier methods are typically designed to tackle only between-group unfairness, which may be justified for legal or other reasons. However, we demonstrate that minimizing exclusively the between-group component may, in fact, increase the within-group, and hence the overall unfairness. We characterize and illustrate the tradeoffs between our measures of (un)fairness and the prediction accuracy.

Submitted 2 July, 2018; originally announced July 2018.

Comments: 12 pages, 7 figures. To be published in: KDD '18: The 24th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining Proceedings

arXiv:1707.00010 [pdf, other]

From Parity to Preference-based Notions of Fairness in Classification

Authors: Muhammad Bilal Zafar, Isabel Valera, Manuel Gomez Rodriguez, Krishna P. Gummadi, Adrian Weller

Abstract: The adoption of automated, data-driven decision making in an ever expanding range of applications has raised concerns about its potential unfairness towards certain social groups. In this context, a number of recent studies have focused on defining, detecting, and removing unfairness from data-driven decision systems. However, the existing notions of fairness, based on parity (equality) in treatment or outcomes for different social groups, tend to be quite stringent, limiting the overall decision making accuracy. In this paper, we draw inspiration from the fair-division and envy-freeness literature in economics and game theory and propose preference-based notions of fairness -- given the choice between various sets of decision treatments or outcomes, any group of users would collectively prefer its treatment or outcomes, regardless of the (dis)parity as compared to the other groups. Then, we introduce tractable proxies to design margin-based classifiers that satisfy these preference-based notions of fairness. Finally, we experiment with a variety of synthetic and real-world datasets and show that preference-based fairness allows for greater decision accuracy than parity-based fairness.

Submitted 28 November, 2017; v1 submitted 30 June, 2017; originally announced July 2017.

Comments: To appear in Proceedings of the 31st Conference on Neural Information Processing Systems (NIPS 2017). Code available at: https://github.com/mbilalzafar/fair-classification

arXiv:1706.10208 [pdf, other]

On Fairness, Diversity and Randomness in Algorithmic Decision Making

Authors: Nina Grgić-Hlača, Muhammad Bilal Zafar, Krishna P. Gummadi, Adrian Weller

Abstract: Consider a binary decision making process where a single machine learning classifier replaces a multitude of humans. We raise questions about the resulting loss of diversity in the decision making process. We study the potential benefits of using random classifier ensembles instead of a single classifier in the context of fairness-aware learning and demonstrate various attractive properties: (i) an ensemble of fair classifiers is guaranteed to be fair, for several different measures of fairness, (ii) an ensemble of unfair classifiers can still achieve fair outcomes, and (iii) an ensemble of classifiers can achieve better accuracy-fairness trade-offs than a single classifier. Finally, we introduce notions of distributional fairness to characterize further potential benefits of random classifier ensembles.

Submitted 30 June, 2017; originally announced June 2017.

Comments: Presented as a poster at the 2017 Workshop on Fairness, Accountability, and Transparency in Machine Learning (FAT/ML 2017)

arXiv:1704.01442 [pdf, other]

Characterizing Information Diets of Social Media Users

Authors: Juhi Kulshrestha, Muhammad Bilal Zafar, Lisette Espin-Noboa, Krishna P. Gummadi, Saptarshi Ghosh

Abstract: With the widespread adoption of social media sites like Twitter and Facebook, there has been a shift in the way information is produced and consumed. Earlier, the only producers of information were traditional news organizations, which broadcast the same carefully-edited information to all consumers over mass media channels. Whereas, now, in online social media, any user can be a producer of information, and every user selects which other users she connects to, thereby choosing the information she consumes. Moreover, the personalized recommendations that most social media sites provide also contribute towards the information consumed by individual users. In this work, we define a concept of information diet -- which is the topical distribution of a given set of information items (e.g., tweets) -- to characterize the information produced and consumed by various types of users in the popular Twitter social media. At a high level, we find that (i) popular users mostly produce very specialized diets focusing on only a few topics; in fact, news organizations (e.g., NYTimes) produce much more focused diets on social media as compared to their mass media diets, (ii) most users' consumption diets are primarily focused towards one or two topics of their interest, and (iii) the personalized recommendations provided by Twitter help to mitigate some of the topical imbalances in the users' consumption diets, by adding information on diverse topics apart from the users' primary topics of interest.

Submitted 5 April, 2017; originally announced April 2017.

Comments: In Proceedings of the International AAAI Conference on Web and Social Media (ICWSM), Oxford, UK, May 2015

arXiv:1704.01347 [pdf, ps, other]

doi 10.1145/2998181.2998321

Quantifying Search Bias: Investigating Sources of Bias for Political Searches in Social Media

Authors: Juhi Kulshrestha, Motahhare Eslami, Johnnatan Messias, Muhammad Bilal Zafar, Saptarshi Ghosh, Krishna P. Gummadi, Karrie Karahalios

Abstract: Search systems in online social media sites are frequently used to find information about ongoing events and people. For topics with multiple competing perspectives, such as political events or political candidates, bias in the top ranked results significantly shapes public opinion. However, bias does not emerge from an algorithm alone. It is important to distinguish between the bias that arises from the data that serves as the input to the ranking system and the bias that arises from the ranking system itself. In this paper, we propose a framework to quantify these distinct biases and apply this framework to politics-related queries on Twitter. We found that both the input data and the ranking system contribute significantly to produce varying amounts of bias in the search results and in different ways. We discuss the consequences of these biases and possible mechanisms to signal this bias in social media search systems' interfaces.

Submitted 5 April, 2017; originally announced April 2017.

Comments: In Proceedings of the ACM Conference on Computer Supported Cooperative Work & Social Computing (CSCW), Portland, USA, February 2017

Showing 1–50 of 57 results for author: Zafar, M