-
Teaching Transformers Causal Reasoning through Axiomatic Training
Authors:
Aniket Vashishtha,
Abhinav Kumar,
Atharva Pandey,
Abbavaram Gowtham Reddy,
Kabir Ahuja,
Vineeth N Balasubramanian,
Amit Sharma
Abstract:
For text-based AI systems to interact in the real world, causal reasoning is an essential skill. Since active interventions are costly, we study to what extent a system can learn causal reasoning from symbolic demonstrations of causal axioms. Specifically, we present an axiomatic training method where the system learns from multiple demonstrations of a causal axiom (or rule), rather than incorporating the axiom as an inductive bias or inferring it from data values. A key question is whether the system would learn to generalize from the axiom demonstrations to more complex scenarios. Our results, based on applying axiomatic training to learn the transitivity axiom and d-separation rule, indicate that such generalization is possible. To avoid data contamination issues, we start with a 67 million parameter transformer model and train it from scratch. On both tasks, we find that a model trained on linear causal chains (along with some noisy variations) can generalize well to complex graphs, including longer causal chains, causal chains with reversed order, and graphs with branching. To handle diverse text inputs, the same method is extended to finetune language models. Finetuning the Llama-3.1 8B model on our axiomatic data leads to significant gains on causal benchmarks such as Corr2Cause and CLEAR, in some cases providing state-of-the-art performance surpassing GPT-4.
Submitted 15 April, 2025; v1 submitted 10 July, 2024;
originally announced July 2024.
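The idea of learning the transitivity axiom from demonstrations on linear causal chains can be illustrated with a small data generator. This is only a minimal sketch: the sentence template ("A causes B. ... Does A cause B?"), the variable names, and the helper `transitivity_example` are assumptions made for illustration and are not taken from the paper.

```python
import random


def transitivity_example(chain, rng):
    """Build one (prompt, label) pair from a linear causal chain.

    `chain` is an ordered list of variable names, read as
    chain[0] -> chain[1] -> ... -> chain[-1].
    The text template is hypothetical, not the paper's exact format.
    """
    # State every direct edge of the chain as a premise.
    premise = " ".join(f"{a} causes {b}." for a, b in zip(chain, chain[1:]))
    # Pick two distinct variables to ask about.
    a, b = rng.sample(chain, 2)
    # In a linear chain, a causes b exactly when a appears before b.
    label = "Yes" if chain.index(a) < chain.index(b) else "No"
    return f"{premise} Does {a} cause {b}?", label


rng = random.Random(0)
prompt, label = transitivity_example(["X1", "X2", "X3", "X4"], rng)
print(prompt, label)
```

A training set in this spirit would contain many such pairs over chains of varying length, with evaluation on longer or branched graphs that were never seen during training.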
-
Causal Order: The Key to Leveraging Imperfect Experts in Causal Inference
Authors:
Aniket Vashishtha,
Abbavaram Gowtham Reddy,
Abhinav Kumar,
Saketh Bachu,
Vineeth N Balasubramanian,
Amit Sharma
Abstract:
Large Language Models (LLMs) have been used as experts to infer causal graphs, often by repeatedly applying a pairwise prompt that asks about the causal relationship of each variable pair. However, such experts, including human domain experts, cannot distinguish between direct and indirect effects given a pairwise prompt. Therefore, instead of the graph, we propose that causal order be used as a more stable output interface for utilizing expert knowledge. Even when querying a perfect expert with a pairwise prompt, we show that the inferred graph can have significant errors whereas the causal order is always correct. In practice, however, LLMs are imperfect experts and we find that pairwise prompts lead to multiple cycles. Hence, we propose the triplet method, a novel querying strategy that introduces an auxiliary variable for every variable pair and instructs the LLM to avoid cycles within this triplet. It then uses a voting-based ensemble method that results in higher accuracy and fewer cycles while ensuring cost efficiency. Across multiple real-world graphs, such a triplet-based method yields a more accurate order than the pairwise prompt, using both LLMs and human annotators. The triplet method enhances robustness by repeatedly querying an expert with different auxiliary variables, enabling smaller models like Phi-3 and Llama-3 8B Instruct to surpass GPT-4 with pairwise prompting. For practical usage, we show how the expert-provided causal order from the triplet method can be used to reduce error in downstream graph discovery and effect inference tasks.
Submitted 7 April, 2025; v1 submitted 23 October, 2023;
originally announced October 2023.
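The claim that a perfect expert answering pairwise prompts yields a graph with errors but a correct causal order can be checked on a toy chain. The sketch below assumes that a perfect pairwise expert answers "yes" whenever one variable is an ancestor of another (direct or indirect); the three-node graph and all function names are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical ground-truth DAG: A -> B -> C (no direct A -> C edge).
true_edges = {("A", "B"), ("B", "C")}


def pairwise_expert_graph(edges):
    """Edges a perfect pairwise expert would report: every (ancestor, descendant) pair."""
    nodes = {n for e in edges for n in e}
    children = {n: set() for n in nodes}
    for u, v in edges:
        children[u].add(v)
    reported = set()
    for source in nodes:
        stack, seen = list(children[source]), set()
        while stack:
            v = stack.pop()
            if v not in seen:
                seen.add(v)
                stack.extend(children[v])
        reported |= {(source, v) for v in seen}
    return reported


def causal_order(edges):
    """Return one topological order of the graph defined by `edges`."""
    preds = {}
    for u, v in edges:
        preds.setdefault(u, set())
        preds.setdefault(v, set()).add(u)
    return list(TopologicalSorter(preds).static_order())


def respects(order, edges):
    """True if every edge points forward in `order`."""
    pos = {n: i for i, n in enumerate(order)}
    return all(pos[u] < pos[v] for u, v in edges)


reported = pairwise_expert_graph(true_edges)
extra_edges = reported - true_edges          # indirect effects reported as edges
order = causal_order(reported)
print(extra_edges, order, respects(order, true_edges))
```

Here the reported graph contains the spurious edge A -> C, yet any topological order of it is still a valid causal order for the true graph, which is the motivation for using order as the output interface.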
-
On Evaluating and Mitigating Gender Biases in Multilingual Settings
Authors:
Aniket Vashishtha,
Kabir Ahuja,
Sunayana Sitaram
Abstract:
While understanding and removing gender biases in language models has been a long-standing problem in Natural Language Processing, prior research has primarily been limited to English. In this work, we investigate some of the challenges with evaluating and mitigating biases in multilingual settings, which stem from a lack of existing benchmarks and resources for bias evaluation beyond English, especially for non-Western contexts. We first create a benchmark for evaluating gender biases in pre-trained masked language models by extending DisCo to different Indian languages using human annotations. We extend various debiasing methods to work beyond English and evaluate their effectiveness for SOTA massively multilingual models on our proposed metric. Overall, our work highlights the challenges that arise while studying social biases in multilingual settings and provides resources as well as mitigation techniques to take a step toward scaling to more languages.
Submitted 4 July, 2023;
originally announced July 2023.
-
Vaccination Worldwide: Strategies, Distribution and Challenges
Authors:
Chirag Samal,
Kasia Jakimowicz,
Krishnendu Dasgupta,
Aniket Vashishtha,
Francisco O.,
Arunakiry Natarajan,
Haris Nazir,
Alluri Siddhartha Varma,
Tejal Dahake,
Amitesh Anand Pandey,
Ishaan Singh,
John Sangyeob Kim,
Mehrab Singh Gill,
Saurish Srivastava,
Orna Mukhopadhyay,
Parth Patwa,
Qamil Mirza,
Sualeha Irshad,
Sheshank Shankar,
Rohan Iyer,
Rohan Sukumaran,
Ashley Mehra,
Anshuman Sharma,
Abhishek Singh,
Maurizio Arseni, et al. (4 additional authors not shown)
Abstract:
The Coronavirus 2019 (Covid-19) pandemic caused by the SARS-CoV-2 virus represents an unprecedented crisis for our planet. It is a bane of the über connected world that we live in that this virus has affected almost all countries and caused mortality and economic upheaval at a scale whose effects are going to be felt for generations to come. While we can all be buoyed at the pace at which vaccines have been developed and brought to market, there are still challenges ahead for all countries to get their populations vaccinated equitably and effectively. This paper provides an overview of ongoing immunization efforts in various countries. In this early draft, we have identified a few key factors that we use to review different countries' current COVID-19 immunization strategies and their strengths and draw conclusions so that policymakers worldwide can learn from them. Our paper focuses on processes related to vaccine approval, allocation and prioritization, distribution strategies, population to vaccine ratio, vaccination governance, accessibility and use of digital solutions, and government policies. The statistics and numbers are dated as per the draft date [June 24th, 2021].
Submitted 21 July, 2021;
originally announced July 2021.
-
Mach Number Dependence of Flow Instability around a Spiked Body
Authors:
Ashish Vashishtha,
Shashank Khurana
Abstract:
A forward-facing aerospike has been identified as a passive flow control device for enhancing aerodynamic efficiency and reducing heat transfer in high-speed flows. In addition, it has been reported that the presence of a spike brings in unsteadiness in the form of oscillation and pulsation to the structure. Previous researchers have investigated the aerothermodynamic coefficients, together with offering a detailed explanation of the flow physics and associated unsteadiness, and their dependence on the spike's geometric characteristics (spike nose, and length-to-fore-body diameter ratio, L/D). This work focuses on ascertaining the role of flow speeds (free-stream Mach number), and their energy content, in governing the physics around a spiked body, which is yet to be established. A numerical investigation has been carried out using an axisymmetric Navier-Stokes laminar flow solver for the Mach number range of 2.0 to 7.0. A round-tip spike with a flat-face cylindrical after-body has been simulated for a spike length ratio of L/D = 2.0, with a spike diameter to fore-body diameter ratio of 0.1. The flow unsteadiness has been analyzed through the variation of drag and pressure coefficients at different Mach numbers. It was found that the flow field around the spiked blunt nose behaves in pulsation mode at the lower Mach numbers 2 and 3 and transitions to oscillatory mode at the higher Mach numbers 5, 6 and 7, while remaining almost stable at Mach 4. The limit of the Strouhal number for characterizing the pulsation and oscillation modes at various Mach numbers for a spike length of L/D = 2 with a flat after-body is observed as 0.2; however, it may very well depend on other geometric parameters of the spike and after-body.
Submitted 27 May, 2021; v1 submitted 17 May, 2021;
originally announced May 2021.
-
Mining Trends of COVID-19 Vaccine Beliefs on Twitter with Lexical Embeddings
Authors:
Harshita Chopra,
Aniket Vashishtha,
Ridam Pal,
Ashima,
Ananya Tyagi,
Tavpritesh Sethi
Abstract:
Social media plays a pivotal role in disseminating news globally and acts as a platform for people to express their opinions on various topics. A wide variety of views accompanies COVID-19 vaccination drives across the globe, often colored by emotions, which change along with rising cases, approval of vaccines, and multiple factors discussed online. This study aims at analyzing the temporal evolution of different Emotion categories: Hesitation, Rage, Sorrow, Anticipation, Faith, and Contentment with Influencing Factors: Vaccine Rollout, Misinformation, Health Effects, and Inequities as lexical categories created from Tweets belonging to five countries with vital vaccine roll-out programs, namely, India, United States of America, Brazil, United Kingdom, and Australia. We extracted a corpus of nearly 1.8 million Twitter posts related to COVID-19 vaccination. Using cosine distance from selected seed words, we expanded the vocabulary of each category and tracked the longitudinal change in their strength from June 2020 to April 2021. We used community detection algorithms to find modules in positive correlation networks. Our findings suggest that tweets expressing hesitancy towards vaccines contain the highest mentions of health-related effects in all countries. Our results indicated that the patterns of hesitancy were variable across geographies and can help us learn targeted interventions. We also observed a significant change in the linear trends of categories like hesitation and contentment before and after approval of vaccines. Negative emotions like rage and sorrow gained the highest importance in the alluvial diagram. They formed a significant module with all the influencing factors in April 2021, when India observed the second wave of COVID-19 cases. The relationship between Emotions and Influencing Factors was found to be variable across the countries.
Submitted 20 July, 2021; v1 submitted 2 April, 2021;
originally announced April 2021.
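The vocabulary-expansion step, in which each lexical category grows outward from seed words by cosine distance, can be sketched as follows. The two-dimensional toy embeddings, the threshold of 0.8, and the function names are all assumptions for illustration; the sketch uses cosine similarity, which is one minus cosine distance, and does not reproduce the study's actual embeddings or cut-offs.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def expand_category(seeds, embeddings, threshold=0.8):
    """Add every word whose similarity to any seed word reaches `threshold`.

    `threshold` is a hypothetical value chosen for this toy example.
    """
    expanded = set(seeds)
    for word, vec in embeddings.items():
        if word in expanded:
            continue
        if any(cosine_similarity(vec, embeddings[s]) >= threshold for s in seeds):
            expanded.add(word)
    return expanded


# Made-up 2-D embeddings, for illustration only.
toy_embeddings = {
    "hesitant": (1.0, 0.1),
    "doubtful": (0.9, 0.2),
    "happy": (0.0, 1.0),
}
hesitation_vocab = expand_category({"hesitant"}, toy_embeddings)
print(sorted(hesitation_vocab))
```

Tracking a category's strength over time would then amount to counting, per time window, how often words from the expanded vocabulary appear in the posts.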
-
Drag Control by Hydrogen Injection in Shocked Stagnation Zone of Blunt Nose
Authors:
Ashish Vashishtha,
Dean Callaghan,
Cathal Nolan
Abstract:
The main motivation of the current study is to propose high-pressure hydrogen injection as a hybrid active flow control technique in order to manipulate the flow-field in front of a blunt nose during hypersonic flight. Hydrogen injection can lead to self-ignition under the right environmental conditions in a stagnation zone, and may add heat through combustion while also providing a counterjet effect that pushes the bow shock upstream. The axisymmetric numerical simulations for the hemispherical blunt nose are performed at a Mach 6 freestream flow with 10000 Pa pressure and 293 K temperature. The sonic and supersonic hydrogen and air injections are compared for drag reduction at the same stagnation pressure ratio ($PR$) and momentum ratio ($R_{MA}$). The sonic air and hydrogen injection scenarios show similar performance in terms of drag reduction and similar SPM flow features, but hydrogen injection has a mass flow rate 3.76 times lower than air. Supersonic hydrogen injection at $M_j = 2.94$ behaves differently than supersonic air injection and can achieve up to 60 % drag reduction at lower $PR$ in LPM mode with a lower mass flow rate. Additionally, air injection achieves a drag reduction of 40 % in SPM mode at higher $PR$ with a very high mass flow rate.
Submitted 20 October, 2020; v1 submitted 10 October, 2020;
originally announced October 2020.
-
VacSIM: Learning Effective Strategies for COVID-19 Vaccine Distribution using Reinforcement Learning
Authors:
Raghav Awasthi,
Keerat Kaur Guliani,
Saif Ahmad Khan,
Aniket Vashishtha,
Mehrab Singh Gill,
Arshita Bhatt,
Aditya Nagori,
Aniket Gupta,
Ponnurangam Kumaraguru,
Tavpritesh Sethi
Abstract:
A COVID-19 vaccine is our best bet for mitigating the ongoing onslaught of the pandemic. However, the vaccine is also expected to be a limited resource. An optimal allocation strategy, especially in countries with access inequities and temporal separation of hot-spots, might be an effective way of halting the disease spread. We approach this problem by proposing a novel pipeline, VacSIM, that dovetails Deep Reinforcement Learning models into a Contextual Bandits approach for optimizing the distribution of COVID-19 vaccine. Whereas the Reinforcement Learning models suggest better actions and rewards, Contextual Bandits allow online modifications that may need to be implemented on a day-to-day basis in a real-world scenario. We evaluate this framework against a naive allocation approach of distributing vaccine proportional to the incidence of COVID-19 cases in five different States across India (Assam, Delhi, Jharkhand, Maharashtra and Nagaland) and demonstrate up to 9039 potential infections prevented and a significant increase in the efficacy of limiting the spread over a period of 45 days through the VacSIM approach. Our models and the platform are extensible to all states of India and potentially across the globe. We also propose novel evaluation strategies including standard compartmental model-based projections and a causality-preserving evaluation of our model. Since all models carry assumptions that may need to be tested in various contexts, we open source our model VacSIM and contribute a new reinforcement learning environment compatible with OpenAI gym to make it extensible for real-world applications across the globe (http://vacsim.tavlab.iiitd.edu.in:8000/).
Submitted 4 December, 2021; v1 submitted 14 September, 2020;
originally announced September 2020.
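The naive baseline mentioned in the VacSIM abstract, distributing vaccine in proportion to case incidence, can be sketched in a few lines. This is not the VacSIM code: the function name, the largest-remainder rounding, and the region names and case counts are all made up for illustration.

```python
def proportional_allocation(cases, total_doses):
    """Split `total_doses` across regions in proportion to their case counts.

    Integer doses are assigned with largest-remainder rounding so the
    allocation sums exactly to `total_doses` (a choice made for this sketch).
    """
    total_cases = sum(cases.values())
    if total_cases == 0:
        raise ValueError("cannot allocate proportionally with zero total cases")
    # Exact (fractional) share for each region.
    shares = {r: total_doses * c / total_cases for r, c in cases.items()}
    # Round down first, then hand out leftover doses by largest fractional part.
    allocation = {r: int(s) for r, s in shares.items()}
    leftover = total_doses - sum(allocation.values())
    by_remainder = sorted(shares, key=lambda r: shares[r] - allocation[r], reverse=True)
    for region in by_remainder[:leftover]:
        allocation[region] += 1
    return allocation


# Hypothetical case counts, not real data.
toy_cases = {"region_1": 100, "region_2": 300}
print(proportional_allocation(toy_cases, 1000))
```

A learned policy such as the one described in the abstract would be compared against this kind of static rule over the same simulated period.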