-
A Variational Approach for Mitigating Entity Bias in Relation Extraction
Authors:
Samuel Mensah,
Elena Kochkina,
Jabez Magomere,
Joy Prakash Sain,
Simerjot Kaur,
Charese Smiley
Abstract:
Mitigating entity bias is a critical challenge in Relation Extraction (RE), where models often rely excessively on entities, resulting in poor generalization. This paper presents a novel approach to address this issue by adapting a Variational Information Bottleneck (VIB) framework. Our method compresses entity-specific information while preserving task-relevant features. It achieves state-of-the-art performance on relation extraction datasets across general, financial, and biomedical domains, in both in-domain (original test sets) and out-of-domain (modified test sets with type-constrained entity replacements) settings. Our approach offers a robust, interpretable, and theoretically grounded methodology.
Submitted 12 June, 2025;
originally announced June 2025.
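For orientation, the information-bottleneck trade-off named in this abstract is usually written as the objective below. This is the generic VIB form from the literature, given as a sketch; the paper's exact loss may differ. The symbols are assumptions for illustration: $x$ is the input, $z$ the compressed representation, $y$ the relation label, $r(z)$ a fixed prior, and $\beta$ the compression weight.

```latex
\mathcal{L}_{\mathrm{VIB}}
  = \mathbb{E}_{p(x,y)}\,\mathbb{E}_{p_\theta(z \mid x)}
      \bigl[-\log q_\phi(y \mid z)\bigr]
  + \beta\,\mathrm{KL}\bigl(p_\theta(z \mid x)\,\big\|\,r(z)\bigr)
```

The first term preserves task-relevant information; the KL term penalizes information that $z$ retains about the input, with larger $\beta$ giving stronger compression.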
-
GenPlanX. Generation of Plans and Execution
Authors:
Daniel Borrajo,
Giuseppe Canonaco,
Tomás de la Rosa,
Alfredo Garrachón,
Sriram Gopalakrishnan,
Simerjot Kaur,
Marianela Morales,
Sunandita Patra,
Alberto Pozanco,
Keshav Ramani,
Charese Smiley,
Pietro Totis,
Manuela Veloso
Abstract:
Classical AI Planning techniques generate sequences of actions for complex tasks. However, they lack the ability to understand planning tasks when provided using natural language. The advent of Large Language Models (LLMs) has introduced novel capabilities in human-computer interaction. In the context of planning tasks, LLMs have been shown to be particularly good at interpreting human intents, among other uses. This paper introduces GenPlanX, which integrates LLMs for natural language-based description of planning tasks with a classical AI planning engine, alongside an execution and monitoring framework. We demonstrate the efficacy of GenPlanX in assisting users with office-related tasks, highlighting its potential to streamline workflows and enhance productivity through seamless human-AI collaboration.
Submitted 12 June, 2025;
originally announced June 2025.
-
Conservative Bias in Large Language Models: Measuring Relation Predictions
Authors:
Toyin Aguda,
Erik Wilson,
Allan Anzagira,
Simerjot Kaur,
Charese Smiley
Abstract:
Large language models (LLMs) exhibit pronounced conservative bias in relation extraction tasks, frequently defaulting to the No_Relation label when an appropriate option is unavailable. While this behavior helps prevent incorrect relation assignments, our analysis reveals that it also leads to significant information loss when reasoning is not explicitly included in the output. We systematically evaluate this trade-off across multiple prompts, datasets, and relation types, introducing the concept of Hobson's choice to capture scenarios where models opt for safe but uninformative labels over hallucinated ones. Our findings suggest that conservative bias occurs twice as often as hallucination. To quantify this effect, we use SBERT and LLM prompts to capture the semantic similarity between conservative bias behaviors in constrained prompts and labels generated from semi-constrained and open-ended prompts.
Submitted 9 June, 2025;
originally announced June 2025.
-
Human Heterogeneity Invariant Stress Sensing
Authors:
Yi Xiao,
Harshit Sharma,
Sawinder Kaur,
Dessa Bergen-Cico,
Asif Salekin
Abstract:
Stress affects physical and mental health, and wearable devices have been widely used to detect daily stress through physiological signals. However, these signals vary due to factors such as individual differences and health conditions, making it difficult for machine learning models to generalize. To address these challenges, we present Human Heterogeneity Invariant Stress Sensing (HHISS), a domain generalization approach designed to find consistent patterns in stress signals by removing person-specific differences. This helps the model perform more accurately across new people, environments, and stress types not seen during training. Its novelty lies in a technique called person-wise sub-network pruning intersection, which focuses on features shared across individuals, alongside preventing overfitting by leveraging continuous labels during training. The study focuses especially on people with opioid use disorder (OUD), a group whose stress responses can change dramatically depending on when they take their daily medication. Since stress often triggers cravings, a model that adapts well to these changes could support better OUD rehabilitation and recovery. We tested HHISS on seven stress datasets: four that we collected ourselves and three public ones. Four are from lab setups, one is from a controlled real-world setting (driving), and two are real-world, in-the-wild field datasets without any constraints. This is the first study to evaluate how well a stress detection model works across such a wide range of data. Results show that HHISS consistently outperformed state-of-the-art baseline methods, proving both effective and practical for real-world use. Ablation studies, empirical justifications, and runtime evaluations confirm HHISS's feasibility and scalability for mobile stress sensing in sensitive real-world applications.
Submitted 2 June, 2025;
originally announced June 2025.
-
Calibrating LLM Confidence by Probing Perturbed Representation Stability
Authors:
Reza Khanmohammadi,
Erfan Miahi,
Mehrsa Mardikoraem,
Simerjot Kaur,
Ivan Brugere,
Charese H. Smiley,
Kundan Thind,
Mohammad M. Ghassemi
Abstract:
Miscalibration in Large Language Models (LLMs) undermines their reliability, highlighting the need for accurate confidence estimation. We introduce CCPS (Calibrating LLM Confidence by Probing Perturbed Representation Stability), a novel method analyzing internal representational stability in LLMs. CCPS applies targeted adversarial perturbations to final hidden states, extracts features reflecting the model's response to these perturbations, and uses a lightweight classifier to predict answer correctness. CCPS was evaluated on LLMs from 8B to 32B parameters (covering Llama, Qwen, and Mistral architectures) using MMLU and MMLU-Pro benchmarks in both multiple-choice and open-ended formats. Our results show that CCPS significantly outperforms current approaches. Across four LLMs and three MMLU variants, CCPS reduces Expected Calibration Error by approximately 55% and Brier score by 21%, while increasing accuracy by 5 percentage points, Area Under the Precision-Recall Curve by 4 percentage points, and Area Under the Receiver Operating Characteristic Curve by 6 percentage points, all relative to the strongest prior method. CCPS delivers an efficient, broadly applicable, and more accurate solution for estimating LLM confidence, thereby improving their trustworthiness.
Submitted 27 May, 2025;
originally announced May 2025.
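The two calibration metrics reported in this abstract, Expected Calibration Error and Brier score, can be sketched as follows. This is a minimal illustration using the common equal-width binning definition of ECE; the function names, the `n_bins` default, and the toy inputs are assumptions, not the paper's evaluation code.

```python
import numpy as np


def brier_score(confidences, correct):
    """Mean squared gap between predicted confidence and the 0/1 outcome."""
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    return float(np.mean((confidences - correct) ** 2))


def expected_calibration_error(confidences, correct, n_bins=10):
    """ECE with equal-width confidence bins (n_bins is an illustrative default)."""
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    # Map each confidence to a bin index in [0, n_bins - 1] using the interior edges.
    bin_ids = np.digitize(confidences, edges[1:-1], right=True)
    ece = 0.0
    for b in range(n_bins):
        mask = bin_ids == b
        if not mask.any():
            continue  # empty bins contribute nothing
        # Gap between empirical accuracy and mean confidence in this bin,
        # weighted by the fraction of predictions that fall in the bin.
        gap = abs(correct[mask].mean() - confidences[mask].mean())
        ece += mask.mean() * gap
    return float(ece)


if __name__ == "__main__":
    conf = [0.9, 0.9, 0.1, 0.1]   # toy confidences
    hit = [1, 1, 0, 0]            # toy correctness labels
    print(brier_score(conf, hit), expected_calibration_error(conf, hit))
```

Lower values are better for both metrics, so a reduction in ECE or Brier score indicates confidence estimates that track correctness more closely.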
-
FinNLI: Novel Dataset for Multi-Genre Financial Natural Language Inference Benchmarking
Authors:
Jabez Magomere,
Elena Kochkina,
Samuel Mensah,
Simerjot Kaur,
Charese H. Smiley
Abstract:
We introduce FinNLI, a benchmark dataset for Financial Natural Language Inference (FinNLI) across diverse financial texts like SEC Filings, Annual Reports, and Earnings Call transcripts. Our dataset framework ensures diverse premise-hypothesis pairs while minimizing spurious correlations. FinNLI comprises 21,304 pairs, including a high-quality test set of 3,304 instances annotated by finance experts. Evaluations show that domain shift significantly degrades general-domain NLI performance. The highest Macro F1 scores for pre-trained language model (PLM) and large language model (LLM) baselines are 74.57% and 78.62%, respectively, highlighting the dataset's difficulty. Surprisingly, instruction-tuned financial LLMs perform poorly, suggesting limited generalizability. FinNLI exposes weaknesses in current LLMs for financial reasoning, indicating room for improvement.
Submitted 22 April, 2025;
originally announced April 2025.
-
"It's not a representation of me": Examining Accent Bias and Digital Exclusion in Synthetic AI Voice Services
Authors:
Shira Michel,
Sufi Kaur,
Sarah Elizabeth Gillespie,
Jeffrey Gleason,
Christo Wilson,
Avijit Ghosh
Abstract:
Recent advances in artificial intelligence (AI) speech generation and voice cloning technologies have produced naturalistic speech and accurate voice replication, yet their influence on sociotechnical systems across diverse accents and linguistic traits is not fully understood. This study evaluates two synthetic AI voice services (Speechify and ElevenLabs) through a mixed methods approach using surveys and interviews to assess technical performance and uncover how users' lived experiences influence their perceptions of accent variations in these speech technologies. Our findings reveal technical performance disparities across five regional, English-language accents and demonstrate how current speech generation technologies may inadvertently reinforce linguistic privilege and accent-based discrimination, potentially creating new forms of digital exclusion. Overall, our study highlights the need for inclusive design and regulation by providing actionable insights for developers, policymakers, and organizations to ensure equitable and socially responsible AI speech technologies.
Submitted 13 June, 2025; v1 submitted 12 April, 2025;
originally announced April 2025.
-
VectorFit: Adaptive Singular & Bias Vector Fine-Tuning of Pre-trained Foundation Models
Authors:
Suhas G Hegde,
Shilpy Kaur,
Aruna Tiwari
Abstract:
Popular PEFT methods achieve parameter efficiency by assuming that incremental weight updates are inherently low-rank, which often leads to a performance gap compared to full fine-tuning. While recent methods have attempted to address this limitation, they typically lack sufficient parameter and memory efficiency. We propose VectorFit, an effective and easily deployable approach that adaptively trains the singular vectors and biases of pre-trained weight matrices. We demonstrate that exploiting the structural and transformational characteristics of pre-trained weights enables high-rank updates comparable to those of full fine-tuning. As a result, VectorFit achieves superior performance with 9X fewer trainable parameters compared to state-of-the-art PEFT methods. Through extensive experiments over 17 datasets spanning diverse language and vision tasks such as natural language understanding and generation, question answering, image classification, and image generation, we show that VectorFit consistently outperforms baselines, even in extremely low-budget scenarios.
Submitted 25 March, 2025;
originally announced March 2025.
-
Grounding LLM Reasoning with Knowledge Graphs
Authors:
Alfonso Amayuelas,
Joy Sain,
Simerjot Kaur,
Charese Smiley
Abstract:
Knowledge Graphs (KGs) are valuable tools for representing relationships between entities in a structured format. Traditionally, these knowledge bases are queried to extract specific information. However, question-answering (QA) over such KGs poses a challenge due to the intrinsic complexity of natural language compared to the structured format and the size of these graphs. Despite these challenges, the structured nature of KGs can provide a solid foundation for grounding the outputs of Large Language Models (LLMs), offering organizations increased reliability and control.
Recent advancements in LLMs have introduced reasoning methods at inference time to improve their performance and maximize their capabilities. In this work, we propose integrating these reasoning strategies with KGs to anchor every step or "thought" of the reasoning chains in KG data. Specifically, we evaluate both agentic and automated search methods across several reasoning strategies, including Chain-of-Thought (CoT), Tree-of-Thought (ToT), and Graph-of-Thought (GoT), using GRBench, a benchmark dataset for graph reasoning with domain-specific graphs. Our experiments demonstrate that this approach consistently outperforms baseline models, highlighting the benefits of grounding LLM reasoning processes in structured KG data.
Submitted 21 February, 2025; v1 submitted 18 February, 2025;
originally announced February 2025.
-
Hindi audio-video-Deepfake (HAV-DF): A Hindi language-based Audio-video Deepfake Dataset
Authors:
Sukhandeep Kaur,
Mubashir Buhari,
Naman Khandelwal,
Priyansh Tyagi,
Kiran Sharma
Abstract:
Deepfakes offer great potential for innovation and creativity, but they also pose significant risks to privacy, trust, and security. With a vast Hindi-speaking population, India is particularly vulnerable to deepfake-driven misinformation campaigns. Fake videos or speeches in Hindi can have an enormous impact on rural and semi-urban communities, where digital literacy tends to be lower and people are more inclined to trust video content. The development of effective frameworks and detection tools to combat deepfake misuse requires high-quality, diverse, and extensive datasets. Existing popular datasets such as FF-DF (FaceForensics++) and DFDC (DeepFake Detection Challenge) are based on the English language. Hence, this paper aims to create the first Hindi deepfake dataset, named "Hindi audio-video-Deepfake" (HAV-DF). The dataset has been generated using face-swap, lip-sync, and voice-cloning methods. This multi-step process allows us to create a rich, varied dataset that captures the nuances of Hindi speech and facial expressions, providing a robust foundation for training and evaluating deepfake detection models in a Hindi language context. It is the first of its kind, as all previous datasets contain either deepfake videos or synthesized audio. This type of dataset can be used to train a detector for both deepfake video and deepfake audio. Notably, existing detection methods such as Headpose and Xception-c40 achieve lower detection accuracy on the newly introduced HAV-DF dataset than on the well-known FF-DF and DFDC datasets. This trend suggests that HAV-DF deepfakes are harder to detect, possibly due to the dataset's focus on Hindi language content and diverse manipulation techniques. The HAV-DF dataset fills the gap in Hindi-specific deepfake datasets, aiding multilingual deepfake detection development.
Submitted 23 November, 2024;
originally announced November 2024.
-
Diversity and Inclusion in AI for Recruitment: Lessons from Industry Workshop
Authors:
Muneera Bano,
Didar Zowghi,
Fernando Mourao,
Sarah Kaur,
Tao Zhang
Abstract:
Artificial Intelligence (AI) systems for online recruitment markets have the potential to significantly enhance the efficiency and effectiveness of job placements and even promote fairness or inclusive hiring practices. Neglecting Diversity and Inclusion (D&I) in these systems, however, can perpetuate biases, leading to unfair hiring practices and decreased workplace diversity, while exposing organisations to legal and reputational risks. Despite the acknowledged importance of D&I in AI, there is a gap in research on effectively implementing D&I guidelines in real-world recruitment systems. Challenges include a lack of awareness and framework for operationalising D&I in a cost-effective, context-sensitive manner. This study aims to investigate the practical application of D&I guidelines in AI-driven online job-seeking systems, specifically exploring how these principles can be operationalised to create more inclusive recruitment processes. We conducted a co-design workshop with a large multinational recruitment company focusing on two AI-driven recruitment use cases. User stories and personas were applied to evaluate the impacts of AI on diverse stakeholders. Follow-up interviews were conducted to assess the workshop's long-term effects on participants' awareness and application of D&I principles. The co-design workshop successfully increased participants' understanding of D&I in AI. However, translating awareness into operational practice posed challenges, particularly in balancing D&I with business goals. The results suggest developing tailored D&I guidelines and ongoing support to ensure the effective adoption of inclusive AI practices.
Submitted 8 November, 2024;
originally announced November 2024.
-
FinQAPT: Empowering Financial Decisions with End-to-End LLM-driven Question Answering Pipeline
Authors:
Kuldeep Singh,
Simerjot Kaur,
Charese Smiley
Abstract:
Financial decision-making hinges on the analysis of relevant information embedded in the enormous volume of documents in the financial domain. To address this challenge, we developed FinQAPT, an end-to-end pipeline that streamlines the identification of relevant financial reports based on a query, extracts pertinent context, and leverages Large Language Models (LLMs) to perform downstream tasks. To evaluate the pipeline, we experimented with various techniques to optimize the performance of each module using the FinQA dataset. We introduced a novel clustering-based negative sampling technique to enhance context extraction and a novel prompting method called Dynamic N-shot Prompting to boost the numerical question-answering capabilities of LLMs. At the module level, we achieved state-of-the-art accuracy on FinQA, attaining an accuracy of 80.6%. However, at the pipeline level, we observed decreased performance due to challenges in extracting relevant context from financial reports. We conducted a detailed error analysis of each module and the end-to-end pipeline, pinpointing specific challenges that must be addressed to develop a robust solution for handling complex financial tasks.
Submitted 31 October, 2024; v1 submitted 17 October, 2024;
originally announced October 2024.
-
Can Models Learn Skill Composition from Examples?
Authors:
Haoyu Zhao,
Simran Kaur,
Dingli Yu,
Anirudh Goyal,
Sanjeev Arora
Abstract:
As large language models (LLMs) become increasingly advanced, their ability to exhibit compositional generalization -- the capacity to combine learned skills in novel ways not encountered during training -- has garnered significant attention. This type of generalization, particularly in scenarios beyond training data, is also of great interest in the study of AI safety and alignment. A recent study introduced the SKILL-MIX evaluation, where models are tasked with composing a short paragraph demonstrating the use of a specified $k$-tuple of language skills. While small models struggled with composing even with $k=3$, larger models like GPT-4 performed reasonably well with $k=5$ and $6$.
In this paper, we employ a setup akin to SKILL-MIX to evaluate the capacity of smaller models to learn compositional generalization from examples. Utilizing a diverse set of language skills -- including rhetorical, literary, reasoning, theory of mind, and common sense -- GPT-4 was used to generate text samples that exhibit random subsets of $k$ skills. Subsequent fine-tuning of 7B and 13B parameter models on these combined skill texts, for increasing values of $k$, revealed the following findings: (1) Training on combinations of $k=2$ and $3$ skills results in noticeable improvements in the ability to compose texts with $k=4$ and $5$ skills, despite models never having seen such examples during training. (2) When skill categories are split into training and held-out groups, models significantly improve at composing texts with held-out skills during testing despite having only seen training skills during fine-tuning, illustrating the efficacy of the training approach even with previously unseen skills. This study also suggests that incorporating skill-rich (potentially synthetic) text into training can substantially enhance the compositional capabilities of models.
Submitted 18 January, 2025; v1 submitted 29 September, 2024;
originally announced September 2024.
-
CRoP: Context-wise Robust Static Human-Sensing Personalization
Authors:
Sawinder Kaur,
Avery Gump,
Yi Xiao,
Jingyu Xin,
Harshit Sharma,
Nina R Benway,
Jonathan L Preston,
Asif Salekin
Abstract:
Advances in deep learning and the Internet of Things have led to diverse human sensing applications. However, distinct patterns in human sensing, influenced by various factors or contexts, challenge the generic neural network model's performance due to natural distribution shifts. To address this, personalization tailors models to individual users. Yet most personalization studies overlook intra-user heterogeneity across contexts in sensory data, limiting intra-user generalizability. This limitation is especially critical in clinical applications, where limited data availability hampers both generalizability and personalization. Notably, intra-user sensing attributes are expected to change due to external factors such as treatment progression, further complicating the challenges. To address the intra-user generalization challenge, this work introduces CRoP, a novel static personalization approach. CRoP leverages off-the-shelf pre-trained models as generic starting points and captures user-specific traits through adaptive pruning on a minimal sub-network while allowing generic knowledge to be incorporated in the remaining parameters. CRoP demonstrates superior personalization effectiveness and intra-user robustness across four human-sensing datasets, including two from real-world health domains, underscoring its practical and social impact. Additionally, to support CRoP's generalization ability and design choices, we provide empirical justification through gradient inner product analysis, ablation studies, and comparisons against state-of-the-art baselines.
Submitted 20 May, 2025; v1 submitted 26 September, 2024;
originally announced September 2024.
-
Instruct-SkillMix: A Powerful Pipeline for LLM Instruction Tuning
Authors:
Simran Kaur,
Simon Park,
Anirudh Goyal,
Sanjeev Arora
Abstract:
We introduce Instruct-SkillMix, an automated approach for creating diverse, high-quality SFT data for instruction-following. The pipeline involves two stages, each leveraging an existing powerful LLM: (1) Skill extraction: uses the LLM to extract core "skills" for instruction-following by directly prompting the model. This is inspired by the "LLM metacognition" of Didolkar et al. (2024); (2) Data generation: uses the powerful LLM to generate (instruction, response) data that exhibit a randomly chosen pair of these skills. Here, the use of random skill combinations promotes diversity and difficulty. The estimated cost of creating the dataset is under $600.
Vanilla SFT (i.e., no PPO, DPO, or RL methods) on data generated from Instruct-SkillMix leads to strong gains on instruction-following benchmarks such as AlpacaEval 2.0, MT-Bench, and WildBench. With just 4K examples, LLaMA-3-8B-Base achieves a 42.76% length-controlled win rate on AlpacaEval 2.0, a level similar to frontier models like Claude 3 Opus and LLaMA-3.1-405B-Instruct. Ablation studies also suggest plausible reasons why creating open instruction-tuning datasets via naive crowd-sourcing has proved difficult. In our dataset, adding 20% low-quality answers ("shirkers") causes a noticeable degradation in performance. The Instruct-SkillMix pipeline seems flexible and adaptable to other settings.
Submitted 28 May, 2025; v1 submitted 27 August, 2024;
originally announced August 2024.
-
Large Language Models as Financial Data Annotators: A Study on Effectiveness and Efficiency
Authors:
Toyin Aguda,
Suchetha Siddagangappa,
Elena Kochkina,
Simerjot Kaur,
Dongsheng Wang,
Charese Smiley,
Sameena Shah
Abstract:
Collecting labeled datasets in finance is challenging due to the scarcity of domain experts and the high cost of employing them. While Large Language Models (LLMs) have demonstrated remarkable performance in data annotation tasks on general-domain datasets, their effectiveness on domain-specific datasets remains underexplored. To address this gap, we investigate the potential of LLMs as efficient data annotators for extracting relations in financial documents. We compare the annotations produced by three LLMs (GPT-4, PaLM 2, and MPT Instruct) against expert annotators and crowdworkers. We demonstrate that the current state-of-the-art LLMs can be sufficient alternatives to non-expert crowdworkers. We analyze models using various prompts and parameter settings and find that customizing the prompts for each relation group by providing specific examples belonging to those groups is paramount. Furthermore, we introduce a reliability index (LLM-RelIndex) used to identify outputs that may require expert attention. Finally, we perform an extensive time, cost, and error analysis and provide recommendations for the collection and usage of automated annotations in domain-specific settings.
Submitted 26 March, 2024;
originally announced March 2024.
-
Identifying Multiple Personalities in Large Language Models with External Evaluation
Authors:
Xiaoyang Song,
Yuta Adachi,
Jessie Feng,
Mouwei Lin,
Linhao Yu,
Frank Li,
Akshat Gupta,
Gopala Anumanchipalli,
Simerjot Kaur
Abstract:
As Large Language Models (LLMs) are integrated with human daily applications rapidly, many societal and ethical concerns are raised regarding the behavior of LLMs. One of the ways to comprehend LLMs' behavior is to analyze their personalities. Many recent studies quantify LLMs' personalities using self-assessment tests that are created for humans. Yet many critiques question the applicability and reliability of these self-assessment tests when applied to LLMs. In this paper, we investigate LLM personalities using an alternate personality measurement method, which we refer to as the external evaluation method, where instead of prompting LLMs with multiple-choice questions in the Likert scale, we evaluate LLMs' personalities by analyzing their responses toward open-ended situational questions using an external machine learning model. We first fine-tuned a Llama2-7B model as the MBTI personality predictor that outperforms the state-of-the-art models as the tool to analyze LLMs' responses. Then, we prompt the LLMs with situational questions and ask them to generate Twitter posts and comments, respectively, in order to assess their personalities when playing two different roles. Using the external personality evaluation method, we identify that the obtained personality types for LLMs are significantly different when generating posts versus comments, whereas humans show a consistent personality profile in these two different situations. This shows that LLMs can exhibit different personalities based on different scenarios, thus highlighting a fundamental difference between personality in LLMs and humans. With our work, we call for a re-evaluation of personality definition and measurement in LLMs.
Submitted 22 February, 2024;
originally announced February 2024.
-
The Influence of Biomedical Research on Future Business Funding: Analyzing Scientific Impact and Content in Industrial Investments
Authors:
Reza Khanmohammadi,
Simerjot Kaur,
Charese H. Smiley,
Tuka Alhanai,
Ivan Brugere,
Armineh Nourbakhsh,
Mohammad M. Ghassemi
Abstract:
This paper investigates the relationship between scientific innovation in biomedical sciences and its impact on industrial activities, focusing on how the historical impact and content of scientific papers influenced future funding and innovation grant application content for small businesses. The research incorporates bibliometric analyses along with SBIR (Small Business Innovation Research) data to yield a holistic view of the science-industry interface. By evaluating the influence of scientific innovation on industry across 10,873 biomedical topics and taking into account their taxonomic relationships, we present an in-depth exploration of science-industry interactions where we quantify the temporal effects and impact latency of scientific advancements on industrial activities, spanning from 2010 to 2021. Our findings indicate that scientific progress substantially influenced industrial innovation funding and the direction of industrial innovation activities. Approximately 76% and 73% of topics showed a correlation and Granger-causality between scientific interest in papers and future funding allocations to relevant small businesses. Moreover, around 74% of topics demonstrated an association between the semantic content of scientific abstracts and future grant applications. Overall, the work contributes to a more nuanced and comprehensive understanding of the science-industry interface, opening avenues for more strategic resource allocation and policy developments aimed at fostering innovation.
Submitted 1 January, 2024;
originally announced January 2024.
-
DocLLM: A layout-aware generative language model for multimodal document understanding
Authors:
Dongsheng Wang,
Natraj Raman,
Mathieu Sibue,
Zhiqiang Ma,
Petr Babkin,
Simerjot Kaur,
Yulong Pei,
Armineh Nourbakhsh,
Xiaomo Liu
Abstract:
Enterprise documents such as forms, invoices, receipts, reports, contracts, and other similar records, often carry rich semantics at the intersection of textual and spatial modalities. The visual cues offered by their complex layouts play a crucial role in comprehending these documents effectively. In this paper, we present DocLLM, a lightweight extension to traditional large language models (LLMs) for reasoning over visual documents, taking into account both textual semantics and spatial layout. Our model differs from existing multimodal LLMs by avoiding expensive image encoders and focuses exclusively on bounding box information to incorporate the spatial layout structure. Specifically, the cross-alignment between text and spatial modalities is captured by decomposing the attention mechanism in classical transformers to a set of disentangled matrices. Furthermore, we devise a pre-training objective that learns to infill text segments. This approach allows us to address irregular layouts and heterogeneous content frequently encountered in visual documents. The pre-trained model is fine-tuned using a large-scale instruction dataset, covering four core document intelligence tasks. We demonstrate that our solution outperforms SotA LLMs on 14 out of 16 datasets across all tasks, and generalizes well to 4 out of 5 previously unseen datasets.
Submitted 31 December, 2023;
originally announced January 2024.
-
Skill-Mix: a Flexible and Expandable Family of Evaluations for AI models
Authors:
Dingli Yu,
Simran Kaur,
Arushi Gupta,
Jonah Brown-Cohen,
Anirudh Goyal,
Sanjeev Arora
Abstract:
With LLMs shifting their role from statistical modeling of language to serving as general-purpose AI agents, how should LLM evaluations change? Arguably, a key ability of an AI agent is to flexibly combine, as needed, the basic skills it has learned. The capability to combine skills plays an important role in (human) pedagogy and also in a paper on emergence phenomena (Arora & Goyal, 2023).
This work introduces Skill-Mix, a new evaluation to measure ability to combine skills. Using a list of $N$ skills the evaluator repeatedly picks random subsets of $k$ skills and asks the LLM to produce text combining that subset of skills. Since the number of subsets grows like $N^k$, for even modest $k$ this evaluation will, with high probability, require the LLM to produce text significantly different from any text in the training set. The paper develops a methodology for (a) designing and administering such an evaluation, and (b) automatic grading (plus spot-checking by humans) of the results using GPT-4 as well as the open LLaMA-2 70B model.
Administering a version of Skill-Mix to popular chatbots gave results that, while generally in line with prior expectations, contained surprises. Sizeable differences exist among model capabilities that are not captured by their ranking on popular LLM leaderboards ("cramming for the leaderboard"). Furthermore, simple probability calculations indicate that GPT-4's reasonable performance on $k=5$ is suggestive of going beyond "stochastic parrot" behavior (Bender et al., 2021), i.e., it combines skills in ways that it had not seen during training.
We sketch how the methodology can lead to a Skill-Mix based eco-system of open evaluations for AI capabilities of future models.
Submitted 26 October, 2023;
originally announced October 2023.
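The Skill-Mix abstract above notes that the number of $k$-skill subsets grows like $N^k$. A minimal sketch of that counting argument follows. It is not the authors' evaluation code: the function names and the `skill_i` placeholder list are hypothetical, and the exact count used here is the binomial coefficient $\binom{N}{k} \le N^k / k!$.

```python
import math
import random


def num_skill_subsets(n_skills: int, k: int) -> int:
    """Number of distinct k-subsets of n_skills skills, i.e. C(N, k)."""
    return math.comb(n_skills, k)


def pick_skill_subset(skills: list[str], k: int, rng: random.Random) -> list[str]:
    """Draw one random subset of k distinct skills (order is irrelevant)."""
    return sorted(rng.sample(skills, k))


if __name__ == "__main__":
    # Hypothetical skill list; the paper's actual skills are not reproduced here.
    skills = [f"skill_{i}" for i in range(100)]
    for k in range(1, 6):
        print(k, num_skill_subsets(len(skills), k))
    print(pick_skill_subset(skills, 3, random.Random(0)))
```

Even with a modest list of 100 skills, the count for $k=5$ already exceeds 75 million, which illustrates why a randomly drawn combination is unlikely to match any specific text seen in training.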
-
REFinD: Relation Extraction Financial Dataset
Authors:
Simerjot Kaur,
Charese Smiley,
Akshat Gupta,
Joy Sain,
Dongsheng Wang,
Suchetha Siddagangappa,
Toyin Aguda,
Sameena Shah
Abstract:
A number of datasets for Relation Extraction (RE) have been created to aid downstream tasks such as information retrieval, semantic search, question answering and textual entailment. However, these datasets fail to capture financial-domain specific challenges since most of these datasets are compiled using general knowledge sources such as Wikipedia, web-based text and news articles, hindering real-life progress and adoption within the financial world. To address this limitation, we propose REFinD, the first large-scale annotated dataset of relations, with $\sim$29K instances and 22 relations amongst 8 types of entity pairs, generated entirely over financial documents. We also provide an empirical evaluation with various state-of-the-art models as benchmarks for the RE task and highlight the challenges posed by our dataset. We observed that various state-of-the-art deep learning models struggle with numeric inference, relational and directional ambiguity.
Submitted 22 May, 2023;
originally announced May 2023.
-
Causal Categorization of Mental Health Posts using Transformers
Authors:
Simranjeet Kaur,
Ritika Bhardwaj,
Aastha Jain,
Muskan Garg,
Chandni Saxena
Abstract:
With recent developments in the digitization of clinical psychology, the NLP research community has revolutionized the field of mental health detection on social media. Existing research in mental health analysis revolves around cross-sectional studies to classify users' intent on social media. For in-depth analysis, we investigate existing classifiers to solve the problem of causal categorization, which suggests the inefficiency of learning-based methods due to limited training samples. To handle this challenge, we use transformer models and demonstrate the efficacy of pre-trained transfer learning on the "CAMS" dataset. The experimental results show improved accuracy and underline the importance of identifying cause-and-effect relationships in the underlying text.
Submitted 15 January, 2023; v1 submitted 6 January, 2023;
originally announced January 2023.
-
Disentangling the Mechanisms Behind Implicit Regularization in SGD
Authors:
Zachary Novack,
Simran Kaur,
Tanya Marwah,
Saurabh Garg,
Zachary C. Lipton
Abstract:
A number of competing hypotheses have been proposed to explain why small-batch Stochastic Gradient Descent (SGD) leads to improved generalization over the full-batch regime, with recent work crediting the implicit regularization of various quantities throughout training. However, to date, empirical evidence assessing the explanatory power of these hypotheses is lacking. In this paper, we conduct an extensive empirical evaluation, focusing on the ability of various theorized mechanisms to close the small-to-large batch generalization gap. Additionally, we characterize how the quantities that SGD has been claimed to (implicitly) regularize change over the course of training. By using micro-batches, i.e. disjoint smaller subsets of each mini-batch, we empirically show that explicitly penalizing the gradient norm or the Fisher Information Matrix trace, averaged over micro-batches, in the large-batch regime recovers small-batch SGD generalization, whereas Jacobian-based regularizations fail to do so. This generalization performance is shown to often be correlated with how well the regularized model's gradient norms resemble those of small-batch SGD. We additionally show that this behavior breaks down as the micro-batch size approaches the batch size. Finally, we note that in this line of inquiry, positive experimental findings on CIFAR10 are often reversed on other datasets like CIFAR100, highlighting the need to test hypotheses on a wider collection of datasets.
Submitted 28 November, 2022;
originally announced November 2022.
-
VeriCompress: A Tool to Streamline the Synthesis of Verified Robust Compressed Neural Networks from Scratch
Authors:
Sawinder Kaur,
Yi Xiao,
Asif Salekin
Abstract:
AI's widespread integration has led to neural networks (NNs) deployment on edge and similar limited-resource platforms for safety-critical scenarios. Yet, NN's fragility raises concerns about reliable inference. Moreover, constrained platforms demand compact networks. This study introduces VeriCompress, a tool that automates the search and training of compressed models with robustness guarantees. These models are well-suited for safety-critical applications and adhere to predefined architecture and size limitations, making them deployable on resource-restricted platforms. The method trains models 2-3 times faster than the state-of-the-art approaches, surpassing relevant baseline approaches by average accuracy and robustness gains of 15.1 and 9.8 percentage points, respectively. When deployed on a resource-restricted generic platform, these models require 5-8 times less memory and 2-4 times less inference time than models used in verified robustness literature. Our comprehensive evaluation across various model architectures and datasets, including MNIST, CIFAR, SVHN, and a relevant pedestrian detection dataset, showcases VeriCompress's capacity to identify compressed verified robust models with reduced computation overhead compared to current standards. This underscores its potential as a valuable tool for end users, such as developers of safety-critical applications on edge or Internet of Things platforms, empowering them to create suitable models for safety-critical, resource-constrained platforms in their respective domains.
Submitted 21 November, 2023; v1 submitted 17 November, 2022;
originally announced November 2022.
-
A survey on scheduling and mapping techniques in 3D Network-on-chip
Authors:
Simran Preet Kaur,
Manojit Ghose,
Ananya Pathak,
Rutuja Patole
Abstract:
Network-on-Chips (NoCs) have been widely employed in the design of multiprocessor system-on-chips (MPSoCs) as a scalable communication solution. NoCs enable communications between on-chip Intellectual Property (IP) cores and allow those cores to achieve higher performance by outsourcing their communication tasks. Mapping and Scheduling methodologies are key elements in assigning application tasks, allocating the tasks to the IPs, and organising communication among them to achieve some specified objectives. The goal of this paper is to present a detailed state-of-the-art of research in the field of mapping and scheduling of applications on 3D NoC, classifying the works based on several dimensions and giving some potential research directions.
Submitted 4 November, 2022;
originally announced November 2022.
-
On the Maximum Hessian Eigenvalue and Generalization
Authors:
Simran Kaur,
Jeremy Cohen,
Zachary C. Lipton
Abstract:
The mechanisms by which certain training interventions, such as increasing learning rates and applying batch normalization, improve the generalization of deep networks remain a mystery. Prior works have speculated that "flatter" solutions generalize better than "sharper" solutions to unseen data, motivating several metrics for measuring flatness (particularly $λ_{max}$, the largest eigenvalue of the Hessian of the loss); and algorithms, such as Sharpness-Aware Minimization (SAM) [1], that directly optimize for flatness. Other works question the link between $λ_{max}$ and generalization. In this paper, we present findings that call $λ_{max}$'s influence on generalization further into question. We show that: (1) while larger learning rates reduce $λ_{max}$ for all batch sizes, generalization benefits sometimes vanish at larger batch sizes; (2) by scaling batch size and learning rate simultaneously, we can change $λ_{max}$ without affecting generalization; (3) while SAM produces smaller $λ_{max}$ for all batch sizes, generalization benefits (also) vanish with larger batch sizes; (4) for dropout, excessively high dropout probabilities can degrade generalization, even as they promote smaller $λ_{max}$; and (5) while batch-normalization does not consistently produce smaller $λ_{max}$, it nevertheless confers generalization benefits. While our experiments affirm the generalization benefits of large learning rates and SAM for minibatch SGD, the GD-SGD discrepancy demonstrates limits to $λ_{max}$'s ability to explain generalization in neural networks.
Submitted 23 May, 2023; v1 submitted 21 June, 2022;
originally announced June 2022.
-
PCA-RF: An Efficient Parkinson's Disease Prediction Model based on Random Forest Classification
Authors:
Ishu Gupta,
Vartika Sharma,
Sizman Kaur,
Ashutosh Kumar Singh
Abstract:
In this modern era of overpopulation, disease prediction is a crucial step in diagnosing various diseases at an early stage. With the advancement of various machine learning algorithms, prediction has become quite easy. However, the complexity and the selection of an optimal machine learning technique for the given dataset greatly affect the accuracy of the model. A large number of datasets exist globally, but they are not used effectively because of their unstructured format. Hence, many different techniques are available to extract something useful for real-world implementation. Therefore, accuracy becomes a major metric in evaluating the model. In this paper, a disease prediction approach is proposed that implements a random forest classifier on Parkinson's disease. We compared the accuracy of this model with the Principal Component Analysis (PCA) applied Artificial Neural Network (ANN) model and captured a visible difference. The model secured a significant accuracy of up to 90%.
Submitted 21 March, 2022;
originally announced March 2022.
-
Deadwooding: Robust Global Pruning for Deep Neural Networks
Authors:
Sawinder Kaur,
Ferdinando Fioretto,
Asif Salekin
Abstract:
The ability of Deep Neural Networks to approximate highly complex functions is key to their success. This benefit, however, comes at the expense of a large model size, which challenges its deployment in resource-constrained environments. Pruning is an effective technique used to limit this issue, but often comes at the cost of reduced accuracy and adversarial robustness. This paper addresses these shortcomings and introduces Deadwooding, a novel global pruning technique that exploits a Lagrangian Dual method to encourage model sparsity while retaining accuracy and ensuring robustness. The resulting model is shown to significantly outperform the state-of-the-art studies in measures of robustness and accuracy.
Submitted 22 September, 2022; v1 submitted 10 February, 2022;
originally announced February 2022.
-
Insights Into Incitement: A Computational Perspective on Dangerous Speech on Twitter in India
Authors:
Saloni Dash,
Rynaa Grover,
Gazal Shekhawat,
Sukhnidh Kaur,
Dibyendu Mishra,
Joyojeet Pal
Abstract:
Dangerous speech on social media platforms can be framed as blatantly inflammatory, or be couched in innuendo. It is also centrally tied to who engages it - it can be driven by openly sectarian social media accounts, or through subtle nudges by influential accounts, allowing for complex means of reinforcing vilification of marginalized groups, an increasingly significant problem in the media environment in the Global South. We identify dangerous speech by influential accounts on Twitter in India around three key events, examining both the language and networks of messaging that condones or actively promotes violence against vulnerable groups. We characterize dangerous speech users by assigning Danger Amplification Belief scores and show that dangerous users are more active on Twitter as compared to other users as well as most influential in the network, in terms of a larger following as well as volume of verified accounts. We find that dangerous users have a more polarized viewership, suggesting that their audience is more susceptible to incitement. Using a mix of network centrality measures and qualitative analysis, we find that most dangerous accounts tend to either be in mass media related occupations or allied with low-ranking, right-leaning politicians, and act as "broadcasters" in the network, where they are best positioned to spearhead the rapid dissemination of dangerous speech across the platform.
Submitted 6 November, 2021;
originally announced November 2021.
-
Parameterized Explanations for Investor / Company Matching
Authors:
Simerjot Kaur,
Ivan Brugere,
Andrea Stefanucci,
Armineh Nourbakhsh,
Sameena Shah,
Manuela Veloso
Abstract:
Matching companies and investors is usually considered a highly specialized decision making process. Building an AI agent that can automate such recommendation process can significantly help reduce costs, and eliminate human biases and errors. However, limited sample size of financial data-sets and the need for not only good recommendations, but also explaining why a particular recommendation is being made, makes this a challenging problem. In this work we propose a representation learning based recommendation engine that works extremely well with small datasets and demonstrate how it can be coupled with a parameterized explanation generation engine to build an explainable recommendation system for investor-company matching. We compare the performance of our system with human generated recommendations and demonstrate the ability of our algorithm to perform extremely well on this task. We also highlight how explainability helps with real-life adoption of our system.
Submitted 27 October, 2021;
originally announced November 2021.
-
Gradient Descent on Neural Networks Typically Occurs at the Edge of Stability
Authors:
Jeremy M. Cohen,
Simran Kaur,
Yuanzhi Li,
J. Zico Kolter,
Ameet Talwalkar
Abstract:
We empirically demonstrate that full-batch gradient descent on neural network training objectives typically operates in a regime we call the Edge of Stability. In this regime, the maximum eigenvalue of the training loss Hessian hovers just above the numerical value $2 / \text{(step size)}$, and the training loss behaves non-monotonically over short timescales, yet consistently decreases over long timescales. Since this behavior is inconsistent with several widespread presumptions in the field of optimization, our findings raise questions as to whether these presumptions are relevant to neural network training. We hope that our findings will inspire future efforts aimed at rigorously understanding optimization at the Edge of Stability. Code is available at https://github.com/locuslab/edge-of-stability.
Submitted 23 November, 2022; v1 submitted 26 February, 2021;
originally announced March 2021.
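The threshold $2 / \text{(step size)}$ in the Edge of Stability abstract above is the classical stability limit of gradient descent on a quadratic. A minimal sketch of that textbook argument follows. It is an illustration only, not the paper's experimental code (which is at the linked repository): the function name and the 1-D objective $f(x) = \tfrac{1}{2} a x^2$ are assumptions chosen here. The update is $x \leftarrow (1 - \eta a)\,x$, which shrinks $|x|$ exactly when $a < 2/\eta$.

```python
def gd_on_quadratic(sharpness: float, step_size: float, steps: int, x0: float = 1.0) -> float:
    """Run gradient descent on f(x) = 0.5 * sharpness * x**2 and return |x| at the end.

    `sharpness` is the (only) Hessian eigenvalue of this toy objective.
    """
    x = x0
    for _ in range(steps):
        grad = sharpness * x          # f'(x) = sharpness * x
        x = x - step_size * grad      # equivalent to x *= (1 - step_size * sharpness)
    return abs(x)


if __name__ == "__main__":
    step_size = 0.1                   # stability threshold is 2 / 0.1 = 20
    for sharpness in (19.0, 21.0):
        print(sharpness, gd_on_quadratic(sharpness, step_size, steps=200))
```

With a step size of 0.1, a curvature of 19 sits just below the threshold of 20 and the iterate decays, while a curvature of 21 sits just above it and the iterate grows without bound on this toy quadratic.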
-
Convolutional Neural Networks Towards Arduino Navigation of Indoor Environments
Authors:
Michael Muratov,
Sachkiran Kaur,
Michael Szpakowicz
Abstract:
In this paper we propose a number of tested ways in which a low-budget demo car could be made to navigate an indoor environment. Canny Edge Detection, Supervised Floor Detection and Imitation Learning were used separately and are contrasted in their effectiveness. The equipment used in this paper approximated an autonomous robot configured to work with a mobile device for image processing. This paper does not provide definitive solutions and simply illustrates the approaches taken to successfully achieve autonomous navigation of indoor environments. The successes and failures of all approaches were recorded and elaborated on to give the reader an insight into the construction of such an autonomous robot.
Submitted 27 November, 2020;
originally announced November 2020.
-
QnAMaker: Data to Bot in 2 Minutes
Authors:
Parag Agrawal,
Tulasi Menon,
Aya Kamel,
Michel Naim,
Chaikesh Chouragade,
Gurvinder Singh,
Rohan Kulkarni,
Anshuman Suri,
Sahithi Katakam,
Vineet Pratik,
Prakul Bansal,
Simerpreet Kaur,
Neha Rajput,
Anand Duggal,
Achraf Chalabi,
Prashant Choudhari,
Reddy Satti,
Niranjan Nayak
Abstract:
Having a bot for seamless conversations is a much-desired feature that products and services today seek for their websites and mobile apps. These bots help reduce traffic received by human support significantly by handling frequent and directly answerable known questions. Many such services have huge reference documents such as FAQ pages, which makes it hard for users to browse through this data. A conversation layer over such raw data can lower traffic to human support by a great margin. We demonstrate QnAMaker, a service that creates a conversational layer over semi-structured data such as FAQ pages, product manuals, and support documents. QnAMaker is the popular choice for Extraction and Question-Answering as a service and is used by over 15,000 bots in production. It is also used by search interfaces and not just bots.
Submitted 18 March, 2020;
originally announced March 2020.
-
SUPAID: A Rule mining based method for automatic rollout decision aid for supervisors in fleet management systems
Authors:
Sahil Manchanda,
Arun Rajkumar,
Simarjot Kaur,
Narayanan Unny
Abstract:
The decision to roll out a vehicle is critical to fleet management companies, as wrong decisions can lead to additional maintenance costs and failures during the journey. With the availability of large amounts of data and the advancement of machine learning techniques, the rollout decisions of a supervisor can be effectively automated and the mistakes in decisions made by the supervisor learnt. In this paper, we propose a novel learning algorithm, SUPAID, which, under a natural 'one-way efficiency' assumption on the supervisor, uses a rule mining approach to rank the vehicles based on their roll-out feasibility, thus helping prevent the supervisor from making erroneous decisions. Our experimental results on real data from a public transit agency in a U.S. city show that the proposed method SUPAID can result in significant cost savings.
Submitted 15 January, 2020; v1 submitted 10 January, 2020;
originally announced January 2020.
-
Are Perceptually-Aligned Gradients a General Property of Robust Classifiers?
Authors:
Simran Kaur,
Jeremy Cohen,
Zachary C. Lipton
Abstract:
For a standard convolutional neural network, optimizing over the input pixels to maximize the score of some target class will generally produce a grainy-looking version of the original image. However, Santurkar et al. (2019) demonstrated that for adversarially-trained neural networks, this optimization produces images that uncannily resemble the target class. In this paper, we show that these "perceptually-aligned gradients" also occur under randomized smoothing, an alternative means of constructing adversarially-robust classifiers. Our finding supports the hypothesis that perceptually-aligned gradients may be a general property of robust classifiers. We hope that our results will inspire research aimed at explaining this link between perceptually-aligned gradients and adversarial robustness.
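As a rough illustration of the randomized smoothing mentioned in this abstract, the sketch below estimates the prediction of a smoothed classifier by majority vote over Gaussian-perturbed copies of the input. It is not taken from the paper: `base_classifier` is a hypothetical toy classifier, and `smoothed_predict`, `sigma`, and `n_samples` are names assumed here for illustration only.

```python
import random
from collections import Counter


def base_classifier(x):
    """Hypothetical toy base classifier: class 1 if the first feature is positive."""
    return 1 if x[0] > 0 else 0


def smoothed_predict(classifier, x, sigma, n_samples, rng):
    """Monte Carlo estimate of the smoothed classifier's prediction at x.

    Each sample adds independent Gaussian noise (standard deviation sigma)
    to every coordinate of x; the class that wins the most votes is returned.
    """
    votes = Counter()
    for _ in range(n_samples):
        noisy = [xi + rng.gauss(0.0, sigma) for xi in x]
        votes[classifier(noisy)] += 1
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so repeated runs agree
    print(smoothed_predict(base_classifier, [3.0, -1.0], 0.5, 200, rng))
```

The majority vote stands in for the most probable class under noise; a practical implementation would typically also report a statistical confidence for that vote.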
Submitted 23 October, 2019; v1 submitted 18 October, 2019;
originally announced October 2019.
-
How does Object-Oriented Code Refactoring Influence Software Quality? Research Landscape and Challenges
Authors:
Satnam Kaur,
Paramvir Singh
Abstract:
Context: Software refactoring aims to improve software quality and developer productivity. Numerous empirical studies investigating the impact of refactoring activities on software quality have been conducted over the last two decades. Objective: This study aims to perform a comprehensive systematic mapping study of existing empirical studies on evaluation of the effect of object-oriented code refactoring activities on software quality attributes. Method: We followed a multi-stage scrutinizing process to select 142 primary studies published up to December 2017. The selected primary studies were further classified based on several aspects to answer the research questions defined for this work. In addition, we applied a vote-counting approach to combine the empirical results and their analysis reported in primary studies. Results: The findings indicate that studies conducted in academic settings found more positive impact of refactoring on software quality than studies performed in industry. In general, refactoring activities caused all quality attributes to improve or degrade except for cohesion, complexity, inheritance, fault-proneness and power consumption attributes. Furthermore, individual refactoring activities have variable effects on most quality attributes explored in primary studies, indicating that refactoring does not always improve all quality attributes. Conclusions: This study points out several open issues which require further investigation, e.g., lack of industrial validation, limited coverage of refactoring activities, limited tool support, etc.
Submitted 14 August, 2019;
originally announced August 2019.
-
Social Centrality using Network Hierarchy and Community Structure
Authors:
Rakhi Saxena,
Sharanjit Kaur,
Vasudha Bhatnagar
Abstract:
Several centrality measures have been formulated to quantify the notion of 'importance' of actors in social networks. Current measures scrutinize either local or global connectivity of the nodes and have been found to be inadequate for social networks. Ignoring hierarchy and community structure, which are inherent in all human social networks, is the primary cause of this inadequacy. Positional hierarchy and embeddedness of an actor in the community are intuitively crucial determinants of his importance. The theory of social capital asserts that an actor's importance is derived from his position in network hierarchy as well as from the potential to mobilize resources through intra-community (bonding) and inter-community (bridging) ties. Inspired by this idea, we propose a novel centrality measure SC (Social Centrality) for actors in social networks. Our measure accounts for i) an individual's propensity to socialize, and ii) his connections within and outside the community. These two factors are suitably aggregated to produce a social centrality score. Comparative analysis of the SC measure with classical and recent centrality measures using large public networks shows that it consistently produces a more realistic ranking of nodes. The inference is based on the available ground truth for each tested network. Extensive analysis of rankings delivered by the SC measure and mapping with known facts in well-studied networks justifies its effectiveness in diverse social networks. Scalability evaluation of the SC measure justifies its efficacy for real-world large networks.
Submitted 23 June, 2018;
originally announced June 2018.
-
A Novel Framework for Intelligent Information Retrieval in Wireless Sensor Networks
Authors:
Savneet Kaur,
Deepali Virmani,
Satbir Jain
Abstract:
Recent advances in the development of low-cost, power-efficient embedded devices, coupled with the rising need for support of new information processing paradigms such as smart spaces and military surveillance systems, have led to active research in large-scale, highly distributed sensor networks of small, wireless, low-power, unattended sensors and actuators. While applications keep diversifying, one common property they share is the need for an efficient network architecture tailored towards information retrieval in sensor networks. Previous solutions designed for traditional networks serve as good references; however, due to the vast differences between previous paradigms and the needs of sensor networks, a framework is required to gather and impart only the required information. To achieve this goal, in this paper we propose a framework for intelligent information retrieval and dissemination to the desired destination node. The proposed framework combines three major areas of concern in WSNs, i.e. data aggregation, information retrieval and data dissemination, in a single scenario. In the proposed framework, data aggregation is responsible for combining information from all nodes and removing redundant data. Information retrieval filters the processed data to obtain the final information, termed intelligent data, to be disseminated to the required destination node.
Submitted 2 March, 2015;
originally announced March 2015.
-
Toward Refactoring of DMARF and GIPSY Case Studies -- a Team 9 SOEN6471-S14 Project Report
Authors:
Manpreet Kaur,
Ravjeet Singh,
Sukhveer Kaur,
Baljot Singh,
Savpreet Kaur,
Navkaran Singh,
Aman Ohri,
Ravenna Sharma
Abstract:
Software architecture consists of a series of decisions taken to give a structural solution that meets all the technical and operational requirements. The paper involves code refactoring. Code refactoring is a process of changing the internal structure of the code without altering its external behavior. This paper focuses on experimental studies of two open source systems, DMARF and GIPSY. We have gone through various research papers and analyzed their architectures. Refactoring improves the understandability, maintainability, and extensibility of the code. Code smells were identified through various tools such as JDeodorant, Logiscope, and CodePro. Reverse engineering of DMARF and GIPSY was done to understand the systems. The tool used for this was Object Aid UML. For better understanding, use cases, a domain model, and a design class diagram were built.
Submitted 23 December, 2014;
originally announced December 2014.
-
Modification of Contract Net Protocol(CNP) : A Rule-Updation Approach
Authors:
Sandeep Kaur,
Harjot Kaur,
Sumeet Kaur Sehra
Abstract:
Coordination in a multi-agent system (MAS) is essential in order to perform complex tasks and lead the MAS towards its goal. Also, the member agents of a multi-agent system should be autonomous as well as collaborative to accomplish the complex task for which the multi-agent system is specifically designed. Contract-Net Protocol (CNP) is one of the coordination mechanisms used by multi-agent systems that prefer coordination through interaction protocols. In order to overcome the limitations of conventional CNP, this paper proposes a modification of conventional CNP called updated-CNP. Updated-CNP is an effort towards updating CNP with respect to its limitations of modifiability and communication overhead. In updated-CNP, tasks that manager agents have allocated to contractor agents can be modified if the task requirements change at any instant, which was not possible in conventional CNP, as it has to be restarted when a task is modified. This in turn reduces the communication overhead of CNP, that is, the time taken by the various agents using CNP to pass messages to each other. To illustrate the updated CNP, we have used a sound predator-prey case study.
Submitted 16 December, 2013;
originally announced December 2013.
-
Design of Generic Framework for Botnet Detection in Network Forensics
Authors:
Sukhdilpreet Kaur,
Amandeep Verma
Abstract:
With the rise in use of the Internet in social, personal, commercial and other aspects of life, cybercrime is also escalating at an alarming rate. Such usage of the Internet in diverse areas has also increased illegal activities, which in turn invites many network attacks and threats. Network forensics is used to detect network attacks. It can be viewed as an extension of network security. It is the technology that detects and also suggests prevention of the various network attacks. A botnet is one of the most common attacks and is regarded as a network of hacked computers. Network forensics captures network packets, stores them, and then analyzes and correlates them to find the source of an attack. Various methods based on this approach for botnet detection are in the literature, but a generalized method is lacking. So, there is a requirement to design a generic framework that can be used by any botnet detection method. This framework is of use to researchers in the development of their own method of botnet detection, by providing methodology and guidelines. In this paper, various prevalent methods of botnet detection are studied, commonalities among them are established, and then a generalized model for the detection of botnets is proposed. The proposed framework is described using UML diagrams.
Submitted 2 October, 2013;
originally announced October 2013.
-
Scheduling arc shut downs in a network to maximize flow over time with a bounded number of jobs per time period
Authors:
Natashia Boland,
Thomas Kalinowski,
Simranjit Kaur
Abstract:
We study the problem of scheduling maintenance on arcs of a capacitated network so as to maximize the total flow from a source node to a sink node over a set of time periods. Maintenance on an arc shuts down the arc for the duration of the period in which its maintenance is scheduled, making its capacity zero for that period. A set of arcs is designated to have maintenance during the planning period, which will require each to be shut down for exactly one time period. In general this problem is known to be NP-hard, and several special instance classes have been studied. Here we propose an additional constraint which limits the number of maintenance jobs per time period, and we study the impact of this on the complexity.
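As a rough illustration of the problem stated in this abstract, the sketch below enumerates every assignment of maintenance jobs to time periods on a tiny network, skips assignments that exceed the per-period job bound, and sums the maximum flow over all periods. It is a brute-force toy, not the authors' method; the function names, the dense capacity matrix, and the example network are assumptions made here for illustration.

```python
from collections import deque
from itertools import product


def max_flow(n, capacity, source, sink):
    """Edmonds-Karp maximum flow on an n x n capacity matrix."""
    residual = [row[:] for row in capacity]
    total = 0
    while True:
        # Breadth-first search for an augmenting path in the residual graph.
        parent = [-1] * n
        parent[source] = source
        queue = deque([source])
        while queue and parent[sink] == -1:
            u = queue.popleft()
            for v in range(n):
                if residual[u][v] > 0 and parent[v] == -1:
                    parent[v] = u
                    queue.append(v)
        if parent[sink] == -1:
            return total
        # Find the bottleneck capacity along the path, then augment.
        bottleneck = float("inf")
        v = sink
        while v != source:
            bottleneck = min(bottleneck, residual[parent[v]][v])
            v = parent[v]
        v = sink
        while v != source:
            u = parent[v]
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
            v = u
        total += bottleneck


def best_schedule(n, arcs, source, sink, jobs, periods, max_jobs_per_period):
    """Try every assignment of jobs (arcs under maintenance) to periods.

    arcs maps (u, v) to a capacity; each job shuts its arc down (capacity 0)
    for exactly one period. Returns (best total flow, period of each job).
    """
    best_total, best_assignment = -1, None
    for assignment in product(range(periods), repeat=len(jobs)):
        if any(assignment.count(t) > max_jobs_per_period for t in range(periods)):
            continue  # violates the bound on jobs per time period
        total = 0
        for t in range(periods):
            cap = [[0] * n for _ in range(n)]
            for (u, v), c in arcs.items():
                cap[u][v] = c
            for (u, v), period in zip(jobs, assignment):
                if period == t:
                    cap[u][v] = 0
            total += max_flow(n, cap, source, sink)
        if total > best_total:
            best_total, best_assignment = total, assignment
    return best_total, best_assignment


if __name__ == "__main__":
    # Toy network: node 0 is the source, node 3 the sink.
    arcs = {(0, 1): 2, (1, 3): 2, (0, 2): 1, (2, 3): 1}
    jobs = [(0, 1), (1, 3)]  # both maintenance arcs lie on the same path
    print(best_schedule(4, arcs, 0, 3, jobs, periods=2, max_jobs_per_period=2))
    print(best_schedule(4, arcs, 0, 3, jobs, periods=2, max_jobs_per_period=1))
```

On this toy network, allowing two jobs per period lets both shutdowns share one period (total flow 4), while a bound of one job per period forces them apart (total flow 2), which shows how the extra constraint can change the optimum.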
Submitted 19 January, 2015; v1 submitted 13 July, 2013;
originally announced July 2013.
-
Scheduling unit processing time arc shutdown jobs to maximize network flow over time: complexity results
Authors:
Natashia Boland,
Thomas Kalinowski,
Reena Kapoor,
Simranjit Kaur
Abstract:
We study the problem of scheduling maintenance on arcs of a capacitated network so as to maximize the total flow from a source node to a sink node over a set of time periods. Maintenance on an arc shuts down the arc for the duration of the period in which its maintenance is scheduled, making its capacity zero for that period. A set of arcs is designated to have maintenance during the planning period, which will require each to be shut down for exactly one time period. In general this problem is known to be NP-hard. Here we identify a number of characteristics that are relevant for the complexity of instance classes. In particular, we discuss instances with restrictions on the set of arcs that have maintenance to be scheduled; series parallel networks; capacities that are balanced, in the sense that the total capacity of arcs entering a (non-terminal) node equals the total capacity of arcs leaving the node; and identical capacities on all arcs.
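The balanced-capacity condition named in this abstract can be checked directly. The sketch below is a minimal illustration, assuming arcs are given as a hypothetical dictionary mapping `(tail, head)` pairs to capacities; the function name `is_balanced` is not from the paper.

```python
from collections import defaultdict


def is_balanced(arcs, source, sink):
    """Return True if, at every non-terminal node, the total capacity of
    entering arcs equals the total capacity of leaving arcs."""
    entering = defaultdict(int)
    leaving = defaultdict(int)
    for (u, v), capacity in arcs.items():
        leaving[u] += capacity
        entering[v] += capacity
    nodes = set(entering) | set(leaving)
    return all(
        entering[node] == leaving[node]
        for node in nodes
        if node not in (source, sink)
    )


if __name__ == "__main__":
    arcs = {(0, 1): 2, (1, 3): 2, (0, 2): 1, (2, 3): 1}
    print(is_balanced(arcs, source=0, sink=3))
```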
Submitted 20 June, 2013;
originally announced June 2013.
-
Improved Accuracy of PSO and DE using Normalization: an Application to Stock Price Prediction
Authors:
Savinderjit Kaur,
Veenu Mangat
Abstract:
Data mining has been actively applied to the stock market since the 1980s. It has been used to predict stock prices and stock indexes, for portfolio management and trend detection, and for developing recommender systems. The various algorithms which have been used for this include ANN, SVM, ARIMA, GARCH etc. Different hybrid models have been developed by combining these algorithms with other algorithms like rough set, fuzzy logic, GA, PSO, DE, ACO etc. to improve the efficiency. This paper proposes a DE-SVM model (Differential Evolution-Support Vector Machine) for stock price prediction. DE has been used to select the best combination of free parameters for SVM to improve results. The paper also compares the results of prediction with the outputs of SVM alone and a PSO-SVM model (Particle Swarm Optimization). The effect of normalization of data on the accuracy of prediction has also been studied.
Submitted 5 February, 2013;
originally announced February 2013.
-
High Speed and Area Efficient 2D DWT Processor based Image Compression" Signal & Image Processing
Authors:
Sugreev Kaur,
Rajesh Mehra
Abstract:
This paper presents a high speed and area efficient DWT processor based design for image compression applications. In the proposed design, a pipelined partially serial architecture has been used to enhance the speed along with optimal utilization of the resources available on the target FPGA. The proposed model has been designed and simulated using Simulink and System Generator blocks, synthesized with the Xilinx Synthesis Tool (XST) and implemented on Spartan 2 and 3 based XC2S100-5tq144 and XC3S500E-4fg320 target devices. The results show that the proposed design can operate at a maximum frequency of 231 MHz on Spartan 3 while consuming 117 mW of power at a junction temperature of 28 °C. The result comparison has shown an improvement of 15% in speed.
Submitted 31 December, 2010;
originally announced January 2011.
-
An Efficient Watermarking Algorithm to Improve Payload and Robustness without Affecting Image Perceptual Quality
Authors:
Er. Deepak Aggarwal,
Er. Sandeep Kaur,
Er. Anantdeep
Abstract:
Capacity, robustness, and perceptual quality of watermark data are very important issues to be considered. A lot of research is going on to increase these parameters for watermarking of digital images, as there is always a tradeoff among them. In this paper, an efficient DWT-based watermarking algorithm that improves payload and robustness without affecting the perceptual quality of image data is discussed. The aim of the paper is to employ nested watermarks in the wavelet domain, which increases the capacity and ultimately the robustness against attacks, and to select different scaling factor values for the LL & HH bands so that embedding does not create visible artifacts in the original image; the original and watermarked images are therefore similar.
Submitted 26 April, 2010;
originally announced April 2010.
-
Effect of Crosstalk on Permutation in Optical Multistage Interconnection Networks
Authors:
Er. Sandeep Kaur,
Er. Anantdeep,
Er. Deepak Aggarwal
Abstract:
Optical MINs hold great promise and have advantages over their electronic counterparts, but they also hold their own challenges. More research has been done on Electronic Multistage Interconnection Networks (EMINs), but these days optical communication is a good networking choice to meet the increasing bandwidth demands of high-performance computing and communication applications. The Electronic Multistage Interconnection Networks (EMINs) and the Optical Multistage Interconnection Networks (OMINs) have many similarities, but there are some fundamental differences between them, such as the optical loss during switching and the crosstalk problem in the optical switches. To reduce the negative effect of crosstalk, various approaches which apply the concept of dilation in either the space or time domain have been proposed. With the space domain approach, extra SEs are used to ensure that at most one input and one output of every SE will be used at any given time. For an optical network without crosstalk, the messages need to be divided into several groups and then delivered using one time slot (pass) for each group, which is called time division multiplexing. This paper discusses the permutation passability behavior of optical MINs. The bandwidth of optical MINs with or without crosstalk has also been explained. The results thus obtained show that the performance of the networks improves by allowing crosstalk to some extent.
Submitted 26 April, 2010;
originally announced April 2010.
-
Mobile Zigbee Sensor Networks
Authors:
Er. Anantdeep,
Er. Sandeep kaur,
Er. Balpreet Kaur
Abstract:
OPNET Modeler accelerates network R&D and improves product quality through high-fidelity modeling and scalable simulation. It provides a virtual environment for designing protocols and devices, and for testing and demonstrating designs in realistic scenarios prior to production. OPNET Modeler supports the 802.15.4 standard and has been used to make a model of a PAN. Iterations have been performed by changing the power of the transmitter, and the throughput has been analyzed to arrive at optimal values. A novel architecture for an energy-efficient wireless home network based on IEEE 802.15.4 has been proposed. In this architecture, all nodes are classified into stationary nodes and mobile nodes according to the functionality of each node. Mobile nodes are usually battery-powered, and therefore need low-power operation. In order to improve the power consumption of mobile nodes, an effective handover sequence based on MAC broadcast and transmission power control based on LQ (link quality) are employed. Experimental results demonstrate that by using the proposed architecture, communication time and power consumption of mobile nodes can be reduced by 1.2 seconds and 42.8%, respectively.
Submitted 26 April, 2010;
originally announced April 2010.