-
Training with Pseudo-Code for Instruction Following
Authors:
Prince Kumar,
Rudra Murthy,
Riyaz Bhat,
Danish Contractor
Abstract:
Despite the rapid progress in the capabilities of Large Language Models (LLMs), they continue to have difficulty following relatively simple, unambiguous instructions, especially when compositions are involved. In this paper, we take inspiration from recent work that suggests that models may follow instructions better when they are expressed in pseudo-code. However, writing pseudo-code programs can be tedious and using few-shot demonstrations to craft code representations for use in inference can be unnatural for non-expert users of LLMs. To overcome these limitations, we propose fine-tuning LLMs with instruction-tuning data that additionally includes instructions re-expressed in pseudo-code along with the final response. We evaluate models trained using our method on $11$ publicly available benchmarks comprising tasks related to instruction-following, mathematics, and common-sense reasoning. We conduct rigorous experiments with $5$ different models and find that not only do models follow instructions better when trained with pseudo-code, they also retain their capabilities on the other tasks related to mathematical and common-sense reasoning. Specifically, we observe a relative gain of $3$--$19$% on instruction-following benchmarks, and an average gain of up to 14% across all tasks.
Submitted 23 May, 2025;
originally announced May 2025.
-
Federated Learning over 5G, WiFi, and Ethernet: Measurements and Evaluation
Authors:
Robert J. Hayek,
Joaquin Chung,
Kayla Comer,
Chandra R. Murthy,
Rajkumar Kettimuthu,
Igor Kadota
Abstract:
Federated Learning (FL) deployments using IoT devices are an area poised to benefit significantly from advances in NextG wireless. In this paper, we deploy an FL application using a 5G-NR Standalone (SA) testbed with open-source and Commercial Off-the-Shelf (COTS) components. The 5G testbed architecture consists of a network of resource-constrained edge devices, namely Raspberry Pis, and a central server equipped with a Software Defined Radio (SDR) and running O-RAN software. Our testbed also allows edge devices to communicate with the server over WiFi or Ethernet instead of 5G. FL is deployed using the Flower FL framework, for which we developed a comprehensive instrumentation tool to collect and analyze diverse communications and machine learning performance metrics, including model aggregation time, downlink transmission time, training time, and uplink transmission time. Leveraging these measurements, we perform a comparative analysis of the FL application across three network interfaces: 5G, WiFi, and Ethernet. Our experimental results suggest that, on 5G, the uplink model transfer time is a significant factor in the convergence time of FL. In particular, we find that the 5G uplink accounts for roughly 23% of the duration of one average communication round when using all edge devices in our testbed. The uplink time on the 5G testbed is 33.3x higher than on Ethernet and 17.8x higher than on WiFi. Our results also suggest that 5G exacerbates the well-known straggler effect. For reproducibility, we have open-sourced our FL application, instrumentation tools, and testbed configuration.
Submitted 6 April, 2025;
originally announced April 2025.
-
Performance Analysis of Multi-IRS Aided Multiple Operator Systems at mmWave Frequencies
Authors:
Souradeep Ghosh,
L. Yashvanth,
Chandra R. Murthy
Abstract:
Intelligent reflecting surfaces (IRSs) are envisioned to enhance the performance of mmWave wireless systems. In practice, multiple mobile operators (MOs) coexist in an area and provide simultaneous and independent services to user equipments (UEs) on different frequency bands. Then, if each MO deploys an IRS to enhance its performance, the IRSs also alter the channels of UEs of other MOs. In this context, this paper addresses the following questions: can an MO still continue to control its IRS independently of other MOs and IRSs? Is joint optimization of IRSs deployed by different MOs and inter-MO cooperation needed? To that end, by considering the mmWave bands, we first derive the ergodic sum spectral efficiency (SE) in a $2$-MO system for the following schemes: 1) joint optimization of an overall phase angle of the IRSs with MO cooperation, 2) MO cooperation via time-sharing, and 3) no cooperation between the MOs. We find that even with no cooperation between the MOs, the performance of a given MO is not degraded by the presence of an out-of-band (OOB) MO deploying and independently controlling its own IRS. On the other hand, the SE gain obtained at a given MO using joint optimization and cooperation over the no-cooperation scheme decreases inversely with the number of elements in the IRS deployed by the other MO. We generalize our results to a multiple-MO setup and show that the gain in the sum-SE over the no-cooperation case increases at least linearly with the number of OOB MOs. Finally, we numerically verify our findings and conclude that every MO can independently operate and tune its IRS; cooperation via optimizing an overall phase only brings marginal benefits in practice.
Submitted 9 March, 2025;
originally announced March 2025.
-
PersonaBench: Evaluating AI Models on Understanding Personal Information through Accessing (Synthetic) Private User Data
Authors:
Juntao Tan,
Liangwei Yang,
Zuxin Liu,
Zhiwei Liu,
Rithesh Murthy,
Tulika Manoj Awalgaonkar,
Jianguo Zhang,
Weiran Yao,
Ming Zhu,
Shirley Kokane,
Silvio Savarese,
Huan Wang,
Caiming Xiong,
Shelby Heinecke
Abstract:
Personalization is critical in AI assistants, particularly in the context of private AI models that work with individual users. A key scenario in this domain involves enabling AI models to access and interpret a user's private data (e.g., conversation history, user-AI interactions, app usage) to understand personal details such as biographical information, preferences, and social connections. However, due to the sensitive nature of such data, there are no publicly available datasets that allow us to assess an AI model's ability to understand users through direct access to personal information.
To address this gap, we introduce a synthetic data generation pipeline that creates diverse, realistic user profiles and private documents simulating human activities. Leveraging this synthetic data, we present PersonaBench, a benchmark designed to evaluate AI models' performance in understanding personal information derived from simulated private user data.
We evaluate Retrieval-Augmented Generation (RAG) pipelines using questions directly related to a user's personal information, supported by the relevant private documents provided to the models. Our results reveal that current retrieval-augmented AI models struggle to answer private questions by extracting personal information from user documents, highlighting the need for improved methodologies to enhance personalization capabilities in AI.
Submitted 27 February, 2025;
originally announced February 2025.
-
Granite Embedding Models
Authors:
Parul Awasthy,
Aashka Trivedi,
Yulong Li,
Mihaela Bornea,
David Cox,
Abraham Daniels,
Martin Franz,
Gabe Goodhart,
Bhavani Iyer,
Vishwajeet Kumar,
Luis Lastras,
Scott McCarley,
Rudra Murthy,
Vignesh P,
Sara Rosenthal,
Salim Roukos,
Jaydeep Sen,
Sukriti Sharma,
Avirup Sil,
Kate Soule,
Arafat Sultan,
Radu Florian
Abstract:
We introduce the Granite Embedding models, a family of encoder-based embedding models designed for retrieval tasks, spanning dense-retrieval and sparse-retrieval architectures, with both English and multilingual capabilities. This report provides the technical details of training these highly effective 12-layer embedding models, along with their efficient 6-layer distilled counterparts. Extensive evaluations show that the models, developed with techniques like retrieval-oriented pretraining, contrastive finetuning, knowledge distillation, and model merging, significantly outperform publicly available models of similar sizes on internal IBM retrieval and search tasks, and have equivalent performance on widely used information retrieval benchmarks, while being trained on high-quality data suitable for enterprise use. We publicly release all our Granite Embedding models under the Apache 2.0 license, allowing both research and commercial use at https://huggingface.co/collections/ibm-granite.
Submitted 27 February, 2025;
originally announced February 2025.
-
Stroke classification using Virtual Hybrid Edge Detection from in silico electrical impedance tomography data
Authors:
Juan Pablo Agnelli,
Fernando S. Moura,
Siiri Rautio,
Melody Alsaker,
Rashmi Murthy,
Matti Lassas,
Samuli Siltanen
Abstract:
Electrical impedance tomography (EIT) is a non-invasive imaging method for recovering the internal conductivity of a physical body from electric boundary measurements. EIT combined with machine learning has shown promise for the classification of strokes. However, most previous works have used raw EIT voltage data as network inputs. We build upon a recent development which suggested the use of special noise-robust Virtual Hybrid Edge Detection (VHED) functions as network inputs, although that work used only highly simplified and mathematically ideal models. In this work we strengthen the case for the use of EIT, and VHED functions especially, for stroke classification. We design models with high detail and mathematical realism to test the use of VHED functions as inputs. Virtual patients are created using a physically detailed 2D head model which includes features known to create challenges in real-world imaging scenarios. Conductivity values are drawn from statistically realistic distributions, and phantoms are afflicted with either hemorrhagic or ischemic strokes of various shapes and sizes. Simulated noisy EIT electrode data, generated using the realistic Complete Electrode Model (CEM) as opposed to the mathematically ideal continuum model, is processed to obtain VHED functions. We compare the use of VHED functions as inputs against the alternative paradigm of using raw EIT voltages. Our results show that (i) stroke classification can be performed with high accuracy using 2D EIT data from physically detailed and mathematically realistic models, and (ii) in the presence of noise, VHED functions outperform raw data as network inputs.
Submitted 29 January, 2025; v1 submitted 24 January, 2025;
originally announced January 2025.
-
A Probably Approximately Correct Analysis of Group Testing Algorithms
Authors:
Sameera Bharadwaja H.,
Chandra R. Murthy
Abstract:
We consider the problem of identifying the defectives from a population of items via a non-adaptive group testing framework with a random pooling-matrix design. We analyze the sufficient number of tests needed for approximate set identification, i.e., for identifying almost all the defective and non-defective items with high confidence. To this end, we view the group testing problem as a function learning problem and develop our analysis using the probably approximately correct (PAC) framework. Using this formulation, we derive sufficiency bounds on the number of tests for three popular binary group testing algorithms: column matching, combinatorial basis pursuit, and definite defectives. We compare the derived bounds with the existing ones in the literature for exact recovery theoretically and using simulations. Finally, we contrast the three group testing algorithms under consideration in terms of the sufficient testing rate surface and the sufficient number of tests contours across the range of the approximation and confidence levels.
Submitted 30 November, 2024;
originally announced December 2024.
-
SpecTool: A Benchmark for Characterizing Errors in Tool-Use LLMs
Authors:
Shirley Kokane,
Ming Zhu,
Tulika Awalgaonkar,
Jianguo Zhang,
Thai Hoang,
Akshara Prabhakar,
Zuxin Liu,
Tian Lan,
Liangwei Yang,
Juntao Tan,
Rithesh Murthy,
Weiran Yao,
Zhiwei Liu,
Juan Carlos Niebles,
Huan Wang,
Shelby Heinecke,
Caiming Xiong,
Silvio Savarese
Abstract:
Evaluating the output of Large Language Models (LLMs) is one of the most critical aspects of building a performant compound AI system. Since the output from LLMs propagates to downstream steps, identifying LLM errors is crucial to system performance. A common task for LLMs in AI systems is tool use. While there are several benchmark environments for evaluating LLMs on this task, they typically only give a success rate without any explanation of the failure cases. To solve this problem, we introduce SpecTool, a new benchmark to identify error patterns in LLM output on tool-use tasks. Our benchmark data set comprises queries from diverse environments that can be used to test for the presence of seven newly characterized error patterns. Using SpecTool, we show that even the most prominent LLMs exhibit these error patterns in their outputs. Researchers can use the analysis and insights from SpecTool to guide their error mitigation strategies.
Submitted 20 November, 2024;
originally announced November 2024.
-
MILU: A Multi-task Indic Language Understanding Benchmark
Authors:
Sshubam Verma,
Mohammed Safi Ur Rahman Khan,
Vishwajeet Kumar,
Rudra Murthy,
Jaydeep Sen
Abstract:
Evaluating Large Language Models (LLMs) in low-resource and linguistically diverse languages remains a significant challenge in NLP, particularly for languages using non-Latin scripts like those spoken in India. Existing benchmarks predominantly focus on English, leaving substantial gaps in assessing LLM capabilities in these languages. We introduce MILU, a Multi-task Indic Language Understanding Benchmark, a comprehensive evaluation benchmark designed to address this gap. MILU spans 8 domains and 41 subjects across 11 Indic languages, reflecting both general and culturally specific knowledge. With an India-centric design, MILU incorporates material from regional and state-level examinations, covering topics such as local history, arts, festivals, and laws, alongside standard subjects like science and mathematics. We evaluate over 42 LLMs, and find that current LLMs struggle with MILU, with GPT-4o achieving the highest average accuracy at 74 percent. Open multilingual models outperform language-specific fine-tuned models, which perform only slightly better than random baselines. Models also perform better in high-resource languages than in low-resource ones. Domain-wise analysis indicates that models perform poorly in culturally relevant areas like Arts and Humanities and Law and Governance compared to general fields like STEM. To the best of our knowledge, MILU is the first-of-its-kind benchmark focused on Indic languages, serving as a crucial step towards comprehensive cultural evaluation. All code, benchmarks, and artifacts are publicly available to foster open research.
Submitted 4 February, 2025; v1 submitted 4 November, 2024;
originally announced November 2024.
-
PRACT: Optimizing Principled Reasoning and Acting of LLM Agent
Authors:
Zhiwei Liu,
Weiran Yao,
Jianguo Zhang,
Rithesh Murthy,
Liangwei Yang,
Zuxin Liu,
Tian Lan,
Ming Zhu,
Juntao Tan,
Shirley Kokane,
Thai Hoang,
Juan Carlos Niebles,
Shelby Heinecke,
Huan Wang,
Silvio Savarese,
Caiming Xiong
Abstract:
We introduce the Principled Reasoning and Acting (PRAct) framework, a novel method for learning and enforcing action principles from trajectory data. Central to our approach is the use of text gradients from a reflection and optimization engine to derive these action principles. To adapt action principles to specific task requirements, we propose a new optimization framework, Reflective Principle Optimization (RPO). After execution, RPO employs a reflector to critique current action principles and an optimizer to update them accordingly. We develop the RPO framework under two scenarios: Reward-RPO, which uses environmental rewards for reflection, and Self-RPO, which conducts self-reflection without external rewards. Additionally, two RPO methods, RPO-Traj and RPO-Batch, are introduced to adapt to different settings. Experimental results across four environments demonstrate that the PRAct agent, leveraging the RPO framework, effectively learns and applies action principles to enhance performance.
Submitted 24 October, 2024;
originally announced October 2024.
-
KCIF: Knowledge-Conditioned Instruction Following
Authors:
Rudra Murthy,
Praveen Venkateswaran,
Prince Kumar,
Danish Contractor
Abstract:
LLM evaluation benchmarks have traditionally separated the testing of knowledge/reasoning capabilities from instruction following. In this work, we study the interaction between knowledge and instruction following, and observe that LLMs struggle to follow simple answer-modifying instructions, and are also distracted by instructions that should have no bearing on the original knowledge task answer. We leverage existing multiple-choice answer-based knowledge benchmarks and apply a set of simple instructions, which include manipulating text (e.g., change case) and numeric quantities (e.g., increase value, change formatting), operating on lists (e.g., sort answer candidates), and distractor instructions (e.g., change case of numeric answers). We evaluate models at varying parameter sizes (1B-405B) from different model families and find that, surprisingly, all models report a significant drop in performance on such simple task compositions. While large-sized and frontier models report performance drops of 40-50%, in small and medium-sized models the drop is severe (sometimes exceeding 80%). Our results highlight a limitation in the traditional separation of knowledge/reasoning and instruction following, and suggest that joint study of these capabilities is important. We release our benchmark dataset, evaluation framework code, and results for future work.
Submitted 23 May, 2025; v1 submitted 16 October, 2024;
originally announced October 2024.
-
Exploiting Beam-Split in IRS-aided Systems via OFDMA
Authors:
P. Siddhartha,
L. Yashvanth,
Chandra R. Murthy
Abstract:
In wideband systems operating at mmWave frequencies, intelligent reflecting surfaces (IRSs) equipped with many passive elements can compensate for channel propagation losses. Then, a phenomenon known as beam-split (B-SP) occurs, in which the phase shifters at the IRS elements fail to beamform at a desired user equipment (UE) over the total allotted bandwidth (BW). Although B-SP is usually seen as an impairment, in this paper, we take an optimistic view and exploit the B-SP effect to enhance the system performance via orthogonal frequency division multiple access (OFDMA). We argue that due to the B-SP, when an IRS is tuned to beamform at a particular angle on one frequency, it also forms beams in different directions on other frequencies. Then, by opportunistically scheduling different UEs on different subcarriers (SCs), we show that, almost surely, the optimal array gain that scales quadratically in the number of IRS elements can be achieved on all SCs in the system. We derive the achievable throughput of the proposed scheme and deduce that the system also enjoys additional multi-user diversity benefits on top of the optimal beamforming gain over the full BW. Finally, we verify our findings via numerical simulations.
Submitted 16 September, 2024;
originally announced September 2024.
-
Benchmarking and Building Zero-Shot Hindi Retrieval Model with Hindi-BEIR and NLLB-E5
Authors:
Arkadeep Acharya,
Rudra Murthy,
Vishwajeet Kumar,
Jaydeep Sen
Abstract:
Given the large number of Hindi speakers worldwide, there is a pressing need for robust and efficient information retrieval systems for Hindi. Despite ongoing research, comprehensive benchmarks for evaluating retrieval models in Hindi are lacking. To address this gap, we introduce the Hindi-BEIR benchmark, comprising 15 datasets across seven distinct tasks. We evaluate state-of-the-art multilingual retrieval models on the Hindi-BEIR benchmark, identifying task and domain-specific challenges that impact Hindi retrieval performance. Building on the insights from these results, we introduce NLLB-E5, a multilingual retrieval model that leverages a zero-shot approach to support Hindi without the need for Hindi training data. We believe our contributions, which include the release of the Hindi-BEIR benchmark and the NLLB-E5 model, will prove to be a valuable resource for researchers and promote advancements in multilingual retrieval models.
Submitted 21 June, 2025; v1 submitted 9 September, 2024;
originally announced September 2024.
-
xLAM: A Family of Large Action Models to Empower AI Agent Systems
Authors:
Jianguo Zhang,
Tian Lan,
Ming Zhu,
Zuxin Liu,
Thai Hoang,
Shirley Kokane,
Weiran Yao,
Juntao Tan,
Akshara Prabhakar,
Haolin Chen,
Zhiwei Liu,
Yihao Feng,
Tulika Awalgaonkar,
Rithesh Murthy,
Eric Hu,
Zeyuan Chen,
Ran Xu,
Juan Carlos Niebles,
Shelby Heinecke,
Huan Wang,
Silvio Savarese,
Caiming Xiong
Abstract:
Autonomous agents powered by large language models (LLMs) have attracted significant research interest. However, the open-source community faces many challenges in developing specialized models for agent tasks, driven by the scarcity of high-quality agent datasets and the absence of standard protocols in this area. We introduce and publicly release xLAM, a series of large action models designed for AI agent tasks. The xLAM series includes five models with both dense and mixture-of-experts architectures, ranging from 1B to 8x22B parameters, trained using a scalable, flexible pipeline that unifies, augments, and synthesizes diverse datasets to enhance AI agents' generalizability and performance across varied environments. Our experimental results demonstrate that xLAM consistently delivers exceptional performance across multiple agent ability benchmarks, notably securing the 1st position on the Berkeley Function-Calling Leaderboard, outperforming GPT-4, Claude-3, and many other models in terms of tool use. By releasing the xLAM series, we aim to advance the performance of open-source LLMs for autonomous AI agents, potentially accelerating progress and democratizing access to high-performance models for agent tasks. Models are available at https://huggingface.co/collections/Salesforce/xlam-models-65f00e2a0a63bbcd1c2dade4
Submitted 4 September, 2024;
originally announced September 2024.
-
CT scans without X-rays: parallel-beam imaging from nonlinear current flows
Authors:
Melody Alsaker,
Siiri Rautio,
Fernando Moura,
Juan Pablo Agnelli,
Rashmi Murthy,
Matti Lassas,
Jennifer L. Mueller,
Samuli Siltanen
Abstract:
Parallel-beam X-ray computed tomography (CT) and electrical impedance tomography (EIT) are two imaging modalities which stem from completely different underlying physics, and for decades have been thought to have little in common either practically or mathematically. CT is only mildly ill-posed and uses straight X-rays as measurement energy, which admits simple linear mathematics. However, CT relies on exposing targets to ionizing radiation and requires cumbersome setups with expensive equipment. In contrast, EIT uses harmless electrical currents as measurement energy and can be implemented using simple low-cost portable setups. But EIT is burdened by nonlinearity stemming from the curved paths of electrical currents, as well as extreme ill-posedness which causes characteristic low spatial resolution. In practical EIT reconstruction methods, nonlinearity and ill-posedness have been considered intertwined in a complicated fashion. In this work we demonstrate a surprising connection between CT and EIT which partly unravels the main problems of EIT and leads directly to a proposed imaging modality which we call virtual hybrid parallel-beam tomography (VHPT). We show that hidden deep within EIT data is information which possesses the same linear geometry as parallel-beam CT data. This admits a fundamental restructuring of EIT, separating ill-posedness and nonlinearity into simple modular sub-problems, and yields "virtual radiographs" and CT-like images which reveal previously concealed information. Furthermore, as proof of concept we present VHPT images of real-world objects.
Submitted 23 August, 2024;
originally announced August 2024.
-
Mistral-SPLADE: LLMs for better Learned Sparse Retrieval
Authors:
Meet Doshi,
Vishwajeet Kumar,
Rudra Murthy,
Vignesh P,
Jaydeep Sen
Abstract:
Learned Sparse Retrievers (LSRs) have evolved into an effective retrieval strategy that can bridge the gap between traditional keyword-based sparse retrievers and embedding-based dense retrievers. At their core, learned sparse retrievers try to learn the most important semantic keyword expansions from a query and/or document, which can facilitate better retrieval with overlapping keyword expansions. LSRs like SPLADE have typically used encoder-only models with an MLM (masked language modeling) style objective in conjunction with known ways of improving retrieval performance, such as hard negative mining and distillation. In this work, we propose to use a decoder-only model for learning semantic keyword expansion. We posit that decoder-only models, which have seen much larger amounts of data, are better equipped to learn the keyword expansions needed for improved retrieval. We use Mistral as the backbone to develop our Learned Sparse Retriever, similar to SPLADE, and train it on a subset of sentence-transformer data which is often used for training text embedding models. Our experiments support the hypothesis that a sparse retrieval model based on a decoder-only large language model (LLM) surpasses the performance of existing LSR systems, including SPLADE and all its variants. The LLM-based model (Echo-Mistral-SPLADE) now stands as a state-of-the-art learned sparse retrieval model on the BEIR text retrieval benchmark.
△ Less
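As a rough illustration of what "learning keyword expansions" means in SPLADE-style models, the sketch below turns per-token vocabulary logits into a single sparse term-weight vector using the log-saturated ReLU and max pooling commonly described for the SPLADE family. This is an assumption for illustration only: the abstract does not state the exact activation or pooling used by Echo-Mistral-SPLADE, and the toy logits and the function names `splade_weights` and `sparse_score` are hypothetical.

```python
import numpy as np

def splade_weights(token_logits):
    """Pool (num_tokens, vocab_size) logits into one sparse vocabulary vector.

    Assumed SPLADE-style rule: w_j = max_i log(1 + relu(logit_ij)).
    """
    saturated = np.log1p(np.maximum(token_logits, 0.0))
    return saturated.max(axis=0)

def sparse_score(query_vec, doc_vec):
    """Relevance as a dot product over the shared vocabulary dimensions."""
    return float(np.dot(query_vec, doc_vec))

# Toy logits over a 4-term vocabulary (made-up numbers, not model output).
doc_logits = np.array([[2.0, -1.0, 0.0, -3.0],
                       [0.5, 3.0, -2.0, -1.0]])
query_logits = np.array([[1.0, -2.0, -1.0, 4.0]])

doc_vec = splade_weights(doc_logits)
query_vec = splade_weights(query_logits)

# Only terms with positive logits somewhere in the text stay active.
active_terms = np.nonzero(doc_vec)[0].tolist()
score = sparse_score(query_vec, doc_vec)
```

Only vocabulary terms that are active in both the query and the document contribute to the score, which is why overlapping expansions matter for retrieval.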
Submitted 21 August, 2024; v1 submitted 20 August, 2024;
originally announced August 2024.
-
Hindi-BEIR: A Large Scale Retrieval Benchmark in Hindi
Authors:
Arkadeep Acharya,
Rudra Murthy,
Vishwajeet Kumar,
Jaydeep Sen
Abstract:
Given the large number of Hindi speakers worldwide, there is a pressing need for robust and efficient information retrieval systems for Hindi. Despite ongoing research, there is a lack of a comprehensive benchmark for evaluating retrieval models in Hindi. To address this gap, we introduce the Hindi version of the BEIR benchmark, which includes a subset of English BEIR datasets translated to Hindi, existing Hindi retrieval datasets, and synthetically created datasets for retrieval. The benchmark comprises $15$ datasets spanning $8$ distinct tasks. We evaluate state-of-the-art multilingual retrieval models on this benchmark to identify task and domain-specific challenges and their impact on retrieval performance. By releasing this benchmark and a set of relevant baselines, we enable researchers to understand the limitations and capabilities of current Hindi retrieval models, promoting advancements in this critical area. The datasets from Hindi-BEIR are publicly available.
Submitted 18 August, 2024;
originally announced August 2024.
-
Diversity Empowers Intelligence: Integrating Expertise of Software Engineering Agents
Authors:
Kexun Zhang,
Weiran Yao,
Zuxin Liu,
Yihao Feng,
Zhiwei Liu,
Rithesh Murthy,
Tian Lan,
Lei Li,
Renze Lou,
Jiacheng Xu,
Bo Pang,
Yingbo Zhou,
Shelby Heinecke,
Silvio Savarese,
Huan Wang,
Caiming Xiong
Abstract:
Large language model (LLM) agents have shown great potential in solving real-world software engineering (SWE) problems. The most advanced open-source SWE agent can resolve over 27% of real GitHub issues in SWE-Bench Lite. However, these sophisticated agent frameworks exhibit varying strengths, excelling in certain tasks while underperforming in others. To fully harness the diversity of these agents, we propose DEI (Diversity Empowered Intelligence), a framework that leverages their unique expertise. DEI functions as a meta-module atop existing SWE agent frameworks, managing agent collectives for enhanced problem-solving. Experimental results show that a DEI-guided committee of agents is able to surpass the best individual agent's performance by a large margin. For instance, a group of open-source SWE agents, with a maximum individual resolve rate of 27.3% on SWE-Bench Lite, can achieve a 34.3% resolve rate with DEI, making a 25% improvement and beating most closed-source solutions. Our best-performing group excels with a 55% resolve rate, securing the highest ranking on SWE-Bench Lite. Our findings contribute to the growing body of research on collaborative AI systems and their potential to solve complex software engineering challenges.
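The "25% improvement" quoted above is a relative gain, not a difference in percentage points. A minimal check of that arithmetic, using only the two resolve rates stated in the abstract (the helper name `relative_gain` is illustrative):

```python
def relative_gain(baseline, improved):
    """Relative improvement of `improved` over `baseline`, as a fraction."""
    return (improved - baseline) / baseline

best_single_agent = 27.3   # resolve rate (%) of the best individual agent
dei_committee = 34.3       # resolve rate (%) of the DEI-guided committee

absolute_points = dei_committee - best_single_agent       # about 7 percentage points
relative = relative_gain(best_single_agent, dei_committee)  # about 0.256

print(f"{absolute_points:.1f} points absolute, {relative:.1%} relative")
```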
Submitted 13 August, 2024;
originally announced August 2024.
-
MaterioMiner -- An ontology-based text mining dataset for extraction of process-structure-property entities
Authors:
Ali Riza Durmaz,
Akhil Thomas,
Lokesh Mishra,
Rachana Niranjan Murthy,
Thomas Straub
Abstract:
While large language models learn sound statistical representations of the language and information therein, ontologies are symbolic knowledge representations that can complement the former ideally. Research at this critical intersection relies on datasets that intertwine ontologies and text corpora to enable training and comprehensive benchmarking of neurosymbolic models. We present the MaterioMiner dataset and the linked materials mechanics ontology where ontological concepts from the mechanics of materials domain are associated with textual entities within the literature corpus. Another distinctive feature of the dataset is its eminently fine-granular annotation. Specifically, 179 distinct classes are manually annotated by three raters within four publications, amounting to a total of 2191 entities that were annotated and curated. Conceptual work is presented for the symbolic representation of causal composition-process-microstructure-property relationships. We explore the annotation consistency between the three raters and perform fine-tuning of pre-trained models to showcase the feasibility of named-entity recognition model training. Reusing the dataset can foster training and benchmarking of materials language models, automated ontology construction, and knowledge graph generation from textual data.
Submitted 5 August, 2024;
originally announced August 2024.
-
The Llama 3 Herd of Models
Authors:
Aaron Grattafiori,
Abhimanyu Dubey,
Abhinav Jauhri,
Abhinav Pandey,
Abhishek Kadian,
Ahmad Al-Dahle,
Aiesha Letman,
Akhil Mathur,
Alan Schelten,
Alex Vaughan,
Amy Yang,
Angela Fan,
Anirudh Goyal,
Anthony Hartshorn,
Aobo Yang,
Archi Mitra,
Archie Sravankumar,
Artem Korenev,
Arthur Hinsvark,
Arun Rao,
Aston Zhang,
Aurelien Rodriguez,
Austen Gregerson,
Ava Spataru,
Baptiste Roziere
, et al. (536 additional authors not shown)
Abstract:
Modern artificial intelligence (AI) systems are powered by foundation models. This paper presents a new set of foundation models, called Llama 3. It is a herd of language models that natively support multilinguality, coding, reasoning, and tool usage. Our largest model is a dense Transformer with 405B parameters and a context window of up to 128K tokens. This paper presents an extensive empirical evaluation of Llama 3. We find that Llama 3 delivers comparable quality to leading language models such as GPT-4 on a plethora of tasks. We publicly release Llama 3, including pre-trained and post-trained versions of the 405B parameter language model and our Llama Guard 3 model for input and output safety. The paper also presents the results of experiments in which we integrate image, video, and speech capabilities into Llama 3 via a compositional approach. We observe this approach performs competitively with the state-of-the-art on image, video, and speech recognition tasks. The resulting models are not yet being broadly released as they are still under development.
Submitted 23 November, 2024; v1 submitted 31 July, 2024;
originally announced July 2024.
-
Personalized Multi-task Training for Recommender System
Authors:
Liangwei Yang,
Zhiwei Liu,
Jianguo Zhang,
Rithesh Murthy,
Shelby Heinecke,
Huan Wang,
Caiming Xiong,
Philip S. Yu
Abstract:
In the vast landscape of internet information, recommender systems (RecSys) have become essential for guiding users through a sea of choices aligned with their preferences. These systems have applications in diverse domains, such as news feeds, game suggestions, and shopping recommendations. Personalization is a key technique in RecSys, where modern methods leverage representation learning to encode user/item interactions into embeddings, forming the foundation for personalized recommendations. However, integrating information from multiple sources to enhance recommendation performance remains challenging. This paper introduces a novel approach named PMTRec, the first personalized multi-task learning algorithm to obtain comprehensive user/item embeddings from various information sources. Addressing challenges specific to personalized RecSys, we develop modules to handle personalized task weights, diverse task orientations, and variations in gradient magnitudes across tasks. PMTRec dynamically adjusts task weights based on gradient norms for each user/item, employs a Task Focusing module to align gradient combinations with the main recommendation task, and uses a Gradient Magnitude Balancing module to ensure balanced training across tasks. Through extensive experiments on three real-world datasets with different scales, we demonstrate that PMTRec significantly outperforms existing multi-task learning methods, showcasing its effectiveness in achieving enhanced recommendation accuracy by leveraging multiple tasks simultaneously. Our contributions open new avenues for advancing personalized multi-task training in recommender systems.
Submitted 31 July, 2024;
originally announced July 2024.
-
Evaporation limited spreading of ethanol on rectangular porous strips: an experimental and theoretical investigation
Authors:
Rampally Srirama Chandra Murthy,
Navneet Kumar
Abstract:
Wicking is a widely studied process in both natural and artificial systems. In many industrial applications, such as heat pipes, the wicking liquid evaporates to regulate temperature effectively. This study focuses on a simpler scenario where liquid ethanol climbs a vertically oriented filter paper (FP) under laboratory conditions, facilitating mass loss through evaporation and inducing cooling. Three filter papers with different permeability values were used, and three diagnostic methods (optical imaging, thermal imaging, and precision weighing) were employed to understand the dynamics of the process. The results showed a steady-state height Lc significantly lower than Jurin's limit in all cases, indicating that evaporative mass loss, and not gravity, limits the process. For instance, the filter paper 1005FP, with a capillary radius of 59 µm and an average pore size of 2.50 µm, would reach a Jurin's height of 9.6 cm with ethanol if evaporation were not allowed. However, when evaporation occurred, the height reduced to 1.2 cm, an eightfold decrease; a similar reduction by a factor of 3 was observed for 1004FP. Further, thermal imaging revealed a non-constant temperature distribution along the filter paper, with an unusual temperature inversion near the middle of the wicking liquid. This observation led to an improvement of the Constant Evaporation Model (CEM) by Fries et al. (2008) by accounting for the nonlinear behavior of evaporation rates varying with vertical position. This new model, termed the Non-Constant Evaporation Model (NCEM), tested two power-law relations for evaporation rates, both of which successfully captured the key features of the process.
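The quoted Jurin's height can be sanity-checked with the standard capillary-rise formula h = 2γ cos θ / (ρ g r). The sketch below uses the 59 µm capillary radius from the abstract; the ethanol surface tension (about 0.022 N/m), density (about 789 kg/m³), and a zero contact angle are textbook-style assumptions that the abstract does not state.

```python
import math

def jurin_height(surface_tension, density, radius, contact_angle_rad=0.0, g=9.81):
    """Equilibrium capillary rise (m) with no evaporation: 2*gamma*cos(theta)/(rho*g*r)."""
    return 2.0 * surface_tension * math.cos(contact_angle_rad) / (density * g * radius)

# Assumed ethanol properties near room temperature (not given in the abstract).
GAMMA_ETHANOL = 0.022   # N/m
RHO_ETHANOL = 789.0     # kg/m^3
RADIUS_1005FP = 59e-6   # m, capillary radius quoted in the abstract

h_cm = 100.0 * jurin_height(GAMMA_ETHANOL, RHO_ETHANOL, RADIUS_1005FP)
reduction_factor = 9.6 / 1.2   # reported Jurin's height vs. height with evaporation

print(f"Jurin's height ~ {h_cm:.1f} cm, reduction factor = {reduction_factor:.0f}x")
```

Under these assumed properties the estimate lands close to the 9.6 cm reported for 1005FP, and the ratio to the observed 1.2 cm reproduces the eightfold decrease.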
Submitted 18 February, 2025; v1 submitted 30 July, 2024;
originally announced July 2024.
-
INDIC QA BENCHMARK: A Multilingual Benchmark to Evaluate Question Answering capability of LLMs for Indic Languages
Authors:
Abhishek Kumar Singh,
Vishwajeet Kumar,
Rudra Murthy,
Jaydeep Sen,
Ashish Mittal,
Ganesh Ramakrishnan
Abstract:
Large Language Models (LLMs) perform well on unseen tasks in English, but their abilities in non-English languages are less explored due to limited benchmarks and training data. To bridge this gap, we introduce the Indic QA Benchmark, a large dataset for context-grounded question answering in 11 major Indian languages, covering both extractive and abstractive tasks. Evaluations of multilingual LLMs, including instruction-finetuned versions, revealed weak performance in low-resource languages due to a strong English-language bias in their training data. We also investigated the Translate Test paradigm, where inputs are translated to English for processing and the results are translated back into the source language for output. This approach outperformed multilingual LLMs, particularly in low-resource settings. By releasing Indic QA, we aim to promote further research into LLMs' question-answering capabilities in low-resource languages. This benchmark offers a critical resource to address existing limitations and foster multilingual understanding.
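The Translate Test paradigm described above is a three-step pipeline: translate the inputs to English, answer in English, then translate the answer back. A minimal sketch follows, with the translator and the English QA model passed in as callables. Everything here is hypothetical scaffolding: `translate_test`, `toy_translate`, and `toy_answer` are stand-ins and are not the systems used in the paper.

```python
from typing import Callable

def translate_test(question: str, context: str, source_lang: str,
                   translate: Callable[[str, str, str], str],
                   answer_in_english: Callable[[str, str], str]) -> str:
    """Answer a question posed in `source_lang` by pivoting through English."""
    question_en = translate(question, source_lang, "en")
    context_en = translate(context, source_lang, "en")
    answer_en = answer_in_english(question_en, context_en)
    return translate(answer_en, "en", source_lang)

# Toy stand-ins so the pipeline can run end to end (lookup tables, not real models).
_TOY_TABLE = {
    ("hi", "en"): {
        "राजधानी क्या है?": "What is the capital?",
        "भारत की राजधानी दिल्ली है।": "The capital of India is Delhi.",
    },
    ("en", "hi"): {"Delhi": "दिल्ली"},
}

def toy_translate(text: str, src: str, tgt: str) -> str:
    return _TOY_TABLE[(src, tgt)].get(text, text)

def toy_answer(question_en: str, context_en: str) -> str:
    # Placeholder "model": returns the last word of the context.
    return context_en.rstrip(".").split()[-1]

answer = translate_test("राजधानी क्या है?", "भारत की राजधानी दिल्ली है।", "hi",
                        toy_translate, toy_answer)
```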
Submitted 24 February, 2025; v1 submitted 18 July, 2024;
originally announced July 2024.
-
Sparse Actuator Scheduling for Discrete-Time Linear Dynamical Systems
Authors:
Krishna Praveen V. S. Kondapi,
Chandrasekhar Sriram,
Geethu Joseph,
Chandra R. Murthy
Abstract:
We consider the control of discrete-time linear dynamical systems using sparse inputs where we limit the number of active actuators at every time step. We develop an algorithm for determining a sparse actuator schedule that ensures the existence of a sparse control input sequence, following the schedule, that takes the system from any given initial state to any desired final state. Since such an actuator schedule is not unique, we look for a schedule that minimizes the energy of sparse inputs. For this, we optimize the trace of the inverse of the resulting controllability Gramian, which is an approximate measure of the average energy of the inputs. We present a greedy algorithm along with its theoretical guarantees. Finally, we empirically show that our greedy algorithm ensures the controllability of the linear system with a small number of active actuators per time step without a significant average energy expenditure compared to the fully actuated system.
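To make the objective above concrete, here is a minimal greedy sketch that builds a schedule with at most `s` active actuators per time step by repeatedly adding the (time step, actuator) pair that most reduces the trace of the inverse controllability Gramian. It is not the authors' algorithm: the selection rule, the small `eps` regularizer that keeps the Gramian invertible while the schedule is still incomplete, and all names are assumptions made for illustration.

```python
import numpy as np

def greedy_actuator_schedule(A, B, K, s, eps=1e-6):
    """Greedily pick at most s actuators per step over K steps.

    Each pick minimizes trace((W + v v^T)^-1), where W is the (regularized)
    controllability Gramian of the schedule chosen so far.
    """
    n, m = B.shape
    # Actuator j used at step k steers the final state along A^(K-1-k) b_j.
    powers = [np.linalg.matrix_power(A, K - 1 - k) for k in range(K)]
    directions = {(k, j): powers[k] @ B[:, j] for k in range(K) for j in range(m)}

    W = eps * np.eye(n)              # assumed regularizer so the inverse exists
    schedule = [[] for _ in range(K)]
    candidates = sorted(directions)  # fixed order gives deterministic tie-breaking

    for _ in range(K * s):
        best, best_cost = None, np.inf
        for k, j in candidates:
            if j in schedule[k] or len(schedule[k]) >= s:
                continue
            v = directions[(k, j)]
            cost = np.trace(np.linalg.inv(W + np.outer(v, v)))
            if cost < best_cost:
                best, best_cost = (k, j), cost
        if best is None:             # no feasible pair left
            break
        k, j = best
        schedule[k].append(j)
        W = W + np.outer(directions[best], directions[best])
    return schedule, W

# Toy 2-state system with two actuators, one allowed per time step.
A = np.array([[1.0, 1.0], [0.0, 1.0]])
B = np.eye(2)
schedule, W = greedy_actuator_schedule(A, B, K=2, s=1)
energy_proxy = np.trace(np.linalg.inv(W))
```

A small `energy_proxy` indicates that the chosen schedule leaves the Gramian well conditioned, which is the sense in which the trace of the inverse approximates average input energy.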
Submitted 29 June, 2024;
originally announced July 2024.
-
APIGen: Automated Pipeline for Generating Verifiable and Diverse Function-Calling Datasets
Authors:
Zuxin Liu,
Thai Hoang,
Jianguo Zhang,
Ming Zhu,
Tian Lan,
Shirley Kokane,
Juntao Tan,
Weiran Yao,
Zhiwei Liu,
Yihao Feng,
Rithesh Murthy,
Liangwei Yang,
Silvio Savarese,
Juan Carlos Niebles,
Huan Wang,
Shelby Heinecke,
Caiming Xiong
Abstract:
The advancement of function-calling agent models requires diverse, reliable, and high-quality datasets. This paper presents APIGen, an automated data generation pipeline designed to synthesize verifiable high-quality datasets for function-calling applications. We leverage APIGen and collect 3,673 executable APIs across 21 different categories to generate diverse function-calling datasets in a scalable and structured manner. Each data point in our dataset is verified through three hierarchical stages: format checking, actual function executions, and semantic verification, ensuring its reliability and correctness. We demonstrate that models trained with our curated datasets, even with only 7B parameters, can achieve state-of-the-art performance on the Berkeley Function-Calling Benchmark, outperforming multiple GPT-4 models. Moreover, our 1B model achieves exceptional performance, surpassing GPT-3.5-Turbo and Claude-3 Haiku. We release a dataset containing 60,000 high-quality entries, aiming to advance the field of function-calling agents. The dataset is available on Hugging Face: https://huggingface.co/datasets/Salesforce/xlam-function-calling-60k and the project homepage is https://apigen-pipeline.github.io/
Submitted 26 June, 2024;
originally announced June 2024.
-
MobileAIBench: Benchmarking LLMs and LMMs for On-Device Use Cases
Authors:
Rithesh Murthy,
Liangwei Yang,
Juntao Tan,
Tulika Manoj Awalgaonkar,
Yilun Zhou,
Shelby Heinecke,
Sachin Desai,
Jason Wu,
Ran Xu,
Sarah Tan,
Jianguo Zhang,
Zhiwei Liu,
Shirley Kokane,
Zuxin Liu,
Ming Zhu,
Huan Wang,
Caiming Xiong,
Silvio Savarese
Abstract:
The deployment of Large Language Models (LLMs) and Large Multimodal Models (LMMs) on mobile devices has gained significant attention due to the benefits of enhanced privacy, stability, and personalization. However, the hardware constraints of mobile devices necessitate the use of models with fewer parameters and model compression techniques like quantization. Currently, there is limited understanding of quantization's impact on various task performances, including LLM tasks, LMM tasks, and, critically, trust and safety. There is a lack of adequate tools for systematically testing these models on mobile devices. To address these gaps, we introduce MobileAIBench, a comprehensive benchmarking framework for evaluating mobile-optimized LLMs and LMMs. MobileAIBench assesses models across different sizes, quantization levels, and tasks, measuring latency and resource consumption on real devices. Our two-part open-source framework includes a library for running evaluations on desktops and an iOS app for on-device latency and hardware utilization measurements. Our thorough analysis aims to accelerate mobile AI research and deployment by providing insights into the performance and feasibility of deploying LLMs and LMMs on mobile platforms.
Submitted 12 June, 2024;
originally announced June 2024.
-
AgentOhana: Design Unified Data and Training Pipeline for Effective Agent Learning
Authors:
Jianguo Zhang,
Tian Lan,
Rithesh Murthy,
Zhiwei Liu,
Weiran Yao,
Ming Zhu,
Juntao Tan,
Thai Hoang,
Zuxin Liu,
Liangwei Yang,
Yihao Feng,
Shirley Kokane,
Tulika Awalgaonkar,
Juan Carlos Niebles,
Silvio Savarese,
Shelby Heinecke,
Huan Wang,
Caiming Xiong
Abstract:
Autonomous agents powered by large language models (LLMs) have garnered significant research attention. However, fully harnessing the potential of LLMs for agent-based tasks presents inherent challenges due to the heterogeneous nature of diverse data sources featuring multi-turn trajectories. In this paper, we introduce AgentOhana as a comprehensive solution to address these challenges. AgentOhana aggregates agent trajectories from distinct environments, spanning a wide array of scenarios. It meticulously standardizes and unifies these trajectories into a consistent format, streamlining the creation of a generic data loader optimized for agent training. Leveraging the data unification, our training pipeline maintains equilibrium across different data sources and preserves independent randomness across devices during dataset partitioning and model training. Additionally, we present xLAM-v0.1, a large action model tailored for AI agents, which demonstrates exceptional performance across various benchmarks. Begin the exploration at https://github.com/SalesforceAIResearch/xLAM.
Submitted 8 November, 2024; v1 submitted 23 February, 2024;
originally announced February 2024.
-
Airavata: Introducing Hindi Instruction-tuned LLM
Authors:
Jay Gala,
Thanmay Jayakumar,
Jaavid Aktar Husain,
Aswanth Kumar M,
Mohammed Safi Ur Rahman Khan,
Diptesh Kanojia,
Ratish Puduppully,
Mitesh M. Khapra,
Raj Dabre,
Rudra Murthy,
Anoop Kunchukuttan
Abstract:
We announce the initial release of "Airavata," an instruction-tuned LLM for Hindi. Airavata was created by fine-tuning OpenHathi with diverse, instruction-tuning Hindi datasets to make it better suited for assistive tasks. Along with the model, we also share the IndicInstruct dataset, which is a collection of diverse instruction-tuning datasets to enable further research for Indic LLMs. Additionally, we present evaluation benchmarks and a framework for assessing LLM performance across tasks in Hindi. Currently, Airavata supports Hindi, but we plan to expand this to all 22 scheduled Indic languages. You can access all artifacts at https://ai4bharat.github.io/airavata.
Submitted 26 February, 2024; v1 submitted 26 January, 2024;
originally announced January 2024.
-
Distributed IRSs Always Benefit Every Mobile Operator
Authors:
L. Yashvanth,
Chandra R. Murthy
Abstract:
We investigate the impact of multiple distributed intelligent reflecting surfaces (IRSs), which are deployed and optimized by a mobile operator (MO), on the performance of user equipments (UEs) served by other co-existing out-of-band (OOB) MOs that do not control the IRSs. We show that, under round-robin scheduling, in mmWave frequencies, the ergodic sum spectral efficiency (SE) of an OOB MO increases logarithmically in the total number of IRS elements with a pre-log factor that increases with the ratio of the number of OOB paths through the IRS to the number of elements at an IRS. We further show that the maximum achievable SE of the OOB MO scales log-linearly with the total IRS elements, with a pre-log factor of $1$. Then, we specify the minimum number of IRSs as a function of the channel parameters and design a distributed IRS system in which an OOB MO almost surely obtains the maximum SE. Finally, we prove that the outage probability at an OOB UE decreases exponentially as the number of IRSs increases, even though they are randomly configured from the OOB UE's viewpoint. We numerically verify our theory and conclude that distributed IRSs always help every MO, but the MO controlling the IRSs benefits the most.
Submitted 13 July, 2024; v1 submitted 16 January, 2024;
originally announced January 2024.
-
PUB: A Pragmatics Understanding Benchmark for Assessing LLMs' Pragmatics Capabilities
Authors:
Settaluri Lakshmi Sravanthi,
Meet Doshi,
Tankala Pavan Kalyan,
Rudra Murthy,
Pushpak Bhattacharyya,
Raj Dabre
Abstract:
LLMs have demonstrated remarkable capability for understanding semantics, but they often struggle with understanding pragmatics. To demonstrate this fact, we release a Pragmatics Understanding Benchmark (PUB) dataset consisting of fourteen tasks in four pragmatics phenomena, namely, Implicature, Presupposition, Reference, and Deixis. We curated high-quality test sets for each task, consisting of Multiple Choice Question Answers (MCQA). PUB includes a total of 28k data points, 6.1k of which have been created by us, and the rest are adapted from existing datasets. We evaluated nine models varying in the number of parameters and type of training. Our study indicates that fine-tuning for instruction-following and chat significantly enhances the pragmatics capabilities of smaller language models. However, for larger models, the base versions perform comparably with their chat-adapted counterparts. Additionally, there is a noticeable performance gap between human capabilities and model capabilities. Furthermore, unlike the consistent performance of humans across various tasks, the models demonstrate variability in their proficiency, with performance levels fluctuating due to different hints and the complexities of tasks within the same dataset. Overall, the benchmark aims to provide a comprehensive evaluation of LLMs' ability to handle real-world language tasks that require pragmatic reasoning.
Submitted 13 January, 2024;
originally announced January 2024.
-
Thermal transport measurements of the charge density wave transition in CsV$_3$Sb$_5$
Authors:
Erik D. Kountz,
Chaitanya R. Murthy,
Dong Chen,
Linda Ye,
Mark Zic,
Claudia Felser,
Ian R. Fisher,
Steven A. Kivelson,
Aharon Kapitulnik
Abstract:
We study thermalization and thermal transport in single crystals of CsV$_3$Sb$_5$ through the CDW transition by directly measuring thermal diffusivity ($D$), thermal conductivity ($κ$), resistivity ($ρ$), and specific heat ($c$). Commensurate with previous reports, we observe a sharp, narrow anomaly in specific heat associated with a first order transition that results in a CDW state below $\sim94$ K. While a corresponding sharp anomaly in thermal diffusivity is also observed, resistivity and thermal conductivity only exhibit small steps at the transition, where the feature is sharp for resistivity and broader for thermal conductivity. Scrutinizing the thermal Einstein relation $κ=cD$, we find that this relation is satisfied in the entire temperature range, except in a narrow range around the transition. The Wiedemann-Franz law seems to work outside the critical region as well. Below the transition and persisting below the two-phase regime we find strong resemblance between the resistivity anomaly and the specific heat, which may point to a secondary electronic order parameter that emerges continuously below the transition.
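The thermal Einstein relation $κ=cD$ can be checked point by point once the three quantities are measured independently. The sketch below computes the relative deviation of $κ$ from $cD$, assuming $c$ is expressed as a volumetric specific heat so that the units match (J m^-3 K^-1 times m^2 s^-1 gives W m^-1 K^-1). The numbers are made up for illustration and are not data from the paper.

```python
def einstein_deviation(kappa, c_volumetric, diffusivity):
    """Relative deviation of kappa from c*D; zero means the relation holds exactly."""
    predicted = c_volumetric * diffusivity
    return (kappa - predicted) / predicted

# Illustrative values only (not measurements from the paper).
c_vol = 2.0e6    # J / (m^3 K), volumetric specific heat
D = 5.0e-6       # m^2 / s, thermal diffusivity
kappa = 10.2     # W / (m K), thermal conductivity

deviation = einstein_deviation(kappa, c_vol, D)
print(f"kappa differs from c*D by {deviation:.1%}")
```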
Submitted 27 December, 2023;
originally announced December 2023.
-
Joint State and Sparse Input Estimation in Linear Dynamical Systems
Authors:
Rupam Kalyan Chakraborty,
Geethu Joseph,
Chandra R. Murthy
Abstract:
Sparsity constraints on the control inputs of a linear dynamical system naturally arise in several practical applications such as networked control, computer vision, seismic signal processing, and cyber-physical systems. In this work, we consider the problem of jointly estimating the states and sparse inputs of such systems from low-dimensional (compressive) measurements. Due to the low-dimensional measurements, conventional Kalman filtering and smoothing algorithms fail to accurately estimate the states and inputs. We present a Bayesian approach that exploits the input sparsity to significantly improve estimation accuracy. Sparsity in the input estimates is promoted by using different prior distributions on the input. We investigate two main approaches: regularizer-based MAP, and Bayesian learning-based estimation. We also extend the approaches to handle control inputs with common support and analyze the time and memory complexities of the presented algorithms. Finally, using numerical simulations, we show that our algorithms outperform the state-of-the-art methods in terms of accuracy and time/memory complexities, especially in the low-dimensional measurement regime.
Submitted 10 September, 2024; v1 submitted 4 December, 2023;
originally announced December 2023.
-
Tradeoff of age-of-information and power under reliability constraint for short-packet communication with block-length adaptation
Authors:
Sudarsanan A. K.,
Vineeth B. S.,
Chandra R. Murthy
Abstract:
In applications such as remote estimation and monitoring, update packets are transmitted by power-constrained devices using short-packet codes over wireless networks. Therefore, networks need to be end-to-end optimized using information freshness metrics such as age of information under transmit power and reliability constraints to ensure support for such applications. For short-packet coding, mod…
▽ More
In applications such as remote estimation and monitoring, update packets are transmitted by power-constrained devices using short-packet codes over wireless networks. Therefore, networks need to be end-to-end optimized using information freshness metrics such as age of information under transmit power and reliability constraints to ensure support for such applications. For short-packet coding, modelling and understanding the effect of block codeword length on transmit power and other performance metrics is important. To understand the above optimization for short-packet coding, we consider the optimal tradeoff problem between age of information and transmit power under reliability constraints for short packet point-to-point communication model with an exogenous packet generation process. In contrast to prior work, we consider scheduling policies that can possibly adapt the block-length or transmission time of short packet codes in order to achieve the optimal tradeoff. We characterize the tradeoff using a semi-Markov decision process formulation. We also obtain analytical upper bounds as well as numerical, analytical, and asymptotic lower bounds on the optimal tradeoff. We show that in certain regimes, such as high reliability and high packet generation rate, non-adaptive scheduling policies (fixed transmission time policies) are close-to-optimal. Furthermore, in a high-power or in a low-power regime, non-adaptive as well as state-independent randomized scheduling policies are order-optimal. These results are corroborated by numerical and simulation experiments. The tradeoff is then characterized for a wireless point-to-point channel with block fading as well as for other packet generation models (including an age-dependent packet generation model).
Submitted 3 December, 2023;
originally announced December 2023.
-
Half-Duplex APs with Dynamic TDD vs. Full-Duplex APs in Cell-Free Systems
Authors:
Anubhab Chowdhury,
Chandra R. Murthy
Abstract:
In this paper, we present a comparative study of half-duplex (HD) access points (APs) with dynamic time-division duplex (DTDD) and full-duplex (FD) APs in cell-free (CF) systems. Although both DTDD and FD CF systems support concurrent downlink (DL) transmission and uplink (UL) reception capability, the sum spectral efficiency (SE) is limited by various cross-link interferences. We first present a novel pilot allocation scheme that minimizes the pilot length required to ensure no pilot contamination among the user equipments (UEs) served by at least one common AP. Then, we derive the sum SE in closed form, considering zero-forcing combining and precoding along with the signal-to-interference plus noise ratio optimal weighting at the central processing unit. We also present a provably convergent algorithm for joint UL-DL power allocation and UL/DL mode scheduling of the APs (for DTDD) to maximize the sum SE. Further, the proposed algorithms are precoder and combiner agnostic and come with closed-form update equations for the UL and DL power control coefficients. Our numerical results illustrate the superiority of the proposed pilot allocation and power control algorithms over several benchmark schemes and show that the sum SE with DTDD can outperform an FD CF system with similar antenna density. Thus, DTDD combined with CF is a promising alternative to FD that attains the same performance using HD APs, while obviating the burden of intra-AP interference cancellation.
Submitted 31 January, 2024; v1 submitted 4 September, 2023;
originally announced September 2023.
-
On the Impact of an IRS on the Out-of-Band Performance in Sub-6 GHz & mmWave Frequencies
Authors:
L. Yashvanth,
Chandra R. Murthy
Abstract:
Intelligent reflecting surfaces (IRSs) were introduced to enhance the performance of wireless communication systems. However, from a service provider's viewpoint, a concern with the use of an IRS is its effect on out-of-band (OOB) quality of service. Specifically, if two operators, say X and Y, provide services in a given geographical area using non-overlapping frequency bands, and if operator X uses an IRS to enhance the spectral efficiency (SE) of its users (UEs), does it degrade the performance of UEs served by operator Y? We answer this by analyzing the average and instantaneous performances of the OOB operator considering both sub-6 GHz and mmWave bands. Specifically, we derive the ergodic sum SE achieved by the operators under round-robin scheduling. We also derive the outage probability and analyze the change in the SNR caused by the IRS at an OOB UE using stochastic dominance theory. Surprisingly, even though the IRS is randomly configured from operator Y's point of view, the OOB operator still benefits from the presence of the IRS, witnessing a performance enhancement for free in both sub-6 GHz and mmWave bands. This is because the IRS introduces additional paths between the transmitter and receiver, increasing the overall signal power arriving at the UE and providing diversity benefits. Finally, we show that the use of opportunistic scheduling schemes can further enhance the benefit of the uncontrolled IRS at OOB UEs. We numerically illustrate our findings and conclude that an IRS is always beneficial to every operator, even when the IRS is deployed & controlled by only one operator.
Submitted 10 June, 2024; v1 submitted 27 August, 2023;
originally announced August 2023.
-
BOLAA: Benchmarking and Orchestrating LLM-augmented Autonomous Agents
Authors:
Zhiwei Liu,
Weiran Yao,
Jianguo Zhang,
Le Xue,
Shelby Heinecke,
Rithesh Murthy,
Yihao Feng,
Zeyuan Chen,
Juan Carlos Niebles,
Devansh Arpit,
Ran Xu,
Phil Mui,
Huan Wang,
Caiming Xiong,
Silvio Savarese
Abstract:
The massive successes of large language models (LLMs) encourage the emerging exploration of LLM-augmented Autonomous Agents (LAAs). An LAA is able to generate actions with its core LLM and interact with environments, which facilitates the ability to resolve complex tasks by conditioning on past interactions such as observations and actions. Since the investigation of LAA is still very recent, limited explorations are available. Therefore, we provide a comprehensive comparison of LAA in terms of both agent architectures and LLM backbones. Additionally, we propose a new strategy to orchestrate multiple LAAs such that each labor LAA focuses on one type of action, i.e., BOLAA, where a controller manages the communication among multiple agents. We conduct simulations on both decision-making and multi-step reasoning environments, which comprehensively justify the capacity of LAAs. Our performance results provide quantitative suggestions for designing LAA architectures and the optimal choice of LLMs, as well as the compatibility of both. We release our implementation code of LAAs to the public at https://github.com/salesforce/BOLAA.
Submitted 11 August, 2023;
originally announced August 2023.
-
Retroformer: Retrospective Large Language Agents with Policy Gradient Optimization
Authors:
Weiran Yao,
Shelby Heinecke,
Juan Carlos Niebles,
Zhiwei Liu,
Yihao Feng,
Le Xue,
Rithesh Murthy,
Zeyuan Chen,
Jianguo Zhang,
Devansh Arpit,
Ran Xu,
Phil Mui,
Huan Wang,
Caiming Xiong,
Silvio Savarese
Abstract:
Recent months have seen the emergence of a powerful new trend in which large language models (LLMs) are augmented to become autonomous language agents capable of performing objective oriented multi-step tasks on their own, rather than merely responding to queries from human users. Most existing language agents, however, are not optimized using environment-specific rewards. Although some agents enable iterative refinement through verbal feedback, they do not reason and plan in ways that are compatible with gradient-based learning from rewards. This paper introduces a principled framework for reinforcing large language agents by learning a retrospective model, which automatically tunes the language agent prompts from environment feedback through policy gradient. Specifically, our proposed agent architecture learns from rewards across multiple environments and tasks, for fine-tuning a pre-trained language model which refines the language agent prompt by summarizing the root cause of prior failed attempts and proposing action plans. Experimental results on various tasks demonstrate that the language agents improve over time and that our approach considerably outperforms baselines that do not properly leverage gradients from the environment. This demonstrates that using policy gradient optimization to improve language agents, for which we believe our work is one of the first, seems promising and can be applied to optimize other models in the agent architecture to enhance agent performances over time.
Submitted 5 May, 2024; v1 submitted 4 August, 2023;
originally announced August 2023.
-
REX: Rapid Exploration and eXploitation for AI Agents
Authors:
Rithesh Murthy,
Shelby Heinecke,
Juan Carlos Niebles,
Zhiwei Liu,
Le Xue,
Weiran Yao,
Yihao Feng,
Zeyuan Chen,
Akash Gokul,
Devansh Arpit,
Ran Xu,
Phil Mui,
Huan Wang,
Caiming Xiong,
Silvio Savarese
Abstract:
In this paper, we propose an enhanced approach for Rapid Exploration and eXploitation for AI Agents called REX. Existing AutoGPT-style techniques have inherent limitations, such as a heavy reliance on precise descriptions for decision-making, and the lack of a systematic approach to leverage try-and-fail procedures akin to traditional Reinforcement Learning (RL). REX introduces an additional layer of rewards and integrates concepts similar to Upper Confidence Bound (UCB) scores, leading to more robust and efficient AI agent performance. This approach has the advantage of enabling the utilization of offline behaviors from logs and allowing seamless integration with existing foundation models without requiring any model fine-tuning. Through comparative analysis with existing methods such as Chain-of-Thoughts (CoT) and Reasoning viA Planning (RAP), REX-based methods demonstrate comparable performance and, in certain cases, even surpass the results achieved by these existing techniques. Notably, REX-based methods exhibit remarkable reductions in execution time, enhancing their practical applicability across a diverse set of scenarios.
Submitted 26 January, 2024; v1 submitted 18 July, 2023;
originally announced July 2023.
-
Prompting with Pseudo-Code Instructions
Authors:
Mayank Mishra,
Prince Kumar,
Riyaz Bhat,
Rudra Murthy V,
Danish Contractor,
Srikanth Tamilselvam
Abstract:
Prompting with natural language instructions has recently emerged as a popular method of harnessing the capabilities of large language models. Given the inherent ambiguity present in natural language, it is intuitive to consider the possible advantages of prompting with less ambiguous prompt styles, such as the use of pseudo-code.
In this paper we explore if prompting via pseudo-code instructions helps improve the performance of pre-trained language models. We manually create a dataset of pseudo-code prompts for 132 different tasks spanning classification, QA and generative language tasks, sourced from the Super-NaturalInstructions dataset. Using these prompts along with their counterparts in natural language, we study their performance on two LLM families - BLOOM and CodeGen. Our experiments show that using pseudo-code instructions leads to better results, with an average increase (absolute) of 7-16 points in F1 scores for classification tasks and an improvement (relative) of 12-38% in aggregate ROUGE-L scores across all tasks. We include detailed ablation studies which indicate that code comments, docstrings, and the structural clues encoded in pseudo-code all contribute towards the improvement in performance.
To the best of our knowledge our work is the first to demonstrate how pseudo-code prompts can be helpful in improving the performance of pre-trained LMs.
Submitted 19 October, 2023; v1 submitted 19 May, 2023;
originally announced May 2023.
-
StarCoder: may the source be with you!
Authors:
Raymond Li,
Loubna Ben Allal,
Yangtian Zi,
Niklas Muennighoff,
Denis Kocetkov,
Chenghao Mou,
Marc Marone,
Christopher Akiki,
Jia Li,
Jenny Chim,
Qian Liu,
Evgenii Zheltonozhskii,
Terry Yue Zhuo,
Thomas Wang,
Olivier Dehaene,
Mishig Davaadorj,
Joel Lamy-Poirier,
João Monteiro,
Oleh Shliazhko,
Nicolas Gontier,
Nicholas Meade,
Armel Zebaze,
Ming-Ho Yee,
Logesh Kumar Umapathi,
Jian Zhu
, et al. (42 additional authors not shown)
Abstract:
The BigCode community, an open-scientific collaboration working on the responsible development of Large Language Models for Code (Code LLMs), introduces StarCoder and StarCoderBase: 15.5B parameter models with 8K context length, infilling capabilities and fast large-batch inference enabled by multi-query attention. StarCoderBase is trained on 1 trillion tokens sourced from The Stack, a large collection of permissively licensed GitHub repositories with inspection tools and an opt-out process. We fine-tuned StarCoderBase on 35B Python tokens, resulting in the creation of StarCoder. We perform the most comprehensive evaluation of Code LLMs to date and show that StarCoderBase outperforms every open Code LLM that supports multiple programming languages and matches or outperforms the OpenAI code-cushman-001 model. Furthermore, StarCoder outperforms every model that is fine-tuned on Python, can be prompted to achieve 40% pass@1 on HumanEval, and still retains its performance on other programming languages. We take several important steps towards a safe open-access model release, including an improved PII redaction pipeline and a novel attribution tracing tool, and make the StarCoder models publicly available under a more commercially viable version of the Open Responsible AI Model license.
Submitted 13 December, 2023; v1 submitted 9 May, 2023;
originally announced May 2023.
-
Denoising-based UNMT is more robust to word-order divergence than MASS-based UNMT
Authors:
Tamali Banerjee,
Rudra Murthy V,
Pushpak Bhattacharyya
Abstract:
We aim to investigate whether UNMT approaches with self-supervised pre-training are robust to word-order divergence between language pairs. We achieve this by comparing two models pre-trained with the same self-supervised pre-training objective. The first model is trained on language pairs with different word orders, and the second model is trained on the same language pairs with the source language re-ordered to match the word order of the target language. Ideally, UNMT approaches which are robust to word-order divergence should exhibit no visible performance difference between the two configurations. In this paper, we investigate two such self-supervised pre-training based UNMT approaches, namely Masked Sequence-to-Sequence Pre-Training (MASS), which does not have shuffling noise, and Denoising AutoEncoder (DAE), which has shuffling noise.
We experiment with five English$\rightarrow$Indic language pairs (en-hi, en-bn, en-gu, en-kn, and en-ta), where the word order of the source language is SVO (Subject-Verb-Object) and the word order of the target languages is SOV (Subject-Object-Verb). We observed that for these language pairs, the DAE-based UNMT approach consistently outperforms MASS in terms of translation accuracy. Moreover, bridging the word-order gap using reordering improves the translation accuracy of MASS-based UNMT models, while it cannot improve the translation accuracy of DAE-based UNMT models. This observation indicates that DAE-based UNMT is more robust to word-order divergence than MASS-based UNMT. The word-shuffling noise in the DAE approach could be the reason for its robustness to word-order divergence.
Submitted 2 March, 2023;
originally announced March 2023.
-
Does an IRS Degrade Out-of-Band Performance?
Authors:
L. Yashvanth,
Chandra R. Murthy
Abstract:
Intelligent reflecting surfaces (IRSs) were introduced to enhance the performance of wireless systems. However, from a cellular service provider's view, a concern with the use of an IRS is its effect on out-of-band (OOB) quality of service. Specifically, given two operators, say X and Y, providing services in a geographical area using non-overlapping frequency bands, if operator-X uses an IRS to optimally enhance the throughput of its users, does the IRS degrade the performance of operator-Y? We answer this by deriving the ergodic sum spectral efficiency (SE) of both operators under round-robin scheduling. We also derive the complementary cumulative distribution function of the change in effective channel at an OOB user with and without the IRS, which provides deeper insights into OOB performance. Surprisingly, we find that even though the IRS is randomly configured from operator-Y's view, the OOB operator still benefits from the IRS, witnessing a performance enhancement for free. This happens because the IRS introduces additional paths between the nodes, increasing the signal power at the receiver and providing diversity benefits. We verify our findings numerically and conclude that an IRS is beneficial to every operator, even when the IRS is deployed to optimally serve only one operator.
Submitted 30 June, 2023; v1 submitted 24 February, 2023;
originally announced February 2023.
-
Channel State Information Based User Censoring in Irregular Repetition Slotted Aloha
Authors:
Chirag Ramesh Srivatsa,
Chandra R. Murthy
Abstract:
Irregular repetition slotted aloha (IRSA) is a massive random access protocol which can be used to serve a large number of users while achieving a packet loss rate (PLR) close to zero. However, if the number of users is too high, then the system is interference limited and the PLR is close to one. In this paper, we propose a variant of IRSA in the interference limited regime, namely Censored-IRSA (C-IRSA), wherein users with poor channel states censor themselves from transmitting their packets. We theoretically analyze the throughput performance of C-IRSA via density evolution. Using this, we derive closed-form expressions for the optimal choice of the censor threshold which maximizes the throughput while achieving zero PLR among uncensored users. Through extensive numerical simulations, we show that C-IRSA can achieve a 4$\times$ improvement in the peak throughput compared to conventional IRSA.
Submitted 24 February, 2023;
originally announced February 2023.
-
Multi-Carrier Wideband OCDM-Based THz Automotive Radar
Authors:
Sangeeta Bhattacharjee,
Kumar Vijay Mishra,
Ramesh Annavajjala,
Chandra R. Murthy
Abstract:
Automotive radars at the Terahertz (THz) frequency band have the potential to be compact and lightweight while providing high (nearly-optical) angular resolution. In this paper, we propose a bistatic THz automotive radar that employs the recently proposed orthogonal chirp division multiplexing (OCDM) multi-carrier waveform. As a stand-alone communications waveform, OCDM has been investigated for robustness against interference in time-frequency selective channels. The THz-band path loss, and, hence, radar signal bandwidth, are range-dependent. We address this unique feature through a multi-carrier wideband OCDM sensing transceiver that exploits the coherence bandwidth of the THz channel. We develop an optimal scheme to combine the returns at different range/bandwidths by assigning weights based on the Cramer-Rao lower bound on the range and velocity estimates. Numerical experiments demonstrate improved target estimates using our proposed combined estimation from multiple varied-attenuation THz frequencies.
Submitted 1 February, 2023;
originally announced February 2023.
-
Semi-Structured Object Sequence Encoders
Authors:
Rudra Murthy V,
Riyaz Bhat,
Chulaka Gunasekara,
Siva Sankalp Patel,
Hui Wan,
Tejas Indulal Dhamecha,
Danish Contractor,
Marina Danilevsky
Abstract:
In this paper we explore the task of modeling semi-structured object sequences; in particular, we focus our attention on the problem of developing a structure-aware input representation for such sequences. Examples of such data include user activity on websites, machine logs, and many others. This type of data is often represented as a sequence of sets of key-value pairs over time and can present modeling challenges due to an ever-increasing sequence length. We propose a two-part approach, which first considers each key independently and encodes a representation of its values over time; we then self-attend over these value-aware key representations to accomplish a downstream task. This allows us to operate on longer object sequences than existing methods. We introduce a novel shared-attention-head architecture between the two modules and present an innovative training schedule that interleaves the training of both modules with shared weights for some attention heads. Our experiments on multiple prediction tasks using real-world data demonstrate that our approach outperforms a unified network with hierarchical encoding, as well as other methods including a record-centric representation and a flattened representation of the sequence.
Submitted 22 May, 2023; v1 submitted 3 January, 2023;
originally announced January 2023.
-
Naamapadam: A Large-Scale Named Entity Annotated Data for Indic Languages
Authors:
Arnav Mhaske,
Harshit Kedia,
Sumanth Doddapaneni,
Mitesh M. Khapra,
Pratyush Kumar,
Rudra Murthy V,
Anoop Kunchukuttan
Abstract:
We present Naamapadam, the largest publicly available Named Entity Recognition (NER) dataset for the 11 major Indian languages from two language families. The dataset contains more than 400k sentences annotated with a total of at least 100k entities from three standard entity categories (Person, Location, and Organization) for 9 out of the 11 languages. The training dataset has been automatically created from the Samanantar parallel corpus by projecting automatically tagged entities from an English sentence to the corresponding Indian language translation. We also create manually annotated test sets for 9 languages. We demonstrate the utility of the obtained dataset on the Naamapadam-test dataset. We also release IndicNER, a multilingual IndicBERT model fine-tuned on the Naamapadam training set. IndicNER achieves an F1 score of more than $80$ for $7$ out of $9$ test languages. The dataset and models are available under open-source licences at https://ai4bharat.iitm.ac.in/naamapadam.
Submitted 28 May, 2023; v1 submitted 20 December, 2022;
originally announced December 2022.
-
Impact of Mobility on Downlink Cell-Free Massive MIMO Systems
Authors:
Abhinav Anand,
Chandra R. Murthy,
Ribhu Chopra
Abstract:
In this paper, we analyze the achievable downlink spectral efficiency of cell-free massive multiple input multiple output (CF-mMIMO) systems, accounting for the effects of channel aging (caused by user mobility) and pilot contamination. We consider two cases: one where user equipments (UEs) rely on downlink pilots beamformed by the access points (APs) to estimate the downlink channel, and another where UEs utilize statistical channel state information (CSI) for data decoding. For comparison, we also consider cellular mMIMO and derive its achievable spectral efficiency with channel aging and pilot contamination in the above two cases. Our results show that, in CF-mMIMO, downlink training is preferable over statistical CSI when the length of the data sequence is chosen optimally to maximize the spectral efficiency. In cellular mMIMO, however, either one of the two schemes may be better depending on whether user fairness or sum spectral efficiency is prioritized. Furthermore, the CF-mMIMO system generally outperforms cellular mMIMO even after accounting for the effects of channel aging and pilot contamination. Through numerical results, we illustrate the effect of various system parameters such as the maximum user velocity, uplink/downlink pilot lengths, data duration, and network densification, and provide interesting insights into the key differences between cell-free and cellular mMIMO systems.
Submitted 6 September, 2022;
originally announced September 2022.
-
The Airport Check-in Counter Allocation Problem: A Survey
Authors:
T. R. Lalita,
G. S. R. Murthy
Abstract:
An important passenger service area that has an impact on passenger satisfaction and airport revenue is the check-in counters. The check-in counter allocation problem consists of allocating adjacent counters to airlines at an airport and scheduling the counters through the day subject to operational constraints. It is a special form of the well-known RCPSP and an NP-hard problem. In addition, the continuous demand for the highest possible number of counters throughout the day by each airline makes it a daily challenge for airport operators. As the counters are a resource, this problem is equivalent to the adjacent resource scheduling problem, making the solutions for this problem extendable to any adjacent resource scheduling problem. Decisions made at the check-in counters affect the movement of passengers in the airport and bad decisions can result in chaos. Since the 1980s several authors have proposed a multitude of models with variations in the optimization criteria, modeling, airport requirements, and airport layouts. This article presents a state-of-the-art survey on the airport check-in counter allocation problem by focusing on relevant models and algorithms. Relevant literature is discussed based on the type of problem solved, the objectives considered, and the methodology used. The value of this research is to help airport operators in planning and allocation of check-in counters, increase airport revenue, improve passenger flow within existing constraints, and optimize utilization of the existing infrastructure.
Submitted 29 August, 2022;
originally announced August 2022.
-
An Efficient Algorithm to the Integrated Shift and Task Scheduling Problem
Authors:
G S R Murthy,
T R Lalita
Abstract:
This paper deals with operational models for the integrated shift and task scheduling problem. The staff scheduling problem is a special case of this, with staff requirements given as input to the problem. Both problems become hard to solve when they are considered with flexible shifts. Current literature on these problems leaves good scope for potential research. In this article, we propose a new method to solve the integrated problem and its special case, the staff scheduling problem. We consider these problems with wide flexibility, a feature that is addressed in a limited way in the existing literature. We introduce a new technique to solve the problem with large demand efficiently. When the objective function is the number of workers, we provide a tight lower bound that is easily computable. Through a number of numerical experiments with live and simulated problem instances, we demonstrate huge savings in the solution times over the existing ones.
Submitted 16 August, 2022;
originally announced August 2022.
-
Performance Analysis of Irregular Repetition Slotted Aloha with Multi-Cell Interference
Authors:
Chirag Ramesh Srivatsa,
Chandra R. Murthy
Abstract:
Irregular repetition slotted aloha (IRSA) is a massive random access protocol in which users transmit several replicas of their packet over a frame to a base station. Existing studies have analyzed IRSA in the single-cell (SC) setup, which does not extend to the more practically relevant multi-cell (MC) setup due to the inter-cell interference. In this work, we analyze MC IRSA, accounting for pilot contamination and multiuser interference. Via numerical simulations, we illustrate that, in practical settings, MC IRSA can have a drastic loss of throughput, up to $70\%$, compared to SC IRSA. Further, MC IRSA requires a significantly higher training length (about 4-5x compared to SC IRSA) in order to support the same user density and achieve the same throughput. We also provide insights into the impact of the pilot length, number of antennas, and signal-to-noise ratio on the performance of MC IRSA.
Submitted 28 May, 2022; v1 submitted 14 May, 2022;
originally announced May 2022.