Search | arXiv e-print repository

doi 10.1145/3722212.3724430

AutoComp: Automated Data Compaction for Log-Structured Tables in Data Lakes

Authors: Anja Gruenheid, Jesús Camacho-Rodríguez, Carlo Curino, Raghu Ramakrishnan, Stanislav Pak, Sumedh Sakdeo, Lenisha Gandhi, Sandeep K. Singhal, Pooja Nilangekar, Daniel J. Abadi

Abstract: The proliferation of small files in data lakes poses significant challenges, including degraded query performance, increased storage costs, and scalability bottlenecks in distributed storage systems. Log-structured table formats (LSTs) such as Delta Lake, Apache Iceberg, and Apache Hudi exacerbate this issue due to their append-only write patterns and metadata-intensive operations. While compactio… ▽ More The proliferation of small files in data lakes poses significant challenges, including degraded query performance, increased storage costs, and scalability bottlenecks in distributed storage systems. Log-structured table formats (LSTs) such as Delta Lake, Apache Iceberg, and Apache Hudi exacerbate this issue due to their append-only write patterns and metadata-intensive operations. While compaction--the process of consolidating small files into fewer, larger files--is a common solution, existing automation mechanisms often lack the flexibility and scalability to adapt to diverse workloads and system requirements while balancing the trade-offs between compaction benefits and costs. In this paper, we present AutoComp, a scalable framework for automatic data compaction tailored to the needs of modern data lakes. Drawing on deployment experience at LinkedIn, we analyze the operational impact of small file proliferation, establish key requirements for effective automatic compaction, and demonstrate how AutoComp addresses these challenges. Our evaluation, conducted using synthetic benchmarks and production environments via integration with OpenHouse--a control plane for catalog management, schema governance, and data services--shows significant improvements in file count reduction and query performance. We believe AutoComp's built-in extensibility provides a robust foundation for evolving compaction systems, facilitating future integration of refined multi-objective optimization approaches, workload-aware compaction strategies, and expanded support for broader data layout optimizations. △ Less

Submitted 5 April, 2025; originally announced April 2025.

Journal ref: ACM SIGMOD 2025

arXiv:2504.01793 [pdf, ps, other]

Optimal shift-invariant spaces from uniform measurements

Authors: Rohan Joy, Radha Ramakrishnan

Abstract: Let $m$ be a positive integer and $\mathcal{C}$ be a collection of closed subspaces in $L^2(\mathbb{R})$. Given the measurements $\mathcal{F}_Y=\left\lbrace \left\lbrace y_k^1 \right\rbrace_{k\in \mathbb{Z}},\ldots, \left\lbrace y_k^m \right\rbrace_{k\in \mathbb{Z}} \right\rbrace \subset \ell^2(\mathbb{Z})$ of unknown functions… ▽ More Let $m$ be a positive integer and $\mathcal{C}$ be a collection of closed subspaces in $L^2(\mathbb{R})$. Given the measurements $\mathcal{F}_Y=\left\lbrace \left\lbrace y_k^1 \right\rbrace_{k\in \mathbb{Z}},\ldots, \left\lbrace y_k^m \right\rbrace_{k\in \mathbb{Z}} \right\rbrace \subset \ell^2(\mathbb{Z})$ of unknown functions $\mathcal{F}=\left\{f_1, \ldots,f_m \right\} \subset L^2( \mathbb{R})$, in this paper we study the problem of finding an optimal space $S$ in $\mathcal{C}$ that is ``closest" to the measurements $\mathcal{F}_Y$ of $\mathcal{F}$. Since the class of finitely generated shift-invariant spaces (FSISs) is popularly used for modelling signals, we assume $\mathcal{C}$ consists of FSISs. We will be considering three cases. In the first case, $\mathcal{C}$ consists of FSISs without any assumption on extra invariance. In the second case, we assume $\mathcal{C}$ consists of extra invariant FSISs, and in the third case, we assume $\mathcal{C}$ has translation-invariant FSISs. In all three cases, we prove the existence of an optimal space. △ Less

Submitted 2 April, 2025; originally announced April 2025.

arXiv:2503.14712 [pdf, other]

Distribution and Purification of Entanglement States in Quantum Networks

Authors: Xiaojie Fan, Yukun Yang, Himanshu Gupta, C. R. Ramakrishnan

Abstract: We consider problems of distributing high-fidelity entangled states across nodes of a quantum network. We consider a repeater-based network architecture with entanglement swapping (fusion) operations for generating long-distance entanglements, and purification operations that produce high-fidelity states from several lower-fidelity states. The contributions of this paper are two-fold: First, while… ▽ More We consider problems of distributing high-fidelity entangled states across nodes of a quantum network. We consider a repeater-based network architecture with entanglement swapping (fusion) operations for generating long-distance entanglements, and purification operations that produce high-fidelity states from several lower-fidelity states. The contributions of this paper are two-fold: First, while there have been several works on fidelity-aware routing and incorporating purification into routing for generating EPs, this paper presents the first algorithms for optimal solutions to the high-fidelity EP distribution problem. We provide a dynamic programming algorithm for generating the optimal tree of operations to produce a high-fidelity EP, and an LP-based algorithm for generating an optimal collection of trees. Second, following the EP algorithms, this paper presents the first algorithms for the high-fidelity GHZ-state distribution problem and characterizes its optimality. We evaluate our techniques via simulations over NetSquid, a quantum network simulator. △ Less

Submitted 23 March, 2025; v1 submitted 18 March, 2025; originally announced March 2025.

Comments: 9 pages, 8 figures

arXiv:2503.10904 [pdf, other]

Transferring Kinesthetic Demonstrations across Diverse Objects for Manipulation Planning

Authors: Dibyendu Das, Aditya Patankar, Nilanjan Chakraborty, C. R. Ramakrishnan, I. V. Ramakrishnan

Abstract: Given a demonstration of a complex manipulation task such as pouring liquid from one container to another, we seek to generate a motion plan for a new task instance involving objects with different geometries. This is non-trivial since we need to simultaneously ensure that the implicit motion constraints are satisfied (glass held upright while moving), the motion is collision-free, and that the ta… ▽ More Given a demonstration of a complex manipulation task such as pouring liquid from one container to another, we seek to generate a motion plan for a new task instance involving objects with different geometries. This is non-trivial since we need to simultaneously ensure that the implicit motion constraints are satisfied (glass held upright while moving), the motion is collision-free, and that the task is successful (e.g. liquid is poured into the target container). We solve this problem by identifying positions of critical locations and associating a reference frame (called motion transfer frames) on the manipulated object and the target, selected based on their geometries and the task at hand. By tracking and transferring the path of the motion transfer frames, we generate motion plans for arbitrary task instances with objects of different geometries and poses. We show results from simulation as well as robot experiments on physical objects to evaluate the effectiveness of our solution. △ Less

Submitted 13 March, 2025; originally announced March 2025.

arXiv:2502.03429 [pdf, other]

On Fairness of Unified Multimodal Large Language Model for Image Generation

Authors: Ming Liu, Hao Chen, Jindong Wang, Liwen Wang, Bhiksha Raj Ramakrishnan, Wensheng Zhang

Abstract: Unified multimodal large language models (U-MLLMs) have demonstrated impressive performance in visual understanding and generation in an end-to-end pipeline. Compared with generation-only models (e.g., Stable Diffusion), U-MLLMs may raise new questions about bias in their outputs, which can be affected by their unified capabilities. This gap is particularly concerning given the under-explored risk… ▽ More Unified multimodal large language models (U-MLLMs) have demonstrated impressive performance in visual understanding and generation in an end-to-end pipeline. Compared with generation-only models (e.g., Stable Diffusion), U-MLLMs may raise new questions about bias in their outputs, which can be affected by their unified capabilities. This gap is particularly concerning given the under-explored risk of propagating harmful stereotypes. In this paper, we benchmark the latest U-MLLMs and find that most exhibit significant demographic biases, such as gender and race bias. To better understand and mitigate this issue, we propose a locate-then-fix strategy, where we audit and show how the individual model component is affected by bias. Our analysis shows that bias originates primarily from the language model. More interestingly, we observe a "partial alignment" phenomenon in U-MLLMs, where understanding bias appears minimal, but generation bias remains substantial. Thus, we propose a novel balanced preference model to balance the demographic distribution with synthetic data. Experiments demonstrate that our approach reduces demographic bias while preserving semantic fidelity. We hope our findings underscore the need for more holistic interpretation and debiasing strategies of U-MLLMs in the future. △ Less

Submitted 5 February, 2025; originally announced February 2025.

arXiv:2411.04036 [pdf, other]

Stepping Forward on the Last Mile

Authors: Chen Feng, Shaojie Zhuo, Xiaopeng Zhang, Ramchalam Kinattinkara Ramakrishnan, Zhaocong Yuan, Andrew Zou Li

Abstract: Continuously adapting pre-trained models to local data on resource constrained edge devices is the $\emph{last mile}$ for model deployment. However, as models increase in size and depth, backpropagation requires a large amount of memory, which becomes prohibitive for edge devices. In addition, most existing low power neural processing engines (e.g., NPUs, DSPs, MCUs, etc.) are designed as fixed-po… ▽ More Continuously adapting pre-trained models to local data on resource constrained edge devices is the $\emph{last mile}$ for model deployment. However, as models increase in size and depth, backpropagation requires a large amount of memory, which becomes prohibitive for edge devices. In addition, most existing low power neural processing engines (e.g., NPUs, DSPs, MCUs, etc.) are designed as fixed-point inference accelerators, without training capabilities. Forward gradients, solely based on directional derivatives computed from two forward calls, have been recently used for model training, with substantial savings in computation and memory. However, the performance of quantized training with fixed-point forward gradients remains unclear. In this paper, we investigate the feasibility of on-device training using fixed-point forward gradients, by conducting comprehensive experiments across a variety of deep learning benchmark tasks in both vision and audio domains. We propose a series of algorithm enhancements that further reduce the memory footprint, and the accuracy gap compared to backpropagation. An empirical study on how training with forward gradients navigates in the loss landscape is further explored. Our results demonstrate that on the last mile of model customization on edge devices, training with fixed-point forward gradients is a feasible and practical approach. △ Less

Submitted 6 November, 2024; originally announced November 2024.

arXiv:2410.18275 [pdf, other]

Screw Geometry Meets Bandits: Incremental Acquisition of Demonstrations to Generate Manipulation Plans

Authors: Dibyendu Das, Aditya Patankar, Nilanjan Chakraborty, C. R. Ramakrishnan, I. V. Ramakrishnan

Abstract: In this paper, we study the problem of methodically obtaining a sufficient set of kinesthetic demonstrations, one at a time, such that a robot can be confident of its ability to perform a complex manipulation task in a given region of its workspace. Although Learning from Demonstrations has been an active area of research, the problems of checking whether a set of demonstrations is sufficient, and… ▽ More In this paper, we study the problem of methodically obtaining a sufficient set of kinesthetic demonstrations, one at a time, such that a robot can be confident of its ability to perform a complex manipulation task in a given region of its workspace. Although Learning from Demonstrations has been an active area of research, the problems of checking whether a set of demonstrations is sufficient, and systematically seeking additional demonstrations have remained open. We present a novel approach to address these open problems using (i) a screw geometric representation to generate manipulation plans from demonstrations, which makes the sufficiency of a set of demonstrations measurable; (ii) a sampling strategy based on PAC-learning from multi-armed bandit optimization to evaluate the robot's ability to generate manipulation plans in a subregion of its task space; and (iii) a heuristic to seek additional demonstration from areas of weakness. Thus, we present an approach for the robot to incrementally and actively ask for new demonstration examples until the robot can assess with high confidence that it can perform the task successfully. We present experimental results on two example manipulation tasks, namely, pouring and scooping, to illustrate our approach. A short video on the method: https://youtu.be/R-qICICdEos △ Less

Submitted 23 October, 2024; originally announced October 2024.

Comments: 8 pages, 6 figures, under review in IEEE Robotics and Automation Letters

arXiv:2410.18221 [pdf, other]

Data Augmentation for Automated Adaptive Rodent Training

Authors: Dibyendu Das, Alfredo Fontanini, Joshua F. Kogan, Haibin Ling, C. R. Ramakrishnan, I. V. Ramakrishnan

Abstract: Fully optimized automation of behavioral training protocols for lab animals like rodents has long been a coveted goal for researchers. It is an otherwise labor-intensive and time-consuming process that demands close interaction between the animal and the researcher. In this work, we used a data-driven approach to optimize the way rodents are trained in labs. In pursuit of our goal, we looked at da… ▽ More Fully optimized automation of behavioral training protocols for lab animals like rodents has long been a coveted goal for researchers. It is an otherwise labor-intensive and time-consuming process that demands close interaction between the animal and the researcher. In this work, we used a data-driven approach to optimize the way rodents are trained in labs. In pursuit of our goal, we looked at data augmentation, a technique that scales well in data-poor environments. Using data augmentation, we built several artificial rodent models, which in turn would be used to build an efficient and automatic trainer. Then we developed a novel similarity metric based on the action probability distribution to measure the behavioral resemblance of our models to that of real rodents. △ Less

Submitted 23 October, 2024; originally announced October 2024.

Comments: 5 pages, 3 figures

arXiv:2410.14357 [pdf, other]

Efficient charge-preserving excited state preparation with variational quantum algorithms

Authors: Zohim Chandani, Kazuki Ikeda, Zhong-Bo Kang, Dmitri E. Kharzeev, Alexander McCaskey, Andrea Palermo, C. R. Ramakrishnan, Pooja Rao, Ranjani G. Sundaram, Kwangmin Yu

Abstract: Determining the spectrum and wave functions of excited states of a system is crucial in quantum physics and chemistry. Low-depth quantum algorithms, such as the Variational Quantum Eigensolver (VQE) and its variants, can be used to determine the ground-state energy. However, current approaches to computing excited states require numerous controlled unitaries, making the application of the original… ▽ More Determining the spectrum and wave functions of excited states of a system is crucial in quantum physics and chemistry. Low-depth quantum algorithms, such as the Variational Quantum Eigensolver (VQE) and its variants, can be used to determine the ground-state energy. However, current approaches to computing excited states require numerous controlled unitaries, making the application of the original Variational Quantum Deflation (VQD) algorithm to problems in chemistry or physics suboptimal. In this study, we introduce a charge-preserving VQD (CPVQD) algorithm, designed to incorporate symmetry and the corresponding conserved charge into the VQD framework. This results in dimension reduction, significantly enhancing the efficiency of excited-state computations. We present benchmark results with GPU-accelerated simulations using systems up to 24 qubits, showcasing applications in high-energy physics, nuclear physics, and quantum chemistry. This work is performed on NERSC's Perlmutter system using NVIDIA's open-source platform for accelerated quantum supercomputing - CUDA-Q. △ Less

Submitted 18 October, 2024; originally announced October 2024.

Comments: 20 pages, 6 figures, 1 table

arXiv:2410.09135 [pdf, other]

Enabling Advanced Land Cover Analytics: An Integrated Data Extraction Pipeline for Predictive Modeling with the Dynamic World Dataset

Authors: Victor Radermecker, Andrea Zanon, Nancy Thomas, Annita Vapsi, Saba Rahimi, Rama Ramakrishnan, Daniel Borrajo

Abstract: Understanding land cover holds considerable potential for a myriad of practical applications, particularly as data accessibility transitions from being exclusive to governmental and commercial entities to now including the broader research community. Nevertheless, although the data is accessible to any community member interested in exploration, there exists a formidable learning curve and no stan… ▽ More Understanding land cover holds considerable potential for a myriad of practical applications, particularly as data accessibility transitions from being exclusive to governmental and commercial entities to now including the broader research community. Nevertheless, although the data is accessible to any community member interested in exploration, there exists a formidable learning curve and no standardized process for accessing, pre-processing, and leveraging the data for subsequent tasks. In this study, we democratize this data by presenting a flexible and efficient end to end pipeline for working with the Dynamic World dataset, a cutting-edge near-real-time land use/land cover (LULC) dataset. This includes a pre-processing and representation framework which tackles noise removal, efficient extraction of large amounts of data, and re-representation of LULC data in a format well suited for several downstream tasks. To demonstrate the power of our pipeline, we use it to extract data for an urbanization prediction problem and build a suite of machine learning models with excellent performance. This task is easily generalizable to the prediction of any type of land cover and our pipeline is also compatible with a series of other downstream tasks. △ Less

Submitted 11 October, 2024; originally announced October 2024.

arXiv:2405.07499 [pdf, other]

Distributed Quantum Computation with Minimum Circuit Execution Time over Quantum Networks

Authors: Ranjani G Sundaram, Himanshu Gupta, C. R. Ramakrishnan

Abstract: Present quantum computers are constrained by limited qubit capacity and restricted physical connectivity, leading to challenges in large-scale quantum computations. Distributing quantum computations across a network of quantum computers is a promising way to circumvent these challenges and facilitate large quantum computations. However, distributed quantum computations require entanglements (to ex… ▽ More Present quantum computers are constrained by limited qubit capacity and restricted physical connectivity, leading to challenges in large-scale quantum computations. Distributing quantum computations across a network of quantum computers is a promising way to circumvent these challenges and facilitate large quantum computations. However, distributed quantum computations require entanglements (to execute remote gates) which can incur significant generation latency and, thus, lead to decoherence of qubits. In this work, we consider the problem of distributing quantum circuits across a quantum network to minimize the execution time. The problem entails mapping the circuit qubits to network memories, including within each computer since limited connectivity within computers can affect the circuit execution time. We provide two-step solutions for the above problem: In the first step, we allocate qubits to memories to minimize the estimated execution time; for this step, we design an efficient algorithm based on an approximation algorithm for the max-quadratic-assignment problem. In the second step, we determine an efficient execution scheme, including generating required entanglements with minimum latency under the network resource and decoherence constraints; for this step, we develop two algorithms with appropriate performance guarantees under certain settings or assumptions. We consider multiple protocols for executing remote gates, viz., telegates and cat-entanglements. With extensive simulations over NetSquid, a quantum network simulator, we demonstrate the effectiveness of our developed techniques and show that they outperform a scheme based on prior work by up to 95%. △ Less

Submitted 13 May, 2024; originally announced May 2024.

arXiv:2405.01813 [pdf, other]

doi 10.1145/3555041.3589674

Towards Building Autonomous Data Services on Azure

Authors: Yiwen Zhu, Yuanyuan Tian, Joyce Cahoon, Subru Krishnan, Ankita Agarwal, Rana Alotaibi, Jesús Camacho-Rodríguez, Bibin Chundatt, Andrew Chung, Niharika Dutta, Andrew Fogarty, Anja Gruenheid, Brandon Haynes, Matteo Interlandi, Minu Iyer, Nick Jurgens, Sumeet Khushalani, Brian Kroth, Manoj Kumar, Jyoti Leeka, Sergiy Matusevych, Minni Mittal, Andreas Mueller, Kartheek Muthyala, Harsha Nagulapalli , et al. (13 additional authors not shown)

Abstract: Modern cloud has turned data services into easily accessible commodities. With just a few clicks, users are now able to access a catalog of data processing systems for a wide range of tasks. However, the cloud brings in both complexity and opportunity. While cloud users can quickly start an application by using various data services, it can be difficult to configure and optimize these services to… ▽ More Modern cloud has turned data services into easily accessible commodities. With just a few clicks, users are now able to access a catalog of data processing systems for a wide range of tasks. However, the cloud brings in both complexity and opportunity. While cloud users can quickly start an application by using various data services, it can be difficult to configure and optimize these services to gain the most value from them. For cloud providers, managing every aspect of an ever-increasing set of data services, while meeting customer SLAs and minimizing operational cost is becoming more challenging. Cloud technology enables the collection of significant amounts of workload traces and system telemetry. With the progress in data science (DS) and machine learning (ML), it is feasible and desirable to utilize a data-driven, ML-based approach to automate various aspects of data services, resulting in the creation of autonomous data services. This paper presents our perspectives and insights on creating autonomous data services on Azure. It also covers the future endeavors we plan to undertake and unresolved issues that still need attention. △ Less

Submitted 2 May, 2024; originally announced May 2024.

Comments: SIGMOD Companion of the 2023 International Conference on Management of Data. 2023

arXiv:2405.00222 [pdf, other]

Optimized Distribution of Entanglement Graph States in Quantum Networks

Authors: Xiaojie Fan, Caitao Zhan, Himanshu Gupta, C. R. Ramakrishnan

Abstract: Building large-scale quantum computers, essential to demonstrating quantum advantage, is a key challenge. Quantum Networks (QNs) can help address this challenge by enabling the construction of large, robust, and more capable quantum computing platforms by connecting smaller quantum computers. Moreover, unlike classical systems, QNs can enable fully secured long-distance communication. Thus, quantu… ▽ More Building large-scale quantum computers, essential to demonstrating quantum advantage, is a key challenge. Quantum Networks (QNs) can help address this challenge by enabling the construction of large, robust, and more capable quantum computing platforms by connecting smaller quantum computers. Moreover, unlike classical systems, QNs can enable fully secured long-distance communication. Thus, quantum networks lie at the heart of the success of future quantum information technologies. In quantum networks, multipartite entangled states distributed over the network help implement and support many quantum network applications for communications, sensing, and computing. Our work focuses on developing optimal techniques to generate and distribute multipartite entanglement states efficiently. Prior works on generating general multipartite entanglement states have focused on the objective of minimizing the number of maximally entangled pairs (EPs) while ignoring the heterogeneity of the network nodes and links as well as the stochastic nature of underlying processes. In this work, we develop a hypergraph based linear programming framework that delivers optimal (under certain assumptions) generation schemes for general multipartite entanglement represented by graph states, under the network resources, decoherence, and fidelity constraints, while considering the stochasticity of the underlying processes. We illustrate our technique by developing generation schemes for the special cases of path and tree graph states, and discuss optimized generation schemes for more general classes of graph states. Using extensive simulations over a quantum network simulator (NetSquid), we demonstrate the effectiveness of our developed techniques and show that they outperform prior known schemes by up to orders of magnitude. △ Less

Submitted 18 March, 2025; v1 submitted 30 April, 2024; originally announced May 2024.

Comments: 16 pages, 20 figures

arXiv:2401.11162 [pdf]

Extending Polaris to Support Transactions

Authors: Josep Aguilar-Saborit, Raghu Ramakrishnan, Kevin Bocksrocker, Alan Halverson, Konstantin Kosinsky, Ryan O'Connor, Nadejda Poliakova, Moe Shafiei, Taewoo Kim, Phil Kon-Kim, Haris Mahmud-Ansari, Blazej Matuszyk, Matt Miles, Sumin Mohanan, Cristian Petculescu, Ishan Rahesh-Madan, Emma Rose-Wirshing, Elias Yousefi

Abstract: In Polaris, we introduced a cloud-native distributed query processor to perform analytics at scale. In this paper, we extend the underlying Polaris distributed computation framework, which can be thought of as a read-only transaction engine, to execute general transactions (including updates, deletes, inserts and bulk loads, in addition to queries) for Tier 1 warehousing workloads in a highly perf… ▽ More In Polaris, we introduced a cloud-native distributed query processor to perform analytics at scale. In this paper, we extend the underlying Polaris distributed computation framework, which can be thought of as a read-only transaction engine, to execute general transactions (including updates, deletes, inserts and bulk loads, in addition to queries) for Tier 1 warehousing workloads in a highly performant and predictable manner. We take advantage of the immutability of data files in log-structured data stores and build on SQL Server transaction management to deliver full transactional support with Snapshot Isolation semantics, including multi-table and multi-statement transactions. With the enhancements described in this paper, Polaris supports both query processing and transactions for T-SQL in Microsoft Fabric. △ Less

Submitted 20 January, 2024; originally announced January 2024.

Comments: 12 pages, 12 Figures

arXiv:2401.09621 [pdf, other]

XTable in Action: Seamless Interoperability in Data Lakes

Authors: Ashvin Agrawal, Tim Brown, Anoop Johnson, Jesús Camacho-Rodríguez, Kyle Weller, Carlo Curino, Raghu Ramakrishnan

Abstract: Contemporary approaches to data management are increasingly relying on unified analytics and AI platforms to foster collaboration, interoperability, seamless access to reliable data, and high performance. Data Lakes featuring open standard table formats such as Delta Lake, Apache Hudi, and Apache Iceberg are central components of these data architectures. Choosing the right format for managing a t… ▽ More Contemporary approaches to data management are increasingly relying on unified analytics and AI platforms to foster collaboration, interoperability, seamless access to reliable data, and high performance. Data Lakes featuring open standard table formats such as Delta Lake, Apache Hudi, and Apache Iceberg are central components of these data architectures. Choosing the right format for managing a table is crucial for achieving the objectives mentioned above. The challenge lies in selecting the best format, a task that is onerous and can yield temporary results, as the ideal choice may shift over time with data growth, evolving workloads, and the competitive development of table formats and processing engines. Moreover, restricting data access to a single format can hinder data sharing resulting in diminished business value over the long term. The ability to seamlessly interoperate between formats and with negligible overhead can effectively address these challenges. Our solution in this direction is an innovative omni-directional translator, XTable, that facilitates writing data in one format and reading it in any format, thus achieving the desired format interoperability. In this work, we demonstrate the effectiveness of XTable through application scenarios inspired by real-world use cases. △ Less

Submitted 17 January, 2024; originally announced January 2024.

arXiv:2312.11569 [pdf]

doi 10.5281/zenodo.10500601

Application of AI in Nutrition

Authors: Ritu Ramakrishnan, Tianxiang Xing, Tianfeng Chen, Ming-Hao Lee, Jinzhu Gao

Abstract: In healthcare, artificial intelligence (AI) has been changing the way doctors and health experts take care of people. This paper will cover how AI is making major changes in the health care system, especially with nutrition. Various machine learning and deep learning algorithms have been developed to extract valuable information from healthcare data which help doctors, nutritionists, and health ex… ▽ More In healthcare, artificial intelligence (AI) has been changing the way doctors and health experts take care of people. This paper will cover how AI is making major changes in the health care system, especially with nutrition. Various machine learning and deep learning algorithms have been developed to extract valuable information from healthcare data which help doctors, nutritionists, and health experts to make better decisions and make our lifestyle healthy. This paper provides an overview of the current state of AI applications in healthcare with a focus on the utilization of AI-driven recommender systems in nutrition. It will discuss the positive outcomes and challenges that arise when AI is used in this field. This paper addresses the challenges to develop AI recommender systems in healthcare, providing a well-rounded perspective on the complexities. Real-world examples and research findings are presented to underscore the tangible and significant impact AI recommender systems have in the field of healthcare, particularly in nutrition. The ongoing efforts of applying AI in nutrition lay the groundwork for a future where personalized recommendations play a pivotal role in guiding individuals toward healthier lifestyles. △ Less

Submitted 17 December, 2023; originally announced December 2023.

Journal ref: Journal of Advances in Information Science and Technology, Volume 1, Issue 1, 2023, Pages 7-12

arXiv:2312.08598 [pdf, other]

MotherNet: Fast Training and Inference via Hyper-Network Transformers

Authors: Andreas Müller, Carlo Curino, Raghu Ramakrishnan

Abstract: Foundation models are transforming machine learning across many modalities, with in-context learning replacing classical model training. Recent work on tabular data hints at a similar opportunity to build foundation models for classification for numerical data. However, existing meta-learning approaches can not compete with tree-based methods in terms of inference time. In this paper, we propose M… ▽ More Foundation models are transforming machine learning across many modalities, with in-context learning replacing classical model training. Recent work on tabular data hints at a similar opportunity to build foundation models for classification for numerical data. However, existing meta-learning approaches can not compete with tree-based methods in terms of inference time. In this paper, we propose MotherNet, a hypernetwork architecture trained on synthetic classification tasks that, once prompted with a never-seen-before training set generates the weights of a trained ``child'' neural-network by in-context learning using a single forward pass. In contrast to most existing hypernetworks that are usually trained for relatively constrained multi-task settings, MotherNet can create models for multiclass classification on arbitrary tabular datasets without any dataset specific gradient descent. The child network generated by MotherNet outperforms neural networks trained using gradient descent on small datasets, and is comparable to predictions by TabPFN and standard ML methods like Gradient Boosting. Unlike a direct application of TabPFN, MotherNet generated networks are highly efficient at inference time. We also demonstrate that HyperFast is unable to perform effective in-context learning on small datasets, and heavily relies on dataset specific fine-tuning and hyper-parameter tuning, while MotherNet requires no fine-tuning or per-dataset hyper-parameters. △ Less

Submitted 9 May, 2025; v1 submitted 13 December, 2023; originally announced December 2023.

Comments: 17 pages, 13 figures

ACM Class: I.2.6

arXiv:2311.09593 [pdf, other]

Multi-Step Dialogue Workflow Action Prediction

Authors: Ramya Ramakrishnan, Ethan R. Elenberg, Hashan Narangodage, Ryan McDonald

Abstract: In task-oriented dialogue, a system often needs to follow a sequence of actions, called a workflow, that complies with a set of guidelines in order to complete a task. In this paper, we propose the novel problem of multi-step workflow action prediction, in which the system predicts multiple future workflow actions. Accurate prediction of multiple steps allows for multi-turn automation, which can f… ▽ More In task-oriented dialogue, a system often needs to follow a sequence of actions, called a workflow, that complies with a set of guidelines in order to complete a task. In this paper, we propose the novel problem of multi-step workflow action prediction, in which the system predicts multiple future workflow actions. Accurate prediction of multiple steps allows for multi-turn automation, which can free up time to focus on more complex tasks. We propose three modeling approaches that are simple to implement yet lead to more action automation: 1) fine-tuning on a training dataset, 2) few-shot in-context learning leveraging retrieval and large language model prompting, and 3) zero-shot graph traversal, which aggregates historical action sequences into a graph for prediction. We show that multi-step action prediction produces features that improve accuracy on downstream dialogue tasks like predicting task success, and can increase automation of steps by 20% without requiring as much feedback from a human overseeing the system. △ Less

Submitted 12 February, 2024; v1 submitted 16 November, 2023; originally announced November 2023.

arXiv:2311.08300 [pdf, other]

Workflow-Guided Response Generation for Task-Oriented Dialogue

Authors: Do June Min, Paloma Sodhi, Ramya Ramakrishnan

Abstract: Task-oriented dialogue (TOD) systems aim to achieve specific goals through interactive dialogue. Such tasks usually involve following specific workflows, i.e. executing a sequence of actions in a particular order. While prior work has focused on supervised learning methods to condition on past actions, they do not explicitly optimize for compliance to a desired workflow. In this paper, we propose… ▽ More Task-oriented dialogue (TOD) systems aim to achieve specific goals through interactive dialogue. Such tasks usually involve following specific workflows, i.e. executing a sequence of actions in a particular order. While prior work has focused on supervised learning methods to condition on past actions, they do not explicitly optimize for compliance to a desired workflow. In this paper, we propose a novel framework based on reinforcement learning (RL) to generate dialogue responses that are aligned with a given workflow. Our framework consists of ComplianceScorer, a metric designed to evaluate how well a generated response executes the specified action, combined with an RL opimization process that utilizes an interactive sampling technique. We evaluate our approach on two TOD datasets, Action-Based Conversations Dataset (ABCD) (Chen et al., 2021a) and MultiWOZ 2.2 (Zang et al., 2020) on a range of automated and human evaluation metrics. Our findings indicate that our RL-based framework outperforms baselines and is effective at enerating responses that both comply with the intended workflows while being expressed in a natural and fluent manner. △ Less

Submitted 14 November, 2023; originally announced November 2023.

arXiv:2306.15758 [pdf, ps, other]

On the reconstruction of bandlimited signals from random samples quantized via noise-shaping

Authors: Rohan Joy, Felix Krahmer, Alessandro Lupoli, Radha Ramakrishnan

Abstract: Noise-shaping quantization techniques are widely used for converting bandlimited signals from the analog to the digital domain. They work by "shaping" the quantization noise so that it falls close to the reconstruction operator's null space. We investigate the compatibility of two such schemes, specifically $ΣΔ$ quantization and distributed noise-shaping quantization, with random samples of bandli… ▽ More Noise-shaping quantization techniques are widely used for converting bandlimited signals from the analog to the digital domain. They work by "shaping" the quantization noise so that it falls close to the reconstruction operator's null space. We investigate the compatibility of two such schemes, specifically $ΣΔ$ quantization and distributed noise-shaping quantization, with random samples of bandlimited functions. Let $f$ be a real-valued $π$-bandlimited function. Suppose $R>1$ is a real number and assume that $\{x_i\}_{i=1}^m$ is a sequence of i.i.d random variables uniformly distributed on $[-\tilde{R},\tilde{R}]$, where $\tilde{R}>R$ is appropriately chosen. We show that by using a noise-shaping quantizer to quantize the values of $f$ at $\{x_i\}_{i=1}^m$, a function $f^{\sharp}$ can be reconstructed from these quantized values such that $\|f-f^{\sharp}\|_{L^2[-R, R]}$ decays with high probability as $m$ and $\tilde{R}$ increase. We emphasize that the sample points $\{x_i\}_{i=1}^m$ are completely random, i.e., they have no predefined structure, which makes our findings the first of their kind. △ Less

Submitted 1 July, 2023; v1 submitted 27 June, 2023; originally announced June 2023.

Comments: 31 pages, 3 figures

MSC Class: 94A20; 94A12; 42C15; 41A29

arXiv:2305.01120 [pdf, other]

doi 10.1145/3639314

LST-Bench: Benchmarking Log-Structured Tables in the Cloud

Authors: Jesús Camacho-Rodríguez, Ashvin Agrawal, Anja Gruenheid, Ashit Gosalia, Cristian Petculescu, Josep Aguilar-Saborit, Avrilia Floratou, Carlo Curino, Raghu Ramakrishnan

Abstract: Data processing engines increasingly leverage distributed file systems for scalable, cost-effective storage. While the Apache Parquet columnar format has become a popular choice for data storage and retrieval, the immutability of Parquet files renders it impractical to meet the demands of frequent updates in contemporary analytical workloads. Log-Structured Tables (LSTs), such as Delta Lake, Apach… ▽ More Data processing engines increasingly leverage distributed file systems for scalable, cost-effective storage. While the Apache Parquet columnar format has become a popular choice for data storage and retrieval, the immutability of Parquet files renders it impractical to meet the demands of frequent updates in contemporary analytical workloads. Log-Structured Tables (LSTs), such as Delta Lake, Apache Iceberg, and Apache Hudi, offer an alternative for scenarios requiring data mutability, providing a balance between efficient updates and the benefits of columnar storage. They provide features like transactions, time-travel, and schema evolution, enhancing usability and enabling access from multiple engines. Moreover, engines like Apache Spark and Trino can be configured to leverage the optimizations and controls offered by LSTs to meet specific business needs. Conventional benchmarks and tools are inadequate for evaluating the transformative changes in the storage layer resulting from these advancements, as they do not allow us to measure the impact of design and optimization choices in this new setting. In this paper, we propose a novel benchmarking approach and metrics that build upon existing benchmarks, aiming to systematically assess LSTs. We develop a framework, LST-Bench, which facilitates effective exploration and evaluation of the collaborative functioning of LSTs and data processing engines through tailored benchmark packages. A package is a mix of use patterns reflecting a target workload; LST-Bench makes it easy to define a wide range of use patterns and combine them into a package, and we include a baseline package for completeness. Our assessment demonstrates the effectiveness of our framework and benchmark packages in extracting valuable insights across diverse environments. The code for LST-Bench is open-sourced and is available at https://github.com/microsoft/lst-bench/ . △ Less

Submitted 19 January, 2024; v1 submitted 1 May, 2023; originally announced May 2023.

Journal ref: Proceedings of the ACM on Management of Data (2024) Volume 2 Issue 1

arXiv:2210.14047 [pdf, other]

OneProvenance: Efficient Extraction of Dynamic Coarse-Grained Provenance from Database Logs [Technical Report]

Authors: Fotis Psallidas, Ashvin Agrawal, Chandru Sugunan, Khaled Ibrahim, Konstantinos Karanasos, Jesús Camacho-Rodríguez, Avrilia Floratou, Carlo Curino, Raghu Ramakrishnan

Abstract: Provenance encodes information that connects datasets, their generation workflows, and associated metadata (e.g., who or when executed a query). As such, it is instrumental for a wide range of critical governance applications (e.g., observability and auditing). Unfortunately, in the context of database systems, extracting coarse-grained provenance is a long-standing problem due to the complexity a… ▽ More Provenance encodes information that connects datasets, their generation workflows, and associated metadata (e.g., who or when executed a query). As such, it is instrumental for a wide range of critical governance applications (e.g., observability and auditing). Unfortunately, in the context of database systems, extracting coarse-grained provenance is a long-standing problem due to the complexity and sheer volume of database workflows. Provenance extraction from query event logs has been recently proposed as favorable because, in principle, can result in meaningful provenance graphs for provenance applications. Current approaches, however, (a) add substantial overhead to the database and provenance extraction workflows and (b)~extract provenance that is noisy, omits query execution dependencies, and is not rich enough for upstream applications. To address these problems, we introduce OneProvenance: an efficient provenance extraction system from query event logs. OneProvenance addresses the unique challenges of log-based extraction by (a)~identifying query execution dependencies through efficient log analysis, (b) extracting provenance through novel event transformations that account for query dependencies, and (c)~introducing effective filtering optimizations. Our thorough experimental analysis shows that OneProvenance can improve extraction by up to ~18X compared to state-of-the-art baselines; our optimizations reduce the extraction noise and optimize performance even further. OneProvenance is deployed at scale by Microsoft Purview and actively supports customer provenance extraction needs (https://bit.ly/3N2JVGF). △ Less

Submitted 3 March, 2023; v1 submitted 25 October, 2022; originally announced October 2022.

ACM Class: H.2

arXiv:2206.06437 [pdf, other]

Distribution of Quantum Circuits Over General Quantum Networks

Authors: Ranjani G Sundaram, Himanshu Gupta, C. R. Ramakrishnan

Abstract: Near-term quantum computers can hold only a small number of qubits. One way to facilitate large-scale quantum computations is through a distributed network of quantum computers. In this work, we consider the problem of distributing quantum programs represented as quantum circuits across a quantum network of heterogeneous quantum computers, in a way that minimizes the overall communication cost req… ▽ More Near-term quantum computers can hold only a small number of qubits. One way to facilitate large-scale quantum computations is through a distributed network of quantum computers. In this work, we consider the problem of distributing quantum programs represented as quantum circuits across a quantum network of heterogeneous quantum computers, in a way that minimizes the overall communication cost required to execute the distributed circuit. We consider two ways of communicating: cat-entanglement that creates linked copies of qubits across pairs of computers, and teleportation. The heterogeneous computers impose constraints on cat-entanglement and teleportation operations that can be chosen by an algorithm. We first focus on a special case that only allows cat-entanglements and not teleportations for communication. We provide a two-step heuristic for solving this specialized setting: (i) finding an assignment of qubits to computers using Tabu search, and (ii) using an iterative greedy algorithm designed for a constrained version of the set cover problem to determine cat-entanglement operations required to execute gates locally. For the general case, which allows both forms of communication, we propose two algorithms that subdivide the quantum circuit into several portions and apply the heuristic for the specialized setting on each portion. Teleportations are then used to stitch together the solutions for each portion. Finally, we simulate our algorithms on a wide range of randomly generated quantum networks and circuits, and study the properties of their results with respect to several varying parameters. △ Less

Submitted 13 June, 2022; originally announced June 2022.

arXiv:2205.07352 [pdf, other]

Long-term Control for Dialogue Generation: Methods and Evaluation

Authors: Ramya Ramakrishnan, Hashan Buddhika Narangodage, Mauro Schilman, Kilian Q. Weinberger, Ryan McDonald

Abstract: Current approaches for controlling dialogue response generation are primarily focused on high-level attributes like style, sentiment, or topic. In this work, we focus on constrained long-term dialogue generation, which involves more fine-grained control and requires a given set of control words to appear in generated responses. This setting requires a model to not only consider the generation of t… ▽ More Current approaches for controlling dialogue response generation are primarily focused on high-level attributes like style, sentiment, or topic. In this work, we focus on constrained long-term dialogue generation, which involves more fine-grained control and requires a given set of control words to appear in generated responses. This setting requires a model to not only consider the generation of these control words in the immediate context, but also produce utterances that will encourage the generation of the words at some time in the (possibly distant) future. We define the problem of constrained long-term control for dialogue generation, identify gaps in current methods for evaluation, and propose new metrics that better measure long-term control. We also propose a retrieval-augmented method that improves performance of long-term controlled generation via logit modification techniques. We show through experiments on three task-oriented dialogue datasets that our metrics better assess dialogue control relative to current alternatives and that our method outperforms state-of-the-art constrained generation baselines. △ Less

Submitted 15 May, 2022; originally announced May 2022.

arXiv:2205.04036 [pdf, other]

doi 10.1109/QCE53715.2022.00064

Pre-Distribution of Entanglements in Quantum Networks

Authors: Mohammad Ghaderibaneh, Himanshu Gupta, C. R. Ramakrishnan, Ertai Luo

Abstract: Quantum network communication is challenging, as the No-Cloning theorem in quantum regime makes many classical techniques inapplicable. For long-distance communication, the only viable approach is teleportation of quantum states, which requires a prior distribution of entangled pairs (EPs) of qubits. Establishment of EPs across remote nodes can incur significant latency due to the low probability… ▽ More Quantum network communication is challenging, as the No-Cloning theorem in quantum regime makes many classical techniques inapplicable. For long-distance communication, the only viable approach is teleportation of quantum states, which requires a prior distribution of entangled pairs (EPs) of qubits. Establishment of EPs across remote nodes can incur significant latency due to the low probability of success of the underlying physical processes. To reduce EP generation latency, prior works have looked at selection of efficient entanglement-routing paths and simultaneous use of multiple such paths for EP generation. In this paper, we propose and investigate a complementary technique to reduce EP generation latency--to pre-distribute EPs over certain (pre-determined) pairs of network nodes; these pre-distributed EPs can then be used to generate EPs for the requested pairs, when needed, with lower generation latency. For such an pre-distribution approach to be most effective, we need to address an optimization problem of selection of node-pairs where the EPs should be pre-distributed to minimize the generation latency of expected EP requests, under a given cost constraint. In this paper, we appropriately formulate the above optimization problem and design two efficient algorithms, one of which is a greedy approach based on an approximation algorithm for a special case. Via extensive evaluations over the NetSquid simulator, we demonstrate the effectiveness of our approach and developed techniques; we show that our developed algorithms outperform a naive approach by up to an order of magnitude. △ Less

Submitted 9 May, 2022; originally announced May 2022.

Comments: 11 pages, 9 figures

arXiv:2203.05492 [pdf, other]

An Empirical Study of Low Precision Quantization for TinyML

Authors: Shaojie Zhuo, Hongyu Chen, Ramchalam Kinattinkara Ramakrishnan, Tommy Chen, Chen Feng, Yicheng Lin, Parker Zhang, Liang Shen

Abstract: Tiny machine learning (tinyML) has emerged during the past few years aiming to deploy machine learning models to embedded AI processors with highly constrained memory and computation capacity. Low precision quantization is an important model compression technique that can greatly reduce both memory consumption and computation cost of model inference. In this study, we focus on post-training quanti… ▽ More Tiny machine learning (tinyML) has emerged during the past few years aiming to deploy machine learning models to embedded AI processors with highly constrained memory and computation capacity. Low precision quantization is an important model compression technique that can greatly reduce both memory consumption and computation cost of model inference. In this study, we focus on post-training quantization (PTQ) algorithms that quantize a model to low-bit (less than 8-bit) precision with only a small set of calibration data and benchmark them on different tinyML use cases. To achieve a fair comparison, we build a simulated quantization framework to investigate recent PTQ algorithms. Furthermore, we break down those algorithms into essential components and re-assembled a generic PTQ pipeline. With ablation study on different alternatives of components in the pipeline, we reveal key design choices when performing low precision quantization. We hope this work could provide useful data points and shed lights on the future research of low precision quantization. △ Less

Submitted 10 March, 2022; originally announced March 2022.

Comments: tinyML Research Symposium 2022

arXiv:2202.03487 [pdf]

Targeted-BEHRT: Deep learning for observational causal inference on longitudinal electronic health records

Authors: Shishir Rao, Mohammad Mamouei, Gholamreza Salimi-Khorshidi, Yikuan Li, Rema Ramakrishnan, Abdelaali Hassaine, Dexter Canoy, Kazem Rahimi

Abstract: Observational causal inference is useful for decision making in medicine when randomized clinical trials (RCT) are infeasible or non generalizable. However, traditional approaches fail to deliver unconfounded causal conclusions in practice. The rise of "doubly robust" non-parametric tools coupled with the growth of deep learning for capturing rich representations of multimodal data, offers a uniqu… ▽ More Observational causal inference is useful for decision making in medicine when randomized clinical trials (RCT) are infeasible or non generalizable. However, traditional approaches fail to deliver unconfounded causal conclusions in practice. The rise of "doubly robust" non-parametric tools coupled with the growth of deep learning for capturing rich representations of multimodal data, offers a unique opportunity to develop and test such models for causal inference on comprehensive electronic health records (EHR). In this paper, we investigate causal modelling of an RCT-established null causal association: the effect of antihypertensive use on incident cancer risk. We develop a dataset for our observational study and a Transformer-based model, Targeted BEHRT coupled with doubly robust estimation, we estimate average risk ratio (RR). We compare our model to benchmark statistical and deep learning models for causal inference in multiple experiments on semi-synthetic derivations of our dataset with various types and intensities of confounding. In order to further test the reliability of our approach, we test our model on situations of limited data. We find that our model provides more accurate estimates of RR (least sum absolute error from ground truth) compared to benchmarks for risk ratio estimation on high-dimensional EHR across experiments. Finally, we apply our model to investigate the original case study: antihypertensives' effect on cancer and demonstrate that our model generally captures the validated null association. △ Less

Submitted 7 February, 2022; originally announced February 2022.

Comments: This work has been submitted to the IEEE for possible publication

arXiv:2112.11002 [pdf, other]

doi 10.1109/TQE.2022.3168784

Efficient Quantum Network Communication using Optimized Entanglement-Swapping Trees

Authors: Mohammad Ghaderibaneh, Caitao Zhan, Himanshu Gupta, C. R. Ramakrishnan

Abstract: Quantum network communication is challenging, as the No-cloning theorem in quantum regime makes many classical techniques inapplicable. For long-distance communication, the only viable communication approach is teleportation of quantum states, which requires a prior distribution of entangled pairs (EPs) of qubits. Establishment of EPs across remote nodes can incur significant latency due to the lo… ▽ More Quantum network communication is challenging, as the No-cloning theorem in quantum regime makes many classical techniques inapplicable. For long-distance communication, the only viable communication approach is teleportation of quantum states, which requires a prior distribution of entangled pairs (EPs) of qubits. Establishment of EPs across remote nodes can incur significant latency due to the low probability of success of the underlying physical processes. The focus of our work is to develop efficient techniques that minimize EP generation latency. Prior works have focused on selecting entanglement paths; in contrast, we select entanglement swapping trees--a more accurate representation of the entanglement generation structure. We develop a dynamic programming algorithm to select an optimal swapping-tree for a single pair of nodes, under the given capacity and fidelity constraints. For the general setting, we develop an efficient iterative algorithm to compute a set of swapping trees. We present simulation results which show that our solutions outperform the prior approaches by an order of magnitude and are viable for long-distance entanglement generation. △ Less

Submitted 4 April, 2024; v1 submitted 21 December, 2021; originally announced December 2021.

arXiv:2103.15171 [pdf, other]

A Bayesian Approach to Identifying Representational Errors

Authors: Ramya Ramakrishnan, Vaibhav Unhelkar, Ece Kamar, Julie Shah

Abstract: Trained AI systems and expert decision makers can make errors that are often difficult to identify and understand. Determining the root cause for these errors can improve future decisions. This work presents Generative Error Model (GEM), a generative model for inferring representational errors based on observations of an actor's behavior (either simulated agent, robot, or human). The model conside… ▽ More Trained AI systems and expert decision makers can make errors that are often difficult to identify and understand. Determining the root cause for these errors can improve future decisions. This work presents Generative Error Model (GEM), a generative model for inferring representational errors based on observations of an actor's behavior (either simulated agent, robot, or human). The model considers two sources of error: those that occur due to representational limitations -- "blind spots" -- and non-representational errors, such as those caused by noise in execution or systematic errors present in the actor's policy. Disambiguating these two error types allows for targeted refinement of the actor's policy (i.e., representational errors require perceptual augmentation, while other errors can be reduced through methods such as improved training or attention support). We present a Bayesian inference algorithm for GEM and evaluate its utility in recovering representational errors on multiple domains. Results show that our approach can recover blind spots of both reinforcement learning agents as well as human users. △ Less

Submitted 28 March, 2021; originally announced March 2021.

arXiv:2101.11359 [pdf]

An explainable Transformer-based deep learning model for the prediction of incident heart failure

Authors: Shishir Rao, Yikuan Li, Rema Ramakrishnan, Abdelaali Hassaine, Dexter Canoy, John Cleland, Thomas Lukasiewicz, Gholamreza Salimi-Khorshidi, Kazem Rahimi

Abstract: Predicting the incidence of complex chronic conditions such as heart failure is challenging. Deep learning models applied to rich electronic health records may improve prediction but remain unexplainable hampering their wider use in medical practice. We developed a novel Transformer deep-learning model for more accurate and yet explainable prediction of incident heart failure involving 100,071 pat… ▽ More Predicting the incidence of complex chronic conditions such as heart failure is challenging. Deep learning models applied to rich electronic health records may improve prediction but remain unexplainable hampering their wider use in medical practice. We developed a novel Transformer deep-learning model for more accurate and yet explainable prediction of incident heart failure involving 100,071 patients from longitudinal linked electronic health records across the UK. On internal 5-fold cross validation and held-out external validation, our model achieved 0.93 and 0.93 area under the receiver operator curve and 0.69 and 0.70 area under the precision-recall curve, respectively and outperformed existing deep learning models. Predictor groups included all community and hospital diagnoses and medications contextualised within the age and calendar year for each patient's clinical encounter. The importance of contextualised medical information was revealed in a number of sensitivity analyses, and our perturbation method provided a way of identifying factors contributing to risk. Many of the identified risk factors were consistent with existing knowledge from clinical and epidemiological research but several new associations were revealed which had not been considered in expert-driven risk prediction models. △ Less

Submitted 27 January, 2021; originally announced January 2021.

arXiv:2006.05442 [pdf, other]

Tensor train decompositions on recurrent networks

Authors: Alejandro Murua, Ramchalam Ramakrishnan, Xinlin Li, Rui Heng Yang, Vahid Partovi Nia

Abstract: Recurrent neural networks (RNN) such as long-short-term memory (LSTM) networks are essential in a multitude of daily live tasks such as speech, language, video, and multimodal learning. The shift from cloud to edge computation intensifies the need to contain the growth of RNN parameters. Current research on RNN shows that despite the performance obtained on convolutional neural networks (CNN), kee… ▽ More Recurrent neural networks (RNN) such as long-short-term memory (LSTM) networks are essential in a multitude of daily live tasks such as speech, language, video, and multimodal learning. The shift from cloud to edge computation intensifies the need to contain the growth of RNN parameters. Current research on RNN shows that despite the performance obtained on convolutional neural networks (CNN), keeping a good performance in compressed RNNs is still a challenge. Most of the literature on compression focuses on CNNs using matrix product (MPO) operator tensor trains. However, matrix product state (MPS) tensor trains have more attractive features than MPOs, in terms of storage reduction and computing time at inference. We show that MPS tensor trains should be at the forefront of LSTM network compression through a theoretical analysis and practical experiments on NLP task. △ Less

Submitted 9 June, 2020; originally announced June 2020.

arXiv:2003.10170 [pdf, other]

Deep Bayesian Gaussian Processes for Uncertainty Estimation in Electronic Health Records

Authors: Yikuan Li, Shishir Rao, Abdelaali Hassaine, Rema Ramakrishnan, Yajie Zhu, Dexter Canoy, Gholamreza Salimi-Khorshidi, Thomas Lukasiewicz, Kazem Rahimi

Abstract: One major impediment to the wider use of deep learning for clinical decision making is the difficulty of assigning a level of confidence to model predictions. Currently, deep Bayesian neural networks and sparse Gaussian processes are the main two scalable uncertainty estimation methods. However, deep Bayesian neural network suffers from lack of expressiveness, and more expressive models such as de… ▽ More One major impediment to the wider use of deep learning for clinical decision making is the difficulty of assigning a level of confidence to model predictions. Currently, deep Bayesian neural networks and sparse Gaussian processes are the main two scalable uncertainty estimation methods. However, deep Bayesian neural network suffers from lack of expressiveness, and more expressive models such as deep kernel learning, which is an extension of sparse Gaussian process, captures only the uncertainty from the higher level latent space. Therefore, the deep learning model under it lacks interpretability and ignores uncertainty from the raw data. In this paper, we merge features of the deep Bayesian learning framework with deep kernel learning to leverage the strengths of both methods for more comprehensive uncertainty estimation. Through a series of experiments on predicting the first incidence of heart failure, diabetes and depression applied to large-scale electronic medical records, we demonstrate that our method is better at capturing uncertainty than both Gaussian processes and deep Bayesian neural networks in terms of indicating data insufficiency and distinguishing true positive and false positive predictions, with a comparable generalisation performance. Furthermore, by assessing the accuracy and area under the receiver operating characteristic curve over the predictive probability, we show that our method is less susceptible to making overconfident predictions, especially for the minority class in imbalanced datasets. Finally, we demonstrate how uncertainty information derived by the model can inform risk factor analysis towards model interpretability. △ Less

Submitted 23 March, 2020; originally announced March 2020.

Comments: 21 pages

arXiv:1911.00231 [pdf, other]

Extending Relational Query Processing with ML Inference

Authors: Konstantinos Karanasos, Matteo Interlandi, Doris Xin, Fotis Psallidas, Rathijit Sen, Kwanghyun Park, Ivan Popivanov, Supun Nakandal, Subru Krishnan, Markus Weimer, Yuan Yu, Raghu Ramakrishnan, Carlo Curino

Abstract: The broadening adoption of machine learning in the enterprise is increasing the pressure for strict governance and cost-effective performance, in particular for the common and consequential steps of model storage and inference. The RDBMS provides a natural starting point, given its mature infrastructure for fast data access and processing, along with support for enterprise features (e.g., encrypti… ▽ More The broadening adoption of machine learning in the enterprise is increasing the pressure for strict governance and cost-effective performance, in particular for the common and consequential steps of model storage and inference. The RDBMS provides a natural starting point, given its mature infrastructure for fast data access and processing, along with support for enterprise features (e.g., encryption, auditing, high-availability). To take advantage of all of the above, we need to address a key concern: Can in-RDBMS scoring of ML models match (outperform?) the performance of dedicated frameworks? We answer the above positively by building Raven, a system that leverages native integration of ML runtimes (i.e., ONNX Runtime) deep within SQL Server, and a unified intermediate representation (IR) to enable advanced cross-optimizations between ML and DB operators. In this optimization space, we discover the most exciting research opportunities that combine DB/Compiler/ML thinking. Our initial evaluation on real data demonstrates performance gains of up to 5.5x from the native integration of ML in SQL Server, and up to 24x from cross-optimizations--we will demonstrate Raven live during the conference talk. △ Less

Submitted 1 November, 2019; originally announced November 2019.

arXiv:1909.08234 [pdf, other]

doi 10.4204/EPTCS.306.14

Value of Information in Probabilistic Logic Programs

Authors: Sarthak Ghosh, C. R. Ramakrishnan

Abstract: In medical decision making, we have to choose among several expensive diagnostic tests such that the certainty about a patient's health is maximized while remaining within the bounds of resources like time and money. The expected increase in certainty in the patient's condition due to performing a test is called the value of information (VoI) for that test. In general, VoI relates to acquiring add… ▽ More In medical decision making, we have to choose among several expensive diagnostic tests such that the certainty about a patient's health is maximized while remaining within the bounds of resources like time and money. The expected increase in certainty in the patient's condition due to performing a test is called the value of information (VoI) for that test. In general, VoI relates to acquiring additional information to improve decision-making based on probabilistic reasoning in an uncertain system. This paper presents a framework for acquiring information based on VoI in uncertain systems modeled as Probabilistic Logic Programs (PLPs). Optimal decision-making in uncertain systems modeled as PLPs have already been studied before. But, acquiring additional information to further improve the results of making the optimal decision has remained open in this context. We model decision-making in an uncertain system with a PLP and a set of top-level queries, with a set of utility measures over the distributions of these queries. The PLP is annotated with a set of atoms labeled as "observable"; in the medical diagnosis example, the observable atoms will be results of diagnostic tests. Each observable atom has an associated cost. This setting of optimally selecting observations based on VoI is more general than that considered by any prior work. Given a limited budget, optimally choosing observable atoms based on VoI is intractable in general. We give a greedy algorithm for constructing a "conditional plan" of observations: a schedule where the selection of what atom to observe next depends on earlier observations. We show that, preempting the algorithm anytime before completion provides a usable result, the result improves over time, and, in the absence of a well-defined budget, converges to the optimal solution. △ Less

Submitted 18 September, 2019; originally announced September 2019.

Comments: In Proceedings ICLP 2019, arXiv:1909.07646

ACM Class: I.2.4

Journal ref: EPTCS 306, 2019, pp. 71-84

arXiv:1909.04567 [pdf, other]

Differentiable Mask for Pruning Convolutional and Recurrent Networks

Authors: Ramchalam Kinattinkara Ramakrishnan, Eyyüb Sari, Vahid Partovi Nia

Abstract: Pruning is one of the most effective model reduction techniques. Deep networks require massive computation and such models need to be compressed to bring them on edge devices. Most existing pruning techniques are focused on vision-based models like convolutional networks, while text-based models are still evolving. The emergence of multi-modal multi-task learning calls for a general method that wo… ▽ More Pruning is one of the most effective model reduction techniques. Deep networks require massive computation and such models need to be compressed to bring them on edge devices. Most existing pruning techniques are focused on vision-based models like convolutional networks, while text-based models are still evolving. The emergence of multi-modal multi-task learning calls for a general method that works on vision and text architectures simultaneously. We introduce a \emph{differentiable mask}, that induces sparsity on various granularity to fill this gap. We apply our method successfully to prune weights, filters, subnetwork of a convolutional architecture, as well as nodes of a recurrent network. △ Less

Submitted 29 April, 2020; v1 submitted 10 September, 2019; originally announced September 2019.

arXiv:1909.00084 [pdf, other]

Cloudy with high chance of DBMS: A 10-year prediction for Enterprise-Grade ML

Authors: Ashvin Agrawal, Rony Chatterjee, Carlo Curino, Avrilia Floratou, Neha Gowdal, Matteo Interlandi, Alekh Jindal, Kostantinos Karanasos, Subru Krishnan, Brian Kroth, Jyoti Leeka, Kwanghyun Park, Hiren Patel, Olga Poppe, Fotis Psallidas, Raghu Ramakrishnan, Abhishek Roy, Karla Saur, Rathijit Sen, Markus Weimer, Travis Wright, Yiwen Zhu

Abstract: Machine learning (ML) has proven itself in high-value web applications such as search ranking and is emerging as a powerful tool in a much broader range of enterprise scenarios including voice recognition and conversational understanding for customer support, autotuning for videoconferencing, intelligent feedback loops in large-scale sysops, manufacturing and autonomous vehicle management, complex… ▽ More Machine learning (ML) has proven itself in high-value web applications such as search ranking and is emerging as a powerful tool in a much broader range of enterprise scenarios including voice recognition and conversational understanding for customer support, autotuning for videoconferencing, intelligent feedback loops in large-scale sysops, manufacturing and autonomous vehicle management, complex financial predictions, just to name a few. Meanwhile, as the value of data is increasingly recognized and monetized, concerns about securing valuable data and risks to individual privacy have been growing. Consequently, rigorous data management has emerged as a key requirement in enterprise settings. How will these trends (ML growing popularity, and stricter data governance) intersect? What are the unmet requirements for applying ML in enterprise settings? What are the technical challenges for the DB community to solve? In this paper, we present our vision of how ML and database systems are likely to come together, and early steps we take towards making this vision a reality. △ Less

Submitted 27 December, 2019; v1 submitted 30 August, 2019; originally announced September 2019.

arXiv:1904.00775 [pdf, other]

Deep Demosaicing for Edge Implementation

Authors: Ramchalam Kinattinkara Ramakrishnan, Shangling Jui, Vahid Patrovi Nia

Abstract: Most digital cameras use sensors coated with a Color Filter Array (CFA) to capture channel components at every pixel location, resulting in a mosaic image that does not contain pixel values in all channels. Current research on reconstructing these missing channels, also known as demosaicing, introduces many artifacts, such as zipper effect and false color. Many deep learning demosaicing techniques… ▽ More Most digital cameras use sensors coated with a Color Filter Array (CFA) to capture channel components at every pixel location, resulting in a mosaic image that does not contain pixel values in all channels. Current research on reconstructing these missing channels, also known as demosaicing, introduces many artifacts, such as zipper effect and false color. Many deep learning demosaicing techniques outperform other classical techniques in reducing the impact of artifacts. However, most of these models tend to be over-parametrized. Consequently, edge implementation of the state-of-the-art deep learning-based demosaicing algorithms on low-end edge devices is a major challenge. We provide an exhaustive search of deep neural network architectures and obtain a pareto front of Color Peak Signal to Noise Ratio (CPSNR) as the performance criterion versus the number of parameters as the model complexity that beats the state-of-the-art. Architectures on the pareto front can then be used to choose the best architecture for a variety of resource constraints. Simple architecture search methods such as exhaustive search and grid search require some conditions of the loss function to converge to the optimum. We clarify these conditions in a brief theoretical study. △ Less

Submitted 23 May, 2019; v1 submitted 26 March, 2019; originally announced April 2019.

Comments: Accepted in the 16th International Conference of Image Analysis and Recognition (ICIAR 2019)

arXiv:1806.07834 [pdf, other]

A Look at Motion Planning for Autonomous Vehicles at an Intersection

Authors: Shravan Krishnan, Govind Aadithya R, Rahul Ramakrishnan, Vijay Arvindh, Sivanathan K

Abstract: Autonomous Vehicles are currently being tested in a variety of scenarios. As we move towards Autonomous Vehicles, how should intersections look? To answer that question, we break down an intersection management into the different conundrums and scenarios involved in the trajectory planning and current approaches to solve them. Then, a brief analysis of current works in autonomous intersection is c… ▽ More Autonomous Vehicles are currently being tested in a variety of scenarios. As we move towards Autonomous Vehicles, how should intersections look? To answer that question, we break down an intersection management into the different conundrums and scenarios involved in the trajectory planning and current approaches to solve them. Then, a brief analysis of current works in autonomous intersection is conducted. With a critical eye, we try to delve into the discrepancies of existing solutions while presenting some critical and important factors that have been addressed. Furthermore, open issues that have to be addressed are also emphasized. We also try to answer the question of how to benchmark intersection management algorithms by providing some factors that impact autonomous navigation at intersection. △ Less

Submitted 7 September, 2018; v1 submitted 20 June, 2018; originally announced June 2018.

Comments: Accepted for presentation at ITSC 2018, Final Version

arXiv:1805.08966 [pdf, other]

Discovering Blind Spots in Reinforcement Learning

Authors: Ramya Ramakrishnan, Ece Kamar, Debadeepta Dey, Julie Shah, Eric Horvitz

Abstract: Agents trained in simulation may make errors in the real world due to mismatches between training and execution environments. These mistakes can be dangerous and difficult to discover because the agent cannot predict them a priori. We propose using oracle feedback to learn a predictive model of these blind spots to reduce costly errors in real-world applications. We focus on blind spots in reinfor… ▽ More Agents trained in simulation may make errors in the real world due to mismatches between training and execution environments. These mistakes can be dangerous and difficult to discover because the agent cannot predict them a priori. We propose using oracle feedback to learn a predictive model of these blind spots to reduce costly errors in real-world applications. We focus on blind spots in reinforcement learning (RL) that occur due to incomplete state representation: The agent does not have the appropriate features to represent the true state of the world and thus cannot distinguish among numerous states. We formalize the problem of discovering blind spots in RL as a noisy supervised learning problem with class imbalance. We learn models to predict blind spots in unseen regions of the state space by combining techniques for label aggregation, calibration, and supervised learning. The models take into consideration noise emerging from different forms of oracle feedback, including demonstrations and corrections. We evaluate our approach on two domains and show that it achieves higher predictive performance than baseline methods, and that the learned model can be used to selectively query an oracle at execution time to prevent errors. We also empirically analyze the biases of various feedback types and how they influence the discovery of blind spots. △ Less

Submitted 23 May, 2018; originally announced May 2018.

Comments: To appear at AAMAS 2018

arXiv:1804.10237 [pdf, ps, other]

Constraint-Based Inference in Probabilistic Logic Programs

Authors: Arun Nampally, Timothy Zhang, C. R. Ramakrishnan

Abstract: Probabilistic Logic Programs (PLPs) generalize traditional logic programs and allow the encoding of models combining logical structure and uncertainty. In PLP, inference is performed by summarizing the possible worlds which entail the query in a suitable data structure, and using it to compute the answer probability. Systems such as ProbLog, PITA, etc., use propositional data structures like expla… ▽ More Probabilistic Logic Programs (PLPs) generalize traditional logic programs and allow the encoding of models combining logical structure and uncertainty. In PLP, inference is performed by summarizing the possible worlds which entail the query in a suitable data structure, and using it to compute the answer probability. Systems such as ProbLog, PITA, etc., use propositional data structures like explanation graphs, BDDs, SDDs, etc., to represent the possible worlds. While this approach saves inference time due to substructure sharing, there are a number of problems where a more compact data structure is possible. We propose a data structure called Ordered Symbolic Derivation Diagram (OSDD) which captures the possible worlds by means of constraint formulas. We describe a program transformation technique to construct OSDDs via query evaluation, and give procedures to perform exact and approximate inference over OSDDs. Our approach has two key properties. Firstly, the exact inference procedure is a generalization of traditional inference, and results in speedup over the latter in certain settings. Secondly, the approximate technique is a generalization of likelihood weighting in Bayesian Networks, and allows us to perform sampling-based inference with lower rejection rate and variance. We evaluate the effectiveness of the proposed techniques through experiments on several problems. This paper is under consideration for acceptance in TPLP. △ Less

Submitted 26 April, 2018; originally announced April 2018.

Comments: Paper presented at the 34nd International Conference on Logic Programming (ICLP 2018), Oxford, UK, July 14 to July 17, 2018 18 pages, LaTeX, 5 PDF figures (arXiv:YYMM.NNNNN)

arXiv:1612.03505 [pdf, ps, other]

doi 10.1109/ICASSP.2017.7952638

Convolutional Neural Networks for Passive Monitoring of a Shallow Water Environment using a Single Sensor

Authors: Eric L. Ferguson, Rishi Ramakrishnan, Stefan B. Williams, Craig T. Jin

Abstract: A cost effective approach to remote monitoring of protected areas such as marine reserves and restricted naval waters is to use passive sonar to detect, classify, localize, and track marine vessel activity (including small boats and autonomous underwater vehicles). Cepstral analysis of underwater acoustic data enables the time delay between the direct path arrival and the first multipath arrival t… ▽ More A cost effective approach to remote monitoring of protected areas such as marine reserves and restricted naval waters is to use passive sonar to detect, classify, localize, and track marine vessel activity (including small boats and autonomous underwater vehicles). Cepstral analysis of underwater acoustic data enables the time delay between the direct path arrival and the first multipath arrival to be measured, which in turn enables estimation of the instantaneous range of the source (a small boat). However, this conventional method is limited to ranges where the Lloyd's mirror effect (interference pattern formed between the direct and first multipath arrivals) is discernible. This paper proposes the use of convolutional neural networks (CNNs) for the joint detection and ranging of broadband acoustic noise sources such as marine vessels in conjunction with a data augmentation approach for improving network performance in varied signal-to-noise ratio (SNR) situations. Performance is compared with a conventional passive sonar ranging method for monitoring marine vessel activity using real data from a single hydrophone mounted above the sea floor. It is shown that CNNs operating on cepstrum data are able to detect the presence and estimate the range of transiting vessels at greater distances than the conventional method. △ Less

Submitted 11 December, 2016; originally announced December 2016.

Comments: Final draft for IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2017. 5 pages, 4 figures

arXiv:1611.09007 [pdf, other]

Hyperspectral CNN Classification with Limited Training Samples

Authors: Lloyd Windrim, Rishi Ramakrishnan, Arman Melkumyan, Richard Murphy

Abstract: Hyperspectral imaging sensors are becoming increasingly popular in robotics applications such as agriculture and mining, and allow per-pixel thematic classification of materials in a scene based on their unique spectral signatures. Recently, convolutional neural networks have shown remarkable performance for classification tasks, but require substantial amounts of labelled training data. This data… ▽ More Hyperspectral imaging sensors are becoming increasingly popular in robotics applications such as agriculture and mining, and allow per-pixel thematic classification of materials in a scene based on their unique spectral signatures. Recently, convolutional neural networks have shown remarkable performance for classification tasks, but require substantial amounts of labelled training data. This data must sufficiently cover the variability expected to be encountered in the environment. For hyperspectral data, one of the main variations encountered outdoors is due to incident illumination, which can change in spectral shape and intensity depending on the scene geometry. For example, regions occluded from the sun have a lower intensity and their incident irradiance skewed towards shorter wavelengths. In this work, a data augmentation strategy based on relighting is used during training of a hyperspectral convolutional neural network. It allows training to occur in the outdoor environment given only a small labelled region, which does not need to sufficiently represent the geometric variability of the entire scene. This is important for applications where obtaining large amounts of training data is labourious, hazardous or difficult, such as labelling pixels within shadows. Radiometric normalisation approaches for pre-processing the hyperspectral data are analysed and it is shown that methods based on the raw pixel data are sufficient to be used as input for the classifier. This removes the need for external hardware such as calibration boards, which can restrict the application of hyperspectral sensors in robotics applications. Experiments to evaluate the classification system are carried out on two datasets captured from a field-based platform. △ Less

Submitted 28 November, 2016; originally announced November 2016.

Comments: 10 pages, 6 figures

arXiv:1608.05763 [pdf, ps, other]

Inference in Probabilistic Logic Programs using Lifted Explanations

Authors: Arun Nampally, C. R. Ramakrishnan

Abstract: In this paper, we consider the problem of lifted inference in the context of Prism-like probabilistic logic programming languages. Traditional inference in such languages involves the construction of an explanation graph for the query and computing probabilities over this graph. When evaluating queries over probabilistic logic programs with a large number of instances of random variables, traditio… ▽ More In this paper, we consider the problem of lifted inference in the context of Prism-like probabilistic logic programming languages. Traditional inference in such languages involves the construction of an explanation graph for the query and computing probabilities over this graph. When evaluating queries over probabilistic logic programs with a large number of instances of random variables, traditional methods treat each instance separately. For many programs and queries, we observe that explanations can be summarized into substantially more compact structures, which we call lifted explanation graphs. In this paper, we define lifted explanation graphs and operations over them. In contrast to existing lifted inference techniques, our method for constructing lifted explanations naturally generalizes existing methods for constructing explanation graphs. To compute probability of query answers, we solve recurrences generated from the lifted graphs. We show examples where the use of our technique reduces the asymptotic complexity of inference. △ Less

Submitted 19 August, 2016; originally announced August 2016.

arXiv:1604.06118 [pdf, other]

XPL: An extended probabilistic logic for probabilistic transition systems

Authors: Andrey Gorlin, C. R. Ramakrishnan

Abstract: Generalized Probabilistic Logic (GPL) is a temporal logic, based on the modal mu-calculus, for specifying properties of reactive probabilistic systems. We explore XPL, an extension to GPL allowing the semantics of nondeterminism present in Markov decision processes (MDPs). XPL is expressive enough that a number of independently studied problems--- such as termination of Recursive MDPs (RMDPs), PCT… ▽ More Generalized Probabilistic Logic (GPL) is a temporal logic, based on the modal mu-calculus, for specifying properties of reactive probabilistic systems. We explore XPL, an extension to GPL allowing the semantics of nondeterminism present in Markov decision processes (MDPs). XPL is expressive enough that a number of independently studied problems--- such as termination of Recursive MDPs (RMDPs), PCTL* model checking of MDPs, and reachability for Branching MDPs--- can all be cast as model checking over XPL. Termination of multi-exit RMDPs is undecidable; thus, model checking in XPL is undecidable in general. We define a subclass, called separable XPL, for which model checking is decidable. Decidable problems such as termination of 1-exit RMDPs, PCTL* model checking of MDPs, and reachability for Branching MDPs can be reduced to model checking separable XPL. Thus, XPL forms a uniform framework for studying problems involving systems with non-deterministic and probabilistic behaviors, while separable XPL provides a way to solve decidable fragments of these problems. △ Less

Submitted 9 May, 2017; v1 submitted 20 April, 2016; originally announced April 2016.

ACM Class: I.2.3

arXiv:1509.08439 [pdf, other]

Hyper-Fisher Vectors for Action Recognition

Authors: Sanath Narayan, Kalpathi R. Ramakrishnan

Abstract: In this paper, a novel encoding scheme combining Fisher vector and bag-of-words encodings has been proposed for recognizing action in videos. The proposed Hyper-Fisher vector encoding is sum of local Fisher vectors which are computed based on the traditional Bag-of-Words (BoW) encoding. Thus, the proposed encoding is simple and yet an effective representation over the traditional Fisher Vector enc… ▽ More In this paper, a novel encoding scheme combining Fisher vector and bag-of-words encodings has been proposed for recognizing action in videos. The proposed Hyper-Fisher vector encoding is sum of local Fisher vectors which are computed based on the traditional Bag-of-Words (BoW) encoding. Thus, the proposed encoding is simple and yet an effective representation over the traditional Fisher Vector encoding. By extensive evaluation on challenging action recognition datasets, viz., Youtube, Olympic Sports, UCF50 and HMDB51, we show that the proposed Hyper-Fisher Vector encoding improves the recognition performance by around 2-3% compared to the improved Fisher Vector encoding. We also perform experiments to show that the performance of the Hyper-Fisher Vector is robust to the dictionary size of the BoW encoding. △ Less

Submitted 28 September, 2015; originally announced September 2015.

arXiv:1501.03879 [pdf, other]

A new ADMM algorithm for the Euclidean median and its application to robust patch regression

Authors: Kunal N. Chaudhury, K. R. Ramakrishnan

Abstract: The Euclidean Median (EM) of a set of points $Ω$ in an Euclidean space is the point x minimizing the (weighted) sum of the Euclidean distances of x to the points in $Ω$. While there exits no closed-form expression for the EM, it can nevertheless be computed using iterative methods such as the Wieszfeld algorithm. The EM has classically been used as a robust estimator of centrality for multivariate… ▽ More The Euclidean Median (EM) of a set of points $Ω$ in an Euclidean space is the point x minimizing the (weighted) sum of the Euclidean distances of x to the points in $Ω$. While there exits no closed-form expression for the EM, it can nevertheless be computed using iterative methods such as the Wieszfeld algorithm. The EM has classically been used as a robust estimator of centrality for multivariate data. It was recently demonstrated that the EM can be used to perform robust patch-based denoising of images by generalizing the popular Non-Local Means algorithm. In this paper, we propose a novel algorithm for computing the EM (and its box-constrained counterpart) using variable splitting and the method of augmented Lagrangian. The attractive feature of this approach is that the subproblems involved in the ADMM-based optimization of the augmented Lagrangian can be resolved using simple closed-form projections. The proposed ADMM solver is used for robust patch-based image denoising and is shown to exhibit faster convergence compared to an existing solver. △ Less

Submitted 16 January, 2015; originally announced January 2015.

Comments: 5 pages, 3 figures, 1 table. To appear in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing, April 19-24, 2015

arXiv:1405.6341 [pdf, other]

doi 10.1145/2696454.2696455

Efficient Model Learning for Human-Robot Collaborative Tasks

Authors: Stefanos Nikolaidis, Keren Gu, Ramya Ramakrishnan, Julie Shah

Abstract: We present a framework for learning human user models from joint-action demonstrations that enables the robot to compute a robust policy for a collaborative task with a human. The learning takes place completely automatically, without any human intervention. First, we describe the clustering of demonstrated action sequences into different human types using an unsupervised learning algorithm. These… ▽ More We present a framework for learning human user models from joint-action demonstrations that enables the robot to compute a robust policy for a collaborative task with a human. The learning takes place completely automatically, without any human intervention. First, we describe the clustering of demonstrated action sequences into different human types using an unsupervised learning algorithm. These demonstrated sequences are also used by the robot to learn a reward function that is representative for each type, through the employment of an inverse reinforcement learning algorithm. The learned model is then used as part of a Mixed Observability Markov Decision Process formulation, wherein the human type is a partially observable variable. With this framework, we can infer, either offline or online, the human type of a new user that was not included in the training set, and can compute a policy for the robot that will be aligned to the preference of this new user and will be robust to deviations of the human actions from prior demonstrations. Finally we validate the approach using data collected in human subject experiments, and conduct proof-of-concept demonstrations in which a person performs a collaborative task with a small industrial robot. △ Less

Submitted 24 May, 2014; originally announced May 2014.

ACM Class: I.2.6; I.2.8; I.2.9

Journal ref: Proceedings of the Tenth Annual ACM/IEEE International Conference on Human-Robot Interaction (HRI 2015)

arXiv:1403.6036 [pdf, other]

Adaptive MCMC-Based Inference in Probabilistic Logic Programs

Authors: Arun Nampally, C. R. Ramakrishnan

Abstract: Probabilistic Logic Programming (PLP) languages enable programmers to specify systems that combine logical models with statistical knowledge. The inference problem, to determine the probability of query answers in PLP, is intractable in general, thereby motivating the need for approximate techniques. In this paper, we present a technique for approximate inference of conditional probabilities for P… ▽ More Probabilistic Logic Programming (PLP) languages enable programmers to specify systems that combine logical models with statistical knowledge. The inference problem, to determine the probability of query answers in PLP, is intractable in general, thereby motivating the need for approximate techniques. In this paper, we present a technique for approximate inference of conditional probabilities for PLP queries. It is an Adaptive Markov Chain Monte Carlo (MCMC) technique, where the distribution from which samples are drawn is modified as the Markov Chain is explored. In particular, the distribution is progressively modified to increase the likelihood that a generated sample is consistent with evidence. In our context, each sample is uniquely characterized by the outcomes of a set of random variables. Inspired by reinforcement learning, our technique propagates rewards to random variable/outcome pairs used in a sample based on whether the sample was consistent or not. The cumulative rewards of each outcome is used to derive a new "adapted distribution" for each random variable. For a sequence of samples, the distributions are progressively adapted after each sample. For a query with "Markovian evaluation structure", we show that the adapted distribution of samples converges to the query's conditional probability distribution. For Markovian queries, we present a modified adaptation process that can be used in adaptive MCMC as well as adaptive independent sampling. We empirically evaluate the effectiveness of the adaptive sampling methods for queries with and without Markovian evaluation structure. △ Less

Submitted 24 March, 2014; originally announced March 2014.

arXiv:1303.3517 [pdf, other]

Iterative MapReduce for Large Scale Machine Learning

Authors: Joshua Rosen, Neoklis Polyzotis, Vinayak Borkar, Yingyi Bu, Michael J. Carey, Markus Weimer, Tyson Condie, Raghu Ramakrishnan

Abstract: Large datasets ("Big Data") are becoming ubiquitous because the potential value in deriving insights from data, across a wide range of business and scientific applications, is increasingly recognized. In particular, machine learning - one of the foundational disciplines for data analysis, summarization and inference - on Big Data has become routine at most organizations that operate large clouds,… ▽ More Large datasets ("Big Data") are becoming ubiquitous because the potential value in deriving insights from data, across a wide range of business and scientific applications, is increasingly recognized. In particular, machine learning - one of the foundational disciplines for data analysis, summarization and inference - on Big Data has become routine at most organizations that operate large clouds, usually based on systems such as Hadoop that support the MapReduce programming paradigm. It is now widely recognized that while MapReduce is highly scalable, it suffers from a critical weakness for machine learning: it does not support iteration. Consequently, one has to program around this limitation, leading to fragile, inefficient code. Further, reliance on the programmer is inherently flawed in a multi-tenanted cloud environment, since the programmer does not have visibility into the state of the system when his or her program executes. Prior work has sought to address this problem by either developing specialized systems aimed at stylized applications, or by augmenting MapReduce with ad hoc support for saving state across iterations (driven by an external loop). In this paper, we advocate support for looping as a first-class construct, and propose an extension of the MapReduce programming paradigm called {\em Iterative MapReduce}. We then develop an optimizer for a class of Iterative MapReduce programs that cover most machine learning techniques, provide theoretical justifications for the key optimization steps, and empirically demonstrate that system-optimized programs for significant machine learning tasks are competitive with state-of-the-art specialized solutions. △ Less

Submitted 13 March, 2013; originally announced March 2013.

arXiv:1204.4736 [pdf, other]

Model Checking with Probabilistic Tabled Logic Programming

Authors: Andrey Gorlin, C. R. Ramakrishnan, Scott A. Smolka

Abstract: We present a formulation of the problem of probabilistic model checking as one of query evaluation over probabilistic logic programs. To the best of our knowledge, our formulation is the first of its kind, and it covers a rich class of probabilistic models and probabilistic temporal logics. The inference algorithms of existing probabilistic logic-programming systems are well defined only for queri… ▽ More We present a formulation of the problem of probabilistic model checking as one of query evaluation over probabilistic logic programs. To the best of our knowledge, our formulation is the first of its kind, and it covers a rich class of probabilistic models and probabilistic temporal logics. The inference algorithms of existing probabilistic logic-programming systems are well defined only for queries with a finite number of explanations. This restriction prohibits the encoding of probabilistic model checkers, where explanations correspond to executions of the system being model checked. To overcome this restriction, we propose a more general inference algorithm that uses finite generative structures (similar to automata) to represent families of explanations. The inference algorithm computes the probability of a possibly infinite set of explanations directly from the finite generative structure. We have implemented our inference algorithm in XSB Prolog, and use this implementation to encode probabilistic model checkers for a variety of temporal logics, including PCTL and GPL (which subsumes PCTL*). Our experiment results show that, despite the highly declarative nature of their encodings, the model checkers constructed in this manner are competitive with their native implementations. △ Less

Submitted 20 April, 2012; originally announced April 2012.

Showing 1–50 of 53 results for author: Ramakrishnan, R