-
Gemini 2.5: Pushing the Frontier with Advanced Reasoning, Multimodality, Long Context, and Next Generation Agentic Capabilities
Authors:
Gheorghe Comanici,
Eric Bieber,
Mike Schaekermann,
Ice Pasupat,
Noveen Sachdeva,
Inderjit Dhillon,
Marcel Blistein,
Ori Ram,
Dan Zhang,
Evan Rosen,
Luke Marris,
Sam Petulla,
Colin Gaffney,
Asaf Aharoni,
Nathan Lintz,
Tiago Cardal Pais,
Henrik Jacobsson,
Idan Szpektor,
Nan-Jiang Jiang,
Krishna Haridasan,
Ahmed Omran,
Nikunj Saunshi,
Dara Bahri,
Gaurav Mishra,
Eric Chu,
et al. (3264 additional authors not shown)
Abstract:
In this report, we introduce the Gemini 2.X model family: Gemini 2.5 Pro and Gemini 2.5 Flash, as well as our earlier Gemini 2.0 Flash and Flash-Lite models. Gemini 2.5 Pro is our most capable model yet, achieving SoTA performance on frontier coding and reasoning benchmarks. In addition to its incredible coding and reasoning skills, Gemini 2.5 Pro is a thinking model that excels at multimodal understanding and it is now able to process up to 3 hours of video content. Its unique combination of long context, multimodal and reasoning capabilities can be combined to unlock new agentic workflows. Gemini 2.5 Flash provides excellent reasoning abilities at a fraction of the compute and latency requirements and Gemini 2.0 Flash and Flash-Lite provide high performance at low latency and cost. Taken together, the Gemini 2.X model generation spans the full Pareto frontier of model capability vs cost, allowing users to explore the boundaries of what is possible with complex agentic problem solving.
Submitted 11 July, 2025; v1 submitted 7 July, 2025;
originally announced July 2025.
-
iRonCub 3: The Jet-Powered Flying Humanoid Robot
Authors:
Davide Gorbani,
Hosameldin Awadalla Omer Mohamed,
Giuseppe L'Erario,
Gabriele Nava,
Punith Reddy Vanteddu,
Shabarish Purushothaman Pillai,
Antonello Paolino,
Fabio Bergonti,
Saverio Taliani,
Alessandro Croci,
Nicholas James Tremaroli,
Silvio Traversaro,
Bruno Vittorio Trombetta,
Daniele Pucci
Abstract:
This article presents iRonCub 3, a jet-powered humanoid robot, and its first flight experiments. Unlike traditional aerial vehicles, iRonCub 3 aims to achieve flight using a full-body humanoid form, which poses unique challenges in control, estimation, and system integration. We highlight the robot's current mechanical and software architecture, including its propulsion system, control framework, and experimental infrastructure. The control and estimation framework is first validated in simulation by performing a takeoff and tracking a reference trajectory. Then, we demonstrate, for the first time, a liftoff of a jet-powered humanoid robot, an initial but significant step toward aerial humanoid mobility. We also detail how the experimental area around a jet-powered humanoid robot should be designed to deal with a level of complexity substantially higher than that of indoor humanoid robot experiments.
Submitted 1 June, 2025;
originally announced June 2025.
-
Replay to Remember: Retaining Domain Knowledge in Streaming Language Models
Authors:
Sneh Pillai
Abstract:
Continual learning in large language models (LLMs) typically encounters the critical challenge of catastrophic forgetting, where previously acquired knowledge deteriorates upon exposure to new data. While techniques like replay buffers and parameter-efficient tuning (e.g., Low-Rank Adaptation or LoRA) have been proposed, few studies investigate real-time domain adaptation under strict computational and data-stream constraints. In this paper, we demonstrate a lightweight method combining LoRA and a minimal replay mechanism in a realistic streaming setting across three diverse knowledge domains: medical question answering, genetics, and law. Using perplexity, semantic similarity, and GPT-based human-like evaluation metrics, we quantify the model's adaptation, forgetting, and recovery over time. Our experiments reveal that while catastrophic forgetting naturally occurs, even minimal replay significantly stabilizes and partially restores domain-specific knowledge. This study contributes practical insights for deploying adaptable LLMs in resource-constrained, real-world scenarios.
Submitted 24 April, 2025;
originally announced April 2025.
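The abstract above pairs LoRA with "a minimal replay mechanism" but does not specify the buffer policy. As a hedged sketch, assume a fixed-capacity buffer filled by reservoir sampling, whose stored examples are mixed into each incoming streaming batch at a fixed ratio. The class `ReplayBuffer`, the function `mixed_batch`, the `capacity` and `replay_ratio` parameters, and the reservoir policy are all illustrative assumptions; the LoRA update itself is omitted.

```python
import random
from typing import Generic, List, TypeVar

T = TypeVar("T")


class ReplayBuffer(Generic[T]):
    """Hypothetical fixed-size replay buffer filled by reservoir sampling.

    Reservoir sampling keeps every item seen so far with equal probability
    capacity / seen, so earlier domains are not crowded out by the latest
    segment of the stream.
    """

    def __init__(self, capacity: int, seed: int = 0) -> None:
        self.capacity = capacity
        self.items: List[T] = []
        self.seen = 0
        self.rng = random.Random(seed)

    def add(self, item: T) -> None:
        self.seen += 1
        if len(self.items) < self.capacity:
            self.items.append(item)
        else:
            # Overwrite a random slot with probability capacity / seen.
            j = self.rng.randrange(self.seen)
            if j < self.capacity:
                self.items[j] = item

    def sample(self, k: int) -> List[T]:
        k = min(k, len(self.items))
        return self.rng.sample(self.items, k)


def mixed_batch(new_items: List[T], buffer: ReplayBuffer[T], replay_ratio: float) -> List[T]:
    """Return fresh stream items plus replayed past items, then store the fresh ones."""
    n_replay = int(round(len(new_items) * replay_ratio))
    # Sample before adding, so replayed items always come from earlier data.
    batch = list(new_items) + buffer.sample(n_replay)
    for item in new_items:
        buffer.add(item)
    return batch
```

For example, after filling the buffer with medical examples, a batch of two law examples with `replay_ratio=1.0` comes back with two replayed medical examples appended. Sampling before inserting is a design choice here, so that a batch never "replays" its own fresh items.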
-
Graph-Augmented LSTM for Forecasting Sparse Anomalies in Graph-Structured Time Series
Authors:
Sneh Pillai
Abstract:
Detecting anomalies in time series data is a critical task across many domains. The challenge intensifies when anomalies are sparse and the data are multivariate with relational dependencies across sensors or nodes. Traditional univariate anomaly detectors struggle to capture such cross-node dependencies, particularly in sparse anomaly settings. To address this, we propose a graph-augmented time series forecasting approach that explicitly integrates the graph of relationships among time series into an LSTM forecasting model. This enables the model to detect rare anomalies that might otherwise go unnoticed in purely univariate approaches. We evaluate the approach on two benchmark datasets - the Yahoo Webscope S5 anomaly dataset and the METR-LA traffic sensor network - and compare the performance of the Graph-Augmented LSTM against LSTM-only, ARIMA, and Prophet baselines. Results demonstrate that the graph-augmented model achieves significantly higher precision and recall, improving F1-score by up to 10% over the best baseline.
Submitted 5 March, 2025;
originally announced March 2025.
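The abstract says the graph of relationships among series is integrated into the LSTM forecaster but does not give the mechanism. One common way to do this, offered here only as an assumption, is to append to each node's input the edge-weighted mean of its neighbors' values at the same time step before the recurrent layer. The sketch shows only that feature-construction step; the function name `graph_augment` and the weighted-mean aggregation are hypothetical, not the paper's confirmed design.

```python
from typing import List, Sequence


def graph_augment(values: Sequence[float],
                  adjacency: Sequence[Sequence[float]]) -> List[List[float]]:
    """Build per-node features [own value, edge-weighted mean of neighbor values].

    values[i] is node i's reading at one time step; adjacency[i][j] >= 0 is the
    weight of the edge from node j into node i (0 means no edge). Self-loops
    are ignored.
    """
    n = len(values)
    features: List[List[float]] = []
    for i in range(n):
        weights = adjacency[i]
        total = sum(weights[j] for j in range(n) if j != i)
        if total > 0:
            neighbor_mean = sum(weights[j] * values[j] for j in range(n) if j != i) / total
        else:
            neighbor_mean = 0.0  # isolated node: no neighborhood signal
        features.append([float(values[i]), neighbor_mean])
    return features
```

Applied at every time step, this yields a two-feature sequence per node that an LSTM could consume in place of the univariate series; an anomaly score could then be derived from forecast residuals, though the abstract does not say how the paper does it.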
-
Variance-Aware Loss Scheduling for Multimodal Alignment in Low-Data Settings
Authors:
Sneh Pillai
Abstract:
Training vision-language models for image-text alignment typically requires large datasets to achieve robust performance. In low-data scenarios, standard contrastive learning can struggle to align modalities effectively due to overfitting and unstable training dynamics. In this paper, we propose a variance-aware loss scheduling approach that dynamically adjusts the weighting of the contrastive loss based on the statistical variability (uncertainty) in the model's alignment predictions. Using a subset of the Flickr8k image-caption dataset to simulate limited data conditions, we demonstrate that our approach improves image-text retrieval accuracy compared to a fixed-weight baseline. We also compare against other adaptive weighting strategies (using output entropy and cosine similarity spread) and find that variance-aware scheduling provides the best overall trade-off. Qualitatively, our method yields more distinct multimodal embeddings as shown by t-SNE visualizations. Moreover, in a stress test with noise-injected captions and images, the variance-guided loss proves more robust, maintaining higher recall when random perturbations are introduced. These results highlight the benefit of adaptive loss weighting for multimodal alignment in low-data regimes.
Submitted 5 March, 2025;
originally announced March 2025.
-
Privacy-Preserving Race/Ethnicity Estimation for Algorithmic Bias Measurement in the U.S
Authors:
Saikrishna Badrinarayanan,
Osonde Osoba,
Miao Cheng,
Ryan Rogers,
Sakshi Jain,
Rahul Tandra,
Natesh S. Pillai
Abstract:
AI fairness measurements, including tests for equal treatment, often take the form of disaggregated evaluations of AI systems. Such measurements are an important part of Responsible AI operations. These measurements compare system performance across demographic groups or sub-populations and typically require member-level demographic signals such as gender, race, ethnicity, and location. However, sensitive member-level demographic attributes like race and ethnicity can be challenging to obtain and use due to platform choices, legal constraints, and cultural norms. In this paper, we focus on the task of enabling AI fairness measurements on race/ethnicity for U.S. LinkedIn members in a privacy-preserving manner. We present the Privacy-Preserving Probabilistic Race/Ethnicity Estimation (PPRE) method for performing this task. PPRE combines the Bayesian Improved Surname Geocoding (BISG) model, a sparse LinkedIn survey sample of self-reported demographics, and privacy-enhancing technologies like secure two-party computation and differential privacy to enable meaningful fairness measurements while preserving member privacy. We provide details of the PPRE method and its privacy guarantees. We then illustrate sample measurement operations. We conclude with a review of open research and engineering challenges for expanding our privacy-preserving fairness measurement capabilities.
Submitted 16 September, 2024; v1 submitted 6 September, 2024;
originally announced September 2024.
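PPRE builds on Bayesian Improved Surname Geocoding (BISG). In its standard form, BISG combines a surname-based prior P(r | s) with a geography likelihood P(g | r) via Bayes' rule, assuming surname and geography are conditionally independent given race/ethnicity: P(r | s, g) ∝ P(r | s) · P(g | r). The sketch below implements only that posterior update. The function name and the category labels and probabilities in the usage are made-up placeholders, and PPRE's survey calibration, secure two-party computation, and differential privacy layers are not shown.

```python
from typing import Dict


def bisg_posterior(p_race_given_surname: Dict[str, float],
                   p_geo_given_race: Dict[str, float]) -> Dict[str, float]:
    """Standard BISG update: P(r | s, g) is proportional to P(r | s) * P(g | r).

    p_race_given_surname maps each category r to P(r | surname);
    p_geo_given_race maps each r to P(this geography | r).
    """
    unnormalized = {
        r: p_race_given_surname[r] * p_geo_given_race.get(r, 0.0)
        for r in p_race_given_surname
    }
    z = sum(unnormalized.values())
    if z == 0:
        raise ValueError("surname and geography tables give zero joint mass")
    # Normalize so the posterior sums to one over categories.
    return {r: v / z for r, v in unnormalized.items()}
```

With placeholder inputs P(r | s) = {"A": 0.6, "B": 0.4} and P(g | r) = {"A": 0.1, "B": 0.3}, the unnormalized weights are 0.06 and 0.12, so the posterior is {"A": 1/3, "B": 2/3}.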
-
Policy Gradients for Optimal Parallel Tempering MCMC
Authors:
Daniel Zhao,
Natesh S. Pillai
Abstract:
Parallel tempering is a meta-algorithm for Markov Chain Monte Carlo that uses multiple chains to sample from tempered versions of the target distribution, enhancing mixing in multi-modal distributions that are challenging for traditional methods. The effectiveness of parallel tempering is heavily influenced by the selection of chain temperatures. Here, we present an adaptive temperature selection algorithm that dynamically adjusts temperatures during sampling using a policy gradient approach. Experiments demonstrate that our method can achieve lower integrated autocorrelation times compared to traditional geometrically spaced temperatures and uniform acceptance rate schemes on benchmark distributions.
Submitted 26 December, 2024; v1 submitted 2 September, 2024;
originally announced September 2024.
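For context on the baseline the abstract compares against, two standard pieces of parallel tempering can be sketched: a geometrically spaced inverse-temperature ladder β_k = β_min^(k/(K−1)) for k = 0, …, K−1, and the replica-exchange rule that accepts swapping states x_i and x_j between chains at β_i and β_j with probability min(1, exp((β_i − β_j)(log π(x_j) − log π(x_i)))). The paper's policy-gradient temperature adaptation is not shown, and the function names are illustrative.

```python
import math
import random
from typing import Callable, List


def geometric_betas(num_chains: int, beta_min: float) -> List[float]:
    """Inverse temperatures from 1 (the target) down to beta_min, geometrically spaced."""
    if num_chains < 2:
        return [1.0]
    return [beta_min ** (k / (num_chains - 1)) for k in range(num_chains)]


def swap_accept_prob(log_pi: Callable[[float], float], x_i: float, x_j: float,
                     beta_i: float, beta_j: float) -> float:
    """Replica-exchange acceptance probability for swapping x_i and x_j."""
    log_ratio = (beta_i - beta_j) * (log_pi(x_j) - log_pi(x_i))
    # Cap at 1 without calling exp on a large positive number.
    return 1.0 if log_ratio >= 0 else math.exp(log_ratio)


def try_swap(states: List[float], betas: List[float], k: int,
             log_pi: Callable[[float], float], rng: random.Random) -> bool:
    """Propose swapping adjacent chains k and k+1 in place; return True if accepted."""
    p = swap_accept_prob(log_pi, states[k], states[k + 1], betas[k], betas[k + 1])
    if rng.random() < p:
        states[k], states[k + 1] = states[k + 1], states[k]
        return True
    return False
```

In a full sampler, each chain would take its own MCMC steps targeting π^β between swap attempts. The adaptive method described above would replace the fixed `geometric_betas` ladder with temperatures tuned during sampling.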
-
GAReT: Cross-view Video Geolocalization with Adapters and Auto-Regressive Transformers
Authors:
Manu S Pillai,
Mamshad Nayeem Rizve,
Mubarak Shah
Abstract:
Cross-view video geo-localization (CVGL) aims to derive GPS trajectories from street-view videos by aligning them with aerial-view images. Despite their promising performance, current CVGL methods face significant challenges. These methods use camera and odometry data, typically absent in real-world scenarios. They utilize multiple adjacent frames and various encoders for feature extraction, resulting in high computational costs. Moreover, these approaches independently predict each street-view frame's location, resulting in temporally inconsistent GPS trajectories. To address these challenges, in this work, we propose GAReT, a fully transformer-based method for CVGL that does not require camera and odometry data. We introduce GeoAdapter, a transformer-adapter module designed to efficiently aggregate image-level representations and adapt them for video inputs. Specifically, we train a transformer encoder on video frames and aerial images, then freeze the encoder to optimize the GeoAdapter module to obtain video-level representation. To address temporally inconsistent trajectories, we introduce TransRetriever, an encoder-decoder transformer model that predicts GPS locations of street-view frames by encoding top-k nearest neighbor predictions per frame and auto-regressively decoding the best neighbor based on the previous frame's predictions. Our method's effectiveness is validated through extensive experiments, demonstrating state-of-the-art performance on benchmark datasets. Our code is available at https://github.com/manupillai308/GAReT.
Submitted 5 August, 2024;
originally announced August 2024.
-
Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context
Authors:
Gemini Team,
Petko Georgiev,
Ving Ian Lei,
Ryan Burnell,
Libin Bai,
Anmol Gulati,
Garrett Tanzer,
Damien Vincent,
Zhufeng Pan,
Shibo Wang,
Soroosh Mariooryad,
Yifan Ding,
Xinyang Geng,
Fred Alcober,
Roy Frostig,
Mark Omernick,
Lexi Walker,
Cosmin Paduraru,
Christina Sorokin,
Andrea Tacchetti,
Colin Gaffney,
Samira Daruki,
Olcan Sercinoglu,
Zach Gleicher,
Juliette Love,
et al. (1112 additional authors not shown)
Abstract:
In this report, we introduce the Gemini 1.5 family of models, representing the next generation of highly compute-efficient multimodal models capable of recalling and reasoning over fine-grained information from millions of tokens of context, including multiple long documents and hours of video and audio. The family includes two new models: (1) an updated Gemini 1.5 Pro, which exceeds the February version on the great majority of capabilities and benchmarks; (2) Gemini 1.5 Flash, a more lightweight variant designed for efficiency with minimal regression in quality. Gemini 1.5 models achieve near-perfect recall on long-context retrieval tasks across modalities, improve the state-of-the-art in long-document QA, long-video QA and long-context ASR, and match or surpass Gemini 1.0 Ultra's state-of-the-art performance across a broad set of benchmarks. Studying the limits of Gemini 1.5's long-context ability, we find continued improvement in next-token prediction and near-perfect retrieval (>99%) up to at least 10M tokens, a generational leap over existing models such as Claude 3.0 (200k) and GPT-4 Turbo (128k). Finally, we highlight real-world use cases, such as Gemini 1.5 collaborating with professionals on completing their tasks achieving 26 to 75% time savings across 10 different job categories, as well as surprising new capabilities of large language models at the frontier; when given a grammar manual for Kalamang, a language with fewer than 200 speakers worldwide, the model learns to translate English to Kalamang at a similar level to a person who learned from the same content.
Submitted 16 December, 2024; v1 submitted 8 March, 2024;
originally announced March 2024.
-
Mobile Health Text Misinformation Identification Using Mobile Data Mining
Authors:
Wen-Chen Hu,
Sanjaikanth E Vadakkethil Somanathan Pillai,
Abdelrahman Ahmed ElSaid
Abstract:
More than six million people had died of COVID-19 by April 2022. The heavy casualties have put people on urgent alert, and people seek all kinds of information to keep from being infected by the coronavirus. As smartphones become people's major information source, this research tries to determine whether the mobile health text information sent to people's devices is correct. The proposed method uses various mobile information retrieval and data mining technologies, including lexical analysis, stopword elimination, stemming, and decision trees, to classify mobile health text information into one of the following classes: (i) true, (ii) fake, (iii) misinformative, (iv) disinformative, and (v) neutral. Experimental results show that the accuracy of the proposed method is above the 50 percent threshold but is not optimal. This is because the problem itself, mobile text misinformation identification, is intrinsically difficult.
Submitted 5 March, 2024; v1 submitted 29 February, 2024;
originally announced February 2024.
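The abstract lists lexical analysis, stopword elimination, stemming, and decision trees as the pipeline stages. A minimal sketch of the first three stages follows. The tiny stopword list and the naive suffix-stripping stemmer are placeholder assumptions (a real system would use a full stopword list and an established stemmer such as Porter's), and the decision-tree classifier that would consume the resulting term counts is omitted.

```python
import re
from collections import Counter
from typing import List

# Placeholder stopword list; a production pipeline would use a much larger one.
STOPWORDS = {"a", "an", "and", "the", "is", "are", "of", "to", "in", "for", "can", "you"}

# Naive suffixes, checked in this order. This is NOT the Porter stemmer.
SUFFIXES = ("ing", "ed", "es", "s")


def tokenize(text: str) -> List[str]:
    """Lexical analysis: lowercase, then keep runs of letters and digits."""
    return re.findall(r"[a-z0-9]+", text.lower())


def naive_stem(token: str) -> str:
    """Strip the first matching suffix, keeping a stem of at least three characters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text: str) -> Counter:
    """Tokenize, drop stopwords, stem, and return term counts for a classifier."""
    return Counter(naive_stem(t) for t in tokenize(text) if t not in STOPWORDS)
```

For example, "Masks are tested and testing helps" reduces to the counts {"test": 2, "mask": 1, "help": 1}; these term counts are the kind of features a decision tree could then split on.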
-
Gemini: A Family of Highly Capable Multimodal Models
Authors:
Gemini Team,
Rohan Anil,
Sebastian Borgeaud,
Jean-Baptiste Alayrac,
Jiahui Yu,
Radu Soricut,
Johan Schalkwyk,
Andrew M. Dai,
Anja Hauth,
Katie Millican,
David Silver,
Melvin Johnson,
Ioannis Antonoglou,
Julian Schrittwieser,
Amelia Glaese,
Jilin Chen,
Emily Pitler,
Timothy Lillicrap,
Angeliki Lazaridou,
Orhan Firat,
James Molloy,
Michael Isard,
Paul R. Barham,
Tom Hennigan,
Benjamin Lee,
et al. (1326 additional authors not shown)
Abstract:
This report introduces a new family of multimodal models, Gemini, that exhibit remarkable capabilities across image, audio, video, and text understanding. The Gemini family consists of Ultra, Pro, and Nano sizes, suitable for applications ranging from complex reasoning tasks to on-device memory-constrained use-cases. Evaluation on a broad range of benchmarks shows that our most-capable Gemini Ultra model advances the state of the art in 30 of 32 of these benchmarks - notably being the first model to achieve human-expert performance on the well-studied exam benchmark MMLU, and improving the state of the art in every one of the 20 multimodal benchmarks we examined. We believe that the new capabilities of the Gemini family in cross-modal reasoning and language understanding will enable a wide variety of use cases. We discuss our approach toward post-training and deploying Gemini models responsibly to users through services including Gemini, Gemini Advanced, Google AI Studio, and Cloud Vertex AI.
Submitted 9 May, 2025; v1 submitted 18 December, 2023;
originally announced December 2023.
-
Towards Understanding the Dynamics of Gaussian-Stein Variational Gradient Descent
Authors:
Tianle Liu,
Promit Ghosal,
Krishnakumar Balasubramanian,
Natesh S. Pillai
Abstract:
Stein Variational Gradient Descent (SVGD) is a nonparametric particle-based deterministic sampling algorithm. Despite its wide usage, understanding the theoretical properties of SVGD has remained a challenging problem. For sampling from a Gaussian target, the SVGD dynamics with a bilinear kernel will remain Gaussian as long as the initializer is Gaussian. Inspired by this fact, we undertake a detailed theoretical study of the Gaussian-SVGD, i.e., SVGD projected to the family of Gaussian distributions via the bilinear kernel, or equivalently Gaussian variational inference (GVI) with SVGD. We present a complete picture by considering both the mean-field PDE and discrete particle systems. When the target is strongly log-concave, the mean-field Gaussian-SVGD dynamics is proven to converge linearly to the Gaussian distribution closest to the target in KL divergence. In the finite-particle setting, there is both uniform-in-time convergence to the mean-field limit and linear convergence in time to the equilibrium if the target is Gaussian. In the general case, we propose a density-based and a particle-based implementation of the Gaussian-SVGD, and show that several recent algorithms for GVI, proposed from different perspectives, emerge as special cases of our unified framework. Interestingly, one of the new particle-based instances from this framework empirically outperforms existing approaches. Our results make concrete contributions towards obtaining a deeper understanding of both SVGD and GVI.
Submitted 27 October, 2023; v1 submitted 23 May, 2023;
originally announced May 2023.
-
Detecting Fake Job Postings Using Bidirectional LSTM
Authors:
Aravind Sasidharan Pillai
Abstract:
Fake job postings have become prevalent in the online job market, posing significant challenges to job seekers and employers. Despite the growing need to address this problem, there is limited research that leverages deep learning techniques for the detection of fraudulent job advertisements. This study aims to fill the gap by employing a Bidirectional Long Short-Term Memory (Bi-LSTM) model to identify fake job advertisements. Our approach considers both numeric and text features, effectively capturing the underlying patterns and relationships within the data. The proposed model demonstrates superior performance, achieving a 0.91 ROC AUC score and a 98.71% accuracy rate, indicating its potential for practical applications in the online job market. The findings of this research contribute to the development of robust, automated tools that can help combat the proliferation of fake job postings and improve the overall integrity of the job search process. Moreover, we discuss challenges, future research directions, and ethical considerations related to our approach, aiming to inspire further exploration and development of practical solutions to combat online job fraud.
Submitted 3 April, 2023;
originally announced April 2023.
-
Multi-Label Chest X-Ray Classification via Deep Learning
Authors:
Aravind Sasidharan Pillai
Abstract:
In this era of pandemic, the future of the healthcare industry has never been more exciting. Artificial intelligence and machine learning (AI & ML) present opportunities to develop solutions that cater to very specific needs within the industry. Deep learning in healthcare has become incredibly powerful for supporting clinics and for transforming patient care in general. Deep learning is increasingly being applied to detect clinically important features in images beyond what can be perceived by the naked human eye. Chest X-ray imaging is one of the most common clinical methods for diagnosing a number of diseases, such as pneumonia and lung cancer, and many other abnormalities, like lesions and fractures. Properly diagnosing a disease from X-ray images is often a challenging task even for expert radiologists, and there is a growing need for computerized support systems due to the large amount of information encoded in X-ray images. The goal of this paper is to develop a lightweight solution to detect 14 different chest conditions from an X-ray image. Given an X-ray image as input, our classifier outputs a label vector indicating which of the 14 disease classes the image falls into. Along with the image features, we also use non-image features available in the data, such as X-ray view type, age, and gender. The original study conducted by the Stanford ML Group is our baseline; it focuses on predicting 5 diseases. Our aim is to improve upon previous work, expand prediction to 14 diseases, and provide insight for future chest radiography research.
Submitted 27 November, 2022;
originally announced November 2022.
-
PaLM: Scaling Language Modeling with Pathways
Authors:
Aakanksha Chowdhery,
Sharan Narang,
Jacob Devlin,
Maarten Bosma,
Gaurav Mishra,
Adam Roberts,
Paul Barham,
Hyung Won Chung,
Charles Sutton,
Sebastian Gehrmann,
Parker Schuh,
Kensen Shi,
Sasha Tsvyashchenko,
Joshua Maynez,
Abhishek Rao,
Parker Barnes,
Yi Tay,
Noam Shazeer,
Vinodkumar Prabhakaran,
Emily Reif,
Nan Du,
Ben Hutchinson,
Reiner Pope,
James Bradbury,
Jacob Austin,
et al. (42 additional authors not shown)
Abstract:
Large language models have been shown to achieve remarkable performance across a variety of natural language tasks using few-shot learning, which drastically reduces the number of task-specific training examples needed to adapt the model to a particular application. To further our understanding of the impact of scale on few-shot learning, we trained a 540-billion parameter, densely activated, Transformer language model, which we call Pathways Language Model (PaLM). We trained PaLM on 6144 TPU v4 chips using Pathways, a new ML system which enables highly efficient training across multiple TPU Pods. We demonstrate continued benefits of scaling by achieving state-of-the-art few-shot learning results on hundreds of language understanding and generation benchmarks. On a number of these tasks, PaLM 540B achieves breakthrough performance, outperforming the finetuned state-of-the-art on a suite of multi-step reasoning tasks, and outperforming average human performance on the recently released BIG-bench benchmark. A significant number of BIG-bench tasks showed discontinuous improvements from model scale, meaning that performance steeply increased as we scaled to our largest model. PaLM also has strong capabilities in multilingual tasks and source code generation, which we demonstrate on a wide array of benchmarks. We additionally provide a comprehensive analysis on bias and toxicity, and study the extent of training data memorization with respect to model scale. Finally, we discuss the ethical considerations related to large language models and discuss potential mitigation strategies.
Submitted 5 October, 2022; v1 submitted 5 April, 2022;
originally announced April 2022.
-
Finding Label and Model Errors in Perception Data With Learned Observation Assertions
Authors:
Daniel Kang,
Nikos Arechiga,
Sudeep Pillai,
Peter Bailis,
Matei Zaharia
Abstract:
ML is being deployed in complex, real-world scenarios where errors have impactful consequences. In these systems, thorough testing of the ML pipelines is critical. A key component in ML deployment pipelines is the curation of labeled training data. Common practice in the ML literature assumes that labels are the ground truth. However, in our experience in a large autonomous vehicle development center, we have found that vendors can often provide erroneous labels, which can lead to downstream safety risks in trained models.
To address these issues, we propose a new abstraction, learned observation assertions, and implement it in a system called Fixy. Fixy leverages existing organizational resources, such as existing (possibly noisy) labeled datasets or previously trained ML models, to learn a probabilistic model for finding errors in human- or model-generated labels. Given user-provided features and these existing resources, Fixy learns feature distributions that specify likely and unlikely values (e.g., that a speed of 30 mph is likely but 300 mph is unlikely). It then uses these feature distributions to score labels for potential errors. We show that Fixy can automatically rank potential errors in real datasets with up to 2× higher precision compared to recent work on model assertions and standard techniques such as uncertainty sampling.
Submitted 15 January, 2022;
originally announced January 2022.
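The abstract describes learning feature distributions from existing labels and using them to score new labels, so that a 300 mph speed ranks as a likely error. Fixy's actual probabilistic model is not specified here. As a hedged illustration, the sketch below fits a single Gaussian to one feature from existing labels and ranks new labels by negative log-likelihood, most implausible first. The Gaussian choice and all function names are assumptions.

```python
import math
import statistics
from typing import List, Sequence, Tuple


def fit_gaussian(values: Sequence[float]) -> Tuple[float, float]:
    """Estimate mean and standard deviation from existing (possibly noisy) labels."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values, mu=mean)
    return mean, max(std, 1e-9)  # guard against zero variance


def neg_log_likelihood(x: float, mean: float, std: float) -> float:
    """Gaussian negative log-density; larger means less plausible."""
    z = (x - mean) / std
    return 0.5 * z * z + math.log(std) + 0.5 * math.log(2 * math.pi)


def rank_suspicious(labels: Sequence[Tuple[str, float]],
                    mean: float, std: float) -> List[Tuple[str, float]]:
    """Sort (label_id, feature_value) pairs from least to most plausible."""
    scored = [(label_id, neg_log_likelihood(v, mean, std)) for label_id, v in labels]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

Fitting on speeds such as 25 to 35 mph and then ranking hypothetical boxes at 29, 300, and 40 mph puts the 300 mph box first and the 29 mph box last. A real deployment would likely need richer, per-class or multivariate distributions than this single Gaussian.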
-
On the Age of Information of a Queuing System with Heterogeneous Servers
Authors:
Anhad Bhati,
Sibi Raj B. Pillai,
Rahul Vaze
Abstract:
An optimal control problem with heterogeneous servers to minimize the average age of information (AoI) is considered. Each server maintains a separate queue, and each packet arriving at the system is randomly routed to one of the servers. Assuming Poisson arrivals and exponentially distributed service times, we first derive an exact expression of the average AoI for two heterogeneous servers. Next, to solve for the optimal average AoI, we derive a close approximation, called the approximate AoI, which is shown to be useful for multi-server systems as well. We show that for the optimal approximate AoI, the server utilization (ratio of arrival rate to service rate) of each server should be the same as the optimal server utilization of a single-server queue. For two identical servers, it is shown that the average AoI is approximately 5/8 times the average AoI of a single server. Furthermore, the average AoI is shown to decrease considerably as more servers are added to the system.
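The single-server benchmark can be made concrete with the classical average-age formula for an M/M/1 first-come-first-served queue due to Kaul, Yates and Gruteser, $\Delta = \frac{1}{\mu}\left(1 + \frac{1}{\rho} + \frac{\rho^2}{1-\rho}\right)$ with $\rho = \lambda/\mu$. That formula is brought in from prior work rather than taken from this abstract, and the helper names below are hypothetical. Because the age is convex in $\rho$ on $(0,1)$, a golden-section search finds the optimal single-server utilization, which comes out near 0.53.

```python
def mm1_average_age(rho: float, mu: float = 1.0) -> float:
    """Average AoI of an M/M/1 FCFS queue with utilization rho = lambda / mu, 0 < rho < 1."""
    return (1.0 / mu) * (1.0 + 1.0 / rho + rho * rho / (1.0 - rho))

def optimal_utilization(lo: float = 1e-6, hi: float = 1 - 1e-6, iters: int = 200) -> float:
    """Golden-section search for the utilization minimizing the (convex) average age."""
    phi = (5 ** 0.5 - 1) / 2          # inverse golden ratio
    a, b = lo, hi
    c = b - phi * (b - a)
    d = a + phi * (b - a)
    for _ in range(iters):
        # Keep the sub-interval that must contain the minimizer.
        if mm1_average_age(c) < mm1_average_age(d):
            b = d
        else:
            a = c
        c = b - phi * (b - a)
        d = a + phi * (b - a)
    return (a + b) / 2

rho_star = optimal_utilization()
```

Per the abstract, under the approximate-AoI criterion each server in the multi-server system should be operated at this same utilization.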
Submitted 14 September, 2021; v1 submitted 13 September, 2021;
originally announced September 2021.
-
Multiple Access Channel Simulation
Authors:
Gowtham R. Kurri,
Viswanathan Ramachandran,
Sibi Raj B. Pillai,
Vinod M. Prabhakaran
Abstract:
We study the problem of simulating a two-user multiple-access channel (MAC) over a multiple access network of noiseless links. Two encoders each observe independent and identically distributed (i.i.d.) copies of a source random variable, while a decoder observes i.i.d. copies of a side-information random variable. There are rate-limited noiseless communication links between each encoder and the decoder, and there is independent pairwise shared randomness between all three possible pairs of nodes. The decoder has to output approximately i.i.d. copies of another random variable jointly distributed with the two sources and the side information. We are interested in the rate tuples that permit this simulation. This setting can be thought of as a multi-terminal generalization of the point-to-point channel simulation problem studied by Bennett et al. (2002) and Cuff (2013). When the pairwise shared randomness between the encoders is absent, the setting reduces to a special case of MAC simulation using another MAC studied by Haddadpour et al. (2013). We establish that the presence of encoder shared randomness can strictly improve the communication rate requirements. We first show that the inner bound derived from Haddadpour et al. (2013) is tight when the sources at the encoders are conditionally independent given the side-information at the decoder. This result recovers the existing results on point-to-point channel simulation and function computation over such multi-terminal networks. We then explicitly compute the communication rate regions for an example both with and without the encoder shared randomness and demonstrate that its presence strictly reduces the communication rates. Inner and outer bounds for the general case are also obtained.
Submitted 16 June, 2022; v1 submitted 23 February, 2021;
originally announced February 2021.
-
On the Capacity Enlargement of Gaussian Broadcast Channels with Passive Noisy Feedback
Authors:
Aditya Narayan Ravi,
Sibi Raj B. Pillai,
Vinod Prabhakaran,
Michèle Wigger
Abstract:
It is well known that the capacity region of an average transmit power constrained Gaussian Broadcast Channel (GBC) with independent noise realizations at the receivers is enlarged by the presence of causal noiseless feedback. Capacity region enlargement is also known to be possible by using only passive noisy feedback, when the GBC has identical noise variances at the receivers. This last fact remains true even when the feedback noise variance is very high, and feedback is available from only one of the receivers. While such capacity enlargements are feasible for several other feedback models in the Gaussian BC setting, it is also known that feedback does not change the capacity region for physically degraded broadcast channels. In this paper, we consider a two-user GBC with independent noise realizations at the receivers, where the feedback links from the receivers are corrupted by independent additive Gaussian noise processes. We investigate the set of noise-variance quadruples (two forward and two feedback variances) for which no capacity enlargement is possible. A sharp characterization of this region is derived, i.e., any quadruple outside the presented region will lead to a capacity enlargement, whereas quadruples inside will leave the capacity region unchanged. Our results lead to the conclusion that when the forward noise variances are different, very noisy feedback from only one of the receivers is not always beneficial for enlarging the capacity region, whether it comes from the stronger user or the weaker one, in sharp contrast to the case of equal forward noise variances.
Submitted 18 September, 2020;
originally announced September 2020.
-
Neural Ray Surfaces for Self-Supervised Learning of Depth and Ego-motion
Authors:
Igor Vasiljevic,
Vitor Guizilini,
Rares Ambrus,
Sudeep Pillai,
Wolfram Burgard,
Greg Shakhnarovich,
Adrien Gaidon
Abstract:
Self-supervised learning has emerged as a powerful tool for depth and ego-motion estimation, leading to state-of-the-art results on benchmark datasets. However, one significant limitation shared by current methods is the assumption of a known parametric camera model -- usually the standard pinhole geometry -- leading to failure when applied to imaging systems that deviate significantly from this assumption (e.g., catadioptric cameras or underwater imaging). In this work, we show that self-supervision can be used to learn accurate depth and ego-motion estimation without prior knowledge of the camera model. Inspired by the geometric model of Grossberg and Nayar, we introduce Neural Ray Surfaces (NRS), convolutional networks that represent pixel-wise projection rays, approximating a wide range of cameras. NRS are fully differentiable and can be learned end-to-end from unlabeled raw videos. We demonstrate the use of NRS for self-supervised learning of visual odometry and depth estimation from raw videos obtained using a wide variety of camera systems, including pinhole, fisheye, and catadioptric.
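To illustrate why a per-pixel ray representation generalizes the pinhole model, here is a NumPy sketch that lifts a depth map to 3D given an arbitrary field of unit rays, one per pixel. The pinhole ray construction and the function names are illustrative assumptions; in NRS the ray field would instead be predicted by a network, and `depth` here is taken to be distance along each ray rather than z-depth.

```python
import numpy as np

def pinhole_rays(height, width, fx, fy, cx, cy):
    """Per-pixel unit ray directions for a standard pinhole camera."""
    u, v = np.meshgrid(np.arange(width), np.arange(height))
    rays = np.stack(
        [(u - cx) / fx, (v - cy) / fy, np.ones_like(u, dtype=float)], axis=-1
    )
    return rays / np.linalg.norm(rays, axis=-1, keepdims=True)

def lift_depth(depth, rays):
    """Lift a depth map to 3D points along arbitrary per-pixel rays.

    depth: (H, W) distance along each ray; rays: (H, W, 3) unit directions.
    Any camera (pinhole, fisheye, catadioptric) fits this form as long as
    a ray direction is available for every pixel.
    """
    return depth[..., None] * rays

rays = pinhole_rays(4, 5, 2.0, 2.0, 2.0, 1.5)
points = lift_depth(np.full((4, 5), 3.0), rays)
```

Swapping `pinhole_rays` for any other ray field leaves `lift_depth` unchanged, which is the property a learned ray surface exploits.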
Submitted 14 August, 2020;
originally announced August 2020.
-
PillarFlow: End-to-end Birds-eye-view Flow Estimation for Autonomous Driving
Authors:
Kuan-Hui Lee,
Matthew Kliemann,
Adrien Gaidon,
Jie Li,
Chao Fang,
Sudeep Pillai,
Wolfram Burgard
Abstract:
In autonomous driving, accurately estimating the state of surrounding obstacles is critical for safe and robust path planning. However, this perception task is difficult, particularly for generic obstacles/objects, due to appearance and occlusion changes. To tackle this problem, we propose an end-to-end deep learning framework for LIDAR-based flow estimation in bird's eye view (BeV). Our method takes consecutive point cloud pairs as input and produces a 2-D BeV flow grid describing the dynamic state of each cell. The experimental results show that the proposed method not only estimates 2-D BeV flow accurately but also improves tracking performance of both dynamic and static objects.
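The BeV flow grid is indexed by ground-plane cells. The sketch below assigns LiDAR points to such cells and counts occupancy with NumPy; the 80 m by 80 m extent, the 0.5 m cell size and the function names are illustrative assumptions. In PillarFlow the per-cell 2-D flow is predicted by the network from consecutive point clouds, which this sketch does not attempt.

```python
import numpy as np

def bev_cell_indices(points, x_range=(-40.0, 40.0), y_range=(-40.0, 40.0), cell_size=0.5):
    """Map LiDAR points (N, 3) to integer BeV cells, dropping points outside the grid."""
    nx = int(round((x_range[1] - x_range[0]) / cell_size))
    ny = int(round((y_range[1] - y_range[0]) / cell_size))
    ix = np.floor((points[:, 0] - x_range[0]) / cell_size).astype(int)
    iy = np.floor((points[:, 1] - y_range[0]) / cell_size).astype(int)
    keep = (ix >= 0) & (ix < nx) & (iy >= 0) & (iy < ny)
    return ix[keep], iy[keep], (nx, ny)

def bev_occupancy(points, **grid):
    """Count how many points fall into each BeV cell."""
    ix, iy, (nx, ny) = bev_cell_indices(points, **grid)
    occ = np.zeros((nx, ny), dtype=np.int32)
    np.add.at(occ, (ix, iy), 1)   # unbuffered add handles repeated cell indices
    return occ

points = np.array([[0.1, 0.1, 0.0], [0.2, 0.3, 1.0], [100.0, 0.0, 0.0]])
occ = bev_occupancy(points)
```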
Submitted 29 August, 2020; v1 submitted 3 August, 2020;
originally announced August 2020.
-
Fast and memory-optimal dimension reduction using Kac's walk
Authors:
Vishesh Jain,
Natesh S. Pillai,
Ashwin Sah,
Mehtaab Sawhney,
Aaron Smith
Abstract:
In this work, we analyze dimension reduction algorithms based on the Kac walk and discrete variants.
(1) For $n$ points in $\mathbb{R}^{d}$, we design an optimal Johnson-Lindenstrauss (JL) transform based on the Kac walk which can be applied to any vector in time $O(d\log{d})$ for essentially the same restriction on $n$ as in the best-known transforms due to Ailon and Liberty [SODA, 2008], and Bamberger and Krahmer [arXiv, 2017]. Our algorithm is memory-optimal, and outperforms existing algorithms in regimes when $n$ is sufficiently large and the distortion parameter is sufficiently small. In particular, this confirms a conjecture of Ailon and Chazelle [STOC, 2006] in a stronger form.
(2) The same construction gives a simple transform with optimal Restricted Isometry Property (RIP) which can be applied in time $O(d\log{d})$ for essentially the same range of sparsity as in the best-known such transform due to Ailon and Rauhut [Discrete Comput. Geom., 2014].
(3) We show that by fixing the angle in the Kac walk to be $\pi/4$ throughout, one obtains optimal JL and RIP transforms with almost the same running time, thereby confirming, up to a $\log\log{d}$ factor, a conjecture of Avron, Maymounkov, and Toledo [SIAM J. Sci. Comput., 2010]. Our moment-based analysis of this modification of the Kac walk may also be of independent interest.
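The mechanics of a Kac-walk transform can be sketched in plain Python: each step picks a random pair of coordinates and rotates them by an angle (random, or fixed at $\pi/4$ as in item (3)), and the result is restricted to $k$ coordinates scaled by $\sqrt{d/k}$. The restriction to the first $k$ coordinates, the function name and the step count are assumptions for illustration; the number of steps needed for the JL and RIP guarantees is what the paper analyzes and is not reproduced here.

```python
import math
import random

def kac_walk_transform(x, num_steps, k, seed=0, fixed_angle=None):
    """Apply a Kac walk (random Givens rotations) to x, then keep k scaled coordinates.

    Reusing the same seed gives the same linear map for every input vector,
    as a JL transform requires. If fixed_angle is set (e.g. math.pi / 4),
    every rotation uses that angle instead of a uniformly random one.
    """
    rng = random.Random(seed)
    y = list(x)
    d = len(y)
    for _ in range(num_steps):
        i, j = rng.sample(range(d), 2)   # random pair of distinct coordinates
        theta = fixed_angle if fixed_angle is not None else rng.uniform(0.0, 2 * math.pi)
        c, s = math.cos(theta), math.sin(theta)
        # Rotate the (i, j) plane; each step preserves the Euclidean norm exactly.
        y[i], y[j] = c * y[i] - s * y[j], s * y[i] + c * y[j]
    scale = math.sqrt(d / k)
    return [scale * v for v in y[:k]]
```

Each step touches only two coordinates, which is why the transform needs no stored matrix beyond the random pairs and angles.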
Submitted 14 July, 2020; v1 submitted 22 March, 2020;
originally announced March 2020.
-
Neural Outlier Rejection for Self-Supervised Keypoint Learning
Authors:
Jiexiong Tang,
Hanme Kim,
Vitor Guizilini,
Sudeep Pillai,
Rares Ambrus
Abstract:
Identifying salient points in images is a crucial component for visual odometry, Structure-from-Motion or SLAM algorithms. Recently, several learned keypoint methods have demonstrated compelling performance on challenging benchmarks. However, generating consistent and accurate training data for interest-point detection in natural images remains challenging, especially for human annotators. We introduce IO-Net (i.e. InlierOutlierNet), a novel proxy task for the self-supervision of keypoint detection, description and matching. By making the sampling of inlier-outlier sets from point-pair correspondences fully differentiable within the keypoint learning framework, we show that we are able to simultaneously self-supervise keypoint description and improve keypoint matching. Second, we introduce KeyPointNet, a keypoint-network architecture that is especially amenable to robust keypoint detection and description. We design the network to allow local keypoint aggregation to avoid artifacts due to spatial discretizations commonly used for this task, and we improve fine-grained keypoint descriptor performance by taking advantage of efficient sub-pixel convolutions to upsample the descriptor feature-maps to a higher operating resolution. Through extensive experiments and ablative analysis, we show that the proposed self-supervised keypoint learning method greatly improves the quality of feature matching and homography estimation on challenging benchmarks over the state-of-the-art.
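A sub-pixel convolution upsamples by predicting $r^2$ channels per output channel at low resolution and rearranging them into an $r$-times larger grid (the "pixel shuffle" operation). A NumPy sketch of the rearrangement step follows; the preceding convolution is omitted and the function name is hypothetical, so this is a generic illustration rather than KeyPointNet's implementation.

```python
import numpy as np

def pixel_shuffle(x, r):
    """Rearrange a (C*r*r, H, W) array into (C, H*r, W*r).

    output[c, h*r + i, w*r + j] = input[c*r*r + i*r + j, h, w]
    """
    crr, h, w = x.shape
    if crr % (r * r) != 0:
        raise ValueError("channel count must be divisible by r*r")
    c = crr // (r * r)
    # Split channels into (c, i, j), interleave i with rows and j with columns.
    return x.reshape(c, r, r, h, w).transpose(0, 3, 1, 4, 2).reshape(c, h * r, w * r)

features = np.arange(8 * 2 * 3).reshape(8, 2, 3)   # 8 channels at 2x3
upsampled = pixel_shuffle(features, 2)               # 2 channels at 4x6
```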
Submitted 22 December, 2019;
originally announced December 2019.
-
Self-Supervised 3D Keypoint Learning for Ego-motion Estimation
Authors:
Jiexiong Tang,
Rares Ambrus,
Vitor Guizilini,
Sudeep Pillai,
Hanme Kim,
Patric Jensfelt,
Adrien Gaidon
Abstract:
Detecting and matching robust viewpoint-invariant keypoints is critical for visual SLAM and Structure-from-Motion. State-of-the-art learning-based methods generate training samples via homography adaptation to create 2D synthetic views with known keypoint matches from a single image. This approach, however, does not generalize to non-planar 3D scenes with illumination variations commonly seen in real-world videos. In this work, we propose self-supervised learning of depth-aware keypoints directly from unlabeled videos. We jointly learn keypoint and depth estimation networks by combining appearance and geometric matching via a differentiable structure-from-motion module based on Procrustean residual pose correction. We describe how our self-supervised keypoints can be integrated into state-of-the-art visual odometry frameworks for robust and accurate ego-motion estimation of autonomous vehicles in real-world conditions.
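The Procrustes problem underlying such a module has a closed-form solution: given matched 3D keypoints, the least-squares rotation and translation follow from an SVD of the cross-covariance (the Kabsch algorithm). The NumPy sketch below is that classical solver with a hypothetical function name; it is not the paper's differentiable module or its residual pose correction.

```python
import numpy as np

def procrustes_pose(src, dst):
    """Least-squares rigid transform (R, t) such that dst ≈ src @ R.T + t.

    src, dst: (N, 3) arrays of matched 3D points.
    """
    src_mean = src.mean(axis=0)
    dst_mean = dst.mean(axis=0)
    cov = (dst - dst_mean).T @ (src - src_mean)   # 3x3 cross-covariance
    u, _, vt = np.linalg.svd(cov)
    # Flip the last singular direction if needed so R is a proper rotation (det = +1).
    sign = 1.0 if np.linalg.det(u @ vt) >= 0 else -1.0
    rot = u @ np.diag([1.0, 1.0, sign]) @ vt
    trans = dst_mean - rot @ src_mean
    return rot, trans
```

Every operation here (means, SVD, matrix products) is differentiable almost everywhere, which is what makes embedding this step in a learned pipeline feasible.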
Submitted 17 November, 2020; v1 submitted 6 December, 2019;
originally announced December 2019.
-
Robust Semi-Supervised Monocular Depth Estimation with Reprojected Distances
Authors:
Vitor Guizilini,
Jie Li,
Rares Ambrus,
Sudeep Pillai,
Adrien Gaidon
Abstract:
Dense depth estimation from a single image is a key problem in computer vision, with exciting applications in a multitude of robotic tasks. Initially viewed as a direct regression problem, requiring annotated labels as supervision at training time, in the past few years a substantial amount of work has been done in self-supervised depth training based on strong geometric cues, both from stereo cameras and more recently from monocular video sequences. In this paper we investigate how these two approaches (supervised & self-supervised) can be effectively combined, so that a depth model can learn to encode true scale from sparse supervision while achieving high fidelity local accuracy by leveraging geometric cues. To this end, we propose a novel supervised loss term that complements the widely used photometric loss, and show how it can be used to train robust semi-supervised monocular depth estimation models. Furthermore, we evaluate how much supervision is actually necessary to train accurate scale-aware monocular depth models, showing that with our proposed framework, very sparse LiDAR information, with as few as 4 beams (fewer than 100 valid depth values per image), is enough to achieve results competitive with the current state-of-the-art.
Submitted 19 November, 2019; v1 submitted 3 October, 2019;
originally announced October 2019.
-
Two Stream Networks for Self-Supervised Ego-Motion Estimation
Authors:
Rares Ambrus,
Vitor Guizilini,
Jie Li,
Sudeep Pillai,
Adrien Gaidon
Abstract:
Learning depth and camera ego-motion from raw unlabeled RGB video streams is seeing exciting progress through self-supervision from strong geometric cues. To leverage not only appearance but also scene geometry, we propose a novel self-supervised two-stream network using RGB and inferred depth information for accurate visual odometry. In addition, we introduce a sparsity-inducing data augmentation policy for ego-motion learning that effectively regularizes the pose network to enable stronger generalization performance. As a result, we show that our proposed two-stream pose network achieves state-of-the-art results among learning-based methods on the KITTI odometry benchmark, and is especially suited for self-supervision at scale. Our experiments on a large-scale urban driving dataset of 1 million frames indicate that the performance of our proposed architecture does indeed scale progressively with more data.
Submitted 19 November, 2019; v1 submitted 3 October, 2019;
originally announced October 2019.
-
Online Energy Harvesting Problem Over An Arbitrary Directed Acyclic Graph Network
Authors:
Rahul Vaze,
Sibi Raj B Pillai
Abstract:
A communication network modelled by a directed acyclic graph (DAG) is considered, over which a source wishes to send a specified number of bits to a destination node. Each node of the DAG is powered by a separate renewable energy source, and the harvested energy is used to facilitate the source-destination data flow. The challenge here is to find the optimal rate and power allocations across time for each node on its outgoing edges so as to minimize the time by which the destination receives a specified number of bits. An online setting is considered where an algorithm only has causal information about the energy arrivals. Using the competitive ratio as the performance metric, i.e., the ratio of the cost of the online algorithm to that of the optimal offline algorithm, maximized over all inputs, a lazy online algorithm with a competitive ratio of $2+\delta$ for any $\delta>0$ is proposed. Incidentally, $2$ is also a lower bound on the competitive ratio of any online algorithm for this problem. Our lazy online algorithm is described and analyzed by defining a novel max-flow problem over a DAG, where the rates on the subset of outgoing edges of any node are related/constrained. An optimal algorithm to find the max-flow with these constraints is also provided, which may be of independent interest.
Submitted 26 August, 2019;
originally announced August 2019.
-
MIST: A Novel Training Strategy for Low-latency Scalable Neural Net Decoders
Authors:
Kumar Yashashwi,
Deepak Anand,
Sibi Raj B Pillai,
Prasanna Chaporkar,
K Ganesh
Abstract:
In this paper, we propose a low-latency, robust and scalable neural net based decoder for convolutional and low-density parity-check (LDPC) coding schemes. The proposed decoders are demonstrated to have bit error rate (BER) and block error rate (BLER) performance on par with the state-of-the-art neural net based decoders while achieving more than 8 times higher decoding speed. The enhanced decoding speed is due to the use of a convolutional neural network (CNN) as opposed to the recurrent neural network (RNN) used in the best known neural net based decoders. This contradicts the existing doctrine that only RNN based decoders can provide a performance close to the optimal ones. The key ingredient of our approach is a novel Mixed-SNR Independent Samples based Training (MIST), which allows for training of the CNN with only 1\% of possible datawords, even for block lengths as high as 1000. The proposed decoder is robust: once trained, the same decoder can be used for a wide range of SNR values. Finally, in the presence of channel outages, the proposed decoders outperform the best known decoders, viz. the unquantized Viterbi decoder for convolutional codes and belief propagation for LDPC codes. This gives the CNN decoder a significant advantage in 5G millimeter wave systems, where channel outages are prevalent.
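The core idea of mixed-SNR training data can be sketched as follows: each sample draws a random dataword, encodes it, maps it to BPSK, and adds white Gaussian noise at an SNR drawn independently for that sample. The rate-1/2 (7, 5) convolutional code, the 0 to 6 dB $E_b/N_0$ range and the function names are assumptions for illustration; the abstract does not specify MIST's exact sampling scheme.

```python
import math
import random

def conv_encode_75(bits):
    """Rate-1/2 feedforward convolutional code with octal generators (7, 5)."""
    s1 = s2 = 0                       # shift register: previous two input bits
    out = []
    for u in bits:
        out.append(u ^ s1 ^ s2)       # generator 7 = 1 + D + D^2
        out.append(u ^ s2)            # generator 5 = 1 + D^2
        s1, s2 = u, s1
    return out

def mixed_snr_sample(block_len, snr_db_range=(0.0, 6.0), rng=random):
    """One training pair (noisy channel symbols, dataword) at a randomly drawn Eb/N0."""
    bits = [rng.randint(0, 1) for _ in range(block_len)]
    coded = conv_encode_75(bits)
    snr_db = rng.uniform(*snr_db_range)           # independent SNR for this sample
    rate = len(bits) / len(coded)
    ebn0 = 10 ** (snr_db / 10)
    sigma = math.sqrt(1.0 / (2 * rate * ebn0))    # noise std for unit-energy BPSK
    symbols = [1.0 - 2.0 * c + rng.gauss(0.0, sigma) for c in coded]
    return symbols, bits, snr_db
```

Drawing the SNR per sample, rather than training one decoder per SNR, is what lets a single trained decoder cover a range of channel conditions.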
Submitted 22 May, 2019;
originally announced May 2019.
-
Self-Supervised Visual Place Recognition Learning in Mobile Robots
Authors:
Sudeep Pillai,
John Leonard
Abstract:
Place recognition is a critical component in robot navigation that enables it to re-establish previously visited locations, and simultaneously use this information to correct the drift incurred in its dead-reckoned estimate. In this work, we develop a self-supervised approach to place recognition in robots. The task of visual loop-closure identification is cast as a metric learning problem, where the labels for positive and negative examples of loop-closures can be bootstrapped using a GPS-aided navigation solution that the robot already uses. By leveraging the synchronization between sensors, we show that we are able to learn an appropriate distance metric for arbitrary real-valued image descriptors (including state-of-the-art CNN models), that is specifically geared for visual place recognition in mobile robots. Furthermore, we show that the newly learned embedding can be particularly powerful in disambiguating visual scenes for the task of vision-based loop-closure identification in mobile robots.
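A minimal sketch of the bootstrapping idea: image pairs are labeled as the same or a different place from their GPS positions, and a metric is trained with a contrastive loss on descriptor distances. The radii, the margin, the local metric frame and the use of the standard contrastive loss of Hadsell et al. are assumptions for illustration, not the paper's exact formulation.

```python
import math

def gps_pair_label(pos_a, pos_b, positive_radius=5.0, negative_radius=25.0):
    """Label an image pair from GPS positions in a local metric frame (metres).

    Returns 1 (same place), 0 (different place), or None (ambiguous, skip the pair).
    """
    dist = math.dist(pos_a, pos_b)
    if dist <= positive_radius:
        return 1
    if dist >= negative_radius:
        return 0
    return None

def contrastive_loss(desc_a, desc_b, label, margin=1.0):
    """Pull matching descriptors together; push non-matching ones beyond the margin."""
    d = math.dist(desc_a, desc_b)
    if label == 1:
        return 0.5 * d * d
    return 0.5 * max(0.0, margin - d) ** 2
```

Leaving a gap between the positive and negative radii avoids training on pairs whose GPS label is unreliable.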
Submitted 11 May, 2019;
originally announced May 2019.
-
3D Packing for Self-Supervised Monocular Depth Estimation
Authors:
Vitor Guizilini,
Rares Ambrus,
Sudeep Pillai,
Allan Raventos,
Adrien Gaidon
Abstract:
Although cameras are ubiquitous, robotic platforms typically rely on active sensors like LiDAR for direct 3D perception. In this work, we propose a novel self-supervised monocular depth estimation method combining geometry with a new deep network, PackNet, learned only from unlabeled monocular videos. Our architecture leverages novel symmetrical packing and unpacking blocks to jointly learn to compress and decompress detail-preserving representations using 3D convolutions. Although self-supervised, our method outperforms other self-supervised, semi-supervised, and fully supervised methods on the KITTI benchmark. The 3D inductive bias in PackNet enables it to scale with input resolution and number of parameters without overfitting, generalizing better on out-of-domain data such as the NuScenes dataset. Furthermore, it does not require large-scale supervised pretraining on ImageNet and can run in real-time. Finally, we release DDAD (Dense Depth for Automated Driving), a new urban driving dataset with more challenging and accurate depth evaluation, thanks to longer-range and denser ground-truth depth generated from high-density LiDARs mounted on a fleet of self-driving cars operating world-wide.
Submitted 28 March, 2020; v1 submitted 6 May, 2019;
originally announced May 2019.
-
Knowledge-driven generative subspaces for modeling multi-view dependencies in medical data
Authors:
Parvathy Sudhir Pillai,
Tze-Yun Leong
Abstract:
Early detection of Alzheimer's disease (AD) and identification of potential risk/beneficial factors are important for planning and administering timely interventions or preventive measures. In this paper, we learn a disease model for AD that combines genotypic and phenotypic profiles, and cognitive health metrics of patients. We propose a probabilistic generative subspace that describes the correlative, complementary and domain-specific semantics of the dependencies in multi-view, multi-modality medical data. Guided by domain knowledge and using the latent consensus between abstractions of multi-view data, we model the fusion as a data generating process. We show that our approach can potentially lead to i) explainable clinical predictions and ii) improved AD diagnoses.
Submitted 2 December, 2018;
originally announced December 2018.
-
Joint State Estimation and Communication over a State-Dependent Gaussian Multiple Access Channel
Authors:
Viswanathan Ramachandran,
Sibi Raj B Pillai,
Vinod M Prabhakaran
Abstract:
A hybrid communication network with a common analog signal and an independent digital data stream as input to each node in a multiple access network is considered. The receiver/base-station has to estimate the analog signal with a given fidelity, and decode the digital streams with a low error probability. Treating the analog signal as a common state process, we set up a joint state estimation and communication problem in a Gaussian multiple access channel (MAC) with additive state. The transmitters have non-causal knowledge of the state process, and need to communicate independent data streams in addition to facilitating state estimation at the receiver. We first provide a complete characterization of the optimal trade-off between mean squared error distortion performance in estimating the state and the data rates for the message streams from two transmitting nodes. This is then generalized to an N-sender MAC. To this end, we show a natural connection between the state-dependent MAC model and a hybrid multi-sensor network in which a common source phenomenon is observed at N transmitting nodes. Each node encodes the source observations as well as an independent message stream over a Gaussian MAC without any state process. The receiver is interested in estimating the source and all the messages. Again, the distortion-rate performance is characterized.
Submitted 25 November, 2018;
originally announced November 2018.
-
SuperDepth: Self-Supervised, Super-Resolved Monocular Depth Estimation
Authors:
Sudeep Pillai,
Rares Ambrus,
Adrien Gaidon
Abstract:
Recent techniques in self-supervised monocular depth estimation are approaching the performance of supervised methods, but operate in low resolution only. We show that high resolution is key towards high-fidelity self-supervised monocular depth prediction. Inspired by recent deep learning methods for Single-Image Super-Resolution, we propose a sub-pixel convolutional layer extension for depth super-resolution that accurately synthesizes high-resolution disparities from their corresponding low-resolution convolutional features. In addition, we introduce a differentiable flip-augmentation layer that accurately fuses predictions from the image and its horizontally flipped version, reducing the effect of left and right shadow regions generated in the disparity map due to occlusions. Both contributions provide significant performance gains over the state-of-the-art in self-supervised depth and pose estimation on the public KITTI benchmark. A video of our approach can be found at https://youtu.be/jKNgBeBMx0I.
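To make the flip-fusion idea concrete, the NumPy sketch below follows the fixed-weight post-processing popularized by earlier self-supervised depth work (Monodepth): the prediction on the flipped image is flipped back, the two maps are averaged in the interior, and near the image borders, where occlusion shadows appear, the prediction that does not suffer from them is used. The border width, ramp slope and function name are assumptions; SuperDepth's layer is described as differentiable, and this sketch uses only differentiable array operations but is not claimed to match it exactly.

```python
import numpy as np

def fuse_flip_predictions(disp, disp_flipped, border=0.05, ramp=20.0):
    """Fuse disparity predicted on an image and on its horizontally flipped copy.

    disp: (H, W) prediction on the original image.
    disp_flipped: (H, W) prediction on the flipped image, still in flipped coordinates.
    """
    h, w = disp.shape
    back = disp_flipped[:, ::-1]                                 # undo the flip
    mean = 0.5 * (disp + back)
    x = np.tile(np.linspace(0.0, 1.0, w), (h, 1))               # normalized column coordinate
    left_mask = 1.0 - np.clip(ramp * (x - border), 0.0, 1.0)    # 1 near the left edge
    right_mask = left_mask[:, ::-1]                              # 1 near the right edge
    # Near the left edge use the flipped-back prediction, near the right edge the
    # original one, and the plain average in between.
    return right_mask * disp + left_mask * back + (1.0 - left_mask - right_mask) * mean

fused = fuse_flip_predictions(np.ones((2, 10)), np.full((2, 10), 3.0))
```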
Submitted 3 October, 2018;
originally announced October 2018.
-
Does Hamiltonian Monte Carlo mix faster than a random walk on multimodal densities?
Authors:
Oren Mangoubi,
Natesh S. Pillai,
Aaron Smith
Abstract:
Hamiltonian Monte Carlo (HMC) is a very popular and generic collection of Markov chain Monte Carlo (MCMC) algorithms. One explanation for the popularity of HMC algorithms is their excellent performance as the dimension $d$ of the target becomes large: under conditions that are satisfied for many common statistical models, optimally-tuned HMC algorithms have a running time that scales like $d^{0.25}$. In stark contrast, the running time of the usual Random-Walk Metropolis (RWM) algorithm, optimally tuned, scales like $d$. This superior scaling of the HMC algorithm with dimension is attributed to the fact that it, unlike RWM, incorporates the gradient information in the proposal distribution. In this paper, we investigate a different scaling question: does HMC beat RWM for highly $\textit{multimodal}$ targets? We find that the answer is often $\textit{no}$. We compute the spectral gaps for both algorithms for a specific class of multimodal target densities, and show that they are identical. The key reason is that, within one mode, the gradient is effectively ignorant about other modes, thus negating the advantage the HMC algorithm enjoys in unimodal targets. We also give heuristic arguments suggesting that the above observation may hold quite generally. Our main tool for answering this question is a novel simple formula for the conductance of HMC using Liouville's theorem. This result allows us to compute the spectral gap of HMC algorithms, for both the classical HMC with isotropic momentum and the recent Riemannian HMC, for multimodal targets.
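For readers less familiar with the algorithm, one HMC transition with isotropic Gaussian momentum and a leapfrog integrator can be sketched for a 1D target, here an equal mixture of two unit-variance Gaussians at $\pm 4$. The target, step size and number of leapfrog steps are illustrative assumptions, and the sketch does not compute spectral gaps. It does make the abstract's point visible: inside one mode, `bimodal_grad` points back toward that mode's center and carries essentially no information about the other mode.

```python
import math
import random

def hmc_step(x, log_density, grad_log_density, step_size, num_steps, rng):
    """One HMC transition with isotropic Gaussian momentum and a leapfrog integrator."""
    p = rng.gauss(0.0, 1.0)
    x_new, p_new = x, p
    # Leapfrog: half momentum step, alternating full steps, final half momentum step.
    p_new += 0.5 * step_size * grad_log_density(x_new)
    for i in range(num_steps):
        x_new += step_size * p_new
        if i != num_steps - 1:
            p_new += step_size * grad_log_density(x_new)
    p_new += 0.5 * step_size * grad_log_density(x_new)
    # Metropolis correction on the Hamiltonian H = -log pi(x) + p^2 / 2.
    current_h = -log_density(x) + 0.5 * p * p
    proposed_h = -log_density(x_new) + 0.5 * p_new * p_new
    if rng.random() < math.exp(min(0.0, current_h - proposed_h)):
        return x_new
    return x

def bimodal_log_density(x, mu=4.0):
    """log(0.5 N(-mu, 1) + 0.5 N(mu, 1)), up to an additive constant."""
    a, b = -0.5 * (x + mu) ** 2, -0.5 * (x - mu) ** 2
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))

def bimodal_grad(x, mu=4.0):
    """Gradient of bimodal_log_density: responsibility-weighted pull toward each mode."""
    a, b = -0.5 * (x + mu) ** 2, -0.5 * (x - mu) ** 2
    m = max(a, b)
    wa, wb = math.exp(a - m), math.exp(b - m)
    return (wa * -(x + mu) + wb * -(x - mu)) / (wa + wb)
```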
Submitted 4 September, 2018; v1 submitted 9 August, 2018;
originally announced August 2018.
-
Towards Visual Ego-motion Learning in Robots
Authors:
Sudeep Pillai,
John J. Leonard
Abstract:
Many model-based Visual Odometry (VO) algorithms have been proposed in the past decade, often restricted to the type of camera optics or the underlying motion manifold observed. We envision robots being able to learn and perform these tasks, in a minimally supervised setting, as they gain more experience. To this end, we propose a fully trainable solution to visual ego-motion estimation for varied camera optics. We propose a visual ego-motion learning architecture that maps observed optical flow vectors to an ego-motion density estimate via a Mixture Density Network (MDN). By modeling the architecture as a Conditional Variational Autoencoder (C-VAE), our model is able to provide introspective reasoning and prediction for ego-motion induced scene-flow. Additionally, our proposed model is especially amenable to bootstrapped ego-motion learning in robots where the supervision in ego-motion estimation for a particular camera sensor can be obtained from standard navigation-based sensor fusion strategies (GPS/INS and wheel-odometry fusion). Through experiments, we show the utility of our proposed approach in enabling the concept of self-supervised learning for visual ego-motion estimation in autonomous robots.
Submitted 29 May, 2017;
originally announced May 2017.
-
Feedback-Capacity of Degraded Gaussian Vector BC using Directed Information and Concave Envelopes
Authors:
Viswanathan Ramachandran,
S. R. B. Pillai
Abstract:
It is known that the capacity region of a two-user physically degraded discrete memoryless (DM) broadcast channel (BC) is not enlarged by feedback. An identical result holds for a physically degraded Gaussian BC, established later using a variant of the Entropy Power Inequality (EPI). In this paper, we extend the latter result to a physically degraded Gaussian Vector BC (PD-GVBC). However, the extension is not EPI based, but employs a recent result on the factorization of concave envelopes. While the existing concave envelope factorization results do not hold in the presence of feedback, we show that factorizing the corresponding directed information quantities suffices to attain the feedback capacity region of a PD-GVBC. Our work demonstrates that factorizing concave envelopes of directed information can handle situations involving feedback. We further show that the capacity region of a discrete memoryless reversely physically degraded BC is not enlarged by feedback.
Submitted 18 April, 2017;
originally announced April 2017.
-
Robust Spatial Filtering with Graph Convolutional Neural Networks
Authors:
Felipe Petroski Such,
Shagan Sah,
Miguel Dominguez,
Suhas Pillai,
Chao Zhang,
Andrew Michael,
Nathan Cahill,
Raymond Ptucha
Abstract:
Convolutional Neural Networks (CNNs) have recently led to incredible breakthroughs on a variety of pattern recognition problems. Banks of finite impulse response filters are learned on a hierarchy of layers, each contributing more abstract information than the previous layer. The simplicity and elegance of the convolutional filtering process make CNNs ideal for structured problems such as image, video, or voice, where vertices are homogeneous in the sense of number, location, and strength of neighbors. The vast majority of classification problems, for example in the pharmaceutical, homeland security, and financial domains, are unstructured. When these problems are formulated as unstructured graphs, their heterogeneity, such as the number of vertices, number of connections per vertex, and edge strength, cannot be tackled with standard convolutional techniques. We propose a novel neural learning framework that is capable of handling both homogeneous and heterogeneous data, while retaining the benefits of traditional CNN successes.
Recently, researchers have proposed variations of CNNs that can handle graph data. In an effort to create learnable filter banks of graphs, these methods either induce constraints on the data or require preprocessing. As opposed to spectral methods, our framework, which we term Graph-CNNs, defines filters as polynomials of functions of the graph adjacency matrix. Graph-CNNs can handle both heterogeneous and homogeneous graph data, including graphs having entirely different vertex or edge sets. We perform experiments to validate the applicability of Graph-CNNs to a variety of structured and unstructured classification problems and demonstrate state-of-the-art results on document and molecule classification problems.
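To make "filters as polynomials of functions of the graph adjacency matrix" concrete, here is a minimal sketch, assuming the function of the adjacency matrix is simply $A$ itself and the taps $h_k$ are fixed numbers; in Graph-CNNs the taps would be learned, and the actual choice of matrix function may differ. The filter computes $y = \sum_{k=0}^{K} h_k A^k x$ by repeated matrix-vector products, so $A^k$ is never formed explicitly. The names `polynomial_graph_filter` and `matvec` are hypothetical.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def matvec(adj: Matrix, x: Vector) -> Vector:
    """Dense matrix-vector product adj @ x."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in adj]


def polynomial_graph_filter(adj: Matrix, x: Vector, taps: Vector) -> Vector:
    """Apply y = sum_k taps[k] * A^k x, building A^k x iteratively."""
    y = [0.0] * len(x)
    power_x = list(x)  # A^0 x
    for k, h in enumerate(taps):
        if k > 0:
            power_x = matvec(adj, power_x)  # A^k x = A (A^(k-1) x)
        y = [y_i + h * p_i for y_i, p_i in zip(y, power_x)]
    return y


# The same three taps act on graphs with different vertex counts.
taps = [1.0, 0.5, 0.25]
path3 = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
star4 = [[0, 1, 1, 1], [1, 0, 0, 0], [1, 0, 0, 0], [1, 0, 0, 0]]
print(polynomial_graph_filter(path3, [1.0, 0.0, 0.0], taps))       # [1.25, 0.5, 0.25]
print(polynomial_graph_filter(star4, [1.0, 0.0, 0.0, 0.0], taps))  # [1.75, 0.5, 0.5, 0.5]
```

Because the filter is parameterised only by its taps, not by the number or ordering of vertices, the same filter applies unchanged to a 3-vertex path and a 4-vertex star, which is the property the abstract highlights for graphs with entirely different vertex or edge sets.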
Submitted 14 July, 2017; v1 submitted 2 March, 2017;
originally announced March 2017.
-
Distributed Scheduling in Multiple Access with Bursty Arrivals and Delay Constraints
Authors:
Sakshi Kapoor,
Sreejith Sreekumar,
Sibi Raj B Pillai
Abstract:
A multiple access system with bursty data arrivals to the terminals is considered. The users are frame-synchronized, with variable-sized packets independently arriving in each slot at every transmitter. Each packet needs to be delivered to a common receiver within a certain number of slots specified by a maximum delay constraint. The key assumption is that the terminals know only their own packet arrival process, i.e., the arrivals at the rest of the terminals are unknown to each transmitter, except for their statistics. For this distributed multiple access model, we design novel online communication schemes which transport the arriving data without any outage, while ensuring the delay constraint. In particular, the transmit powers in each slot are chosen in a distributed manner, ensuring at the same time that the joint power vector is sufficient to support the distributed choice of data-rates employed in that slot. The proposed schemes are not only optimal for minimizing the average transmit sum-power but also considerably outperform conventional orthogonal multiple access techniques like TDMA.
Submitted 28 November, 2016; v1 submitted 2 February, 2016;
originally announced February 2016.
-
High-Performance and Tunable Stereo Reconstruction
Authors:
Sudeep Pillai,
Srikumar Ramalingam,
John J. Leonard
Abstract:
Traditional stereo algorithms have focused their efforts on reconstruction quality and have largely avoided prioritizing runtime performance. Robots, on the other hand, require quick maneuverability and effective computation to observe their immediate environment and perform tasks within it. In this work, we propose a high-performance and tunable stereo disparity estimation method, with a peak frame rate of 120 Hz (VGA resolution, on a single CPU thread), that can potentially enable robots to quickly reconstruct their immediate surroundings and maneuver at high speeds. Our key contribution is a disparity estimation algorithm that iteratively approximates the scene depth via a piecewise planar mesh from stereo imagery, with a fast depth validation step for semi-dense reconstruction. The mesh is initially seeded with sparsely matched keypoints, and is recursively tessellated and refined as needed (via a resampling stage), to provide the desired stereo disparity accuracy. The inherent simplicity and speed of our approach, with the ability to tune it to a desired reconstruction quality and runtime performance, make it a compelling solution for applications in high-speed vehicles.
Submitted 17 February, 2016; v1 submitted 2 November, 2015;
originally announced November 2015.
-
Monocular SLAM Supported Object Recognition
Authors:
Sudeep Pillai,
John Leonard
Abstract:
In this work, we develop a monocular SLAM-aware object recognition system that is able to achieve considerably stronger recognition performance, as compared to classical object recognition systems that function on a frame-by-frame basis. By incorporating several key ideas including multi-view object proposals and efficient feature encoding methods, our proposed system is able to detect and robustly recognize objects in its environment using a single RGB camera in near-constant time. Through experiments, we illustrate the utility of using such a system to effectively detect and recognize objects, incorporating multiple object viewpoint detections into a unified prediction hypothesis. The performance of the proposed recognition system is evaluated on the UW RGB-D Dataset, showing strong recognition performance and scalable run-time performance compared to current state-of-the-art recognition systems.
Submitted 4 June, 2015;
originally announced June 2015.
-
On the Noisy Feedback Capacity of Gaussian Broadcast Channels
Authors:
Sibi Raj B. Pillai,
Vinod M. Prabhakaran
Abstract:
It is well known that, in general, feedback may enlarge the capacity region of Gaussian broadcast channels. This has been demonstrated even when the feedback is noisy (or partial-but-perfect) and only from one of the receivers. The only case known where feedback has been shown not to enlarge the capacity region is when the channel is physically degraded (El Gamal 1978, 1981). In this paper, we show that for a class of two-user Gaussian broadcast channels (not necessarily physically degraded), passively feeding back the stronger user's signal over a link corrupted by Gaussian noise does not enlarge the capacity region if the variance of feedback noise is above a certain threshold.
Submitted 17 February, 2015;
originally announced February 2015.
-
Learning Articulated Motions From Visual Demonstration
Authors:
Sudeep Pillai,
Matthew R. Walter,
Seth Teller
Abstract:
Many functional elements of human homes and workplaces consist of rigid components which are connected through one or more sliding or rotating linkages. Examples include doors and drawers of cabinets and appliances; laptops; and swivel office chairs. A robotic mobile manipulator would benefit from the ability to acquire kinematic models of such objects from observation. This paper describes a method by which a robot can acquire an object model by capturing depth imagery of the object as a human moves it through its range of motion. We envision that in the future, a machine newly introduced to an environment could be shown by its human user the articulated objects particular to that environment, inferring from these "visual demonstrations" enough information to actuate each object independently of the user.
Our method employs sparse (markerless) feature tracking, motion segmentation, component pose estimation, and articulation learning; it does not require prior object models. Using the method, a robot can observe an object being exercised, infer a kinematic model incorporating rigid, prismatic and revolute joints, then use the model to predict the object's motion from a novel vantage point. We evaluate the method's performance, and compare it to that of a previously published technique, for a variety of household objects.
Submitted 5 February, 2015;
originally announced February 2015.
-
Bitcoin Transaction Graph Analysis
Authors:
Michael Fleder,
Michael S. Kester,
Sudeep Pillai
Abstract:
Bitcoin has recently become an increasingly popular cryptocurrency through which users trade electronically and more anonymously than via traditional electronic transfers. Bitcoin's design keeps all transactions in a public ledger. The sender and receiver of each transaction are identified only by cryptographic public-key IDs. This leads to a common misconception that Bitcoin inherently provides anonymous use. While Bitcoin's presumed anonymity offers new avenues for commerce, several recent studies raise user-privacy concerns. We explore the level of anonymity in the Bitcoin system. Our approach is two-fold: (i) We annotate the public transaction graph by linking bitcoin public keys to "real" people, either definitively or statistically. (ii) We run the annotated graph through our graph-analysis framework to find and summarize activity of both known and unknown users.
Submitted 5 February, 2015;
originally announced February 2015.
-
Higher dimensional homodyne filtering for suppression of incidental phase artifacts in multichannel MRI
Authors:
Joseph Suresh Paul,
Uma Krishna Swamy Pillai
Abstract:
The aim of this paper is to introduce procedural steps for extending the 1D homodyne phase correction for k-space truncation to all gradient encoding directions. Compared to the existing method applied to 2D partial k-space, the signal losses introduced by the phase correction filter are observed to be minimal for the extended approach. In addition, the modified form of phase correction mitigates Incidental Phase Artifacts (IPA) due to truncation. For parallel imaging with undersampling along the phase encode direction, the extended homodyne filtering is shown to be effective for minimizing these artifacts when each channel's k-space is truncated along both the phase and frequency encode directions. This is illustrated with 2D partial k-space for flow-compensated multichannel Susceptibility Weighted Imaging (SWI). Extension of our method to 3D partial k-space shows improved reconstruction of flow information in phase contrast angiography.
Submitted 14 January, 2015;
originally announced January 2015.
-
Optimal WiFi Sensing via Dynamic Programming
Authors:
Abhinav Kumar,
Rahul Vaze,
Sibi Raj B Pillai,
Aditya Gopalan
Abstract:
The problem of finding an optimal sensing schedule for a mobile device that encounters an intermittent WiFi access opportunity is considered. At any given time, the WiFi is in one of two modes, ON or OFF, and the mobile's incentive is to connect to the WiFi in the ON mode as soon as possible, while spending as little sensing energy as possible. We introduce a dynamic programming (DP) framework which enables the characterization of an explicit solution for several models, particularly when the OFF periods are exponentially distributed. While the problem for non-exponential OFF periods is ill-posed in general, a common workaround in the literature is to make the mobile device aware when an entire ON period has been missed. In this restricted setting, using the DP framework, the deterministic nature of the optimal sensing policy is established, and value iterations are shown to converge to the optimal solution. Finally, we address the blind situation where the distributions of ON and OFF periods are unknown. A continuous-bandit-based learning algorithm that has vanishing regret (loss compared to the optimal strategy with knowledge of the distributions) is presented, and comparisons with the optimal schemes are provided for exponential ON and OFF times.
Submitted 28 October, 2014;
originally announced October 2014.
-
Distributed Rate Adaptation and Power Control in Fading Multiple Access Channels
Authors:
Sreejith Sreekumar,
Bikash K. Dey,
Sibi Raj B. Pillai
Abstract:
Traditionally, the capacity region of a coherent fading multiple access channel (MAC) is analyzed in two popular contexts. In the first, a centralized system with full channel state information at the transmitters (CSIT) is assumed, and the communication parameters like transmit power and data-rate are jointly chosen for every fading vector realization. On the other hand, in fast-fading links with distributed CSIT, the lack of full CSI is compensated by performing ergodic averaging over sufficiently many channel realizations. Notice that the distributed CSI may necessitate decentralized power-control for optimal data-transfer. Apart from these two models, the case of slow-fading links and distributed CSIT, though relevant to many systems, has received much less attention.
In this paper, a block-fading AWGN MAC with full CSI at the receiver and distributed CSI at the transmitters is considered. The links undergo independent fading, but otherwise have arbitrary fading distributions. The channel statistics and respective long-term average transmit powers are known to all parties. We first consider the case where each encoder has knowledge only of its own link quality, and not of others. For this model, we compute the adaptive capacity region, i.e. the collection of average rate-tuples under block-wise coding/decoding such that the rate-tuple for every fading realization is inside the instantaneous MAC capacity region. The key step in our solution is an optimal rate allocation function for any given set of distributed power control laws at the transmitters. This also allows us to characterize the optimal power control for a wide class of fading models. Further extensions are also proposed to account for more general CSI availability at the transmitters.
Submitted 15 September, 2014;
originally announced September 2014.
-
Performance Comparison of Linear Prediction based Vocoders in Linux Platform
Authors:
Lani Rachel Mathew,
Ancy S. Anselam,
Sakuntala S. Pillai
Abstract:
Linear predictive coders form an important class of speech coders. This paper describes the software-level implementation of linear-prediction-based vocoders, viz. Code Excited Linear Prediction (CELP), Low-Delay CELP (LD-CELP) and Mixed Excitation Linear Prediction (MELP), at bit rates of 4.8 kb/s, 16 kb/s and 2.4 kb/s, respectively. The C programs of the vocoders have been compiled and executed on the Linux platform. Subjective testing has been performed using the Mean Opinion Score (MOS) test. Waveform analysis has been done using the Praat and Adobe Audition software. The results show that MELP and CELP produce comparable quality, while the quality of the LD-CELP coder is much higher, at the expense of a higher bit rate.
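All three coders share a short-term linear-prediction stage that estimates coefficients $a_k$ so that $\hat{x}[n] = \sum_{k=1}^{p} a_k x[n-k]$. A common textbook way to obtain them is the autocorrelation method solved with the Levinson-Durbin recursion, sketched below. This is an illustrative assumption, not the windowing, order, or bandwidth expansion prescribed by the CELP, LD-CELP, or MELP implementations evaluated in the paper; the function names are hypothetical.

```python
from typing import List, Tuple


def autocorrelation(x: List[float], max_lag: int) -> List[float]:
    """Biased autocorrelation r[k] = sum_n x[n] * x[n-k], for k = 0..max_lag."""
    n = len(x)
    return [sum(x[t] * x[t - lag] for t in range(lag, n)) for lag in range(max_lag + 1)]


def levinson_durbin(r: List[float], order: int) -> Tuple[List[float], float]:
    """Solve the LPC normal equations for x_hat[n] = sum_k a[k] * x[n-k].

    Returns (a[1..order], final prediction-error power).
    """
    a = [0.0] * (order + 1)  # a[0] is unused
    err = r[0]
    for i in range(1, order + 1):
        # Reflection (PARCOR) coefficient for stage i.
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err
        new_a = list(a)
        new_a[i] = k
        # Update lower-order coefficients using the previous stage's values.
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= 1.0 - k * k  # prediction-error power shrinks at each stage
    return a[1:], err


# Normalised autocorrelation of a first-order process with coefficient 0.5:
print(levinson_durbin([1.0, 0.5, 0.25], 2))  # ([0.5, 0.0], 0.75)
```

In the example, the second-order coefficient comes out as zero because a first-order predictor already captures all the correlation, which is a quick sanity check on the recursion.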
Submitted 25 June, 2014;
originally announced June 2014.
-
Least-Squares FIR Models of Low-Resolution MR data for Efficient Phase-Error Compensation with Simultaneous Artefact Removal
Authors:
Joseph Suresh Paul,
Uma Krishna Swamy Pillai,
Nyjin Thomas
Abstract:
Signal space models in both the phase-encode and frequency-encode directions are presented for extrapolation of 2D partial k-space. Using the boxcar representation of low-resolution spatial data, and a geometrical representation of signal space vectors in both the positive and negative phase-encode directions, a robust predictor is constructed using a series of signal space projections. Compared to some of the existing phase-correction methods that require acquisition of a pre-determined set of fractional k-space lines, the proposed predictor is found to be more efficient, because it achieves equivalent performance using only half the number of fractional lines. Robust filtering of noisy data is achieved using a second signal space model in the frequency-encode direction, bypassing the requirement of a prior highpass filtering operation. The signal space is constructed from Fourier-transformed samples of each row in the low-resolution image. A set of FIR filters is estimated by fitting a least-squares model to this signal space. Partial k-space extrapolation using the FIR filters is shown to result in artifact-free reconstruction, particularly with respect to Gibbs ringing and streaking-type artifacts.
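The idea of estimating FIR filters by least squares and then using them to extrapolate missing samples can be illustrated in one dimension. The sketch below is a simplification under stated assumptions, not the authors' 2D signal-space construction: it fits a real-valued order-$p$ forward predictor to a single row of known samples by solving the normal equations, then extends the row recursively. Real k-space data are complex, and the function names (`fit_fir_predictor`, `extrapolate`, `solve`) are hypothetical.

```python
import math
from typing import List


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_fir_predictor(samples: List[float], order: int) -> List[float]:
    """Least-squares taps h such that samples[n] ~ sum_k h[k] * samples[n-1-k]."""
    # Each row holds the `order` most recent past samples, newest first.
    rows = [samples[n - order:n][::-1] for n in range(order, len(samples))]
    targets = samples[order:]
    # Normal equations (X^T X) h = X^T y.
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(order)] for i in range(order)]
    xty = [sum(r[i] * y for r, y in zip(rows, targets)) for i in range(order)]
    return solve(xtx, xty)


def extrapolate(samples: List[float], taps: List[float], count: int) -> List[float]:
    """Extend the sequence by `count` samples using the fitted FIR predictor."""
    out = list(samples)
    for _ in range(count):
        out.append(sum(h * out[-1 - k] for k, h in enumerate(taps)))
    return out[len(samples):]


# A sampled sinusoid obeys s[n] = 2cos(w) s[n-1] - s[n-2], so order 2 fits it exactly.
known = [math.cos(0.3 * n) for n in range(20)]
taps = fit_fir_predictor(known, 2)
print(taps)  # approximately [2*cos(0.3), -1.0]
print(extrapolate(known, taps, 3))
```

The normal-equations route is chosen here only for brevity; for larger filter orders, a QR or SVD-based least-squares solver would be numerically safer.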
Submitted 11 March, 2013;
originally announced March 2013.
-
Power Controlled Adaptive Sum-Capacity of Fading MACs with Distributed CSI
Authors:
Sibi Raj B. Pillai,
Bikash K. Dey,
Yash Deshpande,
Krishnamoorthy Iyer
Abstract:
We consider the problem of finding optimal, fair and distributed power-rate strategies to achieve the sum capacity of the Gaussian multiple-access block-fading channel. Here, the transmitters have access only to their own fading coefficients, while the receiver has global access to all the fading coefficients. Outage is not permitted in any communication block. The resulting average sum-throughput is also known as "power-controlled adaptive sum-capacity", which appears as an open problem in the literature.
This paper presents the power-controlled adaptive sum-capacity of a wide class of popular MAC models. In particular, we propose a power-rate strategy in the presence of distributed channel state information (CSI), which is throughput-optimal when all the users have identical channel statistics. The proposed scheme also has an efficient implementation using successive cancellation and rate-splitting. We propose an upper bound when the channel laws are not identical. Furthermore, the optimal schemes are extended to situations in which each transmitter has additional finite-rate partial CSI on the link quality of others.
Submitted 23 August, 2012;
originally announced August 2012.
-
Number of Measurements in Sparse Signal Recovery
Authors:
Paul Tune,
Sibiraj Bhaskaran Pillai,
Stephen Hanly
Abstract:
We analyze the asymptotic performance of sparse signal recovery from noisy measurements. In particular, we generalize some of the existing results for the Gaussian case to subgaussian and other ensembles. An achievable result is presented for the linear sparsity regime. A converse on the number of required measurements in the sub-linear regime is also presented, which covers many of the widely used measurement ensembles. Our converse idea makes use of a correspondence between compressed sensing ideas and compound channels in information theory.
Submitted 28 April, 2009;
originally announced April 2009.