Search | arXiv e-print repository

RT-X Net: RGB-Thermal cross attention network for Low-Light Image Enhancement

Authors: Raman Jha, Adithya Lenka, Mani Ramanagopal, Aswin Sankaranarayanan, Kaushik Mitra

Abstract: In nighttime conditions, high noise levels and bright illumination sources degrade image quality, making low-light image enhancement challenging. Thermal images provide complementary information, offering richer textures and structural details. We propose RT-X Net, a cross-attention network that fuses RGB and thermal images for nighttime image enhancement. We leverage self-attention networks for feature extraction and a cross-attention mechanism for fusion to effectively integrate information from both modalities. To support research in this domain, we introduce the Visible-Thermal Image Enhancement Evaluation (V-TIEE) dataset, comprising 50 co-located visible and thermal images captured under diverse nighttime conditions. Extensive evaluations on the publicly available LLVIP dataset and our V-TIEE dataset demonstrate that RT-X Net outperforms state-of-the-art methods in low-light image enhancement. The code and the V-TIEE can be found here https://github.com/jhakrraman/rt-xnet.

Submitted 30 May, 2025; originally announced May 2025.

Comments: Accepted at ICIP 2025

arXiv:2504.13151 [pdf, ps, other]

MIB: A Mechanistic Interpretability Benchmark

Authors: Aaron Mueller, Atticus Geiger, Sarah Wiegreffe, Dana Arad, Iván Arcuschin, Adam Belfki, Yik Siu Chan, Jaden Fiotto-Kaufman, Tal Haklay, Michael Hanna, Jing Huang, Rohan Gupta, Yaniv Nikankin, Hadas Orgad, Nikhil Prakash, Anja Reusch, Aruna Sankaranarayanan, Shun Shao, Alessandro Stolfo, Martin Tutek, Amir Zur, David Bau, Yonatan Belinkov

Abstract: How can we know whether new mechanistic interpretability methods achieve real improvements? In pursuit of lasting evaluation standards, we propose MIB, a Mechanistic Interpretability Benchmark, with two tracks spanning four tasks and five models. MIB favors methods that precisely and concisely recover relevant causal pathways or causal variables in neural language models. The circuit localization track compares methods that locate the model components - and connections between them - most important for performing a task (e.g., attribution patching or information flow routes). The causal variable localization track compares methods that featurize a hidden vector, e.g., sparse autoencoders (SAEs) or distributed alignment search (DAS), and align those features to a task-relevant causal variable. Using MIB, we find that attribution and mask optimization methods perform best on circuit localization. For causal variable localization, we find that the supervised DAS method performs best, while SAE features are not better than neurons, i.e., non-featurized hidden vectors. These findings illustrate that MIB enables meaningful comparisons, and increases our confidence that there has been real progress in the field.

Submitted 9 June, 2025; v1 submitted 17 April, 2025; originally announced April 2025.

Comments: Accepted to ICML 2025. Project website at https://mib-bench.github.io

arXiv:2502.02598 [pdf, ps, other]

On the distribution of the strongly multiplicative function $2^{ω(n)}$ on the set of natural numbers

Authors: K. Venkatasubbareddy, A. Sankaranarayanan

Abstract: In this paper, we study the distribution of the sequence of integers $2^{ω(n)}$ under the assumption of the strong Riemann hypothesis, where $ω(n)$ denotes the number of distinct prime divisors of $n$. We provide an asymptotic formula for the sum $\displaystyle\sum_{n\leq x}2^{ω(n)}$ under this assumption. We study the sum $\displaystyle\sum_{n\leq x}2^{ω(n)}$ unconditionally too.

Submitted 21 January, 2025; originally announced February 2025.

arXiv:2501.08618 [pdf, other]

Disjoint Processing Mechanisms of Hierarchical and Linear Grammars in Large Language Models

Authors: Aruna Sankaranarayanan, Dylan Hadfield-Menell, Aaron Mueller

Abstract: All natural languages are structured hierarchically. In humans, this structural restriction is neurologically coded: when two grammars are presented with identical vocabularies, brain areas responsible for language processing are only sensitive to hierarchical grammars. Using large language models (LLMs), we investigate whether such functionally distinct hierarchical processing regions can arise solely from exposure to large-scale language distributions. We generate inputs using English, Italian, Japanese, or nonce words, varying the underlying grammars to conform to either hierarchical or linear/positional rules. Using these grammars, we first observe that language models show distinct behaviors on hierarchical versus linearly structured inputs. Then, we find that the components responsible for processing hierarchical grammars are distinct from those that process linear grammars; we causally verify this in ablation experiments. Finally, we observe that hierarchy-selective components are also active on nonce grammars; this suggests that hierarchy sensitivity is not tied to meaning, nor in-distribution inputs.

Submitted 15 January, 2025; originally announced January 2025.

arXiv:2409.12162 [pdf, other]

doi 10.1109/ICCVW54120.2021.00133

Precise Forecasting of Sky Images Using Spatial Warping

Authors: Leron Julian, Aswin C. Sankaranarayanan

Abstract: The intermittency of solar power, due to occlusion from cloud cover, is one of the key factors inhibiting its widespread use in both commercial and residential settings. Hence, real-time forecasting of solar irradiance for grid-connected photovoltaic systems is necessary to schedule and allocate resources across the grid. Ground-based imagers that capture wide field-of-view images of the sky are commonly used to monitor cloud movement around a particular site in an effort to forecast solar irradiance. However, these wide FOV imagers capture a distorted image of the sky, where regions near the horizon are heavily compressed. This hinders the ability to precisely predict cloud motion near the horizon, which especially affects prediction over longer time horizons. In this work, we combat the aforementioned constraint by introducing a deep learning method to predict a future sky image frame with higher resolution than previous methods. Our main contribution is to derive an optimal warping method to counter the adverse effects of clouds at the horizon, and learn a framework for future sky image prediction which better determines cloud evolution for longer time horizons.

Submitted 18 September, 2024; originally announced September 2024.

arXiv:2409.12016 [pdf, other]

Computational Imaging for Long-Term Prediction of Solar Irradiance

Authors: Leron Julian, Haejoon Lee, Soummya Kar, Aswin C. Sankaranarayanan

Abstract: The occlusion of the sun by clouds is one of the primary sources of uncertainties in solar power generation, and is a factor that affects the widespread use of solar power as a primary energy source. Real-time forecasting of cloud movement and, as a result, solar irradiance is necessary to schedule and allocate energy across grid-connected photovoltaic systems. Previous works monitored cloud movement using wide-angle field of view imagery of the sky. However, such images have poor resolution for clouds that appear near the horizon, which reduces their effectiveness for long-term prediction of solar occlusion. Specifically, to be able to predict occlusion of the sun over long time periods, clouds that are near the horizon need to be detected, and their velocities estimated precisely. To enable such a system, we design and deploy a catadioptric system that delivers wide-angle imagery with uniform spatial resolution of the sky over its field of view. To enable prediction over a longer time horizon, we design an algorithm that uses carefully selected spatio-temporal slices of the imagery using estimated wind direction and velocity as inputs. Using ray-tracing simulations as well as a real testbed deployed outdoors, we show that the system is capable of predicting solar occlusion as well as irradiance for tens of minutes in the future, which is an order of magnitude improvement over prior work.

Submitted 18 September, 2024; originally announced September 2024.

arXiv:2408.01416 [pdf, other]

The Quest for the Right Mediator: A History, Survey, and Theoretical Grounding of Causal Interpretability

Authors: Aaron Mueller, Jannik Brinkmann, Millicent Li, Samuel Marks, Koyena Pal, Nikhil Prakash, Can Rager, Aruna Sankaranarayanan, Arnab Sen Sharma, Jiuding Sun, Eric Todd, David Bau, Yonatan Belinkov

Abstract: Interpretability provides a toolset for understanding how and why neural networks behave in certain ways. However, there is little unity in the field: most studies employ ad-hoc evaluations and do not share theoretical foundations, making it difficult to measure progress and compare the pros and cons of different techniques. Furthermore, while mechanistic understanding is frequently discussed, the basic causal units underlying these mechanisms are often not explicitly defined. In this paper, we propose a perspective on interpretability research grounded in causal mediation analysis. Specifically, we describe the history and current state of interpretability taxonomized according to the types of causal units (mediators) employed, as well as methods used to search over mediators. We discuss the pros and cons of each mediator, providing insights as to when particular kinds of mediators and search methods are most appropriate depending on the goals of a given study. We argue that this framing yields a more cohesive narrative of the field, as well as actionable insights for future work. Specifically, we recommend a focus on discovering new mediators with better trade-offs between human-interpretability and compute-efficiency, and which can uncover more sophisticated abstractions from neural networks than the primarily linear mediators employed in current work. We also argue for more standardized evaluations that enable principled comparisons across mediator types, such that we can better understand when particular causal units are better suited to particular use cases.

Submitted 2 August, 2024; originally announced August 2024.

arXiv:2407.18259 [pdf, ps, other]

Higher symmetric power $L$-functions and their Fourier coefficients

Authors: Kampamolla Venkatasubbareddy, Ayyadurai Sankaranarayanan

Abstract: Let $H_k$ be the set of all normalized primitive holomorphic cusp forms of even integral weight $k\geq 2$ for the full modular group $SL(2, \mathbb{Z})$, and let $j\geq 3$ be any fixed integer. For $f\in H_k$, we write $λ_{{\rm{sym}}^j f}(n)$ for the $n^{\text{th}}$ normalized Fourier coefficient of $L(s,{\rm{sym}}^j f)$. In this article, we establish an asymptotic formula for the sum $$\sum_{\substack{n=a_1^2+a_2^2+\ldots+a_6^2\leq x\\ \left(a_1,a_2,\ldots, a_6\right)\in \mathbb{Z}^6}} λ_{{\rm{sym}}^j f}^2(n),$$ with an improved error term.

Submitted 13 July, 2024; originally announced July 2024.

arXiv:2309.00243 [pdf, ps, other]

On the Rankin-Selberg $L$-function related to the Godement-Jacquet $L$-function II

Authors: Amrinder Kaur, Ayyadurai Sankaranarayanan

Abstract: In this paper, we consider the $k$-th Riesz mean for the coefficients of the Rankin-Selberg $L$-function $L_{f \times f}(s)$ related to the Godement-Jacquet $L$-function with respect to $SL(n,\mathbb{Z})$. We establish an asymptotic formula for the $k$-th Riesz mean with an improved range and a better error term. As a result, we get an asymptotic relation for the partial sum of the coefficients of $L_{f \times f}(s)$.

Submitted 1 September, 2023; originally announced September 2023.

MSC Class: 11F30; 11N75

arXiv:2308.03191 [pdf, other]

The Facebook Algorithm's Active Role in Climate Advertisement Delivery

Authors: Aruna Sankaranarayanan, Erik Hemberg, Una-May O'Reilly

Abstract: Communication strongly influences attitudes on climate change. Within sponsored communication, high spend and high reach advertising dominates. In the advertising ecosystem we can distinguish actors with adversarial stances: organizations with contrarian or advocacy communication goals, who direct the advertisement delivery algorithm to launch ads in different destinations by specifying targets and campaign objectives. We present an observational (N=275,632) and a controlled (N=650) study which collectively indicate that the advertising delivery algorithm could itself be an actor, asserting statistically significant influence over advertisement destinations, characterized by U.S. state, gender type, or age range. This algorithmic behaviour may not entirely be understood by the advertising platform (and its creators). These findings have implications for climate communications and misinformation research, revealing that targeting intentions are not always fulfilled as requested and that delivery itself could be manipulated.

Submitted 7 August, 2023; v1 submitted 6 August, 2023; originally announced August 2023.

arXiv:2307.13439 [pdf, ps, other]

On the coefficients of $\ell$-fold product $L$-function

Authors: Ayyadurai Sankaranarayanan, Lalit Vaishya

Abstract: Let $f \in S_{k}(SL_2(\mathbb{Z}))$ be a normalized Hecke eigenform of integral weight $k$ for the full modular group. In this article, we study the average behaviour of Fourier coefficients of $\ell$-fold product $L$-function. More precisely, we establish the asymptotics of power moments associated to the sequence $\{λ_{f \otimes f \otimes \cdots \otimes_{\ell} f}(n)\}_{n- {\rm squarefree}}$ where ${f \otimes f \otimes \cdots \otimes_{\ell} f}$ denotes the $\ell$-fold product of $f$. As a consequence, we prove results concerning the behaviour of sign changes associated to these sequences for odd $\ell$-fold product $L$-function. A similar result also holds for the sequence $\{λ_{f \otimes f \otimes \cdots \otimes_{\ell} f}(n)\}_{n \in \mathbb{N}}$.

Submitted 25 July, 2023; originally announced July 2023.

Comments: 13 pages, Any comments or suggestions are welcome

MSC Class: 11F30; 11F11; 11M06; 11N37

arXiv:2306.15953 [pdf, other]

Angle Sensitive Pixels for Lensless Imaging on Spherical Sensors

Authors: Yi Hua, Yongyi Zhao, Aswin C. Sankaranarayanan

Abstract: We propose OrbCam, a lensless architecture for imaging with spherical sensors. Prior work on lensless imaging techniques has focused largely on using planar sensors; for such designs, it is important to use a modulation element, e.g. amplitude or phase masks, to construct an invertible imaging system. In contrast, we show that the diversity of pixel orientations on a curved surface is sufficient to improve the conditioning of the mapping between the scene and the sensor. Hence, when imaging on a spherical sensor, all pixels can have the same angular response function such that the lensless imager is comprised of pixels that are identical to each other and differ only in their orientations. We provide the computational tools for the design of the angular response of the pixels in a spherical sensor that leads to well-conditioned and noise-robust measurements. We validate our design in both simulation and a lab prototype. The implication of our design is that lensless imaging can be enabled easily for curved and flexible surfaces, thereby opening up a new set of application domains.

Submitted 28 June, 2023; originally announced June 2023.

arXiv:2303.13362 [pdf, ps, other]

On certain kernel functions and shifted convolution sums of the Fourier coefficients

Authors: K. Venkatasubbareddy, A. Sankaranarayanan

Abstract: We study the behavior of the shifted convolution sum involving fourth power of the Fourier coefficients of holomorphic cusp forms with a weight function to be the $k$-full kernel function for any fixed integer $k\geq2$.

Submitted 21 March, 2023; originally announced March 2023.

arXiv:2302.08161 [pdf, ps, other]

doi 10.1016/j.jnt.2023.08.006

The Selberg-Delange method and mean value of arithmetic functions over short intervals

Authors: Amrinder Kaur, Ayyadurai Sankaranarayanan

Abstract: In this paper, we establish a mean value result of arithmetic functions over shorter intervals with the Selberg-Delange method using the Hooley-Huxley contour.

Submitted 16 February, 2023; originally announced February 2023.

MSC Class: 11N37

Journal ref: Journal of Number Theory, Volume 255, 2024, Pages 37-61

arXiv:2207.00945 [pdf, other]

PS$^2$F: Polarized Spiral Point Spread Function for Single-Shot 3D Sensing

Authors: Bhargav Ghanekar, Vishwanath Saragadam, Dushyant Mehra, Anna-Karin Gustavsson, Aswin Sankaranarayanan, Ashok Veeraraghavan

Abstract: We propose a compact snapshot monocular depth estimation technique that relies on an engineered point spread function (PSF). Traditional approaches used in microscopic super-resolution imaging such as the Double-Helix PSF (DHPSF) are ill-suited for scenes that are more complex than a sparse set of point light sources. We show, using the Cramér-Rao lower bound, that separating the two lobes of the DHPSF and thereby capturing two separate images leads to a dramatic increase in depth accuracy. A special property of the phase mask used for generating the DHPSF is that a separation of the phase mask into two halves leads to a spatial separation of the two lobes. We leverage this property to build a compact polarization-based optical setup, where we place two orthogonal linear polarizers on each half of the DHPSF phase mask and then capture the resulting image with a polarization-sensitive camera. Results from simulations and a lab prototype demonstrate that our technique achieves up to $50\%$ lower depth error compared to state-of-the-art designs including the DHPSF and the Tetrapod PSF, with little to no loss in spatial resolution.

Submitted 4 August, 2022; v1 submitted 2 July, 2022; originally announced July 2022.

Comments: 12 pages, 12 figures

arXiv:2206.01491 [pdf, ps, other]

On the average behavior of the Fourier coefficients of $j^{th}$ symmetric power $L$-function over a certain sequences of positive integers

Authors: Anubhav Sharma, Ayyadurai Sankaranarayanan

Abstract: In this paper, we investigate the average behavior of the $n^{th}$ normalized Fourier coefficients of the $j^{th}$ (where $j \geq 2$ is any fixed integer) symmetric power $L$-function (i.e., $L(s,sym^{j}f)$), attached to a primitive holomorphic cusp form $f$ of weight $k$ for the full modular group $SL(2,\mathbb{Z})$, over certain sequences of positive integers. Precisely, we prove an asymptotic formula with an error term for the sum $$\sum_{\stackrel{a_{1}^{2}+a_{2}^{2}+a_{3}^{2}+a_{4}^{2}+a_{5}^{2}+a_{6}^{2}\leq {x}}{(a_{1},a_{2},a_{3},a_{4},a_{5},a_{6})\in\mathbb{Z}^{6}}}λ^{2}_{sym^{j}f}(a_{1}^{2}+a_{2}^{2}+a_{3}^{2}+a_{4}^{2}+a_{5}^{2}+a_{6}^{2}),$$ where $x$ is sufficiently large, and $$L(s,sym^{j}f):=\sum_{n=1}^{\infty}\dfrac{λ_{sym^{j}f}(n)}{n^{s}}.$$ When $j=2$, the error term we obtain improves the earlier known result.

Submitted 3 June, 2022; originally announced June 2022.

MSC Class: 11M06; 11F11; 11F30

arXiv:2204.05300 [pdf, other]

Single-Photon Structured Light

Authors: Varun Sundar, Sizhuo Ma, Aswin C. Sankaranarayanan, Mohit Gupta

Abstract: We present a novel structured light technique that uses Single Photon Avalanche Diode (SPAD) arrays to enable 3D scanning at high frame rates and low-light levels. This technique, called "Single-Photon Structured Light", works by sensing binary images that indicate the presence or absence of photon arrivals during each exposure; the SPAD array is used in conjunction with a high-speed binary projector, with both devices operated at speeds as high as 20 kHz. The binary images that we acquire are heavily influenced by photon noise and are easily corrupted by ambient sources of light. To address this, we develop novel temporal sequences using error correction codes that are designed to be robust to short-range effects like projector and camera defocus as well as resolution mismatch between the two devices. Our lab prototype is capable of 3D imaging in challenging scenarios involving objects with extremely low albedo or undergoing fast motion, as well as scenes under strong ambient illumination.

Submitted 11 April, 2022; originally announced April 2022.

Comments: Accepted at CVPR 2022 (poster). 26 pages, 23 figures

arXiv:2203.14214 [pdf, other]

Enhancing Speckle Statistics for Imaging inside Scattering Media

Authors: Wei-Yu Chen, Matthew O'Toole, Aswin C. Sankaranarayanan, Anat Levin

Abstract: We exploit memory effect speckle correlations for the imaging of incoherent linear (single-photon) fluorescent sources behind scattering tissue. While memory effect-based imaging techniques have been heavily studied in the past, for thick scattering layers and complex illumination patterns these correlations are weak, limiting the practical applicability of the idea. In this work, we introduce a Spatial Light Modulator (SLM) between the tissue sample and the imaging sensor and capture multiple modulations of the speckle pattern. We show that by correctly designing the modulation pattern and the reconstruction algorithm we can greatly enhance statistical correlations in the data. We exploit this to demonstrate the reconstruction of mega-pixel wide fluorescent patterns behind scattering tissue.

Submitted 16 July, 2022; v1 submitted 27 March, 2022; originally announced March 2022.

arXiv:2202.12883 [pdf, other]

Human Detection of Political Speech Deepfakes across Transcripts, Audio, and Video

Authors: Matthew Groh, Aruna Sankaranarayanan, Nikhil Singh, Dong Young Kim, Andrew Lippman, Rosalind Picard

Abstract: Recent advances in technology for hyper-realistic visual and audio effects provoke the concern that deepfake videos of political speeches will soon be indistinguishable from authentic video recordings. The conventional wisdom in communication theory predicts people will fall for fake news more often when the same version of a story is presented as a video versus text. We conduct 5 pre-registered randomized experiments with 2,215 participants to evaluate how accurately humans distinguish real political speeches from fabrications across base rates of misinformation, audio sources, question framings, and media modalities. We find base rates of misinformation minimally influence discernment and deepfakes with audio produced by the state-of-the-art text-to-speech algorithms are harder to discern than the same deepfakes with voice actor audio. Moreover across all experiments, we find audio and visual information enables more accurate discernment than text alone: human discernment relies more on how something is said, the audio-visual cues, than what is said, the speech content.

Submitted 15 January, 2024; v1 submitted 25 February, 2022; originally announced February 2022.

arXiv:2202.00873 [pdf, ps, other]

On the sum of the twisted Fourier coefficients of Maass forms by Möbius function

Authors: K Venkatasubbareddy, Amrinder Kaur, Ayyadurai Sankaranarayanan

Abstract: In this paper, we study non-trivial upper bounds for the sum $\sum \limits_{n \in S} |λ_f(n)|$ where $f$ is a normalized Maass eigencusp form for the full modular group, $λ_f(n)$ is the $n$th normalized Fourier coefficient of $f$ and $S$ is a proper subset of positive integers in $[1,x]$ with certain properties.

Submitted 9 February, 2022; v1 submitted 1 February, 2022; originally announced February 2022.

arXiv:2201.02000 [pdf, ps, other]

doi 10.46298/hrj.2023.10747

Godement-Jacquet $L$-function, some conjectures and some consequences

Authors: Amrinder Kaur, Ayyadurai Sankaranarayanan

Abstract: In this paper, we investigate the mean square estimate for the logarithmic derivative of the Godement--Jacquet $L$-function $L_f(s)$ assuming the Riemann hypothesis for $L_f(s)$ and Rudnick--Sarnak conjecture.

Submitted 4 April, 2022; v1 submitted 6 January, 2022; originally announced January 2022.

Journal ref: Hardy-Ramanujan Journal, Volume 45, 2023

arXiv:2109.14450 [pdf, other]

Programmable Spectral Filter Arrays using Phase Spatial Light Modulator

Authors: Vishwanath Saragadam, Vijay Rengarajan, Ryuichi Tadano, Tuo Zhuang, Hideki Oyaizu, Jun Murayama, Aswin C. Sankaranarayanan

Abstract: Spatially varying spectral modulation can be implemented using a liquid crystal spatial light modulator (SLM) since it provides an array of liquid crystal cells, each of which can be purposed to act as a programmable spectral filter array. However, such an optical setup suffers from strong optical aberrations due to the unintended phase modulation, precluding spectral modulation at high spatial resolutions. In this work, we propose a novel computational approach for the practical implementation of phase SLMs for implementing spatially varying spectral filters. We provide a careful and systematic analysis of the aberrations arising out of phase SLMs for the purposes of spatially varying spectral modulation. The analysis naturally leads us to a set of "good patterns" that minimize the optical aberrations. We then train a deep network that overcomes any residual aberrations, thereby achieving ideal spectral modulation at high spatial resolution. We show a number of unique operating points with our prototype including dynamic spectral filtering, material classification, and single- and multi-image hyperspectral imaging.

Submitted 11 December, 2022; v1 submitted 29 September, 2021; originally announced September 2021.

arXiv:2109.08061 [pdf, other]

Invertible Frowns: Video-to-Video Facial Emotion Translation

Authors: Ian Magnusson, Aruna Sankaranarayanan, Andrew Lippman

Abstract: We present Wav2Lip-Emotion, a video-to-video translation architecture that modifies facial expressions of emotion in videos of speakers. Previous work modifies emotion in images, uses a single image to produce a video with animated emotion, or puppets facial expressions in videos with landmarks from a reference video. However, many use cases such as modifying an actor's performance in post-production, coaching individuals to be more animated speakers, or touching up emotion in a teleconference require a video-to-video translation approach. We explore a method to maintain speakers' lip movements, identity, and pose while translating their expressed emotion. Our approach extends an existing multi-modal lip synchronization architecture to modify the speaker's emotion using L1 reconstruction and pre-trained emotion objectives. We also propose a novel automated emotion evaluation approach and corroborate it with a user study. These find that we succeed in modifying emotion while maintaining lip synchronization. Visual quality is somewhat diminished, with a trade off between greater emotion modification and visual quality between model variants. Nevertheless, we demonstrate (1) that facial expressions of emotion can be modified with nothing other than L1 reconstruction and pre-trained emotion objectives and (2) that our automated emotion evaluation approach aligns with human judgements.

Submitted 22 October, 2021; v1 submitted 16 September, 2021; originally announced September 2021.

Comments: 9 pages, 2 figures, 4 tables, accepted at ADGD @ ACM Multimedia 2021

arXiv:2108.07966 [pdf, other]

A Simple Framework for 3D Lensless Imaging with Programmable Masks

Authors: Yucheng Zheng, Yi Hua, Aswin C. Sankaranarayanan, M. Salman Asif

Abstract: Lensless cameras provide a framework to build thin imaging systems by replacing the lens in a conventional camera with an amplitude or phase mask near the sensor. Existing methods for lensless imaging can recover the depth and intensity of the scene, but they require solving computationally-expensive inverse problems. Furthermore, existing methods struggle to recover dense scenes with large depth variations. In this paper, we propose a lensless imaging system that captures a small number of measurements using different patterns on a programmable mask. In this context, we make three contributions. First, we present a fast recovery algorithm to recover textures on a fixed number of depth planes in the scene. Second, we consider the mask design problem, for programmable lensless cameras, and provide a design template for optimizing the mask patterns with the goal of improving depth estimation. Third, we use a refinement network as a post-processing step to identify and remove artifacts in the reconstruction. These modifications are evaluated extensively with experimental results on a lensless camera prototype to showcase the performance benefits of the optimized masks and recovery algorithms over the state of the art.

Submitted 18 August, 2021; originally announced August 2021.

Comments: Supplementary material available at https://github.com/CSIPlab/Programmable3Dcam.git

Journal ref: International Conference on Computer Vision (ICCV) 2021

arXiv:2104.04785 [pdf, other]

Generating Physically-Consistent Satellite Imagery for Climate Visualizations

Authors: Björn Lütjens, Brandon Leshchinskiy, Océane Boulais, Farrukh Chishtie, Natalia Díaz-Rodríguez, Margaux Masson-Forsythe, Ana Mata-Payerro, Christian Requena-Mesa, Aruna Sankaranarayanan, Aaron Piña, Yarin Gal, Chedy Raïssi, Alexander Lavin, Dava Newman

Abstract: Deep generative vision models are now able to synthesize realistic-looking satellite imagery. But, the possibility of hallucinations prevents their adoption for risk-sensitive applications, such as generating materials for communicating climate change. To demonstrate this issue, we train a generative adversarial network (pix2pixHD) to create synthetic satellite imagery of future flooding and reforestation events. We find that a pure deep learning-based model can generate photorealistic flood visualizations but hallucinates floods at locations that were not susceptible to flooding. To address this issue, we propose to condition and evaluate generative vision models on segmentation maps of physics-based flood models. We show that our physics-conditioned model outperforms the pure deep learning-based model and a handcrafted baseline. We evaluate the generalization capability of our method to different remote sensing data and different climate-related events (reforestation). We publish our code and dataset which includes the data for a third case study of melting Arctic sea ice and $>$30,000 labeled HD image triplets -- or the equivalent of 5.5 million images at 128x128 pixels -- for segmentation guided image-to-image translation in Earth observation. Code and data are available at https://github.com/blutjens/eie-earth-public.

Submitted 21 October, 2024; v1 submitted 10 April, 2021; originally announced April 2021.

Comments: arXiv admin note: text overlap with arXiv:2010.08103

arXiv:2012.14495 [pdf, other]

SASSI -- Super-Pixelated Adaptive Spatio-Spectral Imaging

Authors: Vishwanath Saragadam, Michael DeZeeuw, Richard Baraniuk, Ashok Veeraraghavan, Aswin Sankaranarayanan

Abstract: We introduce a novel video-rate hyperspectral imager with high spatial and temporal resolutions. Our key hypothesis is that spectral profiles of pixels in a super-pixel of an oversegmented image tend to be very similar. Hence, a scene-adaptive spatial sampling of a hyperspectral scene, guided by its super-pixel segmented image, is capable of obtaining high-quality reconstructions. To achieve this, we acquire an RGB image of the scene, compute its super-pixels, from which we generate a spatial mask of locations where we measure high-resolution spectrum. The hyperspectral image is subsequently estimated by fusing the RGB image and the spectral measurements using a learnable guided filtering approach. Due to low computational complexity of the superpixel estimation step, our setup can capture hyperspectral images of the scenes with little overhead over traditional snapshot hyperspectral cameras, but with significantly higher spatial and spectral resolutions. We validate the proposed technique with extensive simulations as well as a lab prototype that measures hyperspectral video at a spatial resolution of $600 \times 900$ pixels, at a spectral resolution of 10 nm over visible wavebands, and achieving a frame rate of $18$ fps.

Submitted 28 December, 2020; originally announced December 2020.

arXiv:2005.00946 [pdf, other]

Towards Occlusion-Aware Multifocal Displays

Authors: Jen-Hao Rick Chang, Anat Levin, B. V. K. Vijaya Kumar, Aswin C. Sankaranarayanan

Abstract: The human visual system uses numerous cues for depth perception, including disparity, accommodation, motion parallax and occlusion. It is incumbent upon virtual-reality displays to satisfy these cues to provide an immersive user experience. Multifocal displays, one of the classic approaches to satisfy the accommodation cue, place virtual content at multiple focal planes, each at a different depth. However, the content on focal planes close to the eye does not occlude those farther away; this deteriorates the occlusion cue as well as reduces contrast at depth discontinuities due to leakage of the defocus blur. This paper enables occlusion-aware multifocal displays using a novel ConeTilt operator that provides an additional degree of freedom -- tilting the light cone emitted at each pixel of the display panel. We show that, for scenes with relatively simple occlusion configurations, tilting the light cones provides the same effect as physical occlusion. We demonstrate that ConeTilt can be easily implemented by a phase-only spatial light modulator. Using a lab prototype, we show results that demonstrate the presence of occlusion cues and the increased contrast of the display at depth edges.

Submitted 2 May, 2020; originally announced May 2020.

Comments: SIGGRAPH 2020

arXiv:2004.08614 [pdf, other]

Halluci-Net: Scene Completion by Exploiting Object Co-occurrence Relationships

Authors: Kuldeep Kulkarni, Tejas Gokhale, Rajhans Singh, Pavan Turaga, Aswin Sankaranarayanan

Abstract: Recently, there has been substantial progress in image synthesis from semantic labelmaps. However, methods used for this task assume the availability of complete and unambiguous labelmaps, with instance boundaries of objects, and class labels for each pixel. This reliance on heavily annotated inputs restricts the application of image synthesis techniques to real-world applications, especially under uncertainty due to weather, occlusion, or noise. On the other hand, algorithms that can synthesize images from sparse labelmaps or sketches are highly desirable as tools that can guide content creators and artists to quickly generate scenes by simply specifying locations of a few objects. In this paper, we address the problem of complex scene completion from sparse labelmaps. Under this setting, very few details about the scene (30\% of object instances) are available as input for image synthesis. We propose a two-stage deep network based method, called `Halluci-Net', that learns co-occurrence relationships between objects in scenes, and then exploits these relationships to produce a dense and complete labelmap. The generated dense labelmap can then be used as input by state-of-the-art image synthesis techniques like pix2pixHD to obtain the final image. The proposed method is evaluated on the Cityscapes dataset and it outperforms two baseline methods on performance metrics like Fréchet Inception Distance (FID), semantic segmentation accuracy, and similarity in object co-occurrences. We also show qualitative results on a subset of ADE20K dataset that contains bedroom images.

Submitted 20 May, 2021; v1 submitted 18 April, 2020; originally announced April 2020.

Comments: Accepted to AI for Content Creation Workshop @CVPR 2021

arXiv:1912.06102 [pdf, other]

Photosequencing of Motion Blur using Short and Long Exposures

Authors: Vijay Rengarajan, Shuo Zhao, Ruiwen Zhen, John Glotzbach, Hamid Sheikh, Aswin C. Sankaranarayanan

Abstract: Photosequencing aims to transform a motion blurred image to a sequence of sharp images. This problem is challenging due to the inherent ambiguities in temporal ordering as well as the recovery of lost spatial textures due to blur. Adopting a computational photography approach, we propose to capture two short exposure images, along with the original blurred long exposure image to aid in the aforementioned challenges. Post-capture, we recover the sharp photosequence using a novel blur decomposition strategy that recursively splits the long exposure image into smaller exposure intervals. We validate the approach by capturing a variety of scenes with interesting motions using machine vision cameras programmed to capture short and long exposure sequences. Our experimental results show that the proposed method resolves both fast and fine motions better than prior works.

Submitted 11 December, 2019; originally announced December 2019.

arXiv:1911.06956 [pdf, other]

doi 10.1364/OE.381154

On Space-spectrum Uncertainty Analysis for Coded Aperture Systems

Authors: Vishwanath Saragadam, Aswin Sankaranarayanan

Abstract: We introduce and analyze the concept of space-spectrum uncertainty for certain commonly-used designs for spectrally programmable cameras. Our key finding states that it is impossible to simultaneously capture high-resolution spatial images while programming the spectrum at high resolution. This phenomenon arises due to a Fourier relationship between the aperture used for obtaining spectrum and its corresponding diffraction blur in the (spatial) image. We show that the product of spatial and spectral standard deviations is lower bounded by $λ/(4πν_0)$ femto square-meters, where $ν_0$ is the density of grooves in the diffraction grating and $λ$ is the wavelength of light. Experiments with a lab prototype for simultaneously measuring spectrum and image validate our findings and their implication for spectral filtering.

Submitted 15 November, 2019; originally announced November 2019.

Comments: 14 pages

arXiv:1905.04815 [pdf, other]

doi 10.1109/ICCP48838.2020.9105281

Programmable Spectrometry -- Per-pixel Classification of Materials using Learned Spectral Filters

Authors: Vishwanath Saragadam, Aswin C. Sankaranarayanan

Abstract: Many materials have distinct spectral profiles. This facilitates estimation of the material composition of a scene at each pixel by first acquiring its hyperspectral image, and subsequently filtering it using a bank of spectral profiles. This process is inherently wasteful since only a set of linear projections of the acquired measurements contribute to the classification task. We propose a novel programmable camera that is capable of producing images of a scene with an arbitrary spectral filter. We use this camera to optically implement the spectral filtering of the scene's hyperspectral image with the bank of spectral profiles needed to perform per-pixel material classification. This provides gains both in terms of acquisition speed --- since only the relevant measurements are acquired --- and in signal-to-noise ratio --- since we invariably avoid narrowband filters that are light inefficient. Given training data, we use a range of classical and modern techniques including SVMs and neural networks to identify the bank of spectral profiles that facilitate material classification. We verify the method in simulations on standard datasets as well as real data using a lab prototype of the camera.

Submitted 12 May, 2019; originally announced May 2019.

arXiv:1811.12481 [pdf, other]

Learning to Separate Multiple Illuminants in a Single Image

Authors: Zhuo Hui, Ayan Chakrabarti, Kalyan Sunkavalli, Aswin C. Sankaranarayanan

Abstract: We present a method to separate a single image captured under two illuminants, with different spectra, into the two images corresponding to the appearance of the scene under each individual illuminant. We do this by training a deep neural network to predict the per-pixel reflectance chromaticity of the scene, which we use in conjunction with a previous flash/no-flash image-based separation algorithm to produce the final two output images. We design our reflectance chromaticity network and loss functions by incorporating intuitions from the physics of image formation. We show that this leads to significantly better performance than other single image techniques and even approaches the quality of the two image separation method.

Submitted 22 April, 2019; v1 submitted 29 November, 2018; originally announced November 2018.

arXiv:1806.07437 [pdf, other]

Signal Processing Based Pile-up Compensation for Gated Single-Photon Avalanche Diodes

Authors: Adithya K. Pediredla, Aswin C. Sankaranarayanan, Mauro Buttafava, Alberto Tosi, Ashok Veeraraghavan

Abstract: Single-photon avalanche diode (SPAD) based transient imaging suffers from an aberration called pile-up. When multiple photons arrive within a single repetition period of the illuminating laser, the SPAD records only the arrival of the first photon; this leads to a bias in the recorded light transient wherein the transient response at later time-instants is under-estimated. An unfortunate consequence of this is the need to operate the illumination at low-power levels to reduce the probability of multiple photons returning in a single period. Operating the laser at low power results in either low signal-to-noise ratio (SNR) in the measured transients or reduced frame rate due to longer exposure durations to achieve a high SNR. In this paper, we propose a signal processing-based approach to compensate pile-up in post-processing, thereby enabling high power operation of the illuminating laser. While increasing illumination does cause a fundamental information loss in the data captured by SPAD, we quantify this information loss using Cramer-Rao bound and show that the errors in our framework are only limited to this information loss. We experimentally validate our hypotheses using real data from a lab prototype.

Submitted 14 June, 2018; originally announced June 2018.

Comments: 17 pages, 11 figures

arXiv:1805.10664 [pdf, other]

doi 10.1145/3272127.3275015

Towards Multifocal Displays with Dense Focal Stacks

Authors: Jen-Hao Rick Chang, B. V. K. Vijaya Kumar, Aswin C. Sankaranarayanan

Abstract: We present a virtual reality display that is capable of generating a dense collection of depth/focal planes. This is achieved by driving a focus-tunable lens to sweep a range of focal lengths at a high frequency and, subsequently, tracking the focal length precisely at microsecond time resolutions using an optical module. Precise tracking of the focal length, coupled with a high-speed display, enables our lab prototype to generate 1600 focal planes per second. This enables a novel first-of-its-kind virtual reality multifocal display that is capable of resolving the vergence-accommodation conflict endemic to today's displays.

Submitted 22 September, 2018; v1 submitted 27 May, 2018; originally announced May 2018.

arXiv:1801.09343 [pdf, other]

doi 10.1145/3345553

KRISM --- Krylov Subspace-based Optical Computing of Hyperspectral Images

Authors: Vishwanath Saragadam, Aswin C. Sankaranarayanan

Abstract: We present an adaptive imaging technique that optically computes a low-rank approximation of a scene's hyperspectral image, conceptualized as a matrix. Central to the proposed technique is the optical implementation of two measurement operators: a spectrally-coded imager and a spatially-coded spectrometer. By iterating between the two operators, we show that the top singular vectors and singular values of a hyperspectral image can be adaptively and optically computed with only a few iterations. We present an optical design that uses pupil plane coding for implementing the two operations and show several compelling results using a lab prototype to demonstrate the effectiveness of the proposed hyperspectral imager.

Submitted 21 October, 2019; v1 submitted 26 January, 2018; originally announced January 2018.

Comments: 14 pages of main paper and 15 pages of supplementary material

Journal ref: Vishwanath Saragadam and Aswin C. Sankaranarayanan, "KRISM --- Krylov Subspace-based Optical Computing of Hyperspectral Images", ACM Trans. Graphics 38, 5 (2019), 148:1-14

arXiv:1704.05564 [pdf, other]

Illuminant Spectra-based Source Separation Using Flash Photography

Authors: Zhuo Hui, Kalyan Sunkavalli, Sunil Hadap, Aswin C. Sankaranarayanan

Abstract: Real-world lighting often consists of multiple illuminants with different spectra. Separating and manipulating these illuminants in post-process is a challenging problem that requires either significant manual input or calibrated scene geometry and lighting. In this work, we leverage a flash/no-flash image pair to analyze and edit scene illuminants based on their spectral differences. We derive a novel physics-based relationship between color variations in the observed flash/no-flash intensities and the spectra and surface shading corresponding to individual scene illuminants. Our technique uses this constraint to automatically separate an image into constituent images lit by each illuminant. This separation can be used to support applications like white balancing, lighting editing, and RGB photometric stereo, where we demonstrate results that outperform state-of-the-art techniques on a wide range of images.

Submitted 26 November, 2017; v1 submitted 18 April, 2017; originally announced April 2017.

arXiv:1703.09912 [pdf, other]

One Network to Solve Them All --- Solving Linear Inverse Problems using Deep Projection Models

Authors: J. H. Rick Chang, Chun-Liang Li, Barnabas Poczos, B. V. K. Vijaya Kumar, Aswin C. Sankaranarayanan

Abstract: While deep learning methods have achieved state-of-the-art performance in many challenging inverse problems like image inpainting and super-resolution, they invariably involve problem-specific training of the networks. Under this approach, different problems require different networks. In scenarios where we need to solve a wide variety of problems, e.g., on a mobile camera, it is inefficient and costly to use these specially-trained networks. On the other hand, traditional methods using signal priors can be used in all linear inverse problems but often have worse performance on challenging tasks. In this work, we provide a middle ground between the two kinds of methods --- we propose a general framework to train a single deep neural network that solves arbitrary linear inverse problems. The proposed network acts as a proximal operator for an optimization algorithm and projects non-image signals onto the set of natural images defined by the decision boundary of a classifier. In our experiments, the proposed framework demonstrates superior performance over traditional methods using a wavelet sparsity prior and achieves comparable performance of specially-trained networks on tasks including compressive sensing and pixel-wise inpainting.

Submitted 29 March, 2017; originally announced March 2017.

ACM Class: I.4.5

arXiv:1512.05278 [pdf, other]

Shape and Spatially-Varying Reflectance Estimation From Virtual Exemplars

Authors: Zhuo Hui, Aswin C Sankaranarayanan

Abstract: This paper addresses the problem of estimating the shape of objects that exhibit spatially-varying reflectance. We assume that multiple images of the object are obtained under a fixed view-point and varying illumination, i.e., the setting of photometric stereo. At the core of our techniques is the assumption that the BRDF at each pixel lies in the non-negative span of a known BRDF dictionary. This assumption enables a per-pixel surface normal and BRDF estimation framework that is computationally tractable and requires no initialization in spite of the underlying problem being non-convex. Our estimation framework first solves for the surface normal at each pixel using a variant of example-based photometric stereo. We design an efficient multi-scale search strategy for estimating the surface normal and subsequently, refine this estimate using a gradient descent procedure. Given the surface normal estimate, we solve for the spatially-varying BRDF by constraining the BRDF at each pixel to be in the span of the BRDF dictionary; here, we use additional priors to further regularize the solution. A hallmark of our approach is that it does not require iterative optimization techniques nor the need for careful initialization, both of which are endemic to most state-of-the-art techniques. We showcase the performance of our technique on a wide range of simulated and real scenes where we outperform competing methods.

Submitted 20 September, 2016; v1 submitted 16 December, 2015; originally announced December 2015.

Comments: PAMI minor revision. arXiv admin note: substantial text overlap with arXiv:1503.04265

arXiv:1511.05174 [pdf, other]

doi 10.1109/TIP.2018.2869719

Cross-scale predictive dictionaries

Authors: Vishwanath Saragadam, Xin Li, Aswin Sankaranarayanan

Abstract: Sparse representations using data dictionaries provide an efficient model, particularly for signals that do not enjoy alternate analytic sparsifying transformations. However, solving inverse problems with sparsifying dictionaries can be computationally expensive, especially when the dictionary under consideration has a large number of atoms. In this paper, we incorporate additional structure onto dictionary-based sparse representations for visual signals to enable speedups when solving sparse approximation problems. The specific structure that we endow onto sparse models is that of multi-scale modeling, where the sparse representation at each scale is constrained by the sparse representation at coarser scales. We show that this cross-scale predictive model delivers significant speedups, often in the range of 10-60$\times$, with little loss in accuracy for linear inverse problems associated with images, videos, and light fields.

Submitted 3 September, 2018; v1 submitted 16 November, 2015; originally announced November 2015.

Comments: 12 pages

arXiv:1509.00116 [pdf, other]

FlatCam: Thin, Bare-Sensor Cameras using Coded Aperture and Computation

Authors: M. Salman Asif, Ali Ayremlou, Aswin Sankaranarayanan, Ashok Veeraraghavan, Richard Baraniuk

Abstract: FlatCam is a thin form-factor lensless camera that consists of a coded mask placed on top of a bare, conventional sensor array. Unlike a traditional, lens-based camera where an image of the scene is directly recorded on the sensor pixels, each pixel in FlatCam records a linear combination of light from multiple scene elements. A computational algorithm is then used to demultiplex the recorded measurements and reconstruct an image of the scene. FlatCam is an instance of a coded aperture imaging system; however, unlike the vast majority of related work, we place the coded mask extremely close to the image sensor, which enables a thin system. We employ a separable mask to ensure that both calibration and image reconstruction are scalable in terms of memory requirements and computational complexity. We demonstrate the potential of the FlatCam design using two prototypes: one at visible wavelengths and one at infrared wavelengths.

Submitted 27 January, 2016; v1 submitted 31 August, 2015; originally announced September 2015.

Comments: 12 pages, 10 figures

arXiv:1504.04085 [pdf, other]

FPA-CS: Focal Plane Array-based Compressive Imaging in Short-wave Infrared

Authors: Huaijin Chen, M. Salman Asif, Aswin C. Sankaranarayanan, Ashok Veeraraghavan

Abstract: Cameras for imaging in short and mid-wave infrared spectra are significantly more expensive than their counterparts in visible imaging. As a result, high-resolution imaging in those spectra remains beyond the reach of most consumers. Over the last decade, compressive sensing (CS) has emerged as a potential means to realize inexpensive short-wave infrared cameras. One approach for doing this is the single-pixel camera (SPC) where a single detector acquires coded measurements of a high-resolution image. A computational reconstruction algorithm is then used to recover the image from these coded measurements. Unfortunately, the measurement rate of an SPC is insufficient to enable imaging at high spatial and temporal resolutions. We present a focal plane array-based compressive sensing (FPA-CS) architecture that achieves high spatial and temporal resolutions. The idea is to use an array of SPCs that sense in parallel to increase the measurement rate, and consequently, the achievable spatio-temporal resolution of the camera. We develop a proof-of-concept prototype in the short-wave infrared using a sensor with $64\times 64$ pixels; the prototype provides a $4096\times$ increase in the measurement rate compared to the SPC and achieves a megapixel resolution at video rate using CS techniques.

Submitted 15 April, 2015; originally announced April 2015.

Comments: appears in IEEE Conf. Computer Vision and Pattern Recognition, 2015

arXiv:1503.04267 [pdf, other]

LiSens --- A Scalable Architecture for Video Compressive Sensing

Authors: Jian Wang, Mohit Gupta, Aswin C. Sankaranarayanan

Abstract: The measurement rate of cameras that take spatially multiplexed measurements by using spatial light modulators (SLM) is often limited by the switching speed of the SLMs. This is especially true for single-pixel cameras where the photodetector operates at a rate that is many orders-of-magnitude greater than the SLM. We study the factors that determine the measurement rate for such spatial multiplexing cameras (SMC) and show that increasing the number of pixels in the device improves the measurement rate, but there is an optimum number of pixels (typically a few thousand) beyond which the measurement rate does not increase. This motivates the design of LiSens, a novel imaging architecture that replaces the photodetector in the single-pixel camera with a 1D linear array or a line-sensor. We illustrate the optical architecture underlying LiSens, build a prototype, and demonstrate results on a range of indoor and outdoor scenes. LiSens delivers on the promise of SMCs: imaging at a megapixel resolution, at video rate, using an inexpensive low-resolution sensor.

Submitted 14 March, 2015; originally announced March 2015.

Comments: IEEE Intl. Conf. Computational Photography, 2015

arXiv:1503.04265 [pdf, other]

A Dictionary-based Approach for Estimating Shape and Spatially-Varying Reflectance

Authors: Zhuo Hui, Aswin C. Sankaranarayanan

Abstract: We present a technique for estimating the shape and reflectance of an object in terms of its surface normals and spatially-varying BRDF. We assume that multiple images of the object are obtained under fixed view-point and varying illumination, i.e., the setting of photometric stereo. Assuming that the BRDF at each pixel lies in the non-negative span of a known BRDF dictionary, we derive a per-pixel surface normal and BRDF estimation framework that requires neither iterative optimization techniques nor careful initialization, both of which are endemic to most state-of-the-art techniques. We showcase the performance of our technique on a wide range of simulated and real scenes where we outperform competing methods.

Submitted 14 March, 2015; originally announced March 2015.

Comments: IEEE Intl. Conf. Computational Photography, 2015

arXiv:1503.03231 [pdf, ps, other]

Adaptive-Rate Sparse Signal Reconstruction With Application in Compressive Background Subtraction

Authors: Joao F. C. Mota, Nikos Deligiannis, Aswin C. Sankaranarayanan, Volkan Cevher, Miguel R. D. Rodrigues

Abstract: We propose and analyze an online algorithm for reconstructing a sequence of signals from a limited number of linear measurements. The signals are assumed sparse, with unknown support, and evolve over time according to a generic nonlinear dynamical model. Our algorithm, based on recent theoretical results for $\ell_1$-$\ell_1$ minimization, is recursive and computes the number of measurements to be taken at each time on-the-fly. As an example, we apply the algorithm to compressive video background subtraction, a problem that can be stated as follows: given a set of measurements of a sequence of images with a static background, simultaneously reconstruct each image while separating its foreground from the background. The performance of our method is illustrated on sequences of real images: we observe that it allows a dramatic reduction in the number of measurements with respect to state-of-the-art compressive background subtraction schemes.

Submitted 11 March, 2015; originally announced March 2015.

Comments: submitted to IEEE Trans. Signal Processing

arXiv:1503.02727 [pdf, other]

Video Compressive Sensing for Spatial Multiplexing Cameras using Motion-Flow Models

Authors: Aswin C. Sankaranarayanan, Lina Xu, Christoph Studer, Yun Li, Kevin Kelly, Richard G. Baraniuk

Abstract: Spatial multiplexing cameras (SMCs) acquire a (typically static) scene through a series of coded projections using a spatial light modulator (e.g., a digital micro-mirror device) and a few optical sensors. This approach finds use in imaging applications where full-frame sensors are either too expensive (e.g., for short-wave infrared wavelengths) or unavailable. Existing SMC systems reconstruct static scenes using techniques from compressive sensing (CS). For videos, however, existing acquisition and recovery methods deliver poor quality. In this paper, we propose the CS multi-scale video (CS-MUVI) sensing and recovery framework for high-quality video acquisition and recovery using SMCs. Our framework features novel sensing matrices that enable the efficient computation of a low-resolution video preview, while enabling high-resolution video recovery using convex optimization. To further improve the quality of the reconstructed videos, we extract optical-flow estimates from the low-resolution previews and impose them as constraints in the recovery procedure. We demonstrate the efficacy of our CS-MUVI framework for a host of synthetic and real measured SMC video data, and we show that high-quality videos can be recovered at roughly $60\times$ compression.

Submitted 5 August, 2015; v1 submitted 9 March, 2015; originally announced March 2015.

Comments: in SIAM Journal on Imaging Sciences, 2015

arXiv:1401.7715 [pdf, other]

Video Compressive Sensing for Dynamic MRI

Authors: Jianing V. Shi, Wotao Yin, Aswin C. Sankaranarayanan, Richard G. Baraniuk

Abstract: We present a video compressive sensing framework, termed kt-CSLDS, to accelerate the image acquisition process of dynamic magnetic resonance imaging (MRI). We are inspired by a state-of-the-art model for video compressive sensing that utilizes a linear dynamical system (LDS) to model the motion manifold. Given compressive measurements, the state sequence of an LDS can be first estimated using system identification techniques. We then reconstruct the observation matrix using a joint structured sparsity assumption. In particular, we minimize an objective function with a mixture of wavelet sparsity and joint sparsity within the observation matrix. We derive an efficient convex optimization algorithm through alternating direction method of multipliers (ADMM), and provide a theoretical guarantee for global convergence. We demonstrate the performance of our approach for video compressive sensing, in terms of reconstruction accuracy. We also investigate the impact of various sampling strategies. We apply this framework to accelerate the acquisition process of dynamic MRI and show it achieves the best reconstruction accuracy with the least computational time compared with existing algorithms in the literature.

Submitted 1 February, 2014; v1 submitted 29 January, 2014; originally announced January 2014.

Comments: 30 pages, 9 figures

MSC Class: 90-08; 90C25; 65P99; 65K10; 93E10; 93E12

arXiv:1311.4041 [pdf, ps, other]

The Mean Square of Divisor Function

Authors: Chaohua Jia, Ayyadurai Sankaranarayanan

Abstract: Let $d(n)$ be the divisor function. In 1916, S. Ramanujan stated without proof that $$\sum_{n\leq x}d^2(n)=xP(\log x)+E(x),$$ where $P(y)$ is a cubic polynomial in $y$ and $$E(x)=O(x^{{3\over 5}+\varepsilon}),$$ where $\varepsilon$ is a sufficiently small positive constant. He also stated that, assuming the Riemann Hypothesis (RH), $$E(x)=O(x^{{1\over 2}+\varepsilon}).$$ In 1922, B. M. Wilson proved the above result unconditionally. The direct application of the RH would produce $$E(x)=O(x^{1\over 2}(\log x)^5\log\log x).$$ In 2003, K. Ramachandra and A. Sankaranarayanan proved the above result without any assumption. In this paper, we shall prove $$E(x)=O(x^{1\over 2}(\log x)^5).$$

Submitted 22 March, 2014; v1 submitted 16 November, 2013; originally announced November 2013.

Comments: This is a revised version
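
For readers who want to experiment with the quantity in this abstract, the left-hand side $\sum_{n\leq x}d^2(n)$ can be tabulated directly for small $x$. The following is a minimal sketch, not taken from the paper: the function names `divisor_counts` and `mean_square_sum` are hypothetical, and the sieve is a plain $O(x\log x)$ brute force with no connection to the analytic methods the authors use.

```python
def divisor_counts(limit):
    """Return a list d where d[n] is the number of divisors of n, for 0 <= n <= limit."""
    d = [0] * (limit + 1)
    # Each i contributes one divisor to every multiple of i.
    for i in range(1, limit + 1):
        for multiple in range(i, limit + 1, i):
            d[multiple] += 1
    return d


def mean_square_sum(x):
    """Return the sum of d(n)^2 over 1 <= n <= x (hypothetical helper name)."""
    d = divisor_counts(x)
    return sum(value * value for value in d[1:])


if __name__ == "__main__":
    for x in (10, 100, 1000):
        print(x, mean_square_sum(x))
```

Comparing these values against $xP(\log x)$ would additionally require the coefficients of the cubic $P$, which the abstract does not state.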

arXiv:1309.7567 [pdf, ps, other]

On the properties of a sequence concerning binomial coefficients

Authors: Daeyeoul Kim, Ayyadurai Sankaranarayanan, Zhi-Hong Sun

Abstract: For $n\ge 3$ let $f(n)$ be the least positive integer $k$ such that $\binom nk>\frac{2^n}{n+1}$. In this paper we investigate the properties of $f(n)$.

Submitted 29 September, 2013; originally announced September 2013.

Comments: 7 pages

MSC Class: 11B65; 11B83; 05A20
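
The definition of $f(n)$ in this abstract is concrete enough to compute by direct search. The sketch below is an illustration only, not code from the paper; the function name `f` simply mirrors the abstract's notation. It avoids floating point by testing the equivalent integer inequality $(n+1)\binom nk > 2^n$.

```python
from math import comb


def f(n):
    """Least positive integer k with C(n, k) > 2**n / (n + 1), for n >= 3."""
    if n < 3:
        raise ValueError("f(n) is defined here only for n >= 3")
    for k in range(1, n + 1):
        # Integer form of C(n, k) > 2**n / (n + 1), exact for any n.
        if comb(n, k) * (n + 1) > 2 ** n:
            return k
    # Not reached: the largest binomial coefficient exceeds the average
    # 2**n / (n + 1) of the n + 1 coefficients whenever n >= 2.
    raise AssertionError("no k found")


if __name__ == "__main__":
    print([f(n) for n in range(3, 21)])
```

The search always terminates because the $n+1$ binomial coefficients sum to $2^n$ and are not all equal, so the central one is strictly larger than their mean.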

arXiv:1303.4778 [pdf, other]

Greedy Feature Selection for Subspace Clustering

Authors: Eva L. Dyer, Aswin C. Sankaranarayanan, Richard G. Baraniuk

Abstract: Unions of subspaces provide a powerful generalization to linear subspace models for collections of high-dimensional data. To learn a union of subspaces from a collection of data, sets of signals in the collection that belong to the same subspace must be identified in order to obtain accurate estimates of the subspace structures present in the data. Recently, sparse recovery methods have been shown to provide a provable and robust strategy for exact feature selection (EFS)--recovering subsets of points from the ensemble that live in the same subspace. In parallel with recent studies of EFS with L1-minimization, in this paper, we develop sufficient conditions for EFS with a greedy method for sparse signal recovery known as orthogonal matching pursuit (OMP). Following our analysis, we provide an empirical study of feature selection strategies for signals living on unions of subspaces and characterize the gap between sparse recovery methods and nearest neighbor (NN)-based approaches. In particular, we demonstrate that sparse recovery methods provide significant advantages over NN methods and the gap between the two approaches is particularly pronounced when the sampling of subspaces in the dataset is sparse. Our results suggest that OMP may be employed to reliably recover exact feature sets in a number of regimes where NN approaches fail to reveal the subspace membership of points in the ensemble.

Submitted 3 July, 2013; v1 submitted 19 March, 2013; originally announced March 2013.

Comments: 32 pages, 7 figures, 1 table

Journal ref: Journal of Machine Learning Research, Vol.14, Issue 1, pp. 2487-2517, January 2013

arXiv:1201.4895 [pdf, other]

Compressive Acquisition of Dynamic Scenes

Authors: Aswin C Sankaranarayanan, Pavan K Turaga, Rama Chellappa, Richard G Baraniuk

Abstract: Compressive sensing (CS) is a new approach for the acquisition and recovery of sparse signals and images that enables sampling rates significantly below the classical Nyquist rate. Despite significant progress in the theory and methods of CS, little headway has been made in compressive video acquisition and recovery. Video CS is complicated by the ephemeral nature of dynamic events, which makes direct extensions of standard CS imaging architectures and signal models difficult. In this paper, we develop a new framework for video CS for dynamic textured scenes that models the evolution of the scene as a linear dynamical system (LDS). This reduces the video recovery problem to first estimating the model parameters of the LDS from compressive measurements, and then reconstructing the image frames. We exploit the low-dimensional dynamic parameters (the state sequence) and high-dimensional static parameters (the observation matrix) of the LDS to devise a novel compressive measurement strategy that measures only the dynamic part of the scene at each instant and accumulates measurements over time to estimate the static parameters. This enables us to lower the compressive measurement rate considerably. We validate our approach with a range of experiments involving video recovery, sensing hyper-spectral data, and classification of dynamic scenes from compressive data. Together, these applications demonstrate the effectiveness of the approach.

Submitted 26 June, 2013; v1 submitted 23 January, 2012; originally announced January 2012.

Showing 1–50 of 51 results for author: Sankaranarayanan, A