-
WB LUTs: Contrastive Learning for White Balancing Lookup Tables
Authors:
Sai Kumar Reddy Manne,
Michael Wan
Abstract:
Automatic white balancing (AWB), one of the first steps in an integrated signal processing (ISP) pipeline, aims to correct the color cast induced by the scene illuminant. An incorrect white balance (WB) setting or AWB failure can lead to an undesired blue or red tint in the rendered sRGB image. To address this, recent methods pose the post-capture WB correction problem as an image-to-image translation task and train deep neural networks to learn the necessary color adjustments at a lower resolution. These low resolution outputs are post-processed to generate high resolution WB corrected images, forming a bottleneck in the end-to-end run time. In this paper we present a 3D Lookup Table (LUT) based WB correction model called WB LUTs that can generate high resolution outputs in real time. We introduce a contrastive learning framework with a novel hard sample mining strategy, which improves the WB correction quality of baseline 3D LUTs by 25.5%. Experimental results demonstrate that the proposed WB LUTs perform competitively against state-of-the-art models on two benchmark datasets while being 300 times faster using 12.7 times less memory. Our model and code are available at https://github.com/skrmanne/3DLUT_sRGB_WB.
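As a rough illustration of why a 3D LUT is cheap to apply at full resolution, the sketch below maps each pixel through a LUT using trilinear interpolation. It is not the authors' implementation: the function names `identity_lut` and `apply_lut`, the grid sizes, and the use of NumPy are assumptions made for illustration, and the learned WB LUTs themselves are not reproduced here.

```python
import numpy as np

def identity_lut(size: int = 17) -> np.ndarray:
    """Build a size^3 LUT that maps every RGB value to itself (size >= 2)."""
    grid = np.linspace(0.0, 1.0, size)
    r, g, b = np.meshgrid(grid, grid, grid, indexing="ij")
    return np.stack([r, g, b], axis=-1)  # shape (size, size, size, 3)

def apply_lut(image: np.ndarray, lut: np.ndarray) -> np.ndarray:
    """Apply a 3D LUT to an HxWx3 image in [0, 1] by trilinear interpolation."""
    size = lut.shape[0]
    pos = np.clip(image, 0.0, 1.0) * (size - 1)           # continuous grid coordinates
    lo = np.minimum(np.floor(pos).astype(int), size - 2)  # lower corner of the enclosing cell
    frac = pos - lo                                       # offset inside the cell, in [0, 1]
    out = np.zeros(image.shape, dtype=float)
    # Blend the 8 corners of the enclosing cell, weighted by proximity.
    for dr in (0, 1):
        for dg in (0, 1):
            for db in (0, 1):
                wr = frac[..., 0] if dr else 1.0 - frac[..., 0]
                wg = frac[..., 1] if dg else 1.0 - frac[..., 1]
                wb = frac[..., 2] if db else 1.0 - frac[..., 2]
                corner = lut[lo[..., 0] + dr, lo[..., 1] + dg, lo[..., 2] + db]
                out += (wr * wg * wb)[..., None] * corner
    return out
```

The per-pixel cost is a fixed eight-corner blend that does not depend on network depth, which is consistent with the real-time, high-resolution claim in the abstract. A trained model would supply the LUT contents in place of `identity_lut`.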
Submitted 15 April, 2024;
originally announced April 2024.
-
NOISe: Nuclei-Aware Osteoclast Instance Segmentation for Mouse-to-Human Domain Transfer
Authors:
Sai Kumar Reddy Manne,
Brendan Martin,
Tyler Roy,
Ryan Neilson,
Rebecca Peters,
Meghana Chillara,
Christine W. Lary,
Katherine J. Motyl,
Michael Wan
Abstract:
Osteoclast cell image analysis plays a key role in osteoporosis research, but it typically involves extensive manual image processing and hand annotations by a trained expert. In the last few years, a handful of machine learning approaches for osteoclast image analysis have been developed, but none have addressed the full instance segmentation task required to produce the same output as that of the human expert led process. Furthermore, none of the prior, fully automated algorithms have publicly available code, pretrained models, or annotated datasets, inhibiting reproduction and extension of their work. We present a new dataset with ~2*10^5 expert annotated mouse osteoclast masks, together with a deep learning instance segmentation method which works for both in vitro mouse osteoclast cells on plastic tissue culture plates and human osteoclast cells on bone chips. To our knowledge, this is the first work to automate the full osteoclast instance segmentation task. Our method achieves a performance of 0.82 mAP_0.5 (mean average precision at intersection-over-union threshold of 0.5) in cross validation for mouse osteoclasts. We present a novel nuclei-aware osteoclast instance segmentation training strategy (NOISe) based on the unique biology of osteoclasts, to improve the model's generalizability and boost the mAP_0.5 from 0.60 to 0.82 on human osteoclasts. We publish our annotated mouse osteoclast image dataset, instance segmentation models, and code at github.com/michaelwwan/noise to enable reproducibility and to provide a public tool to accelerate osteoporosis research.
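For readers unfamiliar with the mAP_0.5 metric quoted above, the sketch below shows the intersection-over-union test that decides whether a predicted mask counts as a match at the 0.5 threshold. It is a generic illustration, not code from the NOISe repository; the function names `mask_iou` and `is_match` are hypothetical, and the full mean-average-precision computation (ranking by confidence and averaging over classes) is omitted.

```python
import numpy as np

def mask_iou(pred, truth) -> float:
    """Intersection-over-union of two binary masks of equal shape."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    union = np.logical_or(pred, truth).sum()
    if union == 0:
        return 0.0  # both masks empty: treated here as no overlap (a convention, not a standard)
    return float(np.logical_and(pred, truth).sum() / union)

def is_match(pred, truth, threshold: float = 0.5) -> bool:
    """A prediction counts as a true positive when its IoU reaches the threshold."""
    return mask_iou(pred, truth) >= threshold
```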
Submitted 15 April, 2024;
originally announced April 2024.
-
Subtle Signals: Video-based Detection of Infant Non-nutritive Sucking as a Neurodevelopmental Cue
Authors:
Shaotong Zhu,
Michael Wan,
Sai Kumar Reddy Manne,
Emily Zimmerman,
Sarah Ostadabbas
Abstract:
Non-nutritive sucking (NNS), which refers to the act of sucking on a pacifier, finger, or similar object without nutrient intake, plays a crucial role in assessing healthy early development. In the case of preterm infants, NNS behavior is a key component in determining their readiness for feeding. In older infants, the characteristics of NNS behavior offer valuable insights into neural and motor development. Additionally, NNS activity has been proposed as a potential safeguard against sudden infant death syndrome (SIDS). However, the clinical application of NNS assessment is currently hindered by labor-intensive and subjective finger-in-mouth evaluations. Consequently, researchers often resort to expensive pressure transducers for objective NNS signal measurement. To enhance the accessibility and reliability of NNS signal monitoring for both clinicians and researchers, we introduce a vision-based algorithm designed for non-contact detection of NNS activity using baby monitor footage in natural settings. Our approach involves a comprehensive exploration of optical flow and temporal convolutional networks, enabling the detection and amplification of subtle infant-sucking signals. We successfully classify short video clips of uniform length into NNS and non-NNS periods. Furthermore, we investigate manual and learning-based techniques to piece together local classification results, facilitating the segmentation of longer mixed-activity videos into NNS and non-NNS segments of varying duration. Our research introduces two novel datasets of annotated infant videos, including one sourced from our clinical study featuring 19 infant subjects and 183 hours of overnight baby monitor footage.
Submitted 24 October, 2023;
originally announced October 2023.
-
Automatic Infant Respiration Estimation from Video: A Deep Flow-based Algorithm and a Novel Public Benchmark
Authors:
Sai Kumar Reddy Manne,
Shaotong Zhu,
Sarah Ostadabbas,
Michael Wan
Abstract:
Respiration is a critical vital sign for infants, and continuous respiratory monitoring is particularly important for newborns. However, neonates are sensitive and contact-based sensors present challenges in comfort, hygiene, and skin health, especially for preterm babies. As a step toward fully automatic, continuous, and contactless respiratory monitoring, we develop a deep-learning method for estimating respiratory rate and waveform from plain video footage in natural settings. Our automated infant respiration flow-based network (AIRFlowNet) combines video-extracted optical flow input and spatiotemporal convolutional processing tuned to the infant domain. We support our model with the first public annotated infant respiration dataset with 125 videos (AIR-125), drawn from eight infant subjects, set in varied pose, lighting, and camera conditions. We include manual respiration annotations and optimize AIRFlowNet training on them using a novel spectral bandpass loss function. When trained and tested on the AIR-125 infant data, our method significantly outperforms other state-of-the-art methods in respiratory rate estimation, achieving a mean absolute error of $\sim$2.9 breaths per minute, compared to $\sim$4.7--6.2 for other public models designed for adult subjects and more uniform environments.
Submitted 24 July, 2023;
originally announced July 2023.
-
Design Space Exploration and Optimization for Carbon-Efficient Extended Reality Systems
Authors:
Mariam Elgamal,
Doug Carmean,
Elnaz Ansari,
Okay Zed,
Ramesh Peri,
Srilatha Manne,
Udit Gupta,
Gu-Yeon Wei,
David Brooks,
Gage Hills,
Carole-Jean Wu
Abstract:
As computing hardware becomes more specialized, designing environmentally sustainable computing systems requires accounting for both hardware and software parameters. Our goal is to design low carbon computing systems while maintaining a competitive level of performance and operational efficiency. Despite previous carbon modeling efforts for computing systems, there is a distinct lack of holistic design strategies to simultaneously optimize for carbon, performance, power and energy. In this work, we take a data-driven approach to characterize the carbon impact (quantified in units of CO2e) of various artificial intelligence (AI) and extended reality (XR) production-level hardware and application use-cases. We propose a holistic design exploration framework to optimize and design for carbon-efficient computing systems and hardware. Our framework identifies significant opportunities for carbon efficiency improvements in application-specific and general purpose hardware design and optimization. Using our framework, we demonstrate 10$\times$ carbon efficiency improvement for specialized AI and XR accelerators (quantified by a key metric, tCDP: the product of total CO2e and total application execution time), up to 21% total life cycle carbon savings for existing general-purpose hardware and applications due to hardware over-provisioning, and up to 7.86$\times$ carbon efficiency improvement using advanced 3D integration techniques for resource-constrained XR systems.
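The tCDP metric is defined in the abstract as the product of total CO2e and total application execution time. A minimal sketch of that arithmetic follows; the units (kilograms and seconds), the function names, and the reading of "improvement" as a ratio of baseline to candidate tCDP are assumptions for illustration, not details taken from the paper.

```python
def tcdp(total_co2e_kg: float, exec_time_s: float) -> float:
    """Total carbon-delay product: total CO2e times total execution time (lower is better)."""
    return total_co2e_kg * exec_time_s

def improvement(baseline_tcdp: float, candidate_tcdp: float) -> float:
    """Assumed definition: how many times smaller the candidate's tCDP is than the baseline's."""
    return baseline_tcdp / candidate_tcdp
```

Under these assumptions, a design that cuts emissions from 10 kg to 2 kg and halves the run time from 5 s to 2.5 s would score as a 10x improvement, since both factors multiply into the metric.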
Submitted 2 May, 2023;
originally announced May 2023.
-
Socio-Technological Challenges and Opportunities: Paths Forward
Authors:
Carole-Jean Wu,
Srilatha Manne,
Parthasarathy Ranganathan,
Sarah Bird,
Shane Greenstein
Abstract:
Advancements in digital technologies have a bootstrapping effect. The past fifty years of technological innovations from the computer architecture community have brought innovations and orders-of-magnitude efficiency improvements that engender use cases that were not previously possible -- stimulating novel application domains and increasing uses and deployments at an ever-faster pace. Consequently, computing technologies have fueled significant economic growth, creating education opportunities, enabling access to a wider and more diverse spectrum of information, and, at the same time, connecting people of differing needs in the world together. Technology must be offered that is inclusive of the world's physical, cultural, and economic diversity, and which is manufactured, used, and recycled with environmental sustainability at the forefront. For the next decades to come, we envision significant cross-disciplinary efforts to build a circular development cycle by placing pervasive connectivity, sustainability, and demographic inclusion at the design forefront in order to sustain and expand the benefits of a technologically rich society. We hope this work will inspire our computing community to take broader and more holistic approaches when developing technological solutions to serve people from different parts of the world.
Submitted 15 August, 2021;
originally announced August 2021.
-
Efficient Screening of Diseased Eyes based on Fundus Autofluorescence Images using Support Vector Machine
Authors:
Shanmukh Reddy Manne,
Kiran Kumar Vupparaboina,
Gowtham Chowdary Gudapati,
Ram Anudeep Peddoju,
Chandra Prakash Konkimalla,
Abhilash Goud,
Sarforaz Bin Bashar,
Jay Chhablani,
Soumya Jana
Abstract:
A variety of vision ailments are associated with geographic atrophy (GA) in the foveal region of the eye. In current clinical practice, the ophthalmologist manually detects the potential presence of such GA based on fundus autofluorescence (FAF) images, and hence diagnoses the disease, when relevant. However, in view of the general scarcity of ophthalmologists relative to the large number of subjects seeking eyecare, especially in remote regions, it becomes imperative to develop methods to direct expert time and effort to medically significant cases. Further, subjects from either disadvantaged backgrounds or remote localities, who face considerable economic/physical barriers in consulting trained ophthalmologists, tend to seek medical attention only after being reasonably certain that an adverse condition exists. To serve the interest of both the ophthalmologist and the potential patient, we plan a screening step, where healthy and diseased eyes are algorithmically differentiated with limited input from only optometrists who are relatively more abundant in number. Specifically, an early treatment diabetic retinopathy study (ETDRS) grid is placed by an optometrist on each FAF image, based on which sectoral statistics are automatically collected. Using such statistics as features, healthy and diseased eyes are proposed to be classified by training an algorithm using available medical records. In this connection, we demonstrate the efficacy of support vector machines (SVM). Specifically, we consider SVM with linear as well as radial basis function (RBF) kernel, and observe satisfactory performance of both variants. Among those, we recommend the latter in view of its slight superiority in terms of classification accuracy (90.55% at a standard training-to-test ratio of 80:20), and practical class-conditional costs.
Submitted 17 April, 2021;
originally announced April 2021.
-
A Scalable Decoder Micro-architecture for Fault-Tolerant Quantum Computing
Authors:
Poulami Das,
Christopher A. Pattison,
Srilatha Manne,
Douglas Carmean,
Krysta Svore,
Moinuddin Qureshi,
Nicolas Delfosse
Abstract:
Quantum computation promises significant computational advantages over classical computation for some problems. However, quantum hardware suffers from much higher error rates than classical hardware. As a result, extensive quantum error correction is required to execute a useful quantum algorithm. The decoder is a key component of the error correction scheme whose role is to identify errors faster than they accumulate in the quantum computer and that must be implemented with minimum hardware resources in order to scale to the regime of practical applications. In this work, we consider surface code error correction, which is the most popular family of error correcting codes for quantum computing, and we design a decoder micro-architecture for the Union-Find decoding algorithm. We propose a three-stage fully pipelined hardware implementation of the decoder that significantly speeds up the decoder. Then, we optimize the amount of decoding hardware required to perform error correction simultaneously over all the logical qubits of the quantum computer. By sharing resources between logical qubits, we obtain a 67% reduction of the number of hardware units and the memory capacity is reduced by 70%. Moreover, we reduce the bandwidth required for the decoding process by a factor of at least 30x using low-overhead compression algorithms. Finally, we provide numerical evidence that our optimized micro-architecture can be executed fast enough to correct errors in a quantum computer.
Submitted 17 January, 2020;
originally announced January 2020.
-
Enabling Efficient Dynamic Resizing of Large DRAM Caches via A Hardware Consistent Hashing Mechanism
Authors:
Kevin K. Chang,
Gabriel H. Loh,
Mithuna Thottethodi,
Yasuko Eckert,
Mike O'Connor,
Srilatha Manne,
Lisa Hsu,
Lavanya Subramanian,
Onur Mutlu
Abstract:
Die-stacked DRAM has been proposed for use as a large, high-bandwidth, last-level cache with hundreds or thousands of megabytes of capacity. Not all workloads (or phases) can productively utilize this much cache space, however. Unfortunately, the unused (or under-used) cache continues to consume power due to leakage in the peripheral circuitry and periodic DRAM refresh. Dynamically adjusting the available DRAM cache capacity could largely eliminate this energy overhead. However, the current proposed DRAM cache organization introduces new challenges for dynamic cache resizing. The organization differs from a conventional SRAM cache organization because it places entire cache sets and their tags within a single bank to reduce on-chip area and power overhead. Hence, resizing a DRAM cache requires remapping sets from the powered-down banks to active banks.
In this paper, we propose CRUNCH (Cache Resizing Using Native Consistent Hashing), a hardware data remapping scheme inspired by consistent hashing, an algorithm originally proposed to uniformly and dynamically distribute Internet traffic across a changing population of web servers. CRUNCH provides a load-balanced remapping of data from the powered-down banks alone to the active banks, without requiring sets from all banks to be remapped, unlike naive schemes to achieve load balancing. CRUNCH remaps only sets from the powered-down banks, so it achieves this load balancing with low bank power-up/down transition latencies. CRUNCH's combination of good load balancing and low transition latencies provides a substrate to enable efficient DRAM cache resizing.
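The property the abstract relies on, that removing a bank remaps only the sets that lived on that bank, can be seen in a small software model of consistent hashing. The sketch below is a generic hash ring, not the CRUNCH hardware design: the class name `HashRing`, the use of SHA-256, the 64 virtual points per bank, and the `set-<n>` key naming are all assumptions chosen for illustration.

```python
import bisect
import hashlib

def _hash(key: str) -> int:
    """Map a string to a 64-bit position on the ring."""
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")

class HashRing:
    """Consistent-hash ring mapping cache-set names to active banks."""

    def __init__(self, banks, vnodes: int = 64):
        self.banks = list(banks)
        self.vnodes = vnodes
        # Each bank owns several pseudo-random points so load spreads evenly.
        points = sorted((_hash(f"{b}#{v}"), b) for b in self.banks for v in range(vnodes))
        self._hashes = [h for h, _ in points]
        self._owners = [b for _, b in points]

    def lookup(self, set_name: str):
        """Return the bank owning the first ring point clockwise of the key."""
        i = bisect.bisect_right(self._hashes, _hash(set_name)) % len(self._hashes)
        return self._owners[i]

    def without(self, bank):
        """Ring after powering down one bank; other banks keep their points."""
        return HashRing([b for b in self.banks if b != bank], self.vnodes)
```

For example, building `HashRing(range(8))` and then calling `.without(3)` leaves every set that was not on bank 3 where it was, while the sets from bank 3 are spread over the remaining banks according to where their ring positions fall.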
Submitted 1 February, 2016;
originally announced February 2016.
-
Fast Approximate Matrix Multiplication by Solving Linear Systems
Authors:
Shiva Manne,
Manjish Pal
Abstract:
In this paper, we present novel deterministic algorithms for multiplying two $n \times n$ matrices approximately. Given two matrices $A,B$ we return a matrix $C'$ which is an \emph{approximation} to $C = AB$. We consider the notion of approximate matrix multiplication in which the objective is to make the Frobenius norm of the error matrix $C-C'$ arbitrarily small. Our main contribution is to first reduce the matrix multiplication problem to solving a set of linear equations and then use standard techniques to find an approximate solution to that system in $\tilde{O}(n^2)$ time. To the best of our knowledge this is the first examination into designing quadratic time deterministic algorithms for approximate matrix multiplication which guarantee arbitrarily low \emph{absolute error} w.r.t. Frobenius norm.
Submitted 20 August, 2014; v1 submitted 19 August, 2014;
originally announced August 2014.