
Color-Guided Flying Pixel Correction in Depth Images

Authors: Ekamresh Vasudevan, Shashank N. Sridhara, Eduardo Pavez, Antonio Ortega, Raghavendra Singh, Srinath Kalluri

Abstract: We present a novel method to correct flying pixels within data captured by time-of-flight (ToF) sensors. Flying pixel (FP) artifacts occur when signals from foreground and background objects reach the same sensor pixel, leading to a confident yet incorrect depth estimate that floats in space between the two objects. Commercial RGB-D cameras have a complementary setup consisting of ToF sensors to capture depth in addition to RGB cameras. We propose a novel method to correct FPs by leveraging the aligned RGB and depth images in such RGB-D cameras to estimate the true depth values of FPs. Our method defines a 3D neighborhood around each point, representing a "field of view" that mirrors the acquisition process of ToF cameras. We propose a two-step iterative correction algorithm in which the FPs are first identified. Then, we estimate the true depth value of FPs by solving a least-squares optimization problem. Experimental results show that our proposed algorithm estimates the depth value of FPs as accurately as other algorithms in the literature.

Submitted 10 October, 2024; originally announced October 2024.

Comments: 6 pages, 7 figures, Presented at IEEE 26th International Workshop on Multimedia Signal Processing (MMSP)

arXiv:2408.03925

STARI: STarlight Acquisition and Reflection toward Interferometry

Authors: John D. Monnier, Prachet Jain, Shashank Kalluri, James Cutler, Simone D'Amico, Glenn Lightsey, Leonid Pogorelyuk, Gautam Vasisht, Kerri Cahoy, Michael Meyer

Abstract: We present the concept for STARI: STarlight Acquisition and Reflection toward Interferometry. If launched, STARI will be the first mission to control a 3-D CubeSat formation to the few mm-level, reflect starlight over 10s to 100s of meters from one spacecraft to another, control tip-tilt with sub-arcsecond stability, and validate end-to-end performance by injecting light into a single-mode fiber. While STARI is not an interferometer, the mission will advance the Technology Readiness Levels of the essential subsystems needed for a space interferometer in the near future.

Submitted 7 August, 2024; originally announced August 2024.

Comments: submitted to SPIE 2024 (Yokohama)

arXiv:2408.03911

Prospects for using drones to test formation-flying CubeSat concepts, and other astronomical applications

Authors: John D. Monnier, Prachet Jain, Mayra Gutierrez, Chi Han, Sara Hezi, Shashank Kalluri, Hirsh Kabaria, Brennan Kompas, Vaishnavi Harikumar, Julian Skifstad, Janani Peri, Emmanuel Hernandez, Ramya Bhaskarapanthula, James Cutler

Abstract: Drones provide a versatile platform for remote sensing and atmospheric studies. However, strict payload mass limits and intense vibrations have proven obstacles to adoption for astronomy. We present a concept for system-level testing of a long-baseline CubeSat space interferometer using drones, taking advantage of their cm-level xyz station-keeping, 6-DoF freedom of movement, large operational environment, access to guide stars for end-to-end testing of optical train and control algorithms, and comparable mass and power requirements. We have purchased two different drone platforms (Aurelia X6 Pro, Freefly Alta X) and present characterization studies of vibrations, flight stability, GPS positioning precision, and more. We also describe our progress in sub-system development, including inter-drone laser metrology, real-time gimbal control, and LED beacon tracking. Lastly, we explore whether custom-built drone-borne telescopes could be used for interferometry of bright objects over km-level baselines using vibration-isolation platforms and a small fast delay for fringe-tracking.

Submitted 7 August, 2024; originally announced August 2024.

Comments: submitted to SPIE 2024 (Yokohama)

arXiv:2406.09494

The Second DISPLACE Challenge: DIarization of SPeaker and LAnguage in Conversational Environments

Authors: Shareef Babu Kalluri, Prachi Singh, Pratik Roy Chowdhuri, Apoorva Kulkarni, Shikha Baghel, Pradyoth Hegde, Swapnil Sontakke, Deepak K T, S. R. Mahadeva Prasanna, Deepu Vijayasenan, Sriram Ganapathy

Abstract: The DIarization of SPeaker and LAnguage in Conversational Environments (DISPLACE) 2024 challenge is the second in the series of DISPLACE challenges, which involves tasks of speaker diarization (SD) and language diarization (LD) on a challenging multilingual conversational speech dataset. In the DISPLACE 2024 challenge, we also introduced the task of automatic speech recognition (ASR) on this dataset. The dataset, containing 158 hours of speech and consisting of both supervised and unsupervised mono-channel far-field recordings, was released for the LD and SD tracks. Further, 12 hours of close-field mono-channel recordings were provided for the ASR track, conducted on 5 Indian languages. The details of the dataset, baseline systems, and leaderboard results are highlighted in this paper. We have also compared our baseline models and the teams' performances on evaluation data of DISPLACE-2023 to emphasize the advancements made in this second version of the challenge.

Submitted 13 June, 2024; originally announced June 2024.

Comments: 5 pages, 3 figures, Interspeech 2024

arXiv:2402.04400

CEHR-GPT: Generating Electronic Health Records with Chronological Patient Timelines

Authors: Chao Pang, Xinzhuo Jiang, Nishanth Parameshwar Pavinkurve, Krishna S. Kalluri, Elise L. Minto, Jason Patterson, Linying Zhang, George Hripcsak, Gamze Gürsoy, Noémie Elhadad, Karthik Natarajan

Abstract: Synthetic Electronic Health Records (EHR) have emerged as a pivotal tool in advancing healthcare applications and machine learning models, particularly for researchers without direct access to healthcare data. Although existing methods, like rule-based approaches and generative adversarial networks (GANs), generate synthetic data that resembles real-world EHR data, these methods often use a tabular format, disregarding temporal dependencies in patient histories and limiting data replication. Recently, there has been a growing interest in leveraging Generative Pre-trained Transformers (GPT) for EHR data. This enables applications like disease progression analysis, population estimation, counterfactual reasoning, and synthetic data generation. In this work, we focus on synthetic data generation and demonstrate the capability of training a GPT model using a particular patient representation derived from CEHR-BERT, enabling us to generate patient sequences that can be seamlessly converted to the Observational Medical Outcomes Partnership (OMOP) data format.

Submitted 5 May, 2024; v1 submitted 6 February, 2024; originally announced February 2024.

arXiv:2111.08585

CEHR-BERT: Incorporating temporal information from structured EHR data to improve prediction tasks

Authors: Chao Pang, Xinzhuo Jiang, Krishna S Kalluri, Matthew Spotnitz, RuiJun Chen, Adler Perotte, Karthik Natarajan

Abstract: Embedding algorithms are increasingly used to represent clinical concepts in healthcare for improving machine learning tasks such as clinical phenotyping and disease prediction. Recent studies have adapted the state-of-the-art bidirectional encoder representations from transformers (BERT) architecture to structured electronic health records (EHR) data for the generation of contextualized concept embeddings, yet do not fully incorporate temporal data across multiple clinical domains. Therefore, we developed a new BERT adaptation, CEHR-BERT, to incorporate temporal information using a hybrid approach by augmenting the input to BERT using artificial time tokens, incorporating time, age, and concept embeddings, and introducing a new second learning objective for visit type. CEHR-BERT was trained on a subset of Columbia University Irving Medical Center-New York Presbyterian Hospital's clinical data, which includes 2.4M patients, spanning over three decades, and tested using 4-fold cross-validation on the following prediction tasks: hospitalization, death, new heart failure (HF) diagnosis, and HF readmission. Our experiments show that CEHR-BERT outperformed existing state-of-the-art clinical BERT adaptations and baseline models across all 4 prediction tasks in both ROC-AUC and PR-AUC. CEHR-BERT also demonstrated strong transfer learning capability, as our model trained on only 5% of data outperformed comparison models trained on the entire data set. Ablation studies to better understand the contribution of each time component showed incremental gains with every element, suggesting that CEHR-BERT's incorporation of artificial time tokens, time and age embeddings with concept embeddings, and the addition of the second learning objective represents a promising approach for future BERT-based clinical embeddings.

Submitted 10 November, 2021; originally announced November 2021.

Journal ref: Proceedings of Machine Learning for Health, PMLR 158:239-260, 2021

arXiv:2011.04299

COVID-19 Patient Detection from Telephone Quality Speech Data

Authors: Kotra Venkata Sai Ritwik, Shareef Babu Kalluri, Deepu Vijayasenan

Abstract: In this paper, we investigate the presence of cues about COVID-19 in speech data. We use an approach that is similar to speaker recognition. Each sentence is represented as super vectors of short-term Mel filter bank features for each phoneme. These features are used to learn a two-class classifier to separate COVID-19 speech from normal speech. Experiments on a small dataset collected from YouTube videos show that an SVM classifier on this dataset achieves an accuracy of 88.6% and an F1-Score of 92.7%. Further investigation reveals that some phone classes, such as nasals, stops, and mid vowels, can distinguish the two classes better than the others.

Submitted 9 November, 2020; originally announced November 2020.

Comments: 6 pages, 7 figures

arXiv:2007.06021

NISP: A Multi-lingual Multi-accent Dataset for Speaker Profiling

Authors: Shareef Babu Kalluri, Deepu Vijayasenan, Sriram Ganapathy, Ragesh Rajan M, Prashant Krishnan

Abstract: Many commercial and forensic applications of speech demand the extraction of information about speaker characteristics, which falls into the broad category of speaker profiling. The speaker characteristics needed for profiling include physical traits such as height, age, and gender, along with the speaker's native language. Many of the available datasets have only partial information for speaker profiling. In this paper, we attempt to overcome this limitation by developing a new dataset which has speech data from five different Indian languages along with English. Metadata for speaker profiling applications, such as linguistic information, regional information, and physical characteristics of a speaker, is also collected. We call this dataset the NITK-IISc Multilingual Multi-accent Speaker Profiling (NISP) dataset. The description of the dataset, potential applications, and baseline results for speaker profiling on this dataset are provided in this paper.

Submitted 12 July, 2020; originally announced July 2020.

Comments: 5 pages, Initial version submitted to Interspeech 2020

Showing 1–8 of 8 results for author: Kalluri, S