-
UnDIVE: Generalized Underwater Video Enhancement Using Generative Priors
Authors:
Suhas Srinath,
Aditya Chandrasekar,
Hemang Jamadagni,
Rajiv Soundararajan,
Prathosh A P
Abstract:
With the rise of marine exploration, underwater imaging has gained significant attention as a research topic. Underwater video enhancement has become crucial for real-time computer vision tasks in marine exploration. However, most existing methods focus on enhancing individual frames and neglect video temporal dynamics, leading to visually poor enhancements. Furthermore, the lack of ground-truth references limits the use of abundant available underwater video data in many applications. To address these issues, we propose a two-stage framework for enhancing underwater videos. The first stage uses a denoising diffusion probabilistic model to learn a generative prior from unlabeled data, capturing robust and descriptive feature representations. In the second stage, this prior is incorporated into a physics-based image formulation for spatial enhancement, while also enforcing temporal consistency between video frames. Our method enables real-time and computationally-efficient processing of high-resolution underwater videos at lower resolutions, and offers efficient enhancement in the presence of diverse water-types. Extensive experiments on four datasets show that our approach generalizes well and outperforms existing enhancement methods. Our code is available at github.com/suhas-srinath/undive.
Submitted 8 November, 2024;
originally announced November 2024.
-
Deep Learning-Based Brain Image Segmentation for Automated Tumour Detection
Authors:
Suman Sourabh,
Murugappan Valliappan,
Narayana Darapaneni,
Anwesh R P
Abstract:
Introduction: The present study concerns the development and evaluation of an automated brain tumor segmentation technique based on deep learning using the 3D U-Net model. Objectives: The objective is to leverage state-of-the-art convolutional neural networks (CNNs) on a large dataset of brain MRI scans for segmentation. Methods: The proposed methodology applies pre-processing techniques for enhanced performance and generalizability. Results: Extensive validation on an independent dataset confirms the model's robustness and potential for integration into clinical workflows. The study emphasizes the importance of data pre-processing and explores various hyperparameters to optimize the model's performance. The 3D U-Net achieved IoUs of 0.8181 and 0.66 on the training and validation datasets, respectively. Conclusion: Ultimately, this comprehensive framework showcases the efficacy of deep learning in automating brain tumour detection, offering valuable support in clinical practice.
Submitted 6 April, 2024;
originally announced April 2024.
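The IoU figures quoted in the abstract above (0.8181 on training data, 0.66 on validation data) refer to intersection over union between a predicted and a reference segmentation mask. A minimal sketch of that metric on flattened binary masks follows; the function name `iou` and the toy masks are illustrative assumptions and are not taken from the paper.

```python
def iou(pred, target):
    """Intersection over union of two equal-length binary masks (flattened voxels)."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of voxels")
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    union = sum(1 for p, t in zip(pred, target) if p or t)
    # Two empty masks agree perfectly; returning 1.0 here is a convention assumed for this sketch.
    return intersection / union if union else 1.0


# Toy example (hypothetical data): 2 voxels overlap, 4 voxels are covered by either mask.
pred = [1, 1, 1, 0, 0, 0]
target = [0, 1, 1, 1, 0, 0]
print(iou(pred, target))  # 0.5
```

A gap between training and validation IoU like the one reported would show up here as the same function returning a lower value on held-out masks.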
-
A Deep Look Into -- Automated Lung X-Ray Abnormality Detection System
Authors:
Nagullas KS,
Vivekanand. V,
Narayana Darapaneni,
Anwesh R P
Abstract:
Introduction: The Automated Lung X-Ray Abnormality Detection System is an application that distinguishes normal X-ray images from infected X-ray images and highlights the area considered for the prediction. The recent pandemic created a need for a non-conventional and faster method of detecting diseases, a purpose that X-rays serve. Objectives: In the current situation, any infectious viral disease is a potential pandemic, so there is a need for a cheap and early detection system. Methods: This research will help ease the work of experts carrying out further analysis. The accuracy of three pre-existing models, DenseNet, MobileNet and VGG16, was high, but the models over-fitted, primarily due to the black-and-white images. Results: This led to building a new method, V-BreathNet, which gave more than 96% accuracy. Conclusion: Thus, it can be stated that not all state-of-the-art CNN models can be used on B/W images.
Submitted 6 April, 2024;
originally announced April 2024.
-
CoroNetGAN: Controlled Pruning of GANs via Hypernetworks
Authors:
Aman Kumar,
Khushboo Anand,
Shubham Mandloi,
Ashutosh Mishra,
Avinash Thakur,
Neeraj Kasera,
Prathosh A P
Abstract:
Generative Adversarial Networks (GANs) have proven to exhibit remarkable performance and are widely used across many generative computer vision applications. However, the unprecedented demand for the deployment of GANs on resource-constrained edge devices still poses a challenge due to the huge number of parameters involved in the generation process. This has led to focused attention on the area of compressing GANs. Most of the existing works use knowledge distillation with the overhead of teacher dependency. Moreover, there is no ability to control the degree of compression in these methods. Hence, we propose CoroNet-GAN for compressing GANs using the combined strength of a differentiable pruning method via hypernetworks. The proposed method provides the advantage of performing controllable compression while training, along with reducing training time by a substantial factor. Experiments have been done on various conditional GAN architectures (Pix2Pix and CycleGAN) to demonstrate the effectiveness of our approach on multiple benchmark datasets such as Edges-to-Shoes, Horse-to-Zebra and Summer-to-Winter. The results obtained illustrate that our approach outperforms the baselines on Zebra-to-Horse and Summer-to-Winter, achieving the best FID scores of 32.3 and 72.3 respectively, and yields high-fidelity images across all the datasets. Additionally, our approach also outperforms the state-of-the-art methods in inference time on various smart-phone chipsets and data-types, making it a feasible solution for deployment on edge devices.
Submitted 13 March, 2024;
originally announced March 2024.
-
HiveLink, an IoT based Smart Bee Hive Monitoring System
Authors:
Ajwin Dsouza,
Aditya P,
Sameer Hegde
Abstract:
HiveLink, the IoT-based Smart Bee Hive Monitoring System, addresses the challenges faced by beekeepers in managing the influence of environmental impact, diseases, and collapse in honey bee colonies. Integrated with advanced sensors, the system monitors temperature, humidity, hive weight, and diurnal cycle. Leveraging IoT technology, the system provides real-time data, remote connectivity, and actionable insights for beekeepers. Monitoring the hive with the system enables early disease detection, proactive interventions, and optimized hive management. It also minimizes manual inspections, enhances productivity, and promotes sustainable practices to mitigate environmental impact and support honey bee populations. Therefore, this system demonstrates a technology-driven solution that ensures the well-being of bee hives by facilitating data-driven decision-making and contributes to the resilience of beekeeping in the face of diverse challenges.
Submitted 21 September, 2023;
originally announced September 2023.
-
Intelligent analysis of EEG signals to assess consumer decisions: A Study on Neuromarketing
Authors:
Nikunj Phutela,
Abhilash P,
Kaushik Sreevathsan,
B N Krupa
Abstract:
Neuromarketing is an emerging field that combines neuroscience and marketing to understand the factors that influence consumer decisions better. The study proposes a method to understand consumers' positive and negative reactions to advertisements (ads) and products by analysing electroencephalogram (EEG) signals. These signals are recorded using a low-cost single electrode headset from volunteers belonging to the ages 18-22. A detailed subject dependent (SD) and subject independent (SI) analysis was performed employing machine learning methods like Naive Bayes (NB), Support Vector Machine (SVM), k-nearest neighbour and Decision Tree and the proposed deep learning (DL) model. SVM and NB yielded an accuracy (Acc.) of 0.63 for the SD analysis. In SI analysis, SVM performed better for the advertisement, product and gender-based analysis. Furthermore, the performance of the DL model was on par with that of SVM, especially, in product and ads-based analysis.
Submitted 29 May, 2022;
originally announced June 2022.
-
Orthogonal Delay Scale Space Modulation: A New Technique for Wideband Time-Varying Channels
Authors:
Arunkumar K. P.,
Chandra R. Murthy
Abstract:
Orthogonal Time Frequency Space (OTFS) modulation is a recently proposed scheme for time-varying narrowband channels in terrestrial radio-frequency communications. Underwater acoustic (UWA) and ultra-wideband (UWB) communication systems, on the other hand, confront wideband time-varying channels. Unlike narrowband channels, for which time contractions or dilations due to Doppler effect can be approximated by frequency-shifts, the Doppler effect in wideband channels results in frequency-dependent non-uniform shift of signal frequencies across the band. In this paper, we develop an OTFS-like modulation scheme -- Orthogonal Delay Scale Space (ODSS) modulation -- for handling wideband time-varying channels. We derive the ODSS transmission and reception schemes from first principles. In the process, we introduce the notion of $ω$-convolution in the delay-scale space that parallels the twisted convolution used in the time-frequency space. The preprocessing 2D transformation from the Fourier-Mellin domain to the delay-scale space in ODSS, which plays the role of inverse symplectic Fourier transform (ISFFT) in OTFS, improves the bit error rate performance compared to OTFS and Orthogonal Frequency Division Multiplexing (OFDM) in wideband time-varying channels. Furthermore, since the channel matrix is rendered near-diagonal, ODSS retains the advantage of OFDM in terms of its low-complexity receiver structure.
Submitted 8 May, 2022; v1 submitted 21 November, 2021;
originally announced November 2021.
-
Fronthaul Compression for Uplink Massive MIMO using Matrix Decomposition
Authors:
Aswathylakshmi P,
Radha Krishna Ganti
Abstract:
Massive MIMO opens up attractive possibilities for next generation wireless systems with its large number of antennas offering spatial diversity and multiplexing gain. However, the fronthaul link that connects a massive MIMO Remote Radio Head (RRH) and carries IQ samples to the Baseband Unit (BBU) of the base station can throttle the network capacity/speed if appropriate data compression techniques are not applied. In this paper, we propose an iterative technique for fronthaul load reduction in the uplink for massive MIMO systems that utilizes the convolution structure of the received signals. We use an alternating minimisation algorithm for blind deconvolution of the received data matrix that provides compression ratios of 30-50. In addition, the technique presented here can be used for blind decoding of OFDM signals in massive MIMO systems.
Submitted 24 October, 2021;
originally announced October 2021.
-
Unsupervised Domain Adaptation Schemes for Building ASR in Low-resource Languages
Authors:
Anoop C S,
Prathosh A P,
A G Ramakrishnan
Abstract:
Building an automatic speech recognition (ASR) system from scratch requires a large amount of annotated speech data, which is difficult to collect in many languages. However, there are cases where the low-resource language shares a common acoustic space with a high-resource language having enough annotated data to build an ASR. In such cases, we show that the domain-independent acoustic models learned from the high-resource language through unsupervised domain adaptation (UDA) schemes can enhance the performance of the ASR in the low-resource language. We use the specific example of Hindi in the source domain and Sanskrit in the target domain. We explore two architectures: i) domain adversarial training using gradient reversal layer (GRL) and ii) domain separation networks (DSN). The GRL and DSN architectures give absolute improvements of 6.71% and 7.32%, respectively, in word error rate over the baseline deep neural network model when trained on just 5.5 hours of data in the target domain. We also show that choosing a proper language (Telugu) in the source domain can bring further improvement. The results suggest that UDA schemes can be helpful in the development of ASR systems for low-resource languages, mitigating the hassle of collecting large amounts of annotated speech data.
Submitted 16 September, 2021; v1 submitted 12 September, 2021;
originally announced September 2021.
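The improvements of 6.71% and 7.32% quoted in the abstract above are absolute differences in word error rate (WER). WER is conventionally the word-level edit distance between the reference transcript and the recognizer output, divided by the number of reference words. A minimal sketch of that convention follows; the function name `word_error_rate` and the English example sentences are illustrative assumptions, not material from the paper.

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference words seen so far
    # (one fewer than in the current row) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion of a reference word
                            curr[j - 1] + 1,      # insertion of a hypothesis word
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example: one substitution ("sat" -> "sit") and one deletion ("the"),
# so 2 errors over 6 reference words.
print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

An absolute improvement is then simply the baseline WER minus the adapted model's WER, both expressed in percent.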
-
RespVAD: Voice Activity Detection via Video-Extracted Respiration Patterns
Authors:
Arnab Kumar Mondal,
Prathosh A. P
Abstract:
Voice Activity Detection (VAD) refers to the task of identification of regions of human speech in digital signals such as audio and video. While VAD is a necessary first step in many speech processing systems, it poses challenges when there are high levels of ambient noise during the audio recording. To improve the performance of VAD in such conditions, several methods utilizing the visual information extracted from the region surrounding the mouth/lip region of the speakers' video recording have been proposed. Even though these provide advantages over audio-only methods, they depend on faithful extraction of lip/mouth regions. Motivated by these, a new paradigm for VAD based on the fact that respiration forms the primary source of energy for speech production is proposed. Specifically, an audio-independent VAD technique using the respiration pattern extracted from the speakers' video is developed. The Respiration Pattern is first extracted from the video focusing on the abdominal-thoracic region of a speaker using an optical flow based method. Subsequently, voice activity is detected from the respiration pattern signal using neural sequence-to-sequence prediction models. The efficacy of the proposed method is demonstrated through experiments on a challenging dataset recorded in real acoustic environments and compared with four previous methods based on audio and visual cues.
Submitted 21 August, 2020;
originally announced August 2020.
-
Internet of Things(IoT) Based Multilevel Drunken Driving Detection and Prevention System Using Raspberry Pi 3
Authors:
Viswanatha V,
Venkata Siva Reddy R,
Ashwini Kumari P,
Pradeep Kumar S
Abstract:
In this paper, the proposed system demonstrates three ways of detecting the alcohol level in the body of a car driver and prevents the driver from driving the vehicle by turning off the ignition system. It also sends messages to concerned people. To detect the breath alcohol level, an MQ-3 sensor is included in this module, along with a heartbeat sensor that detects the driver's heart rate, facial recognition using a webcam and MATLAB, a Wi-Fi module to send a message through the TCP/IP App, a Raspberry Pi module to turn off the ignition, and an alarm as the prevention module. If a driver's alcohol intake exceeds the range prescribed by the government, the ignition is turned off, provided either that the heartbeat is abnormal or that the driver is drowsy. In both cases, a message is sent to the App, from which it can be forwarded to family, a friend, a well-wisher or the nearest police officer for help. The system is developed considering that if the driver is drunk and needs help, a friend who is not drunk can drive the car. The system aims at the safety of both the driver and the surroundings, and it helps minimize deaths caused by drunken driving as well as the burden on the police.
Submitted 21 April, 2020;
originally announced April 2020.
-
U-Det: A Modified U-Net architecture with bidirectional feature network for lung nodule segmentation
Authors:
Nikhil Varma Keetha,
Samson Anosh Babu P,
Chandra Sekhara Rao Annavarapu
Abstract:
Early diagnosis and analysis of lung cancer involve a precise and efficient lung nodule segmentation in computed tomography (CT) images. However, the anonymous shapes, visual features, and surroundings of the nodule in the CT image pose a challenging problem to the robust segmentation of the lung nodules. This article proposes U-Det, a resource-efficient model architecture, which is an end to end deep learning approach to solve the task at hand. It incorporates a Bi-FPN (bidirectional feature network) between the encoder and decoder. Furthermore, it uses Mish activation function and class weights of masks to enhance segmentation efficiency. The proposed model is extensively trained and evaluated on the publicly available LUNA-16 dataset consisting of 1186 lung nodules. The U-Det architecture outperforms the existing U-Net model with the Dice similarity coefficient (DSC) of 82.82% and achieves results comparable to human experts.
Submitted 20 March, 2020;
originally announced March 2020.
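The Dice similarity coefficient (DSC) reported in the abstract above is conventionally defined as twice the overlap between the predicted and reference masks divided by the sum of their sizes. A minimal sketch on flattened binary masks follows; the function name `dice` and the toy masks are illustrative assumptions, not taken from the paper.

```python
def dice(pred, target):
    """Dice similarity coefficient of two equal-length binary masks."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of pixels")
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in target if t)
    # Two empty masks are treated as a perfect match; this is a convention assumed here.
    return 2 * intersection / total if total else 1.0


# Toy example (hypothetical data): 2 overlapping pixels, 3 foreground pixels in each mask.
pred = [1, 1, 1, 0, 0, 0]
target = [0, 1, 1, 1, 0, 0]
print(dice(pred, target))  # 2 * 2 / (3 + 3)
```

Multiplying the result by 100 gives the percentage form in which the abstract states the score.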
-
Fetal Head and Abdomen Measurement Using Convolutional Neural Network, Hough Transform, and Difference of Gaussian Revolved along Elliptical Path (Dogell) Algorithm
Authors:
Kezia Irene,
Aditya Yudha P.,
Harlan Haidi,
Nurul Faza,
Winston Chandra
Abstract:
The number of fetal-neonatal deaths in Indonesia is still high compared to developed countries. This is caused by the absence of maternal monitoring during pregnancy. This paper presents an automated measurement of fetal head circumference (HC) and abdominal circumference (AC) from ultrasonography (USG) images. This automated measurement is beneficial for detecting fetal abnormalities early during the pregnancy period. We used a convolutional neural network (CNN) to preprocess the USG data. After that, we approximate the head and abdominal circumference using the Hough transform algorithm and the Difference of Gaussian Revolved along Elliptical Path (Dogell) algorithm. We used a data set from national hospitals in Indonesia, and to measure accuracy we compared our results to annotated images measured by professional obstetricians. The results show that by using the CNN, we reduced errors caused by noisy images. We found that the Dogell algorithm performs better than the Hough transform algorithm in both time and accuracy. This is the first HC and AC approximation that uses a CNN to preprocess the data.
Submitted 14 November, 2019;
originally announced November 2019.
-
Adversarial Approximate Inference for Speech to Electroglottograph Conversion
Authors:
Prathosh A. P.,
Varun Srivastava,
Mayank Mishra
Abstract:
Speech produced by human vocal apparatus conveys substantial non-semantic information including the gender of the speaker, voice quality, affective state, abnormalities in the vocal apparatus etc. Such information is attributed to the properties of the voice source signal, which is usually estimated from the speech signal. However, most of the source estimation techniques depend heavily on the goodness of the model assumptions and are prone to noise. A popular alternative is to indirectly obtain the source information through the Electroglottographic (EGG) signal that measures the electrical admittance around the vocal folds using dedicated hardware. In this paper, we address the problem of estimating the EGG signal directly from the speech signal, devoid of any hardware. Sampling from the intractable conditional distribution of the EGG signal given the speech signal is accomplished through optimization of an evidence lower bound. This is constructed via minimization of the KL-divergence between the true and the approximated posteriors of a latent variable learned using a deep neural auto-encoder that serves an informative prior. We demonstrate the efficacy of the method at generating the EGG signal by conducting several experiments on datasets comprising multiple speakers, voice qualities, noise settings and speech pathologies. The proposed method is evaluated on many benchmark metrics and is found to agree with the gold standard while proving better than the state-of-the-art algorithms on a few tasks such as epoch extraction.
Submitted 7 September, 2019; v1 submitted 28 March, 2019;
originally announced March 2019.
-
QR Approximation for Massive MIMO Fronthaul Compression
Authors:
Aswathylakshmi P,
Radha Krishna Ganti
Abstract:
Massive MIMO's immense potential to serve large number of users at fast data rates also comes with the caveat of requiring tremendous processing power. This favours a centralized radio access network (C-RAN) architecture that concentrates the processing power at a common baseband unit (BBU) connected to multiple remote radio heads (RRH) via fronthaul links. The high bandwidths of 5G make the fronthaul data rate a major bottleneck. Since the number of active users in a massive MIMO system is much smaller than the number of antennas, we propose a dimension reduction scheme based on low rank approximation for fronthaul data compression. Link level simulations show that the proposed method achieves more than 17x compression while also improving the error performance of the system through denoising.
Submitted 12 March, 2019;
originally announced March 2019.
-
Detection of Glottal Closure Instants from Raw Speech using Convolutional Neural Networks
Authors:
Mohit Goyal,
Varun Srivastava,
Prathosh A. P
Abstract:
Glottal Closure Instants (GCIs) correspond to the temporal locations of significant excitation to the vocal tract occurring during the production of voiced speech. GCI detection from speech signals is a well-studied problem given its importance in speech processing. Most of the existing approaches for GCI detection adopt a two-stage approach (i) Transformation of speech signal into a representative signal where GCIs are localized better, (ii) extraction of GCIs using the representative signal obtained in first stage. The former stage is accomplished using signal processing techniques based on the principles of speech production and the latter with heuristic-algorithms such as dynamic-programming and peak-picking. These methods are thus task-specific and rely on the methods used for representative signal extraction. However, in this paper, we formulate the GCI detection problem from a representation learning perspective where appropriate representation is implicitly learned from the raw-speech data samples. Specifically, GCI detection is cast as a supervised multi-task learning problem solved using a deep convolutional neural network jointly optimizing a classification and regression cost. The learning capability is demonstrated with several experiments on standard datasets. The results compare well with the state-of-the-art algorithms while performing better in the case of presence of real-world non-stationary noise.
Submitted 9 July, 2019; v1 submitted 26 April, 2018;
originally announced April 2018.