Emerging ML-AI Techniques for Analog and RF EDA

Authors: Zhengfeng Wu, Ziyi Chen, Nnaemeka Achebe, Vaibhav V. Rao, Pratik Shrestha, Ioannis Savidis

Abstract: This survey explores the integration of machine learning (ML) into EDA workflows for analog and RF circuits, addressing challenges unique to analog design, which include complex constraints, nonlinear design spaces, and high computational costs. State-of-the-art learning and optimization techniques are reviewed for circuit tasks such as constraint formulation, topology generation, device modeling, sizing, placement, and routing. The survey highlights the capability of ML to enhance automation, improve design quality, and reduce time-to-market while meeting the target specifications of an analog or RF circuit. Emerging trends and cross-cutting challenges, including robustness to variations and considerations of interconnect parasitics, are also discussed.

Submitted 12 May, 2025; originally announced June 2025.

Comments: 9 pages, 2 figures

arXiv:2505.02105 [pdf]

Deep Representation Learning for Electronic Design Automation

Authors: Pratik Shrestha, Saran Phatharodom, Alec Aversa, David Blankenship, Zhengfeng Wu, Ioannis Savidis

Abstract: Representation learning has become an effective technique utilized by electronic design automation (EDA) algorithms, which leverage the natural representation of workflow elements as images, grids, and graphs. By addressing challenges related to the increasing complexity of circuits and stringent power, performance, and area (PPA) requirements, representation learning facilitates the automatic extraction of meaningful features from complex data formats, including images, grids, and graphs. This paper examines the application of representation learning in EDA, covering foundational concepts and analyzing prior work and case studies on tasks that include timing prediction, routability analysis, and automated placement. Key techniques, including image-based methods, graph-based approaches, and hybrid multimodal solutions, are presented to illustrate the improvements provided in routing, timing, and parasitic prediction. The provided advancements demonstrate the potential of representation learning to enhance efficiency, accuracy, and scalability in current integrated circuit design flows.

Submitted 4 May, 2025; originally announced May 2025.

arXiv:2503.02904 [pdf, other]

Surgical Vision World Model

Authors: Saurabh Koju, Saurav Bastola, Prashant Shrestha, Sanskar Amgain, Yash Raj Shrestha, Rudra P. K. Poudel, Binod Bhattarai

Abstract: Realistic and interactive surgical simulation has the potential to facilitate crucial applications, such as medical professional training and autonomous surgical agent training. In the natural visual domain, world models have enabled action-controlled data generation, demonstrating the potential to train autonomous agents in interactive simulated environments when large-scale real data acquisition is infeasible. However, such works in the surgical domain have been limited to simplified computer simulations, and lack realism. Furthermore, existing literature in world models has predominantly dealt with action-labeled data, limiting their applicability to real-world surgical data, where obtaining action annotation is prohibitively expensive. Inspired by the recent success of Genie in leveraging unlabeled video game data to infer latent actions and enable action-controlled data generation, we propose the first surgical vision world model. The proposed model can generate action-controllable surgical data and the architecture design is verified with extensive experiments on the unlabeled SurgToolLoc-2022 dataset. Codes and implementation details are available at https://github.com/bhattarailab/Surgical-Vision-World-Model

Submitted 3 March, 2025; originally announced March 2025.

arXiv:2502.14623 [pdf]

Crosstalk Analysis in Quantum Networks: Detection and Localization Insights with photon counting OTDR

Authors: Anouar Rahmouni, Pranish Shrestha, YaShian Li-Baboud, Anne Marie Richards, Yicheng Shi, Mheni Merzouki, Lijun Ma, Alan Migdal, Abdella Battou, Oliver Slattery, Thomas Gerrits

Abstract: Optical crosstalk from sub-milliwatt classical-channel power into quantum channels presents a significant challenge in quantum network development, introducing substantial noise that limits the network's performance, scalability, and fidelity. Here we report a demonstration using photon counting optical time-domain reflectometry (ν-OTDR) to precisely identify and localize crosstalk between separate channels within the same fiber and between separate fibers. The coexistence of classical and quantum signals in the same network necessitates the use of optical switches for efficient routing and control. Crosstalk characterization of an optical switch reveals that crosstalk depends strongly on cross connect configuration, with higher levels observed when connections are presumed to be physically closer and lower levels when further apart. Additionally, we found that crosstalk exhibits a pronounced wavelength dependence, increasing over tenfold at longer wavelengths. These findings demonstrate the value of ν-OTDR in diagnosing and mitigating crosstalk in quantum networks. They highlight the importance of optimizing optical switch configurations and wavelength management to minimize noise, ultimately enhancing the scalability, fidelity, and overall performance of quantum networks. This work establishes a foundational approach to addressing crosstalk, paving the way for more robust and efficient quantum network designs.

Submitted 22 February, 2025; v1 submitted 20 February, 2025; originally announced February 2025.

arXiv:2501.18643 [pdf, other]

3D Reconstruction of Shoes for Augmented Reality

Authors: Pratik Shrestha, Sujan Kapali, Swikar Gautam, Vishal Pokharel, Santosh Giri

Abstract: This paper introduces a mobile-based solution that enhances online shoe shopping through 3D modeling and Augmented Reality (AR), leveraging the efficiency of 3D Gaussian Splatting. Addressing the limitations of static 2D images, the framework generates realistic 3D shoe models from 2D images, achieving an average Peak Signal-to-Noise Ratio (PSNR) of 32, and enables immersive AR interactions via smartphones. A custom shoe segmentation dataset of 3120 images was created, with the best-performing segmentation model achieving an Intersection over Union (IoU) score of 0.95. This paper demonstrates the potential of 3D modeling and AR to revolutionize online shopping by offering realistic virtual interactions, with applicability across broader fashion categories.

Submitted 17 February, 2025; v1 submitted 28 January, 2025; originally announced January 2025.

arXiv:2412.13010 [pdf, other]

Measurement of Medial Elbow Joint Space using Landmark Detection

Authors: Shizuka Akahori, Shotaro Teruya, Pragyan Shrestha, Yuichi Yoshii, Ryuhei Michinobu, Satoshi Iizuka, Itaru Kitahara

Abstract: Ultrasound imaging of the medial elbow is crucial for the early diagnosis of Ulnar Collateral Ligament (UCL) injuries. Specifically, measuring the elbow joint space in ultrasound images is used to assess the valgus instability of the elbow caused by UCL injuries. To automate this measurement, a model trained on a precisely annotated dataset is necessary; however, no publicly available dataset exists to date. This study introduces a novel ultrasound medial elbow dataset to measure the joint space. The dataset comprises 4,201 medial elbow ultrasound images from 22 subjects, with landmark annotations on the humerus and ulna, based on the expertise of three orthopedic surgeons. We evaluated joint space measurement methods on our proposed dataset using heatmap-based, regression-based, and token-based landmark detection methods. While heatmap-based landmark detection methods generally achieve high accuracy, they sometimes produce multiple peaks on a heatmap, leading to incorrect detection. To mitigate this issue and enhance landmark localization, we propose Shape Subspace (SS) landmark refinement by measuring geometrical similarities between the detected and reference landmark positions. The results show that the mean joint space measurement error is 0.116 mm when using HRNet. Furthermore, SS landmark refinement can reduce the mean absolute error of landmark positions by 0.010 mm with HRNet and by 0.103 mm with ViTPose on average. These highlight the potential for high-precision, real-time diagnosis of UCL injuries by accurately measuring joint space.
Lastly, we demonstrate point-based segmentation for the humerus and ulna using the detected landmarks as inputs. Our dataset will be publicly available at https://github.com/Akahori000/Ultrasound-Medial-Elbow-Dataset

Submitted 25 February, 2025; v1 submitted 17 December, 2024; originally announced December 2024.

arXiv:2411.12635 [pdf, other]

M3D: Dual-Stream Selective State Spaces and Depth-Driven Framework for High-Fidelity Single-View 3D Reconstruction

Authors: Luoxi Zhang, Pragyan Shrestha, Yu Zhou, Chun Xie, Itaru Kitahara

Abstract: The precise reconstruction of 3D objects from a single RGB image in complex scenes presents a critical challenge in virtual reality, autonomous driving, and robotics. Existing neural implicit 3D representation methods face significant difficulties in balancing the extraction of global and local features, particularly in diverse and complex environments, leading to insufficient reconstruction precision and quality. We propose M3D, a novel single-view 3D reconstruction framework, to tackle these challenges. This framework adopts a dual-stream feature extraction strategy based on Selective State Spaces to effectively balance the extraction of global and local features, thereby improving scene comprehension and representation precision. Additionally, a parallel branch extracts depth information, effectively integrating visual and geometric features to enhance reconstruction quality and preserve intricate details. Experimental results indicate that the fusion of multi-scale features with depth information via the dual-branch feature extraction significantly boosts geometric consistency and fidelity, achieving state-of-the-art reconstruction performance.

Submitted 20 November, 2024; v1 submitted 19 November, 2024; originally announced November 2024.

Comments: 9 pages, 4 figures

ACM Class: I.3.5

arXiv:2410.15158 [pdf, other]

Automated Segmentation and Analysis of Cone Photoreceptors in Multimodal Adaptive Optics Imaging

Authors: Prajol Shrestha, Mikhail Kulyabin, Aline Sindel, Hilde R. Pedersen, Stuart Gilson, Rigmor Baraas, Andreas Maier

Abstract: Accurate detection and segmentation of cone cells in the retina are essential for diagnosing and managing retinal diseases. In this study, we used advanced imaging techniques, including confocal and non-confocal split detector images from adaptive optics scanning light ophthalmoscopy (AOSLO), to analyze photoreceptors for improved accuracy. Precise segmentation is crucial for understanding each cone cell's shape, area, and distribution. It helps to estimate the surrounding areas occupied by rods, which allows the calculation of the density of cone photoreceptors in the area of interest. In turn, density is critical for evaluating overall retinal health and functionality. We explored two U-Net-based segmentation models: StarDist for confocal and Cellpose for calculated modalities. Analyzing cone cells in images from two modalities and achieving consistent results demonstrates the study's reliability and potential for clinical application.

Submitted 19 October, 2024; originally announced October 2024.

arXiv:2410.08152 [pdf, other]

RayEmb: Arbitrary Landmark Detection in X-Ray Images Using Ray Embedding Subspace

Authors: Pragyan Shrestha, Chun Xie, Yuichi Yoshii, Itaru Kitahara

Abstract: Intra-operative 2D-3D registration of X-ray images with pre-operatively acquired CT scans is a crucial procedure in orthopedic surgeries. Anatomical landmarks pre-annotated in the CT volume can be detected in X-ray images to establish 2D-3D correspondences, which are then utilized for registration. However, registration often fails in certain view angles due to poor landmark visibility. We propose a novel method to address this issue by detecting arbitrary landmark points in X-ray images. Our approach represents 3D points as distinct subspaces, formed by feature vectors (referred to as ray embeddings) corresponding to intersecting rays. Establishing 2D-3D correspondences then becomes a task of finding ray embeddings that are close to a given subspace, essentially performing an intersection test. Unlike conventional methods for landmark estimation, our approach eliminates the need for manually annotating fixed landmarks. We trained our model using the synthetic images generated from CTPelvic1K CLINIC dataset, which contains 103 CT volumes, and evaluated it on the DeepFluoro dataset, comprising real X-ray images. Experimental results demonstrate the superiority of our method over conventional methods. The code is available at https://github.com/Pragyanstha/rayemb.

Submitted 10 October, 2024; originally announced October 2024.

Comments: Accepted as an oral presentation at ACCV 2024

arXiv:2409.17272 [pdf]

Design and development of desktop braille printing machine at Fablab Nepal

Authors: Daya Bandhu Ghimire, Pallab Shrestha

Abstract: The development of a desktop Braille printing machine aims to create an affordable, user-friendly device for visually impaired users. This document outlines the entire process, from research and requirement analysis to distribution and support, leveraging the content and guidelines from the GitHub repository, https://github.com/fablabnepal1/Desktop-Braille-Printing-Machine.

Submitted 10 September, 2024; originally announced September 2024.

arXiv:2407.08648 [pdf, other]

CAR-MFL: Cross-Modal Augmentation by Retrieval for Multimodal Federated Learning with Missing Modalities

Authors: Pranav Poudel, Prashant Shrestha, Sanskar Amgain, Yash Raj Shrestha, Prashnna Gyawali, Binod Bhattarai

Abstract: Multimodal AI has demonstrated superior performance over unimodal approaches by leveraging diverse data sources for more comprehensive analysis. However, applying this effectiveness in healthcare is challenging due to the limited availability of public datasets. Federated learning presents an exciting solution, allowing the use of extensive databases from hospitals and health centers without centralizing sensitive data, thus maintaining privacy and security. Yet, research in multimodal federated learning, particularly in scenarios with missing modalities, a common issue in healthcare datasets, remains scarce, highlighting a critical area for future exploration. Toward this, we propose a novel method for multimodal federated learning with missing modalities. Our contribution lies in a novel cross-modal data augmentation by retrieval, leveraging the small publicly available dataset to fill the missing modalities in the clients. Our method learns the parameters in a federated manner, ensuring privacy protection and improving performance in multiple challenging multimodal benchmarks in the medical domain, surpassing several competitive baselines. Code available: https://github.com/bhattarailab/CAR-MFL

Submitted 11 July, 2024; originally announced July 2024.

Comments: Accepted at MICCAI 2024

arXiv:2402.16734 [pdf, other]

Investigating the Robustness of Vision Transformers against Label Noise in Medical Image Classification

Authors: Bidur Khanal, Prashant Shrestha, Sanskar Amgain, Bishesh Khanal, Binod Bhattarai, Cristian A. Linte

Abstract: Label noise in medical image classification datasets significantly hampers the training of supervised deep learning methods, undermining their generalizability. The test performance of a model tends to decrease as the label noise rate increases. Over recent years, several methods have been proposed to mitigate the impact of label noise in medical image classification and enhance the robustness of the model. Predominantly, these works have employed CNN-based architectures as the backbone of their classifiers for feature extraction. However, in recent years, Vision Transformer (ViT)-based backbones have replaced CNNs, demonstrating improved performance and a greater ability to learn more generalizable features, especially when the dataset is large. Nevertheless, no prior work has rigorously investigated how transformer-based backbones handle the impact of label noise in medical image classification. In this paper, we investigate the architectural robustness of ViT against label noise and compare it to that of CNNs. We use two medical image classification datasets -- COVID-DU-Ex, and NCT-CRC-HE-100K -- both corrupted by injecting label noise at various rates. Additionally, we show that pretraining is crucial for ensuring ViT's improved robustness against label noise in supervised training.

Submitted 26 February, 2024; originally announced February 2024.

arXiv:2402.10035 [pdf, other]

Investigation of Federated Learning Algorithms for Retinal Optical Coherence Tomography Image Classification with Statistical Heterogeneity

Authors: Sanskar Amgain, Prashant Shrestha, Sophia Bano, Ignacio del Valle Torres, Michael Cunniffe, Victor Hernandez, Phil Beales, Binod Bhattarai

Abstract: Purpose: We apply federated learning to train an OCT image classifier simulating a realistic scenario with multiple clients and statistical heterogeneous data distribution where data in the clients lack samples of some categories entirely. Methods: We investigate the effectiveness of FedAvg and FedProx to train an OCT image classification model in a decentralized fashion, addressing privacy concerns associated with centralizing data. We partitioned a publicly available OCT dataset across multiple clients under IID and Non-IID settings and conducted local training on the subsets for each client. We evaluated two federated learning methods, FedAvg and FedProx for these settings. Results: Our experiments on the dataset suggest that under IID settings, both methods perform on par with training on a central data pool. However, the performance of both algorithms declines as we increase the statistical heterogeneity across the client data, while FedProx consistently performs better than FedAvg in the increased heterogeneity settings. Conclusion: Despite the effectiveness of federated learning in the utilization of private data across multiple medical institutions, the large number of clients and heterogeneous distribution of labels deteriorate the performance of both algorithms. Notably, FedProx appears to be more robust to the increased heterogeneity.

Submitted 15 February, 2024; originally announced February 2024.

arXiv:2312.07435 [pdf, other]

Cross-modal Contrastive Learning with Asymmetric Co-attention Network for Video Moment Retrieval

Authors: Love Panta, Prashant Shrestha, Brabeem Sapkota, Amrita Bhattarai, Suresh Manandhar, Anand Kumar Sah

Abstract: Video moment retrieval is a challenging task requiring fine-grained interactions between video and text modalities. Recent work in image-text pretraining has demonstrated that most existing pretrained models suffer from information asymmetry due to the difference in length between visual and textual sequences. We question whether the same problem also exists in the video-text domain with an auxiliary need to preserve both spatial and temporal information. Thus, we evaluate a recently proposed solution involving the addition of an asymmetric co-attention network for video grounding tasks. Additionally, we incorporate momentum contrastive loss for robust, discriminative representation learning in both modalities. We note that the integration of these supplementary modules yields better performance compared to state-of-the-art models on the TACoS dataset and comparable results on ActivityNet Captions, all while utilizing significantly fewer parameters with respect to baseline.

Submitted 12 December, 2023; originally announced December 2023.

arXiv:2312.06224 [pdf, other]

Medical Vision Language Pretraining: A survey

Authors: Prashant Shrestha, Sanskar Amgain, Bidur Khanal, Cristian A. Linte, Binod Bhattarai

Abstract: Medical Vision Language Pretraining (VLP) has recently emerged as a promising solution to the scarcity of labeled data in the medical domain. By leveraging paired/unpaired vision and text datasets through self-supervised learning, models can be trained to acquire vast knowledge and learn robust feature representations. Such pretrained models have the potential to enhance multiple downstream medical tasks simultaneously, reducing the dependency on labeled data. However, despite recent progress and its potential, there is no such comprehensive survey paper that has explored the various aspects and advancements in medical VLP. In this paper, we specifically review existing works through the lens of different pretraining objectives, architectures, downstream evaluation tasks, and datasets utilized for pretraining and downstream tasks. Subsequently, we delve into current challenges in medical VLP, discussing existing and potential solutions, and conclude by highlighting future directions. To the best of our knowledge, this is the first survey focused on medical VLP.

Submitted 11 December, 2023; originally announced December 2023.

arXiv:2311.15087 [pdf, other]

doi 10.1007/978-3-031-43999-5_74

X-Ray to CT Rigid Registration Using Scene Coordinate Regression

Authors: Pragyan Shrestha, Chun Xie, Hidehiko Shishido, Yuichi Yoshii, Itaru Kitahara

Abstract: Intraoperative fluoroscopy is a frequently used modality in minimally invasive orthopedic surgeries. Aligning the intraoperatively acquired X-ray image with the preoperatively acquired 3D model of a computed tomography (CT) scan reduces the mental burden on surgeons induced by the overlapping anatomical structures in the acquired images. This paper proposes a fully automatic registration method that is robust to extreme viewpoints and does not require manual annotation of landmark points during training. It is based on a fully convolutional neural network (CNN) that regresses the scene coordinates for a given X-ray image. The scene coordinates are defined as the intersection of the back-projected rays from a pixel toward the 3D model. Training data for a patient-specific model were generated through a realistic simulation of a C-arm device using preoperative CT scans. In contrast, intraoperative registration was achieved by solving the perspective-n-point (PnP) problem with a random sample and consensus (RANSAC) algorithm. Experiments were conducted using a pelvic CT dataset that included several real fluoroscopic (X-ray) images with ground truth annotations. The proposed method achieved an average mean target registration error (mTRE) of 3.79 mm in the 50th percentile of the simulated test dataset and projected mTRE of 9.65 mm in the 50th percentile of real fluoroscopic images for pelvis registration.

Submitted 25 November, 2023; originally announced November 2023.

Journal ref: Medical Image Computing and Computer Assisted Intervention MICCAI 2023. Lecture Notes in Computer Science, vol 14229

arXiv:2305.01503 [pdf, other]

NewsPanda: Media Monitoring for Timely Conservation Action

Authors: Sedrick Scott Keh, Zheyuan Ryan Shi, David J. Patterson, Nirmal Bhagabati, Karun Dewan, Areendran Gopala, Pablo Izquierdo, Debojyoti Mallick, Ambika Sharma, Pooja Shrestha, Fei Fang

Abstract: Non-governmental organizations for environmental conservation have a significant interest in monitoring conservation-related media and getting timely updates about infrastructure construction projects as they may cause massive impact to key conservation areas. Such monitoring, however, is difficult and time-consuming. We introduce NewsPanda, a toolkit which automatically detects and analyzes online articles related to environmental conservation and infrastructure construction. We fine-tune a BERT-based model using active learning methods and noise correction algorithms to identify articles that are relevant to conservation and infrastructure construction. For the identified articles, we perform further analysis, extracting keywords and finding potentially related sources. NewsPanda has been successfully deployed by the World Wide Fund for Nature teams in the UK, India, and Nepal since February 2022. It currently monitors over 80,000 websites and 1,074 conservation sites across India and Nepal, saving more than 30 hours of human efforts weekly. We have now scaled it up to cover 60,000 conservation sites globally.

Submitted 30 April, 2023; originally announced May 2023.

Comments: Accepted to IAAI-23: 35th Annual Conference on Innovative Applications of Artificial Intelligence. Winner of IAAI Deployed Application Award. Code at https://github.com/NewsPanda-WWF-CMU/weekly-pipeline

arXiv:2207.08338 [pdf, other]

MobileCodec: Neural Inter-frame Video Compression on Mobile Devices

Authors: Hoang Le, Liang Zhang, Amir Said, Guillaume Sautiere, Yang Yang, Pranav Shrestha, Fei Yin, Reza Pourreza, Auke Wiggers

Abstract: Realizing the potential of neural video codecs on mobile devices is a big technological challenge due to the computational complexity of deep networks and the power-constrained mobile hardware. We demonstrate practical feasibility by leveraging Qualcomm's technology and innovation, bridging the gap from neural network-based codec simulations running on wall-powered workstations, to real-time operation on a mobile device powered by Snapdragon technology. We show the first-ever inter-frame neural video decoder running on a commercial mobile phone, decoding high-definition videos in real-time while maintaining a low bitrate and high visual quality.

Submitted 17 July, 2022; originally announced July 2022.

Comments: ACM MMSys 2022

arXiv:2204.01932 [pdf, ps, other]

On near-martingales and a class of anticipating linear SDEs

Authors: Hui-Hsiung Kuo, Pujan Shrestha, Sudip Sinha, Padmanabhan Sundar

Abstract: The primary goal of this paper is to prove a near-martingale optional stopping theorem and establish solvability and large deviations for a class of anticipating linear stochastic differential equations. We prove the existence and uniqueness of solutions using two approaches: (1) Ayed-Kuo differential formula using an ansatz, and (2) a novel braiding technique by interpreting the integral in the Skorokhod sense. We establish a Freidlin-Wentzell type large deviations result for solution of such equations.

Submitted 4 April, 2022; originally announced April 2022.

Comments: 23 pages, 2 figures

MSC Class: 60H10; 60F10; 60G48; 60G40 (Primary) 60H05; 60H07; 60H20 (Secondary)

arXiv:2110.10129 [pdf, other]

Gummy Browsers: Targeted Browser Spoofing against State-of-the-Art Fingerprinting Techniques

Authors: Zengrui Liu, Prakash Shrestha, Nitesh Saxena

Abstract: We present a simple yet potentially devastating and hard-to-detect threat, called Gummy Browsers, whereby the browser fingerprinting information can be collected and spoofed without the victim's awareness, thereby compromising the privacy and security of any application that uses browser fingerprinting. The idea is that attacker A first makes the user U connect to his website (or to a well-known site the attacker controls) and transparently collects the information from U that is used for fingerprinting purposes. Then, A orchestrates a browser on his own machine to replicate and transmit the same fingerprinting information when connecting to W, fooling W to think that U is the one requesting the service rather than A. This will allow the attacker to profile U and compromise U's privacy. We design and implement the Gummy Browsers attack using three orchestration methods based on script injection, browser settings and debugging tools, and script modification, that can successfully spoof a wide variety of fingerprinting features to mimic many different browsers (including mobile browsers and the Tor browser). We then evaluate the attack against two state-of-the-art browser fingerprinting systems, FPStalker and Panopticlick. Our results show that A can accurately match his own manipulated browser fingerprint with that of any targeted victim user U's fingerprint for a long period of time, without significantly affecting the tracking of U and when only collecting U's fingerprinting information only once.
The TPR (true positive rate) for the tracking of the benign user in the presence of the attack is larger than 0.9 in most cases. The FPR (false positive rate) for the tracking of the attacker is also high, larger than 0.9 in all cases. We also argue that the attack can remain completely oblivious to the user and the website, thus making it extremely difficult to thwart in practice.

Submitted 19 October, 2021; originally announced October 2021.

arXiv:2105.08859 [pdf, other]

Changes in Crime Rates During the COVID-19 Pandemic

Authors: Mikaela Meyer, Ahmed Hassafy, Gina Lewis, Prasun Shrestha, Amelia M. Haviland, Daniel S. Nagin

Abstract: We estimate changes in the rates of five FBI Part 1 crime (homicide, auto theft, burglary, robbery, and larceny) during the COVID-19 pandemic from March through December 2020. Using publicly available weekly crime count data from 29 of the 70 largest cities in the U.S. from January 2018 through December 2020, three different linear regression model specifications are used to detect changes. One detects whether crime trends in four 2020 pre- and post-pandemic periods differ from those in 2018 and 2019. A second looks in more detail at the spring 2020 lockdowns to detect whether crime trends changed over successive biweekly periods into the lockdown. The third uses a city-level openness index that we created for the purpose of examining whether the degree of openness was associated with changing crime rates. For homicide and auto theft, we find significant increases during all or most of the pandemic. By contrast, we find significant declines in robbery and larceny during all or part of the pandemic and no significant changes in burglary over the course of the pandemic. Only larceny rates fluctuated with the degree of each city's lockdown. It is unusual for crime rates to move in different directions, and the reasons for the mixed findings for these five Part 1 Index crimes, one with no change, two with sustained increases, and two with sustained decreases, are not yet known.
We hypothesize that the reasons may be related to changes in opportunity, and the pandemic provides unique opportunities for future research to better understand the forces impacting crime rates. In the absence of a clear understanding of the mechanisms by which the pandemic affected crime, in the spirit of evidence-based crime policy, we caution against advancing policy at this time based on lessons learned from the pandemic "natural experiment."

Submitted 18 May, 2021; originally announced May 2021.

arXiv:2012.02164 [pdf, other]

People Still Care About Facts: Twitter Users Engage More with Factual Discourse than Misinformation--A Comparison Between COVID and General Narratives on Twitter

Authors: Mirela Silva, Fabrício Ceschin, Prakash Shrestha, Christopher Brant, Shlok Gilda, Juliana Fernandes, Catia S. Silva, André Grégio, Daniela Oliveira, Luiz Giovanini

Abstract: Misinformation entails the dissemination of falsehoods that leads to the slow fracturing of society via decreased trust in democratic processes, institutions, and science. The public has grown aware of the role of social media as a superspreader of untrustworthy information, where even pandemics have not been immune. In this paper, we focus on COVID-19 misinformation and examine a subset of 2.1M tweets to understand misinformation as a function of engagement, tweet content (COVID-19- vs. non-COVID-19-related), and veracity (misleading or factual). Using correlation analysis, we show the most relevant feature subsets among over 126 features that most heavily correlate with misinformation or facts. We found that (i) factual tweets, regardless of whether COVID-related, were more engaging than misinformation tweets; and (ii) features that most heavily correlated with engagement varied depending on the veracity and content of the tweet.

Submitted 9 September, 2021; v1 submitted 3 December, 2020; originally announced December 2020.

Comments: 22 pages

arXiv:2008.06781 [pdf]

Correlation Analysis among Vorticity, Q method and Liutex

Authors: Yifei Yu, Pushpa Shrestha, Oscar Alvarez, Charles Nottage, Chaoqun Liu

Abstract: Influenced by the fact that vorticity represents rotation for rigid body, people believe it also works for fluid flow. However, the theoretical predictions by vorticity do not match experiment results, which drove scientists to look for better methods to describe vortex. According to Dr. Liu classification, all methods applied to detect vortex can be categorized into three generations. The vorticity-based method is classified as the first generation. Methods relying on eigenvalues of velocity gradient tensor are considered as the second generation. Although so many methods appeared, people still believe vorticity is vortex since vorticity theory looks perfect in math, and all other methods are only scalars and unable to indicate swirl direction. Recently, Dr. Liu innovated a new vortex identification method called Liutex. Liutex, a vector quantity, which is regarded as the third-generation method, not only overcomes all previous methods drawbacks, but also has a clear physical meaning. The direction of Liutex represents the swirl axis of rotation, and its strength is equal to twice of angular speed. In this paper, we did a correlation analysis between vorticity, Q, Lambda ci, Lambda 2 methods and Liutex based on a DNS case of boundary layer transition. The results show that the correlation between vorticity and Liutex is minimal in strong shear region, which demonstrates the idea that using vorticity to detect vortex lacks a scientific foundation; in other words, vorticity is not vortex.

Submitted 15 August, 2020; originally announced August 2020.

Comments: 13 pages, 6 figures, 2979 words, pre-print

arXiv:2008.06779 [pdf]

Stretching and Shearing Contamination Analysis for Liutex/Rortex and Other Vortex Identification Methods

Authors: Pushpa Shrestha, Charles Nottage, Yifei Yu, Oscar Alvarez, Chaoqun Liu

Abstract: Although traditional vortex identification methods such as Q, Delta, Lambda2, Lambdaci remain popular in the identification and visualization of vortices, these methods count on shearing and stretching as a part of vortex strength. However, shearing and stretching do not contribute to fluid rotation. In this paper, the contamination effects of stretching and shearing of these methods are investigated and compared with Liutex method. From our investigation, the Liutex is an exact definition of fluid rotation or vortex, while other vortex identification methods are contaminated by stretching and shearing at different levels. The decomposition of the velocity gradient tensor can only be conducted in a so-called Principal Coordinate for uniqueness. The mathematical relation between Liutex and other vortex identification function are derived in this paper and then the effects of shearing and stretching on different vortex identification methods are studied. The mathematical formula and computation of the stretching and shearing effects on different schemes clearly show that the Liutex method has superiority over the other vortex identification methods as it only counts on local fluid rigid rotation while other methods count on stretching/compression and shearing as a part of the fluid rotation or vortex.

Submitted 15 August, 2020; originally announced August 2020.

Comments: 25 pages, 4 figures, 5206 words, pre-print

MSC Class: fluid dynamics

arXiv:2004.07993 [pdf, other]

CrossCheck: Rapid, Reproducible, and Interpretable Model Evaluation

Authors: Dustin Arendt, Zhuanyi Huang, Prasha Shrestha, Ellyn Ayton, Maria Glenski, Svitlana Volkova

Abstract: Evaluation beyond aggregate performance metrics, e.g. F1-score, is crucial to both establish an appropriate level of trust in machine learning models and identify future model improvements. In this paper we demonstrate CrossCheck, an interactive visualization tool for rapid crossmodel comparison and reproducible error analysis. We describe the tool and discuss design and implementation details. We then present three use cases (named entity recognition, reading comprehension, and clickbait detection) that show the benefits of using the tool for model evaluation. CrossCheck allows data scientists to make informed decisions to choose between multiple models, identify when the models are correct and for which examples, investigate whether the models are making the same mistakes as humans, evaluate models' generalizability and highlight models' limitations, strengths and weaknesses. Furthermore, CrossCheck is implemented as a Jupyter widget, which allows rapid and convenient integration into data scientists' model development workflows.

Submitted 16 April, 2020; originally announced April 2020.

arXiv:1912.05918 [pdf]

Effects of substrate anisotropy and edge diffusion on submonolayer growth during molecular beam epitaxy: A Kinetic Monte Carlo study

Authors: Jagannath Devkota, Shankar P. Shrestha

Abstract: We have performed Kinetic Monte Carlo simulation work to study the effect of diffusion anisotropy, bonding anisotropy and edge diffusion on island formation at different temperatures during the sub-monolayer film growth in Molecular Beam Epitaxy. We use simple cubic solid on solid model and event based Bortz, Kalos and Labowitch (BKL) algorithm on the Kinetic Monte Carlo method to simulate the physical phenomena. We have found that the island morphology and growth exponent are found to be influenced by substrate anisotropy as well as edge diffusion, however they do not play a significant role in island elongation. The growth exponent and island size distribution are observed to be influenced by substrate anisotropy but are negligibly influenced by edge diffusion. We have found fractal islands when edge diffusion is excluded and compact islands when edge diffusion is included.

Submitted 10 December, 2019; originally announced December 2019.

Comments: 14 pages, ICTP preprint 2007. arXiv admin note: text overlap with arXiv:1912.04877

Report number: IC--2007/129

arXiv:1912.04877 [pdf]

Study on Sub monolayer Epitaxy Growth under Anisotropic Detachment

Authors: Jagannath Devkota, Shankar P. Shrestha

Abstract: We have performed Kinetic Monte Carlo simulation to study the effect of diffusion anisotropy and bonding anisotropy on island formation at different temperatures during the sub-monolayer film growth in Molecular Beam Epitaxy. We use simple cubic solid on solid model and event based Bortz, Kalos and Labowitch (BKL) algorithm on Kinetic Monte Carlo method to simulate the physical phenomena. We have found that surface anisotropy has no significant role on island elongation however it influences on the island morphology, growth exponent and island size distribution. Elongated islands were obtained when bonding anisotropy was included.

Submitted 10 December, 2019; originally announced December 2019.

Comments: 7 pages, J. of Nepal Phys. Soc. Vol 24, No 1., December 2008

arXiv:1811.07143 [pdf, other]

High Quality Prediction of Protein Q8 Secondary Structure by Diverse Neural Network Architectures

Authors: Iddo Drori, Isht Dwivedi, Pranav Shrestha, Jeffrey Wan, Yueqi Wang, Yunchu He, Anthony Mazza, Hugh Krogh-Freeman, Dimitri Leggas, Kendal Sandridge, Linyong Nan, Kaveri Thakoor, Chinmay Joshi, Sonam Goenka, Chen Keasar, Itsik Pe'er

Abstract: We tackle the problem of protein secondary structure prediction using a common task framework. This lead to the introduction of multiple ideas for neural architectures based on state of the art building blocks, used in this task for the first time. We take a principled machine learning approach, which provides genuine, unbiased performance measures, correcting longstanding errors in the application domain. We focus on the Q8 resolution of secondary structure, an active area for continuously improving methods. We use an ensemble of strong predictors to achieve accuracy of 70.7% (on the CB513 test set using the CB6133filtered training set). These results are statistically indistinguishable from those of the top existing predictors. In the spirit of reproducible research we make our data, models and code available, aiming to set a gold standard for purity of training and testing sets. Such good practices lower entry barriers to this domain and facilitate reproducible, extendable research.

Submitted 17 November, 2018; originally announced November 2018.

Comments: NIPS 2018 Workshop on Machine Learning for Molecules and Materials, 10 pages

arXiv:1506.02354

Secure Ad-hoc Routing Scheme

Authors: Anish Prasad Shrestha, Kyung Sup Kwak

Abstract: This paper investigates on the problem of combining routing scheme and physical layer security in multihop wireless networks with cooperative diversity. We propose an ad-hoc natured hop-by-hop best secure relay selection in a multihop network with several relays and an eavesdropper at each hop which provides a safe routing scheme to transmit confidential message from transmitter to legitimate receiver. The selection is based on the instantaneous channel conditions of relay and eavesdropper at each hop. A theoretical analysis is performed to derive new closed form expressions for probability of non-zero secrecy capacity along with the exact end to end secrecy outage probability at a normalized secrecy rate. Furthermore, we provide the asymptotic expression to gain insights on the diversity gain.

Submitted 19 July, 2018; v1 submitted 8 June, 2015; originally announced June 2015.

Comments: There are some errors that needs to be fixed

arXiv:1505.05779 [pdf, ps, other]

Pitfalls in Designing Zero-Effort Deauthentication: Opportunistic Human Observation Attacks

Authors: O. Huhta, P. Shrestha, S. Udar, M. Juuti, N. Saxena, N. Asokan

Abstract: Deauthentication is an important component of any authentication system. The widespread use of computing devices in daily life has underscored the need for zero-effort deauthentication schemes. However, the quest for eliminating user effort may lead to hidden security flaws in the authentication schemes. As a case in point, we investigate a prominent zero-effort deauthentication scheme, called ZEBRA, which provides an interesting and a useful solution to a difficult problem as demonstrated in the original paper. We identify a subtle incorrect assumption in its adversary model that leads to a fundamental design flaw. We exploit this to break the scheme with a class of attacks that are much easier for a human to perform in a realistic adversary model, compared to the naïve attacks studied in the ZEBRA paper. For example, one of our main attacks, where the human attacker has to opportunistically mimic only the victim's keyboard typing activity at a nearby terminal, is significantly more successful compared to the naïve attack that requires mimicking keyboard and mouse activities as well as keyboard-mouse movements. Further, by understanding the design flaws in ZEBRA as cases of tainted input, we show that we can draw on well-understood design principles to improve ZEBRA's security.

Submitted 14 February, 2016; v1 submitted 21 May, 2015; originally announced May 2015.

ACM Class: K.6.5

arXiv:1311.1565

On Maximal Ratio Diversity with Weighting Errors for Physical Layer Security

Authors: Anish Prasad Shrestha, Kyung Sup Kwak

Abstract: In this letter, we introduce the performance of maximal ratio combining (MRC) with weighting errors for physical layer security. We assume both legitimate user and eavesdropper each equipped with multiple antennas employ non ideal MRC. The non ideal MRC is designed in terms of power correlation between the estimated and actual fadings. We derive new closedform and generalized expressions for secrecy outage probability. Next, we investigate the asymptotic behavior of secrecy outage probability for high signal-to-noise ratio in the main channel between legitimate user and transmitter. The asymptotic analysis provides the insights about actual diversity provided by MRC with weighting errors. We substantiate our claims with the analytic results and numerical evaluations.

Submitted 9 November, 2013; v1 submitted 6 November, 2013; originally announced November 2013.

Comments: It requires some major corrections in equations and numerical results

Showing 1–31 of 31 results for author: Shrestha, P