Showing 1–50 of 82 results for author: Guo, A

Searching in archive cs.
  1. arXiv:2504.13420  [pdf, other]

    cs.RO cs.SE

    Testing the Fault-Tolerance of Multi-Sensor Fusion Perception in Autonomous Driving Systems

    Authors: Haoxiang Tian, Wenqiang Ding, Xingshuo Han, Guoquan Wu, An Guo, Junqi Zhang, Wei Chen, Jun Wei, Tianwei Zhang

    Abstract: High-level Autonomous Driving Systems (ADSs), such as Google Waymo and Baidu Apollo, typically rely on multi-sensor fusion (MSF) based approaches to perceive their surroundings. This strategy increases perception robustness by combining the respective strengths of the camera and LiDAR and directly affects the safety-critical driving decisions of autonomous vehicles (AVs). However, in real-world au…

    Submitted 17 April, 2025; originally announced April 2025.

  2. arXiv:2504.02812  [pdf, other]

    cs.CV

    BOP Challenge 2024 on Model-Based and Model-Free 6D Object Pose Estimation

    Authors: Van Nguyen Nguyen, Stephen Tyree, Andrew Guo, Mederic Fourmy, Anas Gouda, Taeyeop Lee, Sungphill Moon, Hyeontae Son, Lukas Ranftl, Jonathan Tremblay, Eric Brachmann, Bertram Drost, Vincent Lepetit, Carsten Rother, Stan Birchfield, Jiri Matas, Yann Labbe, Martin Sundermeyer, Tomas Hodan

    Abstract: We present the evaluation methodology, datasets and results of the BOP Challenge 2024, the 6th in a series of public competitions organized to capture the state of the art in 6D object pose estimation and related tasks. In 2024, our goal was to transition BOP from lab-like setups to real-world scenarios. First, we introduced new model-free tasks, where no 3D object models are available and methods…

    Submitted 23 April, 2025; v1 submitted 3 April, 2025; originally announced April 2025.

    Comments: arXiv admin note: text overlap with arXiv:2403.09799

  3. Rubikon: Intelligent Tutoring for Rubik's Cube Learning Through AR-enabled Physical Task Reconfiguration

    Authors: Haocheng Ren, Muzhe Wu, Gregory Croisdale, Anhong Guo, Xu Wang

    Abstract: Learning to solve a Rubik's Cube requires the learners to repeatedly practice a skill component, e.g., identifying a misplaced square and putting it back. However, for 3D physical tasks such as this, generating sufficient repeated practice opportunities for learners can be challenging, in part because it is difficult for novices to reconfigure the physical object to specific states. We propose Rub…

    Submitted 14 May, 2025; v1 submitted 16 March, 2025; originally announced March 2025.

    Comments: DIS 2025

  4. arXiv:2503.10029  [pdf, other]

    cs.HC

    HandProxy: Expanding the Affordances of Speech Interfaces in Immersive Environments with a Virtual Proxy Hand

    Authors: Chen Liang, Yuxuan Liu, Martez Mott, Anhong Guo

    Abstract: Hand interactions are increasingly used as the primary input modality in immersive environments, but they are not always feasible due to situational impairments, motor limitations, and environmental constraints. Speech interfaces have been explored as an alternative to hand input in research and commercial solutions, but are limited to initiating basic hand gestures and system controls. We introdu…

    Submitted 13 March, 2025; originally announced March 2025.

  5. arXiv:2502.21060  [pdf, other]

    cs.LG cs.IT

    Efficient Transformer-based Decoder for Varshamov-Tenengolts Codes

    Authors: Yali Wei, Alan J. X. Guo, Zihui Yan, Yufan Dai

    Abstract: In recent years, the rise of DNA data storage technology has brought significant attention to the challenge of correcting insertion, deletion, and substitution (IDS) errors. Among various coding methods for IDS correction, Varshamov-Tenengolts (VT) codes, primarily designed for single-error correction, have emerged as a central research focus. While existing decoding methods achieve high accuracy…

    Submitted 28 February, 2025; originally announced February 2025.

    Comments: 9 pages, 2 figures, 9 tables

  6. arXiv:2501.18749  [pdf, other]

    cs.AR

    ACiS: Complex Processing in the Switch Fabric

    Authors: Pouya Haghi, Anqi Guo, Tong Geng, Anthony Skjellum, Martin Herbordt

    Abstract: For the last three decades a core use of FPGAs has been for processing communication: FPGA-based SmartNICs are in widespread use from the datacenter to IoT. Augmenting switches with FPGAs, however, has been less studied, but has numerous advantages built around the processing being moved from the edge of the network to the center. Communication switches have previously been augmented to process co…

    Submitted 30 January, 2025; originally announced January 2025.

  7. arXiv:2501.15660  [pdf]

    cs.CV cs.AI eess.IV physics.med-ph

    Marker Track: Accurate Fiducial Marker Tracking for Evaluation of Residual Motions During Breath-Hold Radiotherapy

    Authors: Aimee Guo, Weihua Mao

    Abstract: Fiducial marker positions in projection images of cone-beam computed tomography (CBCT) scans have been studied to evaluate daily residual motion during breath-hold radiation therapy. Fiducial marker migration posed challenges in accurately locating markers, prompting the development of a novel algorithm that reconstructs volumetric probability maps of marker locations from filtered gradient maps of…

    Submitted 26 January, 2025; originally announced January 2025.

    Comments: 14 pages, 9 figures, Regeneron STS 2025 project. Project page: https://sites.google.com/view/markertrack?usp=sharing

  8. arXiv:2501.00829  [pdf, other]

    cs.NE cs.AI

    An LLM-Empowered Adaptive Evolutionary Algorithm For Multi-Component Deep Learning Systems

    Authors: Haoxiang Tian, Xingshuo Han, Guoquan Wu, An Guo, Yuan Zhou, Jie Zhang, Shuo Li, Jun Wei, Tianwei Zhang

    Abstract: Multi-objective evolutionary algorithms (MOEAs) are widely used for searching optimal solutions in complex multi-component applications. Traditional MOEAs for multi-component deep learning (MCDL) systems face challenges in enhancing the search efficiency while maintaining the diversity. To combat these, this paper proposes $μ$MOEA, the first LLM-empowered adaptive evolutionary search algorithm to…

    Submitted 1 January, 2025; originally announced January 2025.

    Comments: 9

  9. arXiv:2412.03118  [pdf, other]

    cs.HC cs.CV

    ObjectFinder: An Open-Vocabulary Assistive System for Interactive Object Search by Blind People

    Authors: Ruiping Liu, Jiaming Zhang, Angela Schön, Karin Müller, Junwei Zheng, Kailun Yang, Anhong Guo, Kathrin Gerling, Rainer Stiefelhagen

    Abstract: Searching for objects in unfamiliar scenarios is a challenging task for blind people. It involves specifying the target object, detecting it, and then gathering detailed information according to the user's intent. However, existing description- and detection-based assistive technologies do not sufficiently support the multifaceted nature of interactive object search tasks. We present ObjectFinder,…

    Submitted 30 April, 2025; v1 submitted 4 December, 2024; originally announced December 2024.

  10. arXiv:2411.14349  [pdf, ps, other]

    cs.LG cs.DS stat.ML

    Agnostic Learning of Arbitrary ReLU Activation under Gaussian Marginals

    Authors: Anxin Guo, Aravindan Vijayaraghavan

    Abstract: We consider the problem of learning an arbitrarily-biased ReLU activation (or neuron) over Gaussian marginals with the squared loss objective. Despite the ReLU neuron being the basic building block of modern neural networks, we still do not understand the basic algorithmic question of whether one arbitrary ReLU neuron is learnable in the non-realizable setting. In particular, all existing polynomi…

    Submitted 22 November, 2024; v1 submitted 21 November, 2024; originally announced November 2024.

  11. arXiv:2411.03137  [pdf, other]

    cs.HC

    From Pen to Prompt: How Creative Writers Integrate AI into their Writing Practice

    Authors: Alicia Guo, Shreya Sathyanarayanan, Leijie Wang, Jeffrey Heer, Amy Zhang

    Abstract: Creative writing is a deeply human craft, yet AI systems using large language models (LLMs) offer the automation of significant parts of the writing process. So why do some creative writers choose to use AI? Through interviews and observed writing sessions with 18 creative writers who already use AI regularly in their writing practice, we find that creative writers are intentional about how they i…

    Submitted 3 May, 2025; v1 submitted 5 November, 2024; originally announced November 2024.

  12. SoVAR: Building Generalizable Scenarios from Accident Reports for Autonomous Driving Testing

    Authors: An Guo, Yuan Zhou, Haoxiang Tian, Chunrong Fang, Yunjian Sun, Weisong Sun, Xinyu Gao, Anh Tuan Luu, Yang Liu, Zhenyu Chen

    Abstract: Autonomous driving systems (ADSs) have undergone remarkable development and are increasingly employed in safety-critical applications. However, recently reported data on fatal accidents involving ADSs suggests that the desired level of safety has not yet been fully achieved. Consequently, there is a growing need for more comprehensive and targeted testing approaches to ensure safe driving. Scenari…

    Submitted 12 September, 2024; originally announced September 2024.

    Journal ref: 39th IEEE/ACM International Conference on Automated Software Engineering (ASE '24), October 27-November 1, 2024, Sacramento, CA, USA

  13. arXiv:2409.03962  [pdf, other]

    stat.ME cs.LG stat.ML

    Average Causal Effect Estimation in DAGs with Hidden Variables: Extensions of Back-Door and Front-Door Criteria

    Authors: Anna Guo, Razieh Nabi

    Abstract: The identification theory for causal effects in directed acyclic graphs (DAGs) with hidden variables is well-developed, but methods for estimating and inferring functionals beyond the g-formula remain limited. Previous studies have proposed semiparametric estimators for identifiable functionals in a broad class of DAGs with hidden variables. While demonstrating double robustness in some models, ex…

    Submitted 5 September, 2024; originally announced September 2024.

  14. CooTest: An Automated Testing Approach for V2X Communication Systems

    Authors: An Guo, Xinyu Gao, Zhenyu Chen, Yuan Xiao, Jiakai Liu, Xiuting Ge, Weisong Sun, Chunrong Fang

    Abstract: Perceiving the complex driving environment precisely is crucial to the safe operation of autonomous vehicles. With the tremendous advancement of deep learning and communication technology, Vehicle-to-Everything (V2X) collaboration has the potential to address limitations in sensing distant objects and occlusion for a single-agent perception system. However, despite spectacular progress, several co…

    Submitted 29 August, 2024; originally announced August 2024.

    Journal ref: Proceedings of the 33rd ACM SIGSOFT International Symposium on Software Testing and Analysis (ISSTA '24), September 16--20, 2024, Vienna, Austria

  15. Audio Description Customization

    Authors: Rosiana Natalie, Ruei-Che Chang, Smitha Sheshadri, Anhong Guo, Kotaro Hara

    Abstract: Blind and low-vision (BLV) people use audio descriptions (ADs) to access videos. However, current ADs are unalterable by end users, thus are incapable of supporting BLV individuals' potentially diverse needs and preferences. This research investigates if customizing AD could improve how BLV individuals consume videos. We conducted an interview study (Study 1) with fifteen BLV participants, which r…

    Submitted 21 August, 2024; originally announced August 2024.

    Comments: ASSETS 2024

  16. arXiv:2408.10499  [pdf, other]

    cs.HC cs.AI cs.PL

    ProgramAlly: Creating Custom Visual Access Programs via Multi-Modal End-User Programming

    Authors: Jaylin Herskovitz, Andi Xu, Rahaf Alharbi, Anhong Guo

    Abstract: Existing visual assistive technologies are built for simple and common use cases, and have few avenues for blind people to customize their functionalities. Drawing from prior work on DIY assistive technology, this paper investigates end-user programming as a means for users to create and customize visual access programs to meet their unique needs. We introduce ProgramAlly, a system for creating cu…

    Submitted 19 August, 2024; originally announced August 2024.

    Comments: UIST 2024

  17. arXiv:2408.09382  [pdf, other]

    cs.HC cs.AI cs.ET

    VRCopilot: Authoring 3D Layouts with Generative AI Models in VR

    Authors: Lei Zhang, Jin Pan, Jacob Gettig, Steve Oney, Anhong Guo

    Abstract: Immersive authoring provides an intuitive medium for users to create 3D scenes via direct manipulation in Virtual Reality (VR). Recent advances in generative AI have enabled the automatic creation of realistic 3D layouts. However, it is unclear how capabilities of generative AI can be used in immersive authoring to support fluid interactions, user agency, and creativity. We introduce VRCopilot, a…

    Submitted 18 August, 2024; originally announced August 2024.

    Comments: UIST 2024

  18. arXiv:2408.06632  [pdf, other]

    cs.HC cs.AI cs.CL

    EditScribe: Non-Visual Image Editing with Natural Language Verification Loops

    Authors: Ruei-Che Chang, Yuxuan Liu, Lotus Zhang, Anhong Guo

    Abstract: Image editing is an iterative process that requires precise visual evaluation and manipulation for the output to match the editing intent. However, current image editing tools do not provide accessible interaction nor sufficient feedback for blind and low vision individuals to achieve this level of control. To address this, we developed EditScribe, a prototype system that makes image editing acces…

    Submitted 13 August, 2024; originally announced August 2024.

    Comments: ASSETS 2024

  19. arXiv:2408.06627  [pdf, other]

    cs.HC cs.AI cs.CL

    WorldScribe: Towards Context-Aware Live Visual Descriptions

    Authors: Ruei-Che Chang, Yuxuan Liu, Anhong Guo

    Abstract: Automated live visual descriptions can aid blind people in understanding their surroundings with autonomy and independence. However, providing descriptions that are rich, contextual, and just-in-time has been a long-standing challenge in accessibility. In this work, we develop WorldScribe, a system that generates automated live real-world visual descriptions that are customizable and adaptive to u…

    Submitted 13 August, 2024; originally announced August 2024.

    Comments: UIST 2024

  20. arXiv:2408.04683  [pdf, other]

    cs.CR cs.AI cs.SE

    Eliminating Backdoors in Neural Code Models for Secure Code Understanding

    Authors: Weisong Sun, Yuchen Chen, Chunrong Fang, Yebo Feng, Yuan Xiao, An Guo, Quanjun Zhang, Yang Liu, Baowen Xu, Zhenyu Chen

    Abstract: Neural code models (NCMs) have been widely used to address various code understanding tasks, such as defect detection. However, numerous recent studies reveal that such models are vulnerable to backdoor attacks. Backdoored NCMs function normally on normal/clean code snippets, but exhibit adversary-expected behavior on poisoned code snippets injected with the adversary-crafted trigger. It poses a s…

    Submitted 20 February, 2025; v1 submitted 8 August, 2024; originally announced August 2024.

    Comments: Accepted to the 33rd ACM International Conference on the Foundations of Software Engineering (FSE 2025)

    MSC Class: 68-06 ACM Class: D.2.3; I.2.2

  21. arXiv:2407.18929  [pdf, other]

    cs.IT cs.ET cs.LG

    Gumbel-Softmax Discretization Constraint, Differentiable IDS Channel, and an IDS-Correcting Code for DNA Storage

    Authors: Alan J. X. Guo, Mengyi Wei, Yufan Dai, Yali Wei, Pengchen Zhang

    Abstract: Insertion, deletion, and substitution (IDS) error-correcting codes have garnered increased attention with recent advancements in DNA storage technology. However, a universal method for designing IDS-correcting codes across varying channel settings remains underexplored. We present an autoencoder-based method, THEA-code, aimed at efficiently generating IDS-correcting codes for complex IDS channels.…

    Submitted 5 January, 2025; v1 submitted 10 July, 2024; originally announced July 2024.

  22. arXiv:2407.16664  [pdf, other]

    cs.CL eess.AS

    Towards scalable efficient on-device ASR with transfer learning

    Authors: Laxmi Pandey, Ke Li, Jinxi Guo, Debjyoti Paul, Arthur Guo, Jay Mahadeokar, Xuedong Zhang

    Abstract: Multilingual pretraining for transfer learning significantly boosts the robustness of low-resource monolingual ASR models. This study systematically investigates three main aspects: (a) the impact of transfer learning on model performance during initial training or fine-tuning, (b) the influence of transfer learning across dataset domains and languages, and (c) the effect on rare-word recognition…

    Submitted 23 July, 2024; originally announced July 2024.

  23. arXiv:2406.10857  [pdf, other]

    cs.SE

    LMM-enhanced Safety-Critical Scenario Generation for Autonomous Driving System Testing From Non-Accident Traffic Videos

    Authors: Haoxiang Tian, Xingshuo Han, Yuan Zhou, Guoquan Wu, An Guo, Mingfei Cheng, Shuo Li, Jun Wei, Tianwei Zhang

    Abstract: Safety testing serves as the fundamental pillar for the development of autonomous driving systems (ADSs). To ensure the safety of ADSs, it is paramount to generate a diverse range of safety-critical test scenarios. While existing ADS practitioners primarily focus on reproducing real-world traffic accidents in simulation environments to create test scenarios, it's essential to highlight that many o…

    Submitted 1 January, 2025; v1 submitted 16 June, 2024; originally announced June 2024.

    Comments: 17 pages

  24. arXiv:2405.12001  [pdf, other]

    cs.LG cs.AI

    Scrutinize What We Ignore: Reining In Task Representation Shift Of Context-Based Offline Meta Reinforcement Learning

    Authors: Hai Zhang, Boyuan Zheng, Tianying Ji, Jinhang Liu, Anqi Guo, Junqiao Zhao, Lanqing Li

    Abstract: Offline meta reinforcement learning (OMRL) has emerged as a promising approach for interaction avoidance and strong generalization performance by leveraging pre-collected data and meta-learning techniques. Previous context-based approaches predominantly rely on the intuition that alternating optimization between the context encoder and the policy can lead to performance improvements, as long as th…

    Submitted 2 February, 2025; v1 submitted 20 May, 2024; originally announced May 2024.

    Comments: Accepted at ICLR 2025

  25. arXiv:2404.19168  [pdf, other]

    cs.CV

    PEVA-Net: Prompt-Enhanced View Aggregation Network for Zero/Few-Shot Multi-View 3D Shape Recognition

    Authors: Dongyun Lin, Yi Cheng, Shangbo Mao, Aiyuan Guo, Yiqun Li

    Abstract: Large vision-language models have impressively promoted the performance of 2D visual recognition under zero/few-shot scenarios. In this paper, we focus on exploiting the large vision-language model, i.e., CLIP, to address zero/few-shot 3D shape recognition based on multi-view representations. The key challenge for both tasks is to generate a discriminative descriptor of the 3D shape represented by…

    Submitted 29 April, 2024; originally announced April 2024.

  26. arXiv:2403.17297  [pdf, other]

    cs.CL cs.AI

    InternLM2 Technical Report

    Authors: Zheng Cai, Maosong Cao, Haojiong Chen, Kai Chen, Keyu Chen, Xin Chen, Xun Chen, Zehui Chen, Zhi Chen, Pei Chu, Xiaoyi Dong, Haodong Duan, Qi Fan, Zhaoye Fei, Yang Gao, Jiaye Ge, Chenya Gu, Yuzhe Gu, Tao Gui, Aijia Guo, Qipeng Guo, Conghui He, Yingfan Hu, Ting Huang, Tao Jiang, et al. (75 additional authors not shown)

    Abstract: The evolution of Large Language Models (LLMs) like ChatGPT and GPT-4 has sparked discussions on the advent of Artificial General Intelligence (AGI). However, replicating such advancements in open-source models has been challenging. This paper introduces InternLM2, an open-source LLM that outperforms its predecessors in comprehensive evaluations across 6 dimensions and 30 benchmarks, long-context m…

    Submitted 25 March, 2024; originally announced March 2024.

  27. arXiv:2403.02716  [pdf, other]

    cs.SE

    Pre-trained Model-based Actionable Warning Identification: A Feasibility Study

    Authors: Xiuting Ge, Chunrong Fang, Quanjun Zhang, Daoyuan Wu, Bowen Yu, Qirui Zheng, An Guo, Shangwei Lin, Zhihong Zhao, Yang Liu, Zhenyu Chen

    Abstract: Actionable Warning Identification (AWI) plays a pivotal role in improving the usability of static code analyzers. Currently, Machine Learning (ML)-based AWI approaches, which mainly learn an AWI classifier from labeled warnings, are notably common. However, these approaches still face the problem of restricted performance due to the direct reliance on a limited number of labeled warnings to develo…

    Submitted 5 March, 2024; originally announced March 2024.

  28. arXiv:2402.12814  [pdf, other]

    cs.HC

    Exploring the Impact of AI Value Alignment in Collaborative Ideation: Effects on Perception, Ownership, and Output

    Authors: Alicia Guo, Pat Pataranutaporn, Pattie Maes

    Abstract: AI-based virtual assistants are increasingly used to support daily ideation tasks. The values or bias present in these agents can influence output in hidden ways. They may also affect how people perceive the ideas produced with these AI agents and lead to implications for the design of AI-based tools. We explored the effects of AI agents with different values on the ideation process and user perce…

    Submitted 22 April, 2024; v1 submitted 20 February, 2024; originally announced February 2024.

  29. arXiv:2402.11741  [pdf, other]

    cs.DS cs.CC cs.DB cs.DC

    To Store or Not to Store: a graph theoretical approach for Dataset Versioning

    Authors: Anxin Guo, Jingwei Li, Pattara Sukprasert, Samir Khuller, Amol Deshpande, Koyel Mukherjee

    Abstract: In this work, we study the cost efficient data versioning problem, where the goal is to optimize the storage and reconstruction (retrieval) costs of data versions, given a graph of datasets as nodes and edges capturing edit/delta information. One central variant we study is MinSum Retrieval (MSR) where the goal is to minimize the total retrieval costs, while keeping the storage costs bounded. This…

    Submitted 18 February, 2024; originally announced February 2024.

    Comments: Accepted by IPDPS 2024

  30. arXiv:2402.05902  [pdf]

    cs.CV cs.AI physics.med-ph

    ClickSAM: Fine-tuning Segment Anything Model using click prompts for ultrasound image segmentation

    Authors: Aimee Guo, Grace Fei, Hemanth Pasupuleti, Jing Wang

    Abstract: The newly released Segment Anything Model (SAM) is a popular tool used in image processing due to its superior segmentation accuracy, variety of input prompts, training capabilities, and efficient model design. However, its current model is trained on a diverse dataset not tailored to medical images, particularly ultrasound images. Ultrasound images tend to have a lot of noise, making it difficult…

    Submitted 24 February, 2024; v1 submitted 8 February, 2024; originally announced February 2024.

    Comments: 5 pages, 2 figures, SPIE Medical Imaging Conference 2024. Project page: https://sites.google.com/view/clicksam/home

  31. InteractOut: Leveraging Interaction Proxies as Input Manipulation Strategies for Reducing Smartphone Overuse

    Authors: Tao Lu, Hongxiao Zheng, Tianying Zhang, Xuhai Xu, Anhong Guo

    Abstract: Smartphone overuse poses risks to people's physical and mental health. However, current intervention techniques mainly focus on explicitly changing screen content (i.e., output) and often fail to persistently reduce smartphone overuse due to being over-restrictive or over-flexible. We present the design and implementation of InteractOut, a suite of implicit input manipulation techniques that lever…

    Submitted 19 February, 2024; v1 submitted 29 January, 2024; originally announced January 2024.

    Comments: CHI 2024

  32. arXiv:2401.11095  [pdf, other]

    cs.HC cs.SD eess.AS

    SoundShift: Exploring Sound Manipulations for Accessible Mixed-Reality Awareness

    Authors: Ruei-Che Chang, Chia-Sheng Hung, Bing-Yu Chen, Dhruv Jain, Anhong Guo

    Abstract: Mixed-reality (MR) soundscapes blend real-world sound with virtual audio from hearing devices, presenting intricate auditory information that is hard to discern and differentiate. This is particularly challenging for blind or visually impaired individuals, who rely on sounds and descriptions in their everyday lives. To understand how complex audio information is consumed, we analyzed online forum…

    Submitted 26 May, 2024; v1 submitted 19 January, 2024; originally announced January 2024.

    Comments: DIS 2024

  33. arXiv:2312.13816  [pdf, other]

    cs.CL cs.AI cs.RO

    Team Flow at DRC2023: Building Common Ground and Text-based Turn-taking in a Travel Agent Spoken Dialogue System

    Authors: Ryu Hirai, Shinya Iizuka, Haruhisa Iseno, Ao Guo, Jingjing Jiang, Atsumoto Ohashi, Ryuichiro Higashinaka

    Abstract: At the Dialogue Robot Competition 2023 (DRC2023), which was held to improve the capability of dialogue robots, our team developed a system that could build common ground and take more natural turns based on user utterance texts. Our system generated queries for sightseeing spot searches using the common ground and engaged in dialogue while waiting for user comprehension.

    Submitted 21 December, 2023; originally announced December 2023.

    Comments: This paper is part of the proceedings of the Dialogue Robot Competition 2023

  34. arXiv:2312.12717  [pdf, other]

    cs.IT cs.LG

    DoDo-Code: a Deep Levenshtein Distance Embedding-based Code for IDS Channel and DNA Storage

    Authors: Alan J. X. Guo, Sihan Sun, Xiang Wei, Mengyi Wei, Xin Chen

    Abstract: Recently, DNA storage has emerged as a promising data storage solution, offering significant advantages in storage density, maintenance cost efficiency, and parallel replication capability. Mathematically, the DNA storage pipeline can be viewed as an insertion, deletion, and substitution (IDS) channel. Because of the mathematical terra incognita of the Levenshtein distance, designing an IDS-correc…

    Submitted 19 December, 2023; originally announced December 2023.

  35. Levenshtein Distance Embedding with Poisson Regression for DNA Storage

    Authors: Xiang Wei, Alan J. X. Guo, Sihan Sun, Mengyi Wei, Wei Yu

    Abstract: Efficient computation or approximation of Levenshtein distance, a widely-used metric for evaluating sequence similarity, has attracted significant attention with the emergence of DNA storage and other biological applications. Sequence embedding, which maps Levenshtein distance to a conventional distance between embedding vectors, has emerged as a promising solution. In this paper, a novel neural n…

    Submitted 13 December, 2023; originally announced December 2023.

    Journal ref: Proceedings of the AAAI Conference on Artificial Intelligence, (2024) 38(14), 15796-15804

  36. arXiv:2312.00413  [pdf, other]

    cs.SE cs.AI cs.CL cs.PL

    Abstract Syntax Tree for Programming Language Understanding and Representation: How Far Are We?

    Authors: Weisong Sun, Chunrong Fang, Yun Miao, Yudu You, Mengzhe Yuan, Yuchen Chen, Quanjun Zhang, An Guo, Xiang Chen, Yang Liu, Zhenyu Chen

    Abstract: Programming language understanding and representation (a.k.a. code representation learning) has always been a hot and challenging task in software engineering. It aims to apply deep learning techniques to produce numerical representations of the source code features while preserving its semantics. These representations can be used for facilitating subsequent code-related tasks. The abstract syntax…

    Submitted 1 December, 2023; originally announced December 2023.

    Comments: submitted to ACM Transactions on Software Engineering and Methodology. arXiv admin note: text overlap with arXiv:2103.10668 by other authors

    MSC Class: 68-04; 68T30 ACM Class: D.2.3; I.2.2; I.2.4

  37. arXiv:2311.01410  [pdf, other]

    cs.CV cs.LG

    The Blessing of Randomness: SDE Beats ODE in General Diffusion-based Image Editing

    Authors: Shen Nie, Hanzhong Allan Guo, Cheng Lu, Yuhao Zhou, Chenyu Zheng, Chongxuan Li

    Abstract: We present a unified probabilistic formulation for diffusion-based image editing, where a latent variable is edited in a task-specific manner and generally deviates from the corresponding marginal distribution induced by the original stochastic or ordinary differential equation (SDE or ODE). Instead, it defines a corresponding SDE or ODE for editing. In the formulation, we prove that the Kullback-…

    Submitted 29 February, 2024; v1 submitted 2 November, 2023; originally announced November 2023.

  38. arXiv:2310.15290  [pdf, other]

    cs.LG

    Reliable Generation of Privacy-preserving Synthetic Electronic Health Record Time Series via Diffusion Models

    Authors: Muhang Tian, Bernie Chen, Allan Guo, Shiyi Jiang, Anru R. Zhang

    Abstract: Electronic Health Records (EHRs) are rich sources of patient-level data, offering valuable resources for medical data analysis. However, privacy concerns often restrict access to EHRs, hindering downstream analysis. Current EHR de-identification methods are flawed and can lead to potential privacy leakage. Additionally, existing publicly available EHR databases are limited, preventing the advancem…

    Submitted 2 December, 2024; v1 submitted 23 October, 2023; originally announced October 2023.

  39. arXiv:2308.15990  [pdf, other]

    cs.SD eess.AS

    Dual-path Transformer Based Neural Beamformer for Target Speech Extraction

    Authors: Aoqi Guo, Sichong Qian, Baoxiang Li, Dazhi Gao

    Abstract: Neural beamformers, which integrate both pre-separation and beamforming modules, have demonstrated impressive effectiveness in target speech extraction. Nevertheless, the performance of these beamformers is inherently limited by the predictive accuracy of the pre-separation module. In this paper, we introduce a neural beamformer supported by a dual-path transformer. Initially, we employ the cross-…

    Submitted 7 September, 2023; v1 submitted 30 August, 2023; originally announced August 2023.

  40. arXiv:2308.01477  [pdf, other]

    cs.RO cs.CV

    HANDAL: A Dataset of Real-World Manipulable Object Categories with Pose Annotations, Affordances, and Reconstructions

    Authors: Andrew Guo, Bowen Wen, Jianhe Yuan, Jonathan Tremblay, Stephen Tyree, Jeffrey Smith, Stan Birchfield

    Abstract: We present the HANDAL dataset for category-level object pose estimation and affordance prediction. Unlike previous datasets, ours is focused on robotics-ready manipulable objects that are of the proper size and shape for functional grasping by robot manipulators, such as pliers, utensils, and screwdrivers. Our annotation process is streamlined, requiring only a single off-the-shelf camera and semi…

    Submitted 2 August, 2023; originally announced August 2023.

    Comments: IROS 2023. Project page: https://nvlabs.github.io/HANDAL/

  41. arXiv:2307.10601  [pdf, other]

    cs.CV

    SCA-PVNet: Self-and-Cross Attention Based Aggregation of Point Cloud and Multi-View for 3D Object Retrieval

    Authors: Dongyun Lin, Yi Cheng, Aiyuan Guo, Shangbo Mao, Yiqun Li

    Abstract: To address 3D object retrieval, substantial efforts have been made to generate highly discriminative descriptors of 3D objects represented by a single modality, e.g., voxels, point clouds or multi-view images. It is promising to leverage the complementary information from multi-modality representations of 3D objects to further improve retrieval performance. However, multi-modality 3D object retrie…

    Submitted 29 November, 2023; v1 submitted 20 July, 2023; originally announced July 2023.

  42. arXiv:2306.15942  [pdf, other]

    cs.SD cs.AI eess.AS

    Enhanced Neural Beamformer with Spatial Information for Target Speech Extraction

    Authors: Aoqi Guo, Junnan Wu, Peng Gao, Wenbo Zhu, Qinwen Guo, Dazhi Gao, Yujun Wang

    Abstract: Recently, deep learning-based beamforming algorithms have shown promising performance in target speech extraction tasks. However, most systems do not fully utilize spatial information. In this paper, we propose a target speech extraction network that utilizes spatial information to enhance the performance of the neural beamformer. To achieve this, we first use the UNet-TCN structure to model input fea…

    Submitted 28 June, 2023; originally announced June 2023.

  43. arXiv:2305.17567  [pdf, other]

    cs.GT math.OC

    No-Regret Learning in Dynamic Competition with Reference Effects Under Logit Demand

    Authors: Mengzi Amy Guo, Donghao Ying, Javad Lavaei, Zuo-Jun Max Shen

    Abstract: This work is dedicated to the algorithm design in a competitive framework, with the primary goal of learning a stable equilibrium. We consider the dynamic price competition between two firms operating within an opaque marketplace, where each firm lacks information about its competitor. The demand follows the multinomial logit (MNL) choice model, which depends on the consumers' observed price and t…

    Submitted 27 May, 2023; originally announced May 2023.

  44. arXiv:2211.11571  [pdf, other]

    cs.CV

    SLLEN: Semantic-aware Low-light Image Enhancement Network

    Authors: Mingye Ju, Chuheng Chen, Charles A. Guo, Jinshan Pan, Jinhui Tang, Dacheng Tao

    Abstract: How to effectively explore semantic feature is vital for low-light image enhancement (LLE). Existing methods usually utilize the semantic feature that is only drawn from the output produced by high-level semantic segmentation (SS) network. However, if the output is not accurately estimated, it would affect the high-level semantic feature (HSF) extraction, which accordingly interferes with LLE. To…

    Submitted 21 October, 2024; v1 submitted 21 November, 2022; originally announced November 2022.

  45. arXiv:2210.09518  [pdf, other]

    cs.CL cs.AI cs.RO

    Team Flow at DRC2022: Pipeline System for Travel Destination Recommendation Task in Spoken Dialogue

    Authors: Ryu Hirai, Atsumoto Ohashi, Ao Guo, Hideki Shiroma, Xulin Zhou, Yukihiko Tone, Shinya Iizuka, Ryuichiro Higashinaka

    Abstract: To improve the interactive capabilities of a dialogue system, e.g., to adapt to different customers, the Dialogue Robot Competition (DRC2022) was held. As one of the teams, we built a dialogue system with a pipeline structure containing four modules. The natural language understanding (NLU) and natural language generation (NLG) modules were GPT-2 based models, and the dialogue state tracking (DST)…

    Submitted 17 October, 2022; originally announced October 2022.

    Comments: This paper is part of the proceedings of the Dialogue Robot Competition 2022

  46. arXiv:2207.04713  [pdf, other]

    cs.CL

    GMN: Generative Multi-modal Network for Practical Document Information Extraction

    Authors: Haoyu Cao, Jiefeng Ma, Antai Guo, Yiqing Hu, Hao Liu, Deqiang Jiang, Yinsong Liu, Bo Ren

    Abstract: Document Information Extraction (DIE) has attracted increasing attention due to its various advanced applications in the real world. Although recent literature has already achieved competitive results, these approaches usually fail when dealing with complex documents with noisy OCR results or mutative layouts. This paper proposes Generative Multi-modal Network (GMN) for real-world scenarios to add…

    Submitted 11 July, 2022; originally announced July 2022.

    Comments: Accepted to NAACL 2022 main conference

  47. arXiv:2207.04684  [pdf, other]

    cs.LG cs.ET

    Deep Squared Euclidean Approximation to the Levenshtein Distance for DNA Storage

    Authors: Alan J. X. Guo, Cong Liang, Qing-Hu Hou

    Abstract: Storing information in DNA molecules is of great interest because of its advantages in longevity, high storage density, and low maintenance cost. A key step in the DNA storage pipeline is to efficiently cluster the retrieved DNA sequences according to their similarities. Levenshtein distance is the most suitable metric on the similarity between two DNA sequences, but it is inferior in terms of com…

    Submitted 11 July, 2022; originally announced July 2022.

  48. arXiv:2206.13734  [pdf, other]

    cs.AR cs.LG

    H-GCN: A Graph Convolutional Network Accelerator on Versal ACAP Architecture

    Authors: Chengming Zhang, Tong Geng, Anqi Guo, Jiannan Tian, Martin Herbordt, Ang Li, Dingwen Tao

    Abstract: Graph Neural Networks (GNNs) have drawn tremendous attention due to their unique capability to extend Machine Learning (ML) approaches to applications broadly-defined as having unstructured data, especially graphs. Compared with other Machine Learning (ML) modalities, the acceleration of Graph Neural Networks (GNNs) is more challenging due to the irregularity and heterogeneity derived from graph t…

    Submitted 27 June, 2022; originally announced June 2022.

    Comments: 8 pages, 8 figures, 4 tables, accepted by FPL'22

  49. arXiv:2205.10715  [pdf, other]

    cs.LG math.OC

    Policy-based Primal-Dual Methods for Concave CMDP with Variance Reduction

    Authors: Donghao Ying, Mengzi Amy Guo, Hyunin Lee, Yuhao Ding, Javad Lavaei, Zuo-Jun Max Shen

    Abstract: We study Concave Constrained Markov Decision Processes (Concave CMDPs) where both the objective and constraints are defined as concave functions of the state-action occupancy measure. We propose the Variance-Reduced Primal-Dual Policy Gradient Algorithm (VR-PDPG), which updates the primal variable via policy gradient ascent and the dual variable via projected sub-gradient descent. Despite the chal…

    Submitted 26 May, 2024; v1 submitted 21 May, 2022; originally announced May 2022.

  50. arXiv:2205.09185  [pdf, other]

    physics.ins-det cs.LG hep-ex nucl-ex physics.comp-ph

    AI-assisted Optimization of the ECCE Tracking System at the Electron Ion Collider

    Authors: C. Fanelli, Z. Papandreou, K. Suresh, J. K. Adkins, Y. Akiba, A. Albataineh, M. Amaryan, I. C. Arsene, C. Ayerbe Gayoso, J. Bae, X. Bai, M. D. Baker, M. Bashkanov, R. Bellwied, F. Benmokhtar, V. Berdnikov, J. C. Bernauer, F. Bock, W. Boeglin, M. Borysova, E. Brash, P. Brindza, W. J. Briscoe, M. Brooks, S. Bueltmann, et al. (258 additional authors not shown)

    Abstract: The Electron-Ion Collider (EIC) is a cutting-edge accelerator facility that will study the nature of the "glue" that binds the building blocks of the visible matter in the universe. The proposed experiment will be realized at Brookhaven National Laboratory in approximately 10 years from now, with detector design and R&D currently ongoing. Notably, EIC is one of the first large-scale facilities to…

    Submitted 19 May, 2022; v1 submitted 18 May, 2022; originally announced May 2022.

    Comments: 16 pages, 18 figures, 2 appendices, 3 tables