Showing 1–50 of 68 results for author: Hong, K

  1. arXiv:2504.19867  [pdf, other]

    cs.CL cs.DC cs.LG

    semi-PD: Towards Efficient LLM Serving via Phase-Wise Disaggregated Computation and Unified Storage

    Authors: Ke Hong, Lufang Chen, Zhong Wang, Xiuhong Li, Qiuli Mao, Jianping Ma, Chao Xiong, Guanyu Wu, Buhe Han, Guohao Dai, Yun Liang, Yu Wang

    Abstract: Existing large language model (LLM) serving systems fall into two categories: 1) a unified system where prefill phase and decode phase are co-located on the same GPU, sharing the unified computational resource and storage, and 2) a disaggregated system where the two phases are disaggregated to different GPUs. The design of the disaggregated system addresses the latency interference and sophisticat…

    Submitted 28 April, 2025; originally announced April 2025.

    Comments: 18 pages, 16 figures

  2. arXiv:2504.19519  [pdf, other]

    cs.DC cs.CL cs.LG

    FlashOverlap: A Lightweight Design for Efficiently Overlapping Communication and Computation

    Authors: Ke Hong, Xiuhong Li, Minxu Liu, Qiuli Mao, Tianqi Wu, Zixiao Huang, Lufang Chen, Zhong Wang, Yichong Zhang, Zhenhua Zhu, Guohao Dai, Yu Wang

    Abstract: Generative models have achieved remarkable success across various applications, driving the demand for multi-GPU computing. Inter-GPU communication becomes a bottleneck in multi-GPU computing systems, particularly on consumer-grade GPUs. By exploiting concurrent hardware execution, overlapping computation and communication latency is an effective technique for mitigating the communication overhead…

    Submitted 28 April, 2025; originally announced April 2025.

    Comments: 17 pages, 11 figures, 4 tables

  3. arXiv:2504.15482  [pdf, other]

    cs.HC

    From Overload to Insight: Scaffolding Creative Ideation through Structuring Inspiration

    Authors: Yaqing Yang, Vikram Mohanty, Nikolas Martelaro, Aniket Kittur, Yan-Ying Chen, Matthew K. Hong

    Abstract: Creative ideation relies on exploring diverse stimuli, but the overwhelming abundance of information often makes it difficult to identify valuable insights or reach the 'aha' moment. Traditional methods for accessing design stimuli lack organization and fail to support users in discovering promising opportunities within large idea spaces. In this position paper, we explore how AI can be leveraged…

    Submitted 21 April, 2025; originally announced April 2025.

    Comments: 2025 CHI workshop

  4. arXiv:2504.11997  [pdf, ps, other]

    cs.LG cs.AI

    A Computationally Efficient Algorithm for Infinite-Horizon Average-Reward Linear MDPs

    Authors: Kihyuk Hong, Ambuj Tewari

    Abstract: We study reinforcement learning in infinite-horizon average-reward settings with linear MDPs. Previous work addresses this problem by approximating the average-reward setting by the discounted setting and employing a value iteration-based algorithm that uses clipping to constrain the span of the value function for improved statistical efficiency. However, the clipping procedure requires computing the…

    Submitted 16 April, 2025; originally announced April 2025.

  5. arXiv:2503.13180  [pdf, other]

    cs.LG cs.AI cs.DC

    GC-Fed: Gradient Centralized Federated Learning with Partial Client Participation

    Authors: Jungwon Seo, Ferhat Ozgur Catak, Chunming Rong, Kibeom Hong, Minhoe Kim

    Abstract: Federated Learning (FL) enables privacy-preserving multi-source information fusion (MSIF) but is challenged by client drift in highly heterogeneous data settings. Many existing drift-mitigation strategies rely on reference-based techniques--such as gradient adjustments or proximal loss--that use historical snapshots (e.g., past gradients or previous global models) as reference points. When only a…

    Submitted 20 March, 2025; v1 submitted 17 March, 2025; originally announced March 2025.

  6. BioSpark: Beyond Analogical Inspiration to LLM-augmented Transfer

    Authors: Hyeonsu Kang, David Chuan-en Lin, Yan-Ying Chen, Matthew K. Hong, Nikolas Martelaro, Aniket Kittur

    Abstract: We present BioSpark, a system for analogical innovation designed to act as a creativity partner in reducing the cognitive effort in finding, mapping, and creatively adapting diverse inspirations. While prior approaches have focused on initial stages of finding inspirations, BioSpark uses LLMs embedded in a familiar, visual, Pinterest-like interface to go beyond inspiration to supporting users in i…

    Submitted 12 March, 2025; originally announced March 2025.

    Journal ref: ACM CHI 2025

  7. arXiv:2501.18588  [pdf, other]

    cs.HC cs.AI cs.CV cs.MM

    Inkspire: Supporting Design Exploration with Generative AI through Analogical Sketching

    Authors: David Chuan-En Lin, Hyeonsu B. Kang, Nikolas Martelaro, Aniket Kittur, Yan-Ying Chen, Matthew K. Hong

    Abstract: With recent advancements in the capabilities of Text-to-Image (T2I) AI models, product designers have begun experimenting with them in their work. However, T2I models struggle to interpret abstract language and the current user experience of T2I tools can induce design fixation rather than a more iterative, exploratory process. To address these challenges, we developed Inkspire, a sketch-driven to…

    Submitted 30 January, 2025; originally announced January 2025.

    Comments: Accepted to CHI 2025

  8. arXiv:2501.07430  [pdf, other]

    cs.CV cs.AI

    Introducing 3D Representation for Medical Image Volume-to-Volume Translation via Score Fusion

    Authors: Xiyue Zhu, Dou Hoon Kwark, Ruike Zhu, Kaiwen Hong, Yiqi Tao, Shirui Luo, Yudu Li, Zhi-Pei Liang, Volodymyr Kindratenko

    Abstract: In volume-to-volume translations in medical images, existing models often struggle to capture the inherent volumetric distribution using 3D voxelspace representations, due to high computational dataset demands. We present Score-Fusion, a novel volumetric translation model that effectively learns 3D representations by ensembling perpendicularly trained 2D diffusion models in score function space. B…

    Submitted 6 February, 2025; v1 submitted 13 January, 2025; originally announced January 2025.

  9. arXiv:2501.07108  [pdf, other]

    cs.AI

    How GPT learns layer by layer

    Authors: Jason Du, Kelly Hong, Alishba Imran, Erfan Jahanparast, Mehdi Khfifi, Kaichun Qiao

    Abstract: Large Language Models (LLMs) excel at tasks like language processing, strategy games, and reasoning but struggle to build generalizable internal representations essential for adaptive decision-making in agents. For agents to effectively navigate complex environments, they must construct reliable world models. While LLMs perform well on specific benchmarks, they often fail to generalize, leading to…

    Submitted 13 January, 2025; originally announced January 2025.

  10. arXiv:2501.02429  [pdf, other]

    cs.IR

    Citation Structural Diversity: A Novel and Concise Metric Combining Structure and Semantics for Literature Evaluation

    Authors: Mingyue Kong, Yinglong Zhang, Likun Sheng, Kaifeng Hong

    Abstract: As academic research becomes increasingly diverse, traditional literature evaluation methods face significant limitations, particularly in capturing the complexity of academic dissemination and the multidimensional impacts of literature. To address these challenges, this paper introduces a novel literature evaluation model of citation structural diversity, with a focus on assessing its feasibility…

    Submitted 4 January, 2025; originally announced January 2025.

    Comments: 18 pages, 10 figures

  11. arXiv:2412.19509  [pdf, other]

    cs.CV cs.AI

    MBQ: Modality-Balanced Quantization for Large Vision-Language Models

    Authors: Shiyao Li, Yingchun Hu, Xuefei Ning, Xihui Liu, Ke Hong, Xiaotao Jia, Xiuhong Li, Yaqi Yan, Pei Ran, Guohao Dai, Shengen Yan, Huazhong Yang, Yu Wang

    Abstract: Vision-Language Models (VLMs) have enabled a variety of real-world applications. The large parameter size of VLMs brings large memory and computation overhead which poses significant challenges for deployment. Post-Training Quantization (PTQ) is an effective technique to reduce the memory and computation overhead. Existing PTQ methods mainly focus on large language models (LLMs), without consideri…

    Submitted 21 March, 2025; v1 submitted 27 December, 2024; originally announced December 2024.

  12. arXiv:2411.12150  [pdf, other]

    cs.RO cs.AI cs.LG

    HEIGHT: Heterogeneous Interaction Graph Transformer for Robot Navigation in Crowded and Constrained Environments

    Authors: Shuijing Liu, Haochen Xia, Fatemeh Cheraghi Pouria, Kaiwen Hong, Neeloy Chakraborty, Zichao Hu, Joydeep Biswas, Katherine Driggs-Campbell

    Abstract: We study the problem of robot navigation in dense and interactive crowds with environmental constraints such as corridors and furniture. Previous methods fail to consider all types of interactions among agents and obstacles, leading to unsafe and inefficient robot paths. In this article, we leverage a graph-based representation of crowded and constrained scenarios and propose a structured framewor…

    Submitted 1 May, 2025; v1 submitted 18 November, 2024; originally announced November 2024.

  13. arXiv:2411.07447  [pdf, other]

    cs.PF cs.AI

    Optimizing LLM Inference for Database Systems: Cost-Aware Scheduling for Concurrent Requests

    Authors: Kyoungmin Kim, Kijae Hong, Caglar Gulcehre, Anastasia Ailamaki

    Abstract: LLMs are increasingly used inside database systems and in database applications for better complexity management and decision-making, where LLM inferences require significant GPU costs. LLM inference systems, however, are slow compared to database systems, limiting the expansion of the use of LLMs inside database systems. This paper first analyzes the LLM inference performance and focuses on a dat…

    Submitted 16 April, 2025; v1 submitted 11 November, 2024; originally announced November 2024.

  14. arXiv:2410.19606  [pdf, other]

    cs.CV cs.RO

    Multi-modal Motion Prediction using Temporal Ensembling with Learning-based Aggregation

    Authors: Kai-Yin Hong, Chieh-Chih Wang, Wen-Chieh Lin

    Abstract: Recent years have seen a shift towards learning-based methods for trajectory prediction, with challenges remaining in addressing uncertainty and capturing multi-modal distributions. This paper introduces Temporal Ensembling with Learning-based Aggregation, a meta-algorithm designed to mitigate the issue of missing behaviors in trajectory prediction, which leads to inconsistent predictions across c…

    Submitted 25 October, 2024; originally announced October 2024.

    Comments: IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2024), accepted by IROS2024

  15. arXiv:2410.17592  [pdf, other]

    cs.LG

    A Kernel Perspective on Distillation-based Collaborative Learning

    Authors: Sejun Park, Kihun Hong, Ganguk Hwang

    Abstract: Over the past decade, there has been growing interest in collaborative learning that can enhance AI models of multiple parties. However, it is still challenging to enhance their performance without sharing private data and models from individual parties. One recent promising approach is to develop distillation-based algorithms that exploit unlabeled public data but the results are still unsatisfactory…

    Submitted 30 October, 2024; v1 submitted 23 October, 2024; originally announced October 2024.

    Comments: Accepted to NeurIPS 2024

  16. arXiv:2410.14992  [pdf, other]

    cs.LG math.OC

    Learning Infinite-Horizon Average-Reward Linear Mixture MDPs of Bounded Span

    Authors: Woojin Chae, Kihyuk Hong, Yufan Zhang, Ambuj Tewari, Dabeen Lee

    Abstract: This paper proposes a computationally tractable algorithm for learning infinite-horizon average-reward linear mixture Markov decision processes (MDPs) under the Bellman optimality condition. Our algorithm for linear mixture MDPs achieves a nearly minimax optimal regret upper bound of $\widetilde{\mathcal{O}}(d\sqrt{\mathrm{sp}(v^*)T})$ over $T$ time steps where $\mathrm{sp}(v^*)$ is the span of th…

    Submitted 19 October, 2024; originally announced October 2024.

  17. arXiv:2410.04123  [pdf]

    eess.IV cs.CV cs.LG physics.comp-ph physics.optics

    WAVE-UNET: Wavelength based Image Reconstruction method using attention UNET for OCT images

    Authors: Maryam Viqar, Erdem Sahin, Violeta Madjarova, Elena Stoykova, Keehoon Hong

    Abstract: In this work, we propose to leverage a deep-learning (DL) based reconstruction framework for high quality Swept-Source Optical Coherence Tomography (SS-OCT) images, by incorporating wavelength (λ) space interferometric fringes. Generally, the SS-OCT captured fringe is linear in wavelength space and if Inverse Discrete Fourier Transform (IDFT) is applied to extract depth-resolved spectral informati…

    Submitted 5 October, 2024; originally announced October 2024.

  18. arXiv:2407.10558  [pdf, other]

    cs.CV cs.LG

    ConTEXTure: Consistent Multiview Images to Texture

    Authors: Jaehoon Ahn, Sumin Cho, Harim Jung, Kibeom Hong, Seonghoon Ban, Moon-Ryul Jung

    Abstract: We introduce ConTEXTure, a generative network designed to create a texture map/atlas for a given 3D mesh using images from multiple viewpoints. The process begins with generating a front-view image from a text prompt, such as 'Napoleon, front view', describing the 3D mesh. Additional images from different viewpoints are derived from this front-view image and camera poses relative to it. ConTEXTure…

    Submitted 15 July, 2024; originally announced July 2024.

    Comments: 11 pages, 7 figures

  19. arXiv:2407.02450  [pdf, other]

    q-bio.QM cs.IT q-bio.NC

    Message-Relevant Dimension Reduction of Neural Populations

    Authors: Amanda Merkley, Alice Y. Nam, Y. Kate Hong, Pulkit Grover

    Abstract: Quantifying relevant interactions between neural populations is a prominent question in the analysis of high-dimensional neural recordings. However, existing dimension reduction methods often discuss communication in the absence of a formal framework, while frameworks proposed to address this gap are impractical in data analysis. This work bridges the formal framework of M-Information Flow with pr…

    Submitted 2 July, 2024; originally announced July 2024.

  20. arXiv:2407.00299  [pdf, other]

    cs.RO cs.AI cs.CV cs.HC cs.LG

    Human-Agent Joint Learning for Efficient Robot Manipulation Skill Acquisition

    Authors: Shengcheng Luo, Quanquan Peng, Jun Lv, Kaiwen Hong, Katherine Rose Driggs-Campbell, Cewu Lu, Yong-Lu Li

    Abstract: Employing a teleoperation system for gathering demonstrations offers the potential for more efficient learning of robot manipulation. However, teleoperating a robot arm equipped with a dexterous hand or gripper, via a teleoperation system presents inherent challenges due to the task's high dimensionality, complexity of motion, and differences between physiological structures. In this study, we int…

    Submitted 21 October, 2024; v1 submitted 28 June, 2024; originally announced July 2024.

    Comments: 8 pages, 6 figures

  21. arXiv:2406.08815  [pdf, other]

    cs.RO eess.SY

    Deep Reinforcement Learning-based Quadcopter Controller: A Practical Approach and Experiments

    Authors: Truong-Dong Do, Nguyen Xuan Mung, Sung Kyung Hong

    Abstract: Quadcopters have been studied for decades thanks to their maneuverability and capability of operating in a variety of circumstances. However, quadcopters suffer from dynamical nonlinearity, actuator saturation, as well as sensor noise that make it challenging and time consuming to obtain accurate dynamic models and achieve satisfactory control performance. Fortunately, deep reinforcement learning…

    Submitted 18 June, 2024; v1 submitted 13 June, 2024; originally announced June 2024.

    Comments: 6 pages, 5 figures, 3 tables

  22. arXiv:2405.16830  [pdf, other]

    cs.RO cs.AI cs.LG

    Structured Graph Network for Constrained Robot Crowd Navigation with Low Fidelity Simulation

    Authors: Shuijing Liu, Kaiwen Hong, Neeloy Chakraborty, Katherine Driggs-Campbell

    Abstract: We investigate the feasibility of deploying reinforcement learning (RL) policies for constrained crowd navigation using a low-fidelity simulator. We introduce a representation of the dynamic environment, separating human and obstacle representations. Humans are represented through detected states, while obstacles are represented as computed point clouds based on maps and robot localization. This r…

    Submitted 27 May, 2024; v1 submitted 27 May, 2024; originally announced May 2024.

  23. arXiv:2405.15050  [pdf, ps, other]

    stat.ML cs.LG

    Reinforcement Learning for Infinite-Horizon Average-Reward Linear MDPs via Approximation by Discounted-Reward MDPs

    Authors: Kihyuk Hong, Woojin Chae, Yufan Zhang, Dabeen Lee, Ambuj Tewari

    Abstract: We study the problem of infinite-horizon average-reward reinforcement learning with linear Markov decision processes (MDPs). The associated Bellman operator of the problem not being a contraction makes the algorithm design challenging. Previous approaches either suffer from computational inefficiency or require strong assumptions on dynamics, such as ergodicity, for achieving a regret bound of…

    Submitted 10 March, 2025; v1 submitted 23 May, 2024; originally announced May 2024.

  24. arXiv:2405.08172  [pdf, other]

    cs.CL cs.AI

    CANTONMT: Investigating Back-Translation and Model-Switch Mechanisms for Cantonese-English Neural Machine Translation

    Authors: Kung Yin Hong, Lifeng Han, Riza Batista-Navarro, Goran Nenadic

    Abstract: This paper investigates the development and evaluation of machine translation models from Cantonese to English, where we propose a novel approach to tackle low-resource language translations. The main objectives of the study are to develop a model that can effectively translate Cantonese to English and evaluate it against state-of-the-art commercial models. To achieve this, a new parallel corpus h…

    Submitted 13 May, 2024; originally announced May 2024.

    Comments: on-going work, 30 pages

  25. arXiv:2404.14294  [pdf, other]

    cs.CL cs.AI

    A Survey on Efficient Inference for Large Language Models

    Authors: Zixuan Zhou, Xuefei Ning, Ke Hong, Tianyu Fu, Jiaming Xu, Shiyao Li, Yuming Lou, Luning Wang, Zhihang Yuan, Xiuhong Li, Shengen Yan, Guohao Dai, Xiao-Ping Zhang, Yuhan Dong, Yu Wang

    Abstract: Large Language Models (LLMs) have attracted extensive attention due to their remarkable performance across various tasks. However, the substantial computational and memory requirements of LLM inference pose challenges for deployment in resource-constrained scenarios. Efforts within the field have been directed towards developing techniques aimed at enhancing the efficiency of LLM inference. This p…

    Submitted 19 July, 2024; v1 submitted 22 April, 2024; originally announced April 2024.

  26. arXiv:2403.11346  [pdf, other]

    cs.CL cs.AI

    CantonMT: Cantonese to English NMT Platform with Fine-Tuned Models Using Synthetic Back-Translation Data

    Authors: Kung Yin Hong, Lifeng Han, Riza Batista-Navarro, Goran Nenadic

    Abstract: Neural Machine Translation (NMT) for low-resource languages is still a challenging task for NLP researchers. In this work, we deploy a standard data augmentation methodology by back-translation to a new language translation direction Cantonese-to-English. We present the models we fine-tuned using the limited amount of real data and the synthetic data we generated using back-translation inc…

    Submitted 9 June, 2024; v1 submitted 17 March, 2024; originally announced March 2024.

    Comments: Accepted by: The 25th Annual Conference of The European Association for Machine Translation, 24 - 27 June 2024, Sheffield, UK (forthcoming)

  27. arXiv:2402.04493  [pdf, ps, other]

    stat.ML cs.LG

    A Primal-Dual Algorithm for Offline Constrained Reinforcement Learning with Linear MDPs

    Authors: Kihyuk Hong, Ambuj Tewari

    Abstract: We study offline reinforcement learning (RL) with linear MDPs under the infinite-horizon discounted setting which aims to learn a policy that maximizes the expected discounted cumulative reward using a pre-collected dataset. Existing algorithms for this setting either require uniform data coverage assumptions or are computationally inefficient for finding an $ε$-optimal policy with $O(ε^{-2})$ s…

    Submitted 2 June, 2024; v1 submitted 6 February, 2024; originally announced February 2024.

  28. arXiv:2401.11505  [pdf, other]

    cs.CL cs.IR

    CheX-GPT: Harnessing Large Language Models for Enhanced Chest X-ray Report Labeling

    Authors: Jawook Gu, Kihyun You, Han-Cheol Cho, Jiho Kim, Eun Kyoung Hong, Byungseok Roh

    Abstract: Free-text radiology reports present a rich data source for various medical tasks, but effectively labeling these texts remains challenging. Traditional rule-based labeling methods fall short of capturing the nuances of diverse free-text patterns. Moreover, models using expert-annotated data are limited by data scarcity and pre-defined classes, impacting their performance, flexibility and scalabili…

    Submitted 5 November, 2024; v1 submitted 21 January, 2024; originally announced January 2024.

    Comments: 16 pages, 3 figures

  29. arXiv:2312.11388  [pdf, other]

    cs.HC

    BioSpark: An End-to-End Generative System for Biological-Analogical Inspirations and Ideation

    Authors: Hyeonsu B. Kang, David Chuan-En Lin, Nikolas Martelaro, Aniket Kittur, Yan-Ying Chen, Matthew K. Hong

    Abstract: Nature is often used to inspire solutions for complex engineering problems, but achieving its full potential is challenging due to difficulties in discovering relevant analogies and synthesizing from them. Here, we present an end-to-end system, BioSpark, that generates biological-analogical mechanisms and provides an interactive interface to comprehend and synthesize from them. BioSpark pipeline s…

    Submitted 18 December, 2023; originally announced December 2023.

    Comments: NeurIPS 2023 Workshop on Machine Learning for Creativity and Design

  30. arXiv:2311.12862  [pdf, other]

    cs.DC cs.CV cs.LG cs.PF

    TorchSparse++: Efficient Training and Inference Framework for Sparse Convolution on GPUs

    Authors: Haotian Tang, Shang Yang, Zhijian Liu, Ke Hong, Zhongming Yu, Xiuyu Li, Guohao Dai, Yu Wang, Song Han

    Abstract: Sparse convolution plays a pivotal role in emerging workloads, including point cloud processing in AR/VR, autonomous driving, and graph understanding in recommendation systems. Since the computation pattern is sparse and irregular, specialized high-performance kernels are required. Existing GPU libraries offer two dataflow types for sparse convolution. The gather-GEMM-scatter dataflow is easy to i…

    Submitted 25 October, 2023; originally announced November 2023.

    Comments: MICRO 2023; Haotian Tang and Shang Yang contributed equally to this project

  31. arXiv:2311.01282  [pdf, other]

    cs.LG cs.CL

    FlashDecoding++: Faster Large Language Model Inference on GPUs

    Authors: Ke Hong, Guohao Dai, Jiaming Xu, Qiuli Mao, Xiuhong Li, Jun Liu, Kangdi Chen, Yuhan Dong, Yu Wang

    Abstract: Large Language Models (LLMs) are becoming increasingly important in various domains. However, the following challenges still remain unsolved in accelerating LLM inference: (1) Synchronized partial softmax update. The softmax operation requires a synchronized update operation among each partial softmax result, leading to ~20% overheads for the attention computation in LLMs. (2) Under-utilized compu…

    Submitted 5 January, 2024; v1 submitted 2 November, 2023; originally announced November 2023.

  32. CXR-CLIP: Toward Large Scale Chest X-ray Language-Image Pre-training

    Authors: Kihyun You, Jawook Gu, Jiyeon Ham, Beomhee Park, Jiho Kim, Eun Kyoung Hong, Woonhyunk Baek, Byungseok Roh

    Abstract: A large-scale image-text pair dataset has greatly contributed to the development of vision-language pre-training (VLP) models, which enable zero-shot or few-shot classification without costly annotation. However, in the medical domain, the scarcity of data remains a significant challenge for developing a powerful VLP model. In this paper, we tackle the lack of image-text data in chest X-ray by exp…

    Submitted 20 October, 2023; originally announced October 2023.

    Comments: Accepted by MICCAI 2023

  33. arXiv:2309.16873  [pdf, other]

    cs.RO

    Predicting Object Interactions with Behavior Primitives: An Application in Stowing Tasks

    Authors: Haonan Chen, Yilong Niu, Kaiwen Hong, Shuijing Liu, Yixuan Wang, Yunzhu Li, Katherine Driggs-Campbell

    Abstract: Stowing, the task of placing objects in cluttered shelves or bins, is a common task in warehouse and manufacturing operations. However, this task is still predominantly carried out by human workers as stowing is challenging to automate due to the complex multi-object interactions and long-horizon nature of the task. Previous works typically involve extensive data collection and costly human labeli…

    Submitted 3 November, 2023; v1 submitted 28 September, 2023; originally announced September 2023.

    Comments: Project Page: https://haonan16.github.io/stow_page/ 16 pages, 9 figures, Accepted for an oral presentation at CoRL 2023

  34. arXiv:2309.06933  [pdf, other]

    cs.CV

    DreamStyler: Paint by Style Inversion with Text-to-Image Diffusion Models

    Authors: Namhyuk Ahn, Junsoo Lee, Chunggi Lee, Kunhee Kim, Daesik Kim, Seung-Hun Nam, Kibeom Hong

    Abstract: Recent progress in large-scale text-to-image models has yielded remarkable accomplishments, finding various applications in the art domain. However, expressing unique characteristics of an artwork (e.g. brushwork, colortone, or composition) with text prompts alone may encounter limitations due to the inherent constraints of verbal description. To this end, we introduce DreamStyler, a novel framewor…

    Submitted 18 December, 2023; v1 submitted 13 September, 2023; originally announced September 2023.

    Comments: AAAI 2024

  35. arXiv:2308.10554  [pdf, other]

    cs.CV

    Improving Diversity in Zero-Shot GAN Adaptation with Semantic Variations

    Authors: Seogkyu Jeon, Bei Liu, Pilhyeon Lee, Kibeom Hong, Jianlong Fu, Hyeran Byun

    Abstract: Training deep generative models usually requires a large amount of data. To alleviate the data collection cost, the task of zero-shot GAN adaptation aims to reuse well-trained generators to synthesize images of an unseen target domain without any further training samples. Due to the data absence, the textual description of the target domain and the vision-language models, e.g., CLIP, are utilized…

    Submitted 21 August, 2023; originally announced August 2023.

    Comments: Accepted to ICCV 2023 (poster)

  36. arXiv:2307.09724  [pdf, other]

    cs.CV

    AesPA-Net: Aesthetic Pattern-Aware Style Transfer Networks

    Authors: Kibeom Hong, Seogkyu Jeon, Junsoo Lee, Namhyuk Ahn, Kunhee Kim, Pilhyeon Lee, Daesik Kim, Youngjung Uh, Hyeran Byun

    Abstract: To deliver the artistic expression of the target style, recent studies exploit the attention mechanism owing to its ability to map the local patches of the style image to the corresponding patches of the content image. However, because of the low semantic correspondence between arbitrary content and artworks, the attention module repeatedly abuses specific local patches from the style image, resul…

    Submitted 8 August, 2023; v1 submitted 18 July, 2023; originally announced July 2023.

    Comments: Accepted by ICCV 2023. Code is available at https://github.com/Kibeom-Hong/AesPA-Net

  37. arXiv:2307.08209  [pdf, other]

    cs.CV

    Ada3D : Exploiting the Spatial Redundancy with Adaptive Inference for Efficient 3D Object Detection

    Authors: Tianchen Zhao, Xuefei Ning, Ke Hong, Zhongyuan Qiu, Pu Lu, Yali Zhao, Linfeng Zhang, Lipu Zhou, Guohao Dai, Huazhong Yang, Yu Wang

    Abstract: Voxel-based methods have achieved state-of-the-art performance for 3D object detection in autonomous driving. However, their significant computational and memory costs pose a challenge for their application to resource-constrained vehicles. One reason for this high resource consumption is the presence of a large number of redundant background points in Lidar point clouds, resulting in spatial redu…

    Submitted 8 August, 2023; v1 submitted 16 July, 2023; originally announced July 2023.

    Comments: Accepted at ICCV2023

  38. arXiv:2307.06924  [pdf, other]

    cs.RO cs.AI cs.CL cs.HC cs.LG

    DRAGON: A Dialogue-Based Robot for Assistive Navigation with Visual Language Grounding

    Authors: Shuijing Liu, Aamir Hasan, Kaiwen Hong, Runxuan Wang, Peixin Chang, Zachary Mizrachi, Justin Lin, D. Livingston McPherson, Wendy A. Rogers, Katherine Driggs-Campbell

    Abstract: Persons with visual impairments (PwVI) have difficulties understanding and navigating spaces around them. Current wayfinding technologies either focus solely on navigation or provide limited communication about the environment. Motivated by recent advances in visual-language grounding and semantic navigation, we propose DRAGON, a guiding robot powered by a dialogue system and the ability to associ…

    Submitted 5 March, 2024; v1 submitted 13 July, 2023; originally announced July 2023.

    Comments: Published in IEEE Robotics and Automation Letters (RA-L)

  39. arXiv:2306.15774  [pdf]

    cs.HC cs.CL cs.CV cs.LG

    Next Steps for Human-Centered Generative AI: A Technical Perspective

    Authors: Xiang 'Anthony' Chen, Jeff Burke, Ruofei Du, Matthew K. Hong, Jennifer Jacobs, Philippe Laban, Dingzeyu Li, Nanyun Peng, Karl D. D. Willis, Chien-Sheng Wu, Bolei Zhou

    Abstract: Through iterative, cross-disciplinary discussions, we define and propose next-steps for Human-centered Generative AI (HGAI). We contribute a comprehensive research agenda that lays out future directions of Generative AI spanning three levels: aligning with human values; assimilating human intents; and augmenting human abilities. By identifying these next-steps, we intend to draw interdisciplinary…

    Submitted 22 December, 2023; v1 submitted 27 June, 2023; originally announced June 2023.

  40. arXiv:2306.09591  [pdf, other]

    cs.CV cs.RO

    A Vision-based Autonomous Perching Approach for Nano Aerial Vehicles

    Authors: Truong-Dong Do, Sung Kyung Hong

    Abstract: Over the past decades, quadcopters have been investigated, due to their mobility and flexibility to operate in a wide range of environments. They have been used in various areas, including surveillance and monitoring. During a mission, drones do not have to remain active once they have reached a target location. To conserve energy and maintain a static position, it is possible to perch and stop th…

    Submitted 15 June, 2023; originally announced June 2023.

    Comments: 6 pages, 6 figures, 2 tables. arXiv admin note: substantial text overlap with arXiv:2304.14838

  41. arXiv:2306.07818  [pdf, other]

    cs.LG stat.ML

    A Primal-Dual-Critic Algorithm for Offline Constrained Reinforcement Learning

    Authors: Kihyuk Hong, Yuhang Li, Ambuj Tewari

    Abstract: Offline constrained reinforcement learning (RL) aims to learn a policy that maximizes the expected cumulative reward subject to constraints on expected cumulative cost using an existing dataset. In this paper, we propose Primal-Dual-Critic Algorithm (PDCA), a novel algorithm for offline constrained RL with general function approximation. PDCA runs a primal-dual algorithm on the Lagrangian function…

    Submitted 19 October, 2023; v1 submitted 13 June, 2023; originally announced June 2023.

  42. arXiv:2306.01217  [pdf, ps, other

    cs.HC

    Generative AI for Product Design: Getting the Right Design and the Design Right

    Authors: Matthew K. Hong, Shabnam Hakimi, Yan-Ying Chen, Heishiro Toyoda, Charlene Wu, Matt Klenk

    Abstract: Generative AI (GenAI) models excel in their ability to recognize patterns in existing data and generate new and unexpected content. Recent advances have motivated applications of GenAI tools (e.g., Stable Diffusion, ChatGPT) to professional practice across industries, including product design. While these generative capabilities may seem enticing on the surface, certain barriers limit their practi…

    Submitted 1 June, 2023; originally announced June 2023.

  43. arXiv:2305.15194  [pdf, other

    cs.CV cs.AI cs.LG

    DiffBlender: Scalable and Composable Multimodal Text-to-Image Diffusion Models

    Authors: Sungnyun Kim, Junsoo Lee, Kibeom Hong, Daesik Kim, Namhyuk Ahn

    Abstract: In this study, we aim to extend the capabilities of diffusion-based text-to-image (T2I) generation models by incorporating diverse modalities beyond textual description, such as sketch, box, color palette, and style embedding, within a single model. We thus design a multimodal T2I diffusion model, coined as DiffBlender, by separating the channels of conditions into three types, i.e., image forms,…

    Submitted 21 December, 2023; v1 submitted 24 May, 2023; originally announced May 2023.

    Comments: Project page: https://sungnyun.github.io/diffblender/

  44. arXiv:2302.09144  [pdf

    cs.RO

    Designing a Wayfinding Robot for People with Visual Impairments

    Authors: Shuijing Liu, Aamir Hasan, Kaiwen Hong, Chun-Kai Yao, Justin Lin, Weihang Liang, Megan A. Bayles, Wendy A. Rogers, Katherine Driggs-Campbell

    Abstract: People with visual impairments (PwVI) often have difficulties navigating through unfamiliar indoor environments. However, current wayfinding tools are fairly limited. In this short paper, we present our in-progress work on a wayfinding robot for PwVI. The robot takes an audio command from the user that specifies the intended destination. Then, the robot autonomously plans a path to navigate to the…

    Submitted 17 February, 2023; originally announced February 2023.

    Comments: Presented at ICRA 2022 Workshop on Intelligent Control Methods and Machine Learning Algorithms for Human-Robot Interaction and Assistive Robotics

  45. arXiv:2301.09749  [pdf, other

    cs.RO

    A Data-Efficient Visual-Audio Representation with Intuitive Fine-tuning for Voice-Controlled Robots

    Authors: Peixin Chang, Shuijing Liu, Tianchen Ji, Neeloy Chakraborty, Kaiwen Hong, Katherine Driggs-Campbell

    Abstract: A command-following robot that serves people in everyday life must continually improve itself in deployment domains with minimal help from its end users, instead of engineers. Previous methods are either difficult to continuously improve after the deployment or require a large number of new labels during fine-tuning. Motivated by (self-)supervised contrastive learning, we propose a novel represent…

    Submitted 16 October, 2023; v1 submitted 23 January, 2023; originally announced January 2023.

    Comments: Published at Conference on Robot Learning (CoRL), 2023

  46. arXiv:2301.00181  [pdf, other

    cs.NE cs.LG

    Smooth Mathematical Function from Compact Neural Networks

    Authors: I. K. Hong

    Abstract: This is paper for the smooth function approximation by neural networks (NN). Mathematical or physical functions can be replaced by NN models through regression. In this study, we get NNs that generate highly accurate and highly smooth function, which only comprised of a few weight parameters, through discussing a few topics about regression. First, we reinterpret inside of NNs for regression; cons…

    Submitted 31 December, 2022; originally announced January 2023.

  47. arXiv:2212.09555  [pdf, other

    cs.CV

    Interactive Cartoonization with Controllable Perceptual Factors

    Authors: Namhyuk Ahn, Patrick Kwon, Jihye Back, Kibeom Hong, Seungkwon Kim

    Abstract: Cartoonization is a task that renders natural photos into cartoon styles. Previous deep cartoonization methods only have focused on end-to-end translation, which may hinder editability. Instead, we propose a novel solution with editing features of texture and color based on the cartoon creation process. To do that, we design a model architecture to have separate decoders, texture and color, to dec…

    Submitted 8 March, 2023; v1 submitted 19 December, 2022; originally announced December 2022.

    Comments: CVPR 2023

  48. Exploiting Domain Transferability for Collaborative Inter-level Domain Adaptive Object Detection

    Authors: Mirae Do, Seogkyu Jeon, Pilhyeon Lee, Kibeom Hong, Yu-seung Ma, Hyeran Byun

    Abstract: Domain adaptation for object detection (DAOD) has recently drawn much attention owing to its capability of detecting target objects without any annotations. To tackle the problem, previous works focus on aligning features extracted from partial levels (e.g., image-level, instance-level, RPN-level) in a two-stage detector via adversarial training. However, individual levels in the object detection…

    Submitted 19 July, 2022; originally announced July 2022.

    Comments: Accepted to Expert Systems with Applications. The first three authors contributed equally

    Journal ref: Expert Systems with Applications 205 (2022): 117697

  49. arXiv:2206.09842  [pdf, other

    cs.CV cs.CY

    Practical Deepfake Detection: Vulnerabilities in Global Contexts

    Authors: Yang A. Chuming, Daniel J. Wu, Ken Hong

    Abstract: Recent advances in deep learning have enabled realistic digital alterations to videos, known as deepfakes. This technology raises important societal concerns regarding disinformation and authenticity, galvanizing the development of numerous deepfake detection algorithms. At the same time, there are significant differences between training data and in-the-wild video data, which may undermine their…

    Submitted 20 June, 2022; originally announced June 2022.

    Comments: 6 pages, 6 figures, presented as a workshop paper at Responsible AI @ ICLR 2021

  50. arXiv:2205.14775  [pdf, other

    stat.ML cs.LG

    An Optimization-based Algorithm for Non-stationary Kernel Bandits without Prior Knowledge

    Authors: Kihyuk Hong, Yuhang Li, Ambuj Tewari

    Abstract: We propose an algorithm for non-stationary kernel bandits that does not require prior knowledge of the degree of non-stationarity. The algorithm follows randomized strategies obtained by solving optimization problems that balance exploration and exploitation. It adapts to non-stationarity by restarting when a change in the reward function is detected. Our algorithm enjoys a tighter dynamic regret…

    Submitted 19 February, 2023; v1 submitted 29 May, 2022; originally announced May 2022.