Showing 1–50 of 74 results for author: You, K

Searching in archive cs.
  1. arXiv:2505.22089

    cs.CV

    Fast Feature Matching of UAV Images via Matrix Band Reduction-based GPU Data Schedule

    Authors: San Jiang, Kan You, Wanshou Jiang, Qingquan Li

    Abstract: Feature matching dominates the time costs in structure from motion (SfM). The primary contribution of this study is a GPU data schedule algorithm for efficient feature matching of Unmanned aerial vehicle (UAV) images. The core idea is to divide the whole dataset into blocks based on the matrix band reduction (MBR) and achieve efficient feature matching via GPU-accelerated cascade hashing. First, ma…

    Submitted 28 May, 2025; originally announced May 2025.

  2. arXiv:2505.09433

    cs.CV eess.IV

    Efficient LiDAR Reflectance Compression via Scanning Serialization

    Authors: Jiahao Zhu, Kang You, Dandan Ding, Zhan Ma

    Abstract: Reflectance attributes in LiDAR point clouds provide essential information for downstream tasks but remain underexplored in neural compression methods. To address this, we introduce SerLiC, a serialization-based neural compression framework to fully exploit the intrinsic characteristics of LiDAR reflectance. SerLiC first transforms 3D LiDAR point clouds into 1D sequences via scan-order serializati…

    Submitted 27 May, 2025; v1 submitted 14 May, 2025; originally announced May 2025.

  3. arXiv:2504.17307

    cs.NI

    An Extensible Software Transport Layer for GPU Networking

    Authors: Yang Zhou, Zhongjie Chen, Ziming Mao, ChonLam Lao, Shuo Yang, Pravein Govindan Kannan, Jiaqi Gao, Yilong Zhao, Yongji Wu, Kaichao You, Fengyuan Ren, Zhiying Xu, Costin Raiciu, Ion Stoica

    Abstract: Fast-evolving machine learning (ML) workloads have increasing requirements for networking. However, host network transport on RDMA NICs is hard to evolve, causing problems for ML workloads. For example, single-path RDMA traffic is prone to flow collisions that severely degrade collective communication performance. We present UCCL, an extensible software transport layer to evolve GPU networking. UC…

    Submitted 24 April, 2025; originally announced April 2025.

  4. arXiv:2504.16318

    cs.LG

    Semantics at an Angle: When Cosine Similarity Works Until It Doesn't

    Authors: Kisung You

    Abstract: Cosine similarity has become a standard metric for comparing embeddings in modern machine learning. Its scale-invariance and alignment with model training objectives have contributed to its widespread adoption. However, recent studies have revealed important limitations, particularly when embedding norms carry meaningful semantic information. This informal article offers a reflective and selective…

    Submitted 20 May, 2025; v1 submitted 22 April, 2025; originally announced April 2025.

  5. arXiv:2504.14164

    stat.ML cs.LG stat.ME

    Learning over von Mises-Fisher Distributions via a Wasserstein-like Geometry

    Authors: Kisung You, Dennis Shung, Mauro Giuffrè

    Abstract: We introduce a novel, geometry-aware distance metric for the family of von Mises-Fisher (vMF) distributions, which are fundamental models for directional data on the unit hypersphere. Although the vMF distribution is widely employed in a variety of probabilistic learning tasks involving spherical data, principled tools for comparing vMF distributions remain limited, primarily due to the intractabi…

    Submitted 18 April, 2025; originally announced April 2025.

  6. arXiv:2503.23653

    stat.ML cs.LG

    Scalable Geometric Learning with Correlation-Based Functional Brain Networks

    Authors: Kisung You, Yelim Lee, Hae-Jeong Park

    Abstract: The correlation matrix is a central representation of functional brain networks in neuroimaging. Traditional analyses often treat pairwise interactions independently in a Euclidean setting, overlooking the intrinsic geometry of correlation matrices. While earlier attempts have embraced the quotient geometry of the correlation manifold, they remain limited by computational inefficiency and numerica…

    Submitted 9 April, 2025; v1 submitted 30 March, 2025; originally announced March 2025.

  7. arXiv:2503.18292

    cs.DC

    Jenga: Effective Memory Management for Serving LLM with Heterogeneity

    Authors: Chen Zhang, Kuntai Du, Shu Liu, Woosuk Kwon, Xiangxi Mo, Yufeng Wang, Xiaoxuan Liu, Kaichao You, Zhuohan Li, Mingsheng Long, Jidong Zhai, Joseph Gonzalez, Ion Stoica

    Abstract: Large language models (LLMs) are widely used but expensive to run, especially as inference workloads grow. To lower costs, maximizing the request batch size by managing GPU memory efficiently is crucial. While PagedAttention has recently been proposed to improve the efficiency of memory management, we find that the growing heterogeneity in the embedding dimensions, attention, and access patterns…

    Submitted 23 March, 2025; originally announced March 2025.

    Comments: 16 pages, 19 figures

  8. arXiv:2503.12382

    cs.CV eess.IV

    RENO: Real-Time Neural Compression for 3D LiDAR Point Clouds

    Authors: Kang You, Tong Chen, Dandan Ding, M. Salman Asif, Zhan Ma

    Abstract: Despite the substantial advancements demonstrated by learning-based neural models in the LiDAR Point Cloud Compression (LPCC) task, realizing real-time compression - an indispensable criterion for numerous industrial applications - remains a formidable challenge. This paper proposes RENO, the first real-time neural codec for 3D LiDAR point clouds, achieving superior performance with a lightweight…

    Submitted 16 March, 2025; originally announced March 2025.

  9. arXiv:2412.14054

    cs.CL cs.AI

    Digestion Algorithm in Hierarchical Symbolic Forests: A Fast Text Normalization Algorithm and Semantic Parsing Framework for Specific Scenarios and Lightweight Deployment

    Authors: Kevin You

    Abstract: Text Normalization and Semantic Parsing have numerous applications in natural language processing, such as natural language programming, paraphrasing, data augmentation, constructing expert systems, text matching, and more. Despite the prominent achievements of deep learning in Large Language Models (LLMs), the interpretability of neural network architectures is still poor, which affects their cre…

    Submitted 18 December, 2024; originally announced December 2024.

    Comments: 8 pages, 3 figures, 1 table

  10. arXiv:2410.18967

    cs.CV cs.CL cs.LG

    Ferret-UI 2: Mastering Universal User Interface Understanding Across Platforms

    Authors: Zhangheng Li, Keen You, Haotian Zhang, Di Feng, Harsh Agrawal, Xiujun Li, Mohana Prasad Sathya Moorthy, Jeff Nichols, Yinfei Yang, Zhe Gan

    Abstract: Building a generalist model for user interface (UI) understanding is challenging due to various foundational issues, such as platform diversity, resolution variation, and data limitation. In this paper, we introduce Ferret-UI 2, a multimodal large language model (MLLM) designed for universal UI understanding across a wide range of platforms, including iPhone, Android, iPad, Webpage, and AppleTV. B…

    Submitted 27 February, 2025; v1 submitted 24 October, 2024; originally announced October 2024.

    Comments: Accepted to ICLR 2025

  11. arXiv:2410.17823

    cs.LG cs.CV eess.IV

    Att2CPC: Attention-Guided Lossy Attribute Compression of Point Clouds

    Authors: Kai Liu, Kang You, Pan Gao, Manoranjan Paul

    Abstract: With the great progress of 3D sensing and acquisition technology, the volume of point cloud data has grown dramatically, which urges the development of efficient point cloud compression methods. In this paper, we focus on the task of learned lossy point cloud attribute compression (PCAC). We propose an efficient attention-based method for lossy compression of point cloud attributes leveraging on a…

    Submitted 23 October, 2024; originally announced October 2024.

  12. arXiv:2410.02746

    cs.CV cs.LG

    Contrastive Localized Language-Image Pre-Training

    Authors: Hong-You Chen, Zhengfeng Lai, Haotian Zhang, Xinze Wang, Marcin Eichner, Keen You, Meng Cao, Bowen Zhang, Yinfei Yang, Zhe Gan

    Abstract: Contrastive Language-Image Pre-training (CLIP) has been a celebrated method for training vision encoders to generate image/text representations facilitating various applications. Recently, CLIP has been widely adopted as the vision backbone of multimodal large language models (MLLMs) to connect image inputs for language interactions. The success of CLIP as a vision-language foundation model relies…

    Submitted 19 February, 2025; v1 submitted 3 October, 2024; originally announced October 2024.

    Comments: Preprint

  13. arXiv:2409.20566

    cs.CV cs.CL cs.LG

    MM1.5: Methods, Analysis & Insights from Multimodal LLM Fine-tuning

    Authors: Haotian Zhang, Mingfei Gao, Zhe Gan, Philipp Dufter, Nina Wenzel, Forrest Huang, Dhruti Shah, Xianzhi Du, Bowen Zhang, Yanghao Li, Sam Dodge, Keen You, Zhen Yang, Aleksei Timofeev, Mingze Xu, Hong-You Chen, Jean-Philippe Fauconnier, Zhengfeng Lai, Haoxuan You, Zirui Wang, Afshin Dehghan, Peter Grasch, Yinfei Yang

    Abstract: We present MM1.5, a new family of multimodal large language models (MLLMs) designed to enhance capabilities in text-rich image understanding, visual referring and grounding, and multi-image reasoning. Building upon the MM1 architecture, MM1.5 adopts a data-centric approach to model training, systematically exploring the impact of diverse data mixtures across the entire model training lifecycle. Th…

    Submitted 30 September, 2024; originally announced September 2024.

  14. arXiv:2408.10543

    cs.CV cs.AI eess.IV

    Diff-PCC: Diffusion-based Neural Compression for 3D Point Clouds

    Authors: Kai Liu, Kang You, Pan Gao

    Abstract: Stable diffusion networks have emerged as a groundbreaking development for their ability to produce realistic and detailed visual content. This characteristic renders them ideal decoders, capable of producing high-quality and aesthetically pleasing reconstructions. In this paper, we introduce the first diffusion-based point cloud compression method, dubbed Diff-PCC, to leverage the expressive powe…

    Submitted 20 August, 2024; originally announced August 2024.

  15. arXiv:2408.09403

    cs.AI cs.CV

    Obtaining Optimal Spiking Neural Network in Sequence Learning via CRNN-SNN Conversion

    Authors: Jiahao Su, Kang You, Zekai Xu, Weizhi Xu, Zhezhi He

    Abstract: Spiking neural networks (SNNs) are becoming a promising alternative to conventional artificial neural networks (ANNs) due to their rich neural dynamics and the implementation of energy-efficient neuromorphic chips. However, the non-differentiable binary communication mechanism makes SNN hard to converge to an ANN-level accuracy. When SNN encounters sequence learning, the situation becomes worse due…

    Submitted 25 August, 2024; v1 submitted 18 August, 2024; originally announced August 2024.

    Comments: Accepted by 33rd International Conference on Artificial Neural Networks

  16. arXiv:2407.09083

    cs.NE

    BKDSNN: Enhancing the Performance of Learning-based Spiking Neural Networks Training with Blurred Knowledge Distillation

    Authors: Zekai Xu, Kang You, Qinghai Guo, Xiang Wang, Zhezhi He

    Abstract: Spiking neural networks (SNNs), which mimic biological neural system to convey information via discrete spikes, are well known as brain-inspired models with excellent computing efficiency. By utilizing the surrogate gradient estimation for discrete spikes, learning-based SNN training methods that can achieve ultra-low inference latency (number of time-step) emerge recently. Nevertheless, due to th…

    Submitted 14 July, 2024; v1 submitted 12 July, 2024; originally announced July 2024.

    Comments: accepted by European Conference on Computer Vision (ECCV) 2024

    Journal ref: European Conference on Computer Vision 2024

  17. arXiv:2407.08994

    cs.CV

    Global Attention-Guided Dual-Domain Point Cloud Feature Learning for Classification and Segmentation

    Authors: Zihao Li, Pan Gao, Kang You, Chuan Yan, Manoranjan Paul

    Abstract: Previous studies have demonstrated the effectiveness of point-based neural models on the point cloud analysis task. However, there remains a crucial issue on producing the efficient input embedding for raw point coordinates. Moreover, another issue lies in the limited efficiency of neighboring aggregations, which is a critical component in the network stem. In this paper, we propose a Global Atten…

    Submitted 12 July, 2024; originally announced July 2024.

  18. arXiv:2406.03470

    cs.NE cs.AI

    SpikeZIP-TF: Conversion is All You Need for Transformer-based SNN

    Authors: Kang You, Zekai Xu, Chen Nie, Zhijie Deng, Qinghai Guo, Xiang Wang, Zhezhi He

    Abstract: Spiking neural network (SNN) has attracted great attention due to its characteristic of high efficiency and accuracy. Currently, the ANN-to-SNN conversion methods can obtain ANN on-par accuracy SNN with ultra-low latency (8 time-steps) in CNN structure on computer vision (CV) tasks. However, as Transformer-based networks have achieved prevailing precision on both CV and natural language processing…

    Submitted 5 June, 2024; originally announced June 2024.

    Comments: * These authors contributed equally to this work

    Journal ref: International Conference on Machine Learning 2024

  19. arXiv:2404.13550

    cs.CV eess.IV

    Pointsoup: High-Performance and Extremely Low-Decoding-Latency Learned Geometry Codec for Large-Scale Point Cloud Scenes

    Authors: Kang You, Kai Liu, Li Yu, Pan Gao, Dandan Ding

    Abstract: Despite considerable progress being achieved in point cloud geometry compression, there still remains a challenge in effectively compressing large-scale scenes with sparse surfaces. Another key challenge lies in reducing decoding latency, a crucial requirement in real-world application. In this paper, we propose Pointsoup, an efficient learning-based geometry codec that attains high-performance an…

    Submitted 21 April, 2024; originally announced April 2024.

  20. arXiv:2404.06936

    cs.CV cs.MM

    Efficient and Generic Point Model for Lossless Point Cloud Attribute Compression

    Authors: Kang You, Pan Gao, Zhan Ma

    Abstract: The past several years have witnessed the emergence of learned point cloud compression (PCC) techniques. However, current learning-based lossless point cloud attribute compression (PCAC) methods either suffer from high computational complexity or deteriorated compression performance. Moreover, the significant variations in point cloud scale and sparsity encountered in real-world applications make…

    Submitted 10 April, 2024; originally announced April 2024.

  21. arXiv:2404.05719

    cs.CV cs.CL cs.HC

    Ferret-UI: Grounded Mobile UI Understanding with Multimodal LLMs

    Authors: Keen You, Haotian Zhang, Eldon Schoop, Floris Weers, Amanda Swearngin, Jeffrey Nichols, Yinfei Yang, Zhe Gan

    Abstract: Recent advancements in multimodal large language models (MLLMs) have been noteworthy, yet, these general-domain MLLMs often fall short in their ability to comprehend and interact effectively with user interface (UI) screens. In this paper, we present Ferret-UI, a new MLLM tailored for enhanced understanding of mobile UI screens, equipped with referring, grounding, and reasoning capabilities. Given…

    Submitted 8 April, 2024; originally announced April 2024.

  22. arXiv:2404.02476

    math.OC cs.AI cs.LG

    Deep Reinforcement Learning for Traveling Purchaser Problems

    Authors: Haofeng Yuan, Rongping Zhu, Wanlu Yang, Shiji Song, Keyou You, Wei Fan, C. L. Philip Chen

    Abstract: The traveling purchaser problem (TPP) is an important combinatorial optimization problem with broad applications. Due to the coupling between routing and purchasing, existing works on TPPs commonly address route construction and purchase planning simultaneously, which, however, leads to exact methods with high computational cost and heuristics with sophisticated design but limited performance. In…

    Submitted 2 July, 2025; v1 submitted 3 April, 2024; originally announced April 2024.

  23. arXiv:2403.13839

    cs.LG cs.AI cs.PL

    depyf: Open the Opaque Box of PyTorch Compiler for Machine Learning Researchers

    Authors: Kaichao You, Runsheng Bai, Meng Cao, Jianmin Wang, Ion Stoica, Mingsheng Long

    Abstract: PyTorch 2.x introduces a compiler designed to accelerate deep learning programs. However, for machine learning researchers, adapting to the PyTorch compiler to full potential can be challenging. The compiler operates at the Python bytecode level, making it appear as an opaque box. To address this, we introduce depyf, a tool designed to demystify the inner workings of the PyTorch…

    Submitted 14 March, 2024; originally announced March 2024.

    Comments: 16 pages, 2 figures

  24. arXiv:2401.11505

    cs.CL cs.IR

    CheX-GPT: Harnessing Large Language Models for Enhanced Chest X-ray Report Labeling

    Authors: Jawook Gu, Kihyun You, Han-Cheol Cho, Jiho Kim, Eun Kyoung Hong, Byungseok Roh

    Abstract: Free-text radiology reports present a rich data source for various medical tasks, but effectively labeling these texts remains challenging. Traditional rule-based labeling methods fall short of capturing the nuances of diverse free-text patterns. Moreover, models using expert-annotated data are limited by data scarcity and pre-defined classes, impacting their performance, flexibility and scalabili…

    Submitted 5 November, 2024; v1 submitted 21 January, 2024; originally announced January 2024.

    Comments: 16 pages, 3 figures

  25. arXiv:2312.10072

    cs.HC cs.AI cs.LG stat.AP

    Assessing the Usability of GutGPT: A Simulation Study of an AI Clinical Decision Support System for Gastrointestinal Bleeding Risk

    Authors: Colleen Chan, Kisung You, Sunny Chung, Mauro Giuffrè, Theo Saarinen, Niroop Rajashekar, Yuan Pu, Yeo Eun Shin, Loren Laine, Ambrose Wong, René Kizilcec, Jasjeet Sekhon, Dennis Shung

    Abstract: Applications of large language models (LLMs) like ChatGPT have potential to enhance clinical decision support through conversational interfaces. However, challenges of human-algorithmic interaction and clinician trust are poorly understood. GutGPT, an LLM for gastrointestinal (GI) bleeding risk prediction and management guidance, was deployed in clinical simulation scenarios alongside the electroni…

    Submitted 6 December, 2023; originally announced December 2023.

    Comments: Extended Abstract presented at Machine Learning for Health (ML4H) symposium 2023, December 10, 2023, New Orleans, United States, 11 pages

  26. arXiv:2311.18210

    math.OC cs.DC

    Distributed Adaptive Greedy Quasi-Newton Methods with Explicit Non-asymptotic Convergence Bounds

    Authors: Yubo Du, Keyou You

    Abstract: Though quasi-Newton methods have been extensively studied in the literature, they either suffer from local convergence or use a series of line searches for global convergence which is not acceptable in the distributed setting. In this work, we first propose a line search free greedy quasi-Newton (GQN) method with adaptive steps and establish explicit non-asymptotic bounds for both the global conve…

    Submitted 29 November, 2023; originally announced November 2023.

  27. arXiv:2311.16588

    cs.CL

    Ascle: A Python Natural Language Processing Toolkit for Medical Text Generation

    Authors: Rui Yang, Qingcheng Zeng, Keen You, Yujie Qiao, Lucas Huang, Chia-Chun Hsieh, Benjamin Rosand, Jeremy Goldwasser, Amisha D Dave, Tiarnan D. L. Keenan, Emily Y Chew, Dragomir Radev, Zhiyong Lu, Hua Xu, Qingyu Chen, Irene Li

    Abstract: This study introduces Ascle, a pioneering natural language processing (NLP) toolkit designed for medical text generation. Ascle is tailored for biomedical researchers and healthcare professionals with an easy-to-use, all-in-one solution that requires minimal programming expertise. For the first time, Ascle evaluates and provides interfaces for the latest pre-trained language models, encompassing f…

    Submitted 9 December, 2023; v1 submitted 28 November, 2023; originally announced November 2023.

    Comments: 5 figures, 4 tables

  28. arXiv:2311.03736

    cs.AI cs.LG cs.MA

    Neural MMO 2.0: A Massively Multi-task Addition to Massively Multi-agent Learning

    Authors: Joseph Suárez, Phillip Isola, Kyoung Whan Choe, David Bloomin, Hao Xiang Li, Nikhil Pinnaparaju, Nishaanth Kanna, Daniel Scott, Ryan Sullivan, Rose S. Shuman, Lucas de Alcântara, Herbie Bradley, Louis Castricato, Kirsty You, Yuhao Jiang, Qimai Li, Jiaxin Chen, Xiaolong Zhu

    Abstract: Neural MMO 2.0 is a massively multi-agent environment for reinforcement learning research. The key feature of this new version is a flexible task system that allows users to define a broad range of objectives and reward signals. We challenge researchers to train agents capable of generalizing to tasks, maps, and opponents never seen during training. Neural MMO features procedurally generated maps…

    Submitted 7 November, 2023; originally announced November 2023.

  29. CXR-CLIP: Toward Large Scale Chest X-ray Language-Image Pre-training

    Authors: Kihyun You, Jawook Gu, Jiyeon Ham, Beomhee Park, Jiho Kim, Eun Kyoung Hong, Woonhyunk Baek, Byungseok Roh

    Abstract: A large-scale image-text pair dataset has greatly contributed to the development of vision-language pre-training (VLP) models, which enable zero-shot or few-shot classification without costly annotation. However, in the medical domain, the scarcity of data remains a significant challenge for developing a powerful VLP model. In this paper, we tackle the lack of image-text data in chest X-ray by exp…

    Submitted 20 October, 2023; originally announced October 2023.

    Comments: Accepted by MICCAI 2023

  30. arXiv:2306.12770

    cs.CV

    3D Reconstruction of Spherical Images based on Incremental Structure from Motion

    Authors: San Jiang, Kan You, Yaxin Li, Duojie Weng, Wu Chen

    Abstract: 3D reconstruction plays an increasingly important role in modern photogrammetric systems. Conventional satellite or aerial-based remote sensing (RS) platforms can provide the necessary data sources for the 3D reconstruction of large-scale landforms and cities. Even with low-altitude UAVs (Unmanned Aerial Vehicles), 3D reconstruction in complicated situations, such as urban canyons and indoor scene…

    Submitted 24 June, 2023; v1 submitted 22 June, 2023; originally announced June 2023.

  31. arXiv:2305.11624

    cs.AI

    Efficient ConvBN Blocks for Transfer Learning and Beyond

    Authors: Kaichao You, Guo Qin, Anchang Bao, Meng Cao, Ping Huang, Jiulong Shan, Mingsheng Long

    Abstract: Convolution-BatchNorm (ConvBN) blocks are integral components in various computer vision tasks and other domains. A ConvBN block can operate in three modes: Train, Eval, and Deploy. While the Train mode is indispensable for training models from scratch, the Eval mode is suitable for transfer learning and beyond, and the Deploy mode is designed for the deployment of models. This paper focuses on th…

    Submitted 28 February, 2024; v1 submitted 19 May, 2023; originally announced May 2023.

    Comments: ICLR 2024, camera ready version

  32. arXiv:2302.04495

    cs.CV

    3D reconstruction from spherical images: A review of techniques, applications, and prospects

    Authors: San Jiang, Yaxin Li, Duojie Weng, Kan You, Wu Chen

    Abstract: 3D reconstruction plays an increasingly important role in modern photogrammetric systems. Conventional satellite or aerial-based remote sensing (RS) platforms can provide the necessary data sources for the 3D reconstruction of large-scale landforms and cities. Even with low-altitude UAVs (Unmanned Aerial Vehicles), 3D reconstruction in complicated situations, such as urban canyons and indoor scene…

    Submitted 17 May, 2023; v1 submitted 9 February, 2023; originally announced February 2023.

  33. arXiv:2209.04373

    eess.SY cs.DM math.OC

    Optimal $(0,1)$-Matrix Completion with Majorization Ordered Objectives (To the memory of Pravin Varaiya)

    Authors: Yanfang Mo, Wei Chen, Keyou You, Li Qiu

    Abstract: We propose and examine two optimal $(0,1)$-matrix completion problems with majorization ordered objectives. They elevate the seminal study by Gale and Ryser from feasibility to optimality in partial order programming (POP), referring to optimization with partially ordered objectives. We showcase their applications in electric vehicle charging, portfolio optimization, and secure data storage. Solvi…

    Submitted 9 September, 2022; originally announced September 2022.

    Comments: 16 pages, 6 figures

  34. arXiv:2208.09127

    cs.CV cs.AI

    Video Interpolation by Event-driven Anisotropic Adjustment of Optical Flow

    Authors: Song Wu, Kaichao You, Weihua He, Chen Yang, Yang Tian, Yaoyuan Wang, Ziyang Zhang, Jianxing Liao

    Abstract: Video frame interpolation is a challenging task due to the ever-changing real-world scene. Previous methods often calculate the bi-directional optical flows and then predict the intermediate optical flows under the linear motion assumptions, leading to isotropic intermediate flow generation. Follow-up research obtained anisotropic adjustment through estimated higher-order motion information with e…

    Submitted 7 December, 2022; v1 submitted 18 August, 2022; originally announced August 2022.

    Comments: Accepted to ECCV2022; Fix a few typos in the equation and figure

  35. arXiv:2208.02519

    cs.CV cs.IT cs.MM eess.IV

    IPDAE: Improved Patch-Based Deep Autoencoder for Lossy Point Cloud Geometry Compression

    Authors: Kang You, Pan Gao, Qing Li

    Abstract: Point cloud is a crucial representation of 3D contents, which has been widely used in many areas such as virtual reality, mixed reality, autonomous driving, etc. With the boost of the number of points in the data, how to efficiently compress point cloud becomes a challenging problem. In this paper, we propose a set of significant improvements to patch-based point cloud compression, i.e., a learnab…

    Submitted 4 August, 2022; originally announced August 2022.

    Comments: 12 pages

  36. arXiv:2204.06604

    cs.CL

    EHRKit: A Python Natural Language Processing Toolkit for Electronic Health Record Texts

    Authors: Irene Li, Keen You, Yujie Qiao, Lucas Huang, Chia-Chun Hsieh, Benjamin Rosand, Jeremy Goldwasser, Dragomir Radev

    Abstract: The Electronic Health Record (EHR) is an essential part of the modern medical system and impacts healthcare delivery, operations, and research. Unstructured text is attracting much attention despite structured information in the EHRs and has become an exciting research field. The success of the recent neural Natural Language Processing (NLP) method has led to a new direction for processing unstruc…

    Submitted 27 June, 2023; v1 submitted 13 April, 2022; originally announced April 2022.

  37. arXiv:2203.13859

    cs.CV cs.AI

    TimeReplayer: Unlocking the Potential of Event Cameras for Video Interpolation

    Authors: Weihua He, Kaichao You, Zhendong Qiao, Xu Jia, Ziyang Zhang, Wenhui Wang, Huchuan Lu, Yaoyuan Wang, Jianxing Liao

    Abstract: Recording fast motion in a high FPS (frame-per-second) requires expensive high-speed cameras. As an alternative, interpolating low-FPS videos from commodity cameras has attracted significant attention. If only low-FPS videos are available, motion assumptions (linear or quadratic) are necessary to infer intermediate frames, which fail to model complex motions. Event camera, a new camera with pixels…

    Submitted 25 March, 2022; originally announced March 2022.

    Comments: Accepted to CVPR 2022, project page https://sites.google.com/view/timereplayer/

  38. arXiv:2203.07375

    cs.LG

    From Big to Small: Adaptive Learning to Partial-Set Domains

    Authors: Zhangjie Cao, Kaichao You, Ziyang Zhang, Jianmin Wang, Mingsheng Long

    Abstract: Domain adaptation targets at knowledge acquisition and dissemination from a labeled source domain to an unlabeled target domain under distribution shift. Still, the common requirement of identical class space shared across domains hinders applications of domain adaptation to partial-set domains. Recent advances show that deep pre-trained models of large scale endow rich knowledge to tackle diverse…

    Submitted 14 March, 2022; originally announced March 2022.

    Comments: accepted to TPAMI in 2022

  39. arXiv:2110.10545

    cs.LG

    Ranking and Tuning Pre-trained Models: A New Paradigm for Exploiting Model Hubs

    Authors: Kaichao You, Yong Liu, Ziyang Zhang, Jianmin Wang, Michael I. Jordan, Mingsheng Long

    Abstract: Model hubs with many pre-trained models (PTMs) have become a cornerstone of deep learning. Although built at a high cost, they remain under-exploited; practitioners usually pick one PTM from the provided model hub by popularity and then fine-tune the PTM to solve the target task. This naïve but common practice poses two obstacles to full exploitation of pre-trained model hubs: first, the…

    Submitted 14 July, 2022; v1 submitted 20 October, 2021; originally announced October 2021.

    Comments: 47 pages, camera-ready version for JMLR 2022

  40. arXiv:2110.09109

    cs.CV cs.MM eess.IV

    Patch-Based Deep Autoencoder for Point Cloud Geometry Compression

    Authors: Kang You, Pan Gao

    Abstract: The ever-increasing 3D application makes the point cloud compression unprecedentedly important and needed. In this paper, we propose a patch-based compression process using deep learning, focusing on the lossy point cloud geometry compression. Unlike existing point cloud compression networks, which apply feature extraction and reconstruction on the entire point cloud, we divide the point cloud int…

    Submitted 18 October, 2021; originally announced October 2021.

    Comments: Accepted to ACM Multimedia Asia (MMAsia '21)

  41. arXiv:2107.14171

    cs.LG

    Tianshou: a Highly Modularized Deep Reinforcement Learning Library

    Authors: Jiayi Weng, Huayu Chen, Dong Yan, Kaichao You, Alexis Duburcq, Minghao Zhang, Yi Su, Hang Su, Jun Zhu

    Abstract: In this paper, we present Tianshou, a highly modularized Python library for deep reinforcement learning (DRL) that uses PyTorch as its backend. Tianshou intends to be research-friendly by providing a flexible and reliable infrastructure of DRL algorithms. It supports online and offline training with more than 20 classic algorithms through a unified interface. To facilitate related research and pro…

    Submitted 10 August, 2022; v1 submitted 29 July, 2021; originally announced July 2021.

  42. arXiv:2106.02096

    stat.ML cs.LG

    Shape-Preserving Dimensionality Reduction: An Algorithm and Measures of Topological Equivalence

    Authors: Byeongsu Yu, Kisung You

    Abstract: We introduce a linear dimensionality reduction technique preserving topological features via persistent homology. The method is designed to find linear projection $L$ which preserves the persistence diagram of a point cloud $\mathbb{X}$ via simulated annealing. The projection $L$ induces a set of canonical simplicial maps from the Rips (or Čech) filtration of $\mathbb{X}$ to that of $L\mathbb{X}$.…

    Submitted 13 June, 2021; v1 submitted 3 June, 2021; originally announced June 2021.

    Comments: 18 pages, 2 figures

  43. arXiv:2105.15097  [pdf, other

    cs.NI eess.SP

    Multiple Sources Localization with Sparse Recovery under Log-normal Shadow Fading

    Authors: Yueyan Chu, Kangyong You, Wenbin Guo

    Abstract: Localization based on received signal strength (RSS) has drawn great interest in wireless sensor networks (WSNs). In this paper, we investigate the RSS-based multi-source localization problem with unknown transmitted power under shadow fading. The log-normal shadowing effect is approximated through the Fenton-Wilkinson (F-W) method, and maximum likelihood estimation is adopted to optimize the RSS-ba…

    Submitted 31 March, 2021; originally announced May 2021.

  44. arXiv:2105.06697  [pdf, other

    math.OC cs.DC cs.LG eess.SP eess.SY

    Innovation Compression for Communication-efficient Distributed Optimization with Linear Convergence

    Authors: Jiaqi Zhang, Keyou You, Lihua Xie

    Abstract: Information compression is essential to reduce communication cost in distributed optimization over peer-to-peer networks. This paper proposes a communication-efficient linearly convergent distributed (COLD) algorithm to solve strongly convex optimization problems. By compressing innovation vectors, which are the differences between decision vectors and their estimates, COLD is able to achieve line…

    Submitted 14 May, 2021; originally announced May 2021.

    Comments: 14 pages

  45. arXiv:2102.11005  [pdf, other

    cs.LG cs.AI

    LogME: Practical Assessment of Pre-trained Models for Transfer Learning

    Authors: Kaichao You, Yong Liu, Jianmin Wang, Mingsheng Long

    Abstract: This paper studies task-adaptive pre-trained model selection, an underexplored problem of assessing pre-trained models for the target task and selecting the best ones from the model zoo \emph{without fine-tuning}. A few pilot works addressed the problem of transferring supervised pre-trained models to classification tasks, but they cannot handle emerging unsupervised pre-trained models or regression task…

    Submitted 23 June, 2021; v1 submitted 22 February, 2021; originally announced February 2021.

    Comments: 13 pages (ICML 2021 camera ready version)

  46. arXiv:2011.10931  [pdf, other

    eess.SY cs.LG

    Primal-dual Learning for the Model-free Risk-constrained Linear Quadratic Regulator

    Authors: Feiran Zhao, Keyou You

    Abstract: Risk-aware control, though promising for tackling unexpected events, requires an exact, known dynamical model. In this work, we propose a model-free framework to learn a risk-aware controller with a focus on linear systems. We formulate it as a discrete-time infinite-horizon LQR problem with a state predictive variance constraint. To solve it, we parameterize the policy with a feedback gain pair…

    Submitted 30 May, 2021; v1 submitted 21 November, 2020; originally announced November 2020.

    Comments: To appear in the Annual Conference on Learning for Dynamics and Control (L4DC) 2021

  47. arXiv:2010.15313  [pdf, other

    cs.CL

    "where is this relationship going?": Understanding Relationship Trajectories in Narrative Text

    Authors: Keen You, Dan Goldwasser

    Abstract: We examine a new commonsense reasoning task: given a narrative describing a social interaction that centers on two protagonists, systems make inferences about the underlying relationship trajectory. Specifically, we propose two evaluation tasks: Relationship Outlook Prediction MCQ and Resolution Prediction MCQ. In Relationship Outlook Prediction, a system maps an interaction to a relationship outl…

    Submitted 28 October, 2020; originally announced October 2020.

    Comments: Accepted to *Sem 2020

  48. arXiv:2009.04170  [pdf, other

    cs.CV

    Diversified Mutual Learning for Deep Metric Learning

    Authors: Wonpyo Park, Wonjae Kim, Kihyun You, Minsu Cho

    Abstract: Mutual learning is an ensemble training strategy that improves generalization by having multiple models, trained simultaneously, transfer their individual knowledge to one another. In this work, we propose an effective mutual learning method for deep metric learning, called Diversified Mutual Metric Learning, which enhances embedding models with diversified mutual learning. We transfer relational knowledg…

    Submitted 9 September, 2020; originally announced September 2020.

    Comments: Accepted to ECCV Workshop 2020

  49. arXiv:2008.03405  [pdf, other

    eess.AS cs.SD

    Stacked 1D convolutional networks for end-to-end small footprint voice trigger detection

    Authors: Takuya Higuchi, Mohammad Ghasemzadeh, Kisun You, Chandra Dhir

    Abstract: We propose a stacked 1D convolutional neural network (S1DCNN) for end-to-end small-footprint voice trigger detection in a streaming scenario. Voice trigger detection is an important speech application with which users can activate their devices simply by saying a keyword or phrase. For privacy and latency reasons, a voice trigger detection system should run on an always-on processor on device.…

    Submitted 7 August, 2020; originally announced August 2020.

    Comments: Accepted to INTERSPEECH 2020

  50. Rdimtools: An R package for Dimension Reduction and Intrinsic Dimension Estimation

    Authors: Kisung You

    Abstract: Discovering patterns in complex high-dimensional data is a long-standing problem. Dimension Reduction (DR) and Intrinsic Dimension Estimation (IDE) are two fundamental thematic programs that facilitate geometric understanding of the data. We present Rdimtools, an R package that supports 133 DR and 17 IDE algorithms, whose breadth makes multifaceted scrutiny of the data in one place easier. Rdim…

    Submitted 22 May, 2020; originally announced May 2020.