Showing 1–18 of 18 results for author: Kou, S

  1. arXiv:2506.00411  [pdf, ps, other]

    cs.RO cs.AI

    LoHoVLA: A Unified Vision-Language-Action Model for Long-Horizon Embodied Tasks

    Authors: Yi Yang, Jiaxuan Sun, Siqi Kou, Yihan Wang, Zhijie Deng

    Abstract: Real-world embodied agents face long-horizon tasks, characterized by high-level goals demanding multi-step solutions beyond single actions. Successfully navigating these requires both high-level task planning (i.e., decomposing goals into sub-tasks) and low-level motion control (i.e., generating precise robot actions). While existing vision language action (VLA) models and hierarchical architectur…

    Submitted 31 May, 2025; originally announced June 2025.

  2. arXiv:2505.22525  [pdf, ps, other]

    cs.CV cs.AI cs.CL

    Thinking with Generated Images

    Authors: Ethan Chern, Zhulin Hu, Steffi Chern, Siqi Kou, Jiadi Su, Yan Ma, Zhijie Deng, Pengfei Liu

    Abstract: We present Thinking with Generated Images, a novel paradigm that fundamentally transforms how large multimodal models (LMMs) engage with visual reasoning by enabling them to natively think across text and vision modalities through spontaneous generation of intermediate visual thinking steps. Current visual reasoning with LMMs is constrained to either processing fixed user-provided images or reason…

    Submitted 28 May, 2025; originally announced May 2025.

  3. arXiv:2505.21723  [pdf, ps, other]

    stat.CO cs.LG stat.ML

    Are Statistical Methods Obsolete in the Era of Deep Learning?

    Authors: Skyler Wu, Shihao Yang, S. C. Kou

    Abstract: In the era of AI, neural networks have become increasingly popular for modeling, inference, and prediction, largely due to their potential for universal approximation. With the proliferation of such deep learning models, a question arises: are leaner statistical methods still relevant? To shed insight on this question, we employ the mechanistic nonlinear ordinary differential equation (ODE) invers…

    Submitted 27 May, 2025; originally announced May 2025.

    Comments: 35 pages, 11 figures (main text)

  4. arXiv:2505.19949  [pdf, ps, other]

    cs.LG

    Which Data Attributes Stimulate Math and Code Reasoning? An Investigation via Influence Functions

    Authors: Siqi Kou, Qingyuan Tian, Hanwen Xu, Zihao Zeng, Zhijie Deng

    Abstract: Large language models (LLMs) have demonstrated remarkable reasoning capabilities in math and coding, often bolstered by post-training on the chain-of-thoughts (CoTs) generated by stronger models. However, existing strategies for curating such training data predominantly rely on heuristics, limiting generalizability and failing to capture subtleties underlying in data. To address these limitations,…

    Submitted 26 May, 2025; originally announced May 2025.

  5. NLGR: Utilizing Neighbor Lists for Generative Rerank in Personalized Recommendation Systems

    Authors: Shuli Wang, Xue Wei, Senjie Kou, Chi Wang, Wenshuai Chen, Qi Tang, Yinhua Zhu, Xiong Xiao, Xingxing Wang

    Abstract: Reranking plays a crucial role in modern multi-stage recommender systems by rearranging the initial ranking list. Due to the inherent challenges of combinatorial search spaces, some current research adopts an evaluator-generator paradigm, with a generator generating feasible sequences and an evaluator selecting the best sequence based on the estimated list utility. However, these methods still fac…

    Submitted 11 February, 2025; v1 submitted 9 February, 2025; originally announced February 2025.

    Comments: Accepted by WWW 2025 Industry Track

  6. arXiv:2412.00127  [pdf, other]

    cs.CV cs.AI cs.CL

    Orthus: Autoregressive Interleaved Image-Text Generation with Modality-Specific Heads

    Authors: Siqi Kou, Jiachun Jin, Zhihong Liu, Chang Liu, Ye Ma, Jian Jia, Quan Chen, Peng Jiang, Zhijie Deng

    Abstract: We introduce Orthus, an autoregressive (AR) transformer that excels in generating images given textual prompts, answering questions based on visual inputs, and even crafting lengthy image-text interleaved contents. Unlike prior arts on unified multimodal modeling, Orthus simultaneously copes with discrete text tokens and continuous image features under the AR modeling principle. The continuous tre…

    Submitted 16 April, 2025; v1 submitted 28 November, 2024; originally announced December 2024.

  7. arXiv:2411.08488  [pdf]

    eess.IV cs.CV

    UNSCT-HRNet: Modeling Anatomical Uncertainty for Landmark Detection in Total Hip Arthroplasty

    Authors: Jiaxin Wan, Lin Liu, Haoran Wang, Liangwei Li, Wei Li, Shuheng Kou, Runtian Li, Jiayi Tang, Juanxiu Liu, Jing Zhang, Xiaohui Du, Ruqian Hao

    Abstract: Total hip arthroplasty (THA) relies on accurate landmark detection from radiographic images, but unstructured data caused by irregular patient postures or occluded anatomical markers pose significant challenges for existing methods. To address this, we propose UNSCT-HRNet (Unstructured CT - High-Resolution Net), a deep learning-based framework that integrates a Spatial Relationship Fusion (SRF) mo…

    Submitted 13 November, 2024; originally announced November 2024.

  8. arXiv:2410.14731  [pdf, other]

    cs.LG cs.AI cs.CL

    MatryoshkaKV: Adaptive KV Compression via Trainable Orthogonal Projection

    Authors: Bokai Lin, Zihao Zeng, Zipeng Xiao, Siqi Kou, Tianqi Hou, Xiaofeng Gao, Hao Zhang, Zhijie Deng

    Abstract: KV cache has become a de facto technique for the inference of large language models (LLMs), where tensors of shape (layer number, head number, sequence length, feature dimension) are introduced to cache historical information for self-attention. As the size of the model and data grows, the KV cache can quickly become a bottleneck within the system in both storage and memory transfer. To address th…

    Submitted 16 May, 2025; v1 submitted 16 October, 2024; originally announced October 2024.

  9. arXiv:2404.12305  [pdf, other]

    cs.NI

    SAFLA: Semantic-aware Full Lifecycle Assurance Designed for Intent-Driven Networks

    Authors: Shiwen Kou, Chungang Yang, Mingji Wu

    Abstract: Intent-driven Networks (IDNs) are crucial in enhancing network management efficiency by enabling the translation of high-level intents into executable configurations via a top-down approach. The escalating complexity of network architectures, however, has led to a semantic gap between these intents and their actual configurations, posing significant challenges to the accuracy and reliability of ID…

    Submitted 18 April, 2024; originally announced April 2024.

    Comments: 11 pages, 10 figures, 3 tables

  10. arXiv:2403.00835  [pdf, other]

    cs.CL cs.AI

    CLLMs: Consistency Large Language Models

    Authors: Siqi Kou, Lanxiang Hu, Zhezhi He, Zhijie Deng, Hao Zhang

    Abstract: Parallel decoding methods such as Jacobi decoding show promise for more efficient LLM inference as it breaks the sequential nature of the LLM decoding process and transforms it into parallelizable computation. However, in practice, it achieves little speedup compared to traditional autoregressive (AR) decoding, primarily because Jacobi decoding seldom accurately predicts more than one token in a s…

    Submitted 13 June, 2024; v1 submitted 28 February, 2024; originally announced March 2024.

    Comments: In the proceedings of the 41st International Conference on Machine Learning (ICML) 2024

  11. arXiv:2310.11142  [pdf, other]

    cs.CV cs.LG

    BayesDiff: Estimating Pixel-wise Uncertainty in Diffusion via Bayesian Inference

    Authors: Siqi Kou, Lei Gan, Dequan Wang, Chongxuan Li, Zhijie Deng

    Abstract: Diffusion models have impressive image generation capability, but low-quality generations still exist, and their identification remains challenging due to the lack of a proper sample-wise metric. To address this, we propose BayesDiff, a pixel-wise uncertainty estimator for generations from diffusion models based on Bayesian inference. In particular, we derive a novel uncertainty iteration principl…

    Submitted 4 March, 2024; v1 submitted 17 October, 2023; originally announced October 2023.

    Comments: ICLR 2024

  12. arXiv:2309.03729  [pdf, other]

    cs.CV

    Phasic Content Fusing Diffusion Model with Directional Distribution Consistency for Few-Shot Model Adaption

    Authors: Teng Hu, Jiangning Zhang, Liang Liu, Ran Yi, Siqi Kou, Haokun Zhu, Xu Chen, Yabiao Wang, Chengjie Wang, Lizhuang Ma

    Abstract: Training a generative model with limited number of samples is a challenging task. Current methods primarily rely on few-shot model adaption to train the network. However, in scenarios where data is extremely limited (less than 10), the generative network tends to overfit and suffers from content degradation. To address these problems, we propose a novel phasic content fusing few-shot diffusion mod…

    Submitted 7 September, 2023; originally announced September 2023.

    Comments: Accepted by ICCV 2023

  13. arXiv:2211.11581  [pdf, other]

    cs.CY eess.SY

    Modeling 100% Electrified Transportation in NYC

    Authors: Jingrong Zhang, Amber Jiang, Brian Newborn, Sara Kou, Robert Mieth

    Abstract: Envisioning a future 100% electrified transportation sector, this paper uses socio-economic, demographic, and geographic data to assess electric energy demand from commuter traffic. We explore the individual mode choices, which allows to create mode-mix scenarios for the entire population, and quantify the electric energy demand for each scenario using technical specifications of battery and elect…

    Submitted 17 February, 2023; v1 submitted 16 November, 2022; originally announced November 2022.

    Comments: Accepted for publication at the 2023 IEEE PES General Meeting

  14. arXiv:2206.04958  [pdf, other]

    cs.CV

    Self-Supervised Deep Subspace Clustering with Entropy-norm

    Authors: Guangyi Zhao, Simin Kou, Xuesong Yin

    Abstract: Auto-Encoder based deep subspace clustering (DSC) is widely used in computer vision, motion segmentation and image processing. However, it suffers from the following three issues in the self-expressive matrix learning process: the first one is less useful information for learning self-expressive weights due to the simple reconstruction loss; the second one is that the construction of the self-expr…

    Submitted 10 June, 2022; originally announced June 2022.

  15. arXiv:2103.08538  [pdf]

    cs.CY cs.CC

    UrbanVCA: a vector-based cellular automata framework to simulate the urban land-use change at the land-parcel level

    Authors: Yao Yao, Linlong Li, Zhaotang Liang, Tao Cheng, Zhenhui Sun, Peng Luo, Qingfeng Guan, Yaqian Zhai, Shihao Kou, Yuyang Cai, Lefei Li, Xinyue Ye

    Abstract: Vector-based cellular automata (CA) based on real land-parcel has become an important trend in current urban development simulation studies. Compared with raster-based and parcel-based CA models, vector CA models are difficult to be widely used because of their complex data structures and technical difficulties. The UrbanVCA, a brand-new vector CA-based urban development simulation framework was p…

    Submitted 15 March, 2021; originally announced March 2021.

    Comments: 27 pages, 7 figures, 6 tables

  16. arXiv:1705.06123  [pdf]

    cs.IR

    JCTC: A Large Job posting Corpus for Text Classification

    Authors: Haoyu Xu, Chongyang Gu, Han Zhou, Sengpan Kou, Junjie Zhang

    Abstract: The absence of an appropriate text classification corpus makes the massive amount of online job information unusable for labor market analysis. This paper presents JCTC, a large job posting corpus for text classification. In JCTC construction framework, a formal specification issued by the Chinese central government is chosen as the classification standard. The unsupervised learning (WE-cos), supe…

    Submitted 11 June, 2017; v1 submitted 17 May, 2017; originally announced May 2017.

    Comments: 15 pages, 3 figures

  17. Kernelization and Parameterized Algorithms for 3-Path Vertex Cover

    Authors: Mingyu Xiao, Shaowei Kou

    Abstract: A 3-path vertex cover in a graph is a vertex subset $C$ such that every path of three vertices contains at least one vertex from $C$. The parameterized 3-path vertex cover problem asks whether a graph has a 3-path vertex cover of size at most $k$. In this paper, we give a kernel of $5k$ vertices and an $O^*(1.7485^k)$-time and polynomial-space algorithm for this problem, both new results improve p…

    Submitted 25 August, 2016; originally announced August 2016.

    Comments: in TAMC 2016, LNCS 9796, 2016

    Journal ref: TAMC 2017, LNCS 10185, 654-668

  18. arXiv:1505.00864  [pdf, other]

    stat.AP cs.SI stat.ML

    Accurate estimation of influenza epidemics using Google search data via ARGO

    Authors: Shihao Yang, Mauricio Santillana, S. C. Kou

    Abstract: Accurate real-time tracking of influenza outbreaks helps public health officials make timely and meaningful decisions that could save lives. We propose an influenza tracking model, ARGO (AutoRegression with GOogle search data), that uses publicly available online search data. In addition to having a rigorous statistical foundation, ARGO outperforms all previously available Google-search-based trac…

    Submitted 16 November, 2015; v1 submitted 4 May, 2015; originally announced May 2015.

    Comments: 23 pages, 2 figures, Proceedings of the National Academy of Sciences (2015)