Search | arXiv e-print repository

LoHoVLA: A Unified Vision-Language-Action Model for Long-Horizon Embodied Tasks

Authors: Yi Yang, Jiaxuan Sun, Siqi Kou, Yihan Wang, Zhijie Deng

Abstract: Real-world embodied agents face long-horizon tasks, characterized by high-level goals demanding multi-step solutions beyond single actions. Successfully navigating these requires both high-level task planning (i.e., decomposing goals into sub-tasks) and low-level motion control (i.e., generating precise robot actions). While existing vision language action (VLA) models and hierarchical architectures offer potential in embodied tasks, the former often falter in planning, and the latter can suffer from coordination issues, both hampering performance. We introduce a new unified VLA framework for long-horizon tasks, dubbed LoHoVLA, to overcome these limitations. LoHoVLA leverages a large pretrained vision language model (VLM) as the backbone to jointly generate language and action tokens for sub-task generation and robot action prediction, respectively. This shared representation promotes better generalization across tasks. Additionally, LoHoVLA embraces a hierarchical closed-loop control mechanism to mitigate errors originating from both high-level planning and low-level control. To train LoHoVLA, we introduce LoHoSet, a dataset built on the Ravens simulator, containing 20 long-horizon tasks, each with 1,000 expert demonstrations composed of visual observations, linguistic goals, sub-tasks, and robot actions. Experimental results show that LoHoVLA significantly surpasses both hierarchical and standard VLA approaches on long-horizon embodied tasks in the Ravens simulator. These findings underscore the promise of unified architectures for advancing generalizable embodied intelligence.

Submitted 31 May, 2025; originally announced June 2025.

arXiv:2505.22525 [pdf, ps, other]

Thinking with Generated Images

Authors: Ethan Chern, Zhulin Hu, Steffi Chern, Siqi Kou, Jiadi Su, Yan Ma, Zhijie Deng, Pengfei Liu

Abstract: We present Thinking with Generated Images, a novel paradigm that fundamentally transforms how large multimodal models (LMMs) engage with visual reasoning by enabling them to natively think across text and vision modalities through spontaneous generation of intermediate visual thinking steps. Current visual reasoning with LMMs is constrained to either processing fixed user-provided images or reasoning solely through text-based chain-of-thought (CoT). Thinking with Generated Images unlocks a new dimension of cognitive capability where models can actively construct intermediate visual thoughts, critique their own visual hypotheses, and refine them as integral components of their reasoning process. We demonstrate the effectiveness of our approach through two complementary mechanisms: (1) vision generation with intermediate visual subgoals, where models decompose complex visual tasks into manageable components that are generated and integrated progressively, and (2) vision generation with self-critique, where models generate an initial visual hypothesis, analyze its shortcomings through textual reasoning, and produce refined outputs based on their own critiques. Our experiments on vision generation benchmarks show substantial improvements over baseline approaches, with our models achieving up to 50% (from 38% to 57%) relative improvement in handling complex multi-object scenarios. From biochemists exploring novel protein structures, and architects iterating on spatial designs, to forensic analysts reconstructing crime scenes, and basketball players envisioning strategic plays, our approach enables AI models to engage in the kind of visual imagination and iterative refinement that characterizes human creative, analytical, and strategic thinking. We release our open-source suite at https://github.com/GAIR-NLP/thinking-with-generated-images.

Submitted 28 May, 2025; originally announced May 2025.

arXiv:2505.21723 [pdf, ps, other]

Are Statistical Methods Obsolete in the Era of Deep Learning?

Authors: Skyler Wu, Shihao Yang, S. C. Kou

Abstract: In the era of AI, neural networks have become increasingly popular for modeling, inference, and prediction, largely due to their potential for universal approximation. With the proliferation of such deep learning models, a question arises: are leaner statistical methods still relevant? To shed light on this question, we employ the mechanistic nonlinear ordinary differential equation (ODE) inverse problem as a testbed, using the physics-informed neural network (PINN) as a representative of the deep learning paradigm and manifold-constrained Gaussian process inference (MAGI) as a representative of statistically principled methods. Through case studies involving the SEIR model from epidemiology and the Lorenz model from chaotic dynamics, we demonstrate that statistical methods are far from obsolete, especially when working with sparse and noisy observations. On tasks such as parameter inference and trajectory reconstruction, statistically principled methods consistently achieve lower bias and variance, while using far fewer parameters and requiring less hyperparameter tuning. Statistical methods can also decisively outperform deep learning models on out-of-sample future prediction, where the absence of relevant data often leads overparameterized models astray. Additionally, we find that statistically principled approaches are more robust to accumulation of numerical imprecision and can represent the underlying system more faithfully to the true governing ODEs.

Submitted 27 May, 2025; originally announced May 2025.

Comments: 35 pages, 11 figures (main text)

arXiv:2505.19949 [pdf, ps, other]

Which Data Attributes Stimulate Math and Code Reasoning? An Investigation via Influence Functions

Authors: Siqi Kou, Qingyuan Tian, Hanwen Xu, Zihao Zeng, Zhijie Deng

Abstract: Large language models (LLMs) have demonstrated remarkable reasoning capabilities in math and coding, often bolstered by post-training on the chain-of-thoughts (CoTs) generated by stronger models. However, existing strategies for curating such training data predominantly rely on heuristics, limiting generalizability and failing to capture subtleties underlying the data. To address these limitations, we leverage influence functions to systematically attribute LLMs' reasoning ability on math and coding to individual training examples, sequences, and tokens, enabling deeper insights into effective data characteristics. Our Influence-based Reasoning Attribution (Infra) uncovers nontrivial cross-domain effects across math and coding tasks: high-difficulty math examples improve both math and code reasoning, while low-difficulty code tasks most effectively benefit code reasoning. Based on these findings, we introduce a simple yet effective dataset reweighting strategy by flipping task difficulty, which doubles AIME24 accuracy from 10% to 20% and boosts LiveCodeBench accuracy from 33.8% to 35.3% for Qwen2.5-7B-Instruct. Moreover, our fine-grained attribution reveals that the sequence-level exploratory behaviors enhance reasoning performance in both math and code, and the token-level influence patterns are distinct for math and code reasoning: the former prefers natural language logic connectors and the latter emphasizes structural syntax.

Submitted 26 May, 2025; originally announced May 2025.

arXiv:2502.06097 [pdf, other]

doi 10.1145/3701716.3715251

NLGR: Utilizing Neighbor Lists for Generative Rerank in Personalized Recommendation Systems

Authors: Shuli Wang, Xue Wei, Senjie Kou, Chi Wang, Wenshuai Chen, Qi Tang, Yinhua Zhu, Xiong Xiao, Xingxing Wang

Abstract: Reranking plays a crucial role in modern multi-stage recommender systems by rearranging the initial ranking list. Due to the inherent challenges of combinatorial search spaces, some current research adopts an evaluator-generator paradigm, with a generator generating feasible sequences and an evaluator selecting the best sequence based on the estimated list utility. However, these methods still face two issues. Firstly, due to the goal inconsistency problem between the evaluator and generator, the generator tends to fit the local optimal solution of exposure distribution rather than combinatorial space optimization. Secondly, the strategy of generating target items one by one struggles to achieve optimality because it ignores the information of subsequent items. To address these issues, we propose a model that utilizes Neighbor Lists for Generative Reranking (NLGR), which aims to improve the performance of the generator in the combinatorial space. NLGR follows the evaluator-generator paradigm and improves the generator's training and generating methods. Specifically, we use neighbor lists in combination space to enhance the training process, making the generator perceive the relative scores and find the optimization direction. Furthermore, we propose a novel sampling-based non-autoregressive generation method, which allows the generator to jump flexibly from the current list to any neighbor list. Extensive experiments on public and industrial datasets validate NLGR's effectiveness, and we have successfully deployed NLGR on the Meituan food delivery platform.

Submitted 11 February, 2025; v1 submitted 9 February, 2025; originally announced February 2025.

Comments: Accepted by WWW 2025 Industry Track

arXiv:2412.00127 [pdf, other]

Orthus: Autoregressive Interleaved Image-Text Generation with Modality-Specific Heads

Authors: Siqi Kou, Jiachun Jin, Zhihong Liu, Chang Liu, Ye Ma, Jian Jia, Quan Chen, Peng Jiang, Zhijie Deng

Abstract: We introduce Orthus, an autoregressive (AR) transformer that excels in generating images given textual prompts, answering questions based on visual inputs, and even crafting lengthy image-text interleaved contents. Unlike prior art on unified multimodal modeling, Orthus simultaneously copes with discrete text tokens and continuous image features under the AR modeling principle. The continuous treatment of visual signals minimizes the information loss for both image understanding and generation while the fully AR formulation renders the characterization of the correlation between modalities straightforward. The key mechanism enabling Orthus to leverage these advantages lies in its modality-specific heads -- one regular language modeling (LM) head predicts discrete text tokens and one diffusion head generates continuous image features conditioned on the output of the backbone. We devise an efficient strategy for building Orthus -- by substituting the Vector Quantization (VQ) operation in the existing unified AR model with a soft alternative, introducing a diffusion head, and tuning the added modules to reconstruct images, we can create an Orthus-base model effortlessly (e.g., within a mere 72 A100 GPU hours). Orthus-base can further embrace post-training to better model interleaved images and texts. Empirically, Orthus surpasses competing baselines including Show-o and Chameleon across standard benchmarks, achieving a GenEval score of 0.58 and an MME-P score of 1265.8 using 7B parameters. Orthus also shows exceptional mixed-modality generation capabilities, reflecting the potential for handling intricate practical generation tasks.

Submitted 16 April, 2025; v1 submitted 28 November, 2024; originally announced December 2024.

arXiv:2411.08488 [pdf]

UNSCT-HRNet: Modeling Anatomical Uncertainty for Landmark Detection in Total Hip Arthroplasty

Authors: Jiaxin Wan, Lin Liu, Haoran Wang, Liangwei Li, Wei Li, Shuheng Kou, Runtian Li, Jiayi Tang, Juanxiu Liu, Jing Zhang, Xiaohui Du, Ruqian Hao

Abstract: Total hip arthroplasty (THA) relies on accurate landmark detection from radiographic images, but unstructured data caused by irregular patient postures or occluded anatomical markers pose significant challenges for existing methods. To address this, we propose UNSCT-HRNet (Unstructured CT - High-Resolution Net), a deep learning-based framework that integrates a Spatial Relationship Fusion (SRF) module and an Uncertainty Estimation (UE) module. The SRF module, utilizing coordinate convolution and polarized attention, enhances the model's ability to capture complex spatial relationships. Meanwhile, the UE module, which is based on entropy, ensures predictions are anatomically relevant. For unstructured data, the proposed method can predict landmarks without relying on a fixed number of points, and it shows higher accuracy and better robustness compared with existing methods. Our UNSCT-HRNet demonstrates over a 60% improvement across multiple metrics on unstructured data. The experimental results also reveal that our approach maintains good performance on the structured dataset. Overall, the proposed UNSCT-HRNet has the potential to be used as a new reliable, automated solution for THA surgical planning and postoperative monitoring.

Submitted 13 November, 2024; originally announced November 2024.

arXiv:2410.14731 [pdf, other]

MatryoshkaKV: Adaptive KV Compression via Trainable Orthogonal Projection

Authors: Bokai Lin, Zihao Zeng, Zipeng Xiao, Siqi Kou, Tianqi Hou, Xiaofeng Gao, Hao Zhang, Zhijie Deng

Abstract: KV cache has become a de facto technique for the inference of large language models (LLMs), where tensors of shape (layer number, head number, sequence length, feature dimension) are introduced to cache historical information for self-attention. As the size of the model and data grows, the KV cache can quickly become a bottleneck within the system in both storage and memory transfer. To address this, prior studies usually focus on the first three axes of the cache tensors for compression. This paper supplements them, focusing on the feature dimension axis, by utilizing low-rank projection matrices to transform the cache features into spaces with reduced dimensions. We begin by investigating the canonical orthogonal projection method for data compression through principal component analysis (PCA). We observe that PCA projection suffers significant performance degradation at low compression rates. To bridge the gap, we propose to directly tune the orthogonal projection matrices with a distillation objective using an elaborate Matryoshka training strategy. After training, we adaptively search for the optimal compression rates for various layers and heads given varying compression budgets. Compared to previous works, our method can easily embrace pre-trained LLMs and hold a smooth tradeoff between performance and compression rate. We empirically witness the high data efficiency of our training procedure and find that our method can sustain over 90% performance with an average KV cache compression rate of 60% (and up to 75% in certain extreme scenarios) for popular LLMs like LLaMA2-7B-base and Mistral-7B-v0.3-base.

Submitted 16 May, 2025; v1 submitted 16 October, 2024; originally announced October 2024.

arXiv:2404.12305 [pdf, other]

SAFLA: Semantic-aware Full Lifecycle Assurance Designed for Intent-Driven Networks

Authors: Shiwen Kou, Chungang Yang, Mingji Wu

Abstract: Intent-driven Networks (IDNs) are crucial in enhancing network management efficiency by enabling the translation of high-level intents into executable configurations via a top-down approach. The escalating complexity of network architectures, however, has led to a semantic gap between these intents and their actual configurations, posing significant challenges to the accuracy and reliability of IDNs. While existing methodologies attempt to address this gap through a bottom-up analysis of network metadata, they often fall short, focusing primarily on intent extraction or reasoning without fully leveraging insights to tackle the inherent challenges of IDNs. To mitigate this, we introduce SAFLA, a semantic-aware framework specifically designed to assure the full lifecycle of intents within IDNs. By seamlessly integrating top-down and bottom-up approaches, SAFLA not only provides comprehensive intent assurance but also effectively bridges the semantic gap. This integration facilitates a self-healing mechanism, substantially reducing the need for manual intervention even in dynamically changing network environments. Experimental results demonstrate the framework's feasibility and efficiency, confirming its capacity to quickly adapt intents in response to network changes, thus marking an important advancement in the field of IDNs.

Submitted 18 April, 2024; originally announced April 2024.

Comments: 11 pages, 10 figures, 3 tables

arXiv:2403.00835 [pdf, other]

CLLMs: Consistency Large Language Models

Authors: Siqi Kou, Lanxiang Hu, Zhezhi He, Zhijie Deng, Hao Zhang

Abstract: Parallel decoding methods such as Jacobi decoding show promise for more efficient LLM inference, as they break the sequential nature of the LLM decoding process and transform it into parallelizable computation. However, in practice, Jacobi decoding achieves little speedup compared to traditional autoregressive (AR) decoding, primarily because it seldom accurately predicts more than one token in a single fixed-point iteration step. To address this, we develop a new approach aimed at realizing fast convergence from any state to the fixed point on a Jacobi trajectory. This is accomplished by refining the target LLM to consistently predict the fixed point given any state as input. Extensive experiments demonstrate the effectiveness of our method, showing 2.4$\times$ to 3.4$\times$ improvements in generation speed while preserving generation quality across both domain-specific and open-domain benchmarks.

Submitted 13 June, 2024; v1 submitted 28 February, 2024; originally announced March 2024.

Comments: In the proceedings of the 41st International Conference on Machine Learning (ICML) 2024
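
The fixed-point view of Jacobi decoding described in the CLLMs abstract above can be illustrated with a toy sketch. This is not the paper's method or code: `toy_next_token` is a hypothetical deterministic stand-in for an LLM's greedy next-token choice, and all function names are illustrative assumptions. The sketch only shows plain Jacobi iteration and that its fixed point matches greedy autoregressive decoding; it does not model the consistency training that CLLMs add.

```python
def toy_next_token(prefix):
    # Hypothetical stand-in for an LLM's greedy next-token choice:
    # any deterministic function of the prefix works for this demo.
    return (sum(prefix) * 31 + len(prefix)) % 50


def greedy_decode(next_token, prompt, n):
    # Standard autoregressive decoding: one token per step.
    seq = list(prompt)
    for _ in range(n):
        seq.append(next_token(seq))
    return seq[len(prompt):]


def jacobi_decode(next_token, prompt, n, pad=0):
    # Start from an arbitrary n-token guess and refresh every position
    # from the previous guess until nothing changes (a fixed point).
    # A real model would compute all n updates in one forward pass;
    # here they are evaluated one by one for clarity.
    guess = [pad] * n
    iterations = 0
    while True:
        updated = [next_token(list(prompt) + guess[:i]) for i in range(n)]
        iterations += 1
        if updated == guess:
            return guess, iterations
        guess = updated


tokens, iterations = jacobi_decode(toy_next_token, [3, 7], 8)
print(tokens == greedy_decode(toy_next_token, [3, 7], 8), iterations)
```

Each sweep makes at least one more leading token correct, so the loop stops after at most n + 1 sweeps. How many tokens become correct per sweep is what determines any speedup over autoregressive decoding, which is the quantity the abstract says the approach aims to improve.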

arXiv:2310.11142 [pdf, other]

BayesDiff: Estimating Pixel-wise Uncertainty in Diffusion via Bayesian Inference

Authors: Siqi Kou, Lei Gan, Dequan Wang, Chongxuan Li, Zhijie Deng

Abstract: Diffusion models have impressive image generation capability, but low-quality generations still exist, and their identification remains challenging due to the lack of a proper sample-wise metric. To address this, we propose BayesDiff, a pixel-wise uncertainty estimator for generations from diffusion models based on Bayesian inference. In particular, we derive a novel uncertainty iteration principle to characterize the uncertainty dynamics in diffusion, and leverage the last-layer Laplace approximation for efficient Bayesian inference. The estimated pixel-wise uncertainty can not only be aggregated into a sample-wise metric to filter out low-fidelity images but also aid in augmenting successful generations and rectifying artifacts in failed generations in text-to-image tasks. Extensive experiments demonstrate the efficacy of BayesDiff and its promise for practical applications.

Submitted 4 March, 2024; v1 submitted 17 October, 2023; originally announced October 2023.

Comments: ICLR 2024

arXiv:2309.03729 [pdf, other]

Phasic Content Fusing Diffusion Model with Directional Distribution Consistency for Few-Shot Model Adaption

Authors: Teng Hu, Jiangning Zhang, Liang Liu, Ran Yi, Siqi Kou, Haokun Zhu, Xu Chen, Yabiao Wang, Chengjie Wang, Lizhuang Ma

Abstract: Training a generative model with a limited number of samples is a challenging task. Current methods primarily rely on few-shot model adaption to train the network. However, in scenarios where data is extremely limited (fewer than 10 samples), the generative network tends to overfit and suffers from content degradation. To address these problems, we propose a novel phasic content fusing few-shot diffusion model with directional distribution consistency loss, which targets different learning objectives at distinct training stages of the diffusion model. Specifically, we design a phasic training strategy with phasic content fusion to help our model learn content and style information when t is large, and learn local details of the target domain when t is small, leading to an improvement in the capture of content, style and local details. Furthermore, we introduce a novel directional distribution consistency loss that ensures the consistency between the generated and source distributions more efficiently and stably than the prior methods, preventing our model from overfitting. Finally, we propose a cross-domain structure guidance strategy that enhances structure consistency during domain adaptation. Theoretical analysis and qualitative and quantitative experiments demonstrate the superiority of our approach in few-shot generative model adaption tasks compared to state-of-the-art methods. The source code is available at: https://github.com/sjtuplayer/few-shot-diffusion.

Submitted 7 September, 2023; originally announced September 2023.

Comments: Accepted by ICCV 2023

arXiv:2211.11581 [pdf, other]

Modeling 100% Electrified Transportation in NYC

Authors: Jingrong Zhang, Amber Jiang, Brian Newborn, Sara Kou, Robert Mieth

Abstract: Envisioning a future 100% electrified transportation sector, this paper uses socio-economic, demographic, and geographic data to assess electric energy demand from commuter traffic. We explore the individual mode choices, which allows us to create mode-mix scenarios for the entire population, and quantify the electric energy demand for each scenario using technical specifications of battery and electric drive technology in combination with different charging scenarios. Using data sets for New York City, our results highlight the need for infrastructure investments, the usefulness of flexible charging policies, and the positive impact of incentivizing micromobility and mass-transit options. Our model and results are publicly available as an interactive dashboard.

Submitted 17 February, 2023; v1 submitted 16 November, 2022; originally announced November 2022.

Comments: Accepted for publication at the 2023 IEEE PES General Meeting

arXiv:2206.04958 [pdf, other]

Self-Supervised Deep Subspace Clustering with Entropy-norm

Authors: Guangyi Zhao, Simin Kou, Xuesong Yin

Abstract: Auto-Encoder based deep subspace clustering (DSC) is widely used in computer vision, motion segmentation and image processing. However, it suffers from the following three issues in the self-expressive matrix learning process: the first is that the simple reconstruction loss provides little useful information for learning self-expressive weights; the second is that constructing the self-expression layer, whose size depends on the sample size, incurs a high computational cost; and the last is the limited connectivity of the existing regularization terms. To address these issues, we propose a novel model named Self-Supervised deep Subspace Clustering with Entropy-norm (S$^{3}$CE). Specifically, S$^{3}$CE exploits a self-supervised contrastive network to gain a more effective feature vector. The local structure and dense connectivity of the original data benefit from the self-expressive layer and additional entropy-norm constraint. Moreover, a new module with data enhancement is designed to help S$^{3}$CE focus on the key information of data, and improve the clustering performance of positive and negative instances through spectral clustering. Extensive experimental results demonstrate the superior performance of S$^{3}$CE in comparison to the state-of-the-art approaches.

Submitted 10 June, 2022; originally announced June 2022.

arXiv:2103.08538 [pdf]

UrbanVCA: a vector-based cellular automata framework to simulate the urban land-use change at the land-parcel level

Authors: Yao Yao, Linlong Li, Zhaotang Liang, Tao Cheng, Zhenhui Sun, Peng Luo, Qingfeng Guan, Yaqian Zhai, Shihao Kou, Yuyang Cai, Lefei Li, Xinyue Ye

Abstract: Vector-based cellular automata (CA) based on real land parcels have become an important trend in current urban development simulation studies. Compared with raster-based and parcel-based CA models, vector CA models are difficult to use widely because of their complex data structures and technical difficulties. This study proposes UrbanVCA, a brand-new vector CA-based urban development simulation framework that supports multiple machine-learning models. To better measure the simulation accuracy, this study also proposes, for the first time, a vector-based landscape index (VecLI) model based on the real land parcels. Using Shunde, Guangdong as the study area, UrbanVCA simulates multiple types of urban land-use changes at the land-parcel level with high accuracy (FoM=0.243), and the landscape index similarity reaches 87.3%. The simulation results for 2030 show that the eco-protection scenario can promote urban agglomeration and reduce ecological aggression and loss of arable land by at least 60%. In addition, we have developed and released the UrbanVCA software for urban planners and researchers.

Submitted 15 March, 2021; originally announced March 2021.

Comments: 27 pages, 7 figures, 6 tables

arXiv:1705.06123 [pdf]

JCTC: A Large Job posting Corpus for Text Classification

Authors: Haoyu Xu, Chongyang Gu, Han Zhou, Sengpan Kou, Junjie Zhang

Abstract: The absence of an appropriate text classification corpus makes the massive amount of online job information unusable for labor market analysis. This paper presents JCTC, a large job posting corpus for text classification. In the JCTC construction framework, a formal specification issued by the Chinese central government is chosen as the classification standard. An unsupervised learning method (WE-cos), a supervised learning algorithm (SVM), and human judgements are all used in the construction process. JCTC has 102581 online job postings distributed in 465 categories. The method proposed here not only lowers the high demands on people's skill and knowledge, but also reduces subjective influences. Moreover, the method is not limited to Chinese. We benchmark five state-of-the-art deep learning approaches on JCTC, providing baseline results for future studies. JCTC might be the first job posting corpus for text classification and the largest one in Chinese. With the help of JCTC, related organizations are able to monitor, analyze and predict the labor market in a comprehensive, accurate and timely manner.

Submitted 11 June, 2017; v1 submitted 17 May, 2017; originally announced May 2017.

Comments: 15 pages, 3 figures

arXiv:1608.07022 [pdf, ps, other]

doi 10.1007/978-3-319-55911-7_47

Kernelization and Parameterized Algorithms for 3-Path Vertex Cover

Authors: Mingyu Xiao, Shaowei Kou

Abstract: A 3-path vertex cover in a graph is a vertex subset $C$ such that every path of three vertices contains at least one vertex from $C$. The parameterized 3-path vertex cover problem asks whether a graph has a 3-path vertex cover of size at most $k$. In this paper, we give a kernel of $5k$ vertices and an $O^*(1.7485^k)$-time and polynomial-space algorithm for this problem, both of which improve on previously known bounds.

Submitted 25 August, 2016; originally announced August 2016.

Comments: in TAMC 2016, LNCS 9796, 2016

Journal ref: TAMC 2017, LNCS 10185, 654-668

arXiv:1505.00864 [pdf, other]

doi 10.1073/pnas.1515373112

Accurate estimation of influenza epidemics using Google search data via ARGO

Authors: Shihao Yang, Mauricio Santillana, S. C. Kou

Abstract: Accurate real-time tracking of influenza outbreaks helps public health officials make timely and meaningful decisions that could save lives. We propose an influenza tracking model, ARGO (AutoRegression with GOogle search data), that uses publicly available online search data. In addition to having a rigorous statistical foundation, ARGO outperforms all previously available Google-search-based tracking models, including the latest version of Google Flu Trends, even though it uses only low-quality search data as input from publicly available Google Trends and Google Correlate websites. ARGO not only incorporates the seasonality in influenza epidemics but also captures changes in people's online search behavior over time. ARGO is also flexible, self-correcting, robust, and scalable, making it a potentially powerful tool that can be used for real-time tracking of other social events at multiple temporal and spatial resolutions.

Submitted 16 November, 2015; v1 submitted 4 May, 2015; originally announced May 2015.

Comments: 23 pages, 2 figures, Proceedings of the National Academy of Sciences (2015)

Showing 1–18 of 18 results for author: Kou, S