Search | arXiv e-print repository

Know-MRI: A Knowledge Mechanisms Revealer&Interpreter for Large Language Models

Authors: Jiaxiang Liu, Boxuan Xing, Chenhao Yuan, Chenxiang Zhang, Di Wu, Xiusheng Huang, Haida Yu, Chuhan Lang, Pengfei Cao, Jun Zhao, Kang Liu

Abstract: As large language models (LLMs) continue to advance, there is a growing urgency to enhance the interpretability of their internal knowledge mechanisms. Consequently, many interpretation methods have emerged, aiming to unravel the knowledge mechanisms of LLMs from various perspectives. However, current interpretation methods differ in input data formats and interpreting outputs. The tools integrati… ▽ More As large language models (LLMs) continue to advance, there is a growing urgency to enhance the interpretability of their internal knowledge mechanisms. Consequently, many interpretation methods have emerged, aiming to unravel the knowledge mechanisms of LLMs from various perspectives. However, current interpretation methods differ in input data formats and interpreting outputs. The tools integrating these methods are only capable of supporting tasks with specific inputs, significantly constraining their practical applications. To address these challenges, we present an open-source Knowledge Mechanisms Revealer&Interpreter (Know-MRI) designed to analyze the knowledge mechanisms within LLMs systematically. Specifically, we have developed an extensible core module that can automatically match different input data with interpretation methods and consolidate the interpreting outputs. It enables users to freely choose appropriate interpretation methods based on the inputs, making it easier to comprehensively diagnose the model's internal knowledge mechanisms from multiple perspectives. Our code is available at https://github.com/nlpkeg/Know-MRI. We also provide a demonstration video on https://youtu.be/NVWZABJ43Bs. △ Less

Submitted 10 June, 2025; originally announced June 2025.

arXiv:2503.06683 [pdf, other]

Dynamic Dictionary Learning for Remote Sensing Image Segmentation

Authors: Xuechao Zou, Yue Li, Shun Zhang, Kai Li, Shiying Wang, Pin Tao, Junliang Xing, Congyan Lang

Abstract: Remote sensing image segmentation faces persistent challenges in distinguishing morphologically similar categories and adapting to diverse scene variations. While existing methods rely on implicit representation learning paradigms, they often fail to dynamically adjust semantic embeddings according to contextual cues, leading to suboptimal performance in fine-grained scenarios such as cloud thickn… ▽ More Remote sensing image segmentation faces persistent challenges in distinguishing morphologically similar categories and adapting to diverse scene variations. While existing methods rely on implicit representation learning paradigms, they often fail to dynamically adjust semantic embeddings according to contextual cues, leading to suboptimal performance in fine-grained scenarios such as cloud thickness differentiation. This work introduces a dynamic dictionary learning framework that explicitly models class ID embeddings through iterative refinement. The core contribution lies in a novel dictionary construction mechanism, where class-aware semantic embeddings are progressively updated via multi-stage alternating cross-attention querying between image features and dictionary embeddings. This process enables adaptive representation learning tailored to input-specific characteristics, effectively resolving ambiguities in intra-class heterogeneity and inter-class homogeneity. To further enhance discriminability, a contrastive constraint is applied to the dictionary space, ensuring compact intra-class distributions while maximizing inter-class separability. Extensive experiments across both coarse- and fine-grained datasets demonstrate consistent improvements over state-of-the-art methods, particularly in two online test benchmarks (LoveDA and UAVid). Code is available at https://anonymous.4open.science/r/D2LS-8267/. △ Less

Submitted 9 March, 2025; originally announced March 2025.

arXiv:2412.06664 [pdf, other]

Knowledge Transfer and Domain Adaptation for Fine-Grained Remote Sensing Image Segmentation

Authors: Shun Zhang, Xuechao Zou, Kai Li, Congyan Lang, Shiying Wang, Pin Tao, Tengfei Cao

Abstract: Fine-grained remote sensing image segmentation is essential for accurately identifying detailed objects in remote sensing images. Recently, vision transformer models (VTMs) pre-trained on large-scale datasets have demonstrated strong zero-shot generalization. However, directly applying them to specific tasks may lead to domain shift. We introduce a novel end-to-end learning paradigm combining know… ▽ More Fine-grained remote sensing image segmentation is essential for accurately identifying detailed objects in remote sensing images. Recently, vision transformer models (VTMs) pre-trained on large-scale datasets have demonstrated strong zero-shot generalization. However, directly applying them to specific tasks may lead to domain shift. We introduce a novel end-to-end learning paradigm combining knowledge guidance with domain refinement to enhance performance. We present two key components: the Feature Alignment Module (FAM) and the Feature Modulation Module (FMM). FAM aligns features from a CNN-based backbone with those from the pretrained VTM's encoder using channel transformation and spatial interpolation, and transfers knowledge via KL divergence and L2 normalization constraint. FMM further adapts the knowledge to the specific domain to address domain shift. We also introduce a fine-grained grass segmentation dataset and demonstrate, through experiments on two datasets, that our method achieves a significant improvement of 2.57 mIoU on the grass dataset and 3.73 mIoU on the cloud dataset. The results highlight the potential of combining knowledge transfer and domain adaptation to overcome domain-related challenges and data limitations. The project page is available at https://xavierjiezou.github.io/KTDA/. △ Less

Submitted 14 January, 2025; v1 submitted 9 December, 2024; originally announced December 2024.

Comments: 6 pages, 3 figures, 6 tables

arXiv:2411.15799 [pdf, other]

doi 10.1007/s10489-024-05849-5

Symmetric Perception and Ordinal Regression for Detecting Scoliosis Natural Image

Authors: Xiaojia Zhu, Rui Chen, Xiaoqi Guo, Zhiwen Shao, Yuhu Dai, Ming Zhang, Chuandong Lang

Abstract: Scoliosis is one of the most common diseases in adolescents. Traditional screening methods for the scoliosis usually use radiographic examination, which requires certified experts with medical instruments and brings the radiation risk. Considering such requirement and inconvenience, we propose to use natural images of the human back for wide-range scoliosis screening, which is a challenging proble… ▽ More Scoliosis is one of the most common diseases in adolescents. Traditional screening methods for the scoliosis usually use radiographic examination, which requires certified experts with medical instruments and brings the radiation risk. Considering such requirement and inconvenience, we propose to use natural images of the human back for wide-range scoliosis screening, which is a challenging problem. In this paper, we notice that the human back has a certain degree of symmetry, and asymmetrical human backs are usually caused by spinal lesions. Besides, scoliosis severity levels have ordinal relationships. Taking inspiration from this, we propose a dual-path scoliosis detection network with two main modules: symmetric feature matching module (SFMM) and ordinal regression head (ORH). Specifically, we first adopt a backbone to extract features from both the input image and its horizontally flipped image. Then, we feed the two extracted features into the SFMM to capture symmetric relationships. Finally, we use the ORH to transform the ordinal regression problem into a series of binary classification sub-problems. Extensive experiments demonstrate that our approach outperforms state-of-the-art methods as well as human performance, which provides a promising and economic solution to wide-range scoliosis screening. In particular, our method achieves accuracies of 95.11% and 81.46% in estimation of general severity level and fine-grained severity level of the scoliosis, respectively. △ Less

Submitted 24 November, 2024; originally announced November 2024.

Comments: This paper has been accepted by Applied Intelligence

arXiv:2411.13127 [pdf, other]

Adapting Vision Foundation Models for Robust Cloud Segmentation in Remote Sensing Images

Authors: Xuechao Zou, Shun Zhang, Kai Li, Shiying Wang, Junliang Xing, Lei Jin, Congyan Lang, Pin Tao

Abstract: Cloud segmentation is a critical challenge in remote sensing image interpretation, as its accuracy directly impacts the effectiveness of subsequent data processing and analysis. Recently, vision foundation models (VFM) have demonstrated powerful generalization capabilities across various visual tasks. In this paper, we present a parameter-efficient adaptive approach, termed Cloud-Adapter, designed… ▽ More Cloud segmentation is a critical challenge in remote sensing image interpretation, as its accuracy directly impacts the effectiveness of subsequent data processing and analysis. Recently, vision foundation models (VFM) have demonstrated powerful generalization capabilities across various visual tasks. In this paper, we present a parameter-efficient adaptive approach, termed Cloud-Adapter, designed to enhance the accuracy and robustness of cloud segmentation. Our method leverages a VFM pretrained on general domain data, which remains frozen, eliminating the need for additional training. Cloud-Adapter incorporates a lightweight spatial perception module that initially utilizes a convolutional neural network (ConvNet) to extract dense spatial representations. These multi-scale features are then aggregated and serve as contextual inputs to an adapting module, which modulates the frozen transformer layers within the VFM. Experimental results demonstrate that the Cloud-Adapter approach, utilizing only 0.6% of the trainable parameters of the frozen backbone, achieves substantial performance gains. Cloud-Adapter consistently achieves state-of-the-art performance across various cloud segmentation datasets from multiple satellite sources, sensor series, data processing levels, land cover scenarios, and annotation granularities. We have released the code and model checkpoints at https://xavierjiezou.github.io/Cloud-Adapter/ to support further research. △ Less

Submitted 23 November, 2024; v1 submitted 20 November, 2024; originally announced November 2024.

Comments: 13 pages, 9 figures

arXiv:2409.13749 [pdf, other]

KodeXv0.1: A Family of State-of-the-Art Financial Large Language Models

Authors: Neel Rajani, Lilli Kiessling, Aleksandr Ogaltsov, Claus Lang

Abstract: Although powerful, current cutting-edge LLMs may not fulfil the needs of highly specialised sectors. We introduce KodeXv0.1, a family of large language models that outclass GPT-4 in financial question answering. We utilise the base variants of Llama 3.1 8B and 70B and adapt them to the financial domain through a custom training regime. To this end, we collect and process a large number of publicly… ▽ More Although powerful, current cutting-edge LLMs may not fulfil the needs of highly specialised sectors. We introduce KodeXv0.1, a family of large language models that outclass GPT-4 in financial question answering. We utilise the base variants of Llama 3.1 8B and 70B and adapt them to the financial domain through a custom training regime. To this end, we collect and process a large number of publicly available financial documents such as earnings calls and business reports. These are used to generate a high-quality, synthetic dataset consisting of Context-Question-Answer triplets which closely mirror real-world financial tasks. Using the train split of this dataset, we perform RAG-aware 4bit LoRA instruction tuning runs of Llama 3.1 base variants to produce KodeX-8Bv0.1 and KodeX-70Bv0.1. We then complete extensive model evaluations using FinanceBench, FinQABench and the withheld test split of our dataset. Our results show that KodeX-8Bv0.1 is more reliable in financial contexts than cutting-edge instruct models in the same parameter regime, surpassing them by up to 9.24%. In addition, it is even capable of outperforming state-of-the-art proprietary models such as GPT-4 by up to 7.07%. KodeX-70Bv0.1 represents a further improvement upon this, exceeding GPT-4's performance on every tested benchmark. △ Less

Submitted 13 September, 2024; originally announced September 2024.

Comments: 11 pages, 8 figures

ACM Class: I.2.7

arXiv:2408.08583 [pdf, other]

GrassNet: State Space Model Meets Graph Neural Network

Authors: Gongpei Zhao, Tao Wang, Yi Jin, Congyan Lang, Yidong Li, Haibin Ling

Abstract: Designing spectral convolutional networks is a formidable task in graph learning. In traditional spectral graph neural networks (GNNs), polynomial-based methods are commonly used to design filters via the Laplacian matrix. In practical applications, however, these polynomial methods encounter inherent limitations, which primarily arise from the the low-order truncation of polynomial filters and th… ▽ More Designing spectral convolutional networks is a formidable task in graph learning. In traditional spectral graph neural networks (GNNs), polynomial-based methods are commonly used to design filters via the Laplacian matrix. In practical applications, however, these polynomial methods encounter inherent limitations, which primarily arise from the the low-order truncation of polynomial filters and the lack of overall modeling of the graph spectrum. This leads to poor performance of existing spectral approaches on real-world graph data, especially when the spectrum is highly concentrated or contains many numerically identical values, as they tend to apply the exact same modulation to signals with the same frequencies. To overcome these issues, in this paper, we propose Graph State Space Network (GrassNet), a novel graph neural network with theoretical support that provides a simple yet effective scheme for designing and learning arbitrary graph spectral filters. In particular, our GrassNet introduces structured state space models (SSMs) to model the correlations of graph signals at different frequencies and derives a unique rectification for each frequency in the graph spectrum. To the best of our knowledge, our work is the first to employ SSMs for the design of GNN spectral filters, and it theoretically offers greater expressive power compared with polynomial filters. Extensive experiments on nine public benchmarks reveal that GrassNet achieves superior performance in real-world graph modeling tasks. △ Less

Submitted 16 August, 2024; originally announced August 2024.

arXiv:2406.02040 [pdf, other]

DFA-GNN: Forward Learning of Graph Neural Networks by Direct Feedback Alignment

Authors: Gongpei Zhao, Tao Wang, Congyan Lang, Yi Jin, Yidong Li, Haibin Ling

Abstract: Graph neural networks are recognized for their strong performance across various applications, with the backpropagation algorithm playing a central role in the development of most GNN models. However, despite its effectiveness, BP has limitations that challenge its biological plausibility and affect the efficiency, scalability and parallelism of training neural networks for graph-based tasks. Whil… ▽ More Graph neural networks are recognized for their strong performance across various applications, with the backpropagation algorithm playing a central role in the development of most GNN models. However, despite its effectiveness, BP has limitations that challenge its biological plausibility and affect the efficiency, scalability and parallelism of training neural networks for graph-based tasks. While several non-BP training algorithms, such as the direct feedback alignment, have been successfully applied to fully-connected and convolutional network components for handling Euclidean data, directly adapting these non-BP frameworks to manage non-Euclidean graph data in GNN models presents significant challenges. These challenges primarily arise from the violation of the i.i.d. assumption in graph data and the difficulty in accessing prediction errors for all samples (nodes) within the graph. To overcome these obstacles, in this paper we propose DFA-GNN, a novel forward learning framework tailored for GNNs with a case study of semi-supervised learning. The proposed method breaks the limitations of BP by using a dedicated forward training mechanism. Specifically, DFA-GNN extends the principles of DFA to adapt to graph data and unique architecture of GNNs, which incorporates the information of graph topology into the feedback links to accommodate the non-Euclidean characteristics of graph data. Additionally, for semi-supervised graph learning tasks, we developed a pseudo error generator that spreads residual errors from training data to create a pseudo error for each unlabeled node. These pseudo errors are then utilized to train GNNs using DFA. Extensive experiments on 10 public benchmarks reveal that our learning framework outperforms not only previous non-BP methods but also the standard BP methods, and it exhibits excellent robustness against various types of noise and attacks. △ Less

Submitted 5 November, 2024; v1 submitted 4 June, 2024; originally announced June 2024.

arXiv:2404.12798 [pdf, other]

A Point-Based Approach to Efficient LiDAR Multi-Task Perception

Authors: Christopher Lang, Alexander Braun, Lars Schillingmann, Abhinav Valada

Abstract: Multi-task networks can potentially improve performance and computational efficiency compared to single-task networks, facilitating online deployment. However, current multi-task architectures in point cloud perception combine multiple task-specific point cloud representations, each requiring a separate feature encoder and making the network structures bulky and slow. We propose PAttFormer, an eff… ▽ More Multi-task networks can potentially improve performance and computational efficiency compared to single-task networks, facilitating online deployment. However, current multi-task architectures in point cloud perception combine multiple task-specific point cloud representations, each requiring a separate feature encoder and making the network structures bulky and slow. We propose PAttFormer, an efficient multi-task architecture for joint semantic segmentation and object detection in point clouds that only relies on a point-based representation. The network builds on transformer-based feature encoders using neighborhood attention and grid-pooling and a query-based detection decoder using a novel 3D deformable-attention detection head design. Unlike other LiDAR-based multi-task architectures, our proposed PAttFormer does not require separate feature encoders for multiple task-specific point cloud representations, resulting in a network that is 3x smaller and 1.4x faster while achieving competitive performance on the nuScenes and KITTI benchmarks for autonomous driving perception. Our extensive evaluations show substantial gains from multi-task learning, improving LiDAR semantic segmentation by +1.7% in mIou and 3D object detection by +1.7% in mAP on the nuScenes benchmark compared to the single-task models. △ Less

Submitted 19 April, 2024; originally announced April 2024.

Comments: 8 pages, 3 figures, 8 tables

arXiv:2312.00812 [pdf, other]

Empowering Autonomous Driving with Large Language Models: A Safety Perspective

Authors: Yixuan Wang, Ruochen Jiao, Sinong Simon Zhan, Chengtian Lang, Chao Huang, Zhaoran Wang, Zhuoran Yang, Qi Zhu

Abstract: Autonomous Driving (AD) encounters significant safety hurdles in long-tail unforeseen driving scenarios, largely stemming from the non-interpretability and poor generalization of the deep neural networks within the AD system, particularly in out-of-distribution and uncertain data. To this end, this paper explores the integration of Large Language Models (LLMs) into AD systems, leveraging their rob… ▽ More Autonomous Driving (AD) encounters significant safety hurdles in long-tail unforeseen driving scenarios, largely stemming from the non-interpretability and poor generalization of the deep neural networks within the AD system, particularly in out-of-distribution and uncertain data. To this end, this paper explores the integration of Large Language Models (LLMs) into AD systems, leveraging their robust common-sense knowledge and reasoning abilities. The proposed methodologies employ LLMs as intelligent decision-makers in behavioral planning, augmented with a safety verifier shield for contextual safety learning, for enhancing driving performance and safety. We present two key studies in a simulated environment: an adaptive LLM-conditioned Model Predictive Control (MPC) and an LLM-enabled interactive behavior planning scheme with a state machine. Demonstrating superior performance and safety metrics compared to state-of-the-art approaches, our approach shows the promising potential for using LLMs for autonomous vehicles. △ Less

Submitted 22 March, 2024; v1 submitted 27 November, 2023; originally announced December 2023.

Comments: Accepted to LLMAgent workshop @ICLR2024

arXiv:2304.13147 [pdf, other]

Self-Supervised Multi-Object Tracking For Autonomous Driving From Consistency Across Timescales

Authors: Christopher Lang, Alexander Braun, Lars Schillingmann, Abhinav Valada

Abstract: Self-supervised multi-object trackers have tremendous potential as they enable learning from raw domain-specific data. However, their re-identification accuracy still falls short compared to their supervised counterparts. We hypothesize that this drawback results from formulating self-supervised objectives that are limited to single frames or frame pairs. Such formulations do not capture sufficien… ▽ More Self-supervised multi-object trackers have tremendous potential as they enable learning from raw domain-specific data. However, their re-identification accuracy still falls short compared to their supervised counterparts. We hypothesize that this drawback results from formulating self-supervised objectives that are limited to single frames or frame pairs. Such formulations do not capture sufficient visual appearance variations to facilitate learning consistent re-identification features for autonomous driving when the frame rate is low or object dynamics are high. In this work, we propose a training objective that enables self-supervised learning of re-identification features from multiple sequential frames by enforcing consistent association scores across short and long timescales. We perform extensive evaluations demonstrating that re-identification features trained from longer sequences significantly reduce ID switches on standard autonomous driving datasets compared to existing self-supervised learning methods, which are limited to training on frame pairs. Using our proposed SubCo loss function, we set the new state-of-the-art among self-supervised methods and even perform on par with fully supervised learning methods. △ Less

Submitted 21 September, 2023; v1 submitted 25 April, 2023; originally announced April 2023.

Comments: 8 pages, 3 figures, 5 tables

arXiv:2303.09728 [pdf, other]

The Cascaded Forward Algorithm for Neural Network Training

Authors: Gongpei Zhao, Tao Wang, Yidong Li, Yi Jin, Congyan Lang, Haibin Ling

Abstract: Backpropagation algorithm has been widely used as a mainstream learning procedure for neural networks in the past decade, and has played a significant role in the development of deep learning. However, there exist some limitations associated with this algorithm, such as getting stuck in local minima and experiencing vanishing/exploding gradients, which have led to questions about its biological pl… ▽ More Backpropagation algorithm has been widely used as a mainstream learning procedure for neural networks in the past decade, and has played a significant role in the development of deep learning. However, there exist some limitations associated with this algorithm, such as getting stuck in local minima and experiencing vanishing/exploding gradients, which have led to questions about its biological plausibility. To address these limitations, alternative algorithms to backpropagation have been preliminarily explored, with the Forward-Forward (FF) algorithm being one of the most well-known. In this paper we propose a new learning framework for neural networks, namely Cascaded Forward (CaFo) algorithm, which does not rely on BP optimization as that in FF. Unlike FF, our framework directly outputs label distributions at each cascaded block, which does not require generation of additional negative samples and thus leads to a more efficient process at both training and testing. Moreover, in our framework each block can be trained independently, so it can be easily deployed into parallel acceleration systems. The proposed method is evaluated on four public image classification benchmarks, and the experimental results illustrate significant improvement in prediction accuracy in comparison with the baseline. △ Less

Submitted 11 December, 2023; v1 submitted 16 March, 2023; originally announced March 2023.

arXiv:2302.09043 [pdf, other]

Self-Supervised Representation Learning from Temporal Ordering of Automated Driving Sequences

Authors: Christopher Lang, Alexander Braun, Lars Schillingmann, Karsten Haug, Abhinav Valada

Abstract: Self-supervised feature learning enables perception systems to benefit from the vast raw data recorded by vehicle fleets worldwide. While video-level self-supervised learning approaches have shown strong generalizability on classification tasks, the potential to learn dense representations from sequential data has been relatively unexplored. In this work, we propose TempO, a temporal ordering pret… ▽ More Self-supervised feature learning enables perception systems to benefit from the vast raw data recorded by vehicle fleets worldwide. While video-level self-supervised learning approaches have shown strong generalizability on classification tasks, the potential to learn dense representations from sequential data has been relatively unexplored. In this work, we propose TempO, a temporal ordering pretext task for pre-training region-level feature representations for perception tasks. We embed each frame by an unordered set of proposal feature vectors, a representation that is natural for object detection or tracking systems, and formulate the sequential ordering by predicting frame transition probabilities in a transformer-based multi-frame architecture whose complexity scales less than quadratic with respect to the sequence length. Extensive evaluations on the BDD100K, nuImages, and MOT17 datasets show that our TempO pre-training approach outperforms single-frame self-supervised learning methods as well as supervised transfer learning initialization strategies, achieving an improvement of +0.7% in mAP for object detection and +2.0% in the HOTA score for multi-object tracking. △ Less

Submitted 8 November, 2023; v1 submitted 17 February, 2023; originally announced February 2023.

Comments: 12 pages, 7 figures

arXiv:2301.06262 [pdf, other]

doi 10.1109/MITS.2023.3298534

Collaborative Perception in Autonomous Driving: Methods, Datasets and Challenges

Authors: Yushan Han, Hui Zhang, Huifang Li, Yi Jin, Congyan Lang, Yidong Li

Abstract: Collaborative perception is essential to address occlusion and sensor failure issues in autonomous driving. In recent years, theoretical and experimental investigations of novel works for collaborative perception have increased tremendously. So far, however, few reviews have focused on systematical collaboration modules and large-scale collaborative perception datasets. This work reviews recent ac… ▽ More Collaborative perception is essential to address occlusion and sensor failure issues in autonomous driving. In recent years, theoretical and experimental investigations of novel works for collaborative perception have increased tremendously. So far, however, few reviews have focused on systematical collaboration modules and large-scale collaborative perception datasets. This work reviews recent achievements in this field to bridge this gap and motivate future research. We start with a brief overview of collaboration schemes. After that, we systematically summarize the collaborative perception methods for ideal scenarios and real-world issues. The former focuses on collaboration modules and efficiency, and the latter is devoted to addressing the problems in actual application. Furthermore, we present large-scale public datasets and summarize quantitative results on these benchmarks. Finally, we highlight gaps and overlook challenges between current academic research and real-world applications. The project page is https://github.com/CatOneTwo/Collaborative-Perception-in-Autonomous-Driving △ Less

Submitted 30 August, 2023; v1 submitted 16 January, 2023; originally announced January 2023.

Comments: 18 pages, 6 figures. Accepted by IEEE Intelligent Transportation Systems Magazine. URL: https://github.com/CatOneTwo/Collaborative-Perception-in-Autonomous-Driving

Journal ref: IEEE Intelligent Transportation Systems Magazine

arXiv:2204.09903 [pdf, other]

Beyond the Prototype: Divide-and-conquer Proxies for Few-shot Segmentation

Authors: Chunbo Lang, Binfei Tu, Gong Cheng, Junwei Han

Abstract: Few-shot segmentation, which aims to segment unseen-class objects given only a handful of densely labeled samples, has received widespread attention from the community. Existing approaches typically follow the prototype learning paradigm to perform meta-inference, which fails to fully exploit the underlying information from support image-mask pairs, resulting in various segmentation failures, e.g.… ▽ More Few-shot segmentation, which aims to segment unseen-class objects given only a handful of densely labeled samples, has received widespread attention from the community. Existing approaches typically follow the prototype learning paradigm to perform meta-inference, which fails to fully exploit the underlying information from support image-mask pairs, resulting in various segmentation failures, e.g., incomplete objects, ambiguous boundaries, and distractor activation. To this end, we propose a simple yet versatile framework in the spirit of divide-and-conquer. Specifically, a novel self-reasoning scheme is first implemented on the annotated support image, and then the coarse segmentation mask is divided into multiple regions with different properties. Leveraging effective masked average pooling operations, a series of support-induced proxies are thus derived, each playing a specific role in conquering the above challenges. Moreover, we devise a unique parallel decoder structure that integrates proxies with similar attributes to boost the discrimination power. Our proposed approach, named divide-and-conquer proxies (DCP), allows for the development of appropriate and reliable information as a guide at the "episode" level, not just about the object cues themselves. Extensive experiments on PASCAL-5i and COCO-20i demonstrate the superiority of DCP over conventional prototype-based approaches (up to 5~10% on average), which also establishes a new state-of-the-art. Code is available at github.com/chunbolang/DCP. △ Less

Submitted 30 May, 2022; v1 submitted 21 April, 2022; originally announced April 2022.

Comments: accepted to IJCAI 2022 Long Oral

arXiv:2203.08049 [pdf, other]

On Hyperbolic Embeddings in 2D Object Detection

Authors: Christopher Lang, Alexander Braun, Abhinav Valada

Abstract: Object detection, for the most part, has been formulated in the euclidean space, where euclidean or spherical geodesic distances measure the similarity of an image region to an object class prototype. In this work, we study whether a hyperbolic geometry better matches the underlying structure of the object classification space. We incorporate a hyperbolic classifier in two-stage, keypoint-based, a… ▽ More Object detection, for the most part, has been formulated in the euclidean space, where euclidean or spherical geodesic distances measure the similarity of an image region to an object class prototype. In this work, we study whether a hyperbolic geometry better matches the underlying structure of the object classification space. We incorporate a hyperbolic classifier in two-stage, keypoint-based, and transformer-based object detection architectures and evaluate them on large-scale, long-tailed, and zero-shot object detection benchmarks. In our extensive experimental evaluations, we observe categorical class hierarchies emerging in the structure of the classification space, resulting in lower classification errors and boosting the overall object detection performance. △ Less

Submitted 18 March, 2022; v1 submitted 15 March, 2022; originally announced March 2022.

Comments: 14 pages, 5 figures

arXiv:2203.07615 [pdf, other]

Learning What Not to Segment: A New Perspective on Few-Shot Segmentation

Authors: Chunbo Lang, Gong Cheng, Binfei Tu, Junwei Han

Abstract: Recently few-shot segmentation (FSS) has been extensively developed. Most previous works strive to achieve generalization through the meta-learning framework derived from classification tasks; however, the trained models are biased towards the seen classes instead of being ideally class-agnostic, thus hindering the recognition of new concepts. This paper proposes a fresh and straightforward insigh… ▽ More Recently few-shot segmentation (FSS) has been extensively developed. Most previous works strive to achieve generalization through the meta-learning framework derived from classification tasks; however, the trained models are biased towards the seen classes instead of being ideally class-agnostic, thus hindering the recognition of new concepts. This paper proposes a fresh and straightforward insight to alleviate the problem. Specifically, we apply an additional branch (base learner) to the conventional FSS model (meta learner) to explicitly identify the targets of base classes, i.e., the regions that do not need to be segmented. Then, the coarse results output by these two learners in parallel are adaptively integrated to yield precise segmentation prediction. Considering the sensitivity of meta learner, we further introduce an adjustment factor to estimate the scene differences between the input image pairs for facilitating the model ensemble forecasting. The substantial performance gains on PASCAL-5i and COCO-20i verify the effectiveness, and surprisingly, our versatile scheme sets a new state-of-the-art even with two plain learners. Moreover, in light of the unique nature of the proposed approach, we also extend it to a more realistic but challenging setting, i.e., generalized FSS, where the pixels of both base and novel classes are required to be determined. The source code is available at github.com/chunbolang/BAM. △ Less

Submitted 28 March, 2022; v1 submitted 14 March, 2022; originally announced March 2022.

Comments: Accepted to CVPR 2022 Oral

arXiv:2201.02057 [pdf, other]

GLAN: A Graph-based Linear Assignment Network

Authors: He Liu, Tao Wang, Congyan Lang, Songhe Feng, Yi Jin, Yidong Li

Abstract: Differentiable solvers for the linear assignment problem (LAP) have attracted much research attention in recent years, which are usually embedded into learning frameworks as components. However, previous algorithms, with or without learning strategies, usually suffer from the degradation of the optimality with the increment of the problem size. In this paper, we propose a learnable linear assignme… ▽ More Differentiable solvers for the linear assignment problem (LAP) have attracted much research attention in recent years, which are usually embedded into learning frameworks as components. However, previous algorithms, with or without learning strategies, usually suffer from the degradation of the optimality with the increment of the problem size. In this paper, we propose a learnable linear assignment solver based on deep graph networks. Specifically, we first transform the cost matrix to a bipartite graph and convert the assignment task to the problem of selecting reliable edges from the constructed graph. Subsequently, a deep graph network is developed to aggregate and update the features of nodes and edges. Finally, the network predicts a label for each edge that indicates the assignment relationship. The experimental results on a synthetic dataset reveal that our method outperforms state-of-the-art baselines and achieves consistently high accuracy with the increment of the problem size. Furthermore, we also embed the proposed solver, in comparison with state-of-the-art baseline solvers, into a popular multi-object tracking (MOT) framework to train the tracker in an end-to-end manner. The experimental results on MOT benchmarks illustrate that the proposed LAP solver improves the tracker by the largest margin. △ Less

Submitted 5 January, 2022; originally announced January 2022.

arXiv:2201.01603 [pdf, other]

Deep Probabilistic Graph Matching

Authors: He Liu, Tao Wang, Yidong Li, Congyan Lang, Songhe Feng, Haibin Ling

Abstract: Most previous learning-based graph matching algorithms solve the \textit{quadratic assignment problem} (QAP) by dropping one or more of the matching constraints and adopting a relaxed assignment solver to obtain sub-optimal correspondences. Such relaxation may actually weaken the original graph matching problem, and in turn hurt the matching performance. In this paper we propose a deep learning-ba… ▽ More Most previous learning-based graph matching algorithms solve the \textit{quadratic assignment problem} (QAP) by dropping one or more of the matching constraints and adopting a relaxed assignment solver to obtain sub-optimal correspondences. Such relaxation may actually weaken the original graph matching problem, and in turn hurt the matching performance. In this paper we propose a deep learning-based graph matching framework that works for the original QAP without compromising on the matching constraints. In particular, we design an affinity-assignment prediction network to jointly learn the pairwise affinity and estimate the node assignments, and we then develop a differentiable solver inspired by the probabilistic perspective of the pairwise affinities. Aiming to obtain better matching results, the probabilistic solver refines the estimated assignments in an iterative manner to impose both discrete and one-to-one matching constraints. The proposed method is evaluated on three popularly tested benchmarks (Pascal VOC, Willow Object and SPair-71k), and it outperforms all previous state-of-the-arts on all benchmarks. △ Less

Submitted 5 January, 2022; originally announced January 2022.

arXiv:2112.11366 [pdf, other]

Contrastive Object Detection Using Knowledge Graph Embeddings

Authors: Christopher Lang, Alexander Braun, Abhinav Valada

Abstract: Object recognition for the most part has been approached as a one-hot problem that treats classes to be discrete and unrelated. Each image region has to be assigned to one member of a set of objects, including a background class, disregarding any similarities in the object types. In this work, we compare the error statistics of the class embeddings learned from a one-hot approach with semantically… ▽ More Object recognition for the most part has been approached as a one-hot problem that treats classes to be discrete and unrelated. Each image region has to be assigned to one member of a set of objects, including a background class, disregarding any similarities in the object types. In this work, we compare the error statistics of the class embeddings learned from a one-hot approach with semantically structured embeddings from natural language processing or knowledge graphs that are widely applied in open world object detection. Extensive experimental results on multiple knowledge-embeddings as well as distance metrics indicate that knowledge-based class representations result in more semantically grounded misclassifications while performing on par compared to one-hot methods on the challenging COCO and Cityscapes object detection benchmarks. We generalize our findings to multiple object detection architectures by proposing a knowledge-embedded design for keypoint-based and transformer-based object detection architectures. △ Less

Submitted 21 December, 2021; originally announced December 2021.

arXiv:2111.06162 [pdf, other]

Clicking Matters:Towards Interactive Human Parsing

Authors: Yutong Gao, Liqian Liang, Congyan Lang, Songhe Feng, Yidong Li, Yunchao Wei

Abstract: In this work, we focus on Interactive Human Parsing (IHP), which aims to segment a human image into multiple human body parts with guidance from users' interactions. This new task inherits the class-aware property of human parsing, which cannot be well solved by traditional interactive image segmentation approaches that are generally class-agnostic. To tackle this new task, we first exploit user c… ▽ More In this work, we focus on Interactive Human Parsing (IHP), which aims to segment a human image into multiple human body parts with guidance from users' interactions. This new task inherits the class-aware property of human parsing, which cannot be well solved by traditional interactive image segmentation approaches that are generally class-agnostic. To tackle this new task, we first exploit user clicks to identify different human parts in the given image. These clicks are subsequently transformed into semantic-aware localization maps, which are concatenated with the RGB image to form the input of the segmentation network and generate the initial parsing result. To enable the network to better perceive user's purpose during the correction process, we investigate several principal ways for the refinement, and reveal that random-sampling-based click augmentation is the best way for promoting the correction effectiveness. Furthermore, we also propose a semantic-perceiving loss (SP-loss) to augment the training, which can effectively exploit the semantic relationships of clicks for better optimization. To the best knowledge, this work is the first attempt to tackle the human parsing task under the interactive setting. Our IHP solution achieves 85\% mIoU on the benchmark LIP, 80\% mIoU on PASCAL-Person-Part and CIHP, 75\% mIoU on Helen with only 1.95, 3.02, 2.84 and 1.09 clicks per class respectively. These results demonstrate that we can simply acquire high-quality human parsing masks with only a few human effort. We hope this work can motivate more researchers to develop data-efficient solutions to IHP in the future. △ Less

Submitted 16 December, 2021; v1 submitted 11 November, 2021; originally announced November 2021.

Comments: Human parsing, interactive segmentation, semantic segmentation

arXiv:2110.11264 [pdf, ps, other]

MSO: Multi-Feature Space Joint Optimization Network for RGB-Infrared Person Re-Identification

Authors: Yajun Gao, Tengfei Liang, Yi Jin, Xiaoyan Gu, Wu Liu, Yidong Li, Congyan Lang

Abstract: The RGB-infrared cross-modality person re-identification (ReID) task aims to recognize the images of the same identity between the visible modality and the infrared modality. Existing methods mainly use a two-stream architecture to eliminate the discrepancy between the two modalities in the final common feature space, which ignore the single space of each modality in the shallow layers. To solve i… ▽ More The RGB-infrared cross-modality person re-identification (ReID) task aims to recognize the images of the same identity between the visible modality and the infrared modality. Existing methods mainly use a two-stream architecture to eliminate the discrepancy between the two modalities in the final common feature space, which ignore the single space of each modality in the shallow layers. To solve it, in this paper, we present a novel multi-feature space joint optimization (MSO) network, which can learn modality-sharable features in both the single-modality space and the common space. Firstly, based on the observation that edge information is modality-invariant, we propose an edge features enhancement module to enhance the modality-sharable features in each single-modality space. Specifically, we design a perceptual edge features (PEF) loss after the edge fusion strategy analysis. According to our knowledge, this is the first work that proposes explicit optimization in the single-modality feature space on cross-modality ReID task. Moreover, to increase the difference between cross-modality distance and class distance, we introduce a novel cross-modality contrastive-center (CMCC) loss into the modality-joint constraints in the common feature space. The PEF loss and CMCC loss jointly optimize the model in an end-to-end manner, which markedly improves the network's performance. Extensive experiments demonstrate that the proposed model significantly outperforms state-of-the-art methods on both the SYSU-MM01 and RegDB datasets. △ Less

Submitted 21 October, 2021; originally announced October 2021.

arXiv:2110.01931 [pdf, other]

doi 10.1109/TGRS.2022.3183022

Anchor-free Oriented Proposal Generator for Object Detection

Authors: Gong Cheng, Jiabao Wang, Ke Li, Xingxing Xie, Chunbo Lang, Yanqing Yao, Junwei Han

Abstract: Oriented object detection is a practical and challenging task in remote sensing image interpretation. Nowadays, oriented detectors mostly use horizontal boxes as intermedium to derive oriented boxes from them. However, the horizontal boxes are inclined to get small Intersection-over-Unions (IoUs) with ground truths, which may have some undesirable effects, such as introducing redundant noise, mism… ▽ More Oriented object detection is a practical and challenging task in remote sensing image interpretation. Nowadays, oriented detectors mostly use horizontal boxes as intermedium to derive oriented boxes from them. However, the horizontal boxes are inclined to get small Intersection-over-Unions (IoUs) with ground truths, which may have some undesirable effects, such as introducing redundant noise, mismatching with ground truths, detracting from the robustness of detectors, etc. In this paper, we propose a novel Anchor-free Oriented Proposal Generator (AOPG) that abandons horizontal box-related operations from the network architecture. AOPG first produces coarse oriented boxes by a Coarse Location Module (CLM) in an anchor-free manner and then refines them into high-quality oriented proposals. After AOPG, we apply a Fast R-CNN head to produce the final detection results. Furthermore, the shortage of large-scale datasets is also a hindrance to the development of oriented object detection. To alleviate the data insufficiency, we release a new dataset on the basis of our DIOR dataset and name it DIOR-R. Massive experiments demonstrate the effectiveness of AOPG. Particularly, without bells and whistles, we achieve the accuracy of 64.41%, 75.24% and 96.22% mAP on the DIOR-R, DOTA and HRSC2016 datasets respectively. Code and models are available at https://github.com/jbwang1997/AOPG. △ Less

Submitted 26 April, 2022; v1 submitted 5 October, 2021; originally announced October 2021.

arXiv:2109.10474 [pdf, other]

doi 10.1371/journal.pcbi.1010594

Rapid detection and recognition of whole brain activity in a freely behaving Caenorhabditis elegans

Authors: Yuxiang Wu, Shang Wu, Xin Wang, Chengtian Lang, Quanshi Zhang, Quan Wen, Tianqi Xu

Abstract: Advanced volumetric imaging methods and genetically encoded activity indicators have permitted a comprehensive characterization of whole brain activity at single neuron resolution in \textit{Caenorhabditis elegans}. The constant motion and deformation of the nematode nervous system, however, impose a great challenge for consistent identification of densely packed neurons in a behaving animal. Here… ▽ More Advanced volumetric imaging methods and genetically encoded activity indicators have permitted a comprehensive characterization of whole brain activity at single neuron resolution in \textit{Caenorhabditis elegans}. The constant motion and deformation of the nematode nervous system, however, impose a great challenge for consistent identification of densely packed neurons in a behaving animal. Here, we propose a cascade solution for long-term and rapid recognition of head ganglion neurons in a freely moving \textit{C. elegans}. First, potential neuronal regions from a stack of fluorescence images are detected by a deep learning algorithm. Second, 2-dimensional neuronal regions are fused into 3-dimensional neuron entities. Third, by exploiting the neuronal density distribution surrounding a neuron and relative positional information between neurons, a multi-class artificial neural network transforms engineered neuronal feature vectors into digital neuronal identities. With a small number of training samples, our bottom-up approach is able to process each volume - $1024 \times 1024 \times 18$ in voxels - in less than 1 second and achieves an accuracy of $91\%$ in neuronal detection and above $80\%$ in neuronal tracking over a long video recording. Our work represents a step towards rapid and fully automated algorithms for decoding whole brain activity underlying naturalistic behaviors. △ Less

Submitted 15 September, 2022; v1 submitted 21 September, 2021; originally announced September 2021.

Journal ref: PLOS Computational Biology 18(10): e1010594, 2022

arXiv:2109.00240 [pdf, other]

Joint Graph Learning and Matching for Semantic Feature Correspondence

Authors: He Liu, Tao Wang, Yidong Li, Congyan Lang, Yi Jin, Haibin Ling

Abstract: In recent years, powered by the learned discriminative representation via graph neural network (GNN) models, deep graph matching methods have made great progresses in the task of matching semantic features. However, these methods usually rely on heuristically generated graph patterns, which may introduce unreliable relationships to hurt the matching performance. In this paper, we propose a joint \… ▽ More In recent years, powered by the learned discriminative representation via graph neural network (GNN) models, deep graph matching methods have made great progresses in the task of matching semantic features. However, these methods usually rely on heuristically generated graph patterns, which may introduce unreliable relationships to hurt the matching performance. In this paper, we propose a joint \emph{graph learning and matching} network, named GLAM, to explore reliable graph structures for boosting graph matching. GLAM adopts a pure attention-based framework for both graph learning and graph matching. Specifically, it employs two types of attention mechanisms, self-attention and cross-attention for the task. The self-attention discovers the relationships between features and to further update feature representations over the learnt structures; and the cross-attention computes cross-graph correlations between the two feature sets to be matched for feature reconstruction. Moreover, the final matching solution is directly derived from the output of the cross-attention layer, without employing a specific matching decision module. The proposed method is evaluated on three popular visual matching benchmarks (Pascal VOC, Willow Object and SPair-71k), and it outperforms previous state-of-the-art graph matching methods by significant margins on all benchmarks. Furthermore, the graph patterns learnt by our model are validated to be able to remarkably enhance previous deep graph matching methods by replacing their handcrafted graph structures with the learnt ones. △ Less

Submitted 17 November, 2021; v1 submitted 1 September, 2021; originally announced September 2021.

arXiv:2102.13360 [pdf, other]

A Universal Model for Cross Modality Mapping by Relational Reasoning

Authors: Zun Li, Congyan Lang, Liqian Liang, Tao Wang, Songhe Feng, Jun Wu, Yidong Li

Abstract: With the aim of matching a pair of instances from two different modalities, cross modality mapping has attracted growing attention in the computer vision community. Existing methods usually formulate the mapping function as the similarity measure between the pair of instance features, which are embedded to a common space. However, we observe that the relationships among the instances within a sing… ▽ More With the aim of matching a pair of instances from two different modalities, cross modality mapping has attracted growing attention in the computer vision community. Existing methods usually formulate the mapping function as the similarity measure between the pair of instance features, which are embedded to a common space. However, we observe that the relationships among the instances within a single modality (intra relations) and those between the pair of heterogeneous instances (inter relations) are insufficiently explored in previous approaches. Motivated by this, we redefine the mapping function with relational reasoning via graph modeling, and further propose a GCN-based Relational Reasoning Network (RR-Net) in which inter and intra relations are efficiently computed to universally resolve the cross modality mapping problem. Concretely, we first construct two kinds of graph, i.e., Intra Graph and Inter Graph, to respectively model intra relations and inter relations. Then RR-Net updates all the node features and edge features in an iterative manner for learning intra and inter relations simultaneously. Last, RR-Net outputs the probabilities over the edges which link a pair of heterogeneous instances to estimate the mapping results. Extensive experiments on three example tasks, i.e., image classification, social recommendation and sound recognition, clearly demonstrate the superiority and universality of our proposed model. △ Less

Submitted 26 February, 2021; originally announced February 2021.

Comments: 10 pages, 4 figures

arXiv:2101.12578 [pdf, other]

Adjusting for Autocorrelated Errors in Neural Networks for Time Series

Authors: Fan-Keng Sun, Christopher I. Lang, Duane S. Boning

Abstract: An increasing body of research focuses on using neural networks to model time series. A common assumption in training neural networks via maximum likelihood estimation on time series is that the errors across time steps are uncorrelated. However, errors are actually autocorrelated in many cases due to the temporality of the data, which makes such maximum likelihood estimations inaccurate. In this… ▽ More An increasing body of research focuses on using neural networks to model time series. A common assumption in training neural networks via maximum likelihood estimation on time series is that the errors across time steps are uncorrelated. However, errors are actually autocorrelated in many cases due to the temporality of the data, which makes such maximum likelihood estimations inaccurate. In this paper, in order to adjust for autocorrelated errors, we propose to learn the autocorrelation coefficient jointly with the model parameters. In our experiments, we verify the effectiveness of our approach on time series forecasting. Results across a wide range of real-world datasets with various state-of-the-art models show that our method enhances performance in almost all cases. Based on these results, we suggest empirical critical values to determine the severity of autocorrelated errors. We also analyze several aspects of our method to demonstrate its advantages. Finally, other time series tasks are also considered to validate that our method is not restricted to only forecasting. △ Less

Submitted 8 October, 2021; v1 submitted 27 January, 2021; originally announced January 2021.

arXiv:2011.00139 [pdf, other]

doi 10.1109/ICSP48669.2020.9320928

EDCNN: Edge enhancement-based Densely Connected Network with Compound Loss for Low-Dose CT Denoising

Authors: Tengfei Liang, Yi Jin, Yidong Li, Tao Wang, Songhe Feng, Congyan Lang

Abstract: In the past few decades, to reduce the risk of X-ray in computed tomography (CT), low-dose CT image denoising has attracted extensive attention from researchers, which has become an important research issue in the field of medical images. In recent years, with the rapid development of deep learning technology, many algorithms have emerged to apply convolutional neural networks to this task, achiev… ▽ More In the past few decades, to reduce the risk of X-ray in computed tomography (CT), low-dose CT image denoising has attracted extensive attention from researchers, which has become an important research issue in the field of medical images. In recent years, with the rapid development of deep learning technology, many algorithms have emerged to apply convolutional neural networks to this task, achieving promising results. However, there are still some problems such as low denoising efficiency, over-smoothed result, etc. In this paper, we propose the Edge enhancement based Densely connected Convolutional Neural Network (EDCNN). In our network, we design an edge enhancement module using the proposed novel trainable Sobel convolution. Based on this module, we construct a model with dense connections to fuse the extracted edge information and realize end-to-end image denoising. Besides, when training the model, we introduce a compound loss that combines MSE loss and multi-scales perceptual loss to solve the over-smoothed problem and attain a marked improvement in image quality after denoising. Compared with the existing low-dose CT image denoising algorithms, our proposed model has a better performance in preserving details and suppressing noise. △ Less

Submitted 30 October, 2020; originally announced November 2020.

Comments: 8 pages, 7 figures, 3 tables

Journal ref: 2020 15th IEEE International Conference on Signal Processing (ICSP). 1 (2020) 193-198

arXiv:2005.10738 [pdf, ps, other]

Multi-agent model for risk prediction in surgery

Authors: Bruno Perez, Julien Henriet, Christophe Lang, Laurent Philippe

Abstract: Risk management resulting from the actions and states of the different elements making up a operating room is a major concern during a surgical procedure. Agent-based simulation shows an interest through its interaction concepts, interactivity and autonomy of different simulator entities. We want in our study to implement a generator of alerts to listen the evolution of different settings applied… ▽ More Risk management resulting from the actions and states of the different elements making up a operating room is a major concern during a surgical procedure. Agent-based simulation shows an interest through its interaction concepts, interactivity and autonomy of different simulator entities. We want in our study to implement a generator of alerts to listen the evolution of different settings applied to the simulator of agents (human fatigue, material efficiency, infection rate ...). This article presents our model, its implementation and the first results obtained. It should be noted that this study also made it possible to identify several scientific obstacles, such as the integration of different levels of abstraction, the coupling of species, the coexistence of several scales in the same environment and the deduction of unpredictable alerts. Case-based reasoning (CBR) is a beginning of response relative to the last lock mentioned and will be discussed in this paper. △ Less

Submitted 22 July, 2020; v1 submitted 21 May, 2020; originally announced May 2020.

arXiv:2002.10864 [pdf, other]

doi 10.1109/TIP.2021.3072811

Cross-layer Feature Pyramid Network for Salient Object Detection

Authors: Zun Li, Congyan Lang, Junhao Liew, Qibin Hou, Yidong Li, Jiashi Feng

Abstract: Feature pyramid network (FPN) based models, which fuse the semantics and salient details in a progressive manner, have been proven highly effective in salient object detection. However, it is observed that these models often generate saliency maps with incomplete object structures or unclear object boundaries, due to the \emph{indirect} information propagation among distant layers that makes such… ▽ More Feature pyramid network (FPN) based models, which fuse the semantics and salient details in a progressive manner, have been proven highly effective in salient object detection. However, it is observed that these models often generate saliency maps with incomplete object structures or unclear object boundaries, due to the \emph{indirect} information propagation among distant layers that makes such fusion structure less effective. In this work, we propose a novel Cross-layer Feature Pyramid Network (CFPN), in which direct cross-layer communication is enabled to improve the progressive fusion in salient object detection. Specifically, the proposed network first aggregates multi-scale features from different layers into feature maps that have access to both the high- and low-level information. Then, it distributes the aggregated features to all the involved layers to gain access to richer context. In this way, the distributed features per layer own both semantics and salient details from all other layers simultaneously, and suffer reduced loss of important information. Extensive experimental results over six widely used salient object detection benchmarks and with three popular backbones clearly demonstrate that CFPN can accurately locate fairly complete salient regions and effectively segment the object boundaries. △ Less

Submitted 25 February, 2020; originally announced February 2020.

Comments: 10 pages, 7 figures

arXiv:2001.02107 [pdf]

doi 10.1093/database/bay071

Leveraging Prior Knowledge for Protein-Protein Interaction Extraction with Memory Network

Authors: Huiwei Zhou, Zhuang Liu, Shixian Ning, Yunlong Yang, Chengkun Lang, Yingyu Lin, Kun Ma

Abstract: Automatically extracting Protein-Protein Interactions (PPI) from biomedical literature provides additional support for precision medicine efforts. This paper proposes a novel memory network-based model (MNM) for PPI extraction, which leverages prior knowledge about protein-protein pairs with memory networks. The proposed MNM captures important context clues related to knowledge representations lea… ▽ More Automatically extracting Protein-Protein Interactions (PPI) from biomedical literature provides additional support for precision medicine efforts. This paper proposes a novel memory network-based model (MNM) for PPI extraction, which leverages prior knowledge about protein-protein pairs with memory networks. The proposed MNM captures important context clues related to knowledge representations learned from knowledge bases. Both entity embeddings and relation embeddings of prior knowledge are effective in improving the PPI extraction model, leading to a new state-of-the-art performance on the BioCreative VI PPI dataset. The paper also shows that multiple computational layers over an external memory are superior to long short-term memory networks with the local memories. △ Less

Submitted 7 January, 2020; originally announced January 2020.

Comments: Published on Database-The Journal of Biological Databases and Curation, 11 pages, 5 figures

Journal ref: Database-The Journal of Biological Databases and Curation, 2018, 2018: bay071

arXiv:2001.02091 [pdf]

doi 10.1016/j.jbi.2019.103234

Knowledge-aware Attention Network for Protein-Protein Interaction Extraction

Authors: Huiwei Zhou, Zhuang Liu1, Shixian Ning, Chengkun Lang, Yingyu Lin, Lei Du

Abstract: Protein-protein interaction (PPI) extraction from published scientific literature provides additional support for precision medicine efforts. However, many of the current PPI extraction methods need extensive feature engineering and cannot make full use of the prior knowledge in knowledge bases (KB). KBs contain huge amounts of structured information about entities and relationships, therefore pla… ▽ More Protein-protein interaction (PPI) extraction from published scientific literature provides additional support for precision medicine efforts. However, many of the current PPI extraction methods need extensive feature engineering and cannot make full use of the prior knowledge in knowledge bases (KB). KBs contain huge amounts of structured information about entities and relationships, therefore plays a pivotal role in PPI extraction. This paper proposes a knowledge-aware attention network (KAN) to fuse prior knowledge about protein-protein pairs and context information for PPI extraction. The proposed model first adopts a diagonal-disabled multi-head attention mechanism to encode context sequence along with knowledge representations learned from KB. Then a novel multi-dimensional attention mechanism is used to select the features that can best describe the encoded context. Experiment results on the BioCreative VI PPI dataset show that the proposed approach could acquire knowledge-aware dependencies between different words in a sequence and lead to a new state-of-the-art performance. △ Less

Submitted 7 January, 2020; originally announced January 2020.

Comments: Published on Journal of Biomedical Informatics, 14 pages, 5 figures

Journal ref: Journal of Biomedical Informatics, 2019, 96: 103234

arXiv:2001.00604 [pdf, other]

doi 10.1145/3308560.3316485

Detecting Areas of Potential High Prevalence of Chagas in Argentina

Authors: Antonio Vazquez Brust, Tomas Olego, German Rosati, Carolina Lang, Guillermo Bozzoli, Diego Weinberg, Roberto Chuit, Martin A. Minnoni, Carlos Sarraute

Abstract: A map of potential prevalence of Chagas disease (ChD) with high spatial disaggregation is presented. It aims to detect areas outside the Gran Chaco ecoregion (hyperendemic for the ChD), characterized by high affinity with ChD and high health vulnerability. To quantify potential prevalence, we developed several indicators: an Affinity Index which quantifies the degree of linkage between endemic a… ▽ More A map of potential prevalence of Chagas disease (ChD) with high spatial disaggregation is presented. It aims to detect areas outside the Gran Chaco ecoregion (hyperendemic for the ChD), characterized by high affinity with ChD and high health vulnerability. To quantify potential prevalence, we developed several indicators: an Affinity Index which quantifies the degree of linkage between endemic areas of ChD and the rest of the country. We also studied favorable habitability conditions for Triatoma infestans, looking for areas where the predominant materials of floors, roofs and internal ceilings favor the presence of the disease vector. We studied determinants of a more general nature that can be encompassed under the concept of Health Vulnerability Index. These determinants are associated with access to health providers and the socio-economic level of different segments of the population. Finally we constructed a Chagas Potential Prevalence Index (ChPPI) which combines the affinity index, the health vulnerability index, and the population density. We show and discuss the maps obtained. These maps are intended to assist public health specialists, decision makers of public health policies and public officials in the development of cost-effective strategies to improve access to diagnosis and treatment of ChD. △ Less

Submitted 2 January, 2020; originally announced January 2020.

Comments: Proceedings of the 2019 World Wide Web Conference. May 13-17, 2019. San Francisco, CA, USA

arXiv:2001.00596 [pdf, other]

Computing Accessibility Metrics for Argentina

Authors: Carolina Lang, Tobias Carreira, German Cesar Dima, Lucila Berniell, Carlos Sarraute

Abstract: We present a tool to calculate distances and travel times between a set of origins and a set of destinations, using different modes of transport in Argentina. The input data for the tool is a set of destinations (a geo-referenced list of points of city amenities or "opportunities", such as firms, schools, hospitals, parks, banks or retail, etc.) and a set of origins characterized by their geograph… ▽ More We present a tool to calculate distances and travel times between a set of origins and a set of destinations, using different modes of transport in Argentina. The input data for the tool is a set of destinations (a geo-referenced list of points of city amenities or "opportunities", such as firms, schools, hospitals, parks, banks or retail, etc.) and a set of origins characterized by their geographic coordinates that could be interpreted as households or other. The tool determines, from each origin, which is the closest destination, depending on the distance or travel time and the mode of transport (on foot, by bicycle, by car, and by public transport). The sets of origins and destinations are large sets, which can contain up to several thousand points. We applied and developed algorithms to improve the scalability of the different parts of the procedure. For the public transportation network, we pre-processed the reachable lines from each point and used quad-trees to determine the distance between said points and the bus line's path. A second objective of this project was to rely only on open data, such as Open Street Map (OSM) data, together with making this tool open source. Therefore, the successful development and implementation of this tool is potentially beneficial to both public sector agencies as well as NGOs and other civil society organizations that focus their work on the design and implementation of public policies, aimed at improving accessibility in cities as a way to reduce spatial inequalities and social exclusion. △ Less

Submitted 2 January, 2020; originally announced January 2020.

Comments: Data For Policy International Conference. June 11-12, 2019, London, UK

arXiv:2001.00295 [pdf]

doi 10.1016/j.jbi.2018.07.007

Chemical-induced Disease Relation Extraction with Dependency Information and Prior Knowledge

Authors: Huiwei Zhou, Shixian Ning, Yunlong Yang, Zhuang Liu, Chengkun Lang, Yingyu Lin

Abstract: Chemical-disease relation (CDR) extraction is significantly important to various areas of biomedical research and health care. Nowadays, many large-scale biomedical knowledge bases (KBs) containing triples about entity pairs and their relations have been built. KBs are important resources for biomedical relation extraction. However, previous research pays little attention to prior knowledge. In ad… ▽ More Chemical-disease relation (CDR) extraction is significantly important to various areas of biomedical research and health care. Nowadays, many large-scale biomedical knowledge bases (KBs) containing triples about entity pairs and their relations have been built. KBs are important resources for biomedical relation extraction. However, previous research pays little attention to prior knowledge. In addition, the dependency tree contains important syntactic and semantic information, which helps to improve relation extraction. So how to effectively use it is also worth studying. In this paper, we propose a novel convolutional attention network (CAN) for CDR extraction. Firstly, we extract the shortest dependency path (SDP) between chemical and disease pairs in a sentence, which includes a sequence of words, dependency directions, and dependency relation tags. Then the convolution operations are performed on the SDP to produce deep semantic dependency features. After that, an attention mechanism is employed to learn the importance/weight of each semantic dependency vector related to knowledge representations learned from KBs. Finally, in order to combine dependency information and prior knowledge, the concatenation of weighted semantic dependency representations and knowledge representations is fed to the softmax layer for classification. Experiments on the BioCreative V CDR dataset show that our method achieves comparable performance with the state-of-the-art systems, and both dependency information and prior knowledge play important roles in CDR extraction task. △ Less

Submitted 1 January, 2020; originally announced January 2020.

Comments: Published on Journal of Biomedical Informatics, 13 pages

Journal ref: Journal of Biomedical Informatics, 2018, 84:171-178

arXiv:1912.10604 [pdf]

doi 10.1109/TCBB.2018.2838661

Combining Context and Knowledge Representations for Chemical-Disease Relation Extraction

Authors: Huiwei Zhou, Yunlong Yang, Shixian Ning, Zhuang Liu, Chengkun Lang, Yingyu Lin, Degen Huang

Abstract: Automatically extracting the relationships between chemicals and diseases is significantly important to various areas of biomedical research and health care. Biomedical experts have built many large-scale knowledge bases (KBs) to advance the development of biomedical research. KBs contain huge amounts of structured information about entities and relationships, therefore plays a pivotal role in che… ▽ More Automatically extracting the relationships between chemicals and diseases is significantly important to various areas of biomedical research and health care. Biomedical experts have built many large-scale knowledge bases (KBs) to advance the development of biomedical research. KBs contain huge amounts of structured information about entities and relationships, therefore plays a pivotal role in chemical-disease relation (CDR) extraction. However, previous researches pay less attention to the prior knowledge existing in KBs. This paper proposes a neural network-based attention model (NAM) for CDR extraction, which makes full use of context information in documents and prior knowledge in KBs. For a pair of entities in a document, an attention mechanism is employed to select important context words with respect to the relation representations learned from KBs. Experiments on the BioCreative V CDR dataset show that combining context and knowledge representations through the attention mechanism, could significantly improve the CDR extraction performance while achieve comparable results with state-of-the-art systems. △ Less

Submitted 22 December, 2019; originally announced December 2019.

Comments: Published on IEEE/ACM Transactions on Computational Biology and Bioinformatics, 11 pages, 5 figures

Journal ref: IEEE/ACM TCBB,2018,16(6):1879-1889

arXiv:1912.10590 [pdf]

doi 10.1186/s12859-019-2873-7

Knowledge-guided Convolutional Networks for Chemical-Disease Relation Extraction

Authors: Huiwei Zhou, Chengkun Lang, Zhuang Liu, Shixian Ning, Yingyu Lin, Lei Du

Abstract: Background: Automatic extraction of chemical-disease relations (CDR) from unstructured text is of essential importance for disease treatment and drug development. Meanwhile, biomedical experts have built many highly-structured knowledge bases (KBs), which contain prior knowledge about chemicals and diseases. Prior knowledge provides strong support for CDR extraction. How to make full use of it is… ▽ More Background: Automatic extraction of chemical-disease relations (CDR) from unstructured text is of essential importance for disease treatment and drug development. Meanwhile, biomedical experts have built many highly-structured knowledge bases (KBs), which contain prior knowledge about chemicals and diseases. Prior knowledge provides strong support for CDR extraction. How to make full use of it is worth studying. Results: This paper proposes a novel model called "Knowledge-guided Convolutional Networks (KCN)" to leverage prior knowledge for CDR extraction. The proposed model first learns knowledge representations including entity embeddings and relation embeddings from KBs. Then, entity embeddings are used to control the propagation of context features towards a chemical-disease pair with gated convolutions. After that, relation embeddings are employed to further capture the weighted context features by a shared attention pooling. Finally, the weighted context features containing additional knowledge information are used for CDR extraction. Experiments on the BioCreative V CDR dataset show that the proposed KCN achieves 71.28% F1-score, which outperforms most of the state-of-the-art systems. Conclusions: This paper proposes a novel CDR extraction model KCN to make full use of prior knowledge. Experimental results demonstrate that KCN could effectively integrate prior knowledge and contexts for the performance improvement. △ Less

Submitted 22 December, 2019; originally announced December 2019.

Comments: Published on BMC Bioinformatics, 16 pages, 5 figures

Journal ref: BMC Bioinformatics, 2019, 20(1):260

arXiv:1912.05147 [pdf]

doi 10.1016/j.compbiolchem.2019.107146

Improving Neural Protein-Protein Interaction Extraction with Knowledge Selection

Authors: Huiwei Zhou, Xuefei Li, Weihong Yao, Zhuang Liu, Shixian Ning, Chengkun Lang, Lei Du

Abstract: Protein-protein interaction (PPI) extraction from published scientific literature provides additional support for precision medicine efforts. Meanwhile, knowledge bases (KBs) contain huge amounts of structured information of protein entities and their relations, which can be encoded in entity and relation embeddings to help PPI extraction. However, the prior knowledge of protein-protein pairs must… ▽ More Protein-protein interaction (PPI) extraction from published scientific literature provides additional support for precision medicine efforts. Meanwhile, knowledge bases (KBs) contain huge amounts of structured information of protein entities and their relations, which can be encoded in entity and relation embeddings to help PPI extraction. However, the prior knowledge of protein-protein pairs must be selectively used so that it is suitable for different contexts. This paper proposes a Knowledge Selection Model (KSM) to fuse the selected prior knowledge and context information for PPI extraction. Firstly, two Transformers encode the context sequence of a protein pair according to each protein embedding, respectively. Then, the two outputs are fed to a mutual attention to capture the important context features towards the protein pair. Next, the context features are used to distill the relation embedding by a knowledge selector. Finally, the selected relation embedding and the context features are concatenated for PPI extraction. Experiments on the BioCreative VI PPI dataset show that KSM achieves a new state-of-the-art performance (38.08% F1-score) by adding knowledge selection. △ Less

Submitted 11 December, 2019; originally announced December 2019.

Comments: Published in Computational Biology and Chemistry; 14 pages, 2 figures

Journal ref: Computational Biology and Chemistry, 2019, 83: 107146

arXiv:1911.11536 [pdf, other]

Electricity Load Forecasting -- An Evaluation of Simple 1D-CNN Network Structures

Authors: Christian Lang, Florian Steinborn, Oliver Steffens, Elmar W. Lang

Abstract: This paper presents a convolutional neural network (CNN) which can be used for forecasting electricity load profiles 36 hours into the future. In contrast to well established CNN architectures, the input data is one-dimensional. A parameter scanning of network parameters is conducted in order to gain information about the influence of the kernel size, number of filters, and dense size. The results… ▽ More This paper presents a convolutional neural network (CNN) which can be used for forecasting electricity load profiles 36 hours into the future. In contrast to well established CNN architectures, the input data is one-dimensional. A parameter scanning of network parameters is conducted in order to gain information about the influence of the kernel size, number of filters, and dense size. The results show that a good forecast quality can already be achieved with basic CNN architectures.The method works not only for smooth sum loads of many hundred consumers, but also for the load of apartment buildings. △ Less

Submitted 26 November, 2019; originally announced November 2019.

Comments: Presented at the ITISE 2019 in Granada

Journal ref: Proceedings of Papers - ITISE 2019

arXiv:1906.00551 [pdf, ps, other]

HERA: Partial Label Learning by Combining Heterogeneous Loss with Sparse and Low-Rank Regularization

Authors: Gengyu Lyu, Songhe Feng, Yi Jin, Guojun Dai, Congyan Lang, Yidong Li

Abstract: Partial Label Learning (PLL) aims to learn from the data where each training instance is associated with a set of candidate labels, among which only one is correct. Most existing methods deal with such problem by either treating each candidate label equally or identifying the ground-truth label iteratively. In this paper, we propose a novel PLL approach called HERA, which simultaneously incorporat… ▽ More Partial Label Learning (PLL) aims to learn from the data where each training instance is associated with a set of candidate labels, among which only one is correct. Most existing methods deal with such problem by either treating each candidate label equally or identifying the ground-truth label iteratively. In this paper, we propose a novel PLL approach called HERA, which simultaneously incorporates the HeterogEneous Loss and the SpaRse and Low-rAnk procedure to estimate the labeling confidence for each instance while training the model. Specifically, the heterogeneous loss integrates the strengths of both the pairwise ranking loss and the pointwise reconstruction loss to provide informative label ranking and reconstruction information for label identification, while the embedded sparse and low-rank scheme constrains the sparsity of ground-truth label matrix and the low rank of noise label matrix to explore the global label relevance among the whole training data for improving the learning model. Extensive experiments on both artificial and real-world data sets demonstrate that our method can achieve superior or comparable performance against the state-of-the-art methods. △ Less

Submitted 2 June, 2019; originally announced June 2019.

arXiv:1901.08362 [pdf, other]

Deep Reasoning with Multi-Scale Context for Salient Object Detection

Authors: Zun Li, Congyan Lang, Yunpeng Chen, Junhao Liew, Jiashi Feng

Abstract: To detect salient objects accurately, existing methods usually design complex backbone network architectures to learn and fuse powerful features. However, the saliency inference module that performs saliency prediction from the fused features receives much less attention on its architecture design and typically adopts only a few fully convolutional layers. In this paper, we find the limited capaci… ▽ More To detect salient objects accurately, existing methods usually design complex backbone network architectures to learn and fuse powerful features. However, the saliency inference module that performs saliency prediction from the fused features receives much less attention on its architecture design and typically adopts only a few fully convolutional layers. In this paper, we find the limited capacity of the saliency inference module indeed makes a fundamental performance bottleneck, and enhancing its capacity is critical for obtaining better saliency prediction. Correspondingly, we propose a deep yet light-weight saliency inference module that adopts a multi-dilated depth-wise convolution architecture. Such a deep inference module, though with simple architecture, can directly perform reasoning about salient objects from the multi-scale convolutional features fast, and give superior salient object detection performance with less computational cost. To our best knowledge, we are the first to reveal the importance of the inference module for salient object detection, and present a novel architecture design with attractive efficiency and accuracy. Extensive experimental evaluations demonstrate that our simple framework performs favorably compared with the state-of-the-art methods with complex backbone design. △ Less

Submitted 25 March, 2019; v1 submitted 24 January, 2019; originally announced January 2019.

Comments: 10 pages, 8 figures, 3 table

arXiv:1901.03073 [pdf, ps, other]

GM-PLL: Graph Matching based Partial Label Learning

Authors: Gengyu Lyu, Songhe Feng, Tao Wang, Congyan Lang, Yidong Li

Abstract: Partial Label Learning (PLL) aims to learn from the data where each training example is associated with a set of candidate labels, among which only one is correct. The key to deal with such problem is to disambiguate the candidate label sets and obtain the correct assignments between instances and their candidate labels. In this paper, we interpret such assignments as instance-to-label matchings,… ▽ More Partial Label Learning (PLL) aims to learn from the data where each training example is associated with a set of candidate labels, among which only one is correct. The key to deal with such problem is to disambiguate the candidate label sets and obtain the correct assignments between instances and their candidate labels. In this paper, we interpret such assignments as instance-to-label matchings, and reformulate the task of PLL as a matching selection problem. To model such problem, we propose a novel Graph Matching based Partial Label Learning (GM-PLL) framework, where Graph Matching (GM) scheme is incorporated owing to its excellent capability of exploiting the instance and label relationship. Meanwhile, since conventional one-to-one GM algorithm does not satisfy the constraint of PLL problem that multiple instances may correspond to the same label, we extend a traditional one-to-one probabilistic matching algorithm to the many-to-one constraint, and make the proposed framework accommodate to the PLL problem. Moreover, we also propose a relaxed matching prediction model, which can improve the prediction accuracy via GM strategy. Extensive experiments on both artificial and real-world data sets demonstrate that the proposed method can achieve superior or comparable performance against the state-of-the-art methods. △ Less

Submitted 10 January, 2019; originally announced January 2019.

arXiv:1808.03350 [pdf, other]

Uncovering the Spread of Chagas Disease in Argentina and Mexico

Authors: Juan de Monasterio, Alejo Salles, Carolina Lang, Diego Weinberg, Martin Minnoni, Matias Travizano, Carlos Sarraute

Abstract: Chagas disease is a neglected disease, and information about its geographical spread is very scarse. We analyze here mobility and calling patterns in order to identify potential risk zones for the disease, by using public health information and mobile phone records. Geolocalized call records are rich in social and mobility information, which can be used to infer whether an individual has lived in… ▽ More Chagas disease is a neglected disease, and information about its geographical spread is very scarse. We analyze here mobility and calling patterns in order to identify potential risk zones for the disease, by using public health information and mobile phone records. Geolocalized call records are rich in social and mobility information, which can be used to infer whether an individual has lived in an endemic area. We present two case studies in Latin American countries. Our objective is to generate risk maps which can be used by public health campaign managers to prioritize detection campaigns and target specific areas. Finally, we analyze the value of mobile phone data to infer long-term migrations, which play a crucial role in the geographical spread of Chagas disease. △ Less

Submitted 9 August, 2018; originally announced August 2018.

Comments: Published in NetMob 2017 (Fifth Conference on the Scientific Analysis of Mobile Phone Datasets), Milan, Italy. April 5, 2017

ACM Class: J.4; H.2.8

arXiv:1804.07759 [pdf, ps, other]

A Self-paced Regularization Framework for Partial-Label Learning

Authors: Gengyu Lyu, Songhe Feng, Congyang Lang

Abstract: Partial label learning (PLL) aims to solve the problem where each training instance is associated with a set of candidate labels, one of which is the correct label. Most PLL algorithms try to disambiguate the candidate label set, by either simply treating each candidate label equally or iteratively identifying the true label. Nonetheless, existing algorithms usually treat all labels and instances… ▽ More Partial label learning (PLL) aims to solve the problem where each training instance is associated with a set of candidate labels, one of which is the correct label. Most PLL algorithms try to disambiguate the candidate label set, by either simply treating each candidate label equally or iteratively identifying the true label. Nonetheless, existing algorithms usually treat all labels and instances equally, and the complexities of both labels and instances are not taken into consideration during the learning stage. Inspired by the successful application of self-paced learning strategy in machine learning field, we integrate the self-paced regime into the partial label learning framework and propose a novel Self-Paced Partial-Label Learning (SP-PLL) algorithm, which could control the learning process to alleviate the problem by ranking the priorities of the training examples together with their candidate labels during each learning iteration. Extensive experiments and comparisons with other baseline methods demonstrate the effectiveness and robustness of the proposed method. △ Less

Submitted 8 May, 2018; v1 submitted 20 April, 2018; originally announced April 2018.

arXiv:1707.01149 [pdf, other]

doi 10.1109/ASONAM.2016.7752298

Analyzing the Spread of Chagas Disease with Mobile Phone Data

Authors: Juan de Monasterio, Alejo Salles, Carolina Lang, Diego Weinberg, Martin Minnoni, Matias Travizano, Carlos Sarraute

Abstract: We use mobile phone records for the analysis of mobility patterns and the detection of possible risk zones of Chagas disease in two Latin American countries. We show that geolocalized call records are rich in social and individual information, which can be used to infer whether an individual has lived in an endemic area. We present two case studies, in Argentina and in Mexico, using data provided… ▽ More We use mobile phone records for the analysis of mobility patterns and the detection of possible risk zones of Chagas disease in two Latin American countries. We show that geolocalized call records are rich in social and individual information, which can be used to infer whether an individual has lived in an endemic area. We present two case studies, in Argentina and in Mexico, using data provided by mobile phone companies from each country. The risk maps that we generate can be used by health campaign managers to target specific areas and allocate resources more effectively. △ Less

Submitted 4 July, 2017; originally announced July 2017.

Comments: 6 pages, 6 figures

ACM Class: J.4; H.2.8

Journal ref: IEEE/ACM ASONAM pp. 607-612, San Francisco CA, USA, August 18-21 (2016)

arXiv:1707.01032 [pdf, other]

The City Pulse of Buenos Aires

Authors: Carlos Sarraute, Carolina Lang, Nicolas B. Ponieman, Sebastian Anapolsky

Abstract: Cell phone technology generates massive amounts of data. Although this data has been gathered for billing and logging purposes, today it has a much higher value, because its volume makes it very useful for big data analyses. In this project, we analyze the viability of using cell phone records to lower the cost of urban and transportation planning, in particular, to find out how people travel in a… ▽ More Cell phone technology generates massive amounts of data. Although this data has been gathered for billing and logging purposes, today it has a much higher value, because its volume makes it very useful for big data analyses. In this project, we analyze the viability of using cell phone records to lower the cost of urban and transportation planning, in particular, to find out how people travel in a specific city (in this case, Buenos Aires, in Argentina). We use anonymized cell phone data to estimate the distribution of the population in the city using different periods of time. We compare those results with traditional methods (urban polling) using data from Buenos Aires origin-destination surveys. Traditional polling methods have a much smaller sample, in the order of tens of thousands (or even less for smaller cities), to maintain reasonable costs. Furthermore, these studies are performed at most once per decade, in the best cases, in Argentina and many other countries. Our objective is to prove that new methods based on cell phone data are reliable, and can be used indirectly to keep a real-time track of the flow of people among different parts of a city. We also go further to explore new possibilities opened by these methods. △ Less

Submitted 1 August, 2017; v1 submitted 4 July, 2017; originally announced July 2017.

Comments: Published in NetMob 2015 (Fourth Conference on the Scientific Analysis of Mobile Phone Datasets), MIT Media Lab, Cambridge, USA, 8-10 April 2015

ACM Class: J.4; H.2.8

arXiv:1705.07206 [pdf, other]

Multiple-Human Parsing in the Wild

Authors: Jianshu Li, Jian Zhao, Yunchao Wei, Congyan Lang, Yidong Li, Terence Sim, Shuicheng Yan, Jiashi Feng

Abstract: Human parsing is attracting increasing research attention. In this work, we aim to push the frontier of human parsing by introducing the problem of multi-human parsing in the wild. Existing works on human parsing mainly tackle single-person scenarios, which deviates from real-world applications where multiple persons are present simultaneously with interaction and occlusion. To address the multi-h… ▽ More Human parsing is attracting increasing research attention. In this work, we aim to push the frontier of human parsing by introducing the problem of multi-human parsing in the wild. Existing works on human parsing mainly tackle single-person scenarios, which deviates from real-world applications where multiple persons are present simultaneously with interaction and occlusion. To address the multi-human parsing problem, we introduce a new multi-human parsing (MHP) dataset and a novel multi-human parsing model named MH-Parser. The MHP dataset contains multiple persons captured in real-world scenes with pixel-level fine-grained semantic annotations in an instance-aware setting. The MH-Parser generates global parsing maps and person instance masks simultaneously in a bottom-up fashion with the help of a new Graph-GAN model. We envision that the MHP dataset will serve as a valuable data resource to develop new multi-human parsing models, and the MH-Parser offers a strong baseline to drive future research for multi-human parsing in the wild. △ Less

Submitted 14 March, 2018; v1 submitted 19 May, 2017; originally announced May 2017.

Comments: The first two authors are with equal contribution

arXiv:1610.00656 [pdf, other]

doi 10.1371/journal.pone.0189795

The Statistical Mechanics of Human Weight Change

Authors: John C. Lang, Hans De Sterck, Daniel M. Abrams

Abstract: In the context of the global obesity epidemic, it is important to know who becomes obese and why. However, the processes that determine the changing shape of Body Mass Index (BMI) distributions in high-income societies are not well-understood. Here we establish the statistical mechanics of human weight change, providing a fundamental new understanding of human weight distributions. By compiling an… ▽ More In the context of the global obesity epidemic, it is important to know who becomes obese and why. However, the processes that determine the changing shape of Body Mass Index (BMI) distributions in high-income societies are not well-understood. Here we establish the statistical mechanics of human weight change, providing a fundamental new understanding of human weight distributions. By compiling and analysing the largest data set so far of year-over-year BMI changes, we find, strikingly, that heavy people on average strongly decrease their weight year-over-year, and light people increase their weight. This drift towards the centre of the BMI distribution is balanced by diffusion resulting from random fluctuations in diet and physical activity that are, notably, proportional in size to BMI. We formulate a stochastic mathematical model for BMI dynamics, deriving a theoretical shape for the BMI distribution and offering a mechanism to explain the ongoing right-skewed broadening of BMI distributions over time. The model also provides new quantitative support for the hypothesis that peer-to-peer social influence plays a measurable role in BMI dynamics. More broadly, our results demonstrate a remarkable analogy with drift-diffusion mechanisms that are well-known from the physical sciences and finance. △ Less

Submitted 29 September, 2016; originally announced October 2016.

arXiv:1501.04091 [pdf, other]

doi 10.1093/comnet/cnv030

A Hierarchy of Linear Threshold Models for the Spread of Political Revolutions on Social Networks

Authors: John C. Lang, Hans De Sterck

Abstract: We study a linear threshold agent-based model (ABM) for the spread of political revolutions on social networks using empirical network data. We propose new techniques for building a hierarchy of simplified ordinary differential equation (ODE) based models that aim to capture essential features of the ABM, including effects of the actual networks, and give insight in the parameter regime transition… ▽ More We study a linear threshold agent-based model (ABM) for the spread of political revolutions on social networks using empirical network data. We propose new techniques for building a hierarchy of simplified ordinary differential equation (ODE) based models that aim to capture essential features of the ABM, including effects of the actual networks, and give insight in the parameter regime transitions of the ABM. We relate the ABM and the hierarchy of models to a population-level compartmental ODE model that we proposed previously for the spread of political revolutions [1], which is shown to be mathematically consistent with the proposed ABM and provides a way to analyze the global behaviour of the ABM. This consistency with the linear threshold ABM also provides further justification a posteriori for the compartmental model of [1]. Extending concepts from epidemiological modelling, we define a basic reproduction number $R_0$ for the linear threshold ABM and apply it to predict ABM behaviour on empirical networks. In small-scale numerical tests we investigate experimentally the differences in spreading behaviour that occur under the linear threshold ABM model when applied to some empirical online and offline social networks, searching for quantitative evidence that political revolutions may be facilitated by the modern online social networks of social media. △ Less

Submitted 16 January, 2015; originally announced January 2015.

MSC Class: 91D10; 91D30; 70G60; 37M99

arXiv:1407.2188 [pdf, other]

doi 10.1186/s12889-015-2576-6

The influence of societal individualism on a century of tobacco use: modelling the prevalence of smoking

Authors: John C. Lang, Daniel M. Abrams, Hans De Sterck

Abstract: Smoking of tobacco is predicted to cause approximately six million deaths worldwide in 2014. Responding effectively to this epidemic requires a thorough understanding of how smoking behaviour is transmitted and modified. Here, we present a new mathematical model of the social dynamics that cause cigarette smoking to spread in a population. Our model predicts that more individualistic societies wil… ▽ More Smoking of tobacco is predicted to cause approximately six million deaths worldwide in 2014. Responding effectively to this epidemic requires a thorough understanding of how smoking behaviour is transmitted and modified. Here, we present a new mathematical model of the social dynamics that cause cigarette smoking to spread in a population. Our model predicts that more individualistic societies will show faster adoption and cessation of smoking. Evidence from a new century-long composite data set on smoking prevalence in 25 countries supports the model, with direct implications for public health interventions around the world. Our results suggest that differences in culture between societies can measurably affect the temporal dynamics of a social spreading process, and that these effects can be understood via a quantitative mathematical model matched to observations. △ Less

Submitted 8 July, 2014; originally announced July 2014.

MSC Class: 91D10 (Primary)

Journal ref: BMC Public Health 15 (1280), 1-13 (2015)

Showing 1–50 of 50 results for author: Lang, C