-
Dynamic Chunking and Selection for Reading Comprehension of Ultra-Long Context in Large Language Models
Authors:
Boheng Sheng,
Jiacheng Yao,
Meicong Zhang,
Guoxiu He
Abstract:
Large language models (LLMs) often struggle to accurately read and comprehend extremely long texts. Current methods for improvement typically rely on splitting long contexts into fixed-length chunks. However, fixed truncation risks separating semantically relevant content, leading to ambiguity and compromising accurate understanding. To overcome this limitation, we propose a straightforward approa…
▽ More
Large language models (LLMs) often struggle to accurately read and comprehend extremely long texts. Current methods for improvement typically rely on splitting long contexts into fixed-length chunks. However, fixed truncation risks separating semantically relevant content, leading to ambiguity and compromising accurate understanding. To overcome this limitation, we propose a straightforward approach for dynamically separating and selecting chunks of long context, facilitating a more streamlined input for LLMs. In particular, we compute semantic similarities between adjacent sentences, using lower similarities to adaptively divide long contexts into variable-length chunks. We further train a question-aware classifier to select sensitive chunks that are critical for answering specific questions. Experimental results on both single-hop and multi-hop question-answering benchmarks show that the proposed approach consistently outperforms strong baselines. Notably, it maintains robustness across a wide range of input lengths, handling sequences of up to 256k tokens. Our datasets and code are available at the following link: https://github.com/ECNU-Text-Computing/DCS
△ Less
Submitted 3 June, 2025; v1 submitted 31 May, 2025;
originally announced June 2025.
-
3DPX: Single Panoramic X-ray Analysis Guided by 3D Oral Structure Reconstruction
Authors:
Xiaoshuang Li,
Zimo Huang,
Mingyuan Meng,
Eduardo Delamare,
Dagan Feng,
Lei Bi,
Bin Sheng,
Lingyong Jiang,
Bo Li,
Jinman Kim
Abstract:
Panoramic X-ray (PX) is a prevalent modality in dentistry practice owing to its wide availability and low cost. However, as a 2D projection of a 3D structure, PX suffers from anatomical information loss and PX diagnosis is limited compared to that with 3D imaging modalities. 2D-to-3D reconstruction methods have been explored for the ability to synthesize the absent 3D anatomical information from 2…
▽ More
Panoramic X-ray (PX) is a prevalent modality in dentistry practice owing to its wide availability and low cost. However, as a 2D projection of a 3D structure, PX suffers from anatomical information loss and PX diagnosis is limited compared to that with 3D imaging modalities. 2D-to-3D reconstruction methods have been explored for the ability to synthesize the absent 3D anatomical information from 2D PX for use in PX image analysis. However, there are challenges in leveraging such 3D synthesized reconstructions. First, inferring 3D depth from 2D images remains a challenging task with limited accuracy. The second challenge is the joint analysis of 2D PX with its 3D synthesized counterpart, with the aim to maximize the 2D-3D synergy while minimizing the errors arising from the synthesized image. In this study, we propose a new method termed 3DPX - PX image analysis guided by 2D-to-3D reconstruction, to overcome these challenges. 3DPX consists of (i) a novel progressive reconstruction network to improve 2D-to-3D reconstruction and, (ii) a contrastive-guided bidirectional multimodality alignment module for 3D-guided 2D PX classification and segmentation tasks. The reconstruction network progressively reconstructs 3D images with knowledge imposed on the intermediate reconstructions at multiple pyramid levels and incorporates Multilayer Perceptrons to improve semantic understanding. The downstream networks leverage the reconstructed images as 3D anatomical guidance to the PX analysis through feature alignment, which increases the 2D-3D synergy with bidirectional feature projection and decease the impact of potential errors with contrastive guidance. Extensive experiments on two oral datasets involving 464 studies demonstrate that 3DPX outperforms the state-of-the-art methods in various tasks including 2D-to-3D reconstruction, PX classification and lesion segmentation.
△ Less
Submitted 27 September, 2024;
originally announced September 2024.
-
Fluid Antenna-enabled Integrated Sensing, Communication, and Computing Systems
Authors:
Yiping Zuo,
Jiajia Guo,
Weicong Chen,
Weibei Fan,
Biyun Sheng,
Fu Xiao,
Shi Jin
Abstract:
The current integrated sensing, communication, and computing (ISCC) systems face significant challenges in both efficiency and resource utilization. To tackle these issues, we propose a novel fluid antenna (FA)-enabled ISCC system, specifically designed for vehicular networks. We develop detailed models for the communication and sensing processes to support this architecture. An integrated latency…
▽ More
The current integrated sensing, communication, and computing (ISCC) systems face significant challenges in both efficiency and resource utilization. To tackle these issues, we propose a novel fluid antenna (FA)-enabled ISCC system, specifically designed for vehicular networks. We develop detailed models for the communication and sensing processes to support this architecture. An integrated latency optimization problem is formulated to jointly optimize computing resources, receive combining matrices, and antenna positions. To tackle this complex problem, we decompose it into three sub-problems and analyze each separately. A mixed optimization algorithm is then designed to address the overall problem comprehensively. Numerical results demonstrate the rapid convergence of the proposed algorithm. Compared with baseline schemes, the FA-enabled vehicle ISCC system significantly improves resource utilization and reduces latency for communication, sensing, and computation.
△ Less
Submitted 17 September, 2024;
originally announced September 2024.
-
DSDFormer: An Innovative Transformer-Mamba Framework for Robust High-Precision Driver Distraction Identification
Authors:
Junzhou Chen,
Zirui Zhang,
Jing Yu,
Heqiang Huang,
Ronghui Zhang,
Xuemiao Xu,
Bin Sheng,
Hong Yan
Abstract:
Driver distraction remains a leading cause of traffic accidents, posing a critical threat to road safety globally. As intelligent transportation systems evolve, accurate and real-time identification of driver distraction has become essential. However, existing methods struggle to capture both global contextual and fine-grained local features while contending with noisy labels in training datasets.…
▽ More
Driver distraction remains a leading cause of traffic accidents, posing a critical threat to road safety globally. As intelligent transportation systems evolve, accurate and real-time identification of driver distraction has become essential. However, existing methods struggle to capture both global contextual and fine-grained local features while contending with noisy labels in training datasets. To address these challenges, we propose DSDFormer, a novel framework that integrates the strengths of Transformer and Mamba architectures through a Dual State Domain Attention (DSDA) mechanism, enabling a balance between long-range dependencies and detailed feature extraction for robust driver behavior recognition. Additionally, we introduce Temporal Reasoning Confident Learning (TRCL), an unsupervised approach that refines noisy labels by leveraging spatiotemporal correlations in video sequences. Our model achieves state-of-the-art performance on the AUC-V1, AUC-V2, and 100-Driver datasets and demonstrates real-time processing efficiency on the NVIDIA Jetson AGX Orin platform. Extensive experimental results confirm that DSDFormer and TRCL significantly improve both the accuracy and robustness of driver distraction detection, offering a scalable solution to enhance road safety.
△ Less
Submitted 12 September, 2024; v1 submitted 9 September, 2024;
originally announced September 2024.
-
Serp-Mamba: Advancing High-Resolution Retinal Vessel Segmentation with Selective State-Space Model
Authors:
Hongqiu Wang,
Yixian Chen,
Wu Chen,
Huihui Xu,
Haoyu Zhao,
Bin Sheng,
Huazhu Fu,
Guang Yang,
Lei Zhu
Abstract:
Ultra-Wide-Field Scanning Laser Ophthalmoscopy (UWF-SLO) images capture high-resolution views of the retina with typically 200 spanning degrees. Accurate segmentation of vessels in UWF-SLO images is essential for detecting and diagnosing fundus disease. Recent studies have revealed that the selective State Space Model (SSM) in Mamba performs well in modeling long-range dependencies, which is cruci…
▽ More
Ultra-Wide-Field Scanning Laser Ophthalmoscopy (UWF-SLO) images capture high-resolution views of the retina with typically 200 spanning degrees. Accurate segmentation of vessels in UWF-SLO images is essential for detecting and diagnosing fundus disease. Recent studies have revealed that the selective State Space Model (SSM) in Mamba performs well in modeling long-range dependencies, which is crucial for capturing the continuity of elongated vessel structures. Inspired by this, we propose the first Serpentine Mamba (Serp-Mamba) network to address this challenging task. Specifically, we recognize the intricate, varied, and delicate nature of the tubular structure of vessels. Furthermore, the high-resolution of UWF-SLO images exacerbates the imbalance between the vessel and background categories. Based on the above observations, we first devise a Serpentine Interwoven Adaptive (SIA) scan mechanism, which scans UWF-SLO images along curved vessel structures in a snake-like crawling manner. This approach, consistent with vascular texture transformations, ensures the effective and continuous capture of curved vascular structure features. Second, we propose an Ambiguity-Driven Dual Recalibration (ADDR) module to address the category imbalance problem intensified by high-resolution images. Our ADDR module delineates pixels by two learnable thresholds and refines ambiguous pixels through a dual-driven strategy, thereby accurately distinguishing vessels and background regions. Experiment results on three datasets demonstrate the superior performance of our Serp-Mamba on high-resolution vessel segmentation. We also conduct a series of ablation studies to verify the impact of our designs. Our code shall be released upon publication of this work.
△ Less
Submitted 18 September, 2024; v1 submitted 6 September, 2024;
originally announced September 2024.
-
3DPX: Progressive 2D-to-3D Oral Image Reconstruction with Hybrid MLP-CNN Networks
Authors:
Xiaoshuang Li,
Mingyuan Meng,
Zimo Huang,
Lei Bi,
Eduardo Delamare,
Dagan Feng,
Bin Sheng,
Jinman Kim
Abstract:
Panoramic X-ray (PX) is a prevalent modality in dental practice for its wide availability and low cost. However, as a 2D projection image, PX does not contain 3D anatomical information, and therefore has limited use in dental applications that can benefit from 3D information, e.g., tooth angular misa-lignment detection and classification. Reconstructing 3D structures directly from 2D PX has recent…
▽ More
Panoramic X-ray (PX) is a prevalent modality in dental practice for its wide availability and low cost. However, as a 2D projection image, PX does not contain 3D anatomical information, and therefore has limited use in dental applications that can benefit from 3D information, e.g., tooth angular misa-lignment detection and classification. Reconstructing 3D structures directly from 2D PX has recently been explored to address limitations with existing methods primarily reliant on Convolutional Neural Networks (CNNs) for direct 2D-to-3D mapping. These methods, however, are unable to correctly infer depth-axis spatial information. In addition, they are limited by the in-trinsic locality of convolution operations, as the convolution kernels only capture the information of immediate neighborhood pixels. In this study, we propose a progressive hybrid Multilayer Perceptron (MLP)-CNN pyra-mid network (3DPX) for 2D-to-3D oral PX reconstruction. We introduce a progressive reconstruction strategy, where 3D images are progressively re-constructed in the 3DPX with guidance imposed on the intermediate recon-struction result at each pyramid level. Further, motivated by the recent ad-vancement of MLPs that show promise in capturing fine-grained long-range dependency, our 3DPX integrates MLPs and CNNs to improve the semantic understanding during reconstruction. Extensive experiments on two large datasets involving 464 studies demonstrate that our 3DPX outperforms state-of-the-art 2D-to-3D oral reconstruction methods, including standalone MLP and transformers, in reconstruction quality, and also im-proves the performance of downstream angular misalignment classification tasks.
△ Less
Submitted 2 August, 2024;
originally announced August 2024.
-
TaskGen: A Task-Based, Memory-Infused Agentic Framework using StrictJSON
Authors:
John Chong Min Tan,
Prince Saroj,
Bharat Runwal,
Hardik Maheshwari,
Brian Lim Yi Sheng,
Richard Cottrill,
Alankrit Chona,
Ambuj Kumar,
Mehul Motani
Abstract:
TaskGen is an open-sourced agentic framework which uses an Agent to solve an arbitrary task by breaking them down into subtasks. Each subtask is mapped to an Equipped Function or another Agent to execute. In order to reduce verbosity (and hence token usage), TaskGen uses StrictJSON that ensures JSON output from the Large Language Model (LLM), along with additional features such as type checking an…
▽ More
TaskGen is an open-sourced agentic framework which uses an Agent to solve an arbitrary task by breaking them down into subtasks. Each subtask is mapped to an Equipped Function or another Agent to execute. In order to reduce verbosity (and hence token usage), TaskGen uses StrictJSON that ensures JSON output from the Large Language Model (LLM), along with additional features such as type checking and iterative error correction. Key to the philosophy of TaskGen is the management of information/memory on a need-to-know basis. We empirically evaluate TaskGen on various environments such as 40x40 dynamic maze navigation with changing obstacle locations (100% solve rate), TextWorld escape room solving with dense rewards and detailed goals (96% solve rate), web browsing (69% of actions successful), solving the MATH dataset (71% solve rate over 100 Level-5 problems), Retrieval Augmented Generation on NaturalQuestions dataset (F1 score of 47.03%)
△ Less
Submitted 22 July, 2024;
originally announced July 2024.
-
Spider: A Unified Framework for Context-dependent Concept Segmentation
Authors:
Xiaoqi Zhao,
Youwei Pang,
Wei Ji,
Baicheng Sheng,
Jiaming Zuo,
Lihe Zhang,
Huchuan Lu
Abstract:
Different from the context-independent (CI) concepts such as human, car, and airplane, context-dependent (CD) concepts require higher visual understanding ability, such as camouflaged object and medical lesion. Despite the rapid advance of many CD understanding tasks in respective branches, the isolated evolution leads to their limited cross-domain generalisation and repetitive technique innovatio…
▽ More
Different from the context-independent (CI) concepts such as human, car, and airplane, context-dependent (CD) concepts require higher visual understanding ability, such as camouflaged object and medical lesion. Despite the rapid advance of many CD understanding tasks in respective branches, the isolated evolution leads to their limited cross-domain generalisation and repetitive technique innovation. Since there is a strong coupling relationship between foreground and background context in CD tasks, existing methods require to train separate models in their focused domains. This restricts their real-world CD concept understanding towards artificial general intelligence (AGI). We propose a unified model with a single set of parameters, Spider, which only needs to be trained once. With the help of the proposed concept filter driven by the image-mask group prompt, Spider is able to understand and distinguish diverse strong context-dependent concepts to accurately capture the Prompter's intention. Without bells and whistles, Spider significantly outperforms the state-of-the-art specialized models in 8 different context-dependent segmentation tasks, including 4 natural scenes (salient, camouflaged, and transparent objects and shadow) and 4 medical lesions (COVID-19, polyp, breast, and skin lesion with color colonoscopy, CT, ultrasound, and dermoscopy modalities). Besides, Spider shows obvious advantages in continuous learning. It can easily complete the training of new tasks by fine-tuning parameters less than 1\% and bring a tolerable performance degradation of less than 5\% for all old tasks. The source code will be publicly available at \href{https://github.com/Xiaoqi-Zhao-DLUT/Spider-UniCDSeg}{Spider-UniCDSeg}.
△ Less
Submitted 28 May, 2024; v1 submitted 2 May, 2024;
originally announced May 2024.
-
Spatially Correlated RIS-Aided Secure Massive MIMO Under CSI and Hardware Imperfections
Authors:
Dan Yang,
Jindan Xu,
Wei Xu,
Bin Sheng,
Xiaohu You,
Chau Yuen,
Marco Di Renzo
Abstract:
This paper investigates the integration of a reconfigurable intelligent surface (RIS) into a secure multiuser massive multiple-input multiple-output (MIMO) system in the presence of transceiver hardware impairments (HWI), imperfect channel state information (CSI), and spatially correlated channels. We first introduce a linear minimum-mean-square error estimation algorithm for the aggregate channel…
▽ More
This paper investigates the integration of a reconfigurable intelligent surface (RIS) into a secure multiuser massive multiple-input multiple-output (MIMO) system in the presence of transceiver hardware impairments (HWI), imperfect channel state information (CSI), and spatially correlated channels. We first introduce a linear minimum-mean-square error estimation algorithm for the aggregate channel by considering the impact of transceiver HWI and RIS phase-shift errors. Then, we derive a lower bound for the achievable ergodic secrecy rate in the presence of a multi-antenna eavesdropper when artificial noise (AN) is employed at the base station (BS). In addition, the obtained expressions of the ergodic secrecy rate are further simplified in some noteworthy special cases to obtain valuable insights. To counteract the effects of HWI, we present a power allocation optimization strategy between the confidential signals and AN, which admits a fixed-point equation solution. Our analysis reveals that a non-zero ergodic secrecy rate is preserved if the total transmit power decreases no faster than $1/N$, where $N$ is the number of RIS elements. Moreover, the ergodic secrecy rate grows logarithmically with the number of BS antennas $M$ and approaches a certain limit in the asymptotic regime $N\rightarrow\infty$. Simulation results are provided to verify the derived analytical results. They reveal the impact of key design parameters on the secrecy rate. It is shown that, with the proposed power allocation strategy, the secrecy rate loss due to HWI can be counteracted by increasing the number of low-cost RIS elements.
△ Less
Submitted 8 April, 2024;
originally announced April 2024.
-
Fluid Antenna for Mobile Edge Computing
Authors:
Yiping Zuo,
Jiajia Guo,
Biyun Sheng,
Chen Dai,
Fu Xiao,
Shi Jin
Abstract:
In the evolving environment of mobile edge computing (MEC), optimizing system performance to meet the growing demand for low-latency computing services is a top priority. Integrating fluidic antenna (FA) technology into MEC networks provides a new approach to address this challenge. This letter proposes an FA-enabled MEC scheme that aims to minimize the total system delay by leveraging the mobilit…
▽ More
In the evolving environment of mobile edge computing (MEC), optimizing system performance to meet the growing demand for low-latency computing services is a top priority. Integrating fluidic antenna (FA) technology into MEC networks provides a new approach to address this challenge. This letter proposes an FA-enabled MEC scheme that aims to minimize the total system delay by leveraging the mobility of FA to enhance channel conditions and improve computational offloading efficiency. By establishing an optimization problem focusing on the joint optimization of computation offloading and antenna positioning, we introduce an alternating iterative algorithm based on the interior point method and particle swarm optimization (IPPSO). Numerical results demonstrate the advantages of our proposed scheme compared to traditional fixed antenna positions, showing significant improvements in transmission rates and reductions in delays. The proposed IPPSO algorithm exhibits robust convergence properties, further validating the effectiveness of our method.
△ Less
Submitted 18 March, 2024;
originally announced March 2024.
-
Scene-aware Human Pose Generation using Transformer
Authors:
Jieteng Yao,
Junjie Chen,
Li Niu,
Bin Sheng
Abstract:
Affordance learning considers the interaction opportunities for an actor in the scene and thus has wide application in scene understanding and intelligent robotics. In this paper, we focus on contextual affordance learning, i.e., using affordance as context to generate a reasonable human pose in a scene. Existing scene-aware human pose generation methods could be divided into two categories depend…
▽ More
Affordance learning considers the interaction opportunities for an actor in the scene and thus has wide application in scene understanding and intelligent robotics. In this paper, we focus on contextual affordance learning, i.e., using affordance as context to generate a reasonable human pose in a scene. Existing scene-aware human pose generation methods could be divided into two categories depending on whether using pose templates. Our proposed method belongs to the template-based category, which benefits from the representative pose templates. Moreover, inspired by recent transformer-based methods, we associate each query embedding with a pose template, and use the interaction between query embeddings and scene feature map to effectively predict the scale and offsets for each pose template. In addition, we employ knowledge distillation to facilitate the offset learning given the predicted scale. Comprehensive experiments on Sitcom dataset demonstrate the effectiveness of our method.
△ Less
Submitted 4 August, 2023;
originally announced August 2023.
-
Deep Dynamic Epidemiological Modelling for COVID-19 Forecasting in Multi-level Districts
Authors:
Ruhan Liu,
Jiajia Li,
Yang Wen,
Huating Li,
Ping Zhang,
Bin Sheng,
David Dagan Feng
Abstract:
Objective: COVID-19 has spread worldwide and made a huge influence across the world. Modeling the infectious spread situation of COVID-19 is essential to understand the current condition and to formulate intervention measurements. Epidemiological equations based on the SEIR model simulate disease development. The traditional parameter estimation method to solve SEIR equations could not precisely f…
▽ More
Objective: COVID-19 has spread worldwide and made a huge influence across the world. Modeling the infectious spread situation of COVID-19 is essential to understand the current condition and to formulate intervention measurements. Epidemiological equations based on the SEIR model simulate disease development. The traditional parameter estimation method to solve SEIR equations could not precisely fit real-world data due to different situations, such as social distancing policies and intervention strategies. Additionally, learning-based models achieve outstanding fitting performance, but cannot visualize mechanisms. Methods: Thus, we propose a deep dynamic epidemiological (DDE) method that combines epidemiological equations and deep-learning advantages to obtain high accuracy and visualization. The DDE contains deep networks to fit the effect function to simulate the ever-changing situations based on the neural ODE method in solving variants' equations, ensuring the fitting performance of multi-level areas. Results: We introduce four SEIR variants to fit different situations in different countries and regions. We compare our DDE method with traditional parameter estimation methods (Nelder-Mead, BFGS, Powell, Truncated Newton Conjugate-Gradient, Neural ODE) in fitting the real-world data in the cases of countries (the USA, Columbia, South Africa) and regions (Wuhan in China, Piedmont in Italy). Our DDE method achieves the best Mean Square Error and Pearson coefficient in all five areas. Further, compared with the state-of-art learning-based approaches, the DDE outperforms all techniques, including LSTM, RNN, GRU, Random Forest, Extremely Random Trees, and Decision Tree. Conclusion: DDE presents outstanding predictive ability and visualized display of the changes in infection rates in different regions and countries.
△ Less
Submitted 21 June, 2023;
originally announced June 2023.
-
TransMRSR: Transformer-based Self-Distilled Generative Prior for Brain MRI Super-Resolution
Authors:
Shan Huang,
Xiaohong Liu,
Tao Tan,
Menghan Hu,
Xiaoer Wei,
Tingli Chen,
Bin Sheng
Abstract:
Magnetic resonance images (MRI) acquired with low through-plane resolution compromise time and cost. The poor resolution in one orientation is insufficient to meet the requirement of high resolution for early diagnosis of brain disease and morphometric study. The common Single image super-resolution (SISR) solutions face two main challenges: (1) local detailed and global anatomical structural info…
▽ More
Magnetic resonance images (MRI) acquired with low through-plane resolution compromise time and cost. The poor resolution in one orientation is insufficient to meet the requirement of high resolution for early diagnosis of brain disease and morphometric study. The common Single image super-resolution (SISR) solutions face two main challenges: (1) local detailed and global anatomical structural information combination; and (2) large-scale restoration when applied for reconstructing thick-slice MRI into high-resolution (HR) iso-tropic data. To address these problems, we propose a novel two-stage network for brain MRI SR named TransMRSR based on the convolutional blocks to extract local information and transformer blocks to capture long-range dependencies. TransMRSR consists of three modules: the shallow local feature extraction, the deep non-local feature capture, and the HR image reconstruction. We perform a generative task to encapsulate diverse priors into a generative network (GAN), which is the decoder sub-module of the deep non-local feature capture part, in the first stage. The pre-trained GAN is used for the second stage of SR task. We further eliminate the potential latent space shift caused by the two-stage training strategy through the self-distilled truncation trick. The extensive experiments show that our method achieves superior performance to other SSIR methods on both public and private datasets. Code is released at https://github.com/goddesshs/TransMRSR.git .
△ Less
Submitted 11 June, 2023;
originally announced June 2023.
-
DRAC: Diabetic Retinopathy Analysis Challenge with Ultra-Wide Optical Coherence Tomography Angiography Images
Authors:
Bo Qian,
Hao Chen,
Xiangning Wang,
Haoxuan Che,
Gitaek Kwon,
Jaeyoung Kim,
Sungjin Choi,
Seoyoung Shin,
Felix Krause,
Markus Unterdechler,
Junlin Hou,
Rui Feng,
Yihao Li,
Mostafa El Habib Daho,
Qiang Wu,
Ping Zhang,
Xiaokang Yang,
Yiyu Cai,
Weiping Jia,
Huating Li,
Bin Sheng
Abstract:
Computer-assisted automatic analysis of diabetic retinopathy (DR) is of great importance in reducing the risks of vision loss and even blindness. Ultra-wide optical coherence tomography angiography (UW-OCTA) is a non-invasive and safe imaging modality in DR diagnosis system, but there is a lack of publicly available benchmarks for model development and evaluation. To promote further research and s…
▽ More
Computer-assisted automatic analysis of diabetic retinopathy (DR) is of great importance in reducing the risks of vision loss and even blindness. Ultra-wide optical coherence tomography angiography (UW-OCTA) is a non-invasive and safe imaging modality in DR diagnosis system, but there is a lack of publicly available benchmarks for model development and evaluation. To promote further research and scientific benchmarking for diabetic retinopathy analysis using UW-OCTA images, we organized a challenge named "DRAC - Diabetic Retinopathy Analysis Challenge" in conjunction with the 25th International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI 2022). The challenge consists of three tasks: segmentation of DR lesions, image quality assessment and DR grading. The scientific community responded positively to the challenge, with 11, 12, and 13 teams from geographically diverse institutes submitting different solutions in these three tasks, respectively. This paper presents a summary and analysis of the top-performing solutions and results for each task of the challenge. The obtained results from top algorithms indicate the importance of data augmentation, model architecture and ensemble of networks in improving the performance of deep learning models. These findings have the potential to enable new developments in diabetic retinopathy analysis. The challenge remains open for post-challenge registrations and submissions for benchmarking future methodology developments.
△ Less
Submitted 5 April, 2023;
originally announced April 2023.
-
Secure Communication for Spatially Correlated Massive MIMO with Low-Resolution DACs
Authors:
Dan Yang,
Jindan Xu,
Wei Xu,
Ning Wang,
Bin Sheng,
A. Lee Swindlehurst
Abstract:
In this paper, the performance of a secure massive multiple-input multiple-output (MIMO) system adopting low-resolution digital-to-analog converters (DACs) is analyzed over spatially correlated wireless channels. A tight lower bound for the achievable secrecy rate is derived with artificial noise (AN) transmitted in the null space of the user channels. Using the analytical results, the impact of s…
▽ More
In this paper, the performance of a secure massive multiple-input multiple-output (MIMO) system adopting low-resolution digital-to-analog converters (DACs) is analyzed over spatially correlated wireless channels. A tight lower bound for the achievable secrecy rate is derived with artificial noise (AN) transmitted in the null space of the user channels. Using the analytical results, the impact of spatial correlation on the secrecy rate is explicitly evaluated in the presence of low-resolution DACs. The analytical observations reveal that using low-resolution DACs can be beneficial to the secrecy performance compared with ideal DACs, when the channels are strongly correlated and optimal power allocation is not employed.
△ Less
Submitted 9 January, 2023;
originally announced January 2023.
-
Closed-form Approximation for Performance Bound of Finite Blocklength Massive MIMO Transmission
Authors:
Xiaohu You,
Bin Sheng,
Yongming Huang,
Wei Xu,
Chuan Zhang,
Dongming Wang,
Pengcheng Zhu,
Chen Ji
Abstract:
Ultra-reliable low latency communications (uRLLC) is adopted in the fifth generation (5G) mobile networks to better support mission-critical applications that demand high level of reliability and low latency. With the aid of well-established multiple-input multiple-output (MIMO) information theory, uRLLC in the future 6G is expected to provide enhanced capability towards extreme connectivity. Sinc…
▽ More
Ultra-reliable low latency communications (uRLLC) is adopted in the fifth generation (5G) mobile networks to better support mission-critical applications that demand high level of reliability and low latency. With the aid of well-established multiple-input multiple-output (MIMO) information theory, uRLLC in the future 6G is expected to provide enhanced capability towards extreme connectivity. Since the latency constraint can be represented equivalently by blocklength, channel coding theory at finite block-length plays an important role in the theoretic analysis of uRLLC. On the basis of Polyanskiy's and Yang's asymptotic results, we first derive the exact close-form expressions for the expectation and variance of channel dispersion. Then, the bound of average maximal achievable rate is given for massive MIMO systems in ideal independent and identically distributed fading channels. This is the study to reveal the underlying connections among the fundamental parameters in MIMO transmissions in a concise and complete close-form formula. Most importantly, the inversely proportional law observed therein implies that the latency can be further reduced at expense of spatial degrees of freedom.
△ Less
Submitted 19 July, 2023; v1 submitted 14 June, 2022;
originally announced June 2022.
-
Solving Routing Problems via Important Cuts
Authors:
Bin Sheng,
Gregory Gutin
Abstract:
We introduce a novel approach of using important cuts which allowed us to design significantly faster fixed-parameter tractable (FPT) algorithms for the following routing problems: the Mixed Chinese Postman Problem parameterized by the number of directed edges (Gutin et al., JCSS 2017), the Minimum Shared Edges problem (MSE) parameterized by the number p of paths between two specified vertices (Fl…
▽ More
We introduce a novel approach of using important cuts which allowed us to design significantly faster fixed-parameter tractable (FPT) algorithms for the following routing problems: the Mixed Chinese Postman Problem parameterized by the number of directed edges (Gutin et al., JCSS 2017), the Minimum Shared Edges problem (MSE) parameterized by the number p of paths between two specified vertices (Fluschnik et al., JCSS 2019), and the Weighted Min Cut Prevention problem (Gruttemeier et al., WG 2021). The Minimum Vulnerability problem (MV) is a generalization of MSE (Assadi et al., Algorithmica 2014). The only known FPT algorithm for MV parameterized by p (the same parameter as for MSE) was for chordal graphs (Aoki et al., JCO 2018). We design an FPT algorithm for MV on all undirected graphs.
△ Less
Submitted 8 June, 2022; v1 submitted 30 January, 2022;
originally announced January 2022.
-
Spatiotemporal 2-D Channel Coding for Very Low Latency Reliable MIMO Transmission
Authors:
Xiaohu You,
Chuan Zhang,
Bin Sheng,
Yongming Huang,
Chen Ji,
Yifei Shen,
Wenyue Zhou,
Jian Liu
Abstract:
To fully support vertical industries, 5G and its corresponding channel coding are expected to meet requirements of different applications. However, for applications of 5G and beyond 5G (B5G) such as URLLC, the transmission latency is required to be much shorter than that in eMBB. Therefore, the resulting channel code length reduces drastically. In this case, the traditional 1-D channel coding suff…
▽ More
To fully support vertical industries, 5G and its corresponding channel coding are expected to meet requirements of different applications. However, for applications of 5G and beyond 5G (B5G) such as URLLC, the transmission latency is required to be much shorter than that in eMBB. Therefore, the resulting channel code length reduces drastically. In this case, the traditional 1-D channel coding suffers a lot from the performance degradation and fails to deliver strong reliability with very low latency. To remove this bottleneck, new channel coding scheme beyond the existing 1-D one is in urgent need. By making full use of the spacial freedom of massive MIMO systems, this paper devotes itself in proposing a spatiotemporal 2-D channel coding for very low latency reliable transmission. For a very short time-domain code length $N^{\text{time}}=16$, $64 \times 128$ MIMO system employing the proposed spatiotemporal 2-D coding scheme successfully shows more than $3$\,dB performance gain at $\text{FER}=10^{-3}$, compared to the 1-D time-domain channel coding. It is noted that the proposed coding scheme is suitable for different channel codes and enjoys high flexibility to adapt to difference scenarios. By appropriately selecting the code rate, code length, and the number of codewords in the time and space domains, the proposed coding scheme can achieve a good trade-off between the transmission latency and reliability.
△ Less
Submitted 10 January, 2022;
originally announced January 2022.
-
Input-Specific Robustness Certification for Randomized Smoothing
Authors:
Ruoxin Chen,
Jie Li,
Junchi Yan,
Ping Li,
Bin Sheng
Abstract:
Although randomized smoothing has demonstrated high certified robustness and superior scalability to other certified defenses, the high computational overhead of the robustness certification bottlenecks the practical applicability, as it depends heavily on the large sample approximation for estimating the confidence interval. In existing works, the sample size for the confidence interval is univer…
▽ More
Although randomized smoothing has demonstrated high certified robustness and superior scalability to other certified defenses, the high computational overhead of the robustness certification bottlenecks the practical applicability, as it depends heavily on the large sample approximation for estimating the confidence interval. In existing works, the sample size for the confidence interval is universally set and agnostic to the input for prediction. This Input-Agnostic Sampling (IAS) scheme may yield a poor Average Certified Radius (ACR)-runtime trade-off which calls for improvement. In this paper, we propose Input-Specific Sampling (ISS) acceleration to achieve the cost-effectiveness for robustness certification, in an adaptive way of reducing the sampling size based on the input characteristic. Furthermore, our method universally controls the certified radius decline from the ISS sample size reduction. The empirical results on CIFAR-10 and ImageNet show that ISS can speed up the certification by more than three times at a limited cost of 0.05 certified radius. Meanwhile, ISS surpasses IAS on the average certified radius across the extensive hyperparameter settings. Specifically, ISS achieves ACR=0.958 on ImageNet ($σ=1.0$) in 250 minutes, compared to ACR=0.917 by IAS under the same condition. We release our code in \url{https://github.com/roy-ch/Input-Specific-Certification}.
△ Less
Submitted 21 December, 2021;
originally announced December 2021.
-
Solving the r-pseudoforest Deletion Problem in Time Independent of r
Authors:
Bin Sheng
Abstract:
The feedback vertex set problem is one of the most studied parameterized problems. Several generalizations of the problem have been studied where one is to delete vertices to obtain graphs close to acyclic. In this paper, we give an FPT algorithm for the problem of deleting at most $k$ vertices to get an $r$-pseudoforest. A graph is an $r$-pseudoforest if we can delete at most $r$ edges from each…
▽ More
The feedback vertex set problem is one of the most studied parameterized problems. Several generalizations of the problem have been studied where one is to delete vertices to obtain graphs close to acyclic. In this paper, we give an FPT algorithm for the problem of deleting at most $k$ vertices to get an $r$-pseudoforest. A graph is an $r$-pseudoforest if we can delete at most $r$ edges from each component to get a forest. Philip et al. introduced this problem and gave an $O^*(c_{r}^{k})$ algorithm for it, where $c_r$ depends on $r$ double exponentially. In comparison, our algorithm runs in time $O^*((10k)^{k})$, independent of $r$.
△ Less
Submitted 24 November, 2020;
originally announced November 2020.
-
A Framework of Randomized Selection Based Certified Defenses Against Data Poisoning Attacks
Authors:
Ruoxin Chen,
Jie Li,
Chentao Wu,
Bin Sheng,
Ping Li
Abstract:
Neural network classifiers are vulnerable to data poisoning attacks, as attackers can degrade or even manipulate their predictions thorough poisoning only a few training samples. However, the robustness of heuristic defenses is hard to measure. Random selection based defenses can achieve certified robustness by averaging the classifiers' predictions on the sub-datasets sampled from the training se…
▽ More
Neural network classifiers are vulnerable to data poisoning attacks, as attackers can degrade or even manipulate their predictions thorough poisoning only a few training samples. However, the robustness of heuristic defenses is hard to measure. Random selection based defenses can achieve certified robustness by averaging the classifiers' predictions on the sub-datasets sampled from the training set. This paper proposes a framework of random selection based certified defenses against data poisoning attacks. Specifically, we prove that the random selection schemes that satisfy certain conditions are robust against data poisoning attacks. We also derive the analytical form of the certified radius for the qualified random selection schemes. The certified radius of bagging derived by our framework is tighter than the previous work. Our framework allows users to improve robustness by leveraging prior knowledge about the training set and the poisoning model. Given higher level of prior knowledge, we can achieve higher certified accuracy both theoretically and practically. According to the experiments on three benchmark datasets: MNIST 1/7, MNIST, and CIFAR-10, our method outperforms the state-of-the-art.
△ Less
Submitted 13 October, 2020; v1 submitted 18 September, 2020;
originally announced September 2020.
-
Constrained Multi-shape Evolution for Overlapping Cytoplasm Segmentation
Authors:
Youyi Song,
Lei Zhu,
Baiying Lei,
Bin Sheng,
Qi Dou,
Jing Qin,
Kup-Sze Choi
Abstract:
Segmenting overlapping cytoplasm of cells in cervical smear images is a clinically essential task, for quantitatively measuring cell-level features in order to diagnose cervical cancer. This task, however, remains rather challenging, mainly due to the deficiency of intensity (or color) information in the overlapping region. Although shape prior-based models that compensate intensity deficiency by…
▽ More
Segmenting overlapping cytoplasm of cells in cervical smear images is a clinically essential task, for quantitatively measuring cell-level features in order to diagnose cervical cancer. This task, however, remains rather challenging, mainly due to the deficiency of intensity (or color) information in the overlapping region. Although shape prior-based models that compensate intensity deficiency by introducing prior shape information (shape priors) about cytoplasm are firmly established, they often yield visually implausible results, mainly because they model shape priors only by limited shape hypotheses about cytoplasm, exploit cytoplasm-level shape priors alone, and impose no shape constraint on the resulting shape of the cytoplasm. In this paper, we present a novel and effective shape prior-based approach, called constrained multi-shape evolution, that segments all overlapping cytoplasms in the clump simultaneously by jointly evolving each cytoplasm's shape guided by the modeled shape priors. We model local shape priors (cytoplasm--level) by an infinitely large shape hypothesis set which contains all possible shapes of the cytoplasm. In the shape evolution, we compensate intensity deficiency for the segmentation by introducing not only the modeled local shape priors but also global shape priors (clump--level) modeled by considering mutual shape constraints of cytoplasms in the clump. We also constrain the resulting shape in each evolution to be in the built shape hypothesis set, for further reducing implausible segmentation results. We evaluated the proposed method in two typical cervical smear datasets, and the extensive experimental results show that the proposed method is effective to segment overlapping cytoplasm, consistently outperforming the state-of-the-art methods.
△ Less
Submitted 15 April, 2020; v1 submitted 8 April, 2020;
originally announced April 2020.
-
FPT algorithms for generalized feedback vertex set problems
Authors:
Bin Sheng
Abstract:
An r-pseudoforest is a graph in which each component can be made into a forest by deleting at most r edges, and a d-quasi-forest is a graph in which each component can be made into a forest by deleting at most d vertices. In this paper, we study the parameterized tractability of deleting minimum number of vertices to obtain r-pseudoforest and d-quasi-forest, generalizing the well studied feedback…
▽ More
An r-pseudoforest is a graph in which each component can be made into a forest by deleting at most r edges, and a d-quasi-forest is a graph in which each component can be made into a forest by deleting at most d vertices. In this paper, we study the parameterized tractability of deleting minimum number of vertices to obtain r-pseudoforest and d-quasi-forest, generalizing the well studied feedback vertex set problem. We first provide improved FPT algorithm and kernelization results for the r-pseudoforest deletion problem and then we show that the d-quasi-forest deletion problem is also FPT.
△ Less
Submitted 14 December, 2019;
originally announced December 2019.
-
How Sequence-to-Sequence Models Perceive Language Styles?
Authors:
Ruozi Huang,
Mi Zhang,
Xudong Pan,
Beina Sheng
Abstract:
Style is ubiquitous in our daily language uses, while what is language style to learning machines? In this paper, by exploiting the second-order statistics of semantic vectors of different corpora, we present a novel perspective on this question via style matrix, i.e. the covariance matrix of semantic vectors, and explain for the first time how Sequence-to-Sequence models encode style information…
▽ More
Style is ubiquitous in our daily language uses, while what is language style to learning machines? In this paper, by exploiting the second-order statistics of semantic vectors of different corpora, we present a novel perspective on this question via style matrix, i.e. the covariance matrix of semantic vectors, and explain for the first time how Sequence-to-Sequence models encode style information innately in its semantic vectors. As an application, we devise a learning-free text style transfer algorithm, which explicitly constructs a pair of transfer operators from the style matrices for style transfer. Moreover, our algorithm is also observed to be flexible enough to transfer out-of-domain sentences. Extensive experimental evidence justifies the informativeness of style matrix and the competitive performance of our proposed style transfer algorithm with the state-of-the-art methods.
△ Less
Submitted 16 August, 2019;
originally announced August 2019.
-
Dynamic Region Division for Adaptive Learning Pedestrian Counting
Authors:
Gaoqi He,
Zhenwei Ma,
Binhao Huang,
Bin Sheng,
Yubo Yuan
Abstract:
Accurate pedestrian counting algorithm is critical to eliminate insecurity in the congested public scenes. However, counting pedestrians in crowded scenes often suffer from severe perspective distortion. In this paper, basing on the straight-line double region pedestrian counting method, we propose a dynamic region division algorithm to keep the completeness of counting objects. Utilizing the obje…
▽ More
Accurate pedestrian counting algorithm is critical to eliminate insecurity in the congested public scenes. However, counting pedestrians in crowded scenes often suffer from severe perspective distortion. In this paper, basing on the straight-line double region pedestrian counting method, we propose a dynamic region division algorithm to keep the completeness of counting objects. Utilizing the object bounding boxes obtained by YoloV3 and expectation division line of the scene, the boundary for nearby region and distant one is generated under the premise of retaining whole head. Ulteriorly, appropriate learning models are applied to count pedestrians in each obtained region. In the distant region, a novel inception dilated convolutional neural network is proposed to solve the problem of choosing dilation rate. In the nearby region, YoloV3 is used for detecting the pedestrian in multi-scale. Accordingly, the total number of pedestrians in each frame is obtained by fusing the result in nearby and distant regions. A typical subway pedestrian video dataset is chosen to conduct experiment in this paper. The result demonstrate that proposed algorithm is superior to existing machine learning based methods in general performance.
△ Less
Submitted 11 August, 2019;
originally announced August 2019.
-
Rethinking Uplink Hybrid Processing: When is Pure Analog Processing Suggested?
Authors:
Jingbo Du,
Wei Xu,
Bin Sheng,
Chunming Zhao
Abstract:
In this correspondence, we analytically characterize the benefit of digital processing in uplink massive multiple-input multiple-output (MIMO) with sub-connected hybrid architecture. By assuming that the number of radio frequency (RF) chains is equal to that of users, we characterize achievable rates of both pure analog detection and hybrid detection under the i.i.d. Rayleigh fading channel model.…
▽ More
In this correspondence, we analytically characterize the benefit of digital processing in uplink massive multiple-input multiple-output (MIMO) with sub-connected hybrid architecture. By assuming that the number of radio frequency (RF) chains is equal to that of users, we characterize achievable rates of both pure analog detection and hybrid detection under the i.i.d. Rayleigh fading channel model. From the derived expressions, we discover that the analog processing can outperform the hybrid processing using the maximal ratio combining (MRC) or zero-forcing (ZF) criterion in cases under some engineering assumptions. Performance comparison of the schemes are presented under tests with various numbers of users and numbers of antennas at the base station.
△ Less
Submitted 28 March, 2019; v1 submitted 16 March, 2019;
originally announced March 2019.
-
Intermediate Data Caching Optimization for Multi-Stage and Parallel Big Data Frameworks
Authors:
Zhengyu Yang,
Danlin Jia,
Stratis Ioannidis,
Ningfang Mi,
Bo Sheng
Abstract:
In the era of big data and cloud computing, large amounts of data are generated from user applications and need to be processed in the datacenter. Data-parallel computing frameworks, such as Apache Spark, are widely used to perform such data processing at scale. Specifically, Spark leverages distributed memory to cache the intermediate results, represented as Resilient Distributed Datasets (RDDs).…
▽ More
In the era of big data and cloud computing, large amounts of data are generated from user applications and need to be processed in the datacenter. Data-parallel computing frameworks, such as Apache Spark, are widely used to perform such data processing at scale. Specifically, Spark leverages distributed memory to cache the intermediate results, represented as Resilient Distributed Datasets (RDDs). This gives Spark an advantage over other parallel frameworks for implementations of iterative machine learning and data mining algorithms, by avoiding repeated computation or hard disk accesses to retrieve RDDs. By default, caching decisions are left at the programmer's discretion, and the LRU policy is used for evicting RDDs when the cache is full. However, when the objective is to minimize total work, LRU is woefully inadequate, leading to arbitrarily suboptimal caching decisions. In this paper, we design an algorithm for multi-stage big data processing platforms to adaptively determine and cache the most valuable intermediate datasets that can be reused in the future. Our solution automates the decision of which RDDs to cache: this amounts to identifying nodes in a direct acyclic graph (DAG) representing computations whose outputs should persist in the memory. Our experiment results show that our proposed cache optimization solution can improve the performance of machine learning applications on Spark decreasing the total work to recompute RDDs by 12%.
△ Less
Submitted 7 May, 2018; v1 submitted 27 April, 2018;
originally announced April 2018.
-
A Game-theoretic Framework for Revenue Sharing in Edge-Cloud Computing System
Authors:
Zhi Cao,
Honggang Zhang,
Benyuan Liu,
Bo Sheng
Abstract:
We introduce a game-theoretic framework to ex- plore revenue sharing in an Edge-Cloud computing system, in which computing service providers at the edge of the Internet (edge providers) and computing service providers at the cloud (cloud providers) co-exist and collectively provide computing resources to clients (e.g., end users or applications) at the edge. Different from traditional cloud comput…
▽ More
We introduce a game-theoretic framework to ex- plore revenue sharing in an Edge-Cloud computing system, in which computing service providers at the edge of the Internet (edge providers) and computing service providers at the cloud (cloud providers) co-exist and collectively provide computing resources to clients (e.g., end users or applications) at the edge. Different from traditional cloud computing, the providers in an Edge-Cloud system are independent and self-interested. To achieve high system-level efficiency, the manager of the system adopts a task distribution mechanism to maximize the total revenue received from clients and also adopts a revenue sharing mechanism to split the received revenue among computing servers (and hence service providers). Under those system-level mechanisms, service providers attempt to game with the system in order to maximize their own utilities, by strategically allocating their resources (e.g., computing servers).
Our framework models the competition among the providers in an Edge-Cloud system as a non-cooperative game. Our simulations and experiments on an emulation system have shown the existence of Nash equilibrium in such a game. We find that revenue sharing mechanisms have a significant impact on the system-level efficiency at Nash equilibria, and surprisingly the revenue sharing mechanism based directly on actual contributions can result in significantly worse system efficiency than Shapley value sharing mechanism and Ortmann proportional sharing mechanism. Our framework provides an effective economics approach to understanding and designing efficient Edge-Cloud computing systems.
△ Less
Submitted 4 October, 2018; v1 submitted 27 November, 2017;
originally announced November 2017.
-
The Euler and Chinese Postman Problems on 2-Arc-Colored Digraphs
Authors:
Bin Sheng,
Ruijuan Li,
Gregory Gutin
Abstract:
The famous Chinese Postman Problem (CPP) is polynomial time solvable on both undirected and directed graphs. Gutin et al. [Discrete Applied Math 217 (2016)] generalized these results by proving that CPP on $c$-edge-colored graphs is polynomial time solvable for every $c\geq 2$. In CPP on weighted edge-colored graphs $G$, we wish to find a minimum weight properly colored closed walk containing all…
▽ More
The famous Chinese Postman Problem (CPP) is polynomial time solvable on both undirected and directed graphs. Gutin et al. [Discrete Applied Math 217 (2016)] generalized these results by proving that CPP on $c$-edge-colored graphs is polynomial time solvable for every $c\geq 2$. In CPP on weighted edge-colored graphs $G$, we wish to find a minimum weight properly colored closed walk containing all edges of $G$ (a walk is properly colored if every two consecutive edges are of different color, including the last and first edges in a closed walk). In this paper, we consider CPP on arc-colored digraphs (for properly colored closed directed walks), and provide a polynomial-time algorithm for the problem on weighted 2-arc-colored digraphs. This is a somewhat surprising result since it is NP-complete to decide whether a 2-arc-colored digraph has a properly colored directed cycle [Gutin et al., Discrete Math 191 (1998)]. To obtain the polynomial-time algorithm, we characterize 2-arc-colored digraphs containing properly colored Euler trails.
△ Less
Submitted 20 July, 2017;
originally announced July 2017.
-
An improved kernel for the cycle contraction problem
Authors:
Bin Sheng,
Yuefang Sun
Abstract:
The problem of modifying a given graph to satisfy certain properties has been one of the central topics in parameterized tractability study. In this paper, we study the cycle contraction problem, which makes a graph into a cycle by edge contractions. The problem has been studied {by Belmonte et al. [IPEC 2013]} who obtained a linear kernel with at most $6k+6$ vertices. We provide an improved kerne…
▽ More
The problem of modifying a given graph to satisfy certain properties has been one of the central topics in parameterized tractability study. In this paper, we study the cycle contraction problem, which makes a graph into a cycle by edge contractions. The problem has been studied {by Belmonte et al. [IPEC 2013]} who obtained a linear kernel with at most $6k+6$ vertices. We provide an improved kernel with at most $5k+4$ vertices for it in this paper.
△ Less
Submitted 18 June, 2017;
originally announced June 2017.
-
EDOS: Edge Assisted Offloading System for Mobile Devices
Authors:
Hank H. Harvey,
Ying Mao,
Yantian Hou,
Bo Sheng
Abstract:
Offloading resource-intensive jobs to the cloud and nearby users is a promising approach to enhance mobile devices. This paper investigates a hybrid offloading system that takes both infrastructure-based networks and Ad-hoc networks into the scope. Specifically, we propose EDOS, an edge assisted offloading system that consists of two major components, an Edge Assistant (EA) and Offload Agent (OA).…
▽ More
Offloading resource-intensive jobs to the cloud and nearby users is a promising approach to enhance mobile devices. This paper investigates a hybrid offloading system that takes both infrastructure-based networks and Ad-hoc networks into the scope. Specifically, we propose EDOS, an edge assisted offloading system that consists of two major components, an Edge Assistant (EA) and Offload Agent (OA). EA runs on the routers/towers to manage registered remote cloud servers and local service providers and OA operates on the users' devices to discover the services in proximity. We present the system with a suite of protocols to collect the potential service providers and algorithms to allocate tasks according to user-specified constraints. To evaluate EDOS, we prototype it on commercial mobile devices and evaluate it with both experiments on a small-scale testbed and simulations. The results show that EDOS is effective and efficient for offloading jobs.
△ Less
Submitted 10 December, 2021; v1 submitted 21 May, 2017;
originally announced May 2017.
-
Deep Colorization
Authors:
Zezhou Cheng,
Qingxiong Yang,
Bin Sheng
Abstract:
This paper investigates into the colorization problem which converts a grayscale image to a colorful version. This is a very difficult problem and normally requires manual adjustment to achieve artifact-free quality. For instance, it normally requires human-labelled color scribbles on the grayscale target image or a careful selection of colorful reference images (e.g., capturing the same scene in…
▽ More
This paper investigates into the colorization problem which converts a grayscale image to a colorful version. This is a very difficult problem and normally requires manual adjustment to achieve artifact-free quality. For instance, it normally requires human-labelled color scribbles on the grayscale target image or a careful selection of colorful reference images (e.g., capturing the same scene in the grayscale target image). Unlike the previous methods, this paper aims at a high-quality fully-automatic colorization method. With the assumption of a perfect patch matching technique, the use of an extremely large-scale reference database (that contains sufficient color images) is the most reliable solution to the colorization problem. However, patch matching noise will increase with respect to the size of the reference database in practice. Inspired by the recent success in deep learning techniques which provide amazing modeling of large-scale data, this paper re-formulates the colorization problem so that deep learning techniques can be directly employed. To ensure artifact-free quality, a joint bilateral filtering based post-processing step is proposed. We further develop an adaptive image clustering technique to incorporate the global image information. Numerous experiments demonstrate that our method outperforms the state-of-art algorithms both in terms of quality and speed.
△ Less
Submitted 30 April, 2016;
originally announced May 2016.
-
Crowd Counting via Weighted VLAD on Dense Attribute Feature Maps
Authors:
Biyun Sheng,
Chunhua Shen,
Guosheng Lin,
Jun Li,
Wankou Yang,
Changyin Sun
Abstract:
Crowd counting is an important task in computer vision, which has many applications in video surveillance. Although the regression-based framework has achieved great improvements for crowd counting, how to improve the discriminative power of image representation is still an open problem. Conventional holistic features used in crowd counting often fail to capture semantic attributes and spatial cue…
▽ More
Crowd counting is an important task in computer vision, which has many applications in video surveillance. Although the regression-based framework has achieved great improvements for crowd counting, how to improve the discriminative power of image representation is still an open problem. Conventional holistic features used in crowd counting often fail to capture semantic attributes and spatial cues of the image. In this paper, we propose integrating semantic information into learning locality-aware feature sets for accurate crowd counting. First, with the help of convolutional neural network (CNN), the original pixel space is mapped onto a dense attribute feature map, where each dimension of the pixel-wise feature indicates the probabilistic strength of a certain semantic class. Then, locality-aware features (LAF) built on the idea of spatial pyramids on neighboring patches are proposed to explore more spatial context and local information. Finally, the traditional VLAD encoding method is extended to a more generalized form in which diverse coefficient weights are taken into consideration. Experimental results validate the effectiveness of our presented method.
△ Less
Submitted 28 April, 2016;
originally announced April 2016.
-
Acyclicity in Edge-Colored Graphs
Authors:
Gregory Gutin,
Mark Jones,
Bin Sheng,
Magnus Wahlstrom,
Anders Yeo
Abstract:
A walk $W$ in edge-colored graphs is called properly colored (PC) if every pair of consecutive edges in $W$ is of different color. We introduce and study five types of PC acyclicity in edge-colored graphs such that graphs of PC acyclicity of type $i$ is a proper superset of graphs of acyclicity of type $i+1$, $i=1,2,3,4.$ The first three types are equivalent to the absence of PC cycles, PC trails,…
▽ More
A walk $W$ in edge-colored graphs is called properly colored (PC) if every pair of consecutive edges in $W$ is of different color. We introduce and study five types of PC acyclicity in edge-colored graphs such that graphs of PC acyclicity of type $i$ is a proper superset of graphs of acyclicity of type $i+1$, $i=1,2,3,4.$ The first three types are equivalent to the absence of PC cycles, PC trails, and PC walks, respectively. While graphs of types 1, 2 and 3 can be recognized in polynomial time, the problem of recognizing graphs of type 4 is, somewhat surprisingly, NP-hard even for 2-edge-colored graphs (i.e., when only two colors are used). The same problem with respect to type 5 is polynomial-time solvable for all edge-colored graphs. Using the five types, we investigate the border between intractability and tractability for the problems of finding the maximum number of internally vertex disjoint PC paths between two vertices and the minimum number of vertices to meet all PC paths between two vertices.
△ Less
Submitted 12 September, 2016; v1 submitted 8 January, 2016;
originally announced January 2016.
-
Chinese Postman Problem on Edge-Colored Multigraphs
Authors:
Gregory Gutin,
Mark Jones,
Bin Sheng,
Magnus Wahlström,
Anders Yeo
Abstract:
It is well-known that the Chinese postman problem on undirected and directed graphs is polynomial-time solvable. We extend this result to edge-colored multigraphs. Our result is in sharp contrast to the Chinese postman problem on mixed graphs, i.e., graphs with directed and undirected edges, for which the problem is NP-hard.
It is well-known that the Chinese postman problem on undirected and directed graphs is polynomial-time solvable. We extend this result to edge-colored multigraphs. Our result is in sharp contrast to the Chinese postman problem on mixed graphs, i.e., graphs with directed and undirected edges, for which the problem is NP-hard.
△ Less
Submitted 12 September, 2016; v1 submitted 19 December, 2015;
originally announced December 2015.
-
Linear-Vertex Kernel for the Problem of Packing $r$-Stars into a Graph without Long Induced Paths
Authors:
Florian Barbero,
Gregory Gutin,
Mark Jones,
Bin Sheng,
Anders Yeo
Abstract:
Let integers $r\ge 2$ and $d\ge 3$ be fixed. Let ${\cal G}_d$ be the set of graphs with no induced path on $d$ vertices. We study the problem of packing $k$ vertex-disjoint copies of $K_{1,r}$ ($k\ge 2$) into a graph $G$ from parameterized preprocessing, i.e., kernelization, point of view. We show that every graph $G\in {\cal G}_d$ can be reduced, in polynomial time, to a graph $G'\in {\cal G}_d$…
▽ More
Let integers $r\ge 2$ and $d\ge 3$ be fixed. Let ${\cal G}_d$ be the set of graphs with no induced path on $d$ vertices. We study the problem of packing $k$ vertex-disjoint copies of $K_{1,r}$ ($k\ge 2$) into a graph $G$ from parameterized preprocessing, i.e., kernelization, point of view. We show that every graph $G\in {\cal G}_d$ can be reduced, in polynomial time, to a graph $G'\in {\cal G}_d$ with $O(k)$ vertices such that $G$ has at least $k$ vertex-disjoint copies of $K_{1,r}$ if and only if $G'$ has. Such a result is known for arbitrary graphs $G$ when $r=2$ and we conjecture that it holds for every $r\ge 2$.
△ Less
Submitted 13 October, 2015;
originally announced October 2015.
-
Tree-based Visualization and Optimization for Image Collection
Authors:
Xintong Han,
Chongyang Zhang,
Weiyao Lin,
Mingliang Xu,
Bin Sheng,
Tao Mei
Abstract:
The visualization of an image collection is the process of displaying a collection of images on a screen under some specific layout requirements. This paper focuses on an important problem that is not well addressed by the previous methods: visualizing image collections into arbitrary layout shapes while arranging images according to user-defined semantic or visual correlations (e.g., color or obj…
▽ More
The visualization of an image collection is the process of displaying a collection of images on a screen under some specific layout requirements. This paper focuses on an important problem that is not well addressed by the previous methods: visualizing image collections into arbitrary layout shapes while arranging images according to user-defined semantic or visual correlations (e.g., color or object category). To this end, we first propose a property-based tree construction scheme to organize images of a collection into a tree structure according to user-defined properties. In this way, images can be adaptively placed with the desired semantic or visual correlations in the final visualization layout. Then, we design a two-step visualization optimization scheme to further optimize image layouts. As a result, multiple layout effects including layout shape and image overlap ratio can be effectively controlled to guarantee a satisfactory visualization. Finally, we also propose a tree-transfer scheme such that visualization layouts can be adaptively changed when users select different "images of interest". We demonstrate the effectiveness of our proposed approach through the comparisons with state-of-the-art visualization techniques.
△ Less
Submitted 17 July, 2015;
originally announced July 2015.
-
Facial Expression Cloning with Elastic and Muscle Models
Authors:
Yihao Zhang,
Weiyao Lin,
Bing Zhou,
Zhenzhong Chen,
Bin Sheng,
Jianxin Wu,
Wenjun Zhang
Abstract:
Expression cloning plays an important role in facial expression synthesis. In this paper, a novel algorithm is proposed for facial expression cloning. The proposed algorithm first introduces a new elastic model to balance the global and local warping effects, such that the impacts from facial feature diversity among people can be minimized, and thus more effective geometric warping results can be…
▽ More
Expression cloning plays an important role in facial expression synthesis. In this paper, a novel algorithm is proposed for facial expression cloning. The proposed algorithm first introduces a new elastic model to balance the global and local warping effects, such that the impacts from facial feature diversity among people can be minimized, and thus more effective geometric warping results can be achieved. Furthermore, a muscle-distribution-based (MD) model is proposed, which utilizes the muscle distribution of the human face and results in more accurate facial illumination details. In addition, we also propose a new distance-based metric to automatically select the optimal parameters such that the global and local warping effects in the elastic model can be suitably balanced. Experimental results show that our proposed algorithm outperforms the existing methods.
△ Less
Submitted 28 February, 2015;
originally announced March 2015.
-
A Heat-Map-based Algorithm for Recognizing Group Activities in Videos
Authors:
Weiyao Lin,
Hang Chu,
Jianxin Wu,
Bin Sheng,
Zhenzhong Chen
Abstract:
In this paper, a new heat-map-based (HMB) algorithm is proposed for group activity recognition. The proposed algorithm first models human trajectories as series of "heat sources" and then applies a thermal diffusion process to create a heat map (HM) for representing the group activities. Based on this heat map, a new key-point based (KPB) method is used for handling the alignments among heat maps…
▽ More
In this paper, a new heat-map-based (HMB) algorithm is proposed for group activity recognition. The proposed algorithm first models human trajectories as series of "heat sources" and then applies a thermal diffusion process to create a heat map (HM) for representing the group activities. Based on this heat map, a new key-point based (KPB) method is used for handling the alignments among heat maps with different scales and rotations. And a surface-fitting (SF) method is also proposed for recognizing group activities. Our proposed HM feature can efficiently embed the temporal motion information of the group activities while the proposed KPB and SF methods can effectively utilize the characteristics of the heat map for activity recognition. Experimental results demonstrate the effectiveness of our proposed algorithms.
△ Less
Submitted 21 February, 2015;
originally announced February 2015.
-
A new network-based algorithm for human activity recognition in video
Authors:
Weiyao Lin,
Yuanzhe Chen,
Jianxin Wu,
Hanli Wang,
Bin Sheng,
Hongxiang Li
Abstract:
In this paper, a new network-transmission-based (NTB) algorithm is proposed for human activity recognition in videos. The proposed NTB algorithm models the entire scene as an error-free network. In this network, each node corresponds to a patch of the scene and each edge represents the activity correlation between the corresponding patches. Based on this network, we further model people in the sce…
▽ More
In this paper, a new network-transmission-based (NTB) algorithm is proposed for human activity recognition in videos. The proposed NTB algorithm models the entire scene as an error-free network. In this network, each node corresponds to a patch of the scene and each edge represents the activity correlation between the corresponding patches. Based on this network, we further model people in the scene as packages while human activities can be modeled as the process of package transmission in the network. By analyzing these specific "package transmission" processes, various activities can be effectively detected. The implementation of our NTB algorithm into abnormal activity detection and group activity recognition are described in detail in the paper. Experimental results demonstrate the effectiveness of our proposed algorithm.
△ Less
Submitted 21 February, 2015;
originally announced February 2015.
-
Parameterized and Approximation Algorithms for the Load Coloring Problem
Authors:
F. Barbero,
G. Gutin,
M. Jones,
B. Sheng
Abstract:
Let $c, k$ be two positive integers and let $G=(V,E)$ be a graph. The $(c,k)$-Load Coloring Problem (denoted $(c,k)$-LCP) asks whether there is a $c$-coloring $\varphi: V \rightarrow [c]$ such that for every $i \in [c]$, there are at least $k$ edges with both endvertices colored $i$. Gutin and Jones (IPL 2014) studied this problem with $c=2$. They showed $(2,k)$-LCP to be fixed parameter tractable…
▽ More
Let $c, k$ be two positive integers and let $G=(V,E)$ be a graph. The $(c,k)$-Load Coloring Problem (denoted $(c,k)$-LCP) asks whether there is a $c$-coloring $\varphi: V \rightarrow [c]$ such that for every $i \in [c]$, there are at least $k$ edges with both endvertices colored $i$. Gutin and Jones (IPL 2014) studied this problem with $c=2$. They showed $(2,k)$-LCP to be fixed parameter tractable (FPT) with parameter $k$ by obtaining a kernel with at most $7k$ vertices. In this paper, we extend the study to any fixed $c$ by giving both a linear-vertex and a linear-edge kernel. In the particular case of $c=2$, we obtain a kernel with less than $4k$ vertices and less than $8k$ edges. These results imply that for any fixed $c\ge 2$, $(c,k)$-LCP is FPT and that the optimization version of $(c,k)$-LCP (where $k$ is to be maximized) has an approximation algorithm with a constant ratio for any fixed $c\ge 2$.
△ Less
Submitted 18 December, 2014; v1 submitted 9 December, 2014;
originally announced December 2014.
-
Parameterized Complexity of the $k$-Arc Chinese Postman Problem
Authors:
Gregory Gutin,
Mark Jones,
Bin Sheng
Abstract:
In the Mixed Chinese Postman Problem (MCPP), given an edge-weighted mixed graph $G$ ($G$ may have both edges and arcs), our aim is to find a minimum weight closed walk traversing each edge and arc at least once. The MCPP parameterized by the number of edges was known to be fixed-parameter tractable using a simple argument. Solving an open question of van Bevern et al., we prove that the MCPP param…
▽ More
In the Mixed Chinese Postman Problem (MCPP), given an edge-weighted mixed graph $G$ ($G$ may have both edges and arcs), our aim is to find a minimum weight closed walk traversing each edge and arc at least once. The MCPP parameterized by the number of edges was known to be fixed-parameter tractable using a simple argument. Solving an open question of van Bevern et al., we prove that the MCPP parameterized by the number of arcs is also fixed-parameter tractable. Our proof is more involved and, in particular, uses a well-known result of Marx, O'Sullivan and Razgon (2013) on the treewidth of torso graphs with respect to small separators. We obtain a small cut analog of this result, and use it to construct a tree decomposition which, despite not having bounded width, has other properties allowing us to design a fixed-parameter algorithm.
△ Less
Submitted 12 September, 2016; v1 submitted 6 March, 2014;
originally announced March 2014.
-
Parameterized Directed $k$-Chinese Postman Problem and $k$ Arc-Disjoint Cycles Problem on Euler Digraphs
Authors:
Gregory Gutin,
Mark Jones,
Bin Sheng,
Magnus Wahlstrom
Abstract:
In the Directed $k$-Chinese Postman Problem ($k$-DCPP), we are given a connected weighted digraph $G$ and asked to find $k$ non-empty closed directed walks covering all arcs of $G$ such that the total weight of the walks is minimum. Gutin, Muciaccia and Yeo (Theor. Comput. Sci. 513 (2013) 124--128) asked for the parameterized complexity of $k$-DCPP when $k$ is the parameter. We prove that the $k$-…
▽ More
In the Directed $k$-Chinese Postman Problem ($k$-DCPP), we are given a connected weighted digraph $G$ and asked to find $k$ non-empty closed directed walks covering all arcs of $G$ such that the total weight of the walks is minimum. Gutin, Muciaccia and Yeo (Theor. Comput. Sci. 513 (2013) 124--128) asked for the parameterized complexity of $k$-DCPP when $k$ is the parameter. We prove that the $k$-DCPP is fixed-parameter tractable.
We also consider a related problem of finding $k$ arc-disjoint directed cycles in an Euler digraph, parameterized by $k$. Slivkins (ESA 2003) showed that this problem is W[1]-hard for general digraphs. Generalizing another result by Slivkins, we prove that the problem is fixed-parameter tractable for Euler digraphs. The corresponding problem on vertex-disjoint cycles in Euler digraphs remains W[1]-hard even for Euler digraphs.
△ Less
Submitted 10 February, 2014;
originally announced February 2014.