-
PolyBERT: Fine-Tuned Poly Encoder BERT-Based Model for Word Sense Disambiguation
Authors:
Linhan Xia,
Mingzhan Yang,
Guohui Yuan,
Shengnan Tao,
Yujing Qiu,
Guo Yu,
Kai Lei
Abstract:
Mainstream Word Sense Disambiguation (WSD) approaches have employed BERT to extract semantics from both context and definitions of senses to determine the most suitable sense of a target word, achieving notable performance. However, there are two limitations in these approaches. First, previous studies failed to balance the representation of token-level (local) and sequence-level (global) semantics during feature extraction, leading to insufficient semantic representation and a performance bottleneck. Second, these approaches incorporated all possible senses of each target word during the training phase, leading to unnecessary computational costs. To overcome these limitations, this paper introduces a poly-encoder BERT-based model with batch contrastive learning for WSD, named PolyBERT. Compared with previous WSD methods, PolyBERT has two improvements: (1) A poly-encoder with a multi-head attention mechanism is utilized to fuse token-level (local) and sequence-level (global) semantics, rather than focusing on just one. This approach enriches semantic representation by balancing local and global semantics. (2) To avoid redundant training inputs, Batch Contrastive Learning (BCL) is introduced. BCL utilizes the correct senses of other target words in the same batch as negative samples for the current target word, which reduces training inputs and computational cost. The experimental results demonstrate that PolyBERT outperforms baseline WSD methods such as Huang's GlossBERT and Blevins's BEM by 2% in F1-score. In addition, PolyBERT with BCL reduces GPU hours by 37.6% compared with PolyBERT without BCL.
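To make the batch contrastive learning (BCL) idea concrete, here is a minimal sketch, not the authors' implementation. It assumes each target word and each gold sense have already been encoded as vectors (the poly-encoder itself is not shown), that similarity is a plain dot product, and that the loss is an in-batch cross-entropy; the function names are hypothetical. Each target is scored only against the gold senses present in the batch rather than against every candidate sense in the inventory, which is where the saving in training inputs comes from.

```python
import math
from typing import List

Vector = List[float]


def dot(u: Vector, v: Vector) -> float:
    """Similarity score; a plain dot product is assumed in this sketch."""
    return sum(a * b for a, b in zip(u, v))


def batch_contrastive_loss(context_vecs: List[Vector], gold_sense_vecs: List[Vector]) -> float:
    """In-batch contrastive loss (hypothetical helper).

    context_vecs[i]    -- representation of the i-th target word in its context
    gold_sense_vecs[j] -- representation of the correct sense of the j-th target word
    For target i, gold sense i is the positive; gold senses j != i serve as negatives.
    """
    n = len(context_vecs)
    total = 0.0
    for i in range(n):
        # n scores per target: only the gold senses in this batch, not all candidate senses.
        scores = [dot(context_vecs[i], gold_sense_vecs[j]) for j in range(n)]
        m = max(scores)  # subtract the max for numerical stability
        log_partition = m + math.log(sum(math.exp(s - m) for s in scores))
        total += log_partition - scores[i]  # cross-entropy: -log softmax(scores)[i]
    return total / n
```

If two targets in the same batch share a gold sense, that sense becomes a false negative for one of them; this sketch does not handle that case.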
Submitted 1 June, 2025;
originally announced June 2025.
-
MLLMs are Deeply Affected by Modality Bias
Authors:
Xu Zheng,
Chenfei Liao,
Yuqian Fu,
Kaiyu Lei,
Yuanhuiyi Lyu,
Lutao Jiang,
Bin Ren,
Jialei Chen,
Jiawen Wang,
Chengxin Li,
Linfeng Zhang,
Danda Pani Paudel,
Xuanjing Huang,
Yu-Gang Jiang,
Nicu Sebe,
Dacheng Tao,
Luc Van Gool,
Xuming Hu
Abstract:
Recent advances in Multimodal Large Language Models (MLLMs) have shown promising results in integrating diverse modalities such as text and images. MLLMs are heavily influenced by modality bias, often relying on language while under-utilizing other modalities like visual inputs. This position paper argues that MLLMs are deeply affected by modality bias. Firstly, we diagnose the current state of modality bias, highlighting its manifestations across various tasks. Secondly, we propose a systematic research roadmap related to modality bias in MLLMs. Thirdly, we identify key factors of modality bias in MLLMs and offer actionable suggestions for future research to mitigate it. To substantiate these findings, we conduct experiments that demonstrate the influence of each factor: 1. Data Characteristics: Language data is compact and abstract, while visual data is redundant and complex, creating an inherent imbalance in learning dynamics. 2. Imbalanced Backbone Capabilities: The dominance of pretrained language models in MLLMs leads to overreliance on language and neglect of visual information. 3. Training Objectives: Current objectives often fail to promote balanced cross-modal alignment, resulting in shortcut learning biased toward language. These findings highlight the need for balanced training strategies and model architectures to better integrate multiple modalities in MLLMs. We call for interdisciplinary efforts to tackle these challenges and drive innovation in MLLM research. Our work provides a fresh perspective on modality bias in MLLMs and offers insights for developing more robust and generalizable multimodal systems, advancing progress toward Artificial General Intelligence.
Submitted 24 May, 2025;
originally announced May 2025.
-
T2A-Feedback: Improving Basic Capabilities of Text-to-Audio Generation via Fine-grained AI Feedback
Authors:
Zehan Wang,
Ke Lei,
Chen Zhu,
Jiawei Huang,
Sashuai Zhou,
Luping Liu,
Xize Cheng,
Shengpeng Ji,
Zhenhui Ye,
Tao Jin,
Zhou Zhao
Abstract:
Text-to-audio (T2A) generation has achieved remarkable progress in generating a variety of audio outputs from language prompts. However, current state-of-the-art T2A models still struggle to satisfy human preferences for prompt-following and acoustic quality when generating complex multi-event audio. To improve the performance of the model in these high-level applications, we propose to enhance the basic capabilities of the model with AI feedback learning. First, we introduce fine-grained AI audio scoring pipelines to: 1) verify whether each event in the text prompt is present in the audio (Event Occurrence Score), 2) detect deviations in event sequences from the language description (Event Sequence Score), and 3) assess the overall acoustic and harmonic quality of the generated audio (Acoustic&Harmonic Quality). We evaluate these three automatic scoring pipelines and find that they correlate significantly better with human preferences than other evaluation metrics. This highlights their value as both feedback signals and evaluation metrics. Utilizing our robust scoring pipelines, we construct a large audio preference dataset, T2A-FeedBack, which contains 41k prompts and 249k audio clips, each accompanied by detailed scores. Moreover, we introduce T2A-EpicBench, a benchmark that focuses on long-caption, multi-event, and story-telling scenarios, aiming to evaluate the advanced capabilities of T2A models. Finally, we demonstrate how T2A-FeedBack can enhance a current state-of-the-art audio model. With simple preference tuning, the audio generation model exhibits significant improvements in both simple (AudioCaps test set) and complex (T2A-EpicBench) scenarios.
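As a hypothetical illustration of the first two scoring ideas (not the paper's actual pipelines, whose internals the abstract does not describe), suppose an audio event detector has already returned an onset time for every prompt event it found. Occurrence could then be scored as the fraction of prompt events detected, and sequence as the fraction of detected event pairs whose onsets follow the prompt's order. The detector, the event labels, and both function names are assumptions.

```python
from itertools import combinations
from typing import Dict, List, Optional


def event_occurrence_score(expected_events: List[str], onsets: Dict[str, float]) -> float:
    """Fraction of prompt events that the (hypothetical) detector found in the audio."""
    found = sum(1 for e in expected_events if e in onsets)
    return found / len(expected_events)


def event_sequence_score(expected_order: List[str], onsets: Dict[str, float]) -> Optional[float]:
    """Fraction of detected event pairs whose onset order matches the prompt order.

    expected_order -- event labels in the order the prompt describes them
    onsets         -- detected onset time (seconds) per event label; missing events are absent
    Returns None when fewer than two prompt events were detected.
    """
    detected = [e for e in expected_order if e in onsets]
    pairs = list(combinations(detected, 2))  # (a, b) with a before b in the prompt
    if not pairs:
        return None
    in_order = sum(1 for a, b in pairs if onsets[a] < onsets[b])
    return in_order / len(pairs)
```

A pairwise measure like this penalizes a single swapped event only partially, which may be preferable to an all-or-nothing exact-order check for long multi-event prompts.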
Submitted 15 May, 2025;
originally announced May 2025.
-
Seed1.5-VL Technical Report
Authors:
Dong Guo,
Faming Wu,
Feida Zhu,
Fuxing Leng,
Guang Shi,
Haobin Chen,
Haoqi Fan,
Jian Wang,
Jianyu Jiang,
Jiawei Wang,
Jingji Chen,
Jingjia Huang,
Kang Lei,
Liping Yuan,
Lishu Luo,
Pengfei Liu,
Qinghao Ye,
Rui Qian,
Shen Yan,
Shixiong Zhao,
Shuai Peng,
Shuangye Li,
Sihang Yuan,
Sijin Wu,
Tianheng Cheng
, et al. (172 additional authors not shown)
Abstract:
We present Seed1.5-VL, a vision-language foundation model designed to advance general-purpose multimodal understanding and reasoning. Seed1.5-VL is composed of a 532M-parameter vision encoder and a Mixture-of-Experts (MoE) LLM with 20B active parameters. Despite its relatively compact architecture, it delivers strong performance across a wide spectrum of public VLM benchmarks and internal evaluation suites, achieving state-of-the-art performance on 38 out of 60 public benchmarks. Moreover, in agent-centric tasks such as GUI control and gameplay, Seed1.5-VL outperforms leading multimodal systems, including OpenAI CUA and Claude 3.7. Beyond visual and video understanding, it also demonstrates strong reasoning abilities, making it particularly effective for multimodal reasoning challenges such as visual puzzles. We believe these capabilities will empower broader applications across diverse tasks. In this report, we mainly provide a comprehensive review of our experiences in building Seed1.5-VL across model design, data construction, and training at various stages, hoping that this report can inspire further research. Seed1.5-VL is now accessible at https://www.volcengine.com/ (Volcano Engine Model ID: doubao-1-5-thinking-vision-pro-250428)
Submitted 11 May, 2025;
originally announced May 2025.
-
Benchmarking Multi-modal Semantic Segmentation under Sensor Failures: Missing and Noisy Modality Robustness
Authors:
Chenfei Liao,
Kaiyu Lei,
Xu Zheng,
Junha Moon,
Zhixiong Wang,
Yixuan Wang,
Danda Pani Paudel,
Luc Van Gool,
Xuming Hu
Abstract:
Multi-modal semantic segmentation (MMSS) addresses the limitations of single-modality data by integrating complementary information across modalities. Despite notable progress, a significant gap persists between research and real-world deployment due to variability and uncertainty in multi-modal data quality. Robustness has thus become essential for practical MMSS applications. However, the absence of standardized benchmarks for evaluating robustness hinders further advancement. To address this, we first survey existing MMSS literature and categorize representative methods to provide a structured overview. We then introduce a robustness benchmark that evaluates MMSS models under three scenarios: Entire-Missing Modality (EMM), Random-Missing Modality (RMM), and Noisy Modality (NM). From a probabilistic standpoint, we model modality failure under two conditions: (1) all damaged combinations are equally probable; (2) each modality fails independently following a Bernoulli distribution. Based on these, we propose four metrics, $mIoU^{Avg}_{EMM}$, $mIoU^{E}_{EMM}$, $mIoU^{Avg}_{RMM}$, and $mIoU^{E}_{RMM}$, to assess model robustness under EMM and RMM. This work provides the first dedicated benchmark for MMSS robustness, offering new insights and tools to advance the field. Source code is available at https://github.com/Chenfei-Liao/Multi-Modal-Semantic-Segmentation-Robustness-Benchmark.
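A minimal sketch of how the two probabilistic conditions could turn per-combination mIoU values into "Avg" and "E" style scores, under assumptions of our own: a damaged combination is taken to be any set of still-available modalities in which at least one, but not every, modality has failed; the per-combination mIoU values are supplied by the caller; and under the Bernoulli condition the probabilities are renormalised over the damaged combinations. The benchmark's exact definitions may differ (the linked repository is authoritative), and all function names are hypothetical.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Iterator, List

Combo = FrozenSet[str]  # the set of modalities that still work


def damaged_combinations(modalities: List[str]) -> Iterator[Combo]:
    """Yield every available-modality set with at least one, but not all, modalities failed."""
    n = len(modalities)
    for k in range(1, n):  # k = number of modalities still working
        for subset in combinations(modalities, k):
            yield frozenset(subset)


def miou_avg(miou: Dict[Combo, float], modalities: List[str]) -> float:
    """Condition (1): every damaged combination is equally probable."""
    combos = list(damaged_combinations(modalities))
    return sum(miou[c] for c in combos) / len(combos)


def miou_expected(miou: Dict[Combo, float], modalities: List[str], p_fail: float) -> float:
    """Condition (2): each modality fails independently with probability p_fail.

    Probabilities are renormalised over the damaged combinations (an assumption of this sketch).
    """
    n = len(modalities)
    weighted_sum = total_weight = 0.0
    for combo in damaged_combinations(modalities):
        n_failed = n - len(combo)
        weight = p_fail ** n_failed * (1.0 - p_fail) ** len(combo)
        weighted_sum += weight * miou[combo]
        total_weight += weight
    return weighted_sum / total_weight
```

With a small failure probability, combinations that lose only one modality dominate the expectation, so the E-style score rewards models that degrade gracefully under mild damage, while the Avg-style score weights severe and mild damage equally.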
Submitted 10 April, 2025; v1 submitted 24 March, 2025;
originally announced March 2025.
-
Pseudorapidity density distributions of charged particles and transverse momentum spectra of identified particles in pp collisions in PACIAE 4.0 model
Authors:
Z. Xie,
A. K. Lei,
H. Zheng,
W. C. Zhang,
D. M. Zhou,
Z. L. She,
Y. L. Yan,
B. H. Sa
Abstract:
The pseudorapidity density distributions of charged particles and the transverse momentum spectra of identified particles in proton-proton (pp) collisions at center-of-mass energies ranging from $\sqrt{s}=200$ GeV to 13 TeV have been systematically studied using the newly released parton and hadron cascade model PACIAE 4.0, based on PYTHIA 8.3. The available experimental data are well reproduced across all analyzed aspects. This theoretical method can be readily extended to cases where experimental data for pp collisions are currently unavailable. Furthermore, since pp collisions serve as the baseline for heavy-ion collisions, our results can provide a valuable resource for both experimentalists and theorists.
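For readers outside high-energy physics, the first observable can be sketched briefly. Pseudorapidity is $\eta = -\ln\tan(\theta/2)$, equivalently $\eta = \operatorname{asinh}(p_z/p_T)$, and $dN/d\eta$ is the per-event number of particles per unit $\eta$. The sketch below histograms generator output that is assumed to be already filtered to charged particles; it is an illustrative analysis step with hypothetical function names, not part of PACIAE.

```python
import math
from typing import List, Sequence, Tuple

Momentum = Tuple[float, float, float]  # (px, py, pz)


def pseudorapidity(px: float, py: float, pz: float) -> float:
    """eta = -ln tan(theta/2) = asinh(pz / pT); undefined for tracks along the beam axis."""
    pt = math.hypot(px, py)
    return math.asinh(pz / pt)


def dn_deta(events: Sequence[Sequence[Momentum]], eta_min: float, eta_max: float, n_bins: int) -> List[float]:
    """Per-event pseudorapidity density dN/deta in equal-width bins.

    Each event is a list of momenta of (already selected) charged particles.
    """
    width = (eta_max - eta_min) / n_bins
    counts = [0] * n_bins
    for particles in events:
        for px, py, pz in particles:
            if px == 0.0 and py == 0.0:
                continue  # pT = 0: eta is infinite, skip the track
            eta = pseudorapidity(px, py, pz)
            if eta_min <= eta < eta_max:
                # min() guards against floating-point rounding at the upper edge
                counts[min(int((eta - eta_min) / width), n_bins - 1)] += 1
    # Normalise by the number of events and the bin width to get a density.
    return [c / (len(events) * width) for c in counts]
```

Using `asinh(pz / pT)` rather than `atanh(pz / |p|)` avoids a domain error for very forward tracks where `pz / |p|` rounds to 1.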
Submitted 9 March, 2025;
originally announced March 2025.
-
On-Chip Vectorial Structured Light Manipulation via Inverse Design
Authors:
Xiaobin Lin,
Maoliang Wei,
Kunhao Lei,
Zijia Wang,
Chi Wang,
Hui Ma,
Yuting Ye,
Qiwei Zhan,
Da Li,
Shixun Dai,
Baile Zhang,
Xiaoyong Hu,
Lan Li,
Erping Li,
Hongtao Lin
Abstract:
On-chip structured light, with potentially infinite complexity, has emerged as a linchpin in the realm of integrated photonics. However, arbitrarily tailoring a multitude of light field dimensions in complex media remains a challenge. By associating physical light fields with mathematical function spaces through a mapping operator, we propose a data-driven inverse design method to precisely convert between any two structured light fields in the on-chip high-dimensional Hilbert space. As an illustration, light field conversion in on-chip topological photonics was achieved. High-performance topological coupling devices with minimal insertion loss and customizable topological routing devices were designed and realized. Our method provides a new paradigm for precise manipulation of on-chip vectorial structured light and paves the way for the realization of complex photonic functions.
Submitted 28 May, 2024;
originally announced May 2024.
-
Field-free switching of perpendicular magnetization by cooperation of planar Hall and orbital Hall effects
Authors:
Zelalem Abebe Bekele,
Yuan-Yuan Jiang,
Kun Lei,
Xiukai Lan,
Xiangyu Liu,
Hui Wen,
Ding-Fu Shao,
Kaiyou Wang
Abstract:
Spin-orbit torques (SOTs) generated through the conventional spin Hall effect and/or Rashba-Edelstein effect are promising for manipulating magnetization. However, this approach typically exhibits non-deterministic and inefficient behaviour when switching perpendicular ferromagnets. This limitation poses a challenge for write operations in high-density magnetic memory devices. Here, we present an effective solution to this challenge that simultaneously leverages a planar Hall effect (PHE) and an orbital Hall effect (OHE). Using a representative Co/PtGd/Mo trilayer SOT device, we demonstrate that the PHE of Co is enhanced by the interfacial coupling of Co/PtGd, giving rise to a finite out-of-plane damping-like torque within the Co layer. Simultaneously, the OHE in the Mo layer induces a strong out-of-plane orbital current, significantly amplifying the in-plane damping-like torque through orbital-to-spin conversion. While neither the PHE nor the OHE alone is sufficient to reverse the perpendicular magnetization of Co, their combined action enables high-efficiency, field-free deterministic switching. Our work provides a straightforward strategy for realizing high-speed, low-power spintronics.
Submitted 20 April, 2024;
originally announced April 2024.
-
Learning Visual Quadrupedal Loco-Manipulation from Demonstrations
Authors:
Zhengmao He,
Kun Lei,
Yanjie Ze,
Koushil Sreenath,
Zhongyu Li,
Huazhe Xu
Abstract:
Quadruped robots are progressively being integrated into human environments. Despite the growing locomotion capabilities of quadrupedal robots, their interaction with objects in realistic scenes is still limited. While additional robotic arms on quadrupedal robots enable manipulating objects, they are sometimes redundant given that a quadruped robot is essentially a mobile unit equipped with four limbs, each possessing 3 degrees of freedom (DoFs). Hence, we aim to empower a quadruped robot to execute real-world manipulation tasks using only its legs. We decompose the loco-manipulation process into a low-level reinforcement learning (RL)-based controller and a high-level Behavior Cloning (BC)-based planner. By parameterizing the manipulation trajectory, we synchronize the efforts of the upper and lower layers, thereby leveraging the advantages of both RL and BC. Our approach is validated through simulations and real-world experiments, demonstrating the robot's ability to perform tasks that demand mobility and high precision, such as lifting a basket from the ground while moving, closing a dishwasher, pressing a button, and pushing a door. Project website: https://zhengmaohe.github.io/leg-manip
Submitted 2 August, 2024; v1 submitted 29 March, 2024;
originally announced March 2024.
-
ConSmax: Hardware-Friendly Alternative Softmax with Learnable Parameters
Authors:
Shiwei Liu,
Guanchen Tao,
Yifei Zou,
Derek Chow,
Zichen Fan,
Kauna Lei,
Bangfei Pan,
Dennis Sylvester,
Gregory Kielian,
Mehdi Saligane
Abstract:
The self-attention mechanism sets transformer-based large language models (LLMs) apart from convolutional and recurrent neural networks. Despite the performance improvement, achieving real-time LLM inference on silicon remains challenging due to the extensive use of Softmax in self-attention. In addition to the non-linearity, the low arithmetic intensity significantly limits processing parallelism, especially when working with longer contexts. To address this challenge, we propose Constant Softmax (ConSmax), a software-hardware co-design that serves as an efficient alternative to Softmax. ConSmax uses differentiable normalization parameters to eliminate the need for maximum searching and denominator summation in Softmax. This approach enables extensive parallelization while still performing the essential functions of Softmax. Moreover, a scalable ConSmax hardware design with a bitwidth-split look-up table (LUT) can achieve lossless non-linear operations and support mixed-precision computing. Experimental results show that ConSmax achieves a minuscule power consumption of 0.2 mW and an area of 0.0008 mm^2 at a 1250 MHz working frequency in 16nm FinFET technology. As an open-source contribution, we further implement our design with the OpenROAD toolchain under SkyWater's 130nm CMOS technology; the corresponding power is 2.69 mW and the area is 0.007 mm^2. ConSmax achieves 3.35x power savings and 2.75x area savings in 16nm technology, and 3.15x power savings and 4.14x area savings with the open-source EDA toolchain. Meanwhile, it maintains comparable accuracy on the GPT-2 model and the WikiText103 dataset. The project is available at https://github.com/ReaLLMASIC/ConSmax
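The core idea can be sketched in a few lines, assuming the form $\mathrm{ConSmax}(s_i) = e^{s_i - \beta}/\gamma$, in which a scalar $\beta$ takes the place of the row maximum and a scalar $\gamma$ takes the place of the row sum. How $\beta$ and $\gamma$ are learned, and the LUT-based hardware, are not modeled here. Softmax needs two reductions over the whole score row before any output is ready, whereas this form is purely element-wise.

```python
import math
from typing import List


def softmax(scores: List[float]) -> List[float]:
    m = max(scores)                           # maximum search: a reduction over the row
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)                             # denominator summation: another reduction
    return [e / z for e in exps]


def consmax(scores: List[float], beta: float, gamma: float) -> List[float]:
    """Sketch: beta stands in for the max and gamma for the denominator.

    Both are scalars that would be learned during training (not shown), so each
    output depends only on its own score and can be computed independently.
    """
    return [math.exp(s - beta) / gamma for s in scores]
```

With $\beta$ and $\gamma$ set to the exact row maximum and sum, the two functions coincide; with learned constants the outputs need not sum to one, which is the trade-off accepted in exchange for removing the reductions.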
Submitted 14 November, 2024; v1 submitted 31 January, 2024;
originally announced February 2024.
-
Uni-O4: Unifying Online and Offline Deep Reinforcement Learning with Multi-Step On-Policy Optimization
Authors:
Kun Lei,
Zhengmao He,
Chenhao Lu,
Kaizhe Hu,
Yang Gao,
Huazhe Xu
Abstract:
Combining offline and online reinforcement learning (RL) is crucial for efficient and safe learning. However, previous approaches treat offline and online learning as separate procedures, resulting in redundant designs and limited performance. We ask: Can we achieve straightforward yet effective offline and online learning without introducing extra conservatism or regularization? In this study, we propose Uni-o4, which utilizes an on-policy objective for both offline and online learning. Owing to the alignment of objectives in the two phases, the RL agent can transfer between offline and online learning seamlessly. This property enhances the flexibility of the learning paradigm, allowing for arbitrary combinations of pretraining, fine-tuning, offline, and online learning. In the offline phase, specifically, Uni-o4 leverages diverse ensemble policies to address the mismatch issues between the estimated behavior policy and the offline dataset. Through a simple offline policy evaluation (OPE) approach, Uni-o4 can achieve multi-step policy improvement safely. We demonstrate that by employing the method above, the fusion of these two paradigms can yield superior offline initialization as well as stable and rapid online fine-tuning capabilities. Through real-world robot tasks, we highlight the benefits of this paradigm for rapid deployment in challenging, previously unseen real-world environments. Additionally, through comprehensive evaluations using numerous simulated benchmarks, we substantiate that our method achieves state-of-the-art performance in both offline and offline-to-online fine-tuning learning. Our website: https://lei-kun.github.io/uni-o4/ .
Submitted 17 March, 2024; v1 submitted 6 November, 2023;
originally announced November 2023.
-
"Zero change" platform for monolithic back-end-of-line integration of phase change materials in silicon photonics
Authors:
Maoliang Wei,
Kai Xu,
Bo Tang,
Junying Li,
Yiting Yun,
Peng Zhang,
Yingchun Wu,
Kangjian Bao,
Kunhao Lei,
Zequn Chen,
Hui Ma,
Chunlei Sun,
Ruonan Liu,
Ming Li,
Lan Li,
Hongtao Lin
Abstract:
Monolithic integration of novel materials for unprecedented device functions without modifying the existing photonic component library is key to advancing heterogeneous silicon photonic integrated circuits. To achieve this, a silicon nitride etch-stop layer introduced in selected areas, coupled with a low-loss oxide trench down to the waveguide surface, enables the incorporation of various functional materials without compromising the reliability of foundry-verified devices. As an illustration, two distinct chalcogenide phase change materials (PCMs) with remarkable nonvolatile modulation capabilities, namely Sb2Se3 and Ge2Sb2Se4Te1, were monolithically integrated into silicon photonics at the back end of line. The PCMs enable compact phase- and intensity-tuning units with zero static power consumption. Taking advantage of these building blocks, the phase error of a push-pull Mach-Zehnder interferometer optical switch could be trimmed by a nonvolatile phase shifter, with a 48% reduction in peak power consumption. Micro-ring filters with a rejection ratio >25 dB could be applied for >5-bit wavelength-selective intensity modulation, and waveguide-based photonic attenuators with >7-bit intensity modulation could achieve >39 dB broadband attenuation. The advanced "Zero change" back-end-of-line integration platform could not only facilitate the integration of PCMs for integrated reconfigurable photonics but also open up possibilities for integrating other excellent optoelectronic materials in future silicon photonic process design kits.
Submitted 29 August, 2023;
originally announced August 2023.
-
WSTac: Interactive Surface Perception based on Whisker-Inspired and Self-Illuminated Vision-Based Tactile Sensor
Authors:
Kai Chong Lei,
Kit Wa Sou,
Wang Sing Chan,
Jiayi Yan,
Siqi Ping,
Dengfeng Peng,
Wenbo Ding,
Xiao-Ping Zhang
Abstract:
Modern Visual-Based Tactile Sensors (VBTSs) use cost-effective cameras to track elastomer deformation, but struggle with ambient light interference. Solutions typically involve using internal LEDs and blocking external light, thus adding complexity. Creating a VBTS resistant to ambient light with just a camera and an elastomer remains a challenge. In this work, we introduce WSTac, a self-illuminating VBTS comprising a mechanoluminescence (ML) whisker elastomer, a camera, and 3D-printed parts. The ML whisker elastomer, inspired by the touch sensitivity of vibrissae, offers both light isolation and high ML intensity under stress, thereby removing the need for additional LED modules. With the incorporation of machine learning, the sensor effectively utilizes the dynamic contact variations of 25 whiskers to successfully perform tasks like speed regression, directional identification, and texture classification. Videos are available at: https://sites.google.com/view/wstac/.
Submitted 25 August, 2023;
originally announced August 2023.
-
HVDetFusion: A Simple and Robust Camera-Radar Fusion Framework
Authors:
Kai Lei,
Zhan Chen,
Shuman Jia,
Xiaoteng Zhang
Abstract:
In the field of autonomous driving, 3D object detection is a very important perception module. Although current SOTA algorithms combine camera and Lidar sensors, the high price of Lidar means that current mainstream deployment schemes use camera-only or camera+radar sensors. In this study, we propose a new multi-modal detection algorithm called HVDetFusion, which not only supports pure camera data as input for detection but can also fuse radar and camera data as input. The camera stream does not depend on radar input, thus addressing a downside of previous methods. In the pure camera stream, we modify the BEVDet4D framework for better perception and more efficient inference, and this stream produces the complete 3D detection output. Further, to incorporate the benefits of radar signals, we use prior information about the positions of different objects to filter false positives from the raw radar data, and then use the position and radial velocity information recorded by the radar sensors to supplement and fuse with the BEV features generated from the camera data; fusion training further improves the results. Finally, HVDetFusion achieves a new state-of-the-art 67.4% NDS on the challenging nuScenes test set among all camera-radar 3D object detectors. The code is available at https://github.com/HVXLab/HVDetFusion
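The radar false-positive filtering step can be pictured with a simple gating sketch. It is hypothetical throughout: it assumes radar returns and camera-predicted object centers are already expressed in the same bird's-eye-view frame, and that each predicted object carries its own gate radius (for example, larger for vehicles than for pedestrians). HVDetFusion's actual filtering rule and its fusion with BEV features are not reproduced here.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class RadarPoint:
    x: float         # BEV position (m) in the ego frame
    y: float
    v_radial: float  # measured radial velocity (m/s), kept for later fusion


def filter_radar_points(points: List[RadarPoint],
                        object_priors: List[Tuple[float, float, float]]) -> List[RadarPoint]:
    """Keep radar points that fall near at least one camera-predicted object.

    object_priors -- (center_x, center_y, gate_radius) per predicted object (hypothetical format).
    Points outside every gate are treated as likely false positives and dropped.
    """
    return [p for p in points
            if any(math.hypot(p.x - cx, p.y - cy) <= r for cx, cy, r in object_priors)]
```

Because the gates come from the camera stream, a point survives only if the camera already "expects" an object nearby; the surviving points' positions and radial velocities are what would then be fused into the BEV features.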
Submitted 20 July, 2023;
originally announced July 2023.
-
Behavior Proximal Policy Optimization
Authors:
Zifeng Zhuang,
Kun Lei,
Jinxin Liu,
Donglin Wang,
Yilang Guo
Abstract:
Offline reinforcement learning (RL) is a challenging setting where existing off-policy actor-critic methods perform poorly due to the overestimation of out-of-distribution state-action pairs. Thus, various additional augmentations have been proposed to keep the learned policy close to the offline dataset (or the behavior policy). In this work, starting from an analysis of offline monotonic policy improvement, we make the surprising finding that some online on-policy algorithms are naturally able to solve offline RL. Specifically, the inherent conservatism of these on-policy algorithms is exactly what offline RL methods need to overcome overestimation. Based on this, we propose Behavior Proximal Policy Optimization (BPPO), which solves offline RL without introducing any extra constraint or regularization compared to PPO. Extensive experiments on the D4RL benchmark indicate that this extremely succinct method outperforms state-of-the-art offline RL algorithms. Our implementation is available at https://github.com/Dragon-Zhuang/BPPO.
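The conservatism of PPO comes from its clipped surrogate objective, $\min(rA,\ \operatorname{clip}(r, 1-\epsilon, 1+\epsilon)A)$, where $r$ is the probability ratio between the new and the reference policy and $A$ is the advantage. The sketch below evaluates that objective on dataset samples. Treating the reference probabilities as coming from a behavior policy estimated from the offline data is an assumption about how it is applied offline; advantage estimation and the gradient update are not shown, and the function names are hypothetical.

```python
from typing import Sequence


def clipped_surrogate(ratio: float, advantage: float, epsilon: float = 0.2) -> float:
    """PPO's per-sample clipped objective: min(r * A, clip(r, 1 - eps, 1 + eps) * A)."""
    clipped_ratio = min(max(ratio, 1.0 - epsilon), 1.0 + epsilon)
    return min(ratio * advantage, clipped_ratio * advantage)


def surrogate_objective(new_probs: Sequence[float], ref_probs: Sequence[float],
                        advantages: Sequence[float], epsilon: float = 0.2) -> float:
    """Mean clipped objective over dataset samples (to be maximised).

    ref_probs are the reference policy's probabilities of the dataset actions; in the
    offline setting this sketch assumes they come from an estimated behavior policy.
    """
    terms = [clipped_surrogate(n / r, a, epsilon)
             for n, r, a in zip(new_probs, ref_probs, advantages)]
    return sum(terms) / len(terms)
```

Taking the minimum means the objective gains nothing from pushing the ratio beyond $1 \pm \epsilon$ in the direction the advantage favors, so the updated policy stays close to the reference policy; this is the built-in conservatism the abstract relies on instead of an extra regularizer.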
Submitted 22 February, 2023;
originally announced February 2023.
-
Automated MRI Field of View Prescription from Region of Interest Prediction by Intra-stack Attention Neural Network
Authors:
Ke Lei,
Ali B. Syed,
Xucheng Zhu,
John M. Pauly,
Shreyas S. Vasanawala
Abstract:
Manual prescription of the field of view (FOV) by MRI technologists is variable and prolongs the scanning process. Often, the FOV is too large or crops critical anatomy. We propose a deep-learning framework, trained with radiologists' supervision, for automating FOV prescription. An intra-stack shared feature extraction network and an attention network are used to process a stack of 2D image inputs to generate output scalars defining the location of a rectangular region of interest (ROI). The attention mechanism makes the model focus on the small number of informative slices in a stack. Then the smallest FOV that keeps the network-predicted ROI free of aliasing is calculated by an algebraic operation derived from MR sampling theory. We retrospectively collected 595 cases between February 2018 and February 2022. The framework's performance is examined quantitatively with intersection over union (IoU) and pixel error on position, and qualitatively with a reader study. We use the t-test to compare quantitative results from all models and a radiologist. The proposed model achieves an average IoU of 0.867 and an average ROI position error of 9.06 out of 512 pixels on 80 test cases, significantly better (P<0.05) than two baseline models and not significantly different from a radiologist (P>0.12). Finally, the FOV given by the proposed framework achieves an acceptance rate of 92% from an experienced radiologist.
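The algebraic step can be sketched in one dimension along the phase-encode direction, under simplifying assumptions that are ours and not necessarily the paper's: the object's extent $[o_{\min}, o_{\max}]$ is known (for example, from a localizer), the ROI $[r_{\min}, r_{\max}]$ lies inside it, and with a FOV of width $F$ the signal from position $y$ also appears at $y \pm kF$ for nonzero integer $k$. No object signal folds into the ROI when $r_{\min} + F \ge o_{\max}$ and $r_{\max} - F \le o_{\min}$, which gives the boundary value $F_{\min} = \max(o_{\max} - r_{\min},\ r_{\max} - o_{\min})$. The function name is hypothetical.

```python
from typing import Tuple

Interval = Tuple[float, float]  # (min, max) position along the phase-encode axis


def min_alias_free_fov(object_extent: Interval, roi: Interval) -> float:
    """Smallest FOV for which no folded object signal lands inside the ROI (1D sketch).

    Signal at position y also appears at y + k*F for nonzero integer k. The nearest
    folded copies are shifted by +/-F (larger |k| only moves copies farther away), so
    the ROI stays clean when roi_min + F >= obj_max and roi_max - F <= obj_min.
    """
    obj_min, obj_max = object_extent
    roi_min, roi_max = roi
    if not (obj_min <= roi_min <= roi_max <= obj_max):
        raise ValueError("ROI must lie inside the object extent")
    # This value is always >= roi_max - roi_min, so the ROI itself also fits in the FOV.
    return max(obj_max - roi_min, roi_max - obj_min)
```

When the ROI is the whole object, the result reduces to the object's full extent, the familiar no-aliasing condition; a small ROI near the center allows a FOV well below the object's size because the folded copies land outside the ROI.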
Submitted 9 November, 2022;
originally announced November 2022.
-
Fault diagnosis for three-phase PWM rectifier based on deep feedforward network with transient synthetic features
Authors:
Kou Lei,
Liu Chuang,
Cai Guo-Wei,
Zhang Zhe,
Zhou Jia-Ning,
Wang Xue-Mei
Abstract:
Three-phase PWM rectifiers are adopted extensively in industry because of their excellent properties and potential advantages. However, when an IGBT has an open-circuit fault, the system does not crash suddenly, but its performance degrades, for instance through voltage fluctuations and current harmonics. To reduce the dependence on mathematical fault models, this paper proposes a fault diagnosis method based on a deep feedforward network with transient synthetic features, which mainly uses the transient phase current to train the deep feedforward network classifier. First, the features of the fault phase current are analyzed. Second, historical fault data after feature synthesis are used to train the deep feedforward network classifier; the average fault diagnosis accuracy reaches 97.85% on transient synthetic fault data, and the classifier trained on transient synthetic features gains more than 1% in performance compared with one trained on the original transient features. Finally, online fault diagnosis experiments show that the method can accurately locate the faulty IGBTs, and because the final diagnosis result is determined from multiple groups of results, the accuracy and reliability of the diagnosis are increased. (c) 2020 ISA. Published by Elsevier Ltd. All rights reserved.
Submitted 31 October, 2022;
originally announced November 2022.
-
Manipulating electron waves in graphene using carbon nanotube gating
Authors:
Shiang-Bin Chiu,
Alina Mreńca-Kolasińska,
Ka Long Lei,
Ching-Hung Chiu,
Wun-Hao Kang,
Szu-Chao Chen,
Ming-Hao Liu
Abstract:
Graphene, with its dispersion relation resembling that of photons, offers ample opportunities for applications in electron optics. The spatial variation of carrier density induced by external gates can be used to create electron waveguides, in analogy to optical fibers, with additional confinement of the carriers in bipolar junctions leading to the formation of a few transverse guiding modes. We show that waveguides created by gating graphene with carbon nanotubes (CNTs) yield sharp conductance plateaus, and we propose applications in Aharonov-Bohm and two-path interferometers and as a pointlike source for injecting carriers into graphene. These applications can be extended to Bernal-stacked or twisted bilayer graphene or to a two-dimensional electron gas. Thanks to their versatility, CNT-induced waveguides open various possibilities for electron manipulation in graphene-based devices.
Submitted 17 May, 2022; v1 submitted 2 March, 2022;
originally announced March 2022.
-
Optimal Layout Plan of Stands at the Macao Food Festival via Minimizing the Electrostatic Potential Energy with the Effective Charge as Popularity of Stands
Authors:
Ka Ian Im,
In Kio Choi,
Pak Kio Lei,
Hou Fai Chan,
U In Ian,
Wei Shan Lee
Abstract:
We propose a mathematical model for designing the layout of stand locations at the Macao Food Festival. The optimal layout may be defined so that no pair of stands is too far apart while crowds are still well managed, allowing people to patronize stands more effectively. More popular stands may have larger patronage, resulting in higher pedestrian flow nearby. Therefore, to keep customers from packing shoulder to shoulder around more popular stands, we may treat every stand as a charged particle carrying an effective charge: the more popular a stand is, the higher the effective charge it carries. Under this assumption, the problem becomes the minimization of the Coulomb electrostatic potential energy over configurations of charge locations, whose global minimum may be found with simulated annealing and the Metropolis algorithm. The electrostatic energy density is interpreted as the customer density, and the electric field as the reversed crowd flow. Therefore, at a given location we can predict the customer density by calculating the energy density, and the net crowd flow from the electric field lines. We also conclude that although the computation time required to reach a configuration whose energy is within a tolerable difference of the global minimum may be independent of the randomly generated initial configuration, setting up an appropriate initial configuration could be one of the key issues in finding the actual global minimum.
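The optimization described above can be sketched as simulated annealing with Metropolis acceptance over assignments of stands to sites. This is a minimal illustration under several assumptions not stated in the abstract: stands may only occupy points of a hypothetical finite grid (which is also what keeps them from drifting too far apart), the Coulomb constant is set to 1, a move relocates one stand to a free site, and cooling is geometric. It is not the authors' implementation.

```python
import math
import random

def coulomb_energy(positions, charges):
    """Total energy sum_{i<j} q_i * q_j / r_ij, with the Coulomb constant set to 1."""
    energy = 0.0
    for i in range(len(positions)):
        for j in range(i + 1, len(positions)):
            energy += charges[i] * charges[j] / math.dist(positions[i], positions[j])
    return energy

def anneal_layout(charges, sites, t0=10.0, cooling=0.995, steps=5000, seed=0):
    """Simulated annealing with Metropolis acceptance over stand-to-site assignments.

    charges: hypothetical popularity-based effective charge of each stand.
    sites: distinct candidate (x, y) locations; there must be more sites than stands.
    Returns (best_energy, best_positions).
    """
    rng = random.Random(seed)
    positions = rng.sample(sites, len(charges))  # random initial layout
    energy = coulomb_energy(positions, charges)
    best = (energy, list(positions))
    temperature = t0
    for _ in range(steps):
        # Propose moving one randomly chosen stand to a currently free site.
        i = rng.randrange(len(charges))
        free = [s for s in sites if s not in positions]
        candidate = list(positions)
        candidate[i] = rng.choice(free)
        new_energy = coulomb_energy(candidate, charges)
        delta = new_energy - energy
        # Metropolis criterion: accept downhill moves, uphill ones with probability exp(-delta/T).
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            positions, energy = candidate, new_energy
            if energy < best[0]:
                best = (energy, list(positions))
        temperature *= cooling  # geometric cooling schedule
    return best

# Hypothetical 5 x 5 grid of candidate sites and six stands; a larger charge means more popular.
sites = [(x, y) for x in range(5) for y in range(5)]
charges = [3.0, 1.0, 1.0, 2.0, 1.0, 1.0]
best_energy, layout = anneal_layout(charges, sites)
print(round(best_energy, 3), layout)
```

Because repulsion grows with the product of charges, popular stands tend to be pushed apart and toward the edges of the allowed region. Consistent with the abstract's remark on initial configurations, running the sketch from several seeds and keeping the lowest energy is a simple way to reduce the risk of settling in a local minimum.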
Submitted 16 February, 2022; v1 submitted 17 January, 2022;
originally announced January 2022.
-
Atomistic View of Homogeneous Nucleation of Water into Polymorphic Ices
Authors:
Maodong Li,
Jun Zhang,
Niu Haiyang,
Yao Kun Lei,
Xu Han,
Lijiang Yang,
Zhiqiang Ye,
Yi Isaac Yang,
Yi Qin Gao
Abstract:
Water is one of the most abundant substances on Earth, and ice, i.e., solid water, has more than 18 known phases. Normally, ice in nature exists only as ice Ih, ice Ic, or a stacking-disordered mixture of both. Although many theoretical efforts have been devoted to understanding the thermodynamics of different ice phases at ambient temperature and pressure, many puzzles still remain. We simulated the reversible transitions between water and different ice phases by performing full-atom molecular dynamics simulations. Using the enhanced sampling method MetaITS with two selected X-ray diffraction peak intensities as collective variables, the ternary phase diagrams of liquid water, ice Ih, ice Ic at multiple were obtained. We also present a simple physical model that successfully explains the thermodynamic stability of ice. Our results agree with experiments and lead to a deeper understanding of the ice nucleation mechanism.
Submitted 23 November, 2021;
originally announced November 2021.
-
Artifact- and content-specific quality assessment for MRI with image rulers
Authors:
Ke Lei,
John M. Pauly,
Shreyas S. Vasanawala
Abstract:
In clinical practice, MR images are often first seen by radiologists long after the scan. If image quality is inadequate, either patients have to return for an additional scan, or a suboptimal interpretation is rendered. An automatic image quality assessment (IQA) would enable real-time remediation. Existing IQA methods for MRI give only a general quality score, agnostic to the cause of and solution to low-quality scans. Furthermore, radiologists' image quality requirements vary with the scan type and diagnostic task, so the same score may have different implications for different scans. We propose a framework with a multi-task CNN model that is trained with calibrated labels and interpreted at inference with image rulers. Labels calibrated by human inputs follow a well-defined and efficient labeling task. Image rulers address varying quality standards and provide a concrete way of interpreting raw scores from the CNN. The model supports assessment of two of the most common artifacts in MRI: noise and motion. It achieves accuracies of around 90%, 6% better than the best previous method examined, and 3% better than human experts on noise assessment. Our experiments show that label calibration, image rulers, and multi-task training improve the model's performance and generalizability.
Submitted 5 November, 2021;
originally announced November 2021.
-
Data-driven Smart Ponzi Scheme Detection
Authors:
Yuzhi Liang,
Weijing Wu,
Kai Lei,
Feiyang Wang
Abstract:
A smart Ponzi scheme is a new form of economic crime that uses Ethereum smart contract accounts and cryptocurrency to implement a Ponzi scheme. Smart Ponzi schemes have harmed the interests of many investors, but research on smart Ponzi scheme detection is still very limited. Existing smart Ponzi scheme detection methods require substantial human effort for feature engineering and have poor model portability. To solve these problems, we propose a data-driven smart Ponzi scheme detection system in this paper. The system uses dynamic graph embedding technology to automatically learn the representation of an account based on multi-source and multi-modal data related to account transactions. Compared with traditional methods, the proposed system requires very limited human-computer interaction. To the best of our knowledge, this is the first work to implement smart Ponzi scheme detection through dynamic graph embedding. Experimental results show that this method significantly outperforms existing smart Ponzi scheme detection methods.
Submitted 20 August, 2021;
originally announced August 2021.
-
Solve routing problems with a residual edge-graph attention neural network
Authors:
Kun Lei,
Peng Guo,
Yi Wang,
Xiao Wu,
Wenchao Zhao
Abstract:
For NP-hard combinatorial optimization problems, it is usually difficult to find high-quality solutions in polynomial time. Designing either an exact or an approximate algorithm for these problems often requires highly specialized knowledge. Recently, deep learning methods have opened new directions for solving such problems. In this paper, an end-to-end deep reinforcement learning framework is proposed to solve this type of combinatorial optimization problem. The framework can be applied to different problems with only slight changes to the input (for example, for a traveling salesman problem (TSP), the input is the two-dimensional coordinates of the nodes, while for a capacity-constrained vehicle routing problem (CVRP), the input is simply changed to three-dimensional vectors containing the two-dimensional coordinates and the customer demand of each node), the masks, and the decoder context vectors. The proposed framework aims to improve on models in the literature in terms of both the neural network model and the training algorithm. The solution quality for TSP and CVRP instances with up to 100 nodes is significantly improved by our framework. Specifically, with the greedy decoding strategy, the average optimality gap is reduced from 4.53% (best reported \cite{R22}) to 3.67% for TSP with 100 nodes, and from 7.34% (best reported \cite{R22}) to 6.68% for CVRP with 100 nodes. Furthermore, our framework uses about 1/3 to 3/4 as many training samples as other existing learning methods while achieving better results. Results on randomly generated instances and on benchmark instances from TSPLIB and CVRPLIB confirm that the framework's running time during testing is linear in the problem size (number of nodes), and that it generalizes well from training on random instances to testing on real-world instances.
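The role of masks in greedy decoding, and the optimality gap used to report results, can be sketched as follows. This is a hypothetical illustration, not the paper's model: the learned attention scores are replaced by a placeholder nearest-neighbor score, and the masking rules shown (customers must be unvisited and fit the remaining load; the depot cannot be chosen twice in a row while customers remain) are one common CVRP convention. The optimality gap is computed as the relative excess cost over the optimal or best-known solution.

```python
import math

def cvrp_mask(demands, visited, remaining, at_depot):
    """Feasibility mask over nodes; node 0 is the depot.

    Customers are feasible if unvisited and their demand fits the remaining load.
    The depot is masked while the vehicle sits at it and customers are still unserved.
    """
    mask = [False] + [(not visited[j]) and demands[j] <= remaining
                      for j in range(1, len(demands))]
    mask[0] = (not at_depot) or all(visited[1:])
    return mask

def greedy_decode(coords, demands, capacity, score_fn):
    """Greedy decoding: at each step, pick the highest-scoring feasible node."""
    n = len(coords)
    visited = [True] + [False] * (n - 1)  # the depot is never a customer to serve
    tour, current, remaining = [0], 0, capacity
    while not all(visited[1:]):
        mask = cvrp_mask(demands, visited, remaining, at_depot=(current == 0))
        scores = [score_fn(coords, current, j) if mask[j] else -math.inf for j in range(n)]
        nxt = max(range(n), key=scores.__getitem__)
        if nxt == 0:
            remaining = capacity  # refill at the depot
        else:
            visited[nxt] = True
            remaining -= demands[nxt]
        tour.append(nxt)
        current = nxt
    return tour + [0]

def nearest_neighbor_score(coords, i, j):
    """Hypothetical stand-in for the learned policy: prefer closer nodes."""
    return -math.dist(coords[i], coords[j])

def tour_length(coords, tour):
    return sum(math.dist(coords[a], coords[b]) for a, b in zip(tour, tour[1:]))

def optimality_gap(cost, best_known):
    """Relative excess cost over the optimal (or best-known) solution."""
    return (cost - best_known) / best_known

# Tiny hypothetical instance: depot at the origin, three unit-demand customers.
coords = [(0, 0), (1, 0), (2, 0), (0, 1)]
demands = [0, 1, 1, 1]
tour = greedy_decode(coords, demands, capacity=2, score_fn=nearest_neighbor_score)
print(tour, tour_length(coords, tour))
```

On this instance, the nearest-neighbor placeholder returns to the depot more often than necessary. A trained policy aims to score nodes better, and that improvement is what the reported drop in optimality gap (for example, 4.53% to 3.67% on TSP100) measures. For TSP, the same loop applies with the capacity check and depot returns removed.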
Submitted 6 May, 2021;
originally announced May 2021.
-
Relabel the Noise: Joint Extraction of Entities and Relations via Cooperative Multiagents
Authors:
Daoyuan Chen,
Yaliang Li,
Kai Lei,
Ying Shen
Abstract:
Distant-supervision-based methods for entity and relation extraction have become increasingly popular because they require little human annotation effort. In this paper, we consider the problem of "shifted label distribution", which is caused by the inconsistency between the noisily labeled training set derived from an external knowledge graph and the human-annotated test set, and which is exacerbated by pipelined entity-then-relation extraction, where noise propagates from one stage to the next. We propose a joint extraction approach that addresses this problem by re-labeling noisy instances with a group of cooperative multiagents. To handle noisy instances in a fine-grained manner, each agent in the cooperative group evaluates an instance by calculating a continuous confidence score from its own perspective. To leverage the correlations between the two extraction tasks, a confidence consensus module is designed to gather the wisdom of all agents and re-distribute the noisy training set with confidence-scored labels. Further, the confidences are used to adjust the training losses of the extractors. Experimental results on two real-world datasets verify the benefits of re-labeling noisy instances and show that the proposed model significantly outperforms state-of-the-art entity and relation extraction methods.
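The idea of using confidences to adjust training losses can be sketched as a confidence-weighted negative log-likelihood. The consensus rule here (a plain average of the agents' scores) and the multiplicative weighting are hypothetical simplifications chosen for illustration; the abstract does not specify the paper's consensus module or exact loss form.

```python
import math

def consensus_confidence(agent_scores):
    """Hypothetical consensus rule: average the agents' continuous confidence scores."""
    return sum(agent_scores) / len(agent_scores)

def weighted_nll(probs, label, confidence):
    """Negative log-likelihood of a (possibly noisy) label, scaled by its confidence."""
    return -confidence * math.log(probs[label])

# Two hypothetical noisy instances: extractor class probabilities, distant label, agent scores.
batch = [
    ([0.7, 0.2, 0.1], 0, [0.9, 0.8, 0.95]),  # agents agree the label is reliable
    ([0.1, 0.3, 0.6], 1, [0.2, 0.1, 0.3]),   # agents doubt the distant label
]
loss = sum(weighted_nll(p, y, consensus_confidence(s)) for p, y, s in batch) / len(batch)
print(round(loss, 4))
```

A low-confidence label still contributes to training, but its gradient is scaled down, which is softer than discarding the instance outright.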
Submitted 21 April, 2020;
originally announced April 2020.
-
Exploring and Distilling Cross-Modal Information for Image Captioning
Authors:
Fenglin Liu,
Xuancheng Ren,
Yuanxin Liu,
Kai Lei,
Xu Sun
Abstract:
Recently, attention-based encoder-decoder models have been used extensively in image captioning. Yet current methods still have great difficulty achieving deep image understanding. In this work, we argue that such understanding requires visual attention to correlated image regions and semantic attention to coherent attributes of interest. To perform effective attention, we build on the Transformer, explore image captioning from a cross-modal perspective, and propose the Global-and-Local Information Exploring-and-Distilling approach, which explores and distills the source information in vision and language. Globally, it provides the aspect vector, a spatial and relational representation of images based on caption contexts, through the extraction of salient region groupings and attribute collocations; locally, it extracts the fine-grained regions and attributes in reference to the aspect vector for word selection. Our Transformer-based model achieves a CIDEr score of 129.3 in offline COCO evaluation on the COCO test set, with remarkable efficiency in terms of accuracy, speed, and parameter budget.
Submitted 15 March, 2020; v1 submitted 28 February, 2020;
originally announced February 2020.
-
Understanding the Teaching Styles by an Attention based Multi-task Cross-media Dimensional modelling
Authors:
Suping Zhou,
Jia Jia,
Yufeng Yin,
Xiang Li,
Yang Yao,
Ying Zhang,
Zeyang Ye,
Kehua Lei,
Yan Huang,
Jialie Shen
Abstract:
Teaching style plays an influential role in helping students achieve academic success. In this paper, we explore a new problem: effectively understanding teachers' teaching styles. Specifically, we study 1) how to quantitatively characterize the teaching styles of various teachers and 2) how to model the subtle relationship between cross-media teaching-related data (speech, facial expressions and body motions, content, etc.) and teaching styles. Using adjectives selected from more than 10,000 feedback questionnaires provided by an educational enterprise, a novel concept called Teaching Style Semantic Space (TSSS) is developed based on the pleasure-arousal dimensional theory to describe teaching styles quantitatively and comprehensively. Then a multi-task deep learning model, the Attention-based Multi-path Multi-task Deep Neural Network (AMMDNN), is proposed to accurately and robustly capture the internal correlations between cross-media features and TSSS. Based on the benchmark dataset, we further develop a comprehensive dataset of 4,541 fully annotated cross-modality teaching classes. Our experimental results demonstrate that the proposed AMMDNN outperforms baseline methods by 0.0842 on average in terms of the concordance correlation coefficient (CCC). To further demonstrate the advantages of the proposed TSSS and our model, several case studies are carried out, such as comparing teaching styles among different teachers and courses, and applying the proposed method to teaching quality analysis.
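The CCC used for evaluation is Lin's concordance correlation coefficient, which rewards predictions that are both correlated with the targets and on the same scale: ρc = 2·cov(x, y) / (var(x) + var(y) + (mean(x) − mean(y))²). A minimal sketch with population (biased) moments follows. Averaging it over the pleasure and arousal dimensions, as in the usage lines, is an assumption about how "on average" is computed, and the numbers are hypothetical.

```python
def ccc(x, y):
    """Lin's concordance correlation coefficient with population (1/n) moments."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    var_x = sum((a - mean_x) ** 2 for a in x) / n
    var_y = sum((b - mean_y) ** 2 for b in y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    return 2 * cov / (var_x + var_y + (mean_x - mean_y) ** 2)

# Hypothetical predictions vs. annotations on two TSSS dimensions.
pleasure_true, pleasure_pred = [0.2, 0.5, 0.9, 0.4], [0.3, 0.5, 0.8, 0.4]
arousal_true, arousal_pred = [0.1, 0.7, 0.6, 0.3], [0.2, 0.6, 0.7, 0.2]
mean_ccc = (ccc(pleasure_true, pleasure_pred) + ccc(arousal_true, arousal_pred)) / 2
print(round(mean_ccc, 4))
```

Unlike Pearson correlation, CCC penalizes a constant offset: predicting [2, 3, 4] for targets [1, 2, 3] gives a Pearson correlation of 1 but a CCC of 4/7.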
Submitted 17 November, 2019;
originally announced November 2019.
-
Wasserstein GANs for MR Imaging: from Paired to Unpaired Training
Authors:
Ke Lei,
Morteza Mardani,
John M. Pauly,
Shreyas S. Vasanawala
Abstract:
Lack of ground-truth MR images impedes the common supervised training of neural networks for image reconstruction. To cope with this challenge, this paper leverages unpaired adversarial training for reconstruction networks, where the inputs are undersampled k-space data and naively reconstructed images from one dataset, and the labels are high-quality images from another dataset. The reconstruction networks consist of a generator, which suppresses the input image artifacts, and a discriminator, which uses a pool of (unpaired) labels to adjust the reconstruction quality. The generator is an unrolled neural network: a cascade of convolutional and data consistency layers. The discriminator is a multilayer CNN that plays the role of a critic, scoring the quality of reconstructed images based on the Wasserstein distance. Our experiments with knee MRI datasets demonstrate that the proposed unpaired training enables diagnostic-quality reconstruction when high-quality image labels are not available for the input types of interest, or when the number of labels is small. In addition, our adversarial training scheme can achieve better image quality (as rated by expert radiologists) than paired training schemes with a pixel-wise loss.
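Two ingredients named above can be sketched in a few lines of NumPy: a hard data-consistency step, which overwrites the network's k-space estimate with the measured samples, and the standard WGAN critic and generator objectives. This sketch assumes single-coil Cartesian sampling and an orthonormal 2D FFT. It omits the CNN blocks and any Lipschitz constraint on the critic (weight clipping or gradient penalty), and the paper's exact layer design may differ.

```python
import numpy as np

def data_consistency(image, kspace_measured, mask):
    """Hard data consistency: keep measured k-space samples, use the estimate elsewhere."""
    k_est = np.fft.fft2(image, norm="ortho")
    return np.fft.ifft2(np.where(mask, kspace_measured, k_est), norm="ortho")

def critic_loss(scores_real, scores_fake):
    """WGAN critic objective to minimize: E[D(fake)] - E[D(real)]."""
    return np.mean(scores_fake) - np.mean(scores_real)

def generator_adversarial_loss(scores_fake):
    """Generator side of the WGAN objective: -E[D(fake)]."""
    return -np.mean(scores_fake)

rng = np.random.default_rng(0)
truth = rng.standard_normal((8, 8))                 # stand-in for a fully sampled image
mask = rng.random((8, 8)) < 0.4                     # hypothetical random undersampling pattern
measured = np.where(mask, np.fft.fft2(truth, norm="ortho"), 0)
zero_filled = np.fft.ifft2(measured, norm="ortho")  # naive reconstruction (network input)
estimate = 0.5 * zero_filled                        # placeholder for a CNN block's output
refined = data_consistency(estimate, measured, mask)
```

In an unrolled generator, CNN blocks and data-consistency steps alternate, so each stage can remove artifacts without drifting away from the acquired measurements.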
Submitted 7 September, 2020; v1 submitted 15 October, 2019;
originally announced October 2019.
-
GCN-GAN: A Non-linear Temporal Link Prediction Model for Weighted Dynamic Networks
Authors:
Kai Lei,
Meng Qin,
Bo Bai,
Gong Zhang,
Min Yang
Abstract:
In this paper, we formulate the dynamics prediction problem of various network systems (e.g., the prediction of mobility, traffic, and topology) generally as a temporal link prediction task. Unlike conventional temporal link prediction techniques, which ignore the potential non-linear characteristics and the informative link weights of the dynamic network, we introduce a novel non-linear model, GCN-GAN, to tackle the challenging temporal link prediction task of weighted dynamic networks. The proposed model leverages the benefits of the graph convolutional network (GCN), long short-term memory (LSTM), and the generative adversarial network (GAN), so that the dynamics, topology structure, and evolutionary patterns of weighted dynamic networks can be fully exploited to improve temporal link prediction performance. Concretely, we first use GCN to explore the local topological characteristics of each single snapshot and then employ LSTM to characterize the evolving features of the dynamic network. Moreover, GAN is used to enhance the model's ability to generate the next weighted network snapshot, which effectively tackles the sparsity and wide value range of edge weights in real-life dynamic networks. To verify the model's effectiveness, we conduct extensive experiments on four datasets from different network systems and application scenarios. The experimental results demonstrate that our model achieves impressive results compared with state-of-the-art competitors.
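The first stage described above, applying a GCN to each weighted snapshot, can be sketched with the widely used symmetric normalization D^-1/2 (A + I) D^-1/2. The normalization choice, one-hot node features, and random weights are assumptions for illustration. The LSTM and GAN stages that consume these per-snapshot embeddings are omitted.

```python
import numpy as np

def normalized_adjacency(adj):
    """Symmetric normalization D^-1/2 (A + I) D^-1/2 of a non-negative weighted adjacency."""
    a_hat = adj + np.eye(adj.shape[0])  # self-loops keep every degree >= 1
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return d_inv_sqrt[:, None] * a_hat * d_inv_sqrt[None, :]

def gcn_layer(adj, features, weight):
    """One graph convolution: ReLU(A_norm @ H @ W)."""
    return np.maximum(0.0, normalized_adjacency(adj) @ features @ weight)

rng = np.random.default_rng(0)
# Three hypothetical weighted snapshots of a 4-node network (symmetric, no self-edges).
snapshots = [rng.random((4, 4)) for _ in range(3)]
snapshots = [(s + s.T) / 2 * (1 - np.eye(4)) for s in snapshots]
weight = rng.standard_normal((4, 2))
# One-hot node features; each snapshot yields a (4, 2) embedding matrix for the LSTM stage.
embeddings = [gcn_layer(a, np.eye(4), weight) for a in snapshots]
```

Because edge weights enter the normalized adjacency directly, heavier links contribute more to a node's aggregated features, which is how link weights inform each snapshot's embedding.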
Submitted 26 January, 2019;
originally announced January 2019.
-
Multi-Task Learning with Multi-View Attention for Answer Selection and Knowledge Base Question Answering
Authors:
Yang Deng,
Yuexiang Xie,
Yaliang Li,
Min Yang,
Nan Du,
Wei Fan,
Kai Lei,
Ying Shen
Abstract:
Answer selection and knowledge base question answering (KBQA) are two important tasks of question answering (QA) systems. Existing methods solve these two tasks separately, which requires a large amount of repetitive work and neglects the rich correlation information between the tasks. In this paper, we tackle answer selection and KBQA simultaneously via multi-task learning (MTL), for two reasons. First, both answer selection and KBQA can be regarded as ranking problems, one at the text level and the other at the knowledge level. Second, the two tasks can benefit each other: answer selection can incorporate external knowledge from the knowledge base (KB), while KBQA can be improved by learning contextual information from answer selection. To jointly learn these two tasks, we propose a novel multi-task learning scheme that uses multi-view attention learned from various perspectives to let the tasks interact with each other and to learn more comprehensive sentence representations. Experiments on several real-world datasets demonstrate the effectiveness of the proposed method, which improves the performance of both answer selection and KBQA. The multi-view attention scheme is also shown to be effective in assembling attentive information from different representational perspectives.
Submitted 6 December, 2018;
originally announced December 2018.
-
A Knowledge Graph Based Solution for Entity Discovery and Linking in Open-Domain Questions
Authors:
Kai Lei,
Bing Zhang,
Yong Liu,
Yang Deng,
Dongyu Zhang,
Ying Shen
Abstract:
Named entity discovery and linking is a fundamental, core component of question answering. In the Question Entity Discovery and Linking (QEDL) problem, traditional methods struggle because multiple entities in one short question are difficult to discover completely, and the incomplete information in short text makes entity linking hard to implement. To overcome these difficulties, we propose a knowledge graph based solution for QEDL and develop a system consisting of a Question Entity Discovery (QED) module and an Entity Linking (EL) module. The QED module is a trade-off ensemble of two methods. One is based on knowledge graph retrieval, which extracts more entities from questions and ensures high recall; the other is based on a Conditional Random Field (CRF), which improves precision. The EL module treats linking as a ranking problem and uses a Learning to Rank (LTR) method with features such as semantic similarity, text similarity, and entity popularity to extract and make full use of the information in short texts. On the official dataset of a shared QEDL evaluation task, our approach obtains a 64.44% F1 score for QED and 64.86% accuracy for EL, ranking 2nd, which indicates its practical use for the QEDL problem.
Submitted 5 December, 2018;
originally announced December 2018.
-
Improving Medical Short Text Classification with Semantic Expansion Using Word-Cluster Embedding
Authors:
Ying Shen,
Qiang Zhang,
Jin Zhang,
Jiyue Huang,
Yuming Lu,
Kai Lei
Abstract:
Automatic text classification (TC) can be applied to real-world problems such as classifying in-patient discharge summaries and medical text reports, which helps make medical documents more understandable to doctors. However, sentences in electronic medical records (EMRs) are shorter than those in the general domain, which leads to a lack of semantic features and to semantic ambiguity. To tackle this challenge, we propose adding word-cluster embedding to a deep neural network to improve short text classification. Concretely, we first use hierarchical agglomerative clustering to cluster the word vectors in the semantic space. Then we calculate the cluster center vector, which represents the implicit topic information of the words in the cluster. Finally, we expand each word vector with its cluster center vector and implement classifiers using CNN and LSTM, respectively. To evaluate the proposed method, we conduct experiments on the public TREC dataset and on medical short-sentence datasets that we constructed and released. The experimental results demonstrate that our method outperforms state-of-the-art baselines in short sentence classification in both the medical and general domains.
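The three steps above (cluster the word vectors, compute cluster centers, expand each word vector) can be sketched in plain Python. Average linkage, the fixed target number of clusters, and concatenation as the "expansion" operation are all assumptions for illustration, as are the toy 2-D vectors. The abstract does not state which linkage or combination the authors use.

```python
import math
from itertools import combinations

def average_linkage(a, b, vectors):
    """Mean pairwise Euclidean distance between two clusters of vector indices."""
    return sum(math.dist(vectors[i], vectors[j]) for i in a for j in b) / (len(a) * len(b))

def agglomerative_clusters(vectors, n_clusters):
    """Bottom-up clustering: repeatedly merge the closest pair until n_clusters remain."""
    clusters = [[i] for i in range(len(vectors))]
    while len(clusters) > n_clusters:
        p, q = min(combinations(range(len(clusters)), 2),
                   key=lambda pq: average_linkage(clusters[pq[0]], clusters[pq[1]], vectors))
        clusters[p].extend(clusters[q])
        del clusters[q]  # q > p, so index p is unaffected
    return clusters

def centroid(members, vectors):
    dim = len(vectors[members[0]])
    return [sum(vectors[i][d] for i in members) / len(members) for d in range(dim)]

def expand_with_cluster_centers(vectors, n_clusters):
    """Concatenate each word vector with the center of its cluster (hypothetical choice)."""
    expanded = [None] * len(vectors)
    for members in agglomerative_clusters(vectors, n_clusters):
        center = centroid(members, vectors)
        for i in members:
            expanded[i] = list(vectors[i]) + center
    return expanded

words = ["fever", "cough", "headache", "aspirin", "ibuprofen"]
vectors = [(0.9, 0.1), (0.8, 0.2), (0.85, 0.15), (0.1, 0.9), (0.2, 0.8)]  # toy embeddings
expanded = expand_with_cluster_centers(vectors, n_clusters=2)
```

Words in the same cluster share the same appended center, so a short sentence containing "aspirin" also carries a signal for the broader drug topic. This shared topic signal is the kind of extra semantic feature the method aims to supply.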
Submitted 5 December, 2018;
originally announced December 2018.
-
MedSim: A Novel Semantic Similarity Measure in Bio-medical Knowledge Graphs
Authors:
Kai Lei,
Kaiqi Yuan,
Qiang Zhang,
Ying Shen
Abstract:
We present MedSim, a novel semantic SIMilarity method based on public, well-established bio-MEDical knowledge graphs (KGs) and a large-scale corpus, to study the therapeutic substitution of antibiotics. Besides the hierarchy and corpus of KGs, MedSim further interprets medicine characteristics by constructing multi-dimensional medicine-specific feature vectors. A dataset of 528 antibiotic pairs scored by doctors is used for evaluation, and MedSim produces statistically significant improvements over other semantic similarity methods. Furthermore, some promising applications of MedSim in drug substitution and drug abuse prevention are presented in a case study.
Submitted 5 December, 2018;
originally announced December 2018.
-
An enhanced computational feature selection method for medical synonym identification via bilingualism and multi-corpus training
Authors:
K. Lei,
S. Si,
D. Wen,
Y. Shen
Abstract:
Medical synonym identification has been an important part of medical natural language processing (NLP). However, Chinese medical synonym identification suffers from low precision and low recall. To address this, we propose a method for identifying Chinese medical synonyms. We first selected 13 features, including Chinese and English features. Then we studied the synonym identification results of each feature alone and of different combinations of features. By comparing the identification results, we present an optimal combination of features for Chinese medical synonym identification. Experiments show that our selected features achieve 97.37% precision, 96.00% recall, and a 97.33% F1 score.
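F1 is the harmonic mean of precision and recall. Plugging in the precision and recall reported above gives a worked check. The result is about 96.68%, which differs from the reported 97.33% F1. The abstract does not state how its F1 value was computed, so the sketch only shows the standard single-pair formula.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

print(f"{f1_score(0.9737, 0.9600):.4f}")  # prints 0.9668
```

Because a harmonic mean never exceeds the arithmetic mean, an F1 derived from 97.37% precision and 96.00% recall cannot be higher than 96.685%.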
Submitted 5 December, 2018;
originally announced December 2018.
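For reference, the three reported metrics are related as follows. This is a minimal sketch that assumes binary (synonym / not synonym) decisions summarized by true-positive, false-positive and false-negative counts. The function names are illustrative.

```python
def harmonic_f1(precision, recall):
    """F1 as the harmonic mean of precision and recall (0.0 when both are 0)."""
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

def precision_recall_f1(tp, fp, fn):
    """Precision, recall and F1 from confusion counts (0.0 when undefined)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall, harmonic_f1(precision, recall)
```

Plugging in the abstract's overall figures, `harmonic_f1(0.9737, 0.9600)` gives about 0.9668 (96.68%), a little below the reported 97.33%. The abstract does not say how its F1 was aggregated, so the reported value may come from a different averaging scheme.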
-
MedTruth: A Semi-supervised Approach to Discovering Knowledge Condition Information from Multi-Source Medical Data
Authors:
Yang Deng,
Yaliang Li,
Ying Shen,
Nan Du,
Wei Fan,
Min Yang,
Kai Lei
Abstract:
A knowledge graph (KG) contains entities and the relations between them. Owing to their representational power, KGs have been successfully applied to support many medical and healthcare tasks. However, in the medical domain, knowledge often holds only under certain conditions. For example, the symptom \emph{runny nose} strongly indicates the disease \emph{whooping cough} when the patient is a baby, but not for people of other ages. Such conditions are crucial for decision-making in various medical applications, yet they are missing from existing medical KGs. In this paper, we aim to discover medical knowledge conditions from texts to enrich KGs.
Electronic Medical Records (EMRs) are systematized collections of clinical data that contain detailed information about patients, so EMRs can be a good resource for discovering medical knowledge conditions. Unfortunately, the amount of available EMRs is limited for reasons such as regulation. Meanwhile, a large amount of medical question answering (QA) data is available, which can greatly help with this task. However, the quality of medical QA data varies widely, which may degrade the quality of the discovered conditions. In light of these challenges, we propose MedTruth, a new truth discovery method for medical knowledge condition discovery. MedTruth incorporates prior source quality information into the source reliability estimation procedure and uses the knowledge triple information to compute trustworthy information. We conduct a series of experiments on real-world medical datasets, which demonstrate that the proposed method can discover meaningful and accurate conditions for medical knowledge by leveraging both EMR and QA data. We further test the method on synthetic datasets to validate its effectiveness under various scenarios.
Submitted 18 August, 2019; v1 submitted 27 September, 2018;
originally announced September 2018.
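Truth discovery methods of this kind typically alternate two steps. First, they estimate each fact by reliability-weighted voting. Second, they re-estimate each source's reliability from how often it disagrees with the current estimates. The sketch below is a generic loop of that shape, using the CRH-style log-ratio weight update. It is not MedTruth itself. The `prior` dictionary is a hypothetical stand-in for prior source quality, applied here simply as a multiplicative factor, and the knowledge-triple component is omitted.

```python
import math
from collections import defaultdict

def truth_discovery(claims, prior=None, iters=10, eps=1e-3):
    """Generic iterative truth discovery over categorical claims.

    claims: {source: {object: claimed_value}}
    prior:  optional {source: prior_quality} (hypothetical), used as a
            multiplicative factor on every reliability estimate.
    Returns (estimated truths, final source weights).
    """
    prior = prior or {}
    sources = list(claims)
    objects = {o for c in claims.values() for o in c}
    weights = {s: prior.get(s, 1.0) for s in sources}
    truths = {}
    for _ in range(iters):
        # Step 1: estimate each object's value by reliability-weighted voting.
        for o in objects:
            votes = defaultdict(float)
            for s in sources:
                if o in claims[s]:
                    votes[claims[s][o]] += weights[s]
            truths[o] = max(votes, key=votes.get)
        # Step 2: smoothed error rate of each source against the current truths.
        losses = {}
        for s in sources:
            errors = sum(v != truths[o] for o, v in claims[s].items())
            losses[s] = (errors + eps) / (len(claims[s]) + eps)
        total = sum(losses.values())
        # Log-ratio weights: the lower a source's loss, the larger its weight.
        weights = {s: prior.get(s, 1.0) * math.log(total / losses[s]) for s in sources}
    return truths, weights
```

With three sources where one agrees with the consensus everywhere and another disagrees most of the time, the loop settles on the consensus values. It also ranks the sources by reliability in that order.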
-
Transport and Equilibrium in Non-Conservative Systems
Authors:
L. Chayes,
H. K. Lei
Abstract:
We study, in finite volume, a grand canonical version of the McKean-Vlasov equation in which the total particle content is allowed to vary. The dynamics is expected to minimize an appropriate grand canonical free energy. We make this notion precise by introducing a metric on a set of positive Borel measures without prescribed mass and showing that the dynamics is a gradient flow with respect to this metric. Moreover, we develop a JKO scheme suitable for these problems. These ideas apply generally to a class of second-order non-conservative problems. For this particular system, we use the JKO scheme to prove that (under certain assumptions) convergence to the uniform stationary state is exponential, with a rate independent of the volume. By contrast, in related conservative systems, decay rates scale at best like the inverse square of the characteristic length of the system. This suggests that a grand canonical approach may be useful for both the theoretical and the computational study of large-scale systems.
Submitted 26 October, 2016; v1 submitted 15 October, 2014;
originally announced October 2014.
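A JKO scheme builds a gradient flow out of discrete minimization steps. In the classical mass-conserving setting, the distance $d$ is the 2-Wasserstein distance. In the sketch below, by analogy, $d$ stands for the paper's metric on positive measures without prescribed mass and $\mathcal{F}$ for the grand canonical free energy. The abstract specifies neither, so this is schematic. The last line gives the standard heat-equation example behind the contrast with conservative systems: the spectral gap shrinks like the inverse square of the system size.

```latex
% One JKO (minimizing-movement) step with time step \tau > 0:
\rho^{k+1}_{\tau} \in \operatorname*{arg\,min}_{\rho}
  \Big\{ \mathcal{F}(\rho) + \frac{1}{2\tau}\, d\big(\rho,\rho^{k}_{\tau}\big)^{2} \Big\},
\qquad k = 0, 1, 2, \dots
% Piecewise-constant interpolation, expected to converge to the gradient flow as \tau \to 0:
\rho_{\tau}(t) := \rho^{\lceil t/\tau \rceil}_{\tau}.
% Conservative benchmark: the slowest mode of the Neumann heat equation on [0, L]
% decays like e^{-\lambda_1 t}, with
\lambda_1 = \frac{\pi^{2}}{L^{2}}.
```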
-
On the Rate of Convergence for Critical Crossing Probabilities
Authors:
I. Binder,
L. Chayes,
H. K. Lei
Abstract:
For the site percolation model on the triangular lattice, and for certain generalizations for which Cardy's Formula has been established, we obtain a power law estimate for the \emph{rate} of convergence of the crossing probabilities to Cardy's Formula.
Submitted 1 July, 2013; v1 submitted 6 October, 2012;
originally announced October 2012.
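Crossing probabilities of this kind are easy to estimate numerically. The sketch below is an illustration, not taken from the paper. It samples critical site percolation ($p = 1/2$) on an $n \times n$ rhombus of the triangular lattice (a Hex board) and estimates the probability of a black left-right crossing. On this shape the answer is exactly $1/2$ for every $n$, for three reasons:

- exactly one of a black left-right or a white top-bottom crossing occurs;
- colour symmetry makes a white top-bottom crossing as likely as a black one;
- the reflection that swaps the two pairs of sides makes black top-bottom and black left-right crossings equally likely.

The rate result above concerns how fast such lattice probabilities approach their continuum limit on general domains, which this toy check does not measure.

```python
import random
from collections import deque

# Six neighbours of a site in axial coordinates on the triangular lattice
# (equivalently, the six hexagons adjacent to a hexagon on a Hex board).
NEIGHBOURS = [(0, 1), (0, -1), (1, 0), (-1, 0), (-1, 1), (1, -1)]

def crosses(grid, colour, left_right):
    """Breadth-first search: does a path of `colour` sites join the two opposite sides?"""
    n = len(grid)
    if left_right:
        frontier = [(i, 0) for i in range(n) if grid[i][0] == colour]
    else:
        frontier = [(0, j) for j in range(n) if grid[0][j] == colour]
    seen = set(frontier)
    queue = deque(frontier)
    while queue:
        i, j = queue.popleft()
        if (j if left_right else i) == n - 1:
            return True
        for di, dj in NEIGHBOURS:
            a, b = i + di, j + dj
            if 0 <= a < n and 0 <= b < n and (a, b) not in seen and grid[a][b] == colour:
                seen.add((a, b))
                queue.append((a, b))
    return False

def sample(n, p, rng):
    """Each site is black (True) independently with probability p."""
    return [[rng.random() < p for _ in range(n)] for _ in range(n)]

def crossing_probability(n, p=0.5, trials=2000, seed=0):
    """Monte Carlo estimate of P(black left-right crossing) on an n x n rhombus."""
    rng = random.Random(seed)
    hits = sum(crosses(sample(n, p, rng), True, True) for _ in range(trials))
    return hits / trials
```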
-
Hamiltonian ODE's on a Space of Deficient Measures
Authors:
L. Chayes,
W. Gangbo,
H. K. Lei
Abstract:
We continue the study (initiated in [1]) of Borel measures whose time evolution is governed by an interacting Hamiltonian structure. Here, the principal focus is the development and advancement of deficiency in the measure caused by the displacement of mass to infinity in finite time. We introduce, and study in its own right, a regularization scheme based on a dissipative mechanism that naturally degrades mass according to the distance traveled (in phase space). Our principal results rest on dynamical considerations, in the form of a condition that forbids mass from returning from infinity.
Submitted 31 July, 2011;
originally announced August 2011.
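The abstract describes the dissipative regularization only qualitatively. The sketch below is a hypothetical toy illustration, not the paper's scheme. It follows a single particle in phase space and lets its mass decay exponentially in the phase-space distance it has travelled, so a particle running off toward infinity loses essentially all of its mass along the way. The exponential law, the rate `kappa` and the one-dimensional Hamiltonian are all assumptions.

```python
import math

def evolve(q, p, grad_V, kappa, dt, steps):
    """Toy 1D particle under H(q, p) = p**2 / 2 + V(q), advanced with symplectic Euler.

    Hypothetical dissipation: the particle's mass is exp(-kappa * L), where L is
    the phase-space arc length travelled so far (an assumed law, for illustration).
    Returns the final (q, p) and the mass after every step.
    """
    length = 0.0
    masses = [1.0]
    for _ in range(steps):
        p_new = p - dt * grad_V(q)  # kick: update momentum from the force
        q_new = q + dt * p_new      # drift: update position with the new momentum
        length += math.hypot(q_new - q, p_new - p)
        q, p = q_new, p_new
        masses.append(math.exp(-kappa * length))
    return q, p, masses
```

For example, `evolve(1.0, 0.0, lambda q: -q, kappa=0.5, dt=0.01, steps=500)` uses $V(q) = -q^2/2$, which pushes the particle outward. The recorded masses then decrease monotonically, while `kappa=0` leaves the mass equal to 1 throughout.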
-
Degenerate diffusion with a drift potential: a viscosity solutions approach, joint work with I. C. Kim, truncated version
Authors:
H. K. Lei
Abstract:
This is a truncated version of the paper "Degenerate diffusion with a drift potential: a viscosity solutions approach", co-authored with I. C. Kim. The purpose of this version is to withdraw the claim, made on the part of H. K. Lei, of a quantitative rate of convergence of the free boundary. The difference from the previous version lies in Section 3, where (1) the quantitative statement of the convergence of the free boundary has been removed, and (2) the more basic statement, that the free boundary converges in some sense given uniform convergence of the function, has been rewritten.
It is emphasized that, while some effort has been made towards better exposition and clarity regarding this convergence of the free boundary given uniform convergence of the function (see Section 3), there is no new result here. On the contrary, as the title indicates, what is contained here is a strict subset of the original.
We introduce a notion of viscosity solution for a nonlinear degenerate diffusion equation with a drift potential. We show that our notion of solution coincides with that of the weak solution defined via integration by parts. As an application of the viscosity solutions theory, we show that, in the case of a strictly convex potential, the free boundary uniformly converges to equilibrium as $t$ grows.
Submitted 19 November, 2010;
originally announced November 2010.
-
On Convergence to SLE$_6$ II: Discrete Approximations and Extraction of Cardy's Formula for General Domains
Authors:
I. Binder,
L. Chayes,
H. K. Lei
Abstract:
Following the approach outlined in [17], convergence to SLE$_6$ of the Exploration Processes for the correlated bond-triangular type models studied in [6] is established in [3] and the present work. In this second part, we focus on establishing Cardy's Formula for general domains.
Submitted 26 April, 2010;
originally announced April 2010.
-
On Convergence to SLE$_6$ I: Conformal Invariance for Certain Models of the Bond-Triangular Type
Authors:
I. Binder,
L. Chayes,
H. K. Lei
Abstract:
Following the approach outlined in [26], convergence to SLE$_6$ of the Exploration Processes for the correlated bond-triangular type models studied in [11] is established. This puts the said models in the same universality class as the standard site percolation model on the triangular lattice [27]. In the context of these models, the result is proven for all domains with boundary Minkowski dimension less than two. Moreover, the proof of convergence applies in the context of general critical 2D percolation models and for general domains, under the stipulation that Cardy's Formula can be established for domains in this generality.
Submitted 26 April, 2010;
originally announced April 2010.
-
Degenerate diffusion with a drift potential: a viscosity solutions approach
Authors:
I. C. Kim,
H. K. Lei
Abstract:
We introduce a notion of viscosity solutions for a nonlinear degenerate diffusion equation with a drift potential. We show that our notion of solutions coincides with that of weak solutions defined via integration by parts. As an application of the viscosity solutions theory, we show that the free boundary uniformly converges to the equilibrium as time grows. In the case of a convex potential, an exponential rate of free boundary convergence is obtained.
Submitted 19 October, 2009; v1 submitted 19 October, 2009;
originally announced October 2009.
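Assume the standard form of this model: a porous-medium-type diffusion $\Delta(\rho^m)$, $m > 1$, plus drift down a potential $\Phi$. Under that assumption, setting the flux to zero gives the equilibrium that the free boundary approaches in closed form. This is a derivation sketch, not a statement of the paper's precise hypotheses.

```latex
% Assumed model equation (m > 1, drift potential \Phi):
\rho_t = \Delta(\rho^{m}) + \nabla\cdot(\rho\,\nabla\Phi)
       = \nabla\cdot\Big(\rho\,\nabla\Big(\tfrac{m}{m-1}\,\rho^{m-1} + \Phi\Big)\Big).
% Stationary states have zero flux, so \tfrac{m}{m-1}\rho^{m-1} + \Phi is constant on \{\rho > 0\}:
\rho_\infty(x) = \Big(\tfrac{m-1}{m}\,\big(C - \Phi(x)\big)\Big)_{+}^{1/(m-1)},
% with C fixed by the total mass; the equilibrium free boundary is \partial\{\Phi < C\}.
```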
-
Conformal Invariance for Certain Models of the Bond-Triangular Type
Authors:
I. Binder,
L. Chayes,
H. K. Lei
Abstract:
Following the approach outlined in [18], convergence to SLE$_6$ of the Exploration Processes for the correlated bond-triangular type models studied in [7] is established. This puts the said models in the same universality class as the standard site percolation model on the triangular lattice [19]. The result is proven for all domains with boundary (upper) Minkowski dimension less than two. Moreover, the proof of convergence applies in the general context of critical 2D percolation models, under the stipulation that Cardy's Formula can be established.
Submitted 26 April, 2010; v1 submitted 18 October, 2007;
originally announced October 2007.
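Several of these results are stated for domains whose boundary has (upper) Minkowski dimension less than two. To illustrate what that quantity measures, here is a minimal box-counting sketch, not part of the papers. It counts the number $N(\varepsilon)$ of $\varepsilon$-boxes that a sampled planar set meets, then fits the slope of $\log N(\varepsilon)$ against $\log(1/\varepsilon)$. The upper Minkowski dimension is the lim sup of $\log N(\varepsilon)/\log(1/\varepsilon)$ as $\varepsilon \to 0$, so a fit over finitely many scales is only an estimate.

```python
import math

def box_count(points, eps):
    """Number of grid boxes of side eps that contain at least one sample point."""
    return len({(math.floor(x / eps), math.floor(y / eps)) for x, y in points})

def box_dimension(points, scales):
    """Least-squares slope of log N(eps) versus log(1/eps) over the given scales."""
    xs = [math.log(1 / e) for e in scales]
    ys = [math.log(box_count(points, e)) for e in scales]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den
```

A densely sampled square boundary gives a slope close to 1, and a densely sampled filled square gives a slope close to 2.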
-
Cardy's Formula for Certain Models of the Bond-Triangular Type
Authors:
L. Chayes,
H. K. Lei
Abstract:
We introduce and study a family of 2D percolation systems based on the bond percolation model of the triangular lattice. The system under study has local correlations; however, bonds separated by a few lattice spacings act independently of one another. Avoiding explicit use of microscopic paths, we first establish that the model possesses the typical attributes that indicate critical behavior in 2D percolation problems. Subsequently, the so-called Cardy-Carleson functions are shown to satisfy, in the continuum limit, Cardy's formula for crossing probabilities. This extends the results of S. Smirnov to a non-trivial class of critical 2D percolation systems.
Submitted 18 October, 2007; v1 submitted 11 January, 2006;
originally announced January 2006.
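Carleson's reformulation, the form in which Smirnov proved Cardy's formula for site percolation on the triangular lattice, makes the limiting crossing probability linear once the domain is mapped conformally onto an equilateral triangle. The statement below uses that normalization; the labels $a, b, c, x$ and the event notation are chosen here for illustration. As sanity checks, the probability is 0 when $x = b$, because the target arc shrinks to a point. It is 1 when $x = c$, because the two arcs then meet at $c$.

```latex
% T: equilateral triangle with vertices a, b, c;  x: a point on the side [b, c].
% \mathcal{C}_\delta(x): event of an open crossing, at mesh \delta, joining [a, c] to [b, x] inside T.
\lim_{\delta \to 0} \mathbb{P}_{\delta}\big(\mathcal{C}_\delta(x)\big) = \frac{|x - b|}{|c - b|}.
```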
-
Random Cluster Models on the Triangular Lattice
Authors:
L. Chayes,
H. K. Lei
Abstract:
We study percolation and the random cluster model on the triangular lattice with 3-body interactions. Starting with percolation, we generalize the star--triangle transformation: we introduce a new parameter (the 3-body term) and identify configurations on the triangles solely by their connectivity. In this new setup, we find necessary and sufficient conditions for positive correlations and use them to establish regions of percolation and non-percolation. Next, we apply this set of ideas to the $q>1$ random cluster model: we derive duality relations for the suitable random cluster measures, prove necessary and sufficient conditions for them to have positive correlations, and finally prove some rigorous theorems concerning phase transitions.
Submitted 10 August, 2005;
originally announced August 2005.
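For ordinary bond percolation with no 3-body term, the classical star-triangle argument places the critical point of the triangular lattice at the root in $(0, 1)$ of $p^3 - 3p + 1 = 0$. The dual hexagonal-lattice threshold is then $1 - p$. The sketch below solves that cubic by bisection. It then checks the result against the closed form $2\sin(\pi/18)$, which follows from $\sin 3\theta = 3\sin\theta - 4\sin^3\theta$ with $p = 2\sin\theta$. It covers only this classical baseline, not the generalized 3-body transformation studied in the paper.

```python
import math

def critical_triangular_bond(tol=1e-12):
    """Bisection for the unique root of f(p) = p**3 - 3p + 1 in (0, 1).

    f is strictly decreasing on (0, 1) (f'(p) = 3p**2 - 3 < 0), with
    f(0) = 1 > 0 and f(1) = -1 < 0, so exactly one root lies in between.
    """
    f = lambda p: p ** 3 - 3 * p + 1
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(mid) > 0:
            lo = mid  # still left of the root
        else:
            hi = mid
    return (lo + hi) / 2
```

The returned value is about 0.3473, and `1 - critical_triangular_bond()` gives the corresponding hexagonal-lattice threshold of about 0.6527.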