Search | arXiv e-print repository

ARHNet: Adaptive Region Harmonization for Lesion-aware Augmentation to Improve Segmentation Performance

Authors: Jiayu Huo, Yang Liu, Xi Ouyang, Alejandro Granados, Sebastien Ourselin, Rachel Sparks

Abstract: Accurately segmenting brain lesions in MRI scans is critical for providing patients with prognoses and neurological monitoring. However, the performance of CNN-based segmentation methods is constrained by the limited training set size. Advanced data augmentation is an effective strategy to improve the model's robustness. However, they often introduce intensity disparities between foreground and background areas and boundary artifacts, which weakens the effectiveness of such strategies. In this paper, we propose a foreground harmonization framework (ARHNet) to tackle intensity disparities and make synthetic images look more realistic. In particular, we propose an Adaptive Region Harmonization (ARH) module to dynamically align foreground feature maps to the background with an attention mechanism. We demonstrate the efficacy of our method in improving the segmentation performance using real and synthetic images. Experimental results on the ATLAS 2.0 dataset show that ARHNet outperforms other methods for image harmonization tasks, and boosts the down-stream segmentation performance. Our code is publicly available at https://github.com/King-HAW/ARHNet.

Submitted 2 July, 2023; originally announced July 2023.

Comments: 9 pages, 4 figures, 3 tables

arXiv:2306.07813 [pdf, other]

Convective meta-thermal concentration for ultrahigh efficient Stirling engine with waste heat and cold utilization

Authors: Xinchen Zhou, Xiang Xu, Xiaoping Ouyang, Jiping Huang

Abstract: The Stirling engine, which possesses external combustion characteristics, a simple structure, and high theoretical thermal efficiency, has excellent potential for utilizing finite waste heat and cold resources. However, practical applications of this technology suffered from thermal inefficiency due to the discontinuity and instability of waste resources. Despite advances in energy storage technology, temperature variations in the heat-exchanging fluids at the hot and cold ends of the Stirling engine remained significant obstacles. In this work, convective meta-thermal concentration (CMTC) was introduced between the heating (cooling) fluids and the hot (cold) end of the Stirling engine, employing alternating isotropic materials with high and low thermal conductivities. It was demonstrated that CMTC effectively enhanced the temperature difference between the hot and cold ends, leading to a remarkable improvement in Stirling engine efficiency. Particularly, when the Stirling engine efficiency tended to zero due to the limited availability of waste heat and cold resources, CMTC overcame this limitation, surpassing existing optimization technology. Further analysis under various operating conditions showed that CMTC achieved a significant thermal efficiency improvement of up to 1460%. This work expanded the application of thermal metamaterials to heat engine systems, offering an exciting avenue for sustainable energy utilization.

Submitted 8 January, 2024; v1 submitted 13 June, 2023; originally announced June 2023.

arXiv:2305.13598 [pdf, other]

doi 10.3847/1538-4357/acd761

FAST search for circumstellar atomic hydrogen. II. Is BD+303639 an interacting planetary nebula?

Authors: Xu-Jia Ouyang, Yong Zhang, Albert Zijlstra, Chuan-Peng Zhang, Jun-ichi Nakashima, Quentin A Parker

Abstract: The young, compact, very high surface brightness but low excitation planetary nebula (PN) BD+303639 is one of the very few PNe that have been reported to exhibit the 21cm HI emission line. As part of a long-term programme to search for circumstellar atomic hydrogen, we observed the 21cm feature toward BD+303639 with the Five-hundred-meter Aperture Spherical radio Telescope (FAST). Assuming a direct association between the PN and the detected HI emission, these new observations show that this surrounding emission is significantly more spatially extended than indicated by previous interferometric observations, and can be resolved into two velocity components. The estimated HI mass is larger than 100M_sun, invalidating an origin from the host star itself or its ejecta for the emitting material. We discuss the possibility that the extended HI emission stems from the interstellar medium (ISM) swept out over time by the stellar wind. Moreover, we report tentative detections of HI absorption features lying near and blueward of the systemic velocity of this PN, which are probably from a stalled asterosphere at the outer boundary of the expanding ionized region. The mass of the gas producing the HI absorption is insufficient to solve the so-called 'PN missing mass problem'. We demonstrate the capability of FAST to investigate the interaction process between a PN and the surrounding ISM.

Submitted 22 May, 2023; originally announced May 2023.

Comments: 20 pages, 7 figures, accepted for publication in ApJ

arXiv:2305.08826 [pdf, other]

Learning Better Contrastive View from Radiologist's Gaze

Authors: Sheng Wang, Zixu Zhuang, Xi Ouyang, Lichi Zhang, Zheren Li, Chong Ma, Tianming Liu, Dinggang Shen, Qian Wang

Abstract: Recent self-supervised contrastive learning methods greatly benefit from the Siamese structure that aims to minimize distances between positive pairs. These methods usually apply random data augmentation to input images, expecting the augmented views of the same images to be similar and positively paired. However, random augmentation may overlook image semantic information and degrade the quality of augmented views in contrastive learning. This issue becomes more challenging in medical images since the abnormalities related to diseases can be tiny, and are easily corrupted (e.g., being cropped out) in the current scheme of random augmentation. In this work, we first demonstrate that, for widely-used X-ray images, the conventional augmentation prevalent in contrastive pre-training can affect the performance of the downstream diagnosis or classification tasks. Then, we propose a novel augmentation method, i.e., FocusContrast, to learn from radiologists' gaze in diagnosis and generate contrastive views for medical images with guidance from radiologists' visual attention. Specifically, we track the gaze movement of radiologists and model their visual attention when reading to diagnose X-ray images. The learned model can predict visual attention of the radiologists given a new input image, and further guide the attention-aware augmentation that hardly neglects the disease-related abnormalities. As a plug-and-play and framework-agnostic module, FocusContrast consistently improves state-of-the-art contrastive learning methods of SimCLR, MoCo, and BYOL by 4.0~7.0% in classification accuracy on a knee X-ray dataset.

Submitted 15 May, 2023; originally announced May 2023.

arXiv:2304.10226 [pdf, other]

Domain Generalization for Mammographic Image Analysis with Contrastive Learning

Authors: Zheren Li, Zhiming Cui, Lichi Zhang, Sheng Wang, Chenjin Lei, Xi Ouyang, Dongdong Chen, Xiangyu Zhao, Yajia Gu, Zaiyi Liu, Chunling Liu, Dinggang Shen, Jie-Zhi Cheng

Abstract: The deep learning technique has been shown to effectively address several image analysis tasks in the computer-aided diagnosis scheme for mammography. The training of an efficacious deep learning model requires large data with diverse styles and qualities. The diversity of data often comes from the use of various scanners of vendors. But, in practice, it is impractical to collect a sufficient amount of diverse data for training. To this end, a novel contrastive learning is developed to equip the deep learning models with better style generalization capability. Specifically, the multi-style and multi-view unsupervised self-learning scheme is carried out to seek robust feature embedding against style diversity as a pretrained model. Afterward, the pretrained network is further fine-tuned to the downstream tasks, e.g., mass detection, matching, BI-RADS rating, and breast density classification. The proposed method has been evaluated extensively and rigorously with mammograms from various vendor style domains and several public datasets. The experimental results suggest that the proposed domain generalization method can effectively improve performance of four mammographic image tasks on the data from both seen and unseen domains, and outperform many state-of-the-art (SOTA) generalization methods.

Submitted 7 September, 2023; v1 submitted 20 April, 2023; originally announced April 2023.

Comments: arXiv admin note: text overlap with arXiv:2111.10827

arXiv:2304.07171 [pdf, other]

doi 10.1093/mnras/stad1021

Radio Galaxy Zoo EMU: Towards a Semantic Radio Galaxy Morphology Taxonomy

Authors: Micah Bowles, Hongming Tang, Eleni Vardoulaki, Emma L. Alexander, Yan Luo, Lawrence Rudnick, Mike Walmsley, Fiona Porter, Anna M. M. Scaife, Inigo Val Slijepcevic, Elizabeth A. K. Adams, Alexander Drabent, Thomas Dugdale, Gülay Gürkan, Andrew M. Hopkins, Eric F. Jimenez-Andrade, Denis A. Leahy, Ray P. Norris, Syed Faisal ur Rahman, Xichang Ouyang, Gary Segal, Stanislav S. Shabala, O. Ivy Wong

Abstract: We present a novel natural language processing (NLP) approach to deriving plain English descriptors for science cases otherwise restricted by obfuscating technical terminology. We address the limitations of common radio galaxy morphology classifications by applying this approach. We experimentally derive a set of semantic tags for the Radio Galaxy Zoo EMU (Evolutionary Map of the Universe) project and the wider astronomical community. We collect 8,486 plain English annotations of radio galaxy morphology, from which we derive a taxonomy of tags. The tags are plain English. The result is an extensible framework which is more flexible, more easily communicated, and more sensitive to rare feature combinations which are indescribable using the current framework of radio astronomy classifications.

Submitted 14 April, 2023; originally announced April 2023.

Comments: 17 pages, 11 Figures, Accepted at MNRAS

arXiv:2304.03948 [pdf]

Equilibrium distribution and diffusion of mixed hydrogen-methane gas in gravity field

Authors: Shiyao Peng, Qiao He, Ducheng Peng, Xin Ouyang, Xiaorui Zhang, Chong Chai, Lianlai Zhang, Xu Sun, Huiqiu Deng, Wangyu Hu, Jie Hou

Abstract: Repurposing existing natural gas pipelines is a promising solution for large-scale transportation of mixed hydrogen-methane gas. However, it remains debatable whether gravitational stratification can notably affect hydrogen partial pressure in the gas mixture. To address this issue, we combined molecular dynamics simulation with thermodynamic and diffusion theories. Our study systematically examined the equilibrium distribution of hydrogen-methane mixtures in gravity fields. We demonstrated that partial pressures of both gases decrease with altitude, with hydrogen showing slower decrease due to its smaller molar mass. As a result, the volume fraction of hydrogen is maximized at the top end of pipes. The stratification is more favorable at low temperature and large altitude drops, with notable gas stratification only occurring at extremely large drops in altitude, being generally negligible even at a drop of 1500 m. Furthermore, we showed that the diffusion time required to achieve the equilibrium distribution is proportional to gas pressure and the square of pipeline height. This requires approximately 300 years for a 1500 m pipeline at 1 bar. Therefore, temporary interruptions in pipeline gas transportation will not cause visible stratification. Our work clarifies the effect of gravity on hydrogen-methane gas mixtures and provides quantitative insights into assessing the stratification of gas mixtures in pipelines.

Submitted 8 April, 2023; originally announced April 2023.

Comments: 14 pages, 8 figures
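
The equilibrium and scaling statements in the hydrogen-methane abstract above can be illustrated with a rough back-of-the-envelope sketch. It is not the authors' molecular dynamics method: it assumes an ideal, isothermal gas column in which each species follows the barometric formula independently, and it takes the abstract's figure of roughly 300 years for a 1500 m pipeline at 1 bar as the reference point for the stated proportionality to pressure and height squared. The 20% hydrogen fraction, the 300 K temperature, and all function names are hypothetical choices made for illustration.

```python
import math

R = 8.314462618      # molar gas constant, J/(mol K)
G = 9.80665          # standard gravity, m/s^2
M_H2 = 2.016e-3      # molar mass of H2, kg/mol
M_CH4 = 16.043e-3    # molar mass of CH4, kg/mol


def partial_pressure(p_bottom, molar_mass, height_m, temperature_k):
    """Barometric formula for one species (assumes ideal gas, constant T)."""
    return p_bottom * math.exp(-molar_mass * G * height_m / (R * temperature_k))


def hydrogen_fraction_at_top(x_h2_bottom, height_m, temperature_k):
    """Hydrogen volume fraction at the top of the column at equilibrium."""
    # Only the ratio matters, so the total bottom pressure is set to 1.
    p_h2 = partial_pressure(x_h2_bottom, M_H2, height_m, temperature_k)
    p_ch4 = partial_pressure(1.0 - x_h2_bottom, M_CH4, height_m, temperature_k)
    return p_h2 / (p_h2 + p_ch4)


def diffusion_time_years(pressure_bar, height_m,
                         ref_years=300.0, ref_pressure_bar=1.0,
                         ref_height_m=1500.0):
    """Scale the abstract's reference value with t proportional to p * h^2."""
    return ref_years * (pressure_bar / ref_pressure_bar) * (height_m / ref_height_m) ** 2


if __name__ == "__main__":
    x_top = hydrogen_fraction_at_top(0.20, 1500.0, 300.0)
    print(f"H2 fraction at top of a 1500 m column: {x_top:.4f}")
    print(f"Diffusion time, 150 m at 1 bar: {diffusion_time_years(1.0, 150.0):.1f} years")
```

Under these assumptions the hydrogen fraction at the top rises only slightly above its bottom value, which is in line with the abstract's remark that stratification stays small even at a 1500 m drop.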

arXiv:2303.16784 [pdf, other]

Complex phase diagram and supercritical matter

Authors: Xiao-Yu Ouyang, Qi-Jun Ye, Xin-Zheng Li

Abstract: The supercritical region is often described as uniform with no definite transitions. The distinct behaviors of the matter therein (as liquid-like and gas-like), however, suggest "supercritical boundaries". Here, we provide a mathematical description of these phenomena by revisiting the Lee-Yang (LY) theory and introducing a complex phase diagram, i.e. a 4-D one with complex $T$ and $p$. While the traditional 2-D phase diagram with real $T$ and $p$ values (the physical plane) lacks LY zeros beyond the critical point, preventing the occurrence of criticality, the off-plane zeros in this 4-D scenario possess critical anomalies in various physical properties. For example, when the isobaric heat capacity $C_p$, which is a response function of the system to $T$, is used to separate the supercritical region, this 4-D complex phase diagram can be visualized by reducing to a 3-D one with complex $T$ and real $p$. Then, we find that the supercritical boundary defined by $C_p$ shows perfect correspondence with the projection of the edges of the LY zeros with complex $T$ in this 3-D phase diagram on the physical plane, whilst in conventional LY theory these off-plane zeros are neglected. The same relation applies to the isothermal compression coefficient $K_T$ (or $κ_T$) which is a response function of the system to $p$, where complex $p$ should be used. This correlation between the Widom line and the edges of LY zeros is demonstrated in three systems, i.e., van der Waals model, 2-D Ising model and water, which unambiguously reveals the incipient phase transition nature of the supercritical matter. With this extension of the LY theory and the associated new findings, a unified picture of phase and phase transition valid for both the phase transition and supercritical regions is provided, which should apply to the complex phase diagram of other thermodynamic state functions.

Submitted 20 November, 2023; v1 submitted 29 March, 2023; originally announced March 2023.

arXiv:2303.09819 [pdf, ps, other]

doi 10.1016/j.nima.2023.168391

Muon radiography experiments on the subway overburden structure detection

Authors: Xin Mao, Zhiwei Li, Shuning Dong, Jingtai Li, Jianming Zhang, Jie Pang, Yaping Cheng, Bin Liao, Xiaoping Ouyang, Ran Han

Abstract: Muon radiography is an innovative and non-destructive technique for internal density structure imaging, based on measuring the attenuation of cosmic-ray muons after they penetrate the target. Due to the strong penetration ability of muons, the detection range of muon radiography can reach the order of hundreds of meters or even kilometers. Using a portable muon detector composed of plastic scintillators and silicon photomultipliers, we performed a short-duration (1 h) flux scanning experiment of the overburden above the platform and tunnel of the Xiaoying West Road subway station under construction. With the observation direction facing up, the detector is placed on the north side of the track and moved eastward from the platform section inside the station to the tunnel section. The scanning length is 264 m and a total of 21 locations are observed. By comparing the observed and predicted values of the muon survival ratio at different locations, the experiment accurately detects the jump in thickness at the interface of the platform section and tunnel section. Furthermore, unknown anomalies caused by randomly placed light brick piles and a side passage mouth above the observation locations are detected and confirmed later. This experiment verifies the feasibility of using natural muons to quickly detect abnormal structures in tunnel overburden, and shows that muon radiography has broad application prospects in tunnel safety and other similar aspects.

Submitted 17 March, 2023; originally announced March 2023.

Comments: 30 pages, 10 figures

arXiv:2303.08322 [pdf, other]

Optimization Design for Federated Learning in Heterogeneous 6G Networks

Authors: Bing Luo, Xiaomin Ouyang, Peng Sun, Pengchao Han, Ningning Ding, Jianwei Huang

Abstract: With the rapid advancement of 5G networks, billions of smart Internet of Things (IoT) devices along with an enormous amount of data are generated at the network edge. While still at an early age, it is expected that the evolving 6G network will adopt advanced artificial intelligence (AI) technologies to collect, transmit, and learn this valuable data for innovative applications and intelligent services. However, traditional machine learning (ML) approaches require centralizing the training data in the data center or cloud, raising serious user-privacy concerns. Federated learning, as an emerging distributed AI paradigm with privacy-preserving nature, is anticipated to be a key enabler for achieving ubiquitous AI in 6G networks. However, there are several system and statistical heterogeneity challenges for effective and efficient FL implementation in 6G networks. In this article, we investigate the optimization approaches that can effectively address the challenging heterogeneity issues from three aspects: incentive mechanism design, network resource management, and personalized model optimization. We also present some open problems and promising directions for future research.

Submitted 14 March, 2023; originally announced March 2023.

Comments: Accepted in IEEE Network

arXiv:2303.04811 [pdf, other]

Naive Bayes Classifiers over Missing Data: Decision and Poisoning

Authors: Song Bian, Xiating Ouyang, Zhiwei Fan, Paraschos Koutris

Abstract: We study the certifiable robustness of ML classifiers on dirty datasets that could contain missing values. A test point is certifiably robust for an ML classifier if the classifier returns the same prediction for that test point, regardless of which cleaned version (among exponentially many) of the dirty dataset the classifier is trained on. In this paper, we show theoretically that for Naive Bayes Classifiers (NBC) over dirty datasets with missing values: (i) there exists an efficient polynomial time algorithm to decide whether multiple input test points are all certifiably robust over a dirty dataset; and (ii) the data poisoning attack, which aims to make all input test points certifiably non-robust by inserting missing cells to the clean dataset, is in polynomial time for single test points but NP-complete for multiple test points. Extensive experiments demonstrate that our algorithms are efficient and outperform existing baselines.

Submitted 28 May, 2024; v1 submitted 7 March, 2023; originally announced March 2023.

Comments: 22 pages, 10 figures

Journal ref: ICML 2024

arXiv:2303.01903 [pdf, other]

doi 10.1109/TPAMI.2025.3562422

Prophet: Prompting Large Language Models with Complementary Answer Heuristics for Knowledge-based Visual Question Answering

Authors: Zhou Yu, Xuecheng Ouyang, Zhenwei Shao, Meng Wang, Jun Yu

Abstract: Knowledge-based visual question answering (VQA) requires external knowledge beyond the image to answer the question. Early studies retrieve required knowledge from explicit knowledge bases (KBs), which often introduces irrelevant information to the question, hence restricting the performance of their models. Recent works have resorted to using a powerful large language model (LLM) as an implicit knowledge engine to acquire the necessary knowledge for answering. Despite the encouraging results achieved by these methods, we argue that they have not fully activated the capacity of the "blind" LLM as the provided textual input is insufficient to depict the required visual information to answer the question. In this paper, we present Prophet -- a conceptually simple, flexible, and general framework designed to prompt LLM with answer heuristics for knowledge-based VQA. Specifically, we first train a vanilla VQA model on a specific knowledge-based VQA dataset without external knowledge. After that, we extract two types of complementary answer heuristics from the VQA model: answer candidates and answer-aware examples. The two types of answer heuristics are jointly encoded into a formatted prompt to facilitate the LLM's understanding of both the image and question, thus generating a more accurate answer. By incorporating the state-of-the-art LLM GPT-3, Prophet significantly outperforms existing state-of-the-art methods on four challenging knowledge-based VQA datasets. Prophet is general and can be instantiated with combinations of different VQA models (i.e., both discriminative and generative ones) and different LLMs (i.e., both commercial and open-source ones). Moreover, Prophet can also be integrated with modern large multimodal models in different stages, which is named Prophet++, to further improve the capabilities on knowledge-based VQA tasks.

Submitted 28 April, 2025; v1 submitted 3 March, 2023; originally announced March 2023.

Comments: An extended journal version of our CVPR 2023 paper, which has been accepted at IEEE T-PAMI 2025. The original conference version can be referred to as the v1 version

arXiv:2302.07257 [pdf, other]

ChatCAD: Interactive Computer-Aided Diagnosis on Medical Image using Large Language Models

Authors: Sheng Wang, Zihao Zhao, Xi Ouyang, Qian Wang, Dinggang Shen

Abstract: Large language models (LLMs) have recently demonstrated their potential in clinical applications, providing valuable medical knowledge and advice. For example, a large dialog LLM like ChatGPT has successfully passed part of the US medical licensing exam. However, LLMs currently have difficulty processing images, making it challenging to interpret information from medical images, which are rich in information that supports clinical decisions. On the other hand, computer-aided diagnosis (CAD) networks for medical images have seen significant success in the medical field by using advanced deep-learning algorithms to support clinical decision-making. This paper presents a method for integrating LLMs into medical-image CAD networks. The proposed framework uses LLMs to enhance the output of multiple CAD networks, such as diagnosis networks, lesion segmentation networks, and report generation networks, by summarizing and reorganizing the information presented in natural language text format. The goal is to merge the strengths of LLMs' medical domain knowledge and logical reasoning with the vision understanding capability of existing medical-image CAD models to create a more user-friendly and understandable system for patients compared to conventional CAD systems. In the future, LLM's medical knowledge can be also used to improve the performance of vision-based medical-image CAD models.

Submitted 14 February, 2023; originally announced February 2023.

arXiv:2209.04407 [pdf]

doi 10.1109/VLSITechnologyandCir46769.2022.9830335

e-G2C: A 0.14-to-8.31 $μ$J/Inference NN-based Processor with Continuous On-chip Adaptation for Anomaly Detection and ECG Conversion from EGM

Authors: Yang Zhao, Yongan Zhang, Yonggan Fu, Xu Ouyang, Cheng Wan, Shang Wu, Anton Banta, Mathews M. John, Allison Post, Mehdi Razavi, Joseph Cavallaro, Behnaam Aazhang, Yingyan Lin

Abstract: This work presents the first silicon-validated dedicated EGM-to-ECG (G2C) processor, dubbed e-G2C, featuring continuous lightweight anomaly detection, event-driven coarse/precise conversion, and on-chip adaptation. e-G2C utilizes neural network (NN) based G2C conversion and integrates 1) an architecture supporting anomaly detection and coarse/precise conversion via time multiplexing to balance the effectiveness and power, 2) an algorithm-hardware co-designed vector-wise sparsity resulting in a 1.6-1.7$\times$ speedup, 3) hybrid dataflows for enhancing near 100% utilization for normal/depth-wise(DW)/point-wise(PW) convolutions (Convs), and 4) an on-chip detection threshold adaptation engine for continuous effectiveness. The achieved 0.14-8.31 $μ$J/inference energy efficiency outperforms prior arts under similar complexity, promising real-time detection/conversion and possibly life-critical interventions.

Submitted 23 July, 2022; originally announced September 2022.

Comments: Accepted by 2022 IEEE Symposium on VLSI Technology and Circuits (VLSI Technology and Circuits)

arXiv:2208.13638 [pdf, other]

doi 10.1073/pnas.2217068120

Tunable liquid-solid hybrid thermal metamaterials with a topology transition

Authors: Peng Jin, Jinrong Liu, Liujun Xu, Jun Wang, Xiaoping Ouyang, Jian-Hua Jiang, Jiping Huang

Abstract: Thermal metamaterials provide rich control of heat transport which is becoming the foundations of cutting-edge applications ranging from chip cooling to biomedical. However, due to the fundamental laws of physics, the manipulation of heat is much constrained in conventional thermal metamaterials where effective heat conduction with Onsager reciprocity dominates. Here, through the inclusion of thermal convection and breaking the Onsager reciprocity, we unveil a regime in thermal metamaterials and transformation thermotics that goes beyond effective heat conduction. By designing a liquid-solid hybrid thermal metamaterial, we demonstrate a continuous switch from thermal cloaking to thermal concentration in one device with external tuning. Underlying such a switch is a topology transition in the virtual space of the thermotic transformation which is achieved by tuning the liquid flow via external control. These discoveries illustrate the extraordinary heat transport in complex multi-component thermal metamaterials and pave the way toward an unprecedented regime of heat manipulation.

Submitted 4 February, 2023; v1 submitted 29 August, 2022; originally announced August 2022.

Journal ref: PNAS 120, e2217068120 (2023)

arXiv:2208.12339 [pdf, other]

LinCQA: Faster Consistent Query Answering with Linear Time Guarantees

Authors: Zhiwei Fan, Paraschos Koutris, Xiating Ouyang, Jef Wijsen

Abstract: Most data analytical pipelines often encounter the problem of querying inconsistent data that violate pre-determined integrity constraints. Data cleaning is an extensively studied paradigm that singles out a consistent repair of the inconsistent data. Consistent query answering (CQA) is an alternative approach to data cleaning that asks for all tuples guaranteed to be returned by a given query on all (in most cases, exponentially many) repairs of the inconsistent data. This paper identifies a class of acyclic select-project-join (SPJ) queries for which CQA can be solved via SQL rewriting with a linear time guarantee. Our rewriting method can be viewed as a generalization of Yannakakis's algorithm for acyclic joins to the inconsistent setting. We present LinCQA, a system that can output rewritings in both SQL and non-recursive Datalog rules for every query in this class. We show that LinCQA often outperforms the existing CQA systems on both synthetic and real-world workloads, and in some cases, by orders of magnitude.

Submitted 25 August, 2022; originally announced August 2022.

arXiv:2207.14386 [pdf, other]

Efficient NLP Model Finetuning via Multistage Data Filtering

Authors: Xu Ouyang, Shahina Mohd Azam Ansari, Felix Xiaozhu Lin, Yangfeng Ji

Abstract: As model finetuning is central to the modern NLP, we set to maximize its efficiency. Motivated by redundancy in training examples and the sheer sizes of pretrained models, we exploit a key opportunity: training only on important data. To this end, we set to filter training examples in a streaming fashion, in tandem with training the target model. Our key techniques are two: (1) automatically determine a training loss threshold for skipping backward training passes; (2) run a meta predictor for further skipping forward training passes. We integrate the above techniques in a holistic, three-stage training process. On a diverse set of benchmarks, our method reduces the required training examples by up to 5.3$\times$ and training time by up to 6.8$\times$, while only seeing minor accuracy degradation. Our method is effective even when training one epoch, where each training example is encountered only once. It is simple to implement and is compatible with the existing finetuning techniques. Code is available at: https://github.com/xo28/efficient-NLP-multistage-training

Submitted 18 May, 2023; v1 submitted 28 July, 2022; originally announced July 2022.

arXiv:2207.09389 [pdf, other]

Image Synthesis with Disentangled Attributes for Chest X-Ray Nodule Augmentation and Detection

Authors: Zhenrong Shen, Xi Ouyang, Bin Xiao, Jie-Zhi Cheng, Qian Wang, Dinggang Shen

Abstract: Lung nodule detection in chest X-ray (CXR) images is common to early screening of lung cancers. Deep-learning-based Computer-Assisted Diagnosis (CAD) systems can support radiologists for nodule screening in CXR. However, it requires large-scale and diverse medical data with high-quality annotations to train such robust and accurate CADs. To alleviate the limited availability of such datasets, lung nodule synthesis methods are proposed for the sake of data augmentation. Nevertheless, previous methods lack the ability to generate nodules that are realistic with the size attribute desired by the detector. To address this issue, we introduce a novel lung nodule synthesis framework in this paper, which decomposes nodule attributes into three main aspects including shape, size, and texture, respectively. A GAN-based Shape Generator firstly models nodule shapes by generating diverse shape masks. The following Size Modulation then enables quantitative control on the diameters of the generated nodule shapes in pixel-level granularity. A coarse-to-fine gated convolutional Texture Generator finally synthesizes visually plausible nodule textures conditioned on the modulated shape masks. Moreover, we propose to synthesize nodule CXR images by controlling the disentangled nodule attributes for data augmentation, in order to better compensate for the nodules that are easily missed in the detection task. Our experiments demonstrate the enhanced image quality, diversity, and controllability of the proposed lung nodule synthesis framework. We also validate the effectiveness of our data augmentation on greatly improving nodule detection performance.

Submitted 19 July, 2022; originally announced July 2022.

arXiv:2207.03677 [pdf, other]

SuperTickets: Drawing Task-Agnostic Lottery Tickets from Supernets via Jointly Architecture Searching and Parameter Pruning

Authors: Haoran You, Baopu Li, Zhanyi Sun, Xu Ouyang, Yingyan Celine Lin

Abstract: Neural architecture search (NAS) has demonstrated amazing success in searching for efficient deep neural networks (DNNs) from a given supernet. In parallel, the lottery ticket hypothesis has shown that DNNs contain small subnetworks that can be trained from scratch to achieve a comparable or higher accuracy than original DNNs. As such, it is currently a common practice to develop efficient DNNs via a pipeline of first search and then prune. Nevertheless, doing so often requires a search-train-prune-retrain process and thus prohibitive computational cost. In this paper, we discover for the first time that both efficient DNNs and their lottery subnetworks (i.e., lottery tickets) can be directly identified from a supernet, which we term as SuperTickets, via a two-in-one training scheme with jointly architecture searching and parameter pruning. Moreover, we develop a progressive and unified SuperTickets identification strategy that allows the connectivity of subnetworks to change during supernet training, achieving better accuracy and efficiency trade-offs than conventional sparse training. Finally, we evaluate whether such identified SuperTickets drawn from one task can transfer well to other tasks, validating their potential of handling multiple tasks simultaneously. Extensive experiments and ablation studies on three tasks and four benchmark datasets validate that our proposed SuperTickets achieve boosted accuracy and efficiency trade-offs than both typical NAS and pruning pipelines, regardless of having retraining or not. Codes and pretrained models are available at https://github.com/RICE-EIC/SuperTickets.

Submitted 3 March, 2025; v1 submitted 7 July, 2022; originally announced July 2022.

Comments: Accepted by ECCV 2022

arXiv:2206.08141 [pdf]

i-FlatCam: A 253 FPS, 91.49 $μ$J/Frame Ultra-Compact Intelligent Lensless Camera for Real-Time and Efficient Eye Tracking in VR/AR

Authors: Yang Zhao, Ziyun Li, Yonggan Fu, Yongan Zhang, Chaojian Li, Cheng Wan, Haoran You, Shang Wu, Xu Ouyang, Vivek Boominathan, Ashok Veeraraghavan, Yingyan Celine Lin

Abstract: We present a first-of-its-kind ultra-compact intelligent camera system, dubbed i-FlatCam, including a lensless camera with a computational (Comp.) chip. It highlights (1) a predict-then-focus eye tracking pipeline for boosted efficiency without compromising the accuracy, (2) a unified compression scheme for single-chip processing and improved frame rate per second (FPS), and (3) dedicated intra-channel reuse design for depth-wise convolutional layers (DW-CONV) to increase utilization. i-FlatCam demonstrates the first eye tracking pipeline with a lensless camera and achieves 3.16 degrees of accuracy, 253 FPS, 91.49 $μ$J/Frame, and 6.7mm x 8.9mm x 1.2mm camera form factor, paving the way for next-generation Augmented Reality (AR) and Virtual Reality (VR) devices.

Submitted 28 March, 2025; v1 submitted 15 June, 2022; originally announced June 2022.

Comments: Accepted by VLSI 2022

arXiv:2205.08781 [pdf]

doi 10.1093/nsr/nwac159

Blackhole-Inspired Thermal Trapping with Graded Heat-Conduction Metadevices

Authors: Liujun Xu, Jinrong Liu, Peng Jin, Guoqiang Xu, Jiaxin Li, Xiaoping Ouyang, Ying Li, Cheng-Wei Qiu, Jiping Huang

Abstract: Black holes are one of the most intriguing predictions of general relativity. So far, metadevices have enabled analogous black holes to trap light or sound in laboratory spacetime. However, trapping heat in a conductive ambient is still challenging because diffusive behaviors are directionless. Inspired by black holes, we construct graded heat-conduction metadevices to achieve thermal trapping, resorting to the imitated advection produced by graded thermal conductivities rather than the trivial solution of using insulation materials to confine thermal diffusion. We experimentally demonstrate thermal trapping for guiding hot spots to diffuse towards the center. Graded heat-conduction metadevices have advantages in energy-efficient thermal regulation because the imitated advection has a similar temperature field effect to the realistic advection that is usually driven by external energy sources. These results also provide insights into correlating transformation thermotics with other disciplines such as cosmology for emerging heat control schemes.

Submitted 19 August, 2022; v1 submitted 18 May, 2022; originally announced May 2022.

Journal ref: National Science Review, 10, nwac159 (2023)

arXiv:2205.06170 [pdf, other]

doi 10.3847/1538-4357/ac6fdb

FAST search for circumstellar atomic hydrogen--I: the young planetary nebula IC 4997

Authors: Xu-Jia Ouyang, Yong Zhang, Albert Zijlstra, Chuan-Peng Zhang, Jun-ichi Nakashima, Quentin A Parker

Abstract: Using the Five-hundred-meter Aperture Spherical radio Telescope (FAST) in Guizhou, China, we detect the 21cm neutral atomic hydrogen absorption in the young planetary nebula IC 4997. The absorption arises from a shell also associated with Na I D lines. The H I shell has a mass of $1.46\times10^{-2}$ M$_\odot$ and a dynamic age of 990yr. The column density of H I is estimated to be $7.1\times10^{20}$ cm$^{-2}$, which can be well explained in terms of a photodissociation region around the ionized nebula, limited by self shielding of H$_2$. We find that the atomic-to-ionized hydrogen ratio is 0.6, suggesting that H I substantially contributes to overall nebular mass.

Submitted 12 May, 2022; originally announced May 2022.

Comments: 11 pages, 4 figures, Accepted for publication in ApJ

arXiv:2204.02976 [pdf, other]

doi 10.1109/TMI.2022.3146973

Follow My Eye: Using Gaze to Supervise Computer-Aided Diagnosis

Authors: Sheng Wang, Xi Ouyang, Tianming Liu, Qian Wang, Dinggang Shen

Abstract: When deep neural network (DNN) was first introduced to the medical image analysis community, researchers were impressed by its performance. However, it is evident now that a large number of manually labeled data is often a must to train a properly functioning DNN. This demand for supervision data and labels is a major bottleneck in current medical image analysis, since collecting a large number of annotations from experienced experts can be time-consuming and expensive. In this paper, we demonstrate that the eye movement of radiologists reading medical images can be a new form of supervision to train the DNN-based computer-aided diagnosis (CAD) system. Particularly, we record the tracks of the radiologists' gaze when they are reading images. The gaze information is processed and then used to supervise the DNN's attention via an Attention Consistency module. To the best of our knowledge, the above pipeline is among the earliest efforts to leverage expert eye movement for deep-learning-based CAD. We have conducted extensive experiments on knee X-ray images for osteoarthritis assessment. The results show that our method can achieve considerable improvement in diagnosis performance, with the help of gaze supervision.

Submitted 6 April, 2022; originally announced April 2022.

arXiv:2201.04318 [pdf, other]

Knee Cartilage Defect Assessment by Graph Representation and Surface Convolution

Authors: Zixu Zhuang, Liping Si, Sheng Wang, Kai Xuan, Xi Ouyang, Yiqiang Zhan, Zhong Xue, Lichi Zhang, Dinggang Shen, Weiwu Yao, Qian Wang

Abstract: Knee osteoarthritis (OA) is the most common osteoarthritis and a leading cause of disability. Cartilage defects are regarded as major manifestations of knee OA, which are visible by magnetic resonance imaging (MRI). Thus early detection and assessment for knee cartilage defects are important for protecting patients from knee OA. In this way, many attempts have been made on knee cartilage defect assessment by applying convolutional neural networks (CNNs) to knee MRI. However, the physiologic characteristics of the cartilage may hinder such efforts: the cartilage is a thin curved layer, implying that only a small portion of voxels in knee MRI can contribute to the cartilage defect assessment; heterogeneous scanning protocols further challenge the feasibility of the CNNs in clinical practice; the CNN-based knee cartilage evaluation results lack interpretability. To address these challenges, we model the cartilages structure and appearance from knee MRI into a graph representation, which is capable of handling highly diverse clinical data. Then, guided by the cartilage graph representation, we design a non-Euclidean deep learning network with the self-attention mechanism, to extract cartilage features in the local and global, and to derive the final assessment with a visualized result. Our comprehensive experiments show that the proposed method yields superior performance in knee cartilage defect assessment, plus its convenient 3D visualization for interpretability.

Submitted 12 January, 2022; originally announced January 2022.

Comments: 10 pages, 4 figures

arXiv:2112.13785 [pdf, other]

Experimental unsupervised learning of non-Hermitian knotted phases with solid-state spins

Authors: Yefei Yu, Li-Wei Yu, Wengang Zhang, Huili Zhang, Xiaolong Ouyang, Yanqing Liu, Dong-Ling Deng, L. -M. Duan

Abstract: Non-Hermiticity has widespread applications in quantum physics. It brings about distinct topological phases without Hermitian counterparts, and gives rise to the fundamental challenge of phase classification from both theoretical and experimental aspects. Here we report the first experimental demonstration of unsupervised learning of non-Hermitian topological phases with the nitrogen-vacancy center platform. In particular, we implement the non-Hermitian twister model, which hosts peculiar knotted topological phases, with a solid-state quantum simulator consisting of an electron spin and a nearby $^{13}$C nuclear spin in a nitrogen-vacancy center in diamond. By tuning the microwave pulses, we efficiently generate a set of experimental data without phase labels. Furthermore, based on the diffusion map method, we cluster this set of experimental raw data into three different knotted phases in an unsupervised fashion without a priori knowledge of the system, which is in sharp contrast to the previously implemented supervised learning phases of matter. Our results showcase the intriguing potential for autonomous classification of exotic unknown topological phases with experimental raw data.

Submitted 27 December, 2021; originally announced December 2021.

Comments: 7+20 pages

arXiv:2112.12349 [pdf, other]

doi 10.1109/TMI.2020.3042773

Learning Hierarchical Attention for Weakly-supervised Chest X-Ray Abnormality Localization and Diagnosis

Authors: Xi Ouyang, Srikrishna Karanam, Ziyan Wu, Terrence Chen, Jiayu Huo, Xiang Sean Zhou, Qian Wang, Jie-Zhi Cheng

Abstract: We consider the problem of abnormality localization for clinical applications. While deep learning has driven much recent progress in medical imaging, many clinical challenges are not fully addressed, limiting its broader usage. While recent methods report high diagnostic accuracies, physicians have concerns trusting these algorithm results for diagnostic decision-making purposes because of a general lack of algorithm decision reasoning and interpretability. One potential way to address this problem is to further train these models to localize abnormalities in addition to just classifying them. However, doing this accurately will require a large amount of disease localization annotations by clinical experts, a task that is prohibitively expensive to accomplish for most applications. In this work, we take a step towards addressing these issues by means of a new attention-driven weakly supervised algorithm comprising a hierarchical attention mining framework that unifies activation- and gradient-based visual attention in a holistic manner. Our key algorithmic innovations include the design of explicit ordinal attention constraints, enabling principled model training in a weakly-supervised fashion, while also facilitating the generation of visual-attention-driven model explanations by means of localization cues. On two large-scale chest X-ray datasets (NIH ChestX-ray14 and CheXpert), we demonstrate significant localization performance improvements over the current state of the art while also achieving competitive classification performance. Our code is available on https://github.com/oyxhust/HAM.

Submitted 22 December, 2021; originally announced December 2021.

Journal ref: IEEE Transactions on Medical Imaging 2021

arXiv:2111.12715 [pdf, other]

doi 10.1038/s41467-022-32611-7

Experimental demonstration of adversarial examples in learning topological phases

Authors: Huili Zhang, Si Jiang, Xin Wang, Wengang Zhang, Xianzhi Huang, Xiaolong Ouyang, Yefei Yu, Yanqing Liu, Dong-Ling Deng, L. -M. Duan

Abstract: Classification and identification of different phases and the transitions between them is a central task in condensed matter physics. Machine learning, which has achieved dramatic success in a wide range of applications, holds the promise to bring unprecedented perspectives for this challenging task. However, despite the exciting progress made along this direction, the reliability of machine-learning approaches likewise demands further investigation. Here, with the nitrogen-vacancy center platform, we report the first proof-of-principle experimental demonstration of adversarial examples in learning topological phases. We show that, after adding a tiny amount of carefully-designed perturbations, the experimentally observed adversarial examples can successfully deceive a splendid phase classifier, whose prediction accuracy is larger than $99.2\%$ on legitimate samples, with a notably high confidence. Our results explicitly showcase the crucial vulnerability aspect of applying machine learning techniques in classifying phases of matter, which provides an indispensable guide for future studies in this interdisciplinary field.

Submitted 24 November, 2021; originally announced November 2021.

Journal ref: Nature Communications volume 13, Article number: 4993(2022)

arXiv:2111.10827 [pdf, other]

Domain Generalization for Mammography Detection via Multi-style and Multi-view Contrastive Learning

Authors: Zheren Li, Zhiming Cui, Sheng Wang, Yuji Qi, Xi Ouyang, Qitian Chen, Yuezhi Yang, Zhong Xue, Dinggang Shen, Jie-Zhi Cheng

Abstract: Lesion detection is a fundamental problem in the computer-aided diagnosis scheme for mammography. The advance of deep learning techniques have made a remarkable progress for this task, provided that the training data are large and sufficiently diverse in terms of image style and quality. In particular, the diversity of image style may be majorly attributed to the vendor factor. However, the collection of mammograms from vendors as many as possible is very expensive and sometimes impractical for laboratory-scale studies. Accordingly, to further augment the generalization capability of deep learning model to various vendors with limited resources, a new contrastive learning scheme is developed. Specifically, the backbone network is firstly trained with a multi-style and multi-view unsupervised self-learning scheme for the embedding of invariant features to various vendor-styles. Afterward, the backbone network is then recalibrated to the downstream task of lesion detection with the specific supervised learning. The proposed method is evaluated with mammograms from four vendors and one unseen public dataset. The experimental results suggest that our approach can effectively improve detection performance on both seen and unseen domains, and outperforms many state-of-the-art (SOTA) generalization methods.

Submitted 21 November, 2021; originally announced November 2021.

Comments: Pages 98-108

Journal ref: International Conference on Medical Image Computing and Computer-Assisted Intervention 2021

arXiv:2111.01677 [pdf, other]

Top1 Solution of QQ Browser 2021 Ai Algorithm Competition Track 1 : Multimodal Video Similarity

Authors: Zhuoran Ma, Majing Lou, Xuan Ouyang

Abstract: In this paper, we describe the solution to the QQ Browser 2021 Ai Algorithm Competition (AIAC) Track 1. We use the multi-modal transformer model for the video embedding extraction. In the pretrain phase, we train the model with three tasks, (1) Video Tag Classification (VTC), (2) Mask Language Modeling (MLM) and (3) Mask Frame Modeling (MFM). In the finetune phase, we train the model with video similarity based on rank normalized human labels. Our full pipeline, after ensembling several models, scores 0.852 on the leaderboard, which we achieved the 1st place in the competition. The source codes have been released at Github.

Submitted 30 October, 2021; originally announced November 2021.

arXiv:2111.00773 [pdf]

Generating tightly focused perfect optical vortex for ultra-secure optical encryption

Authors: Qingshuai Yang, Zijian Xie, Mengrui Zhang, Xu Ouyang, Yi Xu, Yaoyu Cao, Sicong Wang, Linwei Zhu, Xiangping Li

Abstract: Light's orbital angular momentum (OAM) with inherent mode orthogonality has been suggested as a new way to the optical encryption. However, the dependence of annular intensity profiles on the topological charge complicates nanoscale light-matter interactions and hampers the ultra-secure encryption application. In this paper, we demonstrate ultra-secure image encryption by tightly focusing perfect optical vortex (POV) beams with controllable annular intensity profiles and OAM states. A simple scheme composed of single spatial light modulator is proposed to generate radius-controllable POV in tightly focused beams. Such focused POV beams with identical intensity profiles but varied OAM states are applied to disorder-coupled gold nanorod aggregates to selectively excite electromagnetic hot spots for encoding information through photothermal deformation. As such, ultra-secure image encryption in OAM states of POV beams in combination with different polarizations can be achieved. Our results lay the ground for diverse nanophotonic applications harnessing the OAM division of POV beams.

Submitted 1 November, 2021; originally announced November 2021.

arXiv:2110.14068 [pdf, other]

Drawing Robust Scratch Tickets: Subnetworks with Inborn Robustness Are Found within Randomly Initialized Networks

Authors: Yonggan Fu, Qixuan Yu, Yang Zhang, Shang Wu, Xu Ouyang, David Cox, Yingyan Celine Lin

Abstract: Deep Neural Networks (DNNs) are known to be vulnerable to adversarial attacks, i.e., an imperceptible perturbation to the input can mislead DNNs trained on clean images into making erroneous predictions. To tackle this, adversarial training is currently the most effective defense method, by augmenting the training set with adversarial samples generated on the fly. Interestingly, we discover for the first time that there exist subnetworks with inborn robustness, matching or surpassing the robust accuracy of the adversarially trained networks with comparable model sizes, within randomly initialized networks without any model training, indicating that adversarial training on model weights is not indispensable towards adversarial robustness. We name such subnetworks Robust Scratch Tickets (RSTs), which are also by nature efficient. Distinct from the popular lottery ticket hypothesis, neither the original dense networks nor the identified RSTs need to be trained. To validate and understand this fascinating finding, we further conduct extensive experiments to study the existence and properties of RSTs under different models, datasets, sparsity patterns, and attacks, drawing insights regarding the relationship between DNNs' robustness and their initialization/overparameterization. Furthermore, we identify the poor adversarial transferability between RSTs of different sparsity ratios drawn from the same randomly initialized dense network, and propose a Random RST Switch (R2S) technique, which randomly switches between different RSTs, as a novel defense method built on top of RSTs. We believe our findings about RSTs have opened up a new perspective to study model robustness and extend the lottery ticket hypothesis.

Submitted 3 January, 2025; v1 submitted 26 October, 2021; originally announced October 2021.

Comments: Accepted at NeurIPS 2021

arXiv:2110.10639 [pdf, other]

Semi-supervised Domain Adaptation for Semantic Segmentation

Authors: Ying Chen, Xu Ouyang, Kaiyue Zhu, Gady Agam

Abstract: Deep learning approaches for semantic segmentation rely primarily on supervised learning approaches and require substantial efforts in producing pixel-level annotations. Further, such approaches may perform poorly when applied to unseen image domains. To cope with these limitations, both unsupervised domain adaptation (UDA) with full source supervision but without target supervision and semi-supervised learning (SSL) with partial supervision have been proposed. While such methods are effective at aligning different feature distributions, there is still a need to efficiently exploit unlabeled data to address the performance gap with respect to fully-supervised methods. In this paper we address semi-supervised domain adaptation (SSDA) for semantic segmentation, where a large amount of labeled source data as well as a small amount of labeled target data are available. We propose a novel and effective two-step semi-supervised dual-domain adaptation (SSDDA) approach to address both cross- and intra-domain gaps in semantic segmentation. The proposed framework is comprised of two mixing modules. First, we conduct a cross-domain adaptation via an image-level mixing strategy, which learns to align the distribution shift of features between the source data and target data. Second, intra-domain adaptation is achieved using a separate student-teacher network which is built to generate category-level data augmentation by mixing unlabeled target data in a way that respects predicted object boundaries. We demonstrate that the proposed approach outperforms state-of-the-art methods on two common synthetic-to-real semantic segmentation benchmarks. An extensive ablation study is provided to further validate the effectiveness of our approach.

Submitted 20 October, 2021; originally announced October 2021.

arXiv:2107.02137 [pdf, other]

ERNIE 3.0: Large-scale Knowledge Enhanced Pre-training for Language Understanding and Generation

Authors: Yu Sun, Shuohuan Wang, Shikun Feng, Siyu Ding, Chao Pang, Junyuan Shang, Jiaxiang Liu, Xuyi Chen, Yanbin Zhao, Yuxiang Lu, Weixin Liu, Zhihua Wu, Weibao Gong, Jianzhong Liang, Zhizhou Shang, Peng Sun, Wei Liu, Xuan Ouyang, Dianhai Yu, Hao Tian, Hua Wu, Haifeng Wang

Abstract: Pre-trained models have achieved state-of-the-art results in various Natural Language Processing (NLP) tasks. Recent works such as T5 and GPT-3 have shown that scaling up pre-trained language models can improve their generalization abilities. Particularly, the GPT-3 model with 175 billion parameters shows its strong task-agnostic zero-shot/few-shot learning capabilities. Despite their success, these large-scale models are trained on plain texts without introducing knowledge such as linguistic knowledge and world knowledge. In addition, most large-scale models are trained in an auto-regressive way. As a result, this kind of traditional fine-tuning approach demonstrates relatively weak performance when solving downstream language understanding tasks. In order to solve the above problems, we propose a unified framework named ERNIE 3.0 for pre-training large-scale knowledge enhanced models. It fuses auto-regressive network and auto-encoding network, so that the trained model can be easily tailored for both natural language understanding and generation tasks with zero-shot learning, few-shot learning or fine-tuning. We trained the model with 10 billion parameters on a 4TB corpus consisting of plain texts and a large-scale knowledge graph. Empirical results show that the model outperforms the state-of-the-art models on 54 Chinese NLP tasks, and its English version achieves the first place on the SuperGLUE benchmark (July 3, 2021), surpassing the human performance by +0.8% (90.6% vs. 89.8%).

Submitted 5 July, 2021; originally announced July 2021.

arXiv:2107.01014 [pdf, other]

doi 10.1103/PhysRevB.103.125123

Breakdown of the Hund's Rule in CuFeAs

Authors: Ze-Yi Song, Xiu-Cai Jiang, Xiao-Fang Ouyang, Yu-Zhong Zhang

Abstract: The ground-state properties of CuFeAs were investigated by applying density functional theory calculations within generalized gradient approximation (GGA) and GGA+U. We find that the bicollinear antiferromagnetic state with antiparallel orbital magnetic moments on each iron which violates the Hund's rule is favored by the on-site Coulomb interaction, which is further stabilized by Cu vacancy. The magnetic ground state can be used to understand weak antiferromagnetism in CuFeAs observed experimentally. We argue that breakdown of the Hund's rule may be the possible origin for reduced magnetism in iron pnictides, rather than magnetic fluctuations induced by electronic correlations.

Submitted 2 July, 2021; originally announced July 2021.

Comments: 8 pages, 5 figures

Journal ref: Phys. Rev. B 103, 125123 (2021)

arXiv:2104.12310 [pdf]

Manipulation of polar vortex chirality in oxide superlattices

Authors: Pan Chen, Congbing Tan, Zhexin Jiang, Peng Gao, Yuanwei Sun, Xiaomei Li, Ruixue Zhu, Lei Liao, Xu Hou, Lifen Wang, Ke Qu, Ning Li, Xiaomin Li, Zhi Xu, Kaihui Liu, Wenlong Wang, Jinbin Wang, Xiaoping Ouyang, Xiangli Zhong, Jie Wang, Xuedong Bai

Abstract: Topological polar vortices, which are the electric analogues of magnetic objects, present great potential in applications of future nanoelectronics due to their nanometer size, anomalous dielectric response, and chirality. To enable the functionalities, it is a prerequisite to manipulate the polar states and chirality by using external stimuli. Here, we probe the evolutions of polar state and chirality of polar vortices in PbTiO3/SrTiO3 superlattices under electric field by using atomically resolved in situ scanning transmission electron microscopy and phase-field simulations. We find that adjacent clockwise and counterclockwise vortices usually have opposite chirality. The phase-field simulations suggest that the rotation reversal or axial polarization switching can lead to the chirality change. Guided by this, we experimentally validate that the vortex rotation direction can be changed by applying and subsequently removing electric fields, offering a potential strategy to manipulate the vortex chirality. The revealed details of dynamic behavior for individual polar vortices at atomic scale and the proposed strategy for chirality manipulation provide fundamentals for future device applications.

Submitted 25 April, 2021; originally announced April 2021.

arXiv:2104.10963 [pdf, other]

doi 10.29026/oea.2022.210014

Cylindrical vector beams reveal radiationless anapole condition in a resonant state

Authors: Yudong Lu, Yi Xu, Xu Ouyang, Mingcong Xian, Yaoyu Cao, Kai Chen, Xiangping Li

Abstract: The nonscattering optical anapole condition corresponds to the excitation of radiationless field distributions in open resonators, which offers new degrees of freedom for tailoring light-matter interaction. Conventional mechanisms for achieving such a condition rely on sophisticated manipulation of electromagnetic multipolar moments of all orders to guarantee superpositions of vanished moment strengths at the same wavelength. In contrast, here we report on the excitation of optical radiationless anapole hidden in a resonant state of a Si nanoparticle utilizing tightly focused radially polarized (RP) beam. The coexistence of magnetic resonant state and anapole condition at the same wavelength further enables the triggering of resonant state by tightly focused azimuthally polarized (AP) beam whose corresponding electric multipole coefficient could be zero. As a result, high contrast inter-transition between radiationless anapole condition and ideal magnetic resonant scattering can be achieved experimentally in visible spectrum. The proposed mechanism is general and can be realized in different types of nanostructures. Our results showcase that the unique combination of structured light and structured Mie resonances could provide new degrees of freedom for tailoring light-matter interaction, which might shed new light on functional meta-optics.

Submitted 22 April, 2021; originally announced April 2021.

Comments: 11 pages, 5 figures

Journal ref: Opto-Electron Adv 5, 210014 (2022)

arXiv:2104.08215 [pdf, other]

"BNN - BN = ?": Training Binary Neural Networks without Batch Normalization

Authors: Tianlong Chen, Zhenyu Zhang, Xu Ouyang, Zechun Liu, Zhiqiang Shen, Zhangyang Wang

Abstract: Batch normalization (BN) is a key facilitator and considered essential for state-of-the-art binary neural networks (BNN). However, the BN layer is costly to calculate and is typically implemented with non-binary parameters, leaving a hurdle for the efficient implementation of BNN training. It also introduces undesirable dependence between samples within each batch. Inspired by the latest advances in Batch Normalization Free (BN-Free) training, we extend their framework to training BNNs, and for the first time demonstrate that BNs can be completely removed from BNN training and inference regimes. By plugging in and customizing techniques including adaptive gradient clipping, scale weight standardization, and specialized bottleneck block, a BN-free BNN is capable of maintaining competitive accuracy compared to its BN-based counterpart. Extensive experiments validate the effectiveness of our proposal across diverse BNN backbones and datasets. For example, after removing BNs from the state-of-the-art ReActNets, it can still be trained with our proposed methodology to achieve 92.08%, 68.34%, and 68.0% accuracy on CIFAR-10, CIFAR-100, and ImageNet respectively, with marginal performance drop (0.23%~0.44% on CIFAR and 1.40% on ImageNet). Codes and pre-trained models are available at: https://github.com/VITA-Group/BNN_NoBN.

Submitted 16 April, 2021; originally announced April 2021.

arXiv:2104.05903 [pdf]

Near-perfect fidelity polarization-encoded multilayer optical data storage based on aligned gold nanorods

Authors: Linwei Zhu, Yaoyu Cao, Qiuqun Chen, Xu Ouyang, Yi Xu, Zhongliang Hu, Jianrong Qiu, Xiangping Li

Abstract: Encoding information in light polarization is of great importance in facilitating optical data storage (ODS) for information security and data storage capacity escalation. However, despite recent advances in nanophotonic techniques vastly enhancing the feasibility of applying polarization channels, the data fidelity in reconstructed bits has been constrained by severe crosstalk occurring between varied polarization angles during the data recording and reading process, which has gravely hindered the utilization of this technique in practice. In this paper, we demonstrate an ultra-low crosstalk polarization-encoding multilayer optical data storage technique for high-fidelity data recording and retrieving by utilizing a nanofibre-based nanocomposite film involving highly aligned gold nanorods (GNRs). By parallelizing the gold nanorods in the recording medium, the information carrier configuration minimizes miswriting and misreading possibilities for information input and output, respectively, compared with its randomly self-assembled counterparts. The enhanced data accuracy has significantly improved the bit recall fidelity that is quantified by a correlation coefficient higher than 0.99. It is anticipated that the demonstrated technique can facilitate the development of multiplexing ODS for a greener future.

Submitted 12 April, 2021; originally announced April 2021.

arXiv:2102.12696 [pdf]

Resolving ultrahigh-contrast ultrashort pulses with single-shot cross-correlator at the photon noise limit

Authors: Jingui Ma, Peng Yuan, Xiaoping Ouyang, Jing Wang, Guoqiang Xie, Liejia Qian

Abstract: In strong-field physics experiments with intense lasers, it is of paramount importance to single-shot diagnose the temporal contrast between laser pulse peak and its noise pedestal. This allows fast optimization of pulse contrast and meaningful comparison with theory for each pulse shot, and it can help produce new outcomes from clean laser-plasma interactions. Thus far, high contrast ratios up to ~10^10, required by present petawatt (PW) class lasers, have been accessible in both generation and single-shot characterization. However, ultrahigh contrast ~10^13, required by the planned 200-PW lasers, challenges intense laser technology and remains an open question. This paper reports on the first demonstration of such an ultrahigh-contrast measurement by adapting single-shot cross-correlator (SSCC). We introduce an ultrafast method that enables determination of the SSCC detection limit. Our strategy mimics the test laser having known ultrahigh contrast in the measurement frame of time-to-space mapping. The ultimate contrast-measurement limit of 10^13 is achieved, which corresponds to the highest pulse intensity set by SSCC damage threshold and the lowest noise pedestal set by single-photon detection. As a consequence, photon noise in the detection is observed and increases as the noise pedestal reduces. The demonstrated measurement ability at the photon noise limit is applied to a high-contrast laser system based on second-harmonic generation and optical parametric chirped-pulse amplification, suggesting that ultrahigh-contrast pulses are accessible.

Submitted 25 February, 2021; originally announced February 2021.

Comments: 12 pages, 4 figures

arXiv:2101.10156 [pdf, other]

Mask-based Data Augmentation for Semi-supervised Semantic Segmentation

Authors: Ying Chen, Xu Ouyang, Kaiyue Zhu, Gady Agam

Abstract: Semantic segmentation using convolutional neural networks (CNN) is a crucial component in image analysis. Training a CNN to perform semantic segmentation requires a large amount of labeled data, where the production of such labeled data is both costly and labor intensive. Semi-supervised learning algorithms address this issue by utilizing unlabeled data and so reduce the amount of labeled data needed for training. In particular, data augmentation techniques such as CutMix and ClassMix generate additional training data from existing labeled data. In this paper we propose a new approach for data augmentation, termed ComplexMix, which incorporates aspects of CutMix and ClassMix with improved performance. The proposed approach has the ability to control the complexity of the augmented data while attempting to be semantically-correct and address the tradeoff between complexity and correctness. The proposed ComplexMix approach is evaluated on a standard dataset for semantic segmentation and compared to other state-of-the-art techniques. Experimental results show that our method yields improvement over state-of-the-art methods on standard datasets for semantic image segmentation.

Submitted 25 January, 2021; originally announced January 2021.

arXiv:2012.15674 [pdf, other]

ERNIE-M: Enhanced Multilingual Representation by Aligning Cross-lingual Semantics with Monolingual Corpora

Authors: Xuan Ouyang, Shuohuan Wang, Chao Pang, Yu Sun, Hao Tian, Hua Wu, Haifeng Wang

Abstract: Recent studies have demonstrated that pre-trained cross-lingual models achieve impressive performance in downstream cross-lingual tasks. This improvement benefits from learning a large amount of monolingual and parallel corpora. Although it is generally acknowledged that parallel corpora are critical for improving the model performance, existing methods are often constrained by the size of parallel corpora, especially for low-resource languages. In this paper, we propose ERNIE-M, a new training method that encourages the model to align the representation of multiple languages with monolingual corpora, to overcome the constraint that the parallel corpus size places on the model performance. Our key insight is to integrate back-translation into the pre-training process. We generate pseudo-parallel sentence pairs on a monolingual corpus to enable the learning of semantic alignments between different languages, thereby enhancing the semantic modeling of cross-lingual models. Experimental results show that ERNIE-M outperforms existing cross-lingual models and delivers new state-of-the-art results in various cross-lingual downstream tasks.

Submitted 17 September, 2021; v1 submitted 31 December, 2020; originally announced December 2020.

Comments: Accepted by EMNLP 2021 (main conference, long paper)

arXiv:2012.09191 [pdf, other]

doi 10.1103/PhysRevLett.127.090501

Observation of non-Hermitian topology with non-unitary dynamics of solid-state spins

Authors: Wengang Zhang, Xiaolong Ouyang, Xianzhi Huang, Xin Wang, Huili Zhang, Yefei Yu, Xiuying Chang, Yanqing Liu, Dong-Ling Deng, L. -M. Duan

Abstract: Non-Hermitian topological phases exhibit a number of exotic features that have no Hermitian counterparts, including the skin effect and breakdown of the conventional bulk-boundary correspondence. Here, we implement the non-Hermitian Su-Schrieffer-Heeger (SSH) Hamiltonian, which is a prototypical model for studying non-Hermitian topological phases, with a solid-state quantum simulator consisting of an electron spin and a $^{13}$C nuclear spin in a nitrogen-vacancy (NV) center in a diamond. By employing a dilation method, we realize the desired non-unitary dynamics for the electron spin and map out its spin texture in the momentum space, from which the corresponding topological invariant can be obtained directly. Our result paves the way for further exploiting and understanding the intriguing properties of non-Hermitian topological phases with solid-state spins or other quantum simulation platforms.

Submitted 16 December, 2020; originally announced December 2020.

Comments: 10 pages, 6 figures

Journal ref: Phys. Rev. Lett. 127, 090501 (2021)

arXiv:2012.02264 [pdf, other]

Domain Adaptation on Semantic Segmentation for Aerial Images

Authors: Ying Chen, Xu Ouyang, Kaiyue Zhu, Gady Agam

Abstract: Semantic segmentation has achieved significant advances in recent years. While deep neural networks perform semantic segmentation well, their success relies on pixel-level supervision, which is expensive and time-consuming. Further, training using data from one domain may not generalize well to data from a new domain due to a domain gap between data distributions in the different domains. This domain gap is particularly evident in aerial images where visual appearance depends on the type of environment imaged, season, weather, and time of day when the environment is imaged. Consequently, this distribution gap leads to severe accuracy loss when using a pretrained segmentation model to analyze new data with different characteristics. In this paper, we propose a novel unsupervised domain adaptation framework to address domain shift in the context of aerial semantic image segmentation. To this end, we solve the problem of domain shift by learning the soft label distribution difference between the source and target domains. Further, we also apply entropy minimization on the target domain to produce high-confidence predictions rather than using high-confidence predictions by pseudo-labeling. We demonstrate the effectiveness of our domain adaptation framework using the challenge image segmentation dataset of ISPRS, and show improvement over state-of-the-art methods in terms of various metrics.

Submitted 11 December, 2020; v1 submitted 3 December, 2020; originally announced December 2020.
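
The entropy minimization mentioned in the abstract above can be illustrated with a minimal sketch. It assumes per-pixel class logits given as plain Python lists; the function names `softmax` and `entropy_loss` are hypothetical and are not taken from the paper, whose actual loss may be weighted or combined with other terms.

```python
import math

def softmax(logits):
    """Numerically stable softmax over one pixel's class logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def entropy_loss(pixel_logits):
    """Mean Shannon entropy (in nats) of the per-pixel class distributions.

    Minimizing this value on unlabeled target-domain pixels pushes the
    model toward confident predictions. Illustrative only; not the
    authors' implementation.
    """
    total = 0.0
    for logits in pixel_logits:
        probs = softmax(logits)
        total += -sum(p * math.log(p) for p in probs if p > 0.0)
    return total / len(pixel_logits)
```

A pixel with equal logits for two classes contributes the maximum entropy for that case (ln 2), while a pixel with one dominant logit contributes almost nothing, so the loss rewards confident target-domain predictions without requiring pseudo-labels.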

arXiv:2010.10458 [pdf, other]

Towards Scalable Distributed Training of Deep Learning on Public Cloud Clusters

Authors: Shaohuai Shi, Xianhao Zhou, Shutao Song, Xingyao Wang, Zilin Zhu, Xue Huang, Xinan Jiang, Feihu Zhou, Zhenyu Guo, Liqiang Xie, Rui Lan, Xianbin Ouyang, Yan Zhang, Jieqian Wei, Jing Gong, Weiliang Lin, Ping Gao, Peng Meng, Xiaomin Xu, Chenyang Guo, Bo Yang, Zhibo Chen, Yongjian Wu, Xiaowen Chu

Abstract: Distributed training techniques have been widely deployed in large-scale deep neural networks (DNNs) training on dense-GPU clusters. However, on public cloud clusters, due to the moderate inter-connection bandwidth between instances, traditional state-of-the-art distributed training systems cannot scale well in training large-scale models. In this paper, we propose a new computing and communication efficient top-k sparsification communication library for distributed training. To further improve the system scalability, we optimize I/O by proposing a simple yet efficient multi-level data caching mechanism and optimize the update operation by introducing a novel parallel tensor operator. Experimental results on a 16-node Tencent Cloud cluster (each node with 8 Nvidia Tesla V100 GPUs) show that our system is 25%-40% faster than existing state-of-the-art systems on CNNs and Transformer. We finally break the record on DAWNBench on training ResNet-50 to 93% top-5 accuracy on ImageNet.

Submitted 20 October, 2020; originally announced October 2020.

Comments: 13 pages
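
As a rough illustration of the top-k sparsification named in the abstract above, the sketch below keeps only the k largest-magnitude entries of a flat gradient and returns the rest as a residual. The function name `top_k_sparsify` and the residual return value are assumptions made for illustration; the abstract does not describe the library's interface or how dropped entries are handled.

```python
import heapq

def top_k_sparsify(grad, k):
    """Select the k largest-magnitude entries of a flat gradient.

    Returns (indices, values, residual): the sparse part that would be
    communicated, plus the entries left behind. Keeping a local residual
    is a common companion technique and is an assumption here, not
    something stated in the abstract.
    """
    if k <= 0:
        return [], [], list(grad)
    k = min(k, len(grad))
    # Indices of the k entries with the largest absolute value.
    idx = heapq.nlargest(k, range(len(grad)), key=lambda i: abs(grad[i]))
    idx.sort()
    values = [grad[i] for i in idx]
    kept = set(idx)
    residual = [0.0 if i in kept else g for i, g in enumerate(grad)]
    return idx, values, residual
```

Only the index and value lists need to cross the network, which is why this kind of scheme helps when inter-instance bandwidth is moderate.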

arXiv:2010.03542 [pdf, other]

Galileo at SemEval-2020 Task 12: Multi-lingual Learning for Offensive Language Identification using Pre-trained Language Models

Authors: Shuohuan Wang, Jiaxiang Liu, Xuan Ouyang, Yu Sun

Abstract: This paper describes Galileo's performance in SemEval-2020 Task 12 on detecting and categorizing offensive language in social media. For Offensive Language Identification, we proposed a multi-lingual method using Pre-trained Language Models, ERNIE and XLM-R. For offensive language categorization, we proposed a knowledge distillation method trained on soft labels generated by several supervised models. Our team participated in all three sub-tasks. In Sub-task A - Offensive Language Identification, we ranked first in terms of average F1 scores in all languages. We are also the only team which ranked among the top three across all languages. We also took the first place in Sub-task B - Automatic Categorization of Offense Types and Sub-task C - Offense Target Identification.

Submitted 7 October, 2020; originally announced October 2020.

Comments: 8 pages, 2 figures, 6 tables. Accepted at Proceedings of 14th International Workshop on Semantic Evaluation (SemEval-2020)

arXiv:2009.12307 [pdf, other]

doi 10.1088/1361-6587/abcf7e

Bremsstrahlung emission and plasma characterization driven by moderately relativistic laser-plasma interactions

Authors: Sushil Singh, Chris D. Armstrong, Ning Kang, Lei Ren, Huiya Liu, Neng Hua, Dean R. Rusby, Ondřej Klimo, Roberto Versaci, Yan Zhang, Mingying Sun, Baoqiang Zhu, Anle Lei, Xiaoping Ouyang, Livia Lancia, Alejandro Laso Garcia, Andreas Wagner, Thomas Cowan, Jianqiang Zhu, Theodor Schlegel, Stefan Weber, Paul McKenna, David Neely, Vladimir Tikhonchuk, Deepak Kumar

Abstract: Relativistic electrons generated by the interaction of petawatt-class short laser pulses with solid targets can be used to generate bright X-rays via bremsstrahlung. The efficiency of laser energy transfer into these electrons depends on multiple parameters including the focused intensity and pre-plasma level. This paper reports experimental results from the interaction of high-intensity petawatt-class glass laser pulses with solid targets at a maximum intensity of $10^{19}$ W/cm$^2$. In-situ measurements of specularly reflected light are used to provide an upper bound of laser absorption and to characterize focused laser intensity, the pre-plasma level and the generation mechanism of second harmonic light. The measured spectra of electrons and bremsstrahlung radiation provide information about the efficiency of laser energy transfer.

Submitted 25 September, 2020; originally announced September 2020.

arXiv:2009.03706 [pdf, other]

ERNIE at SemEval-2020 Task 10: Learning Word Emphasis Selection by Pre-trained Language Model

Authors: Zhengjie Huang, Shikun Feng, Weiyue Su, Xuyi Chen, Shuohuan Wang, Jiaxiang Liu, Xuan Ouyang, Yu Sun

Abstract: This paper describes the system designed by ERNIE Team which achieved the first place in SemEval-2020 Task 10: Emphasis Selection For Written Text in Visual Media. Given a sentence, we are asked to find out the most important words as the suggestion for automated design. We leverage the unsupervised pre-training model and finetune these models on our task. After our investigation, we found that the following models achieved an excellent performance in this task: ERNIE 2.0, XLM-ROBERTA, ROBERTA and ALBERT. We combine a pointwise regression loss and a pairwise ranking loss, which is closer to the final Match_m metric, to finetune our models. We also find that additional feature engineering and data augmentation can help improve the performance. Our best model achieves the highest score of 0.823 and ranks first for all kinds of metrics.

Submitted 8 September, 2020; originally announced September 2020.

arXiv:2009.03673 [pdf, ps, other]

kk2018 at SemEval-2020 Task 9: Adversarial Training for Code-Mixing Sentiment Classification

Authors: Jiaxiang Liu, Xuyi Chen, Shikun Feng, Shuohuan Wang, Xuan Ouyang, Yu Sun, Zhengjie Huang, Weiyue Su

Abstract: Code switching is a linguistic phenomenon that may occur within a multilingual setting where speakers share more than one language. With the increasing communication between groups with different languages, this phenomenon is more and more popular. However, there is little research and data in this area, especially in code-mixing sentiment classification. In this work, the domain transfer learning from state-of-the-art uni-language model ERNIE is tested on the code-mixing dataset, and surprisingly, a strong baseline is achieved. Furthermore, the adversarial training with a multi-lingual model is used to achieve 1st place in the SemEval-2020 Task 9 Hindi-English sentiment classification competition.

Submitted 8 September, 2020; v1 submitted 8 September, 2020; originally announced September 2020.

arXiv:2008.12463 [pdf, other]

Accelerated WGAN update strategy with loss change rate balancing

Authors: Xu Ouyang, Gady Agam

Abstract: Optimizing the discriminator in Generative Adversarial Networks (GANs) to completion in the inner training loop is computationally prohibitive, and on finite datasets would result in overfitting. To address this, a common update strategy is to alternate between k optimization steps for the discriminator D and one optimization step for the generator G. This strategy is repeated in various GAN algorithms where k is selected empirically. In this paper, we show that this update strategy is not optimal in terms of accuracy and convergence speed, and propose a new update strategy for Wasserstein GANs (WGAN) and other GANs using the WGAN loss (e.g., WGAN-GP, Deblur GAN, and Super-resolution GAN). The proposed update strategy is based on a loss change ratio comparison of G and D. We demonstrate that the proposed strategy improves both convergence speed and accuracy.

Submitted 2 November, 2020; v1 submitted 27 August, 2020; originally announced August 2020.

arXiv:2007.13792 [pdf]

Experimental density radiography of Wudalianchi volcano with cosmic ray muons

Authors: Y. Cheng, R. Han, Z. Li, J. Li, J. Li, W. Gu, X. Yang, X. Ouyang, B. Liao

Abstract: Muon radiography is a promising technique for imaging the internal density structures of targets up to a few hundred meters in scale, such as tunnels, pyramids and volcanoes, by measuring the flux attenuation of cosmic ray muons after traveling through these targets. In this study, we conducted an experimental cosmic ray muon radiography of the Wudalianchi volcano in northeast China to image its internal density structures. The muon detector used in this study is made of plastic scintillator and silicon photomultipliers. After about one and a half months of observation of the Laoheishan volcano cone in the Wudalianchi volcano, from September 23rd to November 10th, 2019, more than 3 million muon tracks passing the data selection criteria were obtained. Based on the muon observations and the high-resolution topography from aerial photogrammetry by unmanned aerial vehicle, the relative density image of the Laoheishan volcano cone was obtained. The experiment in this study is the first muon radiography of a volcano performed in China, and the results suggest the feasibility of the radiography technique based on a plastic scintillator muon detector. As a new passive geophysical imaging method, cosmic ray muon radiography could become a promising method for obtaining high-resolution 2-D and 3-D density structures of shallow geological targets.

Submitted 27 July, 2020; originally announced July 2020.

Showing 51–100 of 127 results for author: Ouyang, X