-
Evolution of Optimization Algorithms for Global Placement via Large Language Models
Authors:
Xufeng Yao,
Jiaxi Jiang,
Yuxuan Zhao,
Peiyu Liao,
Yibo Lin,
Bei Yu
Abstract:
Optimization algorithms are widely employed to tackle complex problems, but designing them manually is often labor-intensive and requires significant expertise. Global placement is a fundamental step in electronic design automation (EDA). While analytical approaches represent the state-of-the-art (SOTA) in global placement, their core optimization algorithms remain heavily dependent on heuristics…
▽ More
Optimization algorithms are widely employed to tackle complex problems, but designing them manually is often labor-intensive and requires significant expertise. Global placement is a fundamental step in electronic design automation (EDA). While analytical approaches represent the state-of-the-art (SOTA) in global placement, their core optimization algorithms remain heavily dependent on heuristics and customized components, such as initialization strategies, preconditioning methods, and line search techniques. This paper presents an automated framework that leverages large language models (LLM) to evolve optimization algorithms for global placement. We first generate diverse candidate algorithms using LLM through carefully crafted prompts. Then we introduce an LLM-based genetic flow to evolve selected candidate algorithms. The discovered optimization algorithms exhibit substantial performance improvements across many benchmarks. Specifically, Our design-case-specific discovered algorithms achieve average HPWL improvements of \textbf{5.05\%}, \text{5.29\%} and \textbf{8.30\%} on MMS, ISPD2005 and ISPD2019 benchmarks, and up to \textbf{17\%} improvements on individual cases. Additionally, the discovered algorithms demonstrate good generalization ability and are complementary to existing parameter-tuning methods.
△ Less
Submitted 18 April, 2025;
originally announced April 2025.
-
HV-BEV: Decoupling Horizontal and Vertical Feature Sampling for Multi-View 3D Object Detection
Authors:
Di Wu,
Feng Yang,
Benlian Xu,
Pan Liao,
Wenhui Zhao,
Dingwen Zhang
Abstract:
The application of vision-based multi-view environmental perception system has been increasingly recognized in autonomous driving technology, especially the BEV-based models. Current state-of-the-art solutions primarily encode image features from each camera view into the BEV space through explicit or implicit depth prediction. However, these methods often focus on improving the accuracy of projec…
▽ More
The application of vision-based multi-view environmental perception system has been increasingly recognized in autonomous driving technology, especially the BEV-based models. Current state-of-the-art solutions primarily encode image features from each camera view into the BEV space through explicit or implicit depth prediction. However, these methods often focus on improving the accuracy of projecting 2D features into corresponding depth regions, while overlooking the highly structured information of real-world objects and the varying height distributions of objects across different scenes. In this work, we propose HV-BEV, a novel approach that decouples feature sampling in the BEV grid queries paradigm into horizontal feature aggregation and vertical adaptive height-aware reference point sampling, aiming to improve both the aggregation of objects' complete information and generalization to diverse road environments. Specifically, we construct a learnable graph structure in the horizontal plane aligned with the ground for 3D reference points, reinforcing the association of the same instance across different BEV grids, especially when the instance spans multiple image views around the vehicle. Additionally, instead of relying on uniform sampling within a fixed height range, we introduce a height-aware module that incorporates historical information, enabling the reference points to adaptively focus on the varying heights at which objects appear in different scenes. Extensive experiments validate the effectiveness of our proposed method, demonstrating its superior performance over the baseline across the nuScenes dataset. Moreover, our best-performing model achieves a remarkable 50.5% mAP and 59.8% NDS on the nuScenes testing set.
△ Less
Submitted 30 December, 2024; v1 submitted 25 December, 2024;
originally announced December 2024.
-
BGTplanner: Maximizing Training Accuracy for Differentially Private Federated Recommenders via Strategic Privacy Budget Allocation
Authors:
Xianzhi Zhang,
Yipeng Zhou,
Miao Hu,
Di Wu,
Pengshan Liao,
Mohsen Guizani,
Michael Sheng
Abstract:
To mitigate the rising concern about privacy leakage, the federated recommender (FR) paradigm emerges, in which decentralized clients co-train the recommendation model without exposing their raw user-item rating data. The differentially private federated recommender (DPFR) further enhances FR by injecting differentially private (DP) noises into clients. Yet, current DPFRs, suffering from noise dis…
▽ More
To mitigate the rising concern about privacy leakage, the federated recommender (FR) paradigm emerges, in which decentralized clients co-train the recommendation model without exposing their raw user-item rating data. The differentially private federated recommender (DPFR) further enhances FR by injecting differentially private (DP) noises into clients. Yet, current DPFRs, suffering from noise distortion, cannot achieve satisfactory accuracy. Various efforts have been dedicated to improving DPFRs by adaptively allocating the privacy budget over the learning process. However, due to the intricate relation between privacy budget allocation and model accuracy, existing works are still far from maximizing DPFR accuracy. To address this challenge, we develop BGTplanner (Budget Planner) to strategically allocate the privacy budget for each round of DPFR training, improving overall training performance. Specifically, we leverage the Gaussian process regression and historical information to predict the change in recommendation accuracy with a certain allocated privacy budget. Additionally, Contextual Multi-Armed Bandit (CMAB) is harnessed to make privacy budget allocation decisions by reconciling the current improvement and long-term privacy constraints. Our extensive experimental results on real datasets demonstrate that \emph{BGTplanner} achieves an average improvement of 6.76\% in training performance compared to state-of-the-art baselines.
△ Less
Submitted 3 December, 2024;
originally announced December 2024.
-
FastTrackTr:Towards Fast Multi-Object Tracking with Transformers
Authors:
Pan Liao,
Feng Yang,
Di Wu,
Jinwen Yu,
Wenhui Zhao,
Bo Liu
Abstract:
Transformer-based multi-object tracking (MOT) methods have captured the attention of many researchers in recent years. However, these models often suffer from slow inference speeds due to their structure or other issues. To address this problem, we revisited the Joint Detection and Tracking (JDT) method by looking back at past approaches. By integrating the original JDT approach with some advanced…
▽ More
Transformer-based multi-object tracking (MOT) methods have captured the attention of many researchers in recent years. However, these models often suffer from slow inference speeds due to their structure or other issues. To address this problem, we revisited the Joint Detection and Tracking (JDT) method by looking back at past approaches. By integrating the original JDT approach with some advanced theories, this paper employs an efficient method of information transfer between frames on the DETR, constructing a fast and novel JDT-type MOT framework: FastTrackTr. Thanks to the superiority of this information transfer method, our approach not only reduces the number of queries required during tracking but also avoids the excessive introduction of network structures, ensuring model simplicity. Experimental results indicate that our method has the potential to achieve real-time tracking and exhibits competitive tracking accuracy across multiple datasets.
△ Less
Submitted 6 March, 2025; v1 submitted 24 November, 2024;
originally announced November 2024.
-
Evaluation of waterway lock service quality in Yangtze Delta: from the perspectives of customer and supplier
Authors:
Wenzhang Yang,
Peng Liao,
Shangkun Jiang,
Hao Wang
Abstract:
In recent decades, the waterway locks in the Yangtze Delta, China, have become major traffic bottlenecks. To gain a comprehensive understanding of the crew's perspectives and primary concerns regarding lock services during vessel lockage, and to enhance customer satisfaction and improve vessel lockage efficiency, it is necessary to assess the waterway lock service quality (WLSQ). This paper presen…
▽ More
In recent decades, the waterway locks in the Yangtze Delta, China, have become major traffic bottlenecks. To gain a comprehensive understanding of the crew's perspectives and primary concerns regarding lock services during vessel lockage, and to enhance customer satisfaction and improve vessel lockage efficiency, it is necessary to assess the waterway lock service quality (WLSQ). This paper presents an evaluation system for WLSQ from various stakeholders' viewpoints. Firstly, by employing questionnaire surveys and the structural equation model method, in conjunction with factor analysis, the WLSQ and its influencing factors in the Yangtze River Delta region are analyzed from a customer perspective. Secondly, the Analytic Hierarchy Process method is utilized, along with a dedicated questionnaire for service suppliers, to examine their concerns regarding the performance of vessel lock services. The findings indicate that there exists a cognitive bias towards factors influencing the WLSQ. Crew members express the greatest concern over vessel lockage delays, whereas vessel lockage safety is the primary concern for management department administrators. Furthermore, enhancing the supporting facilities of waterway locks can significantly increase crew members' satisfaction during vessel lockage. Improving staff skills, and safety conditions can also greatly enhance customers' tolerance for lockage delays. The results of this study will provide valuable insights for the lock management department, operators, and the government in formulating relevant policies to improve WLSQ and implementing ongoing service quality evaluations.
△ Less
Submitted 21 September, 2024;
originally announced October 2024.
-
Analysis of vessel traffic flow characteristics in inland restricted waterways using multi-source data
Authors:
Wenzhang Yang,
Peng Liao,
Shangkun Jiang,
Hao Wang
Abstract:
To effectively manage vessel traffic and alleviate congestion on busy inland waterways, a comprehensive understanding of vessel traffic flow characteristics is crucial. However, limited data availability has resulted in minimal research on the traffic flow characteristics of inland waterway vessels. This study addresses this gap by conducting vessel-following experiments and fixed-point video moni…
▽ More
To effectively manage vessel traffic and alleviate congestion on busy inland waterways, a comprehensive understanding of vessel traffic flow characteristics is crucial. However, limited data availability has resulted in minimal research on the traffic flow characteristics of inland waterway vessels. This study addresses this gap by conducting vessel-following experiments and fixed-point video monitoring in inland waterways, collecting multi-source data to analyze vessel traffic flow characteristics. First, the analysis of vessel speed distribution identifies the economic speed for vessels operating in these environments. Next, the relationship between microscopic vessel speed and gap distance is examined, with the logarithmic model emerging as the most accurate among various tested models. Additionally, the study explores the relationships among macroscopic speed, density, and flow rate, proposing a novel piecewise fundamental diagram model to describe these relationships. Lastly, the inland vessel traffic states are categorized using K-means clustering algorithm and applied to vessel navigation services. These findings provide valuable insights for enhancing inland waterway transportation and advancing the development of an integrated waterway transportation system.
△ Less
Submitted 21 September, 2024;
originally announced October 2024.
-
End-User-Centric Collaborative MIMO: Performance Analysis and Proof of Concept
Authors:
Chao-Kai Wen,
Yen-Cheng Chan,
Tzu-Hao Huang,
Hao-Jun Zeng,
Fu-Kang Wang,
Lung-Sheng Tsai,
Pei-Kai Liao
Abstract:
The trend toward using increasingly large arrays of antenna elements continues. However, fitting more antennas into the limited space available on user equipment (UE) within the currently popular Frequency Range 1 spectrum presents a significant challenge. This limitation constrains the capacity-scaling gains for end users, even when networks support a higher number of antennas. To address this is…
▽ More
The trend toward using increasingly large arrays of antenna elements continues. However, fitting more antennas into the limited space available on user equipment (UE) within the currently popular Frequency Range 1 spectrum presents a significant challenge. This limitation constrains the capacity-scaling gains for end users, even when networks support a higher number of antennas. To address this issue, we explore a user-centric collaborative MIMO approach, termed UE-CoMIMO, which leverages several fixed or portable devices within a personal area to form a virtually expanded antenna array. This paper develops a comprehensive mathematical framework to analyze the performance of UE-CoMIMO. Our analytical results demonstrate that UE-CoMIMO can significantly enhance the system's effective channel response within the current communication system without requiring extensive modifications. Further performance improvements can be achieved by optimizing the phase shifters on the expanded antenna arrays at the collaborative devices. These findings are corroborated by ray-tracing simulations. Beyond the simulations, we implemented these collaborative devices and successfully conducted over-the-air validation in a real 5G environment, showcasing the practical potential of UE-CoMIMO. Several practical perspectives are discussed, highlighting the feasibility and benefits of this approach in real-world scenarios.
△ Less
Submitted 24 December, 2024; v1 submitted 23 September, 2024;
originally announced September 2024.
-
Federated One-Shot Ensemble Clustering
Authors:
Rui Duan,
Xin Xiong,
Jueyi Liu,
Katherine P. Liao,
Tianxi Cai
Abstract:
Cluster analysis across multiple institutions poses significant challenges due to data-sharing restrictions. To overcome these limitations, we introduce the Federated One-shot Ensemble Clustering (FONT) algorithm, a novel solution tailored for multi-site analyses under such constraints. FONT requires only a single round of communication between sites and ensures privacy by exchanging only fitted m…
▽ More
Cluster analysis across multiple institutions poses significant challenges due to data-sharing restrictions. To overcome these limitations, we introduce the Federated One-shot Ensemble Clustering (FONT) algorithm, a novel solution tailored for multi-site analyses under such constraints. FONT requires only a single round of communication between sites and ensures privacy by exchanging only fitted model parameters and class labels. The algorithm combines locally fitted clustering models into a data-adaptive ensemble, making it broadly applicable to various clustering techniques and robust to differences in cluster proportions across sites. Our theoretical analysis validates the effectiveness of the data-adaptive weights learned by FONT, and simulation studies demonstrate its superior performance compared to existing benchmark methods. We applied FONT to identify subgroups of patients with rheumatoid arthritis across two health systems, revealing improved consistency of patient clusters across sites, while locally fitted clusters proved less transferable. FONT is particularly well-suited for real-world applications with stringent communication and privacy constraints, offering a scalable and practical solution for multi-site clustering.
△ Less
Submitted 12 September, 2024;
originally announced September 2024.
-
Multi-Agent Deep Reinforcement Learning for Energy Efficient Multi-Hop STAR-RIS-Assisted Transmissions
Authors:
Pei-Hsiang Liao,
Li-Hsiang Shen,
Po-Chen Wu,
Kai-Ten Feng
Abstract:
Simultaneously transmitting and reflecting reconfigurable intelligent surface (STAR-RIS) provides a promising way to expand coverage in wireless communications. However, limitation of single STAR-RIS inspire us to integrate the concept of multi-hop transmissions, as focused on RIS in existing research. Therefore, we propose the novel architecture of multi-hop STAR-RISs to achieve a wider range of…
▽ More
Simultaneously transmitting and reflecting reconfigurable intelligent surface (STAR-RIS) provides a promising way to expand coverage in wireless communications. However, limitation of single STAR-RIS inspire us to integrate the concept of multi-hop transmissions, as focused on RIS in existing research. Therefore, we propose the novel architecture of multi-hop STAR-RISs to achieve a wider range of full-plane service coverage. In this paper, we intend to solve active beamforming of the base station and passive beamforming of STAR-RISs, aiming for maximizing the energy efficiency constrained by hardware limitation of STAR-RISs. Furthermore, we investigate the impact of the on-off state of STAR-RIS elements on energy efficiency. To tackle the complex problem, a Multi-Agent Global and locAl deep Reinforcement learning (MAGAR) algorithm is designed. The global agent elevates the collaboration among local agents, which focus on individual learning. In numerical results, we observe the significant improvement of MAGAR compared to the other benchmarks, including Q-learning, multi-agent deep Q network (DQN) with golbal reward, and multi-agent DQN with local rewards. Moreover, the proposed architecture of multi-hop STAR-RISs achieves the highest energy efficiency compared to mode switching based STAR-RISs, conventional RISs and deployment without RISs or STAR-RISs.
△ Less
Submitted 26 July, 2024;
originally announced July 2024.
-
MO-EMT-NAS: Multi-Objective Continuous Transfer of Architectural Knowledge Between Tasks from Different Datasets
Authors:
Peng Liao,
XiLu Wang,
Yaochu Jin,
WenLi Du
Abstract:
Deploying models across diverse devices demands tradeoffs among multiple objectives due to different resource constraints. Arguably, due to the small model trap problem in multi-objective neural architecture search (MO-NAS) based on a supernet, existing approaches may fail to maintain large models. Moreover, multi-tasking neural architecture search (MT-NAS) excels in handling multiple tasks simult…
▽ More
Deploying models across diverse devices demands tradeoffs among multiple objectives due to different resource constraints. Arguably, due to the small model trap problem in multi-objective neural architecture search (MO-NAS) based on a supernet, existing approaches may fail to maintain large models. Moreover, multi-tasking neural architecture search (MT-NAS) excels in handling multiple tasks simultaneously, but most existing efforts focus on tasks from the same dataset, limiting their practicality in real-world scenarios where multiple tasks may come from distinct datasets. To tackle the above challenges, we propose a Multi-Objective Evolutionary Multi-Tasking framework for NAS (MO-EMT-NAS) to achieve architectural knowledge transfer across tasks from different datasets while finding Pareto optimal architectures for multi-objectives, model accuracy and computational efficiency. To alleviate the small model trap issue, we introduce an auxiliary objective that helps maintain multiple larger models of similar accuracy. Moreover, the computational efficiency is further enhanced by parallelizing the training and validation of the weight-sharing-based supernet. Experimental results on seven datasets with two, three, and four task combinations show that MO-EMT-NAS achieves a better minimum classification error while being able to offer flexible trade-offs between model performance and complexity, compared to the state-of-the-art single-objective MT-NAS algorithms. The runtime of MO-EMT-NAS is reduced by 59.7% to 77.7%, compared to the corresponding multi-objective single-task approaches.
△ Less
Submitted 17 July, 2024;
originally announced July 2024.
-
Leveraging Pre-trained Models for FF-to-FFPE Histopathological Image Translation
Authors:
Qilai Zhang,
Jiawen Li,
Peiran Liao,
Jiali Hu,
Tian Guan,
Anjia Han,
Yonghong He
Abstract:
The two primary types of Hematoxylin and Eosin (H&E) slides in histopathology are Formalin-Fixed Paraffin-Embedded (FFPE) and Fresh Frozen (FF). FFPE slides offer high quality histopathological images but require a labor-intensive acquisition process. In contrast, FF slides can be prepared quickly, but the image quality is relatively poor. Our task is to translate FF images into FFPE style, thereb…
▽ More
The two primary types of Hematoxylin and Eosin (H&E) slides in histopathology are Formalin-Fixed Paraffin-Embedded (FFPE) and Fresh Frozen (FF). FFPE slides offer high quality histopathological images but require a labor-intensive acquisition process. In contrast, FF slides can be prepared quickly, but the image quality is relatively poor. Our task is to translate FF images into FFPE style, thereby improving the image quality for diagnostic purposes. In this paper, we propose Diffusion-FFPE, a method for FF-to-FFPE histopathological image translation using a pre-trained diffusion model. Specifically, we utilize a one-step diffusion model as the generator, which we fine-tune using LoRA adapters within an adversarial learning framework. To enable the model to effectively capture both global structural patterns and local details, we introduce a multi-scale feature fusion module that leverages two VAE encoders to extract features at different image resolutions, performing feature fusion before inputting them into the UNet. Additionally, a pre-trained vision-language model for histopathology serves as the backbone for the discriminator, enhancing model performance. Our FF-to-FFPE translation experiments on the TCGA-NSCLC dataset demonstrate that the proposed approach outperforms existing methods. The code and models are released at https://github.com/QilaiZhang/Diffusion-FFPE.
△ Less
Submitted 13 November, 2024; v1 submitted 26 June, 2024;
originally announced June 2024.
-
GraphPipe: Improving Performance and Scalability of DNN Training with Graph Pipeline Parallelism
Authors:
Byungsoo Jeon,
Mengdi Wu,
Shiyi Cao,
Sunghyun Kim,
Sunghyun Park,
Neeraj Aggarwal,
Colin Unger,
Daiyaan Arfeen,
Peiyuan Liao,
Xupeng Miao,
Mohammad Alizadeh,
Gregory R. Ganger,
Tianqi Chen,
Zhihao Jia
Abstract:
Deep neural networks (DNNs) continue to grow rapidly in size, making them infeasible to train on a single device. Pipeline parallelism is commonly used in existing DNN systems to support large-scale DNN training by partitioning a DNN into multiple stages, which concurrently perform DNN training for different micro-batches in a pipeline fashion. However, existing pipeline-parallel approaches only c…
▽ More
Deep neural networks (DNNs) continue to grow rapidly in size, making them infeasible to train on a single device. Pipeline parallelism is commonly used in existing DNN systems to support large-scale DNN training by partitioning a DNN into multiple stages, which concurrently perform DNN training for different micro-batches in a pipeline fashion. However, existing pipeline-parallel approaches only consider sequential pipeline stages and thus ignore the topology of a DNN, resulting in missed model-parallel opportunities. This paper presents graph pipeline parallelism (GPP), a new pipeline-parallel scheme that partitions a DNN into pipeline stages whose dependencies are identified by a directed acyclic graph. GPP generalizes existing sequential pipeline parallelism and preserves the inherent topology of a DNN to enable concurrent execution of computationally-independent operators, resulting in reduced memory requirement and improved GPU performance. In addition, we develop GraphPipe, a distributed system that exploits GPP strategies to enable performant and scalable DNN training. GraphPipe partitions a DNN into a graph of stages, optimizes micro-batch schedules for these stages, and parallelizes DNN training using the discovered GPP strategies. Evaluation on a variety of DNNs shows that GraphPipe outperforms existing pipeline-parallel systems such as PipeDream and Piper by up to 1.6X. GraphPipe also reduces the search time by 9-21X compared to PipeDream and Piper.
△ Less
Submitted 28 October, 2024; v1 submitted 24 June, 2024;
originally announced June 2024.
-
A Survey of Deep Learning Based Radar and Vision Fusion for 3D Object Detection in Autonomous Driving
Authors:
Di Wu,
Feng Yang,
Benlian Xu,
Pan Liao,
Bo Liu
Abstract:
With the rapid advancement of autonomous driving technology, there is a growing need for enhanced safety and efficiency in the automatic environmental perception of vehicles during their operation. In modern vehicle setups, cameras and mmWave radar (radar), being the most extensively employed sensors, demonstrate complementary characteristics, inherently rendering them conducive to fusion and faci…
▽ More
With the rapid advancement of autonomous driving technology, there is a growing need for enhanced safety and efficiency in the automatic environmental perception of vehicles during their operation. In modern vehicle setups, cameras and mmWave radar (radar), being the most extensively employed sensors, demonstrate complementary characteristics, inherently rendering them conducive to fusion and facilitating the achievement of both robust performance and cost-effectiveness. This paper focuses on a comprehensive survey of radar-vision (RV) fusion based on deep learning methods for 3D object detection in autonomous driving. We offer a comprehensive overview of each RV fusion category, specifically those employing region of interest (ROI) fusion and end-to-end fusion strategies. As the most promising fusion strategy at present, we provide a deeper classification of end-to-end fusion methods, including those 3D bounding box prediction based and BEV based approaches. Moreover, aligning with recent advancements, we delineate the latest information on 4D radar and its cutting-edge applications in autonomous vehicles (AVs). Finally, we present the possible future trends of RV fusion and summarize this paper.
△ Less
Submitted 2 June, 2024;
originally announced June 2024.
-
Textual Inversion and Self-supervised Refinement for Radiology Report Generation
Authors:
Yuanjiang Luo,
Hongxiang Li,
Xuan Wu,
Meng Cao,
Xiaoshuang Huang,
Zhihong Zhu,
Peixi Liao,
Hu Chen,
Yi Zhang
Abstract:
Existing mainstream approaches follow the encoder-decoder paradigm for generating radiology reports. They focus on improving the network structure of encoders and decoders, which leads to two shortcomings: overlooking the modality gap and ignoring report content constraints. In this paper, we proposed Textual Inversion and Self-supervised Refinement (TISR) to address the above two issues. Specific…
▽ More
Existing mainstream approaches follow the encoder-decoder paradigm for generating radiology reports. They focus on improving the network structure of encoders and decoders, which leads to two shortcomings: overlooking the modality gap and ignoring report content constraints. In this paper, we proposed Textual Inversion and Self-supervised Refinement (TISR) to address the above two issues. Specifically, textual inversion can project text and image into the same space by representing images as pseudo words to eliminate the cross-modeling gap. Subsequently, self-supervised refinement refines these pseudo words through contrastive loss computation between images and texts, enhancing the fidelity of generated reports to images. Notably, TISR is orthogonal to most existing methods, plug-and-play. We conduct experiments on two widely-used public datasets and achieve significant improvements on various baselines, which demonstrates the effectiveness and generalization of TISR. The code will be available soon.
△ Less
Submitted 6 June, 2024; v1 submitted 30 May, 2024;
originally announced May 2024.
-
MonoDETRNext: Next-Generation Accurate and Efficient Monocular 3D Object Detector
Authors:
Pan Liao,
Feng Yang,
Di Wu,
Wenhui Zhao,
Jinwen Yu
Abstract:
Monocular 3D object detection has vast application potential across various fields. DETR-type models have shown remarkable performance in different areas, but there is still considerable room for improvement in monocular 3D detection, especially with the existing DETR-based method, MonoDETR. After addressing the query initialization issues in MonoDETR, we explored several performance enhancement s…
▽ More
Monocular 3D object detection has vast application potential across various fields. DETR-type models have shown remarkable performance in different areas, but there is still considerable room for improvement in monocular 3D detection, especially with the existing DETR-based method, MonoDETR. After addressing the query initialization issues in MonoDETR, we explored several performance enhancement strategies, such as incorporating a more efficient encoder and utilizing a more powerful depth estimator. Ultimately, we proposed MonoDETRNext, a model that comes in two variants based on the choice of depth estimator: MonoDETRNext-E, which prioritizes speed, and MonoDETRNext-A, which focuses on accuracy. We posit that MonoDETRNext establishes a new benchmark in monocular 3D object detection and opens avenues for future research. We conducted an exhaustive evaluation demonstrating the model's superior performance against existing solutions. Notably, MonoDETRNext-A demonstrated a 3.52$\%$ improvement in the $AP_{3D}$ metric on the KITTI test benchmark over MonoDETR, while MonoDETRNext-E showed a 2.35$\%$ increase. Additionally, the computational efficiency of MonoDETRNext-E slightly exceeds that of its predecessor.
△ Less
Submitted 27 November, 2024; v1 submitted 23 May, 2024;
originally announced May 2024.
-
Analytical Heterogeneous Die-to-Die 3D Placement with Macros
Authors:
Yuxuan Zhao,
Peiyu Liao,
Siting Liu,
Jiaxi Jiang,
Yibo Lin,
Bei Yu
Abstract:
This paper presents an innovative approach to 3D mixed-size placement in heterogeneous face-to-face (F2F) bonded 3D ICs. We propose an analytical framework that utilizes a dedicated density model and a bistratal wirelength model, effectively handling macros and standard cells in a 3D solution space. A novel 3D preconditioner is developed to resolve the topological and physical gap between macros a…
▽ More
This paper presents an innovative approach to 3D mixed-size placement in heterogeneous face-to-face (F2F) bonded 3D ICs. We propose an analytical framework that utilizes a dedicated density model and a bistratal wirelength model, effectively handling macros and standard cells in a 3D solution space. A novel 3D preconditioner is developed to resolve the topological and physical gap between macros and standard cells. Additionally, we propose a mixed-integer linear programming (MILP) formulation for macro rotation to optimize wirelength. Our framework is implemented with full-scale GPU acceleration, leveraging an adaptive 3D density accumulation algorithm and an incremental wirelength gradient algorithm. Experimental results on ICCAD 2023 contest benchmarks demonstrate that our framework can achieve 5.9% quality score improvement compared to the first-place winner with 4.0x runtime speedup. Additional experiments on modern RISC-V designs further validate the generalizability and superiority of our framework.
△ Less
Submitted 13 August, 2024; v1 submitted 13 March, 2024;
originally announced March 2024.
-
p-Laplacian Adaptation for Generative Pre-trained Vision-Language Models
Authors:
Haoyuan Wu,
Xinyun Zhang,
Peng Xu,
Peiyu Liao,
Xufeng Yao,
Bei Yu
Abstract:
Vision-Language models (VLMs) pre-trained on large corpora have demonstrated notable success across a range of downstream tasks. In light of the rapidly increasing size of pre-trained VLMs, parameter-efficient transfer learning (PETL) has garnered attention as a viable alternative to full fine-tuning. One such approach is the adapter, which introduces a few trainable parameters into the pre-traine…
▽ More
Vision-Language models (VLMs) pre-trained on large corpora have demonstrated notable success across a range of downstream tasks. In light of the rapidly increasing size of pre-trained VLMs, parameter-efficient transfer learning (PETL) has garnered attention as a viable alternative to full fine-tuning. One such approach is the adapter, which introduces a few trainable parameters into the pre-trained models while preserving the original parameters during adaptation. In this paper, we present a novel modeling framework that recasts adapter tuning after attention as a graph message passing process on attention graphs, where the projected query and value features and attention matrix constitute the node features and the graph adjacency matrix, respectively. Within this framework, tuning adapters in VLMs necessitates handling heterophilic graphs, owing to the disparity between the projected query and value space. To address this challenge, we propose a new adapter architecture, $p$-adapter, which employs $p$-Laplacian message passing in Graph Neural Networks (GNNs). Specifically, the attention weights are re-normalized based on the features, and the features are then aggregated using the calibrated attention matrix, enabling the dynamic exploitation of information with varying frequencies in the heterophilic attention graphs. We conduct extensive experiments on different pre-trained VLMs and multi-modal tasks, including visual question answering, visual entailment, and image captioning. The experimental results validate our method's significant superiority over other PETL methods.
△ Less
Submitted 17 December, 2023;
originally announced December 2023.
-
Analytical Die-to-Die 3D Placement with Bistratal Wirelength Model and GPU Acceleration
Authors:
Peiyu Liao,
Yuxuan Zhao,
Dawei Guo,
Yibo Lin,
Bei Yu
Abstract:
In this paper, we present a new analytical 3D placement framework with a bistratal wirelength model for F2F-bonded 3D ICs with heterogeneous technology nodes based on the electrostatic-based density model. The proposed framework, enabled GPU-acceleration, is capable of efficiently determining node partitioning and locations simultaneously, leveraging the dedicated 3D wirelength model and density m…
▽ More
In this paper, we present a new analytical 3D placement framework with a bistratal wirelength model for F2F-bonded 3D ICs with heterogeneous technology nodes based on the electrostatic-based density model. The proposed framework, enabled GPU-acceleration, is capable of efficiently determining node partitioning and locations simultaneously, leveraging the dedicated 3D wirelength model and density model. The experimental results on ICCAD 2022 contest benchmarks demonstrate that our proposed 3D placement framework can achieve up to 6.1% wirelength improvement and 4.1% on average compared to the first-place winner with much fewer vertical interconnections and up to 9.8x runtime speedup. Notably, the proposed framework also outperforms the state-of-the-art 3D analytical placer by up to 3.3% wirelength improvement and 2.1% on average with up to 8.8x acceleration on large cases using GPUs.
△ Less
Submitted 11 October, 2023; v1 submitted 11 October, 2023;
originally announced October 2023.
-
VoiceBank-2023: A Multi-Speaker Mandarin Speech Corpus for Constructing Personalized TTS Systems for the Speech Impaired
Authors:
Jia-Jyu Su,
Pang-Chen Liao,
Yen-Ting Lin,
Wu-Hao Li,
Guan-Ting Liou,
Cheng-Che Kao,
Wei-Cheng Chen,
Jen-Chieh Chiang,
Wen-Yang Chang,
Pin-Han Lin,
Chen-Yu Chiang
Abstract:
Services of personalized TTS systems for the Mandarin-speaking speech impaired are rarely mentioned. Taiwan started the VoiceBanking project in 2020, aiming to build a complete set of services to deliver personalized Mandarin TTS systems to amyotrophic lateral sclerosis patients. This paper reports the corpus design, corpus recording, data purging and correction for the corpus, and evaluations of…
▽ More
Services of personalized TTS systems for the Mandarin-speaking speech impaired are rarely mentioned. Taiwan started the VoiceBanking project in 2020, aiming to build a complete set of services to deliver personalized Mandarin TTS systems to amyotrophic lateral sclerosis patients. This paper reports the corpus design, corpus recording, data purging and correction for the corpus, and evaluations of the developed personalized TTS systems, for the VoiceBanking project. The developed corpus is named after the VoiceBank-2023 speech corpus because of its release year. The corpus contains 29.78 hours of utterances with prompts of short paragraphs and common phrases spoken by 111 native Mandarin speakers. The corpus is labeled with information about gender, degree of speech impairment, types of users, transcription, SNRs, and speaking rates. The VoiceBank-2023 is available by request for non-commercial use and welcomes all parties to join the VoiceBanking project to improve the services for the speech impaired.
△ Less
Submitted 27 August, 2023;
originally announced August 2023.
-
PV-SSD: A Multi-Modal Point Cloud Feature Fusion Method for Projection Features and Variable Receptive Field Voxel Features
Authors:
Yongxin Shao,
Aihong Tan,
Zhetao Sun,
Enhui Zheng,
Tianhong Yan,
Peng Liao
Abstract:
LiDAR-based 3D object detection and classification is crucial for autonomous driving. However, real-time inference from extremely sparse 3D data is a formidable challenge. To address this problem, a typical class of approaches transforms the point cloud cast into a regular data representation (voxels or projection maps). Then, it performs feature extraction with convolutional neural networks. Howe…
▽ More
LiDAR-based 3D object detection and classification is crucial for autonomous driving. However, real-time inference from extremely sparse 3D data is a formidable challenge. To address this problem, a typical class of approaches transforms the point cloud cast into a regular data representation (voxels or projection maps). Then, it performs feature extraction with convolutional neural networks. However, such methods often result in a certain degree of information loss due to down-sampling or over-compression of feature information. This paper proposes a multi-modal point cloud feature fusion method for projection features and variable receptive field voxel features (PV-SSD) based on projection and variable voxelization to solve the information loss problem. We design a two-branch feature extraction structure with a 2D convolutional neural network to extract the point cloud's projection features in bird's-eye view to focus on the correlation between local features. A voxel feature extraction branch is used to extract local fine-grained features. Meanwhile, we propose a voxel feature extraction method with variable sensory fields to reduce the information loss of voxel branches due to downsampling. It avoids missing critical point information by selecting more useful feature points based on feature point weights for the detection task. In addition, we propose a multi-modal feature fusion module for point clouds. To validate the effectiveness of our method, we tested it on the KITTI dataset and ONCE dataset.
△ Less
Submitted 13 April, 2024; v1 submitted 13 August, 2023;
originally announced August 2023.
-
Shaping a Smarter Electromagnetic Landscape: IAB, NCR, and RIS in 5G Standard and Future 6G
Authors:
Chao-Kai Wen,
Lung-Sheng Tsai,
Arman Shojaeifard,
Pei-Kai Liao,
Kai-Kit Wong,
Chan-Byoung Chae
Abstract:
The main objective of 5G and beyond networks is to provide an optimal user experience in terms of throughput and reliability, irrespective of location and time. To achieve this, traditional fixed macro base station deployments are being replaced by more innovative and flexible solutions, such as wireless backhaul and relays. This article focuses on the evolution and standardization of these advanc…
▽ More
The main objective of 5G and beyond networks is to provide an optimal user experience in terms of throughput and reliability, irrespective of location and time. To achieve this, traditional fixed macro base station deployments are being replaced by more innovative and flexible solutions, such as wireless backhaul and relays. This article focuses on the evolution and standardization of these advancements, which are shaping the electromagnetic landscape. Specifically, we explore Integrated Access and Backhaul (IAB) nodes, which offer a cost-efficient and agile alternative to fiber backhaul. We also discuss Network-Controlled Repeaters (NCRs) and the emergence of Reconfigurable Intelligent Surfaces (RIS) actively adapting the wireless environment. The article provides an overview of the 5G features and ongoing developments in 3GPP Release 18 related to these intelligent EM entities, highlighting the expected evolution of future wireless networks in terms of architecture, operations, and control signals.
△ Less
Submitted 18 January, 2024; v1 submitted 6 August, 2023;
originally announced August 2023.
-
MIMO Evolution toward 6G: End-User-Centric Collaborative MIMO
Authors:
Lung-Sheng Tsai,
Shang-Ling Shih,
Pei-Kai Liao,
Chao-Kai Wen
Abstract:
In 6G, the trend of transitioning from massive antenna elements to even more massive ones is continued. However, installing additional antennas in the limited space of user equipment (UE) is challenging, resulting in limited capacity scaling gain for end users, despite network side support for increasing numbers of antennas. To address this issue, we propose an end-user-centric collaborative MIMO…
▽ More
In 6G, the trend of transitioning from massive antenna elements to even more massive ones is continued. However, installing additional antennas in the limited space of user equipment (UE) is challenging, resulting in limited capacity scaling gain for end users, despite network side support for increasing numbers of antennas. To address this issue, we propose an end-user-centric collaborative MIMO (UE-CoMIMO) framework that groups several fixed or portable devices to provide a virtual abundance of antennas. This article outlines how advanced L1 relays and conventional relays enable device collaboration to offer diversity, rank, and localization enhancements. We demonstrate through system-level simulations how the UE-CoMIMO approaches lead to significant performance gains. Lastly, we discuss necessary research efforts to make UE-CoMIMO available for 6G and future research directions.
△ Less
Submitted 14 November, 2023; v1 submitted 20 May, 2023;
originally announced May 2023.
-
LATTE: Label-efficient Incident Phenotyping from Longitudinal Electronic Health Records
Authors:
Jun Wen,
Jue Hou,
Clara-Lea Bonzel,
Yihan Zhao,
Victor M. Castro,
Vivian S. Gainer,
Dana Weisenfeld,
Tianrun Cai,
Yuk-Lam Ho,
Vidul A. Panickan,
Lauren Costa,
Chuan Hong,
J. Michael Gaziano,
Katherine P. Liao,
Junwei Lu,
Kelly Cho,
Tianxi Cai
Abstract:
Electronic health record (EHR) data are increasingly used to support real-world evidence (RWE) studies. Yet its ability to generate reliable RWE is limited by the lack of readily available precise information on the timing of clinical events such as the onset time of heart failure. We propose a LAbel-efficienT incidenT phEnotyping (LATTE) algorithm to accurately annotate the timing of clinical eve…
▽ More
Electronic health record (EHR) data are increasingly used to support real-world evidence (RWE) studies. Yet its ability to generate reliable RWE is limited by the lack of readily available precise information on the timing of clinical events such as the onset time of heart failure. We propose a LAbel-efficienT incidenT phEnotyping (LATTE) algorithm to accurately annotate the timing of clinical events from longitudinal EHR data. By leveraging the pre-trained semantic embedding vectors from large-scale EHR data as prior knowledge, LATTE selects predictive EHR features in a concept re-weighting module by mining their relationship to the target event and compresses their information into longitudinal visit embeddings through a visit attention learning network. LATTE employs a recurrent neural network to capture the sequential dependency between the target event and visit embeddings before/after it. To improve label efficiency, LATTE constructs highly informative longitudinal silver-standard labels from large-scale unlabeled patients to perform unsupervised pre-training and semi-supervised joint training. Finally, LATTE enhances cross-site portability via contrastive representation learning. LATTE is evaluated on three analyses: the onset of type-2 diabetes, heart failure, and the onset and relapses of multiple sclerosis. We use various evaluation metrics present in the literature including the $ABC_{gain}$, the proportion of reduction in the area between the observed event indicator and the predicted cumulative incidences in reference to the prediction per incident prevalence. LATTE consistently achieves substantial improvement over benchmark methods such as SAMGEP and RETAIN in all settings.
△ Less
Submitted 18 May, 2023;
originally announced May 2023.
-
Did we personalize? Assessing personalization by an online reinforcement learning algorithm using resampling
Authors:
Susobhan Ghosh,
Raphael Kim,
Prasidh Chhabria,
Raaz Dwivedi,
Predrag Klasnja,
Peng Liao,
Kelly Zhang,
Susan Murphy
Abstract:
There is a growing interest in using reinforcement learning (RL) to personalize sequences of treatments in digital health to support users in adopting healthier behaviors. Such sequential decision-making problems involve decisions about when to treat and how to treat based on the user's context (e.g., prior activity level, location, etc.). Online RL is a promising data-driven approach for this pro…
▽ More
There is a growing interest in using reinforcement learning (RL) to personalize sequences of treatments in digital health to support users in adopting healthier behaviors. Such sequential decision-making problems involve decisions about when to treat and how to treat based on the user's context (e.g., prior activity level, location, etc.). Online RL is a promising data-driven approach for this problem as it learns based on each user's historical responses and uses that knowledge to personalize these decisions. However, to decide whether the RL algorithm should be included in an ``optimized'' intervention for real-world deployment, we must assess the data evidence indicating that the RL algorithm is actually personalizing the treatments to its users. Due to the stochasticity in the RL algorithm, one may get a false impression that it is learning in certain states and using this learning to provide specific treatments. We use a working definition of personalization and introduce a resampling-based methodology for investigating whether the personalization exhibited by the RL algorithm is an artifact of the RL algorithm stochasticity. We illustrate our methodology with a case study by analyzing the data from a physical activity clinical trial called HeartSteps, which included the use of an online RL algorithm. We demonstrate how our approach enhances data-driven truth-in-advertising of algorithm personalization both across all users as well as within specific users in the study.
△ Less
Submitted 7 August, 2023; v1 submitted 11 April, 2023;
originally announced April 2023.
-
Brain informed transfer learning for categorizing construction hazards
Authors:
Xiaoshan Zhou,
Pin-Chao Liao
Abstract:
A transfer learning paradigm is proposed for "knowledge" transfer between the human brain and convolutional neural network (CNN) for a construction hazard categorization task. Participants' brain activities are recorded using electroencephalogram (EEG) measurements when viewing the same images (target dataset) as the CNN. The CNN is pretrained on the EEG data and then fine-tuned on the constructio…
▽ More
A transfer learning paradigm is proposed for "knowledge" transfer between the human brain and convolutional neural network (CNN) for a construction hazard categorization task. Participants' brain activities are recorded using electroencephalogram (EEG) measurements when viewing the same images (target dataset) as the CNN. The CNN is pretrained on the EEG data and then fine-tuned on the construction scene images. The results reveal that the EEG-pretrained CNN achieves a 9 % higher accuracy compared with a network with same architecture but randomly initialized parameters on a three-class classification task. Brain activity from the left frontal cortex exhibits the highest performance gains, thus indicating high-level cognitive processing during hazard recognition. This work is a step toward improving machine learning algorithms by learning from human-brain signals recorded via a commercially available brain-computer interface. More generalized visual recognition systems can be effectively developed based on this approach of "keep human in the loop".
△ Less
Submitted 17 November, 2022;
originally announced November 2022.
-
A privacy-preserving data storage and service framework based on deep learning and blockchain for construction workers' wearable IoT sensors
Authors:
Xiaoshan Zhou,
Pin-Chao Liao
Abstract:
Classifying brain signals collected by wearable Internet of Things (IoT) sensors, especially brain-computer interfaces (BCIs), is one of the fastest-growing areas of research. However, research has mostly ignored the secure storage and privacy protection issues of collected personal neurophysiological data. Therefore, in this article, we try to bridge this gap and propose a secure privacy-preservi…
▽ More
Classifying brain signals collected by wearable Internet of Things (IoT) sensors, especially brain-computer interfaces (BCIs), is one of the fastest-growing areas of research. However, research has mostly ignored the secure storage and privacy protection issues of collected personal neurophysiological data. Therefore, in this article, we try to bridge this gap and propose a secure privacy-preserving protocol for implementing BCI applications. We first transformed brain signals into images and used generative adversarial network to generate synthetic signals to protect data privacy. Subsequently, we applied the paradigm of transfer learning for signal classification. The proposed method was evaluated by a case study and results indicate that real electroencephalogram data augmented with artificially generated samples provide superior classification performance. In addition, we proposed a blockchain-based scheme and developed a prototype on Ethereum, which aims to make storing, querying and sharing personal neurophysiological data and analysis reports secure and privacy-aware. The rights of three main transaction bodies - construction workers, BCI service providers and project managers - are described and the advantages of the proposed system are discussed. We believe this paper provides a well-rounded solution to safeguard private data against cyber-attacks, level the playing field for BCI application developers, and to the end improve professional well-being in the industry.
△ Less
Submitted 19 November, 2022;
originally announced November 2022.
-
Weighing votes in human-machine collaboration for hazard recognition: Inferring hazard perceptual threshold and decision confidence from electroencephalogram wavelets
Authors:
Xiaoshan Zhou,
Pin-Chao Liao
Abstract:
Purpose: Human-machine collaboration is a promising strategy to improve hazard inspection. However, research on the effective integration of opinions from humans with machines for optimal group decision making is lacking. Hence, considering the benefits of a brain-computer interface (BCI) to enable intuitive commutation, this study proposes a novel method to predict human hazard response choices a…
▽ More
Purpose: Human-machine collaboration is a promising strategy to improve hazard inspection. However, research on the effective integration of opinions from humans with machines for optimal group decision making is lacking. Hence, considering the benefits of a brain-computer interface (BCI) to enable intuitive commutation, this study proposes a novel method to predict human hazard response choices and decision confidence from brain activities for a superior confidence-weighted voting strategy. Methodology: First, we developed a Bayesian inference-based algorithm to ascertain the decision threshold above which a hazard is reported from human brain signals. This method was tested empirically with electroencephalogram (EEG) data collected in a laboratory setting and cross-validated using behavioral indices of the signal detection theory. Subsequently, based on numerical simulations, the decision criteria for low-, medium-, and high-confidence level differentiations characterized by parietal alpha-band EEG power were determined. Findings : The investigated hazard recognition task was described as a process of probabilistic inference involving a decision uncertainty evaluation. The results demonstrated the feasibility of EEG measurements in observing human internal representations of hazard discrimination. Moreover, the optimal criteria to differentiate between low-, medium-, and high-confidence levels were obtained by benchmarking against an optimal Bayesian observer. Originality: This research demonstrates the potential of a BCI as an effective channel for telecommunication, laying the foundation for the design of future hazard detection techniques in the collaborative human-machine systems research field.
△ Less
Submitted 11 November, 2022;
originally announced November 2022.
-
EEG-based performance-driven adaptive automated hazard alerting system in security surveillance support
Authors:
Xiaoshan Zhou,
Pin-Chao Liao
Abstract:
Computer-vision technologies have emerged to assist security surveillance. However, automation alert/alarm systems often apply a low-beta threshold to avoid misses and generates excessive false alarms. This study proposed an adaptive hazard diagnosis and alarm system with adjustable alert threshold levels based on environmental scenarios and operator's hazard recognition performance. We recorded e…
▽ More
Computer-vision technologies have emerged to assist security surveillance. However, automation alert/alarm systems often apply a low-beta threshold to avoid misses and generates excessive false alarms. This study proposed an adaptive hazard diagnosis and alarm system with adjustable alert threshold levels based on environmental scenarios and operator's hazard recognition performance. We recorded electroencephalogram (EEG) data during hazard recognition tasks. The linear ballistic accumulator model was used to decompose the response time into several psychological subcomponents, which were further estimated by a Markov chain Monte Carlo algorithm and compared among different types of hazardous scenarios. Participants were most cautious about falling hazards, followed by electricity hazards, and had the least conservative attitude toward structural hazards. Participants were classified into three performance-level subgroups using a latent profile analysis based on task accuracy. We applied the transfer learning paradigm to classify subgroups based on their time-frequency representations of EEG data. Additionally, two continual learning strategies were investigated to ensure a robust adaptation of the model to predict participants' performance levels in different hazardous scenarios. These findings can be leveraged in real-world brain-computer interface applications, which will provide human trust in automation and promote the successful implementation of alarm technologies.
△ Less
Submitted 9 November, 2022;
originally announced November 2022.
-
Implementation of Trained Factorization Machine Recommendation System on Quantum Annealer
Authors:
Chen-Yu Liu,
Hsin-Yu Wang,
Pei-Yen Liao,
Ching-Jui Lai,
Min-Hsiu Hsieh
Abstract:
Factorization Machine (FM) is the most commonly used model to build a recommendation system since it can incorporate side information to improve performance. However, producing item suggestions for a given user with a trained FM is time-consuming. It requires a run-time of $O((N_m \log N_m)^2)$, where $N_m$ is the number of items in the dataset. To address this problem, we propose a quadratic unco…
▽ More
Factorization Machine (FM) is the most commonly used model to build a recommendation system since it can incorporate side information to improve performance. However, producing item suggestions for a given user with a trained FM is time-consuming. It requires a run-time of $O((N_m \log N_m)^2)$, where $N_m$ is the number of items in the dataset. To address this problem, we propose a quadratic unconstrained binary optimization (QUBO) scheme to combine with FM and apply quantum annealing (QA) computation. Compared to classical methods, this hybrid algorithm provides a faster than quadratic speedup in finding good user suggestions. We then demonstrate the aforementioned computational advantage on current NISQ hardware by experimenting with a real example on a D-Wave annealer.
△ Less
Submitted 8 November, 2023; v1 submitted 24 October, 2022;
originally announced October 2022.
-
The ArtBench Dataset: Benchmarking Generative Models with Artworks
Authors:
Peiyuan Liao,
Xiuyu Li,
Xihui Liu,
Kurt Keutzer
Abstract:
We introduce ArtBench-10, the first class-balanced, high-quality, cleanly annotated, and standardized dataset for benchmarking artwork generation. It comprises 60,000 images of artwork from 10 distinctive artistic styles, with 5,000 training images and 1,000 testing images per style. ArtBench-10 has several advantages over previous artwork datasets. Firstly, it is class-balanced while most previou…
▽ More
We introduce ArtBench-10, the first class-balanced, high-quality, cleanly annotated, and standardized dataset for benchmarking artwork generation. It comprises 60,000 images of artwork from 10 distinctive artistic styles, with 5,000 training images and 1,000 testing images per style. ArtBench-10 has several advantages over previous artwork datasets. Firstly, it is class-balanced while most previous artwork datasets suffer from the long tail class distributions. Secondly, the images are of high quality with clean annotations. Thirdly, ArtBench-10 is created with standardized data collection, annotation, filtering, and preprocessing procedures. We provide three versions of the dataset with different resolutions ($32\times32$, $256\times256$, and original image size), formatted in a way that is easy to be incorporated by popular machine learning frameworks. We also conduct extensive benchmarking experiments using representative image synthesis models with ArtBench-10 and present in-depth analysis. The dataset is available at https://github.com/liaopeiyuan/artbench under a Fair Use license.
△ Less
Submitted 22 June, 2022;
originally announced June 2022.
-
Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models
Authors:
Aarohi Srivastava,
Abhinav Rastogi,
Abhishek Rao,
Abu Awal Md Shoeb,
Abubakar Abid,
Adam Fisch,
Adam R. Brown,
Adam Santoro,
Aditya Gupta,
Adrià Garriga-Alonso,
Agnieszka Kluska,
Aitor Lewkowycz,
Akshat Agarwal,
Alethea Power,
Alex Ray,
Alex Warstadt,
Alexander W. Kocurek,
Ali Safaya,
Ali Tazarv,
Alice Xiang,
Alicia Parrish,
Allen Nie,
Aman Hussain,
Amanda Askell,
Amanda Dsouza
, et al. (426 additional authors not shown)
Abstract:
Language models demonstrate both quantitative improvement and new qualitative capabilities with increasing scale. Despite their potentially transformative impact, these new capabilities are as yet poorly characterized. In order to inform future research, prepare for disruptive new model capabilities, and ameliorate socially harmful effects, it is vital that we understand the present and near-futur…
▽ More
Language models demonstrate both quantitative improvement and new qualitative capabilities with increasing scale. Despite their potentially transformative impact, these new capabilities are as yet poorly characterized. In order to inform future research, prepare for disruptive new model capabilities, and ameliorate socially harmful effects, it is vital that we understand the present and near-future capabilities and limitations of language models. To address this challenge, we introduce the Beyond the Imitation Game benchmark (BIG-bench). BIG-bench currently consists of 204 tasks, contributed by 450 authors across 132 institutions. Task topics are diverse, drawing problems from linguistics, childhood development, math, common-sense reasoning, biology, physics, social bias, software development, and beyond. BIG-bench focuses on tasks that are believed to be beyond the capabilities of current language models. We evaluate the behavior of OpenAI's GPT models, Google-internal dense transformer architectures, and Switch-style sparse transformers on BIG-bench, across model sizes spanning millions to hundreds of billions of parameters. In addition, a team of human expert raters performed all tasks in order to provide a strong baseline. Findings include: model performance and calibration both improve with scale, but are poor in absolute terms (and when compared with rater performance); performance is remarkably similar across model classes, though with benefits from sparsity; tasks that improve gradually and predictably commonly involve a large knowledge or memorization component, whereas tasks that exhibit "breakthrough" behavior at a critical scale often involve multiple steps or components, or brittle metrics; social bias typically increases with scale in settings with ambiguous context, but this can be improved with prompting.
△ Less
Submitted 12 June, 2023; v1 submitted 9 June, 2022;
originally announced June 2022.
-
Learning Weakly-Supervised Contrastive Representations
Authors:
Yao-Hung Hubert Tsai,
Tianqin Li,
Weixin Liu,
Peiyuan Liao,
Ruslan Salakhutdinov,
Louis-Philippe Morency
Abstract:
We argue that a form of the valuable information provided by the auxiliary information is its implied data clustering information. For instance, considering hashtags as auxiliary information, we can hypothesize that an Instagram image will be semantically more similar with the same hashtags. With this intuition, we present a two-stage weakly-supervised contrastive learning approach. The first stag…
▽ More
We argue that a form of the valuable information provided by the auxiliary information is its implied data clustering information. For instance, considering hashtags as auxiliary information, we can hypothesize that an Instagram image will be semantically more similar with the same hashtags. With this intuition, we present a two-stage weakly-supervised contrastive learning approach. The first stage is to cluster data according to its auxiliary information. The second stage is to learn similar representations within the same cluster and dissimilar representations for data from different clusters. Our empirical experiments suggest the following three contributions. First, compared to conventional self-supervised representations, the auxiliary-information-infused representations bring the performance closer to the supervised representations, which use direct downstream labels as supervision signals. Second, our approach performs the best in most cases, when comparing our approach with other baseline representation learning methods that also leverage auxiliary data information. Third, we show that our approach also works well with unsupervised constructed clusters (e.g., no auxiliary information), resulting in a strong unsupervised representation learning approach.
△ Less
Submitted 18 February, 2022; v1 submitted 14 February, 2022;
originally announced February 2022.
-
Collage: Seamless Integration of Deep Learning Backends with Automatic Placement
Authors:
Byungsoo Jeon,
Sunghyun Park,
Peiyuan Liao,
Sheng Xu,
Tianqi Chen,
Zhihao Jia
Abstract:
The strong demand for efficient and performant deployment of Deep Learning (DL) applications prompts the rapid development of a rich DL ecosystem. To keep up with this fast advancement, it is crucial for modern DL frameworks to efficiently integrate a variety of optimized tensor algebra libraries and runtimes as their backends and generate the fastest possible executable using these backends. Howe…
▽ More
The strong demand for efficient and performant deployment of Deep Learning (DL) applications prompts the rapid development of a rich DL ecosystem. To keep up with this fast advancement, it is crucial for modern DL frameworks to efficiently integrate a variety of optimized tensor algebra libraries and runtimes as their backends and generate the fastest possible executable using these backends. However, current DL frameworks require significant manual effort and expertise to integrate every new backend while failing to unleash its full potential. Given the fast-evolving nature of the DL ecosystem, this manual approach often slows down continuous innovations across different layers; it prevents hardware vendors from the fast deployment of their cutting-edge libraries, DL framework developers must repeatedly adjust their hand-coded rules to accommodate new versions of libraries, and machine learning practitioners need to wait for the integration of new technologies and often encounter unsatisfactory performance.
In this paper, we propose Collage, a DL framework that offers seamless integration of DL backends. Collage provides an expressive backend registration interface that allows users to precisely specify the capability of various backends. By leveraging the specifications of available backends, Collage automatically searches for an optimized backend placement strategy for a given workload and execution environment. Our evaluation shows that Collage outperforms the best existing framework for each hardware by $1.26\times$, $1.43\times$, $1.40\times$ on average on NVIDIA's RTX 2070 GPU, V100 GPU, and Intel's Xeon 8259CL CPU, respectively. Collage has been open-sourced and deployed in Apache TVM.
△ Less
Submitted 27 October, 2022; v1 submitted 31 October, 2021;
originally announced November 2021.
-
Integrating Auxiliary Information in Self-supervised Learning
Authors:
Yao-Hung Hubert Tsai,
Tianqin Li,
Weixin Liu,
Peiyuan Liao,
Ruslan Salakhutdinov,
Louis-Philippe Morency
Abstract:
This paper presents to integrate the auxiliary information (e.g., additional attributes for data such as the hashtags for Instagram images) in the self-supervised learning process. We first observe that the auxiliary information may bring us useful information about data structures: for instance, the Instagram images with the same hashtags can be semantically similar. Hence, to leverage the struct…
▽ More
This paper presents to integrate the auxiliary information (e.g., additional attributes for data such as the hashtags for Instagram images) in the self-supervised learning process. We first observe that the auxiliary information may bring us useful information about data structures: for instance, the Instagram images with the same hashtags can be semantically similar. Hence, to leverage the structural information from the auxiliary information, we present to construct data clusters according to the auxiliary information. Then, we introduce the Clustering InfoNCE (Cl-InfoNCE) objective that learns similar representations for augmented variants of data from the same cluster and dissimilar representations for data from different clusters. Our approach contributes as follows: 1) Comparing to conventional self-supervised representations, the auxiliary-information-infused self-supervised representations bring the performance closer to the supervised representations; 2) The presented Cl-InfoNCE can also work with unsupervised constructed clusters (e.g., k-means clusters) and outperform strong clustering-based self-supervised learning approaches, such as the Prototypical Contrastive Learning (PCL) method; 3) We show that Cl-InfoNCE may be a better approach to leverage the data clustering information, by comparing it to the baseline approach - learning to predict the clustering assignments with cross-entropy loss. For analysis, we connect the goodness of the learned representations with the statistical relationships: i) the mutual information between the labels and the clusters and ii) the conditional entropy of the clusters given the labels.
△ Less
Submitted 5 June, 2021;
originally announced June 2021.
-
Robust Batch Policy Learning in Markov Decision Processes
Authors:
Zhengling Qi,
Peng Liao
Abstract:
We study the offline data-driven sequential decision making problem in the framework of Markov decision process (MDP). In order to enhance the generalizability and adaptivity of the learned policy, we propose to evaluate each policy by a set of the average rewards with respect to distributions centered at the policy induced stationary distribution. Given a pre-collected dataset of multiple traject…
▽ More
We study the offline data-driven sequential decision making problem in the framework of Markov decision process (MDP). In order to enhance the generalizability and adaptivity of the learned policy, we propose to evaluate each policy by a set of the average rewards with respect to distributions centered at the policy induced stationary distribution. Given a pre-collected dataset of multiple trajectories generated by some behavior policy, our goal is to learn a robust policy in a pre-specified policy class that can maximize the smallest value of this set. Leveraging the theory of semi-parametric statistics, we develop a statistically efficient policy learning method for estimating the de ned robust optimal policy. A rate-optimal regret bound up to a logarithmic factor is established in terms of total decision points in the dataset.
△ Less
Submitted 9 November, 2021; v1 submitted 8 November, 2020;
originally announced November 2020.
-
Google Landmark Recognition 2020 Competition Third Place Solution
Authors:
Qishen Ha,
Bo Liu,
Fuxu Liu,
Peiyuan Liao
Abstract:
We present our third place solution to the Google Landmark Recognition 2020 competition. It is an ensemble of global features only Sub-center ArcFace models. We introduce dynamic margins for ArcFace loss, a family of tune-able margin functions of class size, designed to deal with the extreme imbalance in GLDv2 dataset. Progressive finetuning and careful postprocessing are also key to the solution.…
▽ More
We present our third place solution to the Google Landmark Recognition 2020 competition. It is an ensemble of global features only Sub-center ArcFace models. We introduce dynamic margins for ArcFace loss, a family of tune-able margin functions of class size, designed to deal with the extreme imbalance in GLDv2 dataset. Progressive finetuning and careful postprocessing are also key to the solution. Our two submissions scored 0.6344 and 0.6289 on private leaderboard, both ranking third place out of 736 teams.
△ Less
Submitted 11 October, 2020;
originally announced October 2020.
-
Information Obfuscation of Graph Neural Networks
Authors:
Peiyuan Liao,
Han Zhao,
Keyulu Xu,
Tommi Jaakkola,
Geoffrey Gordon,
Stefanie Jegelka,
Ruslan Salakhutdinov
Abstract:
While the advent of Graph Neural Networks (GNNs) has greatly improved node and graph representation learning in many applications, the neighborhood aggregation scheme exposes additional vulnerabilities to adversaries seeking to extract node-level information about sensitive attributes. In this paper, we study the problem of protecting sensitive attributes by information obfuscation when learning w…
▽ More
While the advent of Graph Neural Networks (GNNs) has greatly improved node and graph representation learning in many applications, the neighborhood aggregation scheme exposes additional vulnerabilities to adversaries seeking to extract node-level information about sensitive attributes. In this paper, we study the problem of protecting sensitive attributes by information obfuscation when learning with graph structured data. We propose a framework to locally filter out pre-determined sensitive attributes via adversarial training with the total variation and the Wasserstein distance. Our method creates a strong defense against inference attacks, while only suffering small loss in task performance. Theoretically, we analyze the effectiveness of our framework against a worst-case adversary, and characterize an inherent trade-off between maximizing predictive accuracy and minimizing information leakage. Experiments across multiple datasets from recommender systems, knowledge graphs and quantum chemistry demonstrate that the proposed approach provides a robust defense across various graph structures and tasks, while producing competitive GNN encoders for downstream tasks.
△ Less
Submitted 13 June, 2021; v1 submitted 28 September, 2020;
originally announced September 2020.
-
IntelligentPooling: Practical Thompson Sampling for mHealth
Authors:
Sabina Tomkins,
Peng Liao,
Predrag Klasnja,
Susan Murphy
Abstract:
In mobile health (mHealth) smart devices deliver behavioral treatments repeatedly over time to a user with the goal of helping the user adopt and maintain healthy behaviors. Reinforcement learning appears ideal for learning how to optimally make these sequential treatment decisions. However, significant challenges must be overcome before reinforcement learning can be effectively deployed in a mobi…
▽ More
In mobile health (mHealth) smart devices deliver behavioral treatments repeatedly over time to a user with the goal of helping the user adopt and maintain healthy behaviors. Reinforcement learning appears ideal for learning how to optimally make these sequential treatment decisions. However, significant challenges must be overcome before reinforcement learning can be effectively deployed in a mobile healthcare setting. In this work we are concerned with the following challenges: 1) individuals who are in the same context can exhibit differential response to treatments 2) only a limited amount of data is available for learning on any one individual, and 3) non-stationary responses to treatment. To address these challenges we generalize Thompson-Sampling bandit algorithms to develop IntelligentPooling. IntelligentPooling learns personalized treatment policies thus addressing challenge one. To address the second challenge, IntelligentPooling updates each user's degree of personalization while making use of available data on other users to speed up learning. Lastly, IntelligentPooling allows responsivity to vary as a function of a user's time since beginning treatment, thus addressing challenge three. We show that IntelligentPooling achieves an average of 26% lower regret than state-of-the-art. We demonstrate the promise of this approach and its ability to learn from even a small group of users in a live clinical trial.
△ Less
Submitted 12 December, 2020; v1 submitted 31 July, 2020;
originally announced August 2020.
-
Rapidly Personalizing Mobile Health Treatment Policies with Limited Data
Authors:
Sabina Tomkins,
Peng Liao,
Predrag Klasnja,
Serena Yeung,
Susan Murphy
Abstract:
In mobile health (mHealth), reinforcement learning algorithms that adapt to one's context without learning personalized policies might fail to distinguish between the needs of individuals. Yet the high amount of noise due to the in situ delivery of mHealth interventions can cripple the ability of an algorithm to learn when given access to only a single user's data, making personalization challengi…
▽ More
In mobile health (mHealth), reinforcement learning algorithms that adapt to one's context without learning personalized policies might fail to distinguish between the needs of individuals. Yet the high amount of noise due to the in situ delivery of mHealth interventions can cripple the ability of an algorithm to learn when given access to only a single user's data, making personalization challenging. We present IntelligentPooling, which learns personalized policies via an adaptive, principled use of other users' data. We show that IntelligentPooling achieves an average of 26% lower regret than state-of-the-art across all generative models. Additionally, we inspect the behavior of this approach in a live clinical trial, demonstrating its ability to learn from even a small group of users.
△ Less
Submitted 23 February, 2020;
originally announced February 2020.
-
Off-Policy Estimation of Long-Term Average Outcomes with Applications to Mobile Health
Authors:
Peng Liao,
Predrag Klasnja,
Susan Murphy
Abstract:
Due to the recent advancements in wearables and sensing technology, health scientists are increasingly developing mobile health (mHealth) interventions. In mHealth interventions, mobile devices are used to deliver treatment to individuals as they go about their daily lives. These treatments are generally designed to impact a near time, proximal outcome such as stress or physical activity. The mHea…
▽ More
Due to the recent advancements in wearables and sensing technology, health scientists are increasingly developing mobile health (mHealth) interventions. In mHealth interventions, mobile devices are used to deliver treatment to individuals as they go about their daily lives. These treatments are generally designed to impact a near time, proximal outcome such as stress or physical activity. The mHealth intervention policies, often called just-in-time adaptive interventions, are decision rules that map an individual's current state (e.g., individual's past behaviors as well as current observations of time, location, social activity, stress and urges to smoke) to a particular treatment at each of many time points. The vast majority of current mHealth interventions deploy expert-derived policies. In this paper, we provide an approach for conducting inference about the performance of one or more such policies using historical data collected under a possibly different policy. Our measure of performance is the average of proximal outcomes over a long time period should the particular mHealth policy be followed. We provide an estimator as well as confidence intervals. This work is motivated by HeartSteps, an mHealth physical activity intervention.
△ Less
Submitted 22 July, 2020; v1 submitted 30 December, 2019;
originally announced December 2019.
-
Improving Abstractive Text Summarization with History Aggregation
Authors:
Pengcheng Liao,
Chuang Zhang,
Xiaojun Chen,
Xiaofei Zhou
Abstract:
Recent neural sequence to sequence models have provided feasible solutions for abstractive summarization. However, such models are still hard to tackle long text dependency in the summarization task. A high-quality summarization system usually depends on strong encoder which can refine important information from long input texts so that the decoder can generate salient summaries from the encoder's…
▽ More
Recent neural sequence to sequence models have provided feasible solutions for abstractive summarization. However, such models are still hard to tackle long text dependency in the summarization task. A high-quality summarization system usually depends on strong encoder which can refine important information from long input texts so that the decoder can generate salient summaries from the encoder's memory. In this paper, we propose an aggregation mechanism based on the Transformer model to address the challenge of long text representation. Our model can review history information to make encoder hold more memory capacity. Empirically, we apply our aggregation mechanism to the Transformer model and experiment on CNN/DailyMail dataset to achieve higher quality summaries compared to several strong baseline models on the ROUGE metrics.
△ Less
Submitted 23 December, 2019;
originally announced December 2019.
-
Personalized HeartSteps: A Reinforcement Learning Algorithm for Optimizing Physical Activity
Authors:
Peng Liao,
Kristjan Greenewald,
Predrag Klasnja,
Susan Murphy
Abstract:
With the recent evolution of mobile health technologies, health scientists are increasingly interested in developing just-in-time adaptive interventions (JITAIs), typically delivered via notification on mobile device and designed to help the user prevent negative health outcomes and promote the adoption and maintenance of healthy behaviors. A JITAI involves a sequence of decision rules (i.e., trea…
▽ More
With the recent evolution of mobile health technologies, health scientists are increasingly interested in developing just-in-time adaptive interventions (JITAIs), typically delivered via notification on mobile device and designed to help the user prevent negative health outcomes and promote the adoption and maintenance of healthy behaviors. A JITAI involves a sequence of decision rules (i.e., treatment policy) that takes the user's current context as input and specifies whether and what type of an intervention should be provided at the moment. In this paper, we develop a Reinforcement Learning (RL) algorithm that continuously learns and improves the treatment policy embedded in the JITAI as the data is being collected from the user. This work is motivated by our collaboration on designing the RL algorithm in HeartSteps V2 based on data from HeartSteps V1. HeartSteps is a physical activity mobile health application. The RL algorithm developed in this paper is being used in HeartSteps V2 to decide, five times per day, whether to deliver a context-tailored activity suggestion.
△ Less
Submitted 8 September, 2019;
originally announced September 2019.
-
CAE-ADMM: Implicit Bitrate Optimization via ADMM-based Pruning in Compressive Autoencoders
Authors:
Haimeng Zhao,
Peiyuan Liao
Abstract:
We introduce ADMM-pruned Compressive AutoEncoder (CAE-ADMM) that uses Alternative Direction Method of Multipliers (ADMM) to optimize the trade-off between distortion and efficiency of lossy image compression. Specifically, ADMM in our method is to promote sparsity to implicitly optimize the bitrate, different from entropy estimators used in the previous research. The experiments on public datasets…
▽ More
We introduce ADMM-pruned Compressive AutoEncoder (CAE-ADMM) that uses Alternative Direction Method of Multipliers (ADMM) to optimize the trade-off between distortion and efficiency of lossy image compression. Specifically, ADMM in our method is to promote sparsity to implicitly optimize the bitrate, different from entropy estimators used in the previous research. The experiments on public datasets show that our method outperforms the original CAE and some traditional codecs in terms of SSIM/MS-SSIM metrics, at reasonable inference speed.
△ Less
Submitted 8 February, 2019; v1 submitted 22 January, 2019;
originally announced January 2019.
-
Off-Policy Evaluation of Probabilistic Identity Data in Lookalike Modeling
Authors:
Randell Cotta,
Mingyang Hu,
Dan Jiang,
Peizhou Liao
Abstract:
We evaluate the impact of probabilistically-constructed digital identity data collected from Sep. to Dec. 2017 (approx.), in the context of Lookalike-targeted campaigns. The backbone of this study is a large set of probabilistically-constructed "identities", represented as small bags of cookies and mobile ad identifiers with associated metadata, that are likely all owned by the same underlying use…
▽ More
We evaluate the impact of probabilistically-constructed digital identity data collected from Sep. to Dec. 2017 (approx.), in the context of Lookalike-targeted campaigns. The backbone of this study is a large set of probabilistically-constructed "identities", represented as small bags of cookies and mobile ad identifiers with associated metadata, that are likely all owned by the same underlying user. The identity data allows to generate "identity-based", rather than "identifier-based", user models, giving a fuller picture of the interests of the users underlying the identifiers. We employ off-policy techniques to evaluate the potential of identity-powered lookalike models without incurring the risk of allowing untested models to direct large amounts of ad spend or the large cost of performing A/B tests. We add to historical work on off-policy evaluation by noting a significant type of "finite-sample bias" that occurs for studies combining modestly-sized datasets and evaluation metrics involving rare events (e.g., conversions). We illustrate this bias using a simulation study that later informs the handling of inverse propensity weights in our analyses on real data. We demonstrate significant lift in identity-powered lookalikes versus an identity-ignorant baseline: on average ~70% lift in conversion rate. This rises to factors of ~(4-32)x for identifiers having little data themselves, but that can be inferred to belong to users with substantial data to aggregate across identifiers. This implies that identity-powered user modeling is especially important in the context of identifiers having very short lifespans (i.e., frequently churned cookies). Our work motivates and informs the use of probabilistically-constructed identities in marketing. It also deepens the canon of examples in which off-policy learning has been employed to evaluate the complex systems of the internet economy.
△ Less
Submitted 3 January, 2019;
originally announced January 2019.
-
Visual Attention Network for Low Dose CT
Authors:
Wenchao Du,
Hu Chen,
Peixi Liao,
Hongyu Yang,
Ge Wang,
Yi Zhang
Abstract:
Noise and artifacts are intrinsic to low dose CT (LDCT) data acquisition, and will significantly affect the imaging performance. Perfect noise removal and image restoration is intractable in the context of LDCT due to the statistical and technical uncertainties. In this paper, we apply the generative adversarial network (GAN) framework with a visual attention mechanism to deal with this problem in…
▽ More
Noise and artifacts are intrinsic to low dose CT (LDCT) data acquisition, and will significantly affect the imaging performance. Perfect noise removal and image restoration is intractable in the context of LDCT due to the statistical and technical uncertainties. In this paper, we apply the generative adversarial network (GAN) framework with a visual attention mechanism to deal with this problem in a data-driven/machine learning fashion. Our main idea is to inject visual attention knowledge into the learning process of GAN to provide a powerful prior of the noise distribution. By doing this, both the generator and discriminator networks are empowered with visual attention information so they will not only pay special attention to noisy regions and surrounding structures but also explicitly assess the local consistency of the recovered regions. Our experiments qualitatively and quantitatively demonstrate the effectiveness of the proposed method with clinic CT images.
△ Less
Submitted 13 June, 2019; v1 submitted 30 October, 2018;
originally announced October 2018.
-
Full Workspace Generation of Serial-link Manipulators by Deep Learning based Jacobian Estimation
Authors:
Peiyuan Liao,
Jiajun Mao
Abstract:
Apart from solving complicated problems that require a certain level of intelligence, fine-tuned deep neural networks can also create fast algorithms for slow, numerical tasks. In this paper, we introduce an improved version of [1]'s work, a fast, deep-learning framework capable of generating the full workspace of serial-link manipulators. The architecture consists of two neural networks: an estim…
▽ More
Apart from solving complicated problems that require a certain level of intelligence, fine-tuned deep neural networks can also create fast algorithms for slow, numerical tasks. In this paper, we introduce an improved version of [1]'s work, a fast, deep-learning framework capable of generating the full workspace of serial-link manipulators. The architecture consists of two neural networks: an estimation net that approximates the manipulator Jacobian, and a confidence net that measures the confidence of the approximation. We also introduce M3 (Manipulability Maps of Manipulators), a MATLAB robotics library based on [2](RTB), the datasets generated by which are used by this work. Results have shown that not only are the neural networks significantly faster than numerical inverse kinematics, it also offers superior accuracy when compared to other machine learning alternatives. Implementations of the algorithm (based on Keras[3]), including benchmark evaluation script, are available at https://github.com/liaopeiyuan/Jacobian-Estimation . The M3 Library APIs and datasets are also available at https://github.com/liaopeiyuan/M3 .
△ Less
Submitted 13 September, 2018; v1 submitted 31 August, 2018;
originally announced September 2018.
-
Deep Neural Network Based Subspace Learning of Robotic Manipulator Workspace Mapping
Authors:
Peiyuan Liao
Abstract:
The manipulator workspace mapping is an important problem in robotics and has attracted significant attention in the community. However, most of the pre-existing algorithms have expensive time complexity due to the reliance on sophisticated kinematic equations. To solve this problem, this paper introduces subspace learning (SL), a variant of subspace embedding, where a set of robot and scope param…
▽ More
The manipulator workspace mapping is an important problem in robotics and has attracted significant attention in the community. However, most of the pre-existing algorithms have expensive time complexity due to the reliance on sophisticated kinematic equations. To solve this problem, this paper introduces subspace learning (SL), a variant of subspace embedding, where a set of robot and scope parameters is mapped to the corresponding workspace by a deep neural network (DNN). Trained on a large dataset of around $\mathbf{6\times 10^4}$ samples obtained from a MATLAB$^\circledR$ implementation of a classical method and sampling of designed uniform distributions, the experiments demonstrate that the embedding significantly reduces run-time from $\mathbf{5.23 \times 10^3}$ s of traditional discretization method to $\mathbf{0.224}$ s, with high accuracies (average F-measure is $\mathbf{0.9665}$ with batch gradient descent and resilient backpropagation).
△ Less
Submitted 24 April, 2018; v1 submitted 24 April, 2018;
originally announced April 2018.
-
Analysis of the Relation between Artificial Intelligence and the Internet from the Perspective of Brain Science
Authors:
Feng Liu,
Yong Shi,
Peijia Lia
Abstract:
Artificial intelligence (AI) like deep learning, cloud AI computation has been advancing at a rapid pace since 2014. There is no doubt that the prosperity of AI is inseparable with the development of the Internet. However, there has been little attention to the link between AI and the internet. This paper explores them with brain insights mainly from four views:1) How is the general relation betwe…
▽ More
Artificial intelligence (AI) like deep learning, cloud AI computation has been advancing at a rapid pace since 2014. There is no doubt that the prosperity of AI is inseparable with the development of the Internet. However, there has been little attention to the link between AI and the internet. This paper explores them with brain insights mainly from four views:1) How is the general relation between artificial intelligence and Internet of Things, cloud computing, big data and Industrial Internet from the perspective of brain science. 2) Construction of a new AI system model with the Internet and brain science.
△ Less
Submitted 2 January, 2018;
originally announced January 2018.
-
Group-driven Reinforcement Learning for Personalized mHealth Intervention
Authors:
Feiyun Zhu,
Jun Guo,
Zheng Xu,
Peng Liao,
Junzhou Huang
Abstract:
Due to the popularity of smartphones and wearable devices nowadays, mobile health (mHealth) technologies are promising to bring positive and wide impacts on people's health. State-of-the-art decision-making methods for mHealth rely on some ideal assumptions. Those methods either assume that the users are completely homogenous or completely heterogeneous. However, in reality, a user might be simila…
▽ More
Due to the popularity of smartphones and wearable devices nowadays, mobile health (mHealth) technologies are promising to bring positive and wide impacts on people's health. State-of-the-art decision-making methods for mHealth rely on some ideal assumptions. Those methods either assume that the users are completely homogenous or completely heterogeneous. However, in reality, a user might be similar with some, but not all, users. In this paper, we propose a novel group-driven reinforcement learning method for the mHealth. We aim to understand how to share information among similar users to better convert the limited user information into sharper learned RL policies. Specifically, we employ the K-means clustering method to group users based on their trajectory information similarity and learn a shared RL policy for each group. Extensive experiment results have shown that our method can achieve clear gains over the state-of-the-art RL methods for mHealth.
△ Less
Submitted 13 August, 2017;
originally announced August 2017.
-
LEARN: Learned Experts' Assessment-based Reconstruction Network for Sparse-data CT
Authors:
Hu Chen,
Yi Zhang,
Yunjin Chen,
Junfeng Zhang,
Weihua Zhang,
Huaiqiaing Sun,
Yang Lv,
Peixi Liao,
Jiliu Zhou,
Ge Wang
Abstract:
Compressive sensing (CS) has proved effective for tomographic reconstruction from sparsely collected data or under-sampled measurements, which are practically important for few-view CT, tomosynthesis, interior tomography, and so on. To perform sparse-data CT, the iterative reconstruction commonly use regularizers in the CS framework. Currently, how to choose the parameters adaptively for regulariz…
▽ More
Compressive sensing (CS) has proved effective for tomographic reconstruction from sparsely collected data or under-sampled measurements, which are practically important for few-view CT, tomosynthesis, interior tomography, and so on. To perform sparse-data CT, the iterative reconstruction commonly use regularizers in the CS framework. Currently, how to choose the parameters adaptively for regularization is a major open problem. In this paper, inspired by the idea of machine learning especially deep learning, we unfold a state-of-the-art "fields of experts" based iterative reconstruction scheme up to a number of iterations for data-driven training, construct a Learned Experts' Assessment-based Reconstruction Network ("LEARN") for sparse-data CT, and demonstrate the feasibility and merits of our LEARN network. The experimental results with our proposed LEARN network produces a competitive performance with the well-known Mayo Clinic Low-Dose Challenge Dataset relative to several state-of-the-art methods, in terms of artifact reduction, feature preservation, and computational speed. This is consistent to our insight that because all the regularization terms and parameters used in the iterative reconstruction are now learned from the training data, our LEARN network utilizes application-oriented knowledge more effectively and recovers underlying images more favorably than competing algorithms. Also, the number of layers in the LEARN network is only 12, reducing the computational complexity of typical iterative algorithms by orders of magnitude.
△ Less
Submitted 10 February, 2018; v1 submitted 30 July, 2017;
originally announced July 2017.