-
SegMatch: A semi-supervised learning method for surgical instrument segmentation
Authors:
Meng Wei,
Charlie Budd,
Luis C. Garcia-Peraza-Herrera,
Reuben Dorent,
Miaojing Shi,
Tom Vercauteren
Abstract:
Surgical instrument segmentation is recognised as a key enabler to provide advanced surgical assistance and improve computer-assisted interventions. In this work, we propose SegMatch, a semi-supervised learning method to reduce the need for expensive annotation for laparoscopic and robotic surgical images. SegMatch builds on FixMatch, a widespread semi-supervised classification pipeline combining consistency regularization and pseudo-labelling, and adapts it for the purpose of segmentation. In our proposed SegMatch, the unlabelled images are weakly augmented and fed into the segmentation model to generate a pseudo-label. This pseudo-label enforces the unsupervised loss against the model's output for the adversarially augmented image, restricted to pixels with a high confidence score. Our adaptation for segmentation tasks includes carefully considering the equivariance and invariance properties of the augmentation functions we rely on. To increase the relevance of our augmentations, we depart from using only handcrafted augmentations and introduce a trainable adversarial augmentation strategy. Our algorithm was evaluated on the MICCAI Instrument Segmentation Challenge datasets Robust-MIS 2019 and EndoVis 2017. Our results demonstrate that adding unlabelled data for training purposes allows us to surpass the performance of fully supervised approaches, which are limited by the availability of training data in these challenges. SegMatch also outperforms a range of state-of-the-art semi-supervised semantic segmentation models at different labelled-to-unlabelled data ratios.
Submitted 9 August, 2023;
originally announced August 2023.
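The SegMatch abstract describes a FixMatch-style unsupervised loss: pseudo-labels come from the weakly augmented view, and only high-confidence pixels contribute a cross-entropy term against the prediction on the strongly (adversarially) augmented view. The sketch below illustrates that masking step in plain Python, together with the equivariance point: a geometric augmentation such as a horizontal flip on the strong view must also be applied to the pseudo-label map. It is not the authors' implementation. The strong-view probabilities are taken as given (the trainable adversarial augmentation is not modelled), the 0.95 threshold is FixMatch's usual default rather than a value from the paper, and the function names are hypothetical.

```python
import math
from typing import List

# probs[row][col] is a per-pixel class-probability vector (softmax output).
ProbMap = List[List[List[float]]]


def hflip(grid: list) -> list:
    """Mirror a 2D grid (list of rows) left-to-right."""
    return [row[::-1] for row in grid]


def masked_pseudo_label_loss(
    weak: ProbMap,
    strong: ProbMap,
    threshold: float = 0.95,  # assumed FixMatch-style default
    strong_is_flipped: bool = False,
) -> float:
    """Hypothetical FixMatch-style unsupervised loss for segmentation."""
    # 1. Hard pseudo-labels and their confidences from the weakly augmented view.
    labels = [[max(range(len(p)), key=p.__getitem__) for p in row] for row in weak]
    conf = [[max(p) for p in row] for row in weak]
    # 2. Equivariance: a geometric augmentation applied to the strong view
    #    must also be applied to the pseudo-label map so pixels line up.
    if strong_is_flipped:
        labels, conf = hflip(labels), hflip(conf)
    # 3. Cross-entropy on confident pixels only, averaged over all pixels.
    total, n_pixels = 0.0, 0
    for lab_row, conf_row, prob_row in zip(labels, conf, strong):
        for lab, c, p in zip(lab_row, conf_row, prob_row):
            n_pixels += 1
            if c >= threshold:
                total -= math.log(p[lab])
    return total / n_pixels
```

Photometric augmentations (colour, contrast) would instead leave the label map unchanged, which is the invariance case the abstract distinguishes from equivariance.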
-
Gapless superconducting state and mirage gap in altermagnets
Authors:
Miaomiao Wei,
Longjun Xiang,
Fuming Xu,
Lei Zhang,
Gaomin Tang,
Jian Wang
Abstract:
The interplay between spin-orbit interaction (SOI) and magnetism produces interesting phenomena in superconductors. When a two-dimensional (2D) system with strong SOI is coupled to an $s$-wave superconductor, an in-plane magnetic field can drive the system into a gapless superconducting state and induce a mirage gap at finite energies for an Ising superconductor. In this work, we demonstrate that when an $s$-wave superconductor is proximitized to an altermagnet, the intrinsic anisotropic spin splitting of the altermagnet can result in a gapless superconducting state and a pair of mirage gaps at finite energy. The gapless superconductivity exhibits spin-polarized segmented Fermi surfaces, with coexisting spin-singlet and spin-triplet pairings that have a $d$-wave character. Importantly, the gapless superconducting and mirage gap features are quantified through quantum transport. Our results suggest that altermagnets are an ideal platform for studying gapless superconducting states and mirage gap physics.
Submitted 14 May, 2024; v1 submitted 31 July, 2023;
originally announced August 2023.
-
Franck-Condon Simulation of Vibrationally-Resolved X-ray Spectra for Diatomic Systems: Validation of Harmonic Approximation and Density Functional Theory
Authors:
Lu Zhang,
Minrui Wei,
Guoyan Ge,
Weijie Hua
Abstract:
Under the Franck-Condon approximation, we systematically validated the performance of density functional theory (DFT) and the effects of anharmonicity in simulating C/N/O K-edge vibrationally-resolved X-ray spectra of common diatomic molecules. To obtain "transparent" validations, we investigated the vibronic fine structures of only the lowest 1s excited or ionized state in the X-ray absorption (XAS) or photoelectron (XPS) spectra. All 6 systems (N$_2$, N$_2^+$; NO, NO$^+$; CO, CO$^+$) were studied within the harmonic oscillator (HO) approximation using DFT with four functionals (BLYP, BP86, B3LYP, M06-2X) for 10 XAS and 4 XPS spectra, and excellent agreement between theoretical and experimental spectra was found in most systems, except for the O1s XAS of NO, CO, and NO$^+$. We analyzed and established a connection between their complex vibronic structures (many weak oscillating features within a broad peak) and the significant geometrical changes induced by the O1s hole. These three spectra were well reproduced with anharmonic (AH) calculations using quantum wavepacket dynamics based on potential energy curves (PECs) generated by DFT methods or at multiconfigurational levels, highlighting their sensitivity to the anharmonic effect and to PEC quality. In other examples of XAS (CO$^+$, C1s and O1s; NO, N1s) corresponding to smaller structural changes, HO and AH approaches lead to similar fine structures, which are dominated by 0-0 and 0-1 transitions. This study highlights the use of DFT with selected functionals for such diatomic calculations due to its easy execution and generally reliable accuracy. Functional dependence in diatomic systems is generally more pronounced than in polyatomic ones. We found that the BLYP, BP86, and B3LYP functionals consistently exhibited high accuracy in predicting spectral profiles, bond lengths, and vibrational frequencies, slightly outperforming M06-2X.
Submitted 20 November, 2023; v1 submitted 26 July, 2023;
originally announced July 2023.
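The link drawn above between small structural changes and spectra dominated by 0-0 and 0-1 transitions can be illustrated with the textbook displaced harmonic oscillator. Assuming (as a simplification not made in the paper) that the initial and final states share the same vibrational frequency $\omega$ and differ only by a bond-length shift $\Delta R$, the Franck-Condon factors from the vibrational ground state follow a Poisson distribution in the Huang-Rhys factor $S$:

$$I_{0\to n} = |\langle 0|n\rangle|^2 = e^{-S}\frac{S^n}{n!}, \qquad S = \frac{\mu\,\omega\,\Delta R^2}{2\hbar},$$

where $\mu$ is the reduced mass. The sketch below evaluates this progression; the function name is hypothetical, and real core-hole states also change $\omega$ (and, for large displacements, require anharmonic treatment, as the abstract notes).

```python
import math
from typing import List


def poisson_fc_factors(huang_rhys: float, n_max: int) -> List[float]:
    """Franck-Condon factors |<0|n>|^2 for n = 0..n_max.

    Assumes a displaced harmonic oscillator with equal frequencies in the
    initial and final states, so intensities are Poisson in S.
    """
    s = huang_rhys
    return [math.exp(-s) * s**n / math.factorial(n) for n in range(n_max + 1)]
```

For a small shift such as $S = 0.1$, about 90% of the intensity sits in the 0-0 line and about 9% in 0-1, while a larger $S$ (a bigger bond-length change, as with an O1s hole) spreads intensity over many weak vibronic lines.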
-
Pick the Best Pre-trained Model: Towards Transferability Estimation for Medical Image Segmentation
Authors:
Yuncheng Yang,
Meng Wei,
Junjun He,
Jie Yang,
Jin Ye,
Yun Gu
Abstract:
Transfer learning is a critical technique in training deep neural networks for the challenging task of medical image segmentation, which requires enormous resources. With the abundance of medical image data, many research institutions release models trained on various datasets, forming a huge pool of candidate source models to choose from. Hence, it is vital to estimate the source models' transferability (i.e., the ability to generalize across different downstream tasks) for proper and efficient model reuse. To address the shortcomings of transferability estimation when applied to medical image segmentation, in this paper we propose a new Transferability Estimation (TE) method. We first analyze the drawbacks of using existing TE algorithms for medical image segmentation and then design a source-free TE framework that considers both class consistency and feature variety for better estimation. Extensive experiments show that our method surpasses all current algorithms for transferability estimation in medical image segmentation. Code is available at https://github.com/EndoluminalSurgicalVision-IMR/CCFV
Submitted 21 July, 2023;
originally announced July 2023.
-
In Defense of Clip-based Video Relation Detection
Authors:
Meng Wei,
Long Chen,
Wei Ji,
Xiaoyu Yue,
Roger Zimmermann
Abstract:
Video Visual Relation Detection (VidVRD) aims to detect visual relationship triplets in videos using spatial bounding boxes and temporal boundaries. Existing VidVRD methods can be broadly categorized into bottom-up and top-down paradigms, depending on their approach to classifying relations. Bottom-up methods follow a clip-based approach where they classify relations of short clip tubelet pairs and then merge them into long video relations. On the other hand, top-down methods directly classify long video tubelet pairs. While recent video-based methods utilizing video tubelets have shown promising results, we argue that the effective modeling of spatial and temporal context plays a more significant role than the choice between clip tubelets and video tubelets. This motivates us to revisit the clip-based paradigm and explore the key success factors in VidVRD. In this paper, we propose a Hierarchical Context Model (HCM) that enriches the object-based spatial context and relation-based temporal context based on clips. We demonstrate that using clip tubelets can achieve superior performance compared to most video-based methods. Additionally, using clip tubelets offers more flexibility in model designs and helps alleviate the limitations associated with video tubelets, such as the challenging long-term object tracking problem and the loss of temporal information in long-term tubelet feature compression. Extensive experiments conducted on two challenging VidVRD benchmarks validate that our HCM achieves a new state-of-the-art performance, highlighting the effectiveness of incorporating advanced spatial and temporal context modeling within the clip-based paradigm.
Submitted 18 July, 2023;
originally announced July 2023.
-
SVDFormer: Complementing Point Cloud via Self-view Augmentation and Self-structure Dual-generator
Authors:
Zhe Zhu,
Honghua Chen,
Xing He,
Weiming Wang,
Jing Qin,
Mingqiang Wei
Abstract:
In this paper, we propose a novel network, SVDFormer, to tackle two specific challenges in point cloud completion: understanding faithful global shapes from incomplete point clouds and generating high-accuracy local structures. Current methods either perceive shape patterns using only 3D coordinates or import extra images with well-calibrated intrinsic parameters to guide the geometry estimation of the missing parts. However, these approaches do not always fully leverage the cross-modal self-structures available for accurate and high-quality point cloud completion. To this end, we first design a Self-view Fusion Network that leverages multiple-view depth image information to observe incomplete self-shape and generate a compact global shape. To reveal highly detailed structures, we then introduce a refinement module, called Self-structure Dual-generator, in which we incorporate learned shape priors and geometric self-similarities for producing new points. By perceiving the incompleteness of each point, the dual-path design disentangles refinement strategies conditioned on the structural type of each point. SVDFormer absorbs the wisdom of self-structures, avoiding any additional paired information such as color images with precisely calibrated camera intrinsic parameters. Comprehensive experiments indicate that our method achieves state-of-the-art performance on widely-used benchmarks. Code will be available at https://github.com/czvvd/SVDFormer.
Submitted 12 August, 2023; v1 submitted 17 July, 2023;
originally announced July 2023.
-
Graphene/silicon heterojunction for reconfigurable phase-relevant activation function in coherent optical neural networks
Authors:
Chuyu Zhong,
Kun Liao,
Tianxiang Dai,
Maoliang Wei,
Hui Ma,
Jianghong Wu,
Zhibin Zhang,
Yuting Ye,
Ye Luo,
Zequn Chen,
Jialing Jian,
Chulei Sun,
Bo Tang,
Peng Zhang,
Ruonan Liu,
Junying Li,
Jianyi Yang,
Lan Li,
Kaihui Liu,
Xiaoyong Hu,
Hongtao Lin
Abstract:
Optical neural networks (ONNs) herald a new era in information and communication technologies and have implemented various intelligent applications. In an ONN, the activation function (AF) is a crucial component that determines network performance, and on-chip AF devices are still under development. Here, we first demonstrate on-chip reconfigurable AF devices with phase activation fulfilled by dual-functional graphene/silicon (Gra/Si) heterojunctions. With optical modulation and detection in one device, time delays are shorter, energy consumption is lower, reconfigurability is higher and the device footprint is smaller than those of other on-chip AF strategies. The experimental modulation voltage (power) of our Gra/Si heterojunction is as low as 1 V (0.5 mW), superior to many pure silicon counterparts. In the photodetection aspect, a high responsivity of over 200 mA/W is realized. The special nonlinear functions generated are fed into a complex-valued ONN for handwritten letter and image recognition tasks, showing improved accuracy and the potential of highly efficient, fully integrated on-chip ONNs. Our results offer new insights for on-chip ONN devices and pave the way to high-performance integrated optoelectronic computing circuits.
Submitted 13 July, 2023;
originally announced July 2023.
-
Distilling Large Language Models for Biomedical Knowledge Extraction: A Case Study on Adverse Drug Events
Authors:
Yu Gu,
Sheng Zhang,
Naoto Usuyama,
Yonas Woldesenbet,
Cliff Wong,
Praneeth Sanapathi,
Mu Wei,
Naveen Valluri,
Erika Strandberg,
Tristan Naumann,
Hoifung Poon
Abstract:
Large language models (LLMs), such as GPT-4, have demonstrated remarkable capabilities across a wide range of tasks, including health applications. In this paper, we study how LLMs can be used to scale biomedical knowledge curation. We find that while LLMs already possess decent competency in structuring biomedical text, distilling them into a task-specific student model through self-supervised learning attains substantial gains over out-of-the-box LLMs, with additional advantages in cost, efficiency, and white-box model access.
We conduct a case study on adverse drug event (ADE) extraction, which is an important area for improving care. On standard ADE extraction evaluation, a GPT-3.5 distilled PubMedBERT model attained accuracy comparable to supervised state-of-the-art models without using any labeled data. Despite being over 1,000 times smaller, the distilled model outperformed its teacher GPT-3.5 by over 6 absolute points in F1 and GPT-4 by over 5 absolute points.
Ablation studies on distillation model choice (e.g., PubMedBERT vs BioGPT) and ADE extraction architecture shed light on best practices for biomedical knowledge extraction. Similar gains were attained by distillation for other standard biomedical knowledge extraction tasks such as gene-disease associations and protected health information, further illustrating the promise of this approach.
Submitted 12 July, 2023;
originally announced July 2023.
-
Vibronic fine structure in the nitrogen 1s photoelectron spectra from Franck-Condon simulations II: Indoles
Authors:
Minrui Wei,
Lu Zhang,
Guangjun Tian,
Weijie Hua
Abstract:
The vibronic coupling effect in nitrogen 1s X-ray photoelectron spectra (XPS) was systematically studied for a family of 17 bicyclic indole molecules by combining Franck-Condon simulations (including the Duschinsky rotation effect) and density functional theory. The simulated vibrationally-resolved spectra of 4 molecules agree well with available experiments. Reliable predictions for this family further allowed us to summarize rules for spectral evolution in response to three types of common structural changes (side chain substitution, CH$\leftrightarrow$N replacement, and isomerization). Interestingly, the vibronic properties of amine and imine nitrogen are clearly separated: they show negative and positive $Δ$ZPE (zero-point vibration energy of the core-ionized state with respect to the ground state), respectively, indicating flatter and steeper potential energy surfaces (PESs) induced by the N 1s ionization; amine N's show stronger mode mixing effects than imine N's; and 1s ionization on the two types of nitrogen leads to distinct changes in local bond lengths and angles. The rules are useful for a basic understanding of vibronic coupling in this family, and the precise spectra are useful for future reference and data mining studies.
Submitted 4 July, 2023;
originally announced July 2023.
-
In Silico Tools in PROTACs design
Authors:
Mengman Wei
Abstract:
PROTACs, as a highly promising new therapeutic paradigm, have attracted widespread attention from the academic and pharmaceutical communities in recent years. To date, the design and validation of the druggability of PROTAC molecules rely primarily on experimental approaches, making the development of this kind of drug molecule time-consuming. Computer-aided tools for PROTAC design may offer a potential solution to expedite the design process and enhance its efficiency. This mini review briefly summarizes recently reported in silico tools for PROTAC drug molecule design.
Submitted 9 July, 2023; v1 submitted 3 July, 2023;
originally announced July 2023.
-
Automatic Unit Test Generation for Deep Learning Frameworks based on API Knowledge
Authors:
Arunkaleeshwaran Narayanan,
Nima Shiri harzevili,
Junjie Wang,
Lin Shi,
Moshi Wei,
Song Wang
Abstract:
Many automatic unit test generation tools that can generate unit test cases with high coverage over a program have been proposed. However, most of these tools are ineffective on deep learning (DL) frameworks because many deep learning APIs expect inputs that follow specific API knowledge. To fill this gap, we propose MUTester to generate unit test cases for APIs of deep learning frameworks by leveraging the API constraints mined from the corresponding API documentation and the API usage patterns mined from code fragments in Stack Overflow (SO). In particular, we first propose a set of 18 rules for mining API constraints from the API documents. We then use the frequent itemset mining technique to mine API usage patterns from a large corpus of machine learning API-related code fragments collected from SO. Finally, we use these two types of API knowledge to guide the test generation of existing test generators for deep learning frameworks. To evaluate the performance of MUTester, we first collect 1,971 APIs from four widely-used deep learning frameworks (i.e., Scikit-learn, PyTorch, TensorFlow, and CNTK), and for each API we further extract its API knowledge, i.e., API constraints and API usage. Given an API, MUTester combines its API knowledge with existing test generators (e.g., the search-based test generator PyEvosuite and the random test generator PyRandoop) to generate test cases for the API. Our experimental results show that MUTester significantly improves the corresponding test generation methods, with average code coverage improvements of 15.7% to 27.0%. In addition, it helps reduce invalid tests generated by the existing test generators by around 19.0%. Our user study with 16 developers further demonstrates the practicality of MUTester in generating test cases for deep learning frameworks.
Submitted 1 July, 2023;
originally announced July 2023.
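The MUTester abstract mentions mining API usage patterns with frequent itemset mining. As a rough illustration of that technique (not MUTester's actual pipeline, whose parameters the abstract does not give), the sketch below runs a small Apriori-style search over code fragments, each reduced to the set of API names it calls. The `min_support` and `max_size` values, the function name, and the example API names are illustrative assumptions.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, FrozenSet, Iterable


def frequent_itemsets(
    transactions: Iterable[Iterable[str]],
    min_support: int,
    max_size: int = 3,
) -> Dict[FrozenSet[str], int]:
    """Apriori-style mining of API-name sets that co-occur in >= min_support fragments."""
    baskets = [frozenset(t) for t in transactions]
    result: Dict[FrozenSet[str], int] = {}
    # Size-1 candidates: every API name seen in any fragment.
    candidates = {frozenset([api]) for b in baskets for api in b}
    size = 1
    while candidates and size <= max_size:
        # Support = number of fragments containing the whole candidate set.
        counts = Counter(c for b in baskets for c in candidates if c <= b)
        frequent = {c: n for c, n in counts.items() if n >= min_support}
        result.update(frequent)
        size += 1
        items = sorted({api for c in frequent for api in c})
        # Apriori pruning: keep a candidate only if all its subsets are frequent.
        candidates = {
            frozenset(combo)
            for combo in combinations(items, size)
            if all(frozenset(sub) in frequent for sub in combinations(combo, size - 1))
        }
    return result
```

For example, if `torch.zeros` and `torch.nn.Linear` appear together in two of three fragments, the pair is reported with support 2 when `min_support=2`, which is the kind of co-usage pattern that could then guide input construction for a test generator.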
-
Complete bipartite graphs without small rainbow stars
Authors:
Weizhen Chen,
Meng Ji,
Yaping Mao,
Meiqin Wei
Abstract:
The $k$-edge-colored bipartite Gallai-Ramsey number $\operatorname{bgr}_k(G:H)$ is defined as the minimum integer $n$ such that $n^2\geq k$ and, for every $N\geq n$, every edge-coloring (using all $k$ colors) of the complete bipartite graph $K_{N,N}$ contains a rainbow copy of $G$ or a monochromatic copy of $H$. In this paper, we first prove a structural theorem for the complete bipartite graph $K_{n,n}$ with no rainbow copy of $K_{1,3}$. Next, we use this result to determine the exact values of $\operatorname{bgr}_{k}(P_4: H)$, $\operatorname{bgr}_{k}(P_5: H)$, and $\operatorname{bgr}_{k}(K_{1,3}: H)$, where $H$ ranges over various unions of cycles, paths, and stars.
Submitted 13 December, 2023; v1 submitted 30 June, 2023;
originally announced June 2023.
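Two elementary facts sit behind the definition above. The condition $n^2 \geq k$ is needed because $K_{n,n}$ has exactly $n^2$ edges, so a coloring that uses all $k$ colors requires at least $k$ edges. In addition, a rainbow $K_{1,3}$ is a center vertex with three incident edges of pairwise distinct colors, so an edge-colored $K_{N,N}$ contains one exactly when some vertex sees at least three colors. The sketch below checks the second condition on a coloring stored as an $N \times N$ matrix, where `coloring[i][j]` is the color of the edge between left vertex $i$ and right vertex $j$. The matrix representation and function names are illustrative choices, not taken from the paper.

```python
from typing import Hashable, Sequence


def colors_used(coloring: Sequence[Sequence[Hashable]]) -> int:
    """Number of distinct colors on the N*N edges of K_{N,N}."""
    return len({c for row in coloring for c in row})


def has_rainbow_k13(coloring: Sequence[Sequence[Hashable]]) -> bool:
    """True iff some vertex of K_{N,N} is incident to >= 3 distinct colors."""
    # Left vertex i is incident to the edges in row i.
    if any(len(set(row)) >= 3 for row in coloring):
        return True
    # Right vertex j is incident to the edges in column j.
    return any(len(set(col)) >= 3 for col in zip(*coloring))
```

For instance, the 4-coloring `[[1, 2], [3, 4]]` of $K_{2,2}$ uses all $k = 4 = n^2$ colors yet has no rainbow $K_{1,3}$, since every vertex has degree 2.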
-
Pareto Optimal Learning for Estimating Large Language Model Errors
Authors:
Theodore Zhao,
Mu Wei,
J. Samuel Preston,
Hoifung Poon
Abstract:
Large Language Models (LLMs) have shown impressive abilities in many applications. When a concrete and precise answer is desired, it is important to have a quantitative estimate of the potential error rate. However, this can be challenging due to the text-in-text-out nature of generative models. We present a method based on Pareto optimization that generates a risk score to estimate the probability of error in an LLM response by integrating multiple sources of information. We prove theoretically that the error estimator optimized in our framework aligns with the LLM and the information sources in a Pareto optimal manner. Experimental results show that the risk scores estimated by our method are well correlated with the true LLM error rate, thus facilitating error correction. By dynamically combining the method with prompting strategies such as self-verification and information retrieval, we demonstrate that it can be used to increase the performance of an LLM, surpassing state-of-the-art task-specific models.
Submitted 22 May, 2024; v1 submitted 28 June, 2023;
originally announced June 2023.
-
BBCA-LEDGER: High Throughput Consensus meets Low Latency
Authors:
Chrysoula Stathakopoulou,
Michael Wei,
Maofan Yin,
Hongbo Zhang,
Dahlia Malkhi
Abstract:
This paper presents BBCA-LEDGER, a Byzantine log replication technology for partially synchronous networks that enables blocks to be broadcast in parallel, such that each broadcast is finalized independently and instantaneously into an individual slot in the log. Every finalized broadcast is eventually committed to the total ordering, so that all network bandwidth has utility in disseminating blocks. Finalizing log slots in parallel achieves both high throughput and low latency. BBCA-LEDGER is composed of two principal protocols that interweave: a low-latency/high-throughput happy path and a high-throughput DAG-based fallback path. The happy path employs a novel primitive called BBCA, a consistent broadcast enforcing unique slot numbering. In steady state, BBCA ensures that a transaction can be committed with low latency, in just 3 network steps. Under network partitions or faults, we harness recent advances in BFT and build a fallback mechanism on a directed acyclic graph (DAG) created by BBCA broadcasts. In this manner, BBCA-LEDGER exhibits the throughput benefits of DAG-based BFT in the face of gaps.
Submitted 26 June, 2023;
originally announced June 2023.
-
The Neuro-Symbolic Inverse Planning Engine (NIPE): Modeling Probabilistic Social Inferences from Linguistic Inputs
Authors:
Lance Ying,
Katherine M. Collins,
Megan Wei,
Cedegao E. Zhang,
Tan Zhi-Xuan,
Adrian Weller,
Joshua B. Tenenbaum,
Lionel Wong
Abstract:
Human beings are social creatures. We routinely reason about other agents, and a crucial component of this social reasoning is inferring people's goals as we learn about their actions. In many settings, we can perform intuitive but reliable goal inference from language descriptions of agents, actions, and the background environments. In this paper, we study this process of language driving and influencing social reasoning in a probabilistic goal inference domain. We propose a neuro-symbolic model that carries out goal inference from linguistic inputs of agent scenarios. The "neuro" part is a large language model (LLM) that translates language descriptions to code representations, and the "symbolic" part is a Bayesian inverse planning engine. To test our model, we design and run a human experiment on a linguistic goal inference task. Our model closely matches human response patterns and better predicts human judgements than using an LLM alone.
Submitted 27 June, 2023; v1 submitted 25 June, 2023;
originally announced June 2023.
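The "symbolic" half of NIPE is described as a Bayesian inverse planning engine. The generic idea of Bayesian inverse planning is to score each candidate goal $g$ by $P(g \mid \text{actions}) \propto P(g)\prod_t P(a_t \mid s_t, g)$, where the action likelihood comes from a (noisily) rational planner. The toy sketch below applies this to an agent walking on a number line, using a Boltzmann-rational policy whose utility is the distance reduction toward each goal. It is only an illustration of the technique: NIPE's actual planner, state representation, and LLM-to-code translation are not described in the abstract, and the 1D world, the `beta` rationality parameter, and the function name are assumptions.

```python
import math
from typing import Dict, List, Optional


def goal_posterior(
    start: int,
    moves: List[int],  # each move is -1 (left) or +1 (right)
    goals: List[int],
    beta: float = 2.0,  # assumed rationality (inverse temperature)
    prior: Optional[Dict[int, float]] = None,
) -> Dict[int, float]:
    """Toy Bayesian inverse planning with a Boltzmann-rational agent on a line."""
    prior = prior or {g: 1.0 / len(goals) for g in goals}
    log_post = {g: math.log(prior[g]) for g in goals}
    pos = start
    for step in moves:
        for g in goals:
            # Utility of an action = how much it reduces the distance to goal g.
            utils = {a: abs(pos - g) - abs(pos + a - g) for a in (-1, 1)}
            log_z = math.log(sum(math.exp(beta * u) for u in utils.values()))
            log_post[g] += beta * utils[step] - log_z  # log P(step | pos, g)
        pos += step
    # Normalize in a numerically stable way.
    m = max(log_post.values())
    unnorm = {g: math.exp(v - m) for g, v in log_post.items()}
    total = sum(unnorm.values())
    return {g: v / total for g, v in unnorm.items()}
```

With goals at -5 and +5 and two observed steps to the right from 0, nearly all posterior mass moves to the goal at +5; in NIPE, the corresponding likelihoods would come from the planning engine operating on the LLM-produced code representation.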
-
Cloud Behaviour on Tidally Locked Rocky Planets from Global High-resolution Modeling
Authors:
Jun Yang,
Yixiao Zhang,
Zuntao Fu,
Mingyu Yan,
Xinyi Song,
Mengyu Wei,
Jiachen Liu,
Feng Ding,
Zhihong Tan
Abstract:
Determining the behaviour of convection and clouds is one of the biggest challenges in our understanding of exoplanetary climates. Given the lack of in situ observations, one of the most preferable approaches is to use cloud-resolving or cloud-permitting models (CPM). Here we present CPM simulations in a quasi-global domain with high spatial resolution (4$\times$4 km grid) and explicit convection to study the cloud regime of 1:1 tidally locked rocky planets orbiting low-mass stars. We show that the substellar region is covered by deep convective clouds and that cloud albedo increases with increasing stellar flux. The CPM produces relatively lower cloud liquid water concentration, smaller cloud coverage, lower cloud albedo, and deeper H2O spectral features than previous general circulation model (GCM) simulations employing empirical convection and cloud parameterizations. Furthermore, cloud streets (long bands of low-level clouds oriented nearly parallel to the direction of the mean boundary-layer winds) appear in the CPM and substantially affect energy balance and surface precipitation at a local level.
Submitted 21 June, 2023;
originally announced June 2023.
-
Text Promptable Surgical Instrument Segmentation with Vision-Language Models
Authors:
Zijian Zhou,
Oluwatosin Alabi,
Meng Wei,
Tom Vercauteren,
Miaojing Shi
Abstract:
In this paper, we propose a novel text promptable surgical instrument segmentation approach to overcome challenges associated with diversity and differentiation of surgical instruments in minimally invasive surgeries. We redefine the task as text promptable, thereby enabling a more nuanced comprehension of surgical instruments and adaptability to new instrument types. Inspired by recent advancements in vision-language models, we leverage pretrained image and text encoders as our model backbone and design a text promptable mask decoder consisting of attention- and convolution-based prompting schemes for surgical instrument segmentation prediction. Our model leverages multiple text prompts for each surgical instrument through a new mixture of prompts mechanism, resulting in enhanced segmentation performance. Additionally, we introduce a hard instrument area reinforcement module to improve image feature comprehension and segmentation precision. Extensive experiments on several surgical instrument segmentation datasets demonstrate our model's superior performance and promising generalization capability. To our knowledge, this is the first implementation of a promptable approach to surgical instrument segmentation, offering significant potential for practical application in the field of robotic-assisted surgery. Code is available at https://github.com/franciszzj/TP-SIS.
Submitted 8 November, 2023; v1 submitted 15 June, 2023;
originally announced June 2023.
-
Security Knowledge-Guided Fuzzing of Deep Learning Libraries
Authors:
Nima Shiri Harzevili,
Mohammad Mahdi Mohajer,
Moshi Wei,
Hung Viet Pham,
Song Wang
Abstract:
Recently, many deep learning (DL) fuzzers have been proposed for testing DL libraries. However, they either perform unguided input generation (e.g., not considering the relationship between API arguments when generating inputs) or only support a limited set of corner-case test inputs. Furthermore, a substantial number of developer APIs crucial for library development remain untested, as they are typically poorly documented and lack clear usage guidelines.
To fill this gap, we propose a novel fuzzer named Orion, which combines guided test input generation and corner-case test input generation based on a set of fuzzing rules constructed from historical data known to trigger vulnerabilities in the implementation of DL APIs. To extract the fuzzing rules, we first conduct an empirical study analyzing the root causes of 376 vulnerabilities in two of the most popular DL libraries, PyTorch and TensorFlow. We then construct the rules based on the root causes of these historical vulnerabilities.
Our evaluation shows that Orion reports 135 vulnerabilities in the latest releases of TensorFlow and PyTorch, 76 of which were confirmed by the library developers. Among the 76 confirmed vulnerabilities, 69 were previously unknown, and 7 have already been fixed. The rest are awaiting further confirmation. For end-user APIs, Orion detected 31.8% and 90% more vulnerabilities on TensorFlow and PyTorch, respectively, than the state-of-the-art conventional fuzzer, DeepRel. Compared with the state-of-the-art LLM-based DL fuzzer, AtlasFuzz, Orion detected 13.63% more vulnerabilities on TensorFlow and 18.42% more on PyTorch. For developer APIs, Orion stands out by detecting 117% more vulnerabilities on TensorFlow and 100% more on PyTorch than FreeFuzz, the most relevant fuzzer designed for developer APIs.
Submitted 24 December, 2023; v1 submitted 5 June, 2023;
originally announced June 2023.
-
The First LHAASO Catalog of Gamma-Ray Sources
Authors:
Zhen Cao,
F. Aharonian,
Q. An,
Axikegu,
Y. X. Bai,
Y. W. Bao,
D. Bastieri,
X. J. Bi,
Y. J. Bi,
J. T. Cai,
Q. Cao,
W. Y. Cao,
Zhe Cao,
J. Chang,
J. F. Chang,
A. M. Chen,
E. S. Chen,
Liang Chen,
Lin Chen,
Long Chen,
M. J. Chen,
M. L. Chen,
Q. H. Chen,
S. H. Chen,
S. Z. Chen
, et al. (255 additional authors not shown)
Abstract:
We present the first catalog of very-high-energy and ultra-high-energy gamma-ray sources detected by the Large High Altitude Air Shower Observatory (LHAASO). The catalog was compiled using 508 days of data collected by the Water Cherenkov Detector Array (WCDA) from March 2021 to September 2022 and 933 days of data recorded by the Kilometer Squared Array (KM2A) from January 2020 to September 2022. This catalog represents the main result of the most sensitive large-coverage gamma-ray survey of the sky above 1 TeV, covering declinations from $-20^{\circ}$ to $80^{\circ}$. In total, the catalog contains 90 sources with an extended size smaller than $2^\circ$ and a detection significance of $> 5σ$. Based on our source association criteria, 32 new TeV sources are proposed in this study. Among the 90 sources, 43 show ultra-high-energy ($E > 100$ TeV) emission at $> 4σ$ significance. We provide the position, extension, and spectral characteristics of all sources in this catalog.
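The $> 5σ$ and $> 4σ$ thresholds above are Gaussian-equivalent significances. As a reading aid only, the sketch below converts a significance into a one-sided Gaussian tail probability; the one-sided convention and the helper name `one_sided_p_value` are assumptions for illustration, not details taken from the catalog.

```python
import math


def one_sided_p_value(significance_sigma: float) -> float:
    """Probability that a standard normal variable exceeds `significance_sigma`.

    Assumes the common one-sided Gaussian convention for quoting detection
    significances; a two-sided convention would give twice this value.
    """
    # Upper tail of the standard normal: P(Z > s) = erfc(s / sqrt(2)) / 2
    return 0.5 * math.erfc(significance_sigma / math.sqrt(2.0))


for s in (4.0, 5.0):
    print(f"{s:.0f} sigma -> p = {one_sided_p_value(s):.2e}")
```

Under this convention the loop prints roughly 3.17e-05 for 4σ and 2.87e-07 for 5σ, which is why the two thresholds differ by about two orders of magnitude in chance probability.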
Submitted 27 November, 2023; v1 submitted 26 May, 2023;
originally announced May 2023.
-
Localization of chiral edge states by the non-Hermitian skin effect
Authors:
Gui-Geng Liu,
Subhaskar Mandal,
Peiheng Zhou,
Xiang Xi,
Rimi Banerjee,
Yuan-Hang Hu,
Minggui Wei,
Maoren Wang,
Qiang Wang,
Zhen Gao,
Hongsheng Chen,
Yihao Yang,
Yidong Chong,
Baile Zhang
Abstract:
Quantum Hall systems host chiral edge states extending along the one-dimensional boundary of any two-dimensional sample. In solid-state materials, the edge states serve as perfectly robust transport channels that produce a quantised Hall conductance; due to their chirality, and the topological protection by the Chern number of the bulk bandstructure, they cannot be spatially localized by defects or disorder. Here, we show experimentally that the chiral edge states of a lossy quantum Hall system can be localized. In a gyromagnetic photonic crystal exhibiting the quantum Hall topological phase, an appropriately structured loss configuration endows the edge states' complex energy spectrum with a feature known as point-gap winding. This intrinsically non-Hermitian topological invariant is distinct from the Chern number invariant of the bulk (which remains intact) and induces mode localization via the "non-Hermitian skin effect". The interplay of the two topological phenomena (the Chern number and point-gap winding) gives rise to a non-Hermitian generalisation of the paradigmatic Chern-type bulk-boundary correspondence principle. Compared with previous realisations of the non-Hermitian skin effect, the skin modes in this system are more robust against local defects and disorder.
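Point-gap winding can be made concrete with a toy model. The sketch below uses the one-dimensional Hatano-Nelson chain (unequal forward and backward hopping), a standard textbook example of the non-Hermitian skin effect; it is not the gyromagnetic photonic crystal studied here. It numerically counts how many times the periodic-boundary spectrum $E(k)$ winds around a reference energy, where a nonzero count signals a point gap with nontrivial winding. The hopping convention and the function names are hypothetical.

```python
import cmath
import math
from typing import Callable


def hatano_nelson_spectrum(t_forward: float, t_backward: float) -> Callable[[float], complex]:
    """Bloch energy E(k) of a 1D chain with unequal hopping (hypothetical sign convention)."""
    return lambda k: t_forward * cmath.exp(1j * k) + t_backward * cmath.exp(-1j * k)


def point_gap_winding(energy: Callable[[float], complex], e_ref: complex, samples: int = 2048) -> int:
    """Winding number of E(k) - e_ref as k runs once around the Brillouin zone [0, 2*pi]."""
    total_phase = 0.0
    prev = cmath.phase(energy(0.0) - e_ref)
    for j in range(1, samples + 1):
        k = 2.0 * math.pi * j / samples
        cur = cmath.phase(energy(k) - e_ref)
        # Wrap each phase increment into [-pi, pi) so branch cuts do not add spurious 2*pi jumps
        step = (cur - prev + math.pi) % (2.0 * math.pi) - math.pi
        total_phase += step
        prev = cur
    return round(total_phase / (2.0 * math.pi))


print(point_gap_winding(hatano_nelson_spectrum(1.0, 0.5), 0.0))  # nonreciprocal chain, reference inside the loop
print(point_gap_winding(hatano_nelson_spectrum(1.0, 1.0), 3.0))  # reciprocal (Hermitian) chain: real spectrum
```

The first call prints 1 (the elliptical spectrum encircles $E = 0$) and the second prints 0. In this toy model, a nonzero periodic-boundary winding goes together with eigenstates piling up at one edge under open boundaries, which is the skin effect the abstract refers to.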
Submitted 22 May, 2023;
originally announced May 2023.
-
Smiling Women Pitching Down: Auditing Representational and Presentational Gender Biases in Image Generative AI
Authors:
Luhang Sun,
Mian Wei,
Yibing Sun,
Yoo Ji Suh,
Liwei Shen,
Sijia Yang
Abstract:
Generative AI models like DALL-E 2 can interpret textual prompts and generate high-quality images exhibiting human creativity. Although public enthusiasm is booming, systematic audits of potential gender biases in AI-generated images remain scarce. We addressed this gap by examining the prevalence of two occupational gender biases (representational and presentational biases) in 15,300 DALL-E 2 images spanning 153 occupations, and assessed potential bias amplification by benchmarking against 2021 census labor statistics and Google Images. Our findings reveal that DALL-E 2 underrepresents women in male-dominated fields while overrepresenting them in female-dominated occupations. Additionally, DALL-E 2 images tend to depict more women than men with smiling faces and downward-pitching heads, particularly in female-dominated (vs. male-dominated) occupations. Our computational algorithm auditing study demonstrates more pronounced representational and presentational biases in DALL-E 2 than in Google Images and calls for feminist interventions to prevent such bias-laden AI-generated images from feeding back into the media ecology.
Submitted 17 May, 2023;
originally announced May 2023.
-
The Good, the Bad, and the Missing: Neural Code Generation for Machine Learning Tasks
Authors:
Jiho Shin,
Moshi Wei,
Junjie Wang,
Lin Shi,
Song Wang
Abstract:
Machine learning (ML) is increasingly used in a variety of domains, yet ML programming tasks pose unique challenges because they differ fundamentally in nature and construction from general programming tasks, especially for developers without ML backgrounds. Automatic code generation, which produces a code snippet from a natural language description, is a promising technique for accelerating ML programming tasks. Although many deep learning-based neural code generation models with high accuracy have been proposed in recent years, most of them are evaluated mainly on general programming tasks, which calls into question their effectiveness and usefulness for ML programming tasks. In this paper, we investigate the effectiveness of existing neural code generation models on ML programming tasks. For our analysis, we select six state-of-the-art neural code generation models and evaluate their performance on four widely used ML libraries, using 83K newly created pairs of natural-language descriptions and code for ML programming tasks. Our empirical study reveals some good, bad, and missing aspects of neural code generation models on ML tasks; the major ones are listed below. (Good) Neural code generation models perform significantly better on ML tasks than on non-ML tasks. (Bad) Most of the generated code is semantically incorrect. (Bad) Code generation models cannot significantly improve developers' completion time. (Good) The generated code can help developers write more correct code by giving them clues for using the correct APIs. (Missing) Our user study reveals missing aspects of code generation for ML tasks, e.g., decomposing code generation into two divide-and-conquer tasks: API sequence identification and API usage generation.
Submitted 15 May, 2023;
originally announced May 2023.
-
Measurement of ultra-high-energy diffuse gamma-ray emission of the Galactic plane from 10 TeV to 1 PeV with LHAASO-KM2A
Authors:
Zhen Cao,
F. Aharonian,
Q. An,
Axikegu,
Y. X. Bai,
Y. W. Bao,
D. Bastieri,
X. J. Bi,
Y. J. Bi,
J. T. Cai,
Q. Cao,
W. Y. Cao,
Zhe Cao,
J. Chang,
J. F. Chang,
A. M. Chen,
E. S. Chen,
Liang Chen,
Lin Chen,
Long Chen,
M. J. Chen,
M. L. Chen,
Q. H. Chen,
S. H. Chen,
S. Z. Chen
, et al. (255 additional authors not shown)
Abstract:
The diffuse Galactic $γ$-ray emission, mainly produced via interactions between cosmic rays and the interstellar medium and/or radiation field, is an important probe of the distribution, propagation, and interaction of cosmic rays in the Milky Way. In this work, we report measurements of diffuse $γ$-rays from the Galactic plane at energies between 10 TeV and 1 PeV, with the square kilometer array of the Large High Altitude Air Shower Observatory (LHAASO). Diffuse emission from the inner ($15^{\circ}<l<125^{\circ}$, $|b|<5^{\circ}$) and outer ($125^{\circ}<l<235^{\circ}$, $|b|<5^{\circ}$) Galactic plane is detected with $29.1σ$ and $12.7σ$ significance, respectively. The outer Galactic plane diffuse emission is detected for the first time in the very- to ultra-high-energy domain ($E>10$ TeV). The energy spectrum in the inner Galaxy region can be described by a power-law function with an index of $-2.99\pm0.04$, which differs from the curved spectrum expected from hadronic interactions between locally measured cosmic rays and the line-of-sight integrated gas content. Furthermore, the measured flux is higher than the prediction by a factor of $\sim3$. A similar spectrum with an index of $-2.99\pm0.07$ is found in the outer Galaxy region, and the absolute flux for $10\lesssim E\lesssim60$ TeV is again higher than the prediction for hadronic cosmic-ray interactions. The latitude distributions of the diffuse emission are consistent with the gas distribution, while the longitude distributions show clear deviations from it. The LHAASO measurements imply that either additional emission sources exist or cosmic-ray intensities vary spatially.
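The quoted spectral index describes a power law $dN/dE \propto E^{-2.99}$. The sketch below illustrates what that implies, assuming a pure power law with an arbitrary normalization and a hypothetical 10 TeV pivot energy (neither is a fitted LHAASO value): the differential flux drops by a factor of $10^{2.99} \approx 977$ per decade in energy, and the integral flux between two energies follows analytically.

```python
import math


def differential_flux(energy_tev: float, norm: float = 1.0, index: float = -2.99,
                      pivot_tev: float = 10.0) -> float:
    """Power law dN/dE = norm * (E / pivot)^index, in arbitrary units (hypothetical normalization)."""
    return norm * (energy_tev / pivot_tev) ** index


def integral_flux(e_min_tev: float, e_max_tev: float, norm: float = 1.0,
                  index: float = -2.99, pivot_tev: float = 10.0) -> float:
    """Analytic integral of the power law from e_min to e_max (valid for index != -1)."""
    g = index + 1.0
    return norm * pivot_tev / g * ((e_max_tev / pivot_tev) ** g - (e_min_tev / pivot_tev) ** g)


# Differential flux ratio across one decade (10 -> 100 TeV) equals 10**index
print(differential_flux(100.0) / differential_flux(10.0))
# Fraction of the 10 TeV - 1 PeV integral flux that lies below 60 TeV
print(integral_flux(10.0, 60.0) / integral_flux(10.0, 1000.0))
```

With this steep index, about 97% of the 10 TeV to 1 PeV integral flux falls below 60 TeV, which is one reason the highest-energy part of such measurements is statistics-limited.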
Submitted 19 August, 2023; v1 submitted 9 May, 2023;
originally announced May 2023.
-
Don't worry about mistakes! Glass Segmentation Network via Mistake Correction
Authors:
Chengyu Zheng,
Peng Li,
Xiao-Ping Zhang,
Xuequan Lu,
Mingqiang Wei
Abstract:
Recall a time when we were in an unfamiliar mall. We might have mistakenly thought that there was, or was not, a piece of glass in front of us. Such mistakes remind us to walk more safely and freely in the same or a similar place next time. To absorb this human wisdom of correcting mistakes, we propose a novel glass segmentation network for detecting transparent glass, dubbed GlassSegNet. Motivated by this human behavior, GlassSegNet uses two key stages: the identification stage (IS) and the correction stage (CS). The IS is designed to mimic how humans identify transparent glass from global context and edge information. The CS then progressively refines the coarse prediction by correcting mistaken regions based on the experience gained. Extensive experiments show clear improvements of GlassSegNet over thirty-four state-of-the-art methods on three benchmark datasets.
Submitted 21 April, 2023;
originally announced April 2023.
-
Measurement of the cosmic p+He energy spectrum from 50 GeV to 0.5 PeV with the DAMPE space mission
Authors:
DAMPE Collaboration,
F. Alemanno,
C. Altomare,
Q. An,
P. Azzarello,
F. C. T. Barbato,
P. Bernardini,
X. J. Bi,
I. Cagnoli,
M. S. Cai,
E. Casilli,
E. Catanzani,
J. Chang,
D. Y. Chen,
J. L. Chen,
Z. F. Chen,
P. Coppin,
M. Y. Cui,
T. S. Cui,
Y. X. Cui,
H. T. Dai,
A. De Benedittis,
I. De Mitri,
F. de Palma,
M. Deliyergiyev
, et al. (130 additional authors not shown)
Abstract:
Recent observations of the light component of the cosmic-ray spectrum have revealed unexpected features that motivate further and more precise measurements up to the highest energies. The Dark Matter Particle Explorer (DAMPE) is a satellite-based cosmic-ray experiment that has been operational since December 2015, continuously collecting data on high-energy cosmic particles with very good statistics, energy resolution, and particle identification capabilities. In this work, we present the latest measurements of the proton+helium energy spectrum in the energy range from 46 GeV to 464 TeV. Among the most distinctive features of the spectrum, a spectral hardening at 600 GeV is observed, along with a softening at 29 TeV measured with 6.6σ significance. Moreover, the detector features and the analysis approach allow the spectral measurement to be extended into the sub-PeV region. Although the statistical significance is low owing to the small number of events, the data suggest a new spectral hardening at about 150 TeV.
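A spectral "hardening" or "softening" is a change in the local power-law index $d\ln F/d\ln E$. Such features are often modelled with a smoothly broken power law; the sketch below uses one generic parameterization (an assumption, not necessarily the exact form fitted by DAMPE) in which the local index moves from `index` to `index + delta_index` around `break_energy`, with `smoothness` setting the width of the transition. Energies are in GeV and all parameter values are illustrative.

```python
import math


def smoothly_broken_power_law(energy: float, norm: float, index: float, break_energy: float,
                              delta_index: float, smoothness: float, pivot: float = 1.0) -> float:
    """Generic form F(E) = norm * (E/pivot)^index * [1 + (E/E_b)^(1/s)]^(delta_index * s)."""
    x = (energy / break_energy) ** (1.0 / smoothness)
    return norm * (energy / pivot) ** index * (1.0 + x) ** (delta_index * smoothness)


def local_index(flux, energy: float, rel_step: float = 1e-4) -> float:
    """Numerical logarithmic slope d ln F / d ln E via a central difference."""
    lo, hi = energy * (1.0 - rel_step), energy * (1.0 + rel_step)
    return (math.log(flux(hi)) - math.log(flux(lo))) / (math.log(hi) - math.log(lo))


# Illustrative hardening at 600 GeV: index -2.8 below the break, -2.6 above it (made-up values)
hardening = lambda e: smoothly_broken_power_law(e, 1.0, -2.8, 600.0, +0.2, 0.1)
for e in (60.0, 600.0, 6000.0):
    print(e, round(local_index(hardening, e), 3))
```

Analytically, the local index of this form is `index + delta_index * x / (1 + x)`, so it reads about -2.8 well below the break, -2.7 at the break, and -2.6 well above it; a negative `delta_index` describes a softening instead.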
Submitted 14 August, 2024; v1 submitted 31 March, 2023;
originally announced April 2023.
-
Joint Depth Estimation and Mixture of Rain Removal From a Single Image
Authors:
Yongzhen Wang,
Xuefeng Yan,
Yanbiao Niu,
Lina Gong,
Yanwen Guo,
Mingqiang Wei
Abstract:
Rainy weather significantly degrades the visibility of scene objects, particularly when images are captured through outdoor camera lenses or windshields. Through careful observation of numerous rainy photos, we have found that images are generally affected by various rainwater artifacts, such as raindrops, rain streaks, and rainy haze, which degrade image quality at both near and far distances, resulting in a complex, intertwined degradation process. However, current deraining techniques can address only one or two types of rainwater, which makes removing the mixture of rain (MOR) challenging. In this study, we propose an effective image deraining paradigm for Mixture of rain REmoval, called DEMore-Net, which takes full account of the MOR effect. Going beyond existing deraining approaches, DEMore-Net is a joint learning paradigm that integrates depth estimation and MOR removal to achieve superior rain removal. Depth information offers additional distance-based guidance, helping DEMore-Net better remove different types of rainwater. Moreover, this study explores normalization approaches in image deraining and introduces a new Hybrid Normalization Block (HNB) to enhance the deraining performance of DEMore-Net. Extensive experiments on synthetic datasets and real-world MOR photos validate the superiority of the proposed DEMore-Net. Code is available at https://github.com/yz-wang/DEMore-Net.
Submitted 30 March, 2023;
originally announced March 2023.
-
STCF Conceptual Design Report: Volume 1 -- Physics & Detector
Authors:
M. Achasov,
X. C. Ai,
R. Aliberti,
L. P. An,
Q. An,
X. Z. Bai,
Y. Bai,
O. Bakina,
A. Barnyakov,
V. Blinov,
V. Bobrovnikov,
D. Bodrov,
A. Bogomyagkov,
A. Bondar,
I. Boyko,
Z. H. Bu,
F. M. Cai,
H. Cai,
J. J. Cao,
Q. H. Cao,
Z. Cao,
Q. Chang,
K. T. Chao,
D. Y. Chen,
H. Chen
, et al. (413 additional authors not shown)
Abstract:
The Super $τ$-Charm facility (STCF) is an electron-positron collider proposed by the Chinese particle physics community. It is designed to operate in a center-of-mass energy range from 2 to 7 GeV with a peak luminosity of $0.5\times 10^{35}{\rm cm}^{-2}{\rm s}^{-1}$ or higher. The STCF will produce a data sample about a factor of 100 larger than that of the present $τ$-Charm factory, the BEPCII, providing a unique platform for exploring matter-antimatter asymmetry (charge-parity violation), for in-depth studies of the internal structure of hadrons and the nature of non-perturbative strong interactions, and for searches for exotic hadrons and physics beyond the Standard Model. The STCF project in China is under development with an extensive R&D program. This document presents the physics opportunities at the STCF, describes conceptual designs of the STCF detector system, and discusses future plans for detector R&D and physics case studies.
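To put the quoted peak luminosity in perspective, the arithmetic below converts it into an integrated luminosity and an expected event count. It assumes, for illustration only, that the collider runs at peak luminosity for $10^7$ s per year (a common rule-of-thumb "physics year") and uses a hypothetical 1 nb cross section; real operating efficiency would lower these numbers. The unit conversions are 1 ab$^{-1}$ = $10^{42}$ cm$^{-2}$ and 1 nb = $10^{-33}$ cm$^2$.

```python
CM2_INV_PER_AB_INV = 1e42   # 1 ab^-1 = 1e42 cm^-2 (1 b = 1e-24 cm^2, 1 ab = 1e-42 cm^2)
CM2_PER_NB = 1e-33          # 1 nb = 1e-33 cm^2


def integrated_luminosity_ab_inv(peak_lumi_cm2_s: float, seconds: float) -> float:
    """Integrated luminosity in ab^-1, assuming constant (peak) luminosity over `seconds`."""
    return peak_lumi_cm2_s * seconds / CM2_INV_PER_AB_INV


def expected_events(cross_section_nb: float, lumi_ab_inv: float) -> float:
    """N = sigma * L, with sigma converted to cm^2 and L converted to cm^-2."""
    return cross_section_nb * CM2_PER_NB * lumi_ab_inv * CM2_INV_PER_AB_INV


lumi = integrated_luminosity_ab_inv(0.5e35, 1e7)   # assumed 1e7 s of running per year
print(f"{lumi:.2f} ab^-1 per year")
print(f"{expected_events(1.0, lumi):.1e} events for a hypothetical 1 nb process")
```

Under these assumptions the script prints 0.50 ab^-1 per year and 5.0e+08 events, i.e. hundreds of millions of events per nanobarn of cross section per year.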
Submitted 5 October, 2023; v1 submitted 28 March, 2023;
originally announced March 2023.
-
Search By Image: Deeply Exploring Beneficial Features for Beauty Product Retrieval
Authors:
Mingqiang Wei,
Qian Sun,
Haoran Xie,
Dong Liang,
Fu Lee Wang
Abstract:
Searching by image is popular yet still challenging due to extensive interference arising from i) data variations (e.g., background, pose, visual angle, brightness) in real-world captured images and ii) similar images in the query dataset. This paper studies the practically meaningful problem of beauty product retrieval (BPR) with neural networks. We broadly extract different types of image features and raise the question of whether these features are beneficial for i) suppressing data variations in real-world captured images and ii) distinguishing an image from others that look very similar but show intrinsically different beauty products in the dataset, thereby enhancing BPR capability. To answer it, we present VM-Net, a novel variable-attention neural network that learns to combine multiple features of beauty product images. Because few training datasets for BPR have been publicly released, we establish a new dataset with more than one million images classified into more than 20K categories to improve both the generalization and anti-interference abilities of VM-Net and other methods. We evaluate VM-Net and its competitors on the benchmark dataset Perfect-500K, where VM-Net shows clear improvements over the competitors in terms of MAP@7. The source code and dataset will be released upon publication.
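MAP@7 is mean average precision over the top 7 retrieved items. The sketch below follows one common definition, in which average precision at $k$ sums the precision at each rank holding a relevant item and divides by $\min(|\text{relevant}|, k)$; the exact normalization used in the Perfect-500K evaluation is not stated here and may differ. Product identifiers are hypothetical.

```python
from typing import Hashable, Sequence, Set


def average_precision_at_k(ranked: Sequence[Hashable], relevant: Set[Hashable], k: int = 7) -> float:
    """AP@k: sum of precision@i over ranks i <= k holding a new relevant item, normalized."""
    if not relevant:
        return 0.0
    hits, score, seen = 0, 0.0, set()
    for i, item in enumerate(ranked[:k], start=1):
        if item in relevant and item not in seen:
            hits += 1
            score += hits / i        # precision at rank i
            seen.add(item)
    return score / min(len(relevant), k)


def map_at_k(rankings: Sequence[Sequence[Hashable]], relevants: Sequence[Set[Hashable]], k: int = 7) -> float:
    """MAP@k: average of AP@k over all queries."""
    return sum(average_precision_at_k(r, s, k) for r, s in zip(rankings, relevants)) / len(rankings)


# Query 1 retrieves its two relevant products at ranks 1 and 3; query 2 retrieves its only match at rank 2
print(map_at_k([["p1", "x", "p2"], ["y", "q1"]], [{"p1", "p2"}, {"q1"}]))
```

Here query 1 scores $(1 + 2/3)/2 = 5/6$ and query 2 scores $1/2$, so MAP@7 is about 0.667; a relevant item ranked below position 7 contributes nothing.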
Submitted 24 March, 2023;
originally announced March 2023.
-
CMG-Net: An End-to-End Contact-Based Multi-Finger Dexterous Grasping Network
Authors:
Mingze Wei,
Yaomin Huang,
Zhiyuan Xu,
Ning Liu,
Zhengping Che,
Xinyu Zhang,
Chaomin Shen,
Feifei Feng,
Chun Shan,
Jian Tang
Abstract:
In this paper, we propose a novel representation for grasping based on the contacts between multi-finger robotic hands and the objects to be manipulated. This representation significantly reduces the prediction dimensions and accelerates the learning process. We present an effective end-to-end network, CMG-Net, for grasping unknown objects in cluttered environments by efficiently predicting multi-finger grasp poses and hand configurations from a single-shot point cloud. Moreover, we create a synthetic grasp dataset consisting of five thousand cluttered scenes, 80 object categories, and 20 million annotations. We perform a comprehensive empirical study and demonstrate the effectiveness of our grasping representation and CMG-Net. Our method significantly outperforms the state of the art for three-finger robotic hands. We also demonstrate that the model trained on synthetic data performs very well on real robots.
Submitted 23 March, 2023;
originally announced March 2023.
-
PointGame: Geometrically and Adaptively Masked Auto-Encoder on Point Clouds
Authors:
Yun Liu,
Xuefeng Yan,
Zhilei Chen,
Zhiqi Li,
Zeyong Wei,
Mingqiang Wei
Abstract:
Self-supervised learning is attracting considerable attention in point cloud understanding. However, extracting discriminative and transferable features remains challenging due to the irregular and sparse nature of point clouds. We propose a geometrically and adaptively masked auto-encoder for self-supervised learning on point clouds, termed PointGame. PointGame contains two core components: GATE and EAT. GATE stands for the geometrical and adaptive token embedding module; it not only absorbs the conventional wisdom of geometric descriptors, which capture surface shape effectively, but also exploits adaptive saliency to focus on the salient parts of a point cloud. EAT stands for the external attention-based Transformer encoder with linear computational complexity, which increases the efficiency of the whole pipeline. Unlike cutting-edge unsupervised learning models, PointGame leverages geometric descriptors to perceive surface shapes and adaptively mines discriminative features from training data. PointGame shows clear advantages over its competitors on various downstream tasks under both global and local fine-tuning strategies. The code and pre-trained models will be publicly available.
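External attention replaces the token-to-token affinity matrix of self-attention with affinities to a small set of $S$ learnable memory vectors, so the cost grows as $O(NS)$ rather than $O(N^2)$ in the number of tokens $N$. Below is a minimal pure-Python sketch of the generic operation with its double normalization (softmax over tokens, then L1 over memory slots), following the external attention of Guo et al. (2021); it is not PointGame's EAT encoder, and the memory matrices `m_k` and `m_v` are hypothetical stand-ins for learned parameters.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def softmax(xs: List[float]) -> List[float]:
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def external_attention(features: Matrix, m_k: Matrix, m_v: Matrix, eps: float = 1e-9) -> Matrix:
    """features: N x d tokens; m_k, m_v: S x d memories. Returns N x d at cost O(N * S * d)."""
    logits = matmul(features, transpose(m_k))                      # N x S affinities to memory slots
    attn = transpose([softmax(col) for col in transpose(logits)])  # softmax over the N tokens, per slot
    attn = [[a / (sum(row) + eps) for a in row] for row in attn]   # L1-normalize over the S slots, per token
    return matmul(attn, m_v)                                       # each output row mixes the value memories


tokens = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]   # N = 3 toy point tokens, d = 2
m_k = [[1.0, 0.0], [0.0, 1.0]]                  # S = 2 hypothetical key memories
m_v = [[2.0, 0.0], [0.0, 2.0]]                  # S = 2 hypothetical value memories
print(external_attention(tokens, m_k, m_v))
```

Because the memories are shared across all inputs, nothing of size $N \times N$ is ever built, which is where the linear complexity in the abstract comes from; each output row is a convex combination of the rows of `m_v`.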
Submitted 23 March, 2023;
originally announced March 2023.
-
Super-Resolution Neural Operator
Authors:
Min Wei,
Xuesong Zhang
Abstract:
We propose Super-resolution Neural Operator (SRNO), a deep operator learning framework that can resolve high-resolution (HR) images at arbitrary scales from their low-resolution (LR) counterparts. Treating LR-HR image pairs as continuous functions approximated with different grid sizes, SRNO learns the mapping between the corresponding function spaces. From the perspective of approximation theory, SRNO first embeds the LR input into a higher-dimensional latent representation space, trying to capture sufficient basis functions, then iteratively approximates the implicit image function with a kernel integral mechanism, and finally applies a dimensionality reduction step to generate the RGB representation at the target coordinates. The key characteristics distinguishing SRNO from prior continuous SR works are: 1) the kernel integral in each layer is efficiently implemented via Galerkin-type attention, which is non-local in the spatial domain and therefore benefits the grid-free continuum; and 2) the multilayer attention architecture allows dynamic updates of the latent basis, which is crucial for SR problems to "hallucinate" high-frequency information from the LR image. Experiments show that SRNO outperforms existing continuous SR methods in both accuracy and running time. Our code is at https://github.com/2y7c3/Super-Resolution-Neural-Operator
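Galerkin-type attention (introduced by Cao, 2021, for operator learning) drops the softmax and layer-normalizes the keys and values, so $Q(\tilde K^\top \tilde V)/n$ can be evaluated by forming the small $d \times d$ matrix $\tilde K^\top \tilde V$ first, at a cost linear in the number of sampled points $n$. The pure-Python sketch below shows this generic form and checks that it equals the quadratic-order evaluation $(Q\tilde K^\top)\tilde V/n$; SRNO's actual layers (projections, normalization details, multi-head structure) are not reproduced, and all inputs are toy values.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def layer_norm(row: List[float], eps: float = 1e-5) -> List[float]:
    """Normalize one feature vector to zero mean and unit variance (no learnable affine)."""
    mean = sum(row) / len(row)
    var = sum((x - mean) ** 2 for x in row) / len(row)
    return [(x - mean) / math.sqrt(var + eps) for x in row]


def galerkin_attention(q: Matrix, k: Matrix, v: Matrix) -> Matrix:
    """Softmax-free attention Q (LN(K)^T LN(V)) / n; cost O(n d^2) instead of O(n^2 d)."""
    n = len(k)
    k_ln = [layer_norm(row) for row in k]
    v_ln = [layer_norm(row) for row in v]
    kv = matmul(transpose(k_ln), v_ln)        # d x d summary whose size does not depend on n
    kv = [[x / n for x in row] for row in kv]
    return matmul(q, kv)                       # n x d output


def quadratic_reference(q: Matrix, k: Matrix, v: Matrix) -> Matrix:
    """Same quantity evaluated as (Q LN(K)^T) LN(V) / n, which builds an n x n matrix."""
    n = len(k)
    scores = matmul(q, transpose([layer_norm(r) for r in k]))
    out = matmul(scores, [layer_norm(r) for r in v])
    return [[x / n for x in row] for row in out]


q = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6], [0.7, 0.8, 0.9], [1.0, 0.0, -1.0]]
k = [[1.0, 2.0, 0.0], [0.0, 1.0, 3.0], [2.0, 0.0, 1.0], [1.0, 1.0, 2.0]]
v = [[0.5, -0.5, 1.0], [1.5, 0.0, -1.0], [0.0, 2.0, 1.0], [1.0, 1.0, 0.0]]
print(galerkin_attention(q, k, v))
```

The two functions differ only in the order of matrix products, so they agree up to floating-point rounding; the linear-cost ordering is what makes attention over many query coordinates affordable.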
Submitted 5 March, 2023;
originally announced March 2023.
-
Spin alignment of vector mesons from quark dynamics in a rotating medium
Authors:
Minghua Wei,
Mei Huang
Abstract:
Vorticities in heavy-ion collisions (HICs) are expected to induce spin alignment and polarization of quarks and mesons. In this work, we analyze the rotation-induced spin alignment of the vector mesons $φ$ and $ρ$ arising from quark dynamics in the framework of the Nambu-Jona-Lasinio (NJL) model. The rotating angular velocity induces a mass splitting among the spin components of the vector mesons $φ,ρ$: $M_{φ,ρ}(Ω)\simeq M_{φ,ρ}(Ω=0)-s_{z}Ω$. This behavior contributes to the spin alignment of vector mesons $φ,ρ$ in an equilibrium medium and naturally explains the negative deviation of $ρ_{00}-1/3$ for vector mesons. Incidentally, the positive deviation of $ρ_{00}-1/3$ under a magnetic field can also be easily understood from quark dynamics.
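The link between the mass splitting $M(Ω) \simeq M(0) - s_zΩ$ and $ρ_{00} < 1/3$ can be seen with a back-of-the-envelope estimate. Assuming, for illustration only, that the three spin states of a vector meson at rest are populated with Boltzmann weights $e^{-M(s_z)/T}$ (natural units, no momentum dependence, and none of the NJL-model dynamics of the paper), one gets $ρ_{00} = 1/(1 + 2\cosh(Ω/T)) \le 1/3$, with equality only at $Ω = 0$. The sketch below evaluates this; the temperature and angular velocities are hypothetical inputs.

```python
import math


def rho00_boltzmann(omega_gev: float, temperature_gev: float, mass0_gev: float) -> float:
    """rho_00 from Boltzmann populations of s_z = -1, 0, +1 with M(s_z) = M0 - s_z * Omega (toy model)."""
    weights = {s_z: math.exp(-(mass0_gev - s_z * omega_gev) / temperature_gev) for s_z in (-1, 0, 1)}
    return weights[0] / sum(weights.values())


def rho00_closed_form(omega_gev: float, temperature_gev: float) -> float:
    """Same toy model in closed form, 1 / (1 + 2 cosh(Omega / T)); independent of M0."""
    return 1.0 / (1.0 + 2.0 * math.cosh(omega_gev / temperature_gev))


# Hypothetical inputs: phi-meson vacuum mass ~1.019 GeV, T = 0.15 GeV, a few rotation rates
for omega in (0.0, 0.01, 0.05):
    deviation = rho00_boltzmann(omega, 0.15, 1.019) - 1.0 / 3.0
    print(f"Omega = {omega:.2f} GeV -> rho00 - 1/3 = {deviation:+.2e}")
```

Since $\cosh(x) \ge 1$, the deviation is never positive in this toy picture, for either sign of $Ω$, which mirrors the negative deviation of $ρ_{00}-1/3$ described in the abstract.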
Submitted 3 March, 2023;
originally announced March 2023.
-
BiomedCLIP: a multimodal biomedical foundation model pretrained from fifteen million scientific image-text pairs
Authors:
Sheng Zhang,
Yanbo Xu,
Naoto Usuyama,
Hanwen Xu,
Jaspreet Bagga,
Robert Tinn,
Sam Preston,
Rajesh Rao,
Mu Wei,
Naveen Valluri,
Cliff Wong,
Andrea Tupini,
Yu Wang,
Matt Mazzola,
Swadheen Shukla,
Lars Liden,
Jianfeng Gao,
Angela Crabtree,
Brian Piening,
Carlo Bifulco,
Matthew P. Lungren,
Tristan Naumann,
Sheng Wang,
Hoifung Poon
Abstract:
Biomedical data is inherently multimodal, comprising physical measurements and natural language narratives. A generalist biomedical AI model needs to simultaneously process different modalities of data, including text and images. Therefore, training an effective generalist biomedical model requires high-quality multimodal data, such as parallel image-text pairs. Here, we present PMC-15M, a novel dataset that is two orders of magnitude larger than existing biomedical multimodal datasets such as MIMIC-CXR, and spans a diverse range of biomedical image types. PMC-15M contains 15 million biomedical image-text pairs collected from 4.4 million scientific articles. Based on PMC-15M, we have pretrained BiomedCLIP, a multimodal foundation model, with domain-specific adaptations tailored to biomedical vision-language processing. We conducted extensive experiments and ablation studies on standard biomedical imaging tasks, from retrieval to classification to visual question-answering (VQA). BiomedCLIP achieved new state-of-the-art results on a wide range of standard datasets, substantially outperforming prior approaches. Intriguingly, through large-scale pretraining on diverse biomedical image types, BiomedCLIP even outperforms state-of-the-art radiology-specific models such as BioViL on radiology-specific tasks such as RSNA pneumonia detection. In summary, BiomedCLIP is a fully open-access foundation model that achieves state-of-the-art performance on various biomedical tasks, paving the way for transformative multimodal biomedical discovery and applications. We release our models at https://aka.ms/biomedclip to facilitate future research in multimodal biomedical AI.
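Models in the CLIP family are typically pretrained with a symmetric contrastive (InfoNCE) objective over a batch of image-text pairs: matched pairs sit on the diagonal of a similarity matrix and are scored against all mismatched pairs in both the image-to-text and text-to-image directions. The pure-Python sketch below shows that standard objective on toy embeddings; it assumes BiomedCLIP's pretraining follows this general recipe and does not reproduce its domain-specific adaptations. The temperature value is an illustrative assumption.

```python
import math
from typing import List

Matrix = List[List[float]]


def l2_normalize(v: List[float]) -> List[float]:
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cross_entropy_on_diagonal(logits: Matrix) -> float:
    """Mean over rows of -log softmax(row)[i]: the matched pair is the target class."""
    loss = 0.0
    for i, row in enumerate(logits):
        m = max(row)
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in row))
        loss += log_sum_exp - row[i]
    return loss / len(logits)


def clip_contrastive_loss(image_emb: Matrix, text_emb: Matrix, temperature: float = 0.07) -> float:
    """Symmetric InfoNCE loss; row i of each input is assumed to be the i-th matched pair."""
    imgs = [l2_normalize(v) for v in image_emb]
    txts = [l2_normalize(v) for v in text_emb]
    logits = [[sum(a * b for a, b in zip(im, tx)) / temperature for tx in txts] for im in imgs]
    logits_t = [list(col) for col in zip(*logits)]   # text-to-image direction
    return 0.5 * (cross_entropy_on_diagonal(logits) + cross_entropy_on_diagonal(logits_t))


images = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
aligned_texts = [[0.9, 0.1, 0.0], [0.1, 0.9, 0.0], [0.0, 0.1, 0.9]]
shuffled_texts = aligned_texts[1:] + aligned_texts[:1]
print(clip_contrastive_loss(images, aligned_texts), clip_contrastive_loss(images, shuffled_texts))
```

The loss is close to zero when each image is paired with its own caption and large when the captions are shuffled, which is the signal that pulls matched image-text embeddings together during pretraining.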
Submitted 8 January, 2025; v1 submitted 1 March, 2023;
originally announced March 2023.
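CLIP-style models like BiomedCLIP are typically pretrained with a symmetric contrastive (InfoNCE) loss over a batch of matched image-text pairs. The abstract does not spell out BiomedCLIP's objective or its domain-specific adaptations. The NumPy sketch below shows only that generic objective and assumes precomputed embeddings. The function name and the temperature of 0.07 are illustrative assumptions.

```python
import numpy as np

def clip_contrastive_loss(img_emb, txt_emb, temperature=0.07):
    """Symmetric InfoNCE loss over a batch of matched image-text pairs.

    Row i of img_emb and row i of txt_emb are assumed to be a positive pair;
    every other pairing in the batch acts as a negative.
    """
    # L2-normalise so that dot products are cosine similarities.
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    logits = img @ txt.T / temperature  # (batch, batch) similarity matrix

    def cross_entropy_diag(lg):
        # Cross-entropy where the diagonal (the matching pair) is the target.
        lg = lg - lg.max(axis=1, keepdims=True)  # numerical stability
        log_probs = lg - np.log(np.exp(lg).sum(axis=1, keepdims=True))
        return -np.mean(np.diag(log_probs))

    # Average the image-to-text and text-to-image directions.
    return 0.5 * (cross_entropy_diag(logits) + cross_entropy_diag(logits.T))

rng = np.random.default_rng(0)
img = rng.normal(size=(4, 8))
aligned = clip_contrastive_loss(img, img + 0.01 * rng.normal(size=(4, 8)))
random_pairs = clip_contrastive_loss(img, rng.normal(size=(4, 8)))
print(aligned, random_pairs)
```

Well-aligned pairs give a much lower loss than random pairings. In real pretraining, both embeddings come from trainable image and text encoders.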
-
ProxyFormer: Proxy Alignment Assisted Point Cloud Completion with Missing Part Sensitive Transformer
Authors:
Shanshan Li,
Pan Gao,
Xiaoyang Tan,
Mingqiang Wei
Abstract:
Problems such as equipment defects or limited viewpoints can leave captured point clouds incomplete. Recovering complete point clouds from partial ones therefore plays a vital role in many practical tasks, and one of the keys lies in predicting the missing part. In this paper, we propose a novel point cloud completion approach, ProxyFormer, that divides point clouds into existing (input) and missing (to be predicted) parts, each of which communicates information through its proxies. Specifically, we fuse information into point proxies via a feature and position extractor, and generate features for missing point proxies from the features of existing point proxies. Then, to better perceive the positions of missing points, we design a missing part sensitive transformer, which converts a random normal distribution into reasonable position information and uses proxy alignment to refine the missing proxies. This makes the predicted point proxies more sensitive to the features and positions of the missing part, and thus more suitable for subsequent coarse-to-fine processes. Experimental results show that our method outperforms state-of-the-art completion networks on several benchmark datasets and has the fastest inference speed. Code is available at https://github.com/I2-Multimedia-Lab/ProxyFormer.
Submitted 28 February, 2023;
originally announced February 2023.
-
Reconfiguration, Interrupted Aging and Enhanced Dynamics of a Colloidal Gel using Photo-Switchable Active Doping
Authors:
Mengshi Wei,
Matan Ben Zion,
Olivier Dauchot
Abstract:
We study light-activated quasi-2d gels made of a colloidal network doped with Janus particles. Following gel formation, we monitor both the structure and the internal dynamics of the gel before, during, and after the light activation period. The mobility of the passive particles exhibits a characteristic scale-dependent response. Immediately after light activation, the gel displays large-scale reorganization, followed by progressive, short-scale displacements throughout the activation period. Despite subtle structural changes (including pore opening, and widening and shortening of strands), the colloidal network remains connected and the gel maintains its structural integrity. Once activity is switched off, the gel retains a memory of the structure inherited from the active phase. Remarkably, the motility remains larger than that of the gel before the active period. The system has turned into a genuinely different gel, with frozen dynamics but more room for thermal fluctuations. These conclusions remain valid long after the activity period.
Submitted 16 February, 2023;
originally announced February 2023.
-
"There's so much responsibility on users right now:" Expert Advice for Staying Safer From Hate and Harassment
Authors:
Miranda Wei,
Sunny Consolvo,
Patrick Gage Kelley,
Tadayoshi Kohno,
Franziska Roesner,
Kurt Thomas
Abstract:
Online hate and harassment poses a threat to the digital safety of people globally. In light of this risk, there is a need to equip as many people as possible with advice to stay safer online. We interviewed 24 experts to understand which threats and advice internet users should prioritize to prevent or mitigate harm. As part of this, we asked experts to evaluate 45 pieces of existing hate-and-harassment-specific digital-safety advice to understand why they felt the advice was viable or not. We find that experts frequently had competing perspectives on which threats and advice they would prioritize. We synthesize sources of disagreement, while also highlighting the primary threats and advice on which experts concurred. Our results inform immediate efforts to protect users from online hate and harassment, as well as more expansive socio-technical efforts to establish enduring safety.
Submitted 29 August, 2023; v1 submitted 15 February, 2023;
originally announced February 2023.
-
Learning from Stochastic Labels
Authors:
Meng Wei,
Zhongnian Li,
Yong Zhou,
Qiaoyu Guo,
Xinzheng Xu
Abstract:
Annotating multi-class instances is a crucial task in machine learning. Unfortunately, identifying the correct class label from a long sequence of candidate labels is time-consuming and laborious. To alleviate this problem, we design a novel labeling mechanism called the stochastic label. In this setting, a stochastic label covers two cases: 1) identifying the correct class label from a small number of randomly given labels; 2) annotating the instance with a None label when the given labels do not contain the correct class label. In this paper, we propose a novel approach suited to learning from these stochastic labels. We obtain an unbiased estimator that uses the weaker supervision provided by stochastic labels to train a multi-class classifier, and we justify it theoretically by deriving the estimation error bound of the proposed method. Finally, we conduct extensive experiments on widely used benchmark datasets to validate the superiority of our method over existing state-of-the-art methods.
Submitted 1 February, 2023;
originally announced February 2023.
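The two annotation cases of a stochastic label can be simulated as follows. The sketch assumes that the shown labels are drawn uniformly without replacement, because the abstract says only that they are "randomly given". The helper name is hypothetical, and the paper's unbiased estimator is not reproduced.

```python
import random
from typing import List, Optional, Tuple

def stochastic_label(true_label: int, num_classes: int, k: int,
                     rng: random.Random) -> Tuple[List[int], Optional[int]]:
    """Simulate one stochastic-label annotation (hypothetical helper).

    The annotator sees k labels drawn uniformly without replacement from
    all num_classes labels (an assumption). If the true label is among
    them, the annotator picks it (case 1); otherwise the instance is
    labelled None (case 2).
    """
    shown = rng.sample(range(num_classes), k)
    return shown, (true_label if true_label in shown else None)

rng = random.Random(0)
annotations = [stochastic_label(3, num_classes=10, k=2, rng=rng)
               for _ in range(10_000)]
hit_rate = sum(label is not None for _, label in annotations) / len(annotations)
print(f"fraction annotated with the correct label: {hit_rate:.3f}")
```

Under uniform sampling, the correct label is shown with probability k / num_classes (0.2 here), so most instances receive None. This illustrates why the supervision is weaker than ordinary labels.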
-
PointSmile: Point Self-supervised Learning via Curriculum Mutual Information
Authors:
Xin Li,
Mingqiang Wei,
Songcan Chen
Abstract:
Self-supervised learning is attracting wide attention in point cloud processing. However, obtaining discriminative and transferable point cloud features for efficient training on downstream tasks is still not well solved, owing to the natural sparsity and irregularity of point clouds. We propose PointSmile, a reconstruction-free self-supervised learning paradigm that maximizes curriculum mutual information (CMI) across replicas of point cloud objects. From the perspective of how-and-what-to-learn, PointSmile is designed to imitate human curriculum learning, i.e., starting with an easy curriculum and gradually increasing its difficulty. To solve "how-to-learn", we introduce curriculum data augmentation (CDA) of point clouds. CDA encourages PointSmile to learn from easy samples to hard ones, so that the latent space can be dynamically shaped to create better embeddings. To solve "what-to-learn", we propose to maximize both feature- and class-wise CMI to better extract discriminative features of point clouds. Unlike most existing methods, PointSmile requires neither a pretext task nor cross-modal data to yield rich latent representations. We demonstrate the effectiveness and robustness of PointSmile on downstream tasks including object classification and segmentation. Extensive results show that PointSmile outperforms existing self-supervised methods and compares favorably with popular fully-supervised methods on various standard architectures.
Submitted 30 January, 2023;
originally announced January 2023.
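The abstract describes CDA only at a high level. Below is a minimal sketch of an easy-to-hard augmentation schedule for generating two point cloud replicas. It assumes a linear ramp in rotation and jitter strength. The schedule, the choice of augmentations, and all names are hypothetical; this is not PointSmile's actual CDA.

```python
import numpy as np

def curriculum_strength(epoch: int, total_epochs: int,
                        start: float = 0.1, end: float = 1.0) -> float:
    """Linearly ramp augmentation difficulty from `start` to `end` over training."""
    t = min(max(epoch / max(total_epochs - 1, 1), 0.0), 1.0)
    return start + t * (end - start)

def augment(points: np.ndarray, strength: float,
            rng: np.random.Generator) -> np.ndarray:
    """Random z-axis rotation plus Gaussian jitter, both scaled by `strength`."""
    angle = rng.uniform(-np.pi, np.pi) * strength
    c, s = np.cos(angle), np.sin(angle)
    rot = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
    jitter = rng.normal(scale=0.02 * strength, size=points.shape)
    return points @ rot.T + jitter

rng = np.random.default_rng(0)
cloud = rng.normal(size=(1024, 3))
for epoch in (0, 50, 99):
    s = curriculum_strength(epoch, total_epochs=100)
    view_a, view_b = augment(cloud, s, rng), augment(cloud, s, rng)  # two replicas
    print(epoch, round(s, 3), float(np.abs(view_a - view_b).mean()))
```

Early epochs yield nearly identical replicas (easy). Later epochs yield more dissimilar ones (hard) for the mutual-information objective to align.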
-
Quantum Fluctuation of the Quantum Geometric Tensor and its Manifestation as Intrinsic Hall Signatures in Time-Reversal Invariant Systems
Authors:
Miaomiao Wei,
Luyang Wang,
Bin Wang,
Longjun Xiang,
Fuming Xu,
Baigeng Wang,
Jian Wang
Abstract:
In time-reversal invariant systems, all charge Hall effects predicted so far are extrinsic effects, owing to their dependence on the relaxation time. We explore intrinsic Hall signatures by studying the quantum noise spectrum of the Hall current in time-reversal invariant systems, and discover intrinsic thermal Hall noises in both the linear and nonlinear regimes. As band-geometric characteristics, the quantum geometric tensor and Berry curvature play critical roles in various Hall effects, and so do their quantum fluctuations. We find that the thermal Hall noise at linear order in the electric field is purely intrinsic, whereas the second-order thermal Hall noise has both intrinsic and extrinsic contributions. In particular, the intrinsic part of the second-order thermal Hall noise is a manifestation of the quantum fluctuation of the quantum geometric tensor, which exists generically as long as the Berry curvature is nonzero. These intrinsic thermal Hall noises provide a direct means of measuring band-geometric information, including Berry-curvature-related quantities and the quantum fluctuation of the quantum geometric tensor.
Submitted 27 January, 2023;
originally announced January 2023.
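For reference, the standard definitions linking the quantities named above are given below. They apply to a Bloch band $|u_n(\mathbf{k})\rangle$ with $\partial_\mu \equiv \partial/\partial k_\mu$. The paper's own notation and sign conventions may differ.

```latex
Q^{(n)}_{\mu\nu}(\mathbf{k}) = \langle \partial_\mu u_n | \bigl(1 - |u_n\rangle\langle u_n|\bigr) | \partial_\nu u_n \rangle,
\qquad
g^{(n)}_{\mu\nu} = \operatorname{Re} Q^{(n)}_{\mu\nu},
\qquad
\Omega^{(n)}_{\mu\nu} = -2\,\operatorname{Im} Q^{(n)}_{\mu\nu}
```

The real part is the quantum metric. The imaginary part gives the Berry curvature $\Omega_{\mu\nu} = \partial_\mu \mathcal{A}_\nu - \partial_\nu \mathcal{A}_\mu$, with Berry connection $\mathcal{A}_\mu = i\langle u_n | \partial_\mu u_n\rangle$.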
-
Rethinking Real-world Image Deraining via An Unpaired Degradation-Conditioned Diffusion Model
Authors:
Yiyang Shen,
Mingqiang Wei,
Yongzhen Wang,
Xueyang Fu,
Jing Qin
Abstract:
Recent diffusion models have exhibited great potential in generative modeling tasks. Part of their success can be attributed to their ability to train stably on huge sets of paired synthetic data. However, adapting these models to real-world image deraining remains difficult for two reasons. First, no large-scale paired real-world clean/rainy dataset is available, whereas regular conditional diffusion models rely heavily on paired data for training. Second, real-world rain usually reflects real-world scenarios with a variety of unknown rain degradation types, which poses a significant challenge for the generative modeling process. To meet these challenges, we propose RainDiff, the first real-world image deraining paradigm based on diffusion models, setting a new standard for real-world image deraining. We address the first challenge by introducing a stable and non-adversarial unpaired cycle-consistent architecture that can be trained end-to-end with only unpaired data for supervision. We address the second by proposing a degradation-conditioned diffusion model that refines the desired output via a diffusive generative process conditioned on learned priors of multiple rain degradations. Extensive experiments confirm the superiority of RainDiff over existing unpaired/semi-supervised methods and show its competitive advantages over several fully-supervised ones.
Submitted 1 May, 2024; v1 submitted 23 January, 2023;
originally announced January 2023.
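RainDiff's first component is an unpaired cycle-consistent architecture. The generic cycle-consistency idea, as popularized by CycleGAN, is sketched below. The toy mappings are illustrative stand-ins for learned networks. RainDiff's actual losses and diffusion components are not shown.

```python
import numpy as np

def cycle_consistency_loss(rainy, clean, derain, rerain):
    """L1 cycle-consistency loss on unpaired batches.

    derain maps rainy -> clean and rerain maps clean -> rainy. The rainy
    and clean batches need not show the same scenes: each image only has
    to be reconstructed after a round trip through both mappings.
    """
    rainy_cycle = rerain(derain(rainy))  # rainy -> clean -> rainy
    clean_cycle = derain(rerain(clean))  # clean -> rainy -> clean
    return (np.abs(rainy_cycle - rainy).mean()
            + np.abs(clean_cycle - clean).mean())

rng = np.random.default_rng(0)
rainy = rng.uniform(size=(2, 8, 8))
clean = rng.uniform(size=(2, 8, 8))  # unrelated scenes: no pairing assumed

def add_rain(x):
    return x + 0.2  # toy stand-in for a learned clean -> rainy mapping

def remove_rain(x):
    return x - 0.2  # exact inverse of add_rain

def bad_remove_rain(x):
    return 0.5 * x  # not an inverse, so the cycles do not close

consistent = cycle_consistency_loss(rainy, clean, remove_rain, add_rain)
inconsistent = cycle_consistency_loss(rainy, clean, bad_remove_rain, add_rain)
print(consistent, inconsistent)
```

Only mutually consistent mappings drive the loss toward zero. This is what lets such a model train without paired clean/rainy images.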
-
Noise-induced stabilization of dynamical states with broken time-reversal symmetry
Authors:
Trevyn F. Q. Larson,
Lingfei Zhao,
Ethan G. Arnault,
Ming-Tso Wei,
Andrew Seredinski,
Hengming Li,
Kenji Watanabe,
Takashi Taniguchi,
François Amet,
Gleb Finkelstein
Abstract:
Under a high-frequency drive, Josephson junctions demonstrate "Shapiro steps" of quantized voltage. These are dynamically stabilized states in which the phase across the junction locks to the external drive. We explore the stochastic switching between two symmetric steps at $\frac{\hbar\omega}{2e}$ and $-\frac{\hbar\omega}{2e}$. Surprisingly, the switching rate exhibits a pronounced non-monotonicity as a function of temperature, violating the general expectation that transitions should become faster with increasing temperature. We explain this behavior by noting that the system retains memory of the dynamic state from which it is switching, thereby breaking the conventional simplifying assumption of separated time scales.
Submitted 13 August, 2024; v1 submitted 28 December, 2022;
originally announced December 2022.
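For scale, the two steps studied sit at $\pm\hbar\omega/2e = \pm hf/2e$. A quick numerical check follows. It assumes an illustrative 10 GHz drive, since the abstract does not state the drive frequency.

```python
import math

H = 6.62607015e-34   # Planck constant in J*s (exact in the 2019 SI)
E = 1.602176634e-19  # elementary charge in C (exact in the 2019 SI)

def shapiro_step_voltage(n: int, drive_freq_hz: float) -> float:
    """Voltage of the n-th Shapiro step: V_n = n * hbar * omega / (2e)."""
    hbar = H / (2 * math.pi)
    omega = 2 * math.pi * drive_freq_hz
    return n * hbar * omega / (2 * E)

for n in (-1, 1):
    print(n, f"{shapiro_step_voltage(n, 10e9) * 1e6:.3f} uV")
# -1 -20.678 uV
# 1 20.678 uV
```

Equivalently, each step is the drive frequency times the flux quantum $h/2e \approx 2.068 \times 10^{-15}$ Wb.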
-
Ultra-Low-Frequency Radio Astronomy Observations from a Selenocentric Orbit: first results of the Longjiang-2 experiment
Authors:
Jingye Yan,
Ji Wu,
Leonid I. Gurvits,
Lin Wu,
Li Deng,
Fei Zhao,
Li Zhou,
Ailan Lan,
Wenjie Fan,
Min Yi,
Yang Yang,
Zhen Yang,
Mingchuan Wei,
Jinsheng Guo,
Shi Qiu,
Fan Wu,
Chaoran Hu,
Xuelei Chen,
Hanna Rothkaehl,
Marek Morawski
Abstract:
This paper introduces the first results of observations with the Ultra-Long-Wavelength (ULW) -- Low Frequency Interferometer and Spectrometer (LFIS) on board the selenocentric satellite Longjiang-2. We present a brief description of the satellite and focus on the LFIS payload. The in-orbit commissioning confirmed a reliable operational status of the instrumentation. We also present results of a transition observation, which offers unique measurements of several novel aspects. We estimate the radio frequency interference (RFI) suppression required for such radio astronomy instrumentation at lunar distances from Earth to be of the order of 80 dB. We analyse a method of separating Earth- and satellite-originated RFI. We find that the RFI level at frequencies below a few MHz is lower than the receiver noise floor.
Submitted 19 December, 2022;
originally announced December 2022.
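To put the quoted figure in linear terms: a suppression of $D$ dB reduces interfering power by a factor of $10^{D/10}$ and field amplitude by $10^{D/20}$. So 80 dB corresponds to a power ratio of $10^8$. A short check (helper names are illustrative):

```python
def db_to_power_ratio(db: float) -> float:
    """Linear power ratio corresponding to a value in decibels."""
    return 10 ** (db / 10)

def db_to_amplitude_ratio(db: float) -> float:
    """Linear field (voltage) amplitude ratio for the same decibel value."""
    return 10 ** (db / 20)

print(f"{db_to_power_ratio(80):.0e}")      # 1e+08
print(f"{db_to_amplitude_ratio(80):.0e}")  # 1e+04
```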
-
Envisioning a Human-AI collaborative system to transform policies into decision models
Authors:
Vanessa Lopez,
Gabriele Picco,
Inge Vejsbjerg,
Thanh Lam Hoang,
Yufang Hou,
Marco Luca Sbodio,
John Segrave-Daly,
Denisa Moga,
Sean Swords,
Miao Wei,
Eoin Carroll
Abstract:
Regulations govern many aspects of citizens' daily lives. Governments and businesses routinely automate these in the form of coded rules (e.g., to check a citizen's eligibility for specific benefits). However, the path to automation is long and challenging. To address this, recent global initiatives for digital government, which propose expressing policy simultaneously in natural language for human consumption and as computationally amenable rules or code, are gathering broad public-sector interest. We introduce the problem of semi-automatically building decision models from eligibility policies for social services, and present an initial, emerging approach to shorten the route from policy documents to executable, interpretable and standardised decision models using AI, NLP and Knowledge Graphs. Despite the many open domain challenges, in this position paper we explore the enormous potential of AI to assist government agencies and policy experts in scaling the production of both human-readable and machine-executable policy rules, while improving the transparency, interpretability, traceability and accountability of decision making.
Submitted 1 November, 2022;
originally announced December 2022.
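One simple form a machine-executable decision model can take is a set of rules, each linked to the policy clause it encodes for traceability. The sketch below is entirely hypothetical: the rules and thresholds are invented, and it uses plain Python rather than a decision-modelling standard. It is not the authors' approach.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

@dataclass(frozen=True)
class Rule:
    clause: str                       # policy text the rule encodes (traceability)
    condition: Callable[[dict], bool]

def decide(applicant: dict, rules: List[Rule]) -> Tuple[bool, List[str]]:
    """Eligible only if every rule holds; also returns the clauses that failed."""
    failed = [r.clause for r in rules if not r.condition(applicant)]
    return not failed, failed

# Hypothetical rules, not taken from any real policy.
RULES = [
    Rule("Applicant must be at least 18 years old.", lambda a: a["age"] >= 18),
    Rule("Annual household income must not exceed 30,000.",
         lambda a: a["income"] <= 30_000),
]

result = decide({"age": 25, "income": 35_000}, RULES)
print(result)
```

Returning the failed clauses alongside the decision is one way to support the interpretability and accountability goals the paper emphasises.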
-
Nonlinear eco-evolutionary games with global environmental fluctuations and local environmental feedbacks
Authors:
Yishen Jiang,
Xin Wang,
Longzhao Liu,
Ming Wei,
Jingwu Zhao,
Zhiming Zheng,
Shaoting Tang
Abstract:
Environmental changes play a critical role in determining the evolution of social dilemmas in many natural or social systems. Generally, environmental changes have two prominent aspects: global time-dependent fluctuations and local strategy-dependent feedbacks. However, the impacts of these two types of environmental changes have only been studied separately, and a complete picture of the environmental effects exerted by their combination remains unclear. Here we develop a theoretical framework that integrates group strategic behaviors with their general dynamic environments. In this framework, the global environmental fluctuations are associated with a nonlinear factor in the public goods game, and the local environmental feedbacks are described by the `eco-evolutionary game'. We show how the coupled dynamics of local game-environment evolution differ in static and dynamic global environments. In particular, we find the emergence of cyclic evolution of group cooperation and the local environment, which forms an interior irregular loop in the phase plane, depending on how fast the global and local environments change relative to the strategies. Our results provide important insights into how diverse evolutionary outcomes could emerge from the nonlinear interactions between strategies and changing environments.
Submitted 13 December, 2022;
originally announced December 2022.
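As a generic illustration of local strategy-environment feedback, here is a two-strategy replicator equation whose payoffs depend on an environment state. Cooperators restore the environment and defectors degrade it. This is not the paper's nonlinear public goods model, and all payoff values and rates are illustrative assumptions.

```python
def payoff_matrix(n, R=3.0, S=0.0, T=5.0, P=1.0):
    """Payoffs [[C vs C, C vs D], [D vs C, D vs D]] in environment state n.

    n = 1: a prisoner's dilemma (defection pays); n = 0: rows swapped so
    cooperation pays. Intermediate n interpolates linearly.
    """
    dilemma = [[R, S], [T, P]]
    swapped = [[T, P], [R, S]]
    return [[n * dilemma[i][j] + (1 - n) * swapped[i][j] for j in range(2)]
            for i in range(2)]

def step(x, n, dt, eps=0.1, theta=2.0):
    """One Euler step for cooperator fraction x and environment state n."""
    a = payoff_matrix(n)
    fit_c = a[0][0] * x + a[0][1] * (1 - x)  # expected cooperator payoff
    fit_d = a[1][0] * x + a[1][1] * (1 - x)  # expected defector payoff
    dx = x * (1 - x) * (fit_c - fit_d)               # replicator equation
    dn = eps * n * (1 - n) * (theta * x - (1 - x))  # cooperators restore, defectors degrade
    return x + dt * dx, n + dt * dn

x, n = 0.5, 0.5
trajectory = [(x, n)]
for _ in range(20_000):
    x, n = step(x, n, dt=0.01)
    trajectory.append((x, n))
print(trajectory[-1])
```

Here `eps` sets how fast the environment changes relative to strategies. The abstract identifies this kind of relative speed as a factor in whether cyclic dynamics appear.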
-
Predicting Energy Consumption of Ground Robots On Uneven Terrains
Authors:
Minghan Wei,
Volkan Isler
Abstract:
Optimizing energy consumption for robot navigation in fields requires energy-cost maps. However, obtaining such a map is still challenging, especially for large, uneven terrains. Physics-based energy models work for uniform, flat surfaces but do not generalize well to these terrains. Furthermore, slopes make the energy consumption at every location directional, which adds to the complexity of data collection and energy prediction. In this paper, we address these challenges in a data-driven manner. We consider a function that takes terrain geometry and robot motion direction as input and outputs the expected energy consumption. The function is represented as a ResNet-based neural network whose parameters are learned from field-collected data. The predictions of our method are within 12% of the ground truth in test environments unseen during training. We compare our method to a baseline from the literature that uses a basic physics-based model, and demonstrate that our method outperforms it by more than 10% in prediction error. More importantly, our method generalizes better when applied to test data from new environments with various slope angles and navigation directions.
Submitted 13 December, 2022;
originally announced December 2022.
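The abstract does not spell out its physics-based baseline. The sketch below assumes a common textbook model for illustration only: rolling resistance plus the gravity component along the slope. Even this simple model shows why energy cost is directional on sloped terrain. Mass, friction coefficient, and terrain gradient are hypothetical values.

```python
import math

G = 9.81  # gravitational acceleration, m/s^2

def physics_energy(mass_kg, distance_m, slope_rad, mu):
    """Energy to traverse a straight segment on a uniform slope.

    Rolling resistance mu * m * g * cos(slope) plus the gravity component
    m * g * sin(slope); slope is signed along the motion (positive uphill).
    Negative totals are clipped to 0, i.e. no energy recovery is assumed.
    """
    force = mass_kg * G * (mu * math.cos(slope_rad) + math.sin(slope_rad))
    return max(force * distance_m, 0.0)

def directional_slope(grad_x, grad_y, heading_rad):
    """Signed slope angle when driving along heading_rad on a plane with gradient (grad_x, grad_y)."""
    rise = grad_x * math.cos(heading_rad) + grad_y * math.sin(heading_rad)
    return math.atan(rise)

# Same 10% slope and 10 m segment, three headings (hypothetical robot: 50 kg, mu = 0.1).
uphill = physics_energy(50.0, 10.0, directional_slope(0.1, 0.0, 0.0), mu=0.1)
downhill = physics_energy(50.0, 10.0, directional_slope(0.1, 0.0, math.pi), mu=0.1)
flat = physics_energy(50.0, 10.0, 0.0, mu=0.1)
print(f"uphill {uphill:.0f} J, flat {flat:.0f} J, downhill {downhill:.0f} J")
```

Uniform-surface assumptions like these are what the abstract says fail to generalize to uneven terrain, motivating the learned model.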
-
Data-Driven Key Performance Indicators and Datasets for Building Energy Flexibility: A Review and Perspectives
Authors:
H. Li,
H. Johra,
F. de Andrade Pereira,
T. Hong,
J. Le Dreau,
A. Maturo,
M. Wei,
Y. Liu,
A. Saberi-Derakhtenjani,
Z. Nagy,
A. Marszal-Pomianowska,
D. Finn,
S. Miyata,
K. Kaspar,
K. Nweye,
Z. O Neill,
F. Pallonetto,
B. Dong
Abstract:
Energy flexibility, through short-term demand-side management (DSM) and energy storage technologies, is now seen as key to balancing the fluctuating supply in different energy grids with the energy demand of buildings. This is especially important given the intermittent nature of ever-growing renewable energy production, as well as the increasingly dynamic electricity demand in buildings. This paper provides a holistic review of (1) data-driven energy flexibility key performance indicators (KPIs) for buildings in the operational phase and (2) open datasets that can be used for testing energy flexibility KPIs. The review identifies a total of 81 data-driven KPIs from 91 recent publications. These KPIs were categorized and analyzed according to their type, complexity, scope, key stakeholders, data requirement, baseline requirement, resolution, and popularity. Moreover, 330 building datasets were collected and evaluated. Of those, 16 were deemed adequate to feature buildings performing demand response or building-to-grid (B2G) services. The DSM strategy, building scope, grid type, control strategy, needed data features, and usability of these 16 selected datasets were analyzed. This review reveals future opportunities to address limitations in the existing literature: (1) developing new data-driven methodologies to specifically evaluate different energy flexibility strategies and B2G services of existing buildings; (2) developing baseline-free KPIs that could be calculated from easily accessible building sensor and meter data; (3) devoting non-engineering efforts to promote building energy flexibility, such as designing utility programs and standardizing energy flexibility quantification and verification processes; and (4) curating datasets with proper descriptions for energy flexibility assessments.
Submitted 9 May, 2023; v1 submitted 22 November, 2022;
originally announced November 2022.
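The review classifies KPIs by, among other things, whether they need a baseline. Below are two simple illustrative KPIs, one of each kind. The formulas are common definitions chosen for illustration. They are not claimed to match any specific KPI among the 81 reviewed, and the function names are hypothetical.

```python
from typing import Sequence

def load_shed_kwh(baseline_kw: Sequence[float], actual_kw: Sequence[float],
                  event: Sequence[bool], dt_h: float) -> float:
    """Baseline-dependent KPI: energy saved versus baseline during event intervals."""
    return sum((b - a) * dt_h
               for b, a, e in zip(baseline_kw, actual_kw, event) if e)

def flexibility_factor(load_kw: Sequence[float],
                       high_price: Sequence[bool]) -> float:
    """Baseline-free KPI in [-1, 1] for equal-length intervals.

    +1: all consumption in low-price intervals; -1: all in high-price ones.
    """
    low = sum(p for p, h in zip(load_kw, high_price) if not h)
    high = sum(p for p, h in zip(load_kw, high_price) if h)
    return (low - high) / (low + high)

event = [False, True, True, False]
print(load_shed_kwh([4, 4, 4, 4], [4, 2, 2, 4], event, dt_h=1.0))  # 4.0
print(flexibility_factor([3, 1, 1, 3], high_price=event))          # 0.5
```

The first KPI depends on how the baseline is estimated. The second needs only metered load and a price signal, in line with the baseline-free direction the review calls for.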
-
ImLiDAR: Cross-Sensor Dynamic Message Propagation Network for 3D Object Detection
Authors:
Yiyang Shen,
Rongwei Yu,
Peng Wu,
Haoran Xie,
Lina Gong,
Jing Qin,
Mingqiang Wei
Abstract:
LiDAR and camera, as two different sensors, supply geometric (point clouds) and semantic (RGB images) information about 3D scenes. However, it is still challenging for existing methods to fuse data from the two sensors so that they complement each other for high-quality 3D object detection (3OD). We propose ImLiDAR, a new 3OD paradigm that narrows the cross-sensor discrepancies by progressively fusing the multi-scale features of camera Images and LiDAR point clouds. ImLiDAR provides the detection head with cross-sensor yet robustly fused features. To achieve this, ImLiDAR has two core designs. First, we propose a cross-sensor dynamic message propagation module to combine the best of the multi-scale image and point features. Second, we formulate detection as a direct set prediction problem, which allows us to design an effective set-based detector. This detector tackles the inconsistency between classification and localization confidences as well as the sensitivity to hand-tuned hyperparameters. Moreover, the novel set-based detector is detachable and can be easily integrated into various detection networks. Comparisons on both the KITTI and SUN-RGBD datasets show clear visual and numerical improvements of ImLiDAR over twenty-three state-of-the-art 3OD methods.
Submitted 17 November, 2022;
originally announced November 2022.
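Direct set prediction typically relies on a one-to-one matching between predictions and ground-truth objects, with unmatched predictions trained as "no object", as in DETR. The sketch below illustrates only that matching step. The cost function and the box format (centres only) are hypothetical, because the abstract does not give ImLiDAR's matching cost.

```python
from itertools import permutations

def pair_cost(gt_box, pred_box, pred_score, w_cls=1.0, w_box=1.0):
    """Hypothetical matching cost: low confidence plus L1 box distance."""
    l1 = sum(abs(g - p) for g, p in zip(gt_box, pred_box))
    return w_cls * (1.0 - pred_score) + w_box * l1

def match_predictions(cost):
    """Optimal one-to-one assignment of ground truths (rows) to predictions.

    Exhaustive search is exponential and only suitable for tiny examples;
    practical set-based detectors use the Hungarian algorithm instead.
    """
    num_gt, num_pred = len(cost), len(cost[0])
    best_total, best_perm = float("inf"), None
    for perm in permutations(range(num_pred), num_gt):
        total = sum(cost[g][p] for g, p in enumerate(perm))
        if total < best_total:
            best_total, best_perm = total, perm
    return list(enumerate(best_perm)), best_total

gts = [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0)]  # box centres only, for brevity
preds = [((5.1, 0.0, 0.0), 0.9), ((0.2, 0.0, 0.0), 0.8), ((10.0, 0.0, 0.0), 0.95)]
cost = [[pair_cost(g, box, score) for box, score in preds] for g in gts]
pairs, total = match_predictions(cost)
print(pairs, round(total, 3))  # [(0, 1), (1, 0)] 0.6
```

The most confident prediction (index 2) stays unmatched because it is far from every object. Under set-based training it would be pushed towards "no object".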
-
HSGNet: Object Re-identification with Hierarchical Similarity Graph Network
Authors:
Fei Shen,
Mengwan Wei,
Junchi Ren
Abstract:
An object re-identification method consists of a backbone network, feature aggregation, and a loss function. However, most backbone networks lack a dedicated mechanism to handle rich scale variations and mine discriminative feature representations. In this paper, we first design a hierarchical similarity graph module (HSGM) to reduce the conflict between backbone and re-identification networks. The designed HSGM builds a rich hierarchical graph to mine global-local and local-local mapping relationships. Second, we divide the feature map along the spatial and channel directions in each hierarchical graph. The HSGM uses the spatial features and channel features extracted from different locations as nodes, respectively, and uses the similarity scores between nodes to construct spatial and channel similarity graphs. During the learning process of HSGM, we use a learnable parameter to re-optimize the importance of each position, as well as to evaluate the correlation between different nodes. Third, we develop a novel hierarchical similarity graph network (HSGNet) by embedding the HSGM in the backbone network. Furthermore, HSGM can be easily embedded into backbone networks of any depth to improve object re-identification ability. Finally, extensive experiments on three large-scale object datasets demonstrate that the proposed HSGNet is superior to state-of-the-art object re-identification approaches.
Submitted 10 November, 2022;
originally announced November 2022.
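The sketch below roughly illustrates the spatial and channel similarity graphs. It treats either the spatial positions or the channels of a feature map as nodes, builds a softmax-normalised cosine-similarity adjacency, and aggregates neighbours. This is a generic graph-aggregation sketch under our own assumptions (cosine similarity, softmax, a fixed residual weight standing in for a learnable parameter), not HSGM's exact formulation.

```python
import numpy as np

def similarity_graph_aggregate(nodes: np.ndarray, alpha: float = 0.5):
    """Cosine-similarity graph over nodes, then similarity-weighted aggregation.

    nodes: (num_nodes, dim). Returns the updated node features and the
    row-normalised adjacency. `alpha` is a fixed stand-in for a learnable weight.
    """
    normed = nodes / (np.linalg.norm(nodes, axis=1, keepdims=True) + 1e-8)
    sim = normed @ normed.T                     # pairwise cosine similarity
    sim = sim - sim.max(axis=1, keepdims=True)  # stabilise the softmax
    adj = np.exp(sim) / np.exp(sim).sum(axis=1, keepdims=True)
    return nodes + alpha * (adj @ nodes), adj

rng = np.random.default_rng(0)
fmap = rng.normal(size=(8, 4, 6))         # (channels, height, width)
spatial_nodes = fmap.reshape(8, -1).T     # 24 positions, each an 8-dim node
channel_nodes = fmap.reshape(8, -1)       # 8 channels, each a 24-dim node
spatial_out, spatial_adj = similarity_graph_aggregate(spatial_nodes)
channel_out, channel_adj = similarity_graph_aggregate(channel_nodes)
print(spatial_out.shape, channel_out.shape)
```

Reshaping the same feature map two ways yields the two node sets, spatial and channel, over which separate graphs are built.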
-
Development of a hardened THz energy meter for use on the kilojoule-scale, short-pulse OMEGA EP laser
Authors:
G. Bruhaug,
H. G. Rinderknecht,
Y. E,
M. S. Wei,
R. B. Brannon,
D. Guy,
R. G. Peck,
N. Landis,
G. Brent,
R. Fairbanks,
C. McAtee,
T. Walker,
T. Buczek,
M. Krieger,
M. H. Romanofsky,
C. Mileham,
K. G. Francis,
X. C. Zhang,
G. W. Collins,
J. R. Rygg
Abstract:
A highly adaptable and robust THz energy meter has been designed and implemented to detect energetic THz pulses from high-intensity (greater than $10^{18}$ W/cm$^2$) laser-plasma interactions on OMEGA EP. THz radiation from the laser-driven target is detected by a shielded pyrometer. A second, identical pyrometer is used for background subtraction. The detector can be configured to detect THz pulses in the 1 mm to 30 μm (0.3 to 10 THz) range and pulse energies from joules to microjoules via changes in filtration, aperture size and position. Additional polarization-selective filtration can also be used to determine THz pulse polarization. The design incorporates significant radiation and EMP shielding to survive and operate within the OMEGA EP radiation environment. We describe the design, operational principle, calibration and testing of the THz energy meter. The pyrometers were calibrated using a benchtop laser and show linear sensitivity up to 1000 nJ of absorbed energy. Initial results from four OMEGA EP THz experiments detected up to 15 μJ at the detector, which can correspond to hundreds of mJ depending on THz emission and reflection models.
Submitted 18 January, 2023; v1 submitted 8 November, 2022;
originally announced November 2022.
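The quoted band follows from $f = c/\lambda$: 1 mm gives about 0.3 THz and 30 μm about 10 THz. The helper names below are illustrative. The background step assumes that the second pyrometer records the same non-THz background as the first. That is the usual rationale for a reference detector, but the abstract does not detail it.

```python
C = 299_792_458.0  # speed of light in vacuum, m/s

def wavelength_to_thz(wavelength_m: float) -> float:
    """Free-space frequency in THz for a given wavelength: f = c / lambda."""
    return C / wavelength_m / 1e12

def background_subtracted(signal_j: float, background_j: float) -> float:
    """THz energy estimate: signal pyrometer minus the identical background pyrometer."""
    return signal_j - background_j

print(f"{wavelength_to_thz(1e-3):.2f} THz")   # 0.30 THz
print(f"{wavelength_to_thz(30e-6):.2f} THz")  # 9.99 THz
```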
-
PointSee: Image Enhances Point Cloud
Authors:
Lipeng Gu,
Xuefeng Yan,
Peng Cui,
Lina Gong,
Haoran Xie,
Fu Lee Wang,
Jin Qin,
Mingqiang Wei
Abstract:
There is a trend toward fusing multi-modal information for 3D object detection (3OD). However, the challenging problems of heavyweight designs, poor plug-and-play flexibility, and inaccurate feature alignment are still not well solved when designing multi-modal fusion networks. We propose PointSee, a lightweight, flexible and effective multi-modal fusion solution that facilitates various 3OD networks through semantic feature enhancement of LiDAR point clouds assembled with scene images. Going beyond existing 3OD practice, PointSee consists of a hidden module (HM) and a seen module (SM). HM decorates LiDAR point clouds using 2D image information in an offline fusion manner, requiring minimal or even no adaptation of existing 3OD networks. SM further enriches the LiDAR point clouds by acquiring point-wise representative semantic features, leading to enhanced performance of existing 3OD networks. Besides the new architecture of PointSee, we propose a simple yet efficient training strategy to mitigate potentially inaccurate regressions of 2D object detection networks. Extensive experiments on popular outdoor/indoor benchmarks show numerical improvements of PointSee over twenty-two state-of-the-art methods.
Submitted 3 November, 2022;
originally announced November 2022.
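PointSee's hidden module decorates LiDAR points with 2D image information offline. Below is a minimal decoration sketch in the spirit of PointPainting. It assumes points already in the camera frame, a known 3x4 projection matrix, and per-pixel class scores from a 2D segmentation network. All names are hypothetical, and this is not PointSee's actual HM.

```python
import numpy as np

def decorate_points(points, seg_scores, proj):
    """Append per-pixel image class scores to LiDAR points.

    points: (N, 3) xyz in the camera frame; seg_scores: (H, W, K) class
    scores; proj: (3, 4) projection matrix. Points behind the camera or
    projecting outside the image get zero scores.
    """
    n = points.shape[0]
    h, w, k = seg_scores.shape
    homo = np.hstack([points, np.ones((n, 1))])  # (N, 4) homogeneous coords
    uvw = homo @ proj.T                           # (N, 3) image-plane coords
    depth = uvw[:, 2]
    valid = depth > 1e-6
    u = np.zeros(n, dtype=int)
    v = np.zeros(n, dtype=int)
    u[valid] = np.floor(uvw[valid, 0] / depth[valid]).astype(int)
    v[valid] = np.floor(uvw[valid, 1] / depth[valid]).astype(int)
    valid &= (u >= 0) & (u < w) & (v >= 0) & (v < h)
    feats = np.zeros((n, k))
    feats[valid] = seg_scores[v[valid], u[valid]]
    return np.hstack([points, feats])

proj = np.array([[2.0, 0.0, 2.0, 0.0],
                 [0.0, 2.0, 2.0, 0.0],
                 [0.0, 0.0, 1.0, 0.0]])  # toy pinhole camera: focal 2, centre (2, 2)
seg = np.zeros((4, 4, 2))
seg[2, 2] = [0.1, 0.9]                   # class scores at pixel (u=2, v=2)
pts = np.array([[0.0, 0.0, 1.0],         # projects to pixel (2, 2)
                [0.0, 0.0, -1.0],        # behind the camera
                [10.0, 0.0, 1.0]])       # projects outside the image
decorated = decorate_points(pts, seg, proj)
print(decorated)
```

Because decoration only appends feature channels to each point, a downstream 3OD network needs at most a wider input layer. This matches the "minimal or even no adaptation" property claimed for HM.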