-
Astra: Toward General-Purpose Mobile Robots via Hierarchical Multimodal Learning
Authors:
Sheng Chen,
Peiyu He,
Jiaxin Hu,
Ziyang Liu,
Yansheng Wang,
Tao Xu,
Chi Zhang,
Chongchong Zhang,
Chao An,
Shiyu Cai,
Duo Cao,
Kangping Chen,
Shuai Chu,
Tianwei Chu,
Mingdi Dan,
Min Du,
Weiwei Fang,
Pengyou Fu,
Junkai Hu,
Xiaowei Jiang,
Zhaodi Jiang,
Fuxuan Li,
Jun Li,
Minghui Li,
Mingyao Li, et al. (46 additional authors not shown)
Abstract:
Modern robot navigation systems encounter difficulties in diverse and complex indoor environments. Traditional approaches rely on multiple modules with small models or rule-based systems and thus lack adaptability to new environments. To address this, we developed Astra, a comprehensive dual-model architecture, Astra-Global and Astra-Local, for mobile robot navigation. Astra-Global, a multimodal LLM, processes vision and language inputs to perform self and goal localization using a hybrid topological-semantic graph as the global map, and outperforms traditional visual place recognition methods. Astra-Local, a multitask network, handles local path planning and odometry estimation. Its 4D spatial-temporal encoder, trained through self-supervised learning, generates robust 4D features for downstream tasks. The planning head utilizes flow matching and a novel masked ESDF loss to minimize collision risks for generating local trajectories, and the odometry head integrates multi-sensor inputs via a transformer encoder to predict the relative pose of the robot. Deployed on real in-house mobile robots, Astra achieves a high end-to-end mission success rate across diverse indoor environments.
Submitted 6 June, 2025;
originally announced June 2025.
-
Concept-Based Unsupervised Domain Adaptation
Authors:
Xinyue Xu,
Yueying Hu,
Hui Tang,
Yi Qin,
Lu Mi,
Hao Wang,
Xiaomeng Li
Abstract:
Concept Bottleneck Models (CBMs) enhance interpretability by explaining predictions through human-understandable concepts but typically assume that training and test data share the same distribution. This assumption often fails under domain shifts, leading to degraded performance and poor generalization. To address these limitations and improve the robustness of CBMs, we propose the Concept-based Unsupervised Domain Adaptation (CUDA) framework. CUDA is designed to: (1) align concept representations across domains using adversarial training, (2) introduce a relaxation threshold to allow minor domain-specific differences in concept distributions, thereby preventing performance drop due to over-constraints of these distributions, (3) infer concepts directly in the target domain without requiring labeled concept data, enabling CBMs to adapt to diverse domains, and (4) integrate concept learning into conventional domain adaptation (DA) with theoretical guarantees, improving interpretability and establishing new benchmarks for DA. Experiments demonstrate that our approach significantly outperforms the state-of-the-art CBM and DA methods on real-world datasets.
Submitted 8 May, 2025;
originally announced May 2025.
-
Evaluate-and-Purify: Fortifying Code Language Models Against Adversarial Attacks Using LLM-as-a-Judge
Authors:
Wenhan Mu,
Ling Xu,
Shuren Pei,
Le Mi,
Huichi Zhou
Abstract:
The widespread adoption of code language models in software engineering tasks has exposed vulnerabilities to adversarial attacks, especially identifier substitution attacks. Although existing identifier substitution attackers demonstrate high success rates, they often produce adversarial examples with unnatural code patterns. In this paper, we systematically assess the quality of adversarial examples using LLM-as-a-Judge. Our analysis reveals that over 80% of adversarial examples generated by state-of-the-art identifier substitution attackers (e.g., ALERT) are actually detectable. Based on this insight, we propose EP-Shield, a unified framework for evaluating and purifying identifier substitution attacks via naturalness-aware reasoning. Specifically, we first evaluate the naturalness of code and identify the perturbed adversarial code, then purify it so that the victim model can restore correct prediction. Extensive experiments demonstrate the superiority of EP-Shield over adversarial fine-tuning (up to 83.36% improvement) and its lightweight design (7B parameters) with GPT-4-level performance.
Submitted 28 April, 2025;
originally announced April 2025.
-
The Mini-SiTian Array: Imaging Processing Pipeline
Authors:
Kai Xiao,
Zhirui Li,
Yang Huang,
Jie Zheng,
Haibo Yuan,
Junju Du,
Linying Mi,
Hongrui Gu,
Yongkang Sun,
Bowen Zhang,
Shunxuan He,
Henggeng Han,
Min He,
Ruifeng Shi,
Yu Zhang,
Chuanjie Zheng,
Zexi Niu,
Guiting Tian,
Hu Zou,
Yongna Mao,
Hong Wu,
Jifeng Liu
Abstract:
As a pathfinder of the SiTian project, the Mini-SiTian (MST) array, employing three commercial CMOS cameras, represents a next-generation, cost-effective optical time-domain survey project. This paper focuses primarily on the precise data processing pipeline designed for wide-field, CMOS-based devices, including the removal of instrumental effects, astrometry, photometry, and flux calibration. When applying this pipeline to approximately 3000 observations taken in the Field 02 (f02) region by MST, the results demonstrate a remarkable astrometric precision of approximately 70-80 mas (about 0.1 pixel), an impressive calibration accuracy of approximately 1 mmag in the MST zero points, and a photometric accuracy of about 4 mmag for bright stars. Our studies demonstrate that MST CMOS can achieve photometric accuracy comparable to that of CCDs, highlighting the feasibility of large-scale CMOS-based optical time-domain surveys and their potential applications for cost optimization in future large-scale time-domain surveys, like the SiTian project.
Submitted 2 April, 2025;
originally announced April 2025.
-
Dynamical model-based experiment design for drug repositioning
Authors:
Atte Aalto,
La Mi,
Diego A. Blanco-Mora,
Jorge Goncalves
Abstract:
Computational methods in drug repositioning can help to conserve resources. In particular, methods based on biological networks are showing promise. Considering only the network topology and knowledge on drug target genes is not sufficient for quantitative predictions or predictions involving drug combinations. We propose an iterative procedure alternating between system identification and drug response experiments. Data from experiments are used to improve the model and drug effect knowledge, which is then used to select drugs for the next experiments. Using simulated data, we show that the procedure can identify nearly optimal drug combinations.
Submitted 1 April, 2025;
originally announced April 2025.
-
VinaBench: Benchmark for Faithful and Consistent Visual Narratives
Authors:
Silin Gao,
Sheryl Mathew,
Li Mi,
Sepideh Mamooler,
Mengjie Zhao,
Hiromi Wakaki,
Yuki Mitsufuji,
Syrielle Montariol,
Antoine Bosselut
Abstract:
Visual narrative generation transforms textual narratives into sequences of images illustrating the content of the text. However, generating visual narratives that are faithful to the input text and self-consistent across generated images remains an open challenge, due to the lack of knowledge constraints used for planning the stories. In this work, we propose a new benchmark, VinaBench, to address this challenge. Our benchmark annotates the underlying commonsense and discourse constraints in visual narrative samples, offering systematic scaffolds for learning the implicit strategies of visual storytelling. Based on the incorporated narrative constraints, we further propose novel metrics to closely evaluate the consistency of generated narrative images and the alignment of generations with the input textual narrative. Our results across three generative vision models demonstrate that learning with VinaBench's knowledge constraints effectively improves the faithfulness and cohesion of generated visual narratives.
Submitted 3 April, 2025; v1 submitted 26 March, 2025;
originally announced March 2025.
-
MCM: Multi-layer Concept Map for Efficient Concept Learning from Masked Images
Authors:
Yuwei Sun,
Lu Mi,
Ippei Fujisawa,
Ryota Kanai
Abstract:
Masking strategies commonly employed in natural language processing are still underexplored in vision tasks such as concept learning, where conventional methods typically rely on full images. However, using masked images diversifies perceptual inputs, potentially offering significant advantages in concept learning with large-scale Transformer models. To this end, we propose Multi-layer Concept Map (MCM), the first work to devise an efficient concept learning method based on masked images. In particular, we introduce an asymmetric concept learning architecture by establishing correlations between different encoder and decoder layers, updating concept tokens using backward gradients from reconstruction tasks. The learned concept tokens at various levels of granularity help either reconstruct the masked image patches by filling in gaps or guide the reconstruction results in a direction that reflects specific concepts. Moreover, we present both quantitative and qualitative results across a wide range of metrics, demonstrating that MCM significantly reduces computational costs by training on fewer than 75% of the total image patches while enhancing concept prediction performance. Additionally, editing specific concept tokens in the latent space enables targeted image generation from masked images, aligning both the visible contextual patches and the provided concepts. By further adjusting the testing time mask ratio, we could produce a range of reconstructions that blend the visible patches with the provided concepts, proportional to the chosen ratios.
Submitted 31 January, 2025;
originally announced February 2025.
-
Reasoning-Oriented and Analogy-Based Methods for Locating and Editing in Zero-Shot Event-Relational Reasoning
Authors:
Jingyao Tang,
Lishuang Li,
Liteng Mi,
Haiming Wu,
Hongbin Lu
Abstract:
Zero-shot event-relational reasoning is an important task in natural language processing, and existing methods jointly learn a variety of event-relational prefixes and inference-form prefixes to achieve such tasks. However, training prefixes consumes large computational resources and lacks interpretability. Additionally, learning various relational and inferential knowledge inefficiently exploits the connections between tasks. Therefore, we first propose a method for Reasoning-Oriented Locating and Editing (ROLE), which locates and edits the key modules of the language model for reasoning about event relations, enhancing interpretability and also resource-efficiently optimizing the reasoning ability. Subsequently, we propose a method for Analogy-Based Locating and Editing (ABLE), which efficiently exploits the similarities and differences between tasks to optimize the zero-shot reasoning capability. Experimental results show that ROLE improves interpretability and reasoning performance with reduced computational cost. ABLE achieves SOTA results in zero-shot reasoning.
Submitted 1 January, 2025;
originally announced January 2025.
-
Active learning of neural population dynamics using two-photon holographic optogenetics
Authors:
Andrew Wagenmaker,
Lu Mi,
Marton Rozsa,
Matthew S. Bull,
Karel Svoboda,
Kayvon Daie,
Matthew D. Golub,
Kevin Jamieson
Abstract:
Recent advances in techniques for monitoring and perturbing neural populations have greatly enhanced our ability to study circuits in the brain. In particular, two-photon holographic optogenetics now enables precise photostimulation of experimenter-specified groups of individual neurons, while simultaneous two-photon calcium imaging enables the measurement of ongoing and induced activity across the neural population. Despite the enormous space of potential photostimulation patterns and the time-consuming nature of photostimulation experiments, very little algorithmic work has been done to determine the most effective photostimulation patterns for identifying the neural population dynamics. Here, we develop methods to efficiently select which neurons to stimulate such that the resulting neural responses will best inform a dynamical model of the neural population activity. Using neural population responses to photostimulation in mouse motor cortex, we demonstrate the efficacy of a low-rank linear dynamical systems model, and develop an active learning procedure which takes advantage of low-rank structure to determine informative photostimulation patterns. We demonstrate our approach on both real and synthetic data, obtaining in some cases as much as a two-fold reduction in the amount of data required to reach a given predictive power. Our active stimulation design method is based on a novel active learning procedure for low-rank regression, which may be of independent interest.
Submitted 8 May, 2025; v1 submitted 3 December, 2024;
originally announced December 2024.
-
Unsupervised Multi-view UAV Image Geo-localization via Iterative Rendering
Authors:
Haoyuan Li,
Chang Xu,
Wen Yang,
Li Mi,
Huai Yu,
Haijian Zhang
Abstract:
Unmanned Aerial Vehicle (UAV) Cross-View Geo-Localization (CVGL) presents significant challenges due to the view discrepancy between oblique UAV images and overhead satellite images. Existing methods heavily rely on the supervision of labeled datasets to extract viewpoint-invariant features for cross-view retrieval. However, these methods have expensive training costs and tend to overfit the region-specific cues, showing limited generalizability to new regions. To overcome this issue, we propose an unsupervised solution that lifts the scene representation to 3D space from UAV observations for satellite image generation, providing robust representation against view distortion. By generating orthogonal images that closely resemble satellite views, our method reduces view discrepancies in feature representation and mitigates shortcuts in region-specific image pairing. To further align the rendered image's perspective with the real one, we design an iterative camera pose updating mechanism that progressively modulates the rendered query image with potential satellite targets, eliminating spatial offsets relative to the reference images. Additionally, this iterative refinement strategy enhances cross-view feature invariance through view-consistent fusion across iterations. As such, our unsupervised paradigm naturally avoids the problem of region-specific overfitting, enabling generic CVGL for UAV images without feature fine-tuning or data-driven training. Experiments on the University-1652 and SUES-200 datasets demonstrate that our approach significantly improves geo-localization accuracy while maintaining robustness across diverse regions. Notably, without model fine-tuning or paired training, our method achieves competitive performance with recent supervised methods.
Submitted 22 November, 2024;
originally announced November 2024.
-
Mining double-line spectroscopic candidates in the LAMOST medium-resolution spectroscopic survey using human-AI hybrid method
Authors:
Shan-shan Li,
Chun-qian Li,
Chang-hua Li,
Dong-wei Fan,
Yun-fei Xu,
Lin-ying Mi,
Chen-zhou Cui,
Jian-rong Shi
Abstract:
We utilize a hybrid approach that integrates the traditional cross-correlation function (CCF) and machine learning to detect spectroscopic multi-systems, specifically focusing on double-line spectroscopic binaries (SB2). Based on the ninth data release (DR9) of the Large Sky Area Multi-Object Fiber Spectroscopic Telescope (LAMOST), which includes a medium-resolution survey (MRS) containing 29,920,588 spectra, we identify 27,164 double-line and 3,124 triple-line spectra, corresponding to 7,096 SB2 candidates and 1,903 triple-line spectroscopic binary (SB3) candidates, respectively, representing about 1% of the selection dataset from LAMOST-MRS DR9. Notably, 70.1% of the SB2 candidates and 89.6% of the SB3 candidates are newly identified. Compared to using only the traditional CCF technique, our method significantly improves the efficiency of detecting SB2 and reduces the time spent on visual inspections by a factor of four.
Submitted 21 November, 2024;
originally announced November 2024.
-
Empower Vision Applications with LoRA LMM
Authors:
Liang Mi,
Weijun Wang,
Wenming Tu,
Qingfeng He,
Rui Kong,
Xinyu Fang,
Yazhu Dong,
Yikang Zhang,
Yunchun Li,
Meng Li,
Haipeng Dai,
Guihai Chen,
Yunxin Liu
Abstract:
Large Multimodal Models (LMMs) have shown significant progress in various complex vision tasks with the solid linguistic and reasoning capacity inherited from large language models (LLMs). Low-rank adaptation (LoRA) offers a promising method to integrate external knowledge into LMMs, compensating for their limitations on domain-specific tasks. However, existing LoRA model serving is excessively computationally expensive and causes extremely high latency. In this paper, we present an end-to-end solution that empowers diverse vision tasks and enriches vision applications with LoRA LMMs. Our system, VaLoRA, enables accurate and efficient vision tasks by 1) an accuracy-aware LoRA adapter generation approach that generates LoRA adapters rich in domain-specific knowledge to meet application-specific accuracy requirements, 2) an adaptive-tiling LoRA adapters batching operator that efficiently computes concurrent heterogeneous LoRA adapters, and 3) a flexible LoRA adapter orchestration mechanism that manages application requests and LoRA adapters to achieve the lowest average response latency. We prototype VaLoRA on five popular vision tasks on three LMMs. Experiment results reveal that VaLoRA improves accuracy by 24-62% compared to the original LMMs and reduces latency by 20-89% compared to the state-of-the-art LoRA model serving systems.
Submitted 3 April, 2025; v1 submitted 1 November, 2024;
originally announced November 2024.
-
Sharp palindromic criterion for semi-uniform dynamical localization
Authors:
Svetlana Jitomirskaya,
Wencai Liu,
Lufang Mi
Abstract:
We develop a sharp palindromic argument for general 1D operators, that proves absence of semi-uniform localization in the regime of exponential symmetry-based resonances. This provides the first examples of operators with dynamical localization but no SULE/SUDL, as well as with nearly uniform distribution of centers of localization in absence of SULE. For the almost Mathieu operators, this also leads to a sharp arithmetic criterion for semi-uniformity of dynamical localization in the Diophantine case.
Submitted 28 October, 2024;
originally announced October 2024.
-
Region-based Content Enhancement for Efficient Video Analytics at the Edge
Authors:
Weijun Wang,
Liang Mi,
Shaowei Cen,
Haipeng Dai,
Yuanchun Li,
Xiaoming Fu,
Yunxin Liu
Abstract:
Video analytics is widespread in various applications serving our society. Recent advances in content enhancement for video analytics offer significant benefits for bandwidth saving and accuracy improvement. However, existing content-enhanced video analytics systems are excessively computationally expensive and provide extremely low throughput. In this paper, we present region-based content enhancement, which enhances only the important regions in videos, to improve analytical accuracy. Our system, RegenHance, enables high-accuracy and high-throughput video analytics at the edge by 1) a macroblock-based region importance predictor that identifies the important regions fast and precisely, 2) a region-aware enhancer that stitches sparsely distributed regions into dense tensors and enhances them efficiently, and 3) a profile-based execution planner that allocates appropriate resources for enhancement and analytics components. We prototype RegenHance on five heterogeneous edge devices. Experiments on two analytical tasks reveal that region-based enhancement improves overall accuracy by 10-19% and achieves 2-3x throughput compared to the state-of-the-art frame-based enhancement methods.
Submitted 3 April, 2025; v1 submitted 24 July, 2024;
originally announced July 2024.
-
Knowledge-aware Text-Image Retrieval for Remote Sensing Images
Authors:
Li Mi,
Xianjie Dai,
Javiera Castillo-Navarro,
Devis Tuia
Abstract:
Image-based retrieval in large Earth observation archives is challenging because one needs to navigate across thousands of candidate matches only with the query image as a guide. By using text as information supporting the visual query, the retrieval system gains in usability, but at the same time faces difficulties due to the diversity of visual signals that cannot be summarized by a short caption only. For this reason, as a matching-based task, cross-modal text-image retrieval often suffers from information asymmetry between texts and images. To address this challenge, we propose a Knowledge-aware Text-Image Retrieval (KTIR) method for remote sensing images. By mining relevant information from an external knowledge graph, KTIR enriches the text scope available in the search query and alleviates the information gaps between texts and images for better matching. Moreover, by integrating domain-specific knowledge, KTIR also enhances the adaptation of pre-trained vision-language models to remote sensing applications. Experimental results on three commonly used remote sensing text-image retrieval benchmarks show that the proposed knowledge-aware method leads to varied and consistent retrievals, outperforming state-of-the-art retrieval methods.
Submitted 25 October, 2024; v1 submitted 6 May, 2024;
originally announced May 2024.
-
ConGeo: Robust Cross-view Geo-localization across Ground View Variations
Authors:
Li Mi,
Chang Xu,
Javiera Castillo-Navarro,
Syrielle Montariol,
Wen Yang,
Antoine Bosselut,
Devis Tuia
Abstract:
Cross-view geo-localization aims at localizing a ground-level query image by matching it to its corresponding geo-referenced aerial view. In real-world scenarios, the task requires accommodating diverse ground images captured by users with varying orientations and reduced fields of view (FoVs). However, existing learning pipelines are orientation-specific or FoV-specific, demanding separate model training for different ground view variations. Such models heavily depend on the North-aligned spatial correspondence and predefined FoVs in the training data, compromising their robustness across different settings. To tackle this challenge, we propose ConGeo, a single- and cross-view Contrastive method for Geo-localization: it enhances robustness and consistency in feature representations to improve a model's invariance to orientation and its resilience to FoV variations, by enforcing proximity between ground view variations of the same location. As a generic learning objective for cross-view geo-localization, when integrated into state-of-the-art pipelines, ConGeo significantly boosts the performance of three base models on four geo-localization benchmarks for diverse ground view variations and outperforms competing methods that train separate models for each ground view variation.
Submitted 4 September, 2024; v1 submitted 20 March, 2024;
originally announced March 2024.
-
ConVQG: Contrastive Visual Question Generation with Multimodal Guidance
Authors:
Li Mi,
Syrielle Montariol,
Javiera Castillo-Navarro,
Xianjie Dai,
Antoine Bosselut,
Devis Tuia
Abstract:
Asking questions about visual environments is a crucial way for intelligent agents to understand rich multi-faceted scenes, raising the importance of Visual Question Generation (VQG) systems. Apart from being grounded to the image, existing VQG systems can use textual constraints, such as expected answers or knowledge triplets, to generate focused questions. These constraints allow VQG systems to specify the question content or leverage external commonsense knowledge that cannot be obtained from the image content only. However, generating focused questions using textual constraints while enforcing a high relevance to the image content remains a challenge, as VQG systems often ignore one or both forms of grounding. In this work, we propose Contrastive Visual Question Generation (ConVQG), a method using a dual contrastive objective to discriminate questions generated using both modalities from those based on a single one. Experiments on both knowledge-aware and standard VQG benchmarks demonstrate that ConVQG outperforms the state-of-the-art methods and generates image-grounded, text-guided, and knowledge-rich questions. Our human evaluation results also show preference for ConVQG questions compared to non-contrastive baselines.
Submitted 20 February, 2024;
originally announced February 2024.
-
Energy-Based Concept Bottleneck Models: Unifying Prediction, Concept Intervention, and Probabilistic Interpretations
Authors:
Xinyue Xu,
Yi Qin,
Lu Mi,
Hao Wang,
Xiaomeng Li
Abstract:
Existing methods, such as concept bottleneck models (CBMs), have been successful in providing concept-based interpretations for black-box deep learning models. They typically work by predicting concepts given the input and then predicting the final class label given the predicted concepts. However, (1) they often fail to capture the high-order, nonlinear interaction between concepts, e.g., correcting a predicted concept (e.g., "yellow breast") does not help correct highly correlated concepts (e.g., "yellow belly"), leading to suboptimal final accuracy; (2) they cannot naturally quantify the complex conditional dependencies between different concepts and class labels (e.g., for an image with the class label "Kentucky Warbler" and a concept "black bill", what is the probability that the model correctly predicts another concept "black crown"), therefore failing to provide deeper insight into how a black-box model works. In response to these limitations, we propose Energy-based Concept Bottleneck Models (ECBMs). Our ECBMs use a set of neural networks to define the joint energy of candidate (input, concept, class) tuples. With such a unified interface, prediction, concept correction, and conditional dependency quantification are then represented as conditional probabilities, which are generated by composing different energy functions. Our ECBMs address both limitations of existing CBMs, providing higher accuracy and richer concept interpretations. Empirical results show that our approach outperforms the state-of-the-art on real-world datasets.
Submitted 30 December, 2024; v1 submitted 25 January, 2024;
originally announced January 2024.
-
BiSwift: Bandwidth Orchestrator for Multi-Stream Video Analytics on Edge
Authors:
Lin Sun,
Weijun Wang,
Tingting Yuan,
Liang Mi,
Haipeng Dai,
Yunxin Liu,
Xiaoming Fu
Abstract:
High-definition (HD) cameras for surveillance and road traffic have experienced tremendous growth, demanding intensive computation resources for real-time analytics. Recently, offloading frames from the front-end device to the back-end edge server has shown great promise. In multi-stream competitive environments, efficient bandwidth management and proper scheduling are crucial to ensure both high inference accuracy and high throughput. To achieve this goal, we propose BiSwift, a bi-level framework that scales the concurrent real-time video analytics by a novel adaptive hybrid codec integrated with multi-level pipelines, and a global bandwidth controller for multiple video streams. The lower-level front-back-end collaborative mechanism (called adaptive hybrid codec) locally optimizes the accuracy and accelerates end-to-end video analytics for a single stream. The upper-level scheduler aims to achieve accuracy fairness among multiple streams via the global bandwidth controller. The evaluation shows that BiSwift can perform real-time object detection on 9 streams with an edge device equipped only with an NVIDIA RTX3070 (8G) GPU. BiSwift improves accuracy by 10%$\sim$21% and achieves 1.2$\sim$9$\times$ throughput compared with the state-of-the-art video analytics pipelines.
Submitted 4 February, 2024; v1 submitted 25 December, 2023;
originally announced December 2023.
-
Attention for Causal Relationship Discovery from Biological Neural Dynamics
Authors:
Ziyu Lu,
Anika Tabassum,
Shruti Kulkarni,
Lu Mi,
J. Nathan Kutz,
Eric Shea-Brown,
Seung-Hwan Lim
Abstract:
This paper explores the potential of the transformer models for learning Granger causality in networks with complex nonlinear dynamics at every node, as in neurobiological and biophysical networks. Our study primarily focuses on a proof-of-concept investigation based on simulated neural dynamics, for which the ground-truth causality is known through the underlying connectivity matrix. For transformer models trained to forecast neuronal population dynamics, we show that the cross attention module effectively captures the causal relationship among neurons, with an accuracy equal or superior to that for the most popular Granger causality analysis method. While we acknowledge that real-world neurobiology data will bring further challenges, including dynamic connectivity and unobserved variability, this research offers an encouraging preliminary glimpse into the utility of the transformer model for causal representation learning in neuroscience.
Submitted 23 November, 2023; v1 submitted 12 November, 2023;
originally announced November 2023.
-
Learning Time-Invariant Representations for Individual Neurons from Population Dynamics
Authors:
Lu Mi,
Trung Le,
Tianxing He,
Eli Shlizerman,
Uygar Sümbül
Abstract:
Neurons can display highly variable dynamics. While such variability presumably supports the wide range of behaviors generated by the organism, their gene expressions are relatively stable in the adult brain. This suggests that neuronal activity is a combination of its time-invariant identity and the inputs the neuron receives from the rest of the circuit. Here, we propose a self-supervised learning based method to assign time-invariant representations to individual neurons based on permutation-, and population size-invariant summary of population recordings. We fit dynamical models to neuronal activity to learn a representation by considering the activity of both the individual and the neighboring population. Our self-supervised approach and use of implicit representations enable robust inference against imperfections such as partial overlap of neurons across sessions, trial-to-trial variability, and limited availability of molecular (transcriptomic) labels for downstream supervised tasks. We demonstrate our method on a public multimodal dataset of mouse cortical neuronal activity and transcriptomic labels. We report > 35% improvement in predicting the transcriptomic subclass identity and > 20% improvement in predicting class identity with respect to the state-of-the-art.
Submitted 3 November, 2023;
originally announced November 2023.
-
Astronomical Knowledge Entity Extraction in Astrophysics Journal Articles via Large Language Models
Authors:
Wujun Shao,
Pengli Ji,
Dongwei Fan,
Yaohua Hu,
Xiaoran Yan,
Chenzhou Cui,
Linying Mi,
Lang Chen,
Rui Zhang
Abstract:
Astronomical knowledge entities, such as celestial object identifiers, are crucial for literature retrieval, knowledge graph construction, and other research and applications in the field of astronomy. Traditional methods of extracting knowledge entities from texts face challenges like high manual effort, poor generalization, and costly maintenance. Consequently, there is a pressing need for improved methods to efficiently extract them. This study explores the potential of pre-trained Large Language Models (LLMs) to perform the astronomical knowledge entity extraction (KEE) task on astrophysical journal articles using prompts. We propose a prompting strategy called Prompt-KEE, which includes five prompt elements, and design eight combination prompts based on them. Celestial object identifiers and telescope names, the two most typical astronomical knowledge entities, are selected as the experimental objects. We introduce four currently representative LLMs, namely Llama-2-70B, GPT-3.5, GPT-4, and Claude 2. To accommodate their token limitations, we construct two datasets: the full texts and paragraph collections of 30 articles. Leveraging the eight prompts, we test on full texts with GPT-4 and Claude 2, and on paragraph collections with all LLMs. The experimental results demonstrate that pre-trained LLMs have significant potential to perform KEE tasks on astrophysics journal articles, but their performance differs. Furthermore, we analyze some important factors that influence the performance of LLMs in entity extraction and provide insights for future KEE tasks in astrophysical articles using LLMs.
Submitted 17 January, 2024; v1 submitted 27 October, 2023;
originally announced October 2023.
-
LatticeGen: A Cooperative Framework which Hides Generated Text in a Lattice for Privacy-Aware Generation on Cloud
Authors:
Mengke Zhang,
Tianxing He,
Tianle Wang,
Lu Mi,
Fatemehsadat Mireshghallah,
Binyi Chen,
Hao Wang,
Yulia Tsvetkov
Abstract:
In the current user-server interaction paradigm of prompted generation with large language models (LLM) on cloud, the server fully controls the generation process, which leaves zero options for users who want to keep the generated text to themselves. We propose LatticeGen, a cooperative framework in which the server still handles most of the computation while the user controls the sampling operation. The key idea is that the true generated sequence is mixed with noise tokens by the user and hidden in a noised lattice. Considering potential attacks from a hypothetically malicious server and how the user can defend against it, we propose the repeated beam-search attack and the mixing noise scheme. In our experiments we apply LatticeGen to protect both prompt and generation. It is shown that while the noised lattice degrades generation quality, LatticeGen successfully protects the true generation to a remarkable degree under strong attacks (more than 50% of the semantic remains hidden as measured by BERTScore).
Submitted 5 April, 2024; v1 submitted 29 September, 2023;
originally announced September 2023.
-
X-Ray2EM: Uncertainty-Aware Cross-Modality Image Reconstruction from X-Ray to Electron Microscopy in Connectomics
Authors:
Yicong Li,
Yaron Meirovitch,
Aaron T. Kuan,
Jasper S. Phelps,
Alexandra Pacureanu,
Wei-Chung Allen Lee,
Nir Shavit,
Lu Mi
Abstract:
Comprehensive, synapse-resolution imaging of the brain will be crucial for understanding neuronal computations and function. In connectomics, this has been the sole purview of volume electron microscopy (EM), which entails an excruciatingly difficult process because it requires cutting tissue into many thin, fragile slices that then need to be imaged, aligned, and reconstructed. Unlike EM, hard X-ray imaging is compatible with thick tissues, eliminating the need for thin sectioning, and delivering fast acquisition, intrinsic alignment, and isotropic resolution. Unfortunately, current state-of-the-art X-ray microscopy provides much lower resolution, to the extent that segmenting membranes is very challenging. We propose an uncertainty-aware 3D reconstruction model that translates X-ray images to EM-like images with enhanced membrane segmentation quality, showing its potential for developing simpler, faster, and more accurate X-ray based connectomics pipelines.
Submitted 1 March, 2023;
originally announced March 2023.
-
The XPRESS Challenge: Xray Projectomic Reconstruction -- Extracting Segmentation with Skeletons
Authors:
Tri Nguyen,
Mukul Narwani,
Mark Larson,
Yicong Li,
Shuhan Xie,
Hanspeter Pfister,
Donglai Wei,
Nir Shavit,
Lu Mi,
Alexandra Pacureanu,
Wei-Chung Lee,
Aaron T. Kuan
Abstract:
The wiring and connectivity of neurons form a structural basis for the function of the nervous system. Advances in volume electron microscopy (EM) and image segmentation have enabled mapping of circuit diagrams (connectomics) within local regions of the mouse brain. However, applying volume EM over the whole brain is not currently feasible due to technological challenges. As a result, comprehensive maps of long-range connections between brain regions are lacking. Recently, we demonstrated that X-ray holographic nanotomography (XNH) can provide high-resolution images of brain tissue at a much larger scale than EM. In particular, XNH is well-suited to resolve large, myelinated axon tracts (white matter) that make up the bulk of long-range connections (projections) and are critical for inter-region communication. Thus, XNH provides an imaging solution for brain-wide projectomics. However, because XNH data is typically collected at lower resolutions and larger fields-of-view than EM, accurate segmentation of XNH images remains an important challenge that we present here. In this task, we provide volumetric XNH images of cortical white matter axons from the mouse brain along with ground truth annotations for axon trajectories. Manual voxel-wise annotation of ground truth is a time-consuming bottleneck for training segmentation networks. On the other hand, skeleton-based ground truth is much faster to annotate, and sufficient to determine connectivity. Therefore, we encourage participants to develop methods to leverage skeleton-based training. To this end, we provide two types of ground-truth annotations: a small volume of voxel-wise annotations and a larger volume with skeleton-based annotations. Entries will be evaluated on how accurately the submitted segmentations agree with the ground-truth skeleton annotations.
Submitted 24 February, 2023; v1 submitted 7 February, 2023;
originally announced February 2023.
-
AccDecoder: Accelerated Decoding for Neural-enhanced Video Analytics
Authors:
Tingting Yuan,
Liang Mi,
Weijun Wang,
Haipeng Dai,
Xiaoming Fu
Abstract:
The quality of the video stream is key to neural network-based video analytics. However, low-quality video is inevitably collected by existing surveillance systems because of poor-quality cameras or over-compressed/pruned video streaming protocols, e.g., as a result of upstream bandwidth limits. To address this issue, existing studies use quality enhancers (e.g., neural super-resolution) to improve the quality of videos (e.g., resolution) and eventually ensure inference accuracy. Nevertheless, directly applying quality enhancers does not work in practice because it introduces unacceptable latency. In this paper, we present AccDecoder, a novel accelerated decoder for real-time and neural-enhanced video analytics. AccDecoder can select a few frames adaptively via Deep Reinforcement Learning (DRL) to enhance the quality by neural super-resolution and then up-scale the unselected frames that reference them, which leads to 6-21% accuracy improvement. AccDecoder provides efficient inference capability via filtering important frames using DRL for DNN-based inference and reusing the results for the other frames via extracting the reference relationship among frames and blocks, which results in a latency reduction of 20-80% compared with baselines.
Submitted 24 January, 2023; v1 submitted 20 January, 2023;
originally announced January 2023.
-
Asymptotically stable polarization of multi-agent gradient flows over manifolds
Authors:
La Mi,
Jorge Gonçalves,
Johan Markdahl
Abstract:
Multi-agent systems are known to exhibit stable emergent behaviors, including polarization, over $\mathbb{R}^n$ or highly symmetric nonlinear spaces. In this article, we eschew linearity and symmetry of the underlying spaces, and study the stability of polarized equilibria of multi-agent gradient flows evolving on general hypermanifolds. The agents attract or repel each other according to the partition of the communication graph that is connected but otherwise arbitrary. The manifolds are outfitted with geometric features styled ``dimples'' and ``pimples'' that characterize the absence of flatness. The signs of inter-agent couplings together with these geometric features give rise to stable polarization under various sufficient conditions. We propose tangible interpretation of the system in the context of opinion dynamics, and highlight throughout the text its versatility in modeling various aspects of the polarization phenomenon.
Submitted 12 January, 2023;
originally announced January 2023.
-
Photometric redshift estimation of galaxies in the DESI Legacy Imaging Surveys
Authors:
Changhua Li,
Yanxia Zhang,
Chenzhou Cui,
Dongwei Fan,
Yongheng Zhao,
Xue-Bing Wu,
Jing-Yi Zhang,
Yihan Tao,
Jun Han,
Yunfei Xu,
Shanshan Li,
Linying Mi,
Boliang He,
Zihan Kang,
Youfen Wang,
Hanxi Yang,
Sisi Yang
Abstract:
The accurate estimation of photometric redshifts plays a crucial role in accomplishing the science objectives of large survey projects. Template fitting and machine learning are the two main types of methods currently applied. Based on the training set obtained by cross-correlating the DESI Legacy Imaging Surveys DR9 galaxy catalogue and the SDSS DR16 galaxy catalogue, the two kinds of methods are used and optimized, with EAZY for the template-fitting approach and CATBOOST for machine learning. The resulting models are then tested on the cross-matched samples of the DESI Legacy Imaging Surveys DR9 galaxy catalogue with the LAMOST DR7, GAMA DR3 and WiggleZ galaxy catalogues. Moreover, three machine learning methods (CATBOOST, Multi-Layer Perceptron and Random Forest) are compared, and CATBOOST shows its superiority for our case. By feature selection and optimization of model parameters, CATBOOST can obtain higher accuracy with optical and infrared photometric information; the best performance ($MSE=0.0032$, $\sigma_{NMAD}=0.0156$ and $O=0.88$ per cent) is achieved with $g \le 24.0$, $r \le 23.4$ and $z \le 22.5$. However, EAZY can provide more accurate photometric redshift estimation for high-redshift galaxies, especially beyond the redshift range of the training sample. Finally, we complete the redshift estimation of all DESI DR9 galaxies with CATBOOST and EAZY, which will contribute to the further study of galaxies and their properties.
Submitted 17 November, 2022;
originally announced November 2022.
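The abstract above reports three quality metrics (MSE, $\sigma_{NMAD}$ and the outlier fraction $O$) without defining them. The sketch below uses definitions that are common in the photometric-redshift literature and may differ from the ones the authors used: the function name `photoz_metrics`, the 1.4826 scaling factor, and the 0.15 outlier threshold are all assumptions made for illustration.

```python
from statistics import median


def photoz_metrics(z_spec, z_phot, outlier_threshold=0.15):
    """Return (MSE, sigma_NMAD, outlier fraction) for paired redshift lists.

    Assumed definitions (not taken from the abstract):
      dz         = z_phot - z_spec
      MSE        = mean(dz^2)
      sigma_NMAD = 1.4826 * median(|dz - median(dz)| / (1 + z_spec))
      outlier    = |dz| / (1 + z_spec) > outlier_threshold
    """
    dz = [p - s for s, p in zip(z_spec, z_phot)]
    n = len(dz)

    mse = sum(d * d for d in dz) / n

    med = median(dz)
    scaled_dev = [abs(d - med) / (1.0 + s) for d, s in zip(dz, z_spec)]
    sigma_nmad = 1.4826 * median(scaled_dev)

    n_outliers = sum(
        1 for d, s in zip(dz, z_spec) if abs(d) / (1.0 + s) > outlier_threshold
    )
    return mse, sigma_nmad, n_outliers / n
```

Because $\sigma_{NMAD}$ is built from medians, a handful of catastrophic estimates barely moves it, which is why it is usually reported alongside a separate outlier fraction rather than in place of one.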
-
im2nerf: Image to Neural Radiance Field in the Wild
Authors:
Lu Mi,
Abhijit Kundu,
David Ross,
Frank Dellaert,
Noah Snavely,
Alireza Fathi
Abstract:
We propose im2nerf, a learning framework that predicts a continuous neural object representation given a single input image in the wild, supervised by only segmentation output from off-the-shelf recognition methods. The standard approach to constructing neural radiance fields takes advantage of multi-view consistency and requires many calibrated views of a scene, a requirement that cannot be satisfied when learning on large-scale image data in the wild. We take a step towards addressing this shortcoming by introducing a model that encodes the input image into a disentangled object representation that contains a code for object shape, a code for object appearance, and an estimated camera pose from which the object image is captured. Our model conditions a NeRF on the predicted object representation and uses volume rendering to generate images from novel views. We train the model end-to-end on a large collection of input images. As the model is only provided with single-view images, the problem is highly under-constrained. Therefore, in addition to using a reconstruction loss on the synthesized input view, we use an auxiliary adversarial loss on the novel rendered views. Furthermore, we leverage object symmetry and cycle camera pose consistency. We conduct extensive quantitative and qualitative experiments on the ShapeNet dataset as well as qualitative experiments on Open Images dataset. We show that in all cases, im2nerf achieves the state-of-the-art performance for novel view synthesis from a single-view unposed image in the wild.
Submitted 8 September, 2022;
originally announced September 2022.
-
Subgraph Frequency Distribution Estimation using Graph Neural Networks
Authors:
Zhongren Chen,
Xinyue Xu,
Shengyi Jiang,
Hao Wang,
Lu Mi
Abstract:
Small subgraphs (graphlets) are important features to describe fundamental units of a large network. The calculation of the subgraph frequency distributions has a wide application in multiple domains including biology and engineering. Unfortunately due to the inherent complexity of this task, most of the existing methods are computationally intensive and inefficient. In this work, we propose GNNS, a novel representational learning framework that utilizes graph neural networks to sample subgraphs efficiently for estimating their frequency distribution. Our framework includes an inference model and a generative model that learns hierarchical embeddings of nodes, subgraphs, and graph types. With the learned model and embeddings, subgraphs are sampled in a highly scalable and parallel way and the frequency distribution estimation is then performed based on these sampled subgraphs. Eventually, our methods achieve comparable accuracy and a significant speedup by three orders of magnitude compared to existing methods.
Submitted 14 July, 2022;
originally announced July 2022.
-
Revisiting Latent-Space Interpolation via a Quantitative Evaluation Framework
Authors:
Lu Mi,
Tianxing He,
Core Francisco Park,
Hao Wang,
Yue Wang,
Nir Shavit
Abstract:
Latent-space interpolation is commonly used to demonstrate the generalization ability of deep latent variable models. Various algorithms have been proposed to calculate the best trajectory between two encodings in the latent space. In this work, we show how data labeled with semantically continuous attributes can be utilized to conduct a quantitative evaluation of latent-space interpolation algorithms, for variational autoencoders. Our framework can be used to complement the standard qualitative comparison, and also enables evaluation for domains (such as graph) in which the visualization is difficult. Interestingly, our experiments reveal that the superiority of interpolation algorithms could be domain-dependent. While normalised interpolation works best for the image domain, spherical linear interpolation achieves the best performance in the graph domain. Next, we propose a simple-yet-effective method to restrict the latent space via a bottleneck structure in the encoder. We find that all interpolation algorithms evaluated in this work can benefit from this restriction. Finally, we conduct interpolation-aware training with the labeled attributes, and show that this explicit supervision can improve the interpolation performance.
Submitted 12 October, 2021;
originally announced October 2021.
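For readers unfamiliar with the interpolation schemes named in the abstract above, here is a minimal sketch of the standard textbook formulas for linear and spherical linear interpolation between two latent vectors. It is not the authors' implementation; the function names `lerp` and `slerp` and the `eps` fallback are assumptions, and the paper's "normalised interpolation" is not reproduced because the abstract does not define it.

```python
import math


def lerp(z0, z1, t):
    """Linear interpolation: a straight line between z0 and z1."""
    return [(1.0 - t) * a + t * b for a, b in zip(z0, z1)]


def slerp(z0, z1, t, eps=1e-8):
    """Spherical linear interpolation between two non-zero vectors.

    Follows the arc between z0 and z1 instead of the chord, so the
    interpolant does not shrink toward the origin at the midpoint.
    """
    norm0 = math.sqrt(sum(a * a for a in z0))
    norm1 = math.sqrt(sum(b * b for b in z1))
    cos_omega = sum(a * b for a, b in zip(z0, z1)) / (norm0 * norm1)
    # Clamp to guard against rounding slightly outside [-1, 1].
    cos_omega = max(-1.0, min(1.0, cos_omega))
    omega = math.acos(cos_omega)
    sin_omega = math.sin(omega)
    if sin_omega < eps:
        # Vectors are (anti)parallel: the arc is ill-defined, fall back to lerp.
        return lerp(z0, z1, t)
    w0 = math.sin((1.0 - t) * omega) / sin_omega
    w1 = math.sin(t * omega) / sin_omega
    return [w0 * a + w1 * b for a, b in zip(z0, z1)]
```

For two orthogonal unit vectors, the `lerp` midpoint has norm of about 0.707 while the `slerp` midpoint stays on the unit circle, which is the usual motivation for spherical interpolation in latent spaces with a Gaussian prior.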
-
Predicate correlation learning for scene graph generation
Authors:
Leitian Tao,
Li Mi,
Nannan Li,
Xianhang Cheng,
Yaosi Hu,
Zhenzhong Chen
Abstract:
For a typical Scene Graph Generation (SGG) method, there is often a large gap in the performance of the predicates' head classes and tail classes. This phenomenon is mainly caused by the semantic overlap between different predicates as well as the long-tailed data distribution. In this paper, a Predicate Correlation Learning (PCL) method for SGG is proposed to address the above two problems by taking the correlation between predicates into consideration. To describe the semantic overlap between strong-correlated predicate classes, a Predicate Correlation Matrix (PCM) is defined to quantify the relationship between predicate pairs, which is dynamically updated to remove the matrix's long-tailed bias. In addition, PCM is integrated into a Predicate Correlation Loss function ($L_{PC}$) to reduce discouraging gradients of unannotated classes. The proposed method is evaluated on Visual Genome benchmark, where the performance of the tail classes is significantly improved when built on the existing methods.
Submitted 6 July, 2021;
originally announced July 2021.
-
Visual Relationship Forecasting in Videos
Authors:
Li Mi,
Yangjun Ou,
Zhenzhong Chen
Abstract:
Real-world scenarios often require the anticipation of object interactions in unknown future, which would assist the decision-making process of both humans and agents. To meet this challenge, we present a new task named Visual Relationship Forecasting (VRF) in videos to explore the prediction of visual relationships in a reasoning manner. Specifically, given a subject-object pair with H existing frames, VRF aims to predict their future interactions for the next T frames without visual evidence. To evaluate the VRF task, we introduce two video datasets named VRF-AG and VRF-VidOR, with a series of spatio-temporally localized visual relation annotations in a video. These two datasets densely annotate 13 and 35 visual relationships in 1923 and 13447 video clips, respectively. In addition, we present a novel Graph Convolutional Transformer (GCT) framework, which captures both object-level and frame-level dependencies by spatio-temporal Graph Convolution Network and Transformer. Experimental results on both VRF-AG and VRF-VidOR datasets demonstrate that GCT outperforms the state-of-the-art sequence modelling methods on visual relationship forecasting.
Submitted 2 July, 2021;
originally announced July 2021.
-
HDMapGen: A Hierarchical Graph Generative Model of High Definition Maps
Authors:
Lu Mi,
Hang Zhao,
Charlie Nash,
Xiaohan Jin,
Jiyang Gao,
Chen Sun,
Cordelia Schmid,
Nir Shavit,
Yuning Chai,
Dragomir Anguelov
Abstract:
High Definition (HD) maps are maps with precise definitions of road lanes with rich semantics of the traffic rules. They are critical for several key stages in an autonomous driving system, including motion forecasting and planning. However, there are only a small amount of real-world road topologies and geometries, which significantly limits our ability to test out the self-driving stack to generalize onto new unseen scenarios. To address this issue, we introduce a new challenging task to generate HD maps. In this work, we explore several autoregressive models using different data representations, including sequence, plain graph, and hierarchical graph. We propose HDMapGen, a hierarchical graph generation model capable of producing high-quality and diverse HD maps through a coarse-to-fine approach. Experiments on the Argoverse dataset and an in-house dataset show that HDMapGen significantly outperforms baseline methods. Additionally, we demonstrate that HDMapGen achieves high scalability and efficiency.
Submitted 28 June, 2021;
originally announced June 2021.
-
Identification of BASS DR3 Sources as Stars, Galaxies and Quasars by XGBoost
Authors:
Changhua Li,
Yanxia Zhang,
Chenzhou Cui,
Dongwei Fan,
Yongheng Zhao,
Xue-Bing Wu,
Boliang He,
Yunfei Xu,
Shanshan Li,
Jun Han,
Yihan Tao,
Linying Mi,
Hanxi Yang,
Sisi Yang
Abstract:
The Beijing-Arizona Sky Survey (BASS) Data Release 3 (DR3) catalogue, released in 2019, contains the data from all BASS and Mosaic z-band Legacy Survey (MzLS) observations between January 2015 and March 2019, about 200 million sources. We cross-match BASS DR3 with spectral databases from the Sloan Digital Sky Survey (SDSS) and the Large Sky Area Multi-object Fiber Spectroscopic Telescope (LAMOST) to obtain the spectroscopic classes of known samples. Then, the samples are cross-matched with the ALLWISE database. Based on optical and infrared information of the samples, we use the XGBoost algorithm to construct different classifiers, including binary and multiclass classifiers. The accuracy of these classifiers with the best input pattern is larger than 90.0 per cent. Finally, all selected sources in the BASS DR3 catalogue are classified by these classifiers. The classification label and probabilities for individual sources are assigned by different classifiers. When the results predicted by binary classification agree with those of multiclass classification with optical and infrared information, the numbers of star, galaxy and quasar candidates are 12 375 838 (P_S>0.95), 18 606 073 (P_G>0.95) and 798 928 (P_Q>0.95), respectively. For sources without infrared information, the predicted results can serve as a reference. These candidates may be taken as an input catalogue for LAMOST, DESI or other projects for follow-up observation. The classification results will be of great help and reference for future research on the BASS DR3 sources.
Submitted 10 June, 2021;
originally announced June 2021.
-
Optical manipulation of Rashba-split 2-Dimensional Electron Gas
Authors:
M. Michiardi,
F. Boschini,
H. -H. Kung,
M. X. Na,
S. K. Y. Dufresne,
A. Currie,
G. Levy,
S. Zhdanovich,
A. K. Mills,
D. J. Jones,
J. L. Mi,
B. B. Iversen,
Ph. Hofmann,
A. Damascelli
Abstract:
In spintronic devices, the two main approaches to actively control the electrons' spin degree of freedom involve either static magnetic or electric fields. An alternative avenue relies on the application of optical fields to generate spin currents, which promises to bolster spin-device performance allowing for significantly faster and more efficient spin logic. To date, research has mainly focused on the optical injection of spin currents through the photogalvanic effect, and little is known about the direct optical control of the intrinsic spin splitting. Here, to explore the all-optical manipulation of a material's spin properties, we consider the Rashba effect at a semiconductor interface. The Rashba effect has long been a staple in the field of spintronics owing to its superior tunability, which allows the observation of fully spin-dependent phenomena, such as the spin-Hall effect, spin-charge conversion, and spin-torque in semiconductor devices. In this work, by means of time and angle-resolved photoemission spectroscopy (TR-ARPES), we demonstrate that an ultrafast optical excitation can be used to manipulate the Rashba-induced spin splitting of a two-dimensional electron gas (2DEG) engineered at the surface of the topological insulator Bi$_{2}$Se$_{3}$. We establish that light-induced photovoltage and charge carrier redistribution -- which in concert modulate the spin-orbit coupling strength on a sub-picosecond timescale -- can offer an unprecedented platform for achieving all optically-driven THz spin logic devices.
Submitted 2 June, 2022; v1 submitted 19 May, 2021;
originally announced May 2021.
-
Learning Guided Electron Microscopy with Active Acquisition
Authors:
Lu Mi,
Hao Wang,
Yaron Meirovitch,
Richard Schalek,
Srinivas C. Turaga,
Jeff W. Lichtman,
Aravinthan D. T. Samuel,
Nir Shavit
Abstract:
Single-beam scanning electron microscopes (SEM) are widely used to acquire massive data sets for biomedical study, material analysis, and fabrication inspection. Datasets are typically acquired with uniform acquisition: applying the electron beam with the same power and duration to all image pixels, even if there is great variety in the pixels' importance for eventual use. Many SEMs are now able to move the beam to any pixel in the field of view without delay, enabling them, in principle, to invest their time budget more effectively with non-uniform imaging.
In this paper, we show how to use deep learning to accelerate and optimize single-beam SEM acquisition of images. Our algorithm rapidly collects an information-lossy image (e.g. low resolution) and then applies a novel learning method to identify a small subset of pixels to be collected at higher resolution based on a trade-off between the saliency and spatial diversity. We demonstrate the efficacy of this novel technique for active acquisition by speeding up the task of collecting connectomic datasets for neurobiology by up to an order of magnitude.
Submitted 7 January, 2021;
originally announced January 2021.
-
GWOPS: A VO-technology Driven Tool to Search for the Electromagnetic Counterpart of Gravitational Wave Event
Authors:
Yunfei Xu,
Dong Xu,
Chenzhou Cui,
Dongwei Fan,
Zipei Zhu,
Bangyao Yu,
Changhua Li,
Jun Han,
Linying Mi,
Shanshan Li,
Boliang He,
Yihan Tao,
Hanxi Yang,
Sisi Yang
Abstract:
The search and follow-up observation of electromagnetic (EM) counterparts of gravitational waves (GW) is a current hot topic of GW cosmology. Due to the limited accuracy of GW observation facilities at this stage, we can only get a rough sky-localization region for a GW event, and the typical area of the region is between 200 and 1500 square degrees. Since GW events occur in or near galaxies, limiting the observation target to galaxies can significantly speed up the search for EM counterparts. Therefore, how to efficiently select host galaxy candidates in such a large GW localization region, how to arrange the observation sequence, and how to efficiently identify the GW source from observational data are the problems that need to be solved. The International Virtual Observatory Alliance has developed a series of technical standards for data retrieval, interoperability and visualization. Based on the application of VO technologies, we construct the GW follow-up Observation Planning System (GWOPS). It consists of three parts: a pipeline to select host candidates of GW and sort their priorities for follow-up observation, an identification module to find the transient from follow-up observation data, and a visualization module to display GW-related data. GWOPS can rapidly respond to GW events. With GWOPS, operations such as follow-up observation planning, data storage, data visualization, and transient identification can be efficiently coordinated, which will improve the success rate of searches for the EM counterparts of GWs.
Submitted 9 September, 2020; v1 submitted 7 September, 2020;
originally announced September 2020.
-
Towards an Astronomical Science Platform: Experiences and Lessons Learned from Chinese Virtual Observatory
Authors:
Chenzhou Cui,
Yihan Tao,
Changhua Li,
Dongwei Fan,
Jian Xiao,
Boliang He,
Shanshan Li,
Ce Yu,
Linying Mi,
Yunfei Xu,
Jun Han,
Sisi Yang,
Yongheng Zhao,
Yanjie Xue,
Jinxin Hao,
Liang Liu,
Xiao Chen,
Junyi Chen,
Hailong Zhang
Abstract:
In the era of big data astronomy, next generation telescopes and large sky surveys produce data sets at the TB or even PB level. Due to their large data volumes, these astronomical data sets are extremely difficult to transfer and analyze using personal computers or small clusters. In order to offer better access to data, data centers now generally provide online science platforms that enable analysis close to the data. The Chinese Virtual Observatory (China-VO) is one of the member projects in the International Virtual Observatory Alliance and it is dedicated to providing a research and education environment where globally distributed astronomy archives are simple to find, access, and interoperate. In this study, we summarize highlights of the work conducted at the China-VO, as well as the experiences and lessons learned during the full life-cycle management of astronomical data. Finally, we discuss the challenges and future trends for astronomical science platforms.
Submitted 21 May, 2020;
originally announced May 2020.
-
IVOA HiPS Implementation in the Framework of WorldWide Telescope
Authors:
Yunfei Xu,
Chenzhou Cui,
Dongwei Fan,
Shanshan Li,
Changhua Li,
Jun Han,
Linying Mi,
Boliang He,
Hanxi Yang,
Yihan Tao,
Sisi Yang,
Lan He
Abstract:
The WorldWide Telescope (WWT) is a scientific visualization platform which can browse deep space images, star catalogs, and planetary remote sensing data from different observation facilities in a three-dimensional virtual scene. First launched and then open-sourced by Microsoft Research, the WWT is now managed by the American Astronomical Society (AAS). Hierarchical Progressive Survey (HiPS) is an astronomical data release scheme proposed by the Centre de Données astronomiques de Strasbourg (CDS) and has been accepted as a recommendation by the International Virtual Observatory Alliance (IVOA). The HiPS solution has been adopted widely by many astronomical institutions for data release. Since WWT selected Hierarchical Triangular Mesh (HTM) as the standard for data visualization in the early stage of development, data released by HiPS cannot be visualized in WWT, which significantly limits the application of WWT. This paper introduces the implementation method for HiPS dataset visualization in WWT, covering HiPS data projection, mesh rendering, and data index implementation in WWT. Taking Chang'E-2 lunar probe data as an example, this paper shows how to convert planetary remote sensing data into a HiPS dataset and integrate it into WWT. This paper also compares the efficiency and memory consumption of WWT loading its native data and HiPS data, and illustrates the application of HiPS in scientific data visualization and science education in WWT.
Submitted 4 March, 2020;
originally announced March 2020.
-
Variational Wasserstein Barycenters for Geometric Clustering
Authors:
Liang Mi
Abstract:
We propose to compute Wasserstein barycenters (WBs) by solving for Monge maps with a variational principle. We discuss the metric properties of WBs and explore their connections, especially the connections of Monge WBs, to K-means clustering and co-clustering. We also discuss the feasibility of Monge WBs on unbalanced measures and spherical domains. We propose two new problems -- regularized K-means and Wasserstein barycenter compression. We demonstrate the use of VWBs in solving these clustering-related problems.
Submitted 29 March, 2023; v1 submitted 24 February, 2020;
originally announced February 2020.
-
A Family of Pairwise Multi-Marginal Optimal Transports that Define a Generalized Metric
Authors:
Liang Mi,
Azadeh Sheikholeslami,
José Bento
Abstract:
The Optimal transport (OT) problem is rapidly finding its way into machine learning. Favoring its use are its metric properties. Many problems admit solutions with guarantees only for objects embedded in metric spaces, and the use of non-metrics can complicate solving them. Multi-marginal OT (MMOT) generalizes OT to simultaneously transporting multiple distributions. It captures important relations that are missed if the transport only involves two distributions. Research on MMOT, however, has been focused on its existence, uniqueness, practical algorithms, and the choice of cost functions. There is a lack of discussion on the metric properties of MMOT, which limits its theoretical and practical use. Here, we prove new generalized metric properties for a family of pairwise MMOTs. We first explain the difficulty of proving this via two negative results. Afterward, we prove the MMOTs' metric properties. Finally, we show that the generalized triangle inequality of this family of MMOTs cannot be improved. We illustrate the superiority of our MMOTs over other generalized metrics, and over non-metrics in both synthetic and real tasks.
Submitted 22 December, 2022; v1 submitted 29 January, 2020;
originally announced January 2020.
-
Training-Free Uncertainty Estimation for Dense Regression: Sensitivity as a Surrogate
Authors:
Lu Mi,
Hao Wang,
Yonglong Tian,
Hao He,
Nir Shavit
Abstract:
Uncertainty estimation is an essential step in the evaluation of the robustness for deep learning models in computer vision, especially when applied in risk-sensitive areas. However, most state-of-the-art deep learning models either fail to obtain uncertainty estimation or need significant modification (e.g., formulating a proper Bayesian treatment) to obtain it. Most previous methods are not able to take an arbitrary model off the shelf and generate uncertainty estimation without retraining or redesigning it. To address this gap, we perform a systematic exploration into training-free uncertainty estimation for dense regression, an unrecognized yet important problem, and provide a theoretical construction justifying such estimations. We propose three simple and scalable methods to analyze the variance of outputs from a trained network under tolerable perturbations: infer-transformation, infer-noise, and infer-dropout. They operate solely during the inference, without the need to re-train, re-design, or fine-tune the models, as typically required by state-of-the-art uncertainty estimation methods. Surprisingly, even without involving such perturbations in training, our methods produce comparable or even better uncertainty estimation when compared to training-required state-of-the-art methods.
Submitted 10 January, 2022; v1 submitted 27 September, 2019;
originally announced October 2019.
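The abstract above describes estimating uncertainty from the variance of a trained network's outputs under small perturbations at inference time. A minimal sketch of that general idea follows, using only the Python standard library. The function `perturbation_variance`, the toy model, the choice to inject Gaussian noise at the input, and the noise scale are all illustrative assumptions, not the authors' implementation.

```python
import math
import random
import statistics

def perturbation_variance(model, x, num_samples=64, noise_std=0.01, seed=0):
    """Run a fixed model on noisy copies of x; return per-output mean and variance.

    Assumption: noise is added to the input. The paper's own methods may
    perturb other parts of the network.
    """
    rng = random.Random(seed)
    outputs = [
        model([xi + rng.gauss(0.0, noise_std) for xi in x])
        for _ in range(num_samples)
    ]
    columns = list(zip(*outputs))  # one column of samples per output element
    means = [statistics.fmean(c) for c in columns]
    variances = [statistics.pvariance(c) for c in columns]
    return means, variances

def toy_model(x):
    # Hypothetical stand-in for a trained dense-regression network.
    return [math.tanh(3.0 * xi) for xi in x]

means, variances = perturbation_variance(toy_model, [-2.0, 0.0, 2.0])
```

In this toy setting the variance is largest where the model is most sensitive to its input (near 0 for `tanh`), which is the sense in which sensitivity acts as a surrogate for uncertainty. No retraining of `toy_model` is involved.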
-
Hand-Gesture-Recognition Based Text Input Method for AR/VR Wearable Devices
Authors:
Nizamuddin Maitlo,
Yanbo Wang,
Chao Ping Chen,
Lantian Mi,
Wenbo Zhang
Abstract:
Static and dynamic hand movements are a basic means of human-machine interaction. To recognize and classify these movements, they are first captured by the cameras mounted on augmented reality (AR) or virtual reality (VR) wearable devices. The hand is segmented using a segmentation method, and its gestures are passed to a hand gesture recognition algorithm, which relies on a depth-wise separable convolutional neural network for training, testing and finally running smoothly on mobile AR/VR devices, while maintaining accuracy and balancing the load. A number of gestures are processed to identify and classify the right gesture and to ignore all intermittent gestures. With the proposed method, a user can write letters and numbers in the air just by moving a hand. Gesture-based operations are performed, and the trajectory of the hand is recorded as handwritten text. Finally, the handwritten text is processed for text recognition.
Submitted 2 April, 2020; v1 submitted 28 July, 2019;
originally announced July 2019.
-
On the existence of full dimensional KAM torus for nonlinear Schrödinger equation
Authors:
Hongzi Cong,
Lufang Mi,
Yunfeng Shi,
Yuan Wu
Abstract:
In this paper, we study the following nonlinear Schrödinger equation \begin{eqnarray}\label{maineq0} \textbf{i}u_{t}-u_{xx}+V*u+εf(x)|u|^4u=0,\ x\in\mathbb{T}=\mathbb{R}/2π\mathbb{Z}, \end{eqnarray} where $V*$ is the Fourier multiplier defined by $\widehat{(V* u})_n=V_{n}\widehat{u}_n, V_n\in[-1,1]$ and $f(x)$ is Gevrey smooth. It is shown that for $0\leq|ε|\ll1$, there is some $(V_n)_{n\in\mathbb{Z}}$ such that, the equation admits a time almost periodic solution (i.e., full dimensional KAM torus) in the Gevrey space. This extends results of Bourgain \cite{BJFA2005} and Cong-Liu-Shi-Yuan \cite{CLSY} to the case that the nonlinear perturbation depends explicitly on the space variable $x$. The main difficulty here is the absence of zero momentum of the equation.
Submitted 28 February, 2019;
originally announced March 2019.
-
A Probe Towards Understanding GAN and VAE Models
Authors:
Lu Mi,
Macheng Shen,
Jingzhao Zhang
Abstract:
This project report compares some known GAN and VAE models proposed prior to 2017. There has been significant progress since we finished this report. We upload this report as an introduction to generative models and provide some personal interpretations supported by empirical evidence. Both generative adversarial network models and variational autoencoders have been widely used to approximate probability distributions of data sets. Although they both use parametrized distributions to approximate the underlying data distribution, whose exact inference is intractable, their behaviors are very different. We summarize our experimental results comparing these two categories of models in terms of fidelity and mode collapse. We provide a hypothesis to explain their different behaviors and propose a new model based on this hypothesis. We further tested our proposed model on the MNIST and CelebA datasets.
Submitted 17 December, 2018; v1 submitted 13 December, 2018;
originally announced December 2018.
-
Cross-Classification Clustering: An Efficient Multi-Object Tracking Technique for 3-D Instance Segmentation in Connectomics
Authors:
Yaron Meirovitch,
Lu Mi,
Hayk Saribekyan,
Alexander Matveev,
David Rolnick,
Nir Shavit
Abstract:
Pixel-accurate tracking of objects is a key element in many computer vision applications, often solved by iterated individual object tracking or instance segmentation followed by object matching. Here we introduce cross-classification clustering (3C), a technique that simultaneously tracks complex, interrelated objects in an image stack. The key idea in cross-classification is to efficiently turn a clustering problem into a classification problem by running a logarithmic number of independent classifications per image, letting the cross-labeling of these classifications uniquely classify each pixel to the object labels. We apply the 3C mechanism to achieve state-of-the-art accuracy in connectomics -- the nanoscale mapping of neural tissue from electron microscopy volumes. Our reconstruction system increases scalability by an order of magnitude over existing single-object tracking methods (such as flood-filling networks). This scalability is important for the deployment of connectomics pipelines, since currently the best performing techniques require computing infrastructures that are beyond the reach of most laboratories. Our algorithm may offer benefits in other domains that require pixel-accurate tracking of multiple objects, such as segmentation of videos and medical imagery.
Submitted 15 June, 2019; v1 submitted 3 December, 2018;
originally announced December 2018.
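The cross-classification idea in the abstract above is that a logarithmic number of independent classifications per image can jointly identify each pixel's object label. The sketch below illustrates the counting argument with plain binary codes. The binary encoding and the helper names `num_classifications`, `binary_maps` and `decode` are assumptions made for illustration, and ground-truth labels stand in for what would be classifier predictions.

```python
def num_classifications(num_objects):
    """Number of binary classifications needed to tell num_objects labels apart."""
    return max(1, (num_objects - 1).bit_length())

def binary_maps(pixel_labels, num_objects):
    """One binary map per classification: bit b of each pixel's object label."""
    k = num_classifications(num_objects)
    return [[(label >> b) & 1 for label in pixel_labels] for b in range(k)]

def decode(maps):
    """Combine the per-classification bits of each pixel back into an object label."""
    return [
        sum(bit << b for b, bit in enumerate(bits))
        for bits in zip(*maps)
    ]

# Hypothetical flattened image with 8 objects (labels 0..7).
pixel_labels = [0, 5, 3, 7, 2, 2, 6, 1]
maps = binary_maps(pixel_labels, num_objects=8)
recovered = decode(maps)
```

With 8 objects, three binary maps suffice. Each map alone is ambiguous, but the cross-labeling of all three is unique per object.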
-
Regularized Wasserstein Means for Aligning Distributional Data
Authors:
Liang Mi,
Wen Zhang,
Yalin Wang
Abstract:
We propose to align distributional data from the perspective of Wasserstein means. We raise the problem of regularizing Wasserstein means and propose several terms tailored to tackle different problems. Our formulation is based on the variational transportation to distribute a sparse discrete measure into the target domain. The resulting sparse representation well captures the desired property of the domain while reducing the mapping cost. We demonstrate the scalability and robustness of our method with examples in domain adaptation, point set registration, and skeleton layout.
Submitted 20 February, 2020; v1 submitted 2 December, 2018;
originally announced December 2018.
-
Variational Wasserstein Clustering
Authors:
Liang Mi,
Wen Zhang,
Xianfeng Gu,
Yalin Wang
Abstract:
We propose a new clustering method based on optimal transportation. We solve optimal transportation with variational principles, and investigate the use of power diagrams as transportation plans for aggregating arbitrary domains into a fixed number of clusters. We iteratively drive centroids through target domains while maintaining the minimum clustering energy by adjusting the power diagrams. Thus, we simultaneously pursue clustering and the Wasserstein distances between the centroids and the target domains, resulting in a measure-preserving mapping. We demonstrate the use of our method in domain adaptation, remeshing, and representation learning on synthetic and real data.
Submitted 26 July, 2018; v1 submitted 23 June, 2018;
originally announced June 2018.
-
A hybrid architecture for astronomical computing
Authors:
Changhua Li,
Chenzhou Cui,
Boliang He,
Dongwei Fan,
Linying Mi,
Shanshan Li,
Sisi Yang,
Yunfei Xu,
Jun Han,
Junyi Chen,
Hailong Zhang,
Ce Yu,
Jian Xiao,
Chuanjun Wang,
Zihuang Cao,
Yufeng Fan,
Liang Liu,
Xiao Chen,
Wenming Song,
Kangyu Du
Abstract:
With many large scientific facilities being constructed and put into use, astronomy has stepped into the big data era. New methods and infrastructure for big data processing have become a requirement for many astronomers. Many new technologies, such as cloud computing, Map/Reduce, Hadoop, and Spark, have sprung up in recent years. In contrast to high performance computing (HPC), data is at the center of these new technologies. A new computing architecture is therefore necessary, one that can be shared by both HPC and big data processing. Based on the Astronomy Cloud project of the Chinese Virtual Observatory (China-VO), we have made considerable efforts to optimize the design of the hybrid computing platform, which includes the hardware architecture, cluster management, and job and resource scheduling.
Submitted 18 January, 2018;
originally announced January 2018.