-
HiDream-I1: A High-Efficient Image Generative Foundation Model with Sparse Diffusion Transformer
Authors:
Qi Cai,
Jingwen Chen,
Yang Chen,
Yehao Li,
Fuchen Long,
Yingwei Pan,
Zhaofan Qiu,
Yiheng Zhang,
Fengbin Gao,
Peihan Xu,
Yimeng Wang,
Kai Yu,
Wenxuan Chen,
Ziwei Feng,
Zijian Gong,
Jianzhuang Pan,
Yi Peng,
Rui Tian,
Siyu Wang,
Bo Zhao,
Ting Yao,
Tao Mei
Abstract:
Recent advancements in image generative foundation models have prioritized quality improvements but often at the cost of increased computational complexity and inference latency. To address this critical trade-off, we introduce HiDream-I1, a new open-source image generative foundation model with 17B parameters that achieves state-of-the-art image generation quality within seconds. HiDream-I1 is co…
▽ More
Recent advancements in image generative foundation models have prioritized quality improvements but often at the cost of increased computational complexity and inference latency. To address this critical trade-off, we introduce HiDream-I1, a new open-source image generative foundation model with 17B parameters that achieves state-of-the-art image generation quality within seconds. HiDream-I1 is constructed with a new sparse Diffusion Transformer (DiT) structure. Specifically, it starts with a dual-stream decoupled design of sparse DiT with dynamic Mixture-of-Experts (MoE) architecture, in which two separate encoders are first involved to independently process image and text tokens. Then, a single-stream sparse DiT structure with dynamic MoE architecture is adopted to trigger multi-model interaction for image generation in a cost-efficient manner. To support flexiable accessibility with varied model capabilities, we provide HiDream-I1 in three variants: HiDream-I1-Full, HiDream-I1-Dev, and HiDream-I1-Fast.
Furthermore, we go beyond the typical text-to-image generation and remould HiDream-I1 with additional image conditions to perform precise, instruction-based editing on given images, yielding a new instruction-based image editing model namely HiDream-E1. Ultimately, by integrating text-to-image generation and instruction-based image editing, HiDream-I1 evolves to form a comprehensive image agent (HiDream-A1) capable of fully interactive image creation and refinement. To accelerate multi-modal AIGC research, we have open-sourced all the codes and model weights of HiDream-I1-Full, HiDream-I1-Dev, HiDream-I1-Fast, HiDream-E1 through our project websites: https://github.com/HiDream-ai/HiDream-I1 and https://github.com/HiDream-ai/HiDream-E1. All features can be directly experienced via https://vivago.ai/studio.
△ Less
Submitted 28 May, 2025;
originally announced May 2025.
-
MotionPro: A Precise Motion Controller for Image-to-Video Generation
Authors:
Zhongwei Zhang,
Fuchen Long,
Zhaofan Qiu,
Yingwei Pan,
Wu Liu,
Ting Yao,
Tao Mei
Abstract:
Animating images with interactive motion control has garnered popularity for image-to-video (I2V) generation. Modern approaches typically rely on large Gaussian kernels to extend motion trajectories as condition without explicitly defining movement region, leading to coarse motion control and failing to disentangle object and camera moving. To alleviate these, we present MotionPro, a precise motio…
▽ More
Animating images with interactive motion control has garnered popularity for image-to-video (I2V) generation. Modern approaches typically rely on large Gaussian kernels to extend motion trajectories as condition without explicitly defining movement region, leading to coarse motion control and failing to disentangle object and camera moving. To alleviate these, we present MotionPro, a precise motion controller that novelly leverages region-wise trajectory and motion mask to regulate fine-grained motion synthesis and identify target motion category (i.e., object or camera moving), respectively. Technically, MotionPro first estimates the flow maps on each training video via a tracking model, and then samples the region-wise trajectories to simulate inference scenario. Instead of extending flow through large Gaussian kernels, our region-wise trajectory approach enables more precise control by directly utilizing trajectories within local regions, thereby effectively characterizing fine-grained movements. A motion mask is simultaneously derived from the predicted flow maps to capture the holistic motion dynamics of the movement regions. To pursue natural motion control, MotionPro further strengthens video denoising by incorporating both region-wise trajectories and motion mask through feature modulation. More remarkably, we meticulously construct a benchmark, i.e., MC-Bench, with 1.1K user-annotated image-trajectory pairs, for the evaluation of both fine-grained and object-level I2V motion control. Extensive experiments conducted on WebVid-10M and MC-Bench demonstrate the effectiveness of MotionPro. Please refer to our project page for more results: https://zhw-zhang.github.io/MotionPro-page/.
△ Less
Submitted 26 May, 2025;
originally announced May 2025.
-
VerifyThisBench: Generating Code, Specifications, and Proofs All at Once
Authors:
Xun Deng,
Sicheng Zhong,
Andreas Veneris,
Fan Long,
Xujie Si
Abstract:
Large language models (LLMs) have demonstrated remarkable progress in code generation, but many existing benchmarks are approaching saturation and offer little guarantee on the trustworthiness of the generated programs, offering limited insight into deeper reasoning capabilities. We introduce VerifyThisBench, a new benchmark designed to evaluate LLMs on end-to-end program verification tasks that r…
▽ More
Large language models (LLMs) have demonstrated remarkable progress in code generation, but many existing benchmarks are approaching saturation and offer little guarantee on the trustworthiness of the generated programs, offering limited insight into deeper reasoning capabilities. We introduce VerifyThisBench, a new benchmark designed to evaluate LLMs on end-to-end program verification tasks that require interpreting natural language problem descriptions, formulating formal specifications, generating code, and constructing correctness proofs. Our evaluation reveals that even state-of-the-art (SOTA) models, such as o3-mini, achieve a pass rate of less than 4%, with many outputs failing to compile. To reduce task complexity, we further propose VerifyThisBenchXS, a variant in which partial implementations or proofs are provided. We systematically assess SOTA models on both benchmarks, uncovering key strengths and limitations in their formal reasoning and verification capabilities.
△ Less
Submitted 25 May, 2025;
originally announced May 2025.
-
Secure Smart Contract with Control Flow Integrity
Authors:
Zhiyang Chen,
Sidi Mohamed Beillahi,
Pasha Barahimi,
Cyrus Minwalla,
Han Du,
Andreas Veneris,
Fan Long
Abstract:
Smart contracts power decentralized financial (DeFi) services but are vulnerable to complex security exploits that can lead to significant financial losses. Existing security measures often fail to adequately protect these contracts due to the composability of DeFi protocols and the increasing sophistication of attacks. Through a large-scale empirical study of historical transactions from the 30 h…
▽ More
Smart contracts power decentralized financial (DeFi) services but are vulnerable to complex security exploits that can lead to significant financial losses. Existing security measures often fail to adequately protect these contracts due to the composability of DeFi protocols and the increasing sophistication of attacks. Through a large-scale empirical study of historical transactions from the 30 hacked DeFi protocols, we discovered that while benign transactions typically exhibit a limited number of unique control flows, in stark contrast, attack transactions consistently introduce novel, previously unobserved control flows. Building on these insights, we developed CrossGuard, a novel framework that enforces control flow integrity in real-time to secure smart contracts. Crucially, CrossGuard does not require prior knowledge of specific hacks; instead, it dynamically enforces control flow whitelisting policies and applies simplification heuristics at runtime. This approach monitors and prevents potential attacks by reverting all transactions that do not adhere to the established control flow whitelisting rules. Our evaluation demonstrates that CrossGuard effectively blocks 28 of the 30 analyzed attacks when configured only once prior to contract deployment, maintaining a low false positive rate of 0.28% and minimal additional gas costs. These results underscore the efficacy of applying control flow integrity to smart contracts, significantly enhancing security beyond traditional methods and addressing the evolving threat landscape in the DeFi ecosystem.
△ Less
Submitted 7 April, 2025;
originally announced April 2025.
-
Ouroboros-Diffusion: Exploring Consistent Content Generation in Tuning-free Long Video Diffusion
Authors:
Jingyuan Chen,
Fuchen Long,
Jie An,
Zhaofan Qiu,
Ting Yao,
Jiebo Luo,
Tao Mei
Abstract:
The first-in-first-out (FIFO) video diffusion, built on a pre-trained text-to-video model, has recently emerged as an effective approach for tuning-free long video generation. This technique maintains a queue of video frames with progressively increasing noise, continuously producing clean frames at the queue's head while Gaussian noise is enqueued at the tail. However, FIFO-Diffusion often strugg…
▽ More
The first-in-first-out (FIFO) video diffusion, built on a pre-trained text-to-video model, has recently emerged as an effective approach for tuning-free long video generation. This technique maintains a queue of video frames with progressively increasing noise, continuously producing clean frames at the queue's head while Gaussian noise is enqueued at the tail. However, FIFO-Diffusion often struggles to keep long-range temporal consistency in the generated videos due to the lack of correspondence modeling across frames. In this paper, we propose Ouroboros-Diffusion, a novel video denoising framework designed to enhance structural and content (subject) consistency, enabling the generation of consistent videos of arbitrary length. Specifically, we introduce a new latent sampling technique at the queue tail to improve structural consistency, ensuring perceptually smooth transitions among frames. To enhance subject consistency, we devise a Subject-Aware Cross-Frame Attention (SACFA) mechanism, which aligns subjects across frames within short segments to achieve better visual coherence. Furthermore, we introduce self-recurrent guidance. This technique leverages information from all previous cleaner frames at the front of the queue to guide the denoising of noisier frames at the end, fostering rich and contextual global information interaction. Extensive experiments of long video generation on the VBench benchmark demonstrate the superiority of our Ouroboros-Diffusion, particularly in terms of subject consistency, motion smoothness, and temporal consistency.
△ Less
Submitted 15 January, 2025;
originally announced January 2025.
-
SemiDDM-Weather: A Semi-supervised Learning Framework for All-in-one Adverse Weather Removal
Authors:
Fang Long,
Wenkang Su,
Zixuan Li,
Lei Cai,
Mingjie Li,
Yuan-Gen Wang,
Xiaochun Cao
Abstract:
Adverse weather removal aims to restore clear vision under adverse weather conditions. Existing methods are mostly tailored for specific weather types and rely heavily on extensive labeled data. In dealing with these two limitations, this paper presents a pioneering semi-supervised all-in-one adverse weather removal framework built on the teacher-student network with a Denoising Diffusion Model (D…
▽ More
Adverse weather removal aims to restore clear vision under adverse weather conditions. Existing methods are mostly tailored for specific weather types and rely heavily on extensive labeled data. In dealing with these two limitations, this paper presents a pioneering semi-supervised all-in-one adverse weather removal framework built on the teacher-student network with a Denoising Diffusion Model (DDM) as the backbone, termed SemiDDM-Weather. As for the design of DDM backbone in our SemiDDM-Weather, we adopt the SOTA Wavelet Diffusion Model-Wavediff with customized inputs and loss functions, devoted to facilitating the learning of many-to-one mapping distributions for efficient all-in-one adverse weather removal with limited label data. To mitigate the risk of misleading model training due to potentially inaccurate pseudo-labels generated by the teacher network in semi-supervised learning, we introduce quality assessment and content consistency constraints to screen the "optimal" outputs from the teacher network as the pseudo-labels, thus more effectively guiding the student network training with unlabeled data. Experimental results show that on both synthetic and real-world datasets, our SemiDDM-Weather consistently delivers high visual quality and superior adverse weather removal, even when compared to fully supervised competitors. Our code and pre-trained model are available at this repository.
△ Less
Submitted 29 September, 2024;
originally announced September 2024.
-
Landscape-Aware Automated Algorithm Configuration using Multi-output Mixed Regression and Classification
Authors:
Fu Xing Long,
Moritz Frenzel,
Peter Krause,
Markus Gitterle,
Thomas Bäck,
Niki van Stein
Abstract:
In landscape-aware algorithm selection problem, the effectiveness of feature-based predictive models strongly depends on the representativeness of training data for practical applications. In this work, we investigate the potential of randomly generated functions (RGF) for the model training, which cover a much more diverse set of optimization problem classes compared to the widely-used black-box…
▽ More
In landscape-aware algorithm selection problem, the effectiveness of feature-based predictive models strongly depends on the representativeness of training data for practical applications. In this work, we investigate the potential of randomly generated functions (RGF) for the model training, which cover a much more diverse set of optimization problem classes compared to the widely-used black-box optimization benchmarking (BBOB) suite. Correspondingly, we focus on automated algorithm configuration (AAC), that is, selecting the best suited algorithm and fine-tuning its hyperparameters based on the landscape features of problem instances. Precisely, we analyze the performance of dense neural network (NN) models in handling the multi-output mixed regression and classification tasks using different training data sets, such as RGF and many-affine BBOB (MA-BBOB) functions. Based on our results on the BBOB functions in 5d and 20d, near optimal configurations can be identified using the proposed approach, which can most of the time outperform the off-the-shelf default configuration considered by practitioners with limited knowledge about AAC. Furthermore, the predicted configurations are competitive against the single best solver in many cases. Overall, configurations with better performance can be best identified by using NN models trained on a combination of RGF and MA-BBOB functions.
△ Less
Submitted 2 September, 2024;
originally announced September 2024.
-
OpenTracer: A Dynamic Transaction Trace Analyzer for Smart Contract Invariant Generation and Beyond
Authors:
Zhiyang Chen,
Ye Liu,
Sidi Mohamed Beillahi,
Yi Li,
Fan Long
Abstract:
Smart contracts, self-executing programs on the blockchain, facilitate reliable value exchanges without centralized oversight. Despite the recent focus on dynamic analysis of their transaction histories in both industry and academia, no open-source tool currently offers comprehensive tracking of complete transaction information to extract user-desired data such as invariant-related data. This pape…
▽ More
Smart contracts, self-executing programs on the blockchain, facilitate reliable value exchanges without centralized oversight. Despite the recent focus on dynamic analysis of their transaction histories in both industry and academia, no open-source tool currently offers comprehensive tracking of complete transaction information to extract user-desired data such as invariant-related data. This paper introduces OpenTracer, designed to address this gap. OpenTracer guarantees comprehensive tracking of every execution step, providing complete transaction information. OpenTracer has been employed to analyze 350,800 Ethereum transactions, successfully inferring 23 different types of invariant from predefined templates. The tool is fully open-sourced, serving as a valuable resource for developers and researchers aiming to study transaction behaviors or extract and validate new invariants from transaction traces. The source code of OpenTracer is available at https://github.com/jeffchen006/OpenTracer.
△ Less
Submitted 13 July, 2024;
originally announced July 2024.
-
Assessing Code Generation with Intermediate Languages
Authors:
Xun Deng,
Sicheng Zhong,
Honghua Dong,
Jingyu Hu,
Sidi Mohamed Beillahi,
Xujie Si,
Fan Long
Abstract:
Intermediate step methodologies like chain of thoughts (COT) have demonstrated effectiveness in enhancing the performance of Large Language Models (LLMs) on code generation. This study explores the utilization of intermediate languages, including various programming languages, natural language solutions, and pseudo-code, and systematically evaluates their impact on the performance of LLMs in code…
▽ More
Intermediate step methodologies like chain of thoughts (COT) have demonstrated effectiveness in enhancing the performance of Large Language Models (LLMs) on code generation. This study explores the utilization of intermediate languages, including various programming languages, natural language solutions, and pseudo-code, and systematically evaluates their impact on the performance of LLMs in code generation tasks. Our experiments encompass eleven models across the CodeLlama, GPT, and Mistral families, as well as newly released smaller models. Our findings reveal that intermediate languages generally exhibit greater efficacy in larger models that have not yet achieved state-of-the-art performance. Natural language consistently emerges as the most effective intermediate representation across all target languages. However, we observe no universally effective intermediate formal language across different models and target languages. Furthermore, we uncover a weak correlation between the correctness of intermediate solutions and final generation, suggesting that improvements may stem from the chain-of-thought effect rather than language-specific transfer. Interestingly, we discover that for GPT family models, prompting multiple times without explicit self-correction instructions yields performance gains across the studied languages.
△ Less
Submitted 7 July, 2024;
originally announced July 2024.
-
PyGWalker: On-the-fly Assistant for Exploratory Visual Data Analysis
Authors:
Yue Yu,
Leixian Shen,
Fei Long,
Huamin Qu,
Hao Chen
Abstract:
Exploratory visual data analysis tools empower data analysts to efficiently and intuitively explore data insights throughout the entire analysis cycle. However, the gap between common programmatic analysis (e.g., within computational notebooks) and exploratory visual analysis leads to a disjointed and inefficient data analysis experience. To bridge this gap, we developed PyGWalker, a Python librar…
▽ More
Exploratory visual data analysis tools empower data analysts to efficiently and intuitively explore data insights throughout the entire analysis cycle. However, the gap between common programmatic analysis (e.g., within computational notebooks) and exploratory visual analysis leads to a disjointed and inefficient data analysis experience. To bridge this gap, we developed PyGWalker, a Python library that offers on-the-fly assistance for exploratory visual data analysis. It features a lightweight and intuitive GUI with a shelf builder modality. Its loosely coupled architecture supports multiple computational environments to accommodate varying data sizes. Since its release in February 2023, PyGWalker has gained much attention, with 612k downloads on PyPI and over 10.5k stars on GitHub as of June 2024. This demonstrates its value to the data science and visualization community, with researchers and developers integrating it into their own applications and studies.
△ Less
Submitted 17 June, 2024;
originally announced June 2024.
-
Unsupervised Multimodal Clustering for Semantics Discovery in Multimodal Utterances
Authors:
Hanlei Zhang,
Hua Xu,
Fei Long,
Xin Wang,
Kai Gao
Abstract:
Discovering the semantics of multimodal utterances is essential for understanding human language and enhancing human-machine interactions. Existing methods manifest limitations in leveraging nonverbal information for discerning complex semantics in unsupervised scenarios. This paper introduces a novel unsupervised multimodal clustering method (UMC), making a pioneering contribution to this field.…
▽ More
Discovering the semantics of multimodal utterances is essential for understanding human language and enhancing human-machine interactions. Existing methods manifest limitations in leveraging nonverbal information for discerning complex semantics in unsupervised scenarios. This paper introduces a novel unsupervised multimodal clustering method (UMC), making a pioneering contribution to this field. UMC introduces a unique approach to constructing augmentation views for multimodal data, which are then used to perform pre-training to establish well-initialized representations for subsequent clustering. An innovative strategy is proposed to dynamically select high-quality samples as guidance for representation learning, gauged by the density of each sample's nearest neighbors. Besides, it is equipped to automatically determine the optimal value for the top-$K$ parameter in each cluster to refine sample selection. Finally, both high- and low-quality samples are used to learn representations conducive to effective clustering. We build baselines on benchmark multimodal intent and dialogue act datasets. UMC shows remarkable improvements of 2-6\% scores in clustering metrics over state-of-the-art methods, marking the first successful endeavor in this domain. The complete code and data are available at https://github.com/thuiar/UMC.
△ Less
Submitted 21 May, 2024;
originally announced May 2024.
-
Demystifying Invariant Effectiveness for Securing Smart Contracts
Authors:
Zhiyang Chen,
Ye Liu,
Sidi Mohamed Beillahi,
Yi Li,
Fan Long
Abstract:
Smart contract transactions associated with security attacks often exhibit distinct behavioral patterns compared with historical benign transactions before the attacking events. While many runtime monitoring and guarding mechanisms have been proposed to validate invariants and stop anomalous transactions on the fly, the empirical effectiveness of the invariants used remains largely unexplored. In…
▽ More
Smart contract transactions associated with security attacks often exhibit distinct behavioral patterns compared with historical benign transactions before the attacking events. While many runtime monitoring and guarding mechanisms have been proposed to validate invariants and stop anomalous transactions on the fly, the empirical effectiveness of the invariants used remains largely unexplored. In this paper, we studied 23 prevalent invariants of 8 categories, which are either deployed in high-profile protocols or endorsed by leading auditing firms and security experts. Using these well-established invariants as templates, we developed a tool Trace2Inv which dynamically generates new invariants customized for a given contract based on its historical transaction data.
We evaluated Trace2Inv on 42 smart contracts that fell victim to 27 distinct exploits on the Ethereum blockchain. Our findings reveal that the most effective invariant guard alone can successfully block 18 of the 27 identified exploits with minimal gas overhead. Our analysis also shows that most of the invariants remain effective even when the experienced attackers attempt to bypass them. Additionally, we studied the possibility of combining multiple invariant guards, resulting in blocking up to 23 of the 27 benchmark exploits and achieving false positive rates as low as 0.32%. Trace2Inv outperforms current state-of-the-art works on smart contract invariant mining and transaction attack detection in terms of both practicality and accuracy. Though Trace2Inv is not primarily designed for transaction attack detection, it surprisingly found two previously unreported exploit transactions, earlier than any reported exploit transactions against the same victim contracts.
△ Less
Submitted 13 July, 2024; v1 submitted 22 April, 2024;
originally announced April 2024.
-
TRIP: Temporal Residual Learning with Image Noise Prior for Image-to-Video Diffusion Models
Authors:
Zhongwei Zhang,
Fuchen Long,
Yingwei Pan,
Zhaofan Qiu,
Ting Yao,
Yang Cao,
Tao Mei
Abstract:
Recent advances in text-to-video generation have demonstrated the utility of powerful diffusion models. Nevertheless, the problem is not trivial when shaping diffusion models to animate static image (i.e., image-to-video generation). The difficulty originates from the aspect that the diffusion process of subsequent animated frames should not only preserve the faithful alignment with the given imag…
▽ More
Recent advances in text-to-video generation have demonstrated the utility of powerful diffusion models. Nevertheless, the problem is not trivial when shaping diffusion models to animate static image (i.e., image-to-video generation). The difficulty originates from the aspect that the diffusion process of subsequent animated frames should not only preserve the faithful alignment with the given image but also pursue temporal coherence among adjacent frames. To alleviate this, we present TRIP, a new recipe of image-to-video diffusion paradigm that pivots on image noise prior derived from static image to jointly trigger inter-frame relational reasoning and ease the coherent temporal modeling via temporal residual learning. Technically, the image noise prior is first attained through one-step backward diffusion process based on both static image and noised video latent codes. Next, TRIP executes a residual-like dual-path scheme for noise prediction: 1) a shortcut path that directly takes image noise prior as the reference noise of each frame to amplify the alignment between the first frame and subsequent frames; 2) a residual path that employs 3D-UNet over noised video and static image latent codes to enable inter-frame relational reasoning, thereby easing the learning of the residual noise for each frame. Furthermore, both reference and residual noise of each frame are dynamically merged via attention mechanism for final video generation. Extensive experiments on WebVid-10M, DTDB and MSR-VTT datasets demonstrate the effectiveness of our TRIP for image-to-video generation. Please see our project page at https://trip-i2v.github.io/TRIP/.
△ Less
Submitted 25 March, 2024;
originally announced March 2024.
-
Learning Spatial Adaptation and Temporal Coherence in Diffusion Models for Video Super-Resolution
Authors:
Zhikai Chen,
Fuchen Long,
Zhaofan Qiu,
Ting Yao,
Wengang Zhou,
Jiebo Luo,
Tao Mei
Abstract:
Diffusion models are just at a tipping point for image super-resolution task. Nevertheless, it is not trivial to capitalize on diffusion models for video super-resolution which necessitates not only the preservation of visual appearance from low-resolution to high-resolution videos, but also the temporal consistency across video frames. In this paper, we propose a novel approach, pursuing Spatial…
▽ More
Diffusion models are just at a tipping point for image super-resolution task. Nevertheless, it is not trivial to capitalize on diffusion models for video super-resolution which necessitates not only the preservation of visual appearance from low-resolution to high-resolution videos, but also the temporal consistency across video frames. In this paper, we propose a novel approach, pursuing Spatial Adaptation and Temporal Coherence (SATeCo), for video super-resolution. SATeCo pivots on learning spatial-temporal guidance from low-resolution videos to calibrate both latent-space high-resolution video denoising and pixel-space video reconstruction. Technically, SATeCo freezes all the parameters of the pre-trained UNet and VAE, and only optimizes two deliberately-designed spatial feature adaptation (SFA) and temporal feature alignment (TFA) modules, in the decoder of UNet and VAE. SFA modulates frame features via adaptively estimating affine parameters for each pixel, guaranteeing pixel-wise guidance for high-resolution frame synthesis. TFA delves into feature interaction within a 3D local window (tubelet) through self-attention, and executes cross-attention between tubelet and its low-resolution counterpart to guide temporal feature alignment. Extensive experiments conducted on the REDS4 and Vid4 datasets demonstrate the effectiveness of our approach.
△ Less
Submitted 25 March, 2024;
originally announced March 2024.
-
Safeguarding DeFi Smart Contracts against Oracle Deviations
Authors:
Xun Deng,
Sidi Mohamed Beillahi,
Cyrus Minwalla,
Han Du,
Andreas Veneris,
Fan Long
Abstract:
This paper presents OVer, a framework designed to automatically analyze the behavior of decentralized finance (DeFi) protocols when subjected to a "skewed" oracle input. OVer firstly performs symbolic analysis on the given contract and constructs a model of constraints. Then, the framework leverages an SMT solver to identify parameters that allow its secure operation. Furthermore, guard statements…
▽ More
This paper presents OVer, a framework designed to automatically analyze the behavior of decentralized finance (DeFi) protocols when subjected to a "skewed" oracle input. OVer firstly performs symbolic analysis on the given contract and constructs a model of constraints. Then, the framework leverages an SMT solver to identify parameters that allow its secure operation. Furthermore, guard statements may be generated for smart contracts that may use the oracle values, thus effectively preventing oracle manipulation attacks. Empirical results show that OVer can successfully analyze all 10 benchmarks collected, which encompass a diverse range of DeFi protocols. Additionally, this paper also illustrates that current parameters utilized in the majority of benchmarks are inadequate to ensure safety when confronted with significant oracle deviations.
△ Less
Submitted 11 January, 2024;
originally announced January 2024.
-
VideoStudio: Generating Consistent-Content and Multi-Scene Videos
Authors:
Fuchen Long,
Zhaofan Qiu,
Ting Yao,
Tao Mei
Abstract:
The recent innovations and breakthroughs in diffusion models have significantly expanded the possibilities of generating high-quality videos for the given prompts. Most existing works tackle the single-scene scenario with only one video event occurring in a single background. Extending to generate multi-scene videos nevertheless is not trivial and necessitates to nicely manage the logic in between…
▽ More
The recent innovations and breakthroughs in diffusion models have significantly expanded the possibilities of generating high-quality videos for the given prompts. Most existing works tackle the single-scene scenario with only one video event occurring in a single background. Extending to generate multi-scene videos nevertheless is not trivial and necessitates to nicely manage the logic in between while preserving the consistent visual appearance of key content across video scenes. In this paper, we propose a novel framework, namely VideoStudio, for consistent-content and multi-scene video generation. Technically, VideoStudio leverages Large Language Models (LLM) to convert the input prompt into comprehensive multi-scene script that benefits from the logical knowledge learnt by LLM. The script for each scene includes a prompt describing the event, the foreground/background entities, as well as camera movement. VideoStudio identifies the common entities throughout the script and asks LLM to detail each entity. The resultant entity description is then fed into a text-to-image model to generate a reference image for each entity. Finally, VideoStudio outputs a multi-scene video by generating each scene video via a diffusion process that takes the reference images, the descriptive prompt of the event and camera movement into account. The diffusion model incorporates the reference images as the condition and alignment to strengthen the content consistency of multi-scene videos. Extensive experiments demonstrate that VideoStudio outperforms the SOTA video generation models in terms of visual quality, content consistency, and user preference. Source code is available at \url{https://github.com/FuchenUSTC/VideoStudio}.
△ Less
Submitted 16 September, 2024; v1 submitted 2 January, 2024;
originally announced January 2024.
-
Towards Better Orthogonality Regularization with Disentangled Norm in Training Deep CNNs
Authors:
Changhao Wu,
Shenan Zhang,
Fangsong Long,
Ziliang Yin,
Tuo Leng
Abstract:
Orthogonality regularization has been developed to prevent deep CNNs from training instability and feature redundancy. Among existing proposals, kernel orthogonality regularization enforces orthogonality by minimizing the residual between the Gram matrix formed by convolutional filters and the orthogonality matrix.
We propose a novel measure for achieving better orthogonality among filters, whic…
▽ More
Orthogonality regularization has been developed to prevent deep CNNs from training instability and feature redundancy. Among existing proposals, kernel orthogonality regularization enforces orthogonality by minimizing the residual between the Gram matrix formed by convolutional filters and the orthogonality matrix.
We propose a novel measure for achieving better orthogonality among filters, which disentangles diagonal and correlation information from the residual. The model equipped with the measure under the principle of imposing strict orthogonality between filters surpasses previous regularization methods in near-orthogonality. Moreover, we observe the benefits of improved strict filter orthogonality in relatively shallow models, but as model depth increases, the performance gains in models employing strict kernel orthogonality decrease sharply.
Furthermore, based on the observation of the potential conflict between strict kernel orthogonality and growing model capacity, we propose a relaxation theory on kernel orthogonality regularization. The relaxed kernel orthogonality achieves enhanced performance on models with increased capacity, shedding light on the burden of strict kernel orthogonality on deep model performance.
We conduct extensive experiments with our kernel orthogonality regularization toolkit on ResNet and WideResNet in CIFAR-10 and CIFAR-100. We observe state-of-the-art gains in model performance from the toolkit, which includes both strict orthogonality and relaxed orthogonality regularization, and obtain more robust models with expressive features. These experiments demonstrate the efficacy of our toolkit and subtly provide insights into the often overlooked challenges posed by strict orthogonality, addressing the burden of strict orthogonality on capacity-rich models.
△ Less
Submitted 16 June, 2023;
originally announced June 2023.
-
Challenges of ELA-guided Function Evolution using Genetic Programming
Authors:
Fu Xing Long,
Diederick Vermetten,
Anna V. Kononova,
Roman Kalkreuth,
Kaifeng Yang,
Thomas Bäck,
Niki van Stein
Abstract:
Within the optimization community, the question of how to generate new optimization problems has been gaining traction in recent years. Within topics such as instance space analysis (ISA), the generation of new problems can provide new benchmarks which are not yet explored in existing research. Beyond that, this function generation can also be exploited for solving complex real-world optimization…
▽ More
Within the optimization community, the question of how to generate new optimization problems has been gaining traction in recent years. Within topics such as instance space analysis (ISA), the generation of new problems can provide new benchmarks which are not yet explored in existing research. Beyond that, this function generation can also be exploited for solving complex real-world optimization problems. By generating functions with similar properties to the target problem, we can create a robust test set for algorithm selection and configuration.
However, the generation of functions with specific target properties remains challenging. While features exist to capture low-level landscape properties, they might not always capture the intended high-level features. We show that a genetic programming (GP) approach guided by these exploratory landscape analysis (ELA) properties is not always able to find satisfying functions. Our results suggest that careful considerations of the weighting of landscape properties, as well as the distance measure used, might be required to evolve functions that are sufficiently representative to the target landscape.
△ Less
Submitted 24 May, 2023;
originally announced May 2023.
-
A Clustering Framework for Unsupervised and Semi-supervised New Intent Discovery
Authors:
Hanlei Zhang,
Hua Xu,
Xin Wang,
Fei Long,
Kai Gao
Abstract:
New intent discovery is of great value to natural language processing, allowing for a better understanding of user needs and providing friendly services. However, most existing methods struggle to capture the complicated semantics of discrete text representations when limited or no prior knowledge of labeled data is available. To tackle this problem, we propose a novel clustering framework, USNID,…
▽ More
New intent discovery is of great value to natural language processing, allowing for a better understanding of user needs and providing friendly services. However, most existing methods struggle to capture the complicated semantics of discrete text representations when limited or no prior knowledge of labeled data is available. To tackle this problem, we propose a novel clustering framework, USNID, for unsupervised and semi-supervised new intent discovery, which has three key technologies. First, it fully utilizes unsupervised or semi-supervised data to mine shallow semantic similarity relations and provide well-initialized representations for clustering. Second, it designs a centroid-guided clustering mechanism to address the issue of cluster allocation inconsistency and provide high-quality self-supervised targets for representation learning. Third, it captures high-level semantics in unsupervised or semi-supervised data to discover fine-grained intent-wise clusters by optimizing both cluster-level and instance-level objectives. We also propose an effective method for estimating the cluster number in open-world scenarios without knowing the number of new intents beforehand. USNID performs exceptionally well on several benchmark intent datasets, achieving new state-of-the-art results in unsupervised and semi-supervised new intent discovery and demonstrating robust performance with different cluster numbers.
△ Less
Submitted 12 December, 2023; v1 submitted 16 April, 2023;
originally announced April 2023.
-
DoE2Vec: Deep-learning Based Features for Exploratory Landscape Analysis
Authors:
Bas van Stein,
Fu Xing Long,
Moritz Frenzel,
Peter Krause,
Markus Gitterle,
Thomas Bäck
Abstract:
We propose DoE2Vec, a variational autoencoder (VAE)-based methodology to learn optimization landscape characteristics for downstream meta-learning tasks, e.g., automated selection of optimization algorithms. Principally, using large training data sets generated with a random function generator, DoE2Vec self-learns an informative latent representation for any design of experiments (DoE). Unlike the…
▽ More
We propose DoE2Vec, a variational autoencoder (VAE)-based methodology to learn optimization landscape characteristics for downstream meta-learning tasks, e.g., automated selection of optimization algorithms. Principally, using large training data sets generated with a random function generator, DoE2Vec self-learns an informative latent representation for any design of experiments (DoE). Unlike the classical exploratory landscape analysis (ELA) method, our approach does not require any feature engineering and is easily applicable for high dimensional search spaces. For validation, we inspect the quality of latent reconstructions and analyze the latent representations using different experiments. The latent representations not only show promising potentials in identifying similar (cheap-to-evaluate) surrogate functions, but also can significantly boost performances when being used complementary to the classical ELA features in classification tasks.
△ Less
Submitted 31 March, 2023;
originally announced April 2023.
-
BBOB Instance Analysis: Landscape Properties and Algorithm Performance across Problem Instances
Authors:
Fu Xing Long,
Diederick Vermetten,
Bas van Stein,
Anna V. Kononova
Abstract:
Benchmarking is a key aspect of research into optimization algorithms, and as such the way in which the most popular benchmark suites are designed implicitly guides some parts of algorithm design. One of these suites is the black-box optimization benchmarking (BBOB) suite of 24 single-objective noiseless functions, which has been a standard for over a decade. Within this problem suite, different i…
▽ More
Benchmarking is a key aspect of research into optimization algorithms, and as such the way in which the most popular benchmark suites are designed implicitly guides some parts of algorithm design. One of these suites is the black-box optimization benchmarking (BBOB) suite of 24 single-objective noiseless functions, which has been a standard for over a decade. Within this problem suite, different instances of a single problem can be created, which is beneficial for testing the stability and invariance of algorithms under transformations. In this paper, we investigate the BBOB instance creation protocol by considering a set of 500 instances for each BBOB problem. Using exploratory landscape analysis, we show that the distribution of landscape features across BBOB instances is highly diverse for a large set of problems. In addition, we run a set of eight algorithms across these 500 instances, and investigate for which cases statistically significant differences in performance occur. We argue that, while the transformations applied in BBOB instances do indeed seem to preserve the high-level properties of the functions, their difference in practice should not be overlooked, particularly when treating the problems as box-constrained instead of unconstrained.
△ Less
Submitted 29 November, 2022;
originally announced November 2022.
-
Dynamic Temporal Filtering in Video Models
Authors:
Fuchen Long,
Zhaofan Qiu,
Yingwei Pan,
Ting Yao,
Chong-Wah Ngo,
Tao Mei
Abstract:
Video temporal dynamics is conventionally modeled with 3D spatial-temporal kernel or its factorized version comprised of 2D spatial kernel and 1D temporal kernel. The modeling power, nevertheless, is limited by the fixed window size and static weights of a kernel along the temporal dimension. The pre-determined kernel size severely limits the temporal receptive fields and the fixed weights treat e…
▽ More
Video temporal dynamics is conventionally modeled with 3D spatial-temporal kernel or its factorized version comprised of 2D spatial kernel and 1D temporal kernel. The modeling power, nevertheless, is limited by the fixed window size and static weights of a kernel along the temporal dimension. The pre-determined kernel size severely limits the temporal receptive fields and the fixed weights treat each spatial location across frames equally, resulting in sub-optimal solution for long-range temporal modeling in natural scenes. In this paper, we present a new recipe of temporal feature learning, namely Dynamic Temporal Filter (DTF), that novelly performs spatial-aware temporal modeling in frequency domain with large temporal receptive field. Specifically, DTF dynamically learns a specialized frequency filter for every spatial location to model its long-range temporal dynamics. Meanwhile, the temporal feature of each spatial location is also transformed into frequency feature spectrum via 1D Fast Fourier Transform (FFT). The spectrum is modulated by the learnt frequency filter, and then transformed back to temporal domain with inverse FFT. In addition, to facilitate the learning of frequency filter in DTF, we perform frame-wise aggregation to enhance the primary temporal feature with its temporal neighbors by inter-frame correlation. It is feasible to plug DTF block into ConvNets and Transformer, yielding DTF-Net and DTF-Transformer. Extensive experiments conducted on three datasets demonstrate the superiority of our proposals. More remarkably, DTF-Transformer achieves an accuracy of 83.5% on Kinetics-400 dataset. Source code is available at \url{https://github.com/FuchenUSTC/DTF}.
△ Less
Submitted 15 November, 2022;
originally announced November 2022.
-
FlashSyn: Flash Loan Attack Synthesis via Counter Example Driven Approximation
Authors:
Zhiyang Chen,
Sidi Mohamed Beillahi,
Fan Long
Abstract:
In decentralized finance (DeFi), lenders can offer flash loans to borrowers, i.e., loans that are only valid within a blockchain transaction and must be repaid with fees by the end of that transaction. Unlike normal loans, flash loans allow borrowers to borrow large assets without upfront collaterals deposits. Malicious adversaries use flash loans to gather large assets to exploit vulnerable DeFi…
▽ More
In decentralized finance (DeFi), lenders can offer flash loans to borrowers, i.e., loans that are only valid within a blockchain transaction and must be repaid with fees by the end of that transaction. Unlike normal loans, flash loans allow borrowers to borrow large assets without upfront collaterals deposits. Malicious adversaries use flash loans to gather large assets to exploit vulnerable DeFi protocols. In this paper, we introduce a new framework for automated synthesis of adversarial transactions that exploit DeFi protocols using flash loans. To bypass the complexity of a DeFi protocol, we propose a new technique to approximate the DeFi protocol functional behaviors using numerical methods (polynomial linear regression and nearest-neighbor interpolation). We then construct an optimization query using the approximated functions of the DeFi protocol to find an adversarial attack constituted of a sequence of functions invocations with optimal parameters that gives the maximum profit. To improve the accuracy of the approximation, we propose a novel counterexample driven approximation refinement technique. We implement our framework in a tool named FlashSyn. We evaluate FlashSyn on 16 DeFi protocols that were victims to flash loan attacks and 2 DeFi protocols from Damn Vulnerable DeFi challenges. FlashSyn automatically synthesizes an adversarial attack for 16 of the 18 benchmarks. Among the 16 successful cases, FlashSyn identifies attack vectors yielding higher profits than those employed by historical hackers in 3 cases, and also discovers multiple distinct attack vectors in 10 cases, demonstrating its effectiveness in finding possible flash loan attacks.
△ Less
Submitted 12 January, 2024; v1 submitted 21 June, 2022;
originally announced June 2022.
-
Bi-Calibration Networks for Weakly-Supervised Video Representation Learning
Authors:
Fuchen Long,
Ting Yao,
Zhaofan Qiu,
Xinmei Tian,
Jiebo Luo,
Tao Mei
Abstract:
The leverage of large volumes of web videos paired with the searched queries or surrounding texts (e.g., title) offers an economic and extensible alternative to supervised video representation learning. Nevertheless, modeling such weakly visual-textual connection is not trivial due to query polysemy (i.e., many possible meanings for a query) and text isomorphism (i.e., same syntactic structure of…
▽ More
The leverage of large volumes of web videos paired with the searched queries or surrounding texts (e.g., title) offers an economic and extensible alternative to supervised video representation learning. Nevertheless, modeling such weakly visual-textual connection is not trivial due to query polysemy (i.e., many possible meanings for a query) and text isomorphism (i.e., same syntactic structure of different text). In this paper, we introduce a new design of mutual calibration between query and text to boost weakly-supervised video representation learning. Specifically, we present Bi-Calibration Networks (BCN) that novelly couples two calibrations to learn the amendment from text to query and vice versa. Technically, BCN executes clustering on all the titles of the videos searched by an identical query and takes the centroid of each cluster as a text prototype. The query vocabulary is built directly on query words. The video-to-text/video-to-query projections over text prototypes/query vocabulary then start the text-to-query or query-to-text calibration to estimate the amendment to query or text. We also devise a selection scheme to balance the two corrections. Two large-scale web video datasets paired with query and title for each video are newly collected for weakly-supervised video representation learning, which are named as YOVO-3M and YOVO-10M, respectively. The video features of BCN learnt on 3M web videos obtain superior results under linear model protocol on downstream tasks. More remarkably, BCN trained on the larger set of 10M web videos with further fine-tuning leads to 1.6%, and 1.8% gains in top-1 accuracy on Kinetics-400, and Something-Something V2 datasets over the state-of-the-art TDN, and ACTION-Net methods with ImageNet pre-training. Source code and datasets are available at \url{https://github.com/FuchenUSTC/BCN}.
△ Less
Submitted 21 June, 2022;
originally announced June 2022.
-
Stand-Alone Inter-Frame Attention in Video Models
Authors:
Fuchen Long,
Zhaofan Qiu,
Yingwei Pan,
Ting Yao,
Jiebo Luo,
Tao Mei
Abstract:
Motion, as the uniqueness of a video, has been critical to the development of video understanding models. Modern deep learning models leverage motion by either executing spatio-temporal 3D convolutions, factorizing 3D convolutions into spatial and temporal convolutions separately, or computing self-attention along temporal dimension. The implicit assumption behind such successes is that the featur…
▽ More
Motion, as the uniqueness of a video, has been critical to the development of video understanding models. Modern deep learning models leverage motion by either executing spatio-temporal 3D convolutions, factorizing 3D convolutions into spatial and temporal convolutions separately, or computing self-attention along temporal dimension. The implicit assumption behind such successes is that the feature maps across consecutive frames can be nicely aggregated. Nevertheless, the assumption may not always hold especially for the regions with large deformation. In this paper, we present a new recipe of inter-frame attention block, namely Stand-alone Inter-Frame Attention (SIFA), that novelly delves into the deformation across frames to estimate local self-attention on each spatial location. Technically, SIFA remoulds the deformable design via re-scaling the offset predictions by the difference between two frames. Taking each spatial location in the current frame as the query, the locally deformable neighbors in the next frame are regarded as the keys/values. Then, SIFA measures the similarity between query and keys as stand-alone attention to weighted average the values for temporal aggregation. We further plug SIFA block into ConvNets and Vision Transformer, respectively, to devise SIFA-Net and SIFA-Transformer. Extensive experiments conducted on four video datasets demonstrate the superiority of SIFA-Net and SIFA-Transformer as stronger backbones. More remarkably, SIFA-Transformer achieves an accuracy of 83.1% on Kinetics-400 dataset. Source code is available at \url{https://github.com/FuchenUSTC/SIFA}.
△ Less
Submitted 14 June, 2022;
originally announced June 2022.
-
Silver-Bullet-3D at ManiSkill 2021: Learning-from-Demonstrations and Heuristic Rule-based Methods for Object Manipulation
Authors:
Yingwei Pan,
Yehao Li,
Yiheng Zhang,
Qi Cai,
Fuchen Long,
Zhaofan Qiu,
Ting Yao,
Tao Mei
Abstract:
This paper presents an overview and comparative analysis of our systems designed for the following two tracks in SAPIEN ManiSkill Challenge 2021:
No Interaction Track: The No Interaction track targets for learning policies from pre-collected demonstration trajectories. We investigate both imitation learning-based approach, i.e., imitating the observed behavior using classical supervised learning…
▽ More
This paper presents an overview and comparative analysis of our systems designed for the following two tracks in SAPIEN ManiSkill Challenge 2021:
No Interaction Track: The No Interaction track targets for learning policies from pre-collected demonstration trajectories. We investigate both imitation learning-based approach, i.e., imitating the observed behavior using classical supervised learning techniques, and offline reinforcement learning-based approaches, for this track. Moreover, the geometry and texture structures of objects and robotic arms are exploited via Transformer-based networks to facilitate imitation learning.
No Restriction Track: In this track, we design a Heuristic Rule-based Method (HRM) to trigger high-quality object manipulation by decomposing the task into a series of sub-tasks. For each sub-task, the simple rule-based controlling strategies are adopted to predict actions that can be applied to robotic arms.
To ease the implementations of our systems, all the source codes and pre-trained models are available at \url{https://github.com/caiqi/Silver-Bullet-3D/}.
△ Less
Submitted 13 June, 2022;
originally announced June 2022.
-
Joint Distribution Matters: Deep Brownian Distance Covariance for Few-Shot Classification
Authors:
Jiangtao Xie,
Fei Long,
Jiaming Lv,
Qilong Wang,
Peihua Li
Abstract:
Few-shot classification is a challenging problem as only very few training examples are given for each new task. One of the effective research lines to address this challenge focuses on learning deep representations driven by a similarity measure between a query image and few support images of some class. Statistically, this amounts to measure the dependency of image features, viewed as random vec…
▽ More
Few-shot classification is a challenging problem as only very few training examples are given for each new task. One of the effective research lines to address this challenge focuses on learning deep representations driven by a similarity measure between a query image and few support images of some class. Statistically, this amounts to measure the dependency of image features, viewed as random vectors in a high-dimensional embedding space. Previous methods either only use marginal distributions without considering joint distributions, suffering from limited representation capability, or are computationally expensive though harnessing joint distributions. In this paper, we propose a deep Brownian Distance Covariance (DeepBDC) method for few-shot classification. The central idea of DeepBDC is to learn image representations by measuring the discrepancy between joint characteristic functions of embedded features and product of the marginals. As the BDC metric is decoupled, we formulate it as a highly modular and efficient layer. Furthermore, we instantiate DeepBDC in two different few-shot classification frameworks. We make experiments on six standard few-shot image benchmarks, covering general object recognition, fine-grained categorization and cross-domain classification. Extensive evaluations show our DeepBDC significantly outperforms the counterparts, while establishing new state-of-the-art results. The source code is available at http://www.peihuali.org/DeepBDC
△ Less
Submitted 9 April, 2022;
originally announced April 2022.
-
Joint CNN and Transformer Network via weakly supervised Learning for efficient crowd counting
Authors:
Fusen Wang,
Kai Liu,
Fei Long,
Nong Sang,
Xiaofeng Xia,
Jun Sang
Abstract:
Currently, for crowd counting, the fully supervised methods via density map estimation are the mainstream research directions. However, such methods need location-level annotation of persons in an image, which is time-consuming and laborious. Therefore, the weakly supervised method just relying upon the count-level annotation is urgently needed. Since CNN is not suitable for modeling the global co…
▽ More
Currently, for crowd counting, the fully supervised methods via density map estimation are the mainstream research directions. However, such methods need location-level annotation of persons in an image, which is time-consuming and laborious. Therefore, the weakly supervised method just relying upon the count-level annotation is urgently needed. Since CNN is not suitable for modeling the global context and the interactions between image patches, crowd counting with weakly supervised learning via CNN generally can not show good performance. The weakly supervised model via Transformer was sequentially proposed to model the global context and learn contrast features. However, the transformer directly partitions the crowd images into a series of tokens, which may not be a good choice due to each pedestrian being an independent individual, and the parameter number of the network is very large. Hence, we propose a Joint CNN and Transformer Network (JCTNet) via weakly supervised learning for crowd counting in this paper. JCTNet consists of three parts: CNN feature extraction module (CFM), Transformer feature extraction module (TFM), and counting regression module (CRM). In particular, the CFM extracts crowd semantic information features, then sends their patch partitions to TRM for modeling global context, and CRM is used to predict the number of people. Extensive experiments and visualizations demonstrate that JCTNet can effectively focus on the crowd regions and obtain superior weakly supervised counting performance on five mainstream datasets. The number of parameters of the model can be reduced by about 67%~73% compared with the pure Transformer works. We also tried to explain the phenomenon that a model constrained only by count-level annotations can still focus on the crowd regions. We believe our work can promote further research in this field.
△ Less
Submitted 12 March, 2022;
originally announced March 2022.
-
Utilizing Parallelism in Smart Contracts on Decentralized Blockchains by Taming Application-Inherent Conflicts
Authors:
Péter Garamvölgyi,
Yuxi Liu,
Dong Zhou,
Fan Long,
Ming Wu
Abstract:
Traditional public blockchain systems typically had very limited transaction throughput because of the bottleneck of the consensus protocol itself. With recent advances in consensus technology, the performance limit has been greatly lifted, typically to thousands of transactions per second. With this, transaction execution has become a new performance bottleneck. Exploiting parallelism in transact…
▽ More
Traditional public blockchain systems typically had very limited transaction throughput because of the bottleneck of the consensus protocol itself. With recent advances in consensus technology, the performance limit has been greatly lifted, typically to thousands of transactions per second. With this, transaction execution has become a new performance bottleneck. Exploiting parallelism in transaction execution is a clear and direct way to address this and to further increase transaction throughput. Although some recent literature introduced concurrency control mechanisms to execute smart contract transactions in parallel, the reported speedup that they can achieve is far from ideal. The main reason is that the proposed parallel execution mechanisms cannot effectively deal with the conflicts inherent in many blockchain applications.
In this work, we thoroughly study the historical transaction execution traces in Ethereum. We observe that application-inherent conflicts are the major factors that limit the exploitable parallelism during execution. We propose to use partitioned counters and special commutative instructions to break up the application conflict chains in order to maximize the potential speedup. When we evaluated the maximum parallel speedup achievable, these techniques doubled this limit to an 18x overall speedup compared to serial execution, thus approaching the optimum. We also propose OCC-DA, an optimistic concurrency control scheduler with deterministic aborts, which makes it possible to use OCC scheduling in public blockchain settings.
△ Less
Submitted 10 February, 2022; v1 submitted 10 January, 2022;
originally announced January 2022.
-
SigVM: Enabling Event-Driven Execution for Autonomous Smart Contracts
Authors:
Zihan Zhao,
Sidi Mohamed Beillahi,
Ryan Song,
Yuxi Cai,
Andreas Veneris,
Fan Long
Abstract:
This paper presents SigVM, a novel blockchain virtual machine that supports an event-driven execution model, enabling developers to build autonomous smart contracts. Contracts in SigVM can emit signal events, on which other contracts can listen. Once an event is triggered, corresponding handler functions are automatically executed as signal transactions. We build an end-to-end blockchain platform…
▽ More
This paper presents SigVM, a novel blockchain virtual machine that supports an event-driven execution model, enabling developers to build autonomous smart contracts. Contracts in SigVM can emit signal events, on which other contracts can listen. Once an event is triggered, corresponding handler functions are automatically executed as signal transactions. We build an end-to-end blockchain platform SigChain and a contract language compiler SigSolid to realize the potential of SigVM. Experimental results show that our benchmark applications can be reimplemented with SigVM in an autonomous way, eliminating the dependency on unreliable mechanisms like off-chain relay servers. The development effort of reimplementing these contracts with SigVM is small, i.e., we modified on average 2.6% of the contract code.
△ Less
Submitted 17 November, 2021; v1 submitted 22 February, 2021;
originally announced February 2021.
-
Learning to Localize Actions from Moments
Authors:
Fuchen Long,
Ting Yao,
Zhaofan Qiu,
Xinmei Tian,
Jiebo Luo,
Tao Mei
Abstract:
With the knowledge of action moments (i.e., trimmed video clips that each contains an action instance), humans could routinely localize an action temporally in an untrimmed video. Nevertheless, most practical methods still require all training videos to be labeled with temporal annotations (action category and temporal boundary) and develop the models in a fully-supervised manner, despite expensiv…
▽ More
With the knowledge of action moments (i.e., trimmed video clips that each contains an action instance), humans could routinely localize an action temporally in an untrimmed video. Nevertheless, most practical methods still require all training videos to be labeled with temporal annotations (action category and temporal boundary) and develop the models in a fully-supervised manner, despite expensive labeling efforts and inapplicable to new categories. In this paper, we introduce a new design of transfer learning type to learn action localization for a large set of action categories, but only on action moments from the categories of interest and temporal annotations of untrimmed videos from a small set of action classes. Specifically, we present Action Herald Networks (AherNet) that integrate such design into an one-stage action localization framework. Technically, a weight transfer function is uniquely devised to build the transformation between classification of action moments or foreground video segments and action localization in synthetic contextual moments or untrimmed videos. The context of each moment is learnt through the adversarial mechanism to differentiate the generated features from those of background in untrimmed videos. Extensive experiments are conducted on the learning both across the splits of ActivityNet v1.3 and from THUMOS14 to ActivityNet v1.3. Our AherNet demonstrates the superiority even comparing to most fully-supervised action localization methods. More remarkably, we train AherNet to localize actions from 600 categories on the leverage of action moments in Kinetics-600 and temporal annotations from 200 classes in ActivityNet v1.3. Source code and data are available at \url{https://github.com/FuchenUSTC/AherNet}.
△ Less
Submitted 31 August, 2020;
originally announced August 2020.
-
Automatic Horizontal Fusion for GPU Kernels
Authors:
Ao Li,
Bojian Zheng,
Gennady Pekhimenko,
Fan Long
Abstract:
We present automatic horizontal fusion, a novel optimization technique that complements the standard kernel fusion techniques for GPU programs. Unlike the standard fusion, whose goal is to eliminate intermediate data round trips, our horizontal fusion technique aims to increase the thread-level parallelism to hide instruction latencies. We also present HFuse, a new source to source CUDA compiler t…
▽ More
We present automatic horizontal fusion, a novel optimization technique that complements the standard kernel fusion techniques for GPU programs. Unlike the standard fusion, whose goal is to eliminate intermediate data round trips, our horizontal fusion technique aims to increase the thread-level parallelism to hide instruction latencies. We also present HFuse, a new source to source CUDA compiler that implements automatic horizontal fusion. Our experimental results show that horizontal fusion can speed up the running time by 2.5%-60.8%. Our results reveal that the horizontal fusion is especially beneficial for fusing kernels with instructions that require different kinds of GPU resources (e.g., a memory-intensive kernel and a compute-intensive kernel).
△ Less
Submitted 2 July, 2020;
originally announced July 2020.
-
GHAST: Breaking Confirmation Delay Barrier in Nakamoto Consensus via Adaptive Weighted Blocks
Authors:
Chenxing Li,
Fan Long,
Guang Yang
Abstract:
Initiated from Nakamoto's Bitcoin system, blockchain technology has demonstrated great capability of building secure consensus among decentralized parties at Internet-scale, i.e., without relying on any centralized trusted party. Nowadays, blockchain systems find applications in various fields. But the performance is increasingly becoming a bottleneck, especially when permissionless participation…
▽ More
Initiated from Nakamoto's Bitcoin system, blockchain technology has demonstrated great capability of building secure consensus among decentralized parties at Internet-scale, i.e., without relying on any centralized trusted party. Nowadays, blockchain systems find applications in various fields. But the performance is increasingly becoming a bottleneck, especially when permissionless participation is retained for full decentralization.
In this work, we present a new consensus protocol named GHAST (Greedy Heaviest Adaptive Sub-Tree) which organizes blocks in a Tree-Graph structure (i.e., a directed acyclic graph (DAG) with a tree embedded) that allows fast and concurrent block generation. GHAST protocol simultaneously achieves a logarithmically bounded liveness guarantee and low confirmation latency. More specifically, for maximum latency $d$ and adversarial computing power bounded away from 50\%, GHAST guarantees confirmation with confidence $\ge 1-\varepsilon$ after a time period of $O(d\cdot \log(1/\varepsilon))$. When there is no observable attack, GHAST only needs $3d$ time to achieve confirmation at the same confidence level as six-block-confirmation in Bitcoin, while it takes roughly $360d$ in Bitcoin.
△ Less
Submitted 1 June, 2020;
originally announced June 2020.
-
Causal Estimation of Stay-at-Home Orders on SARS-CoV-2 Transmission
Authors:
M. Keith Chen,
Yilin Zhuo,
Malena de la Fuente,
Ryne Rohla,
Elisa F. Long
Abstract:
Accurately estimating the effectiveness of stay-at-home orders (SHOs) on reducing social contact and disease spread is crucial for mitigating pandemics. Leveraging individual-level location data for 10 million smartphones, we observe that by April 30th---when nine in ten Americans were under a SHO---daily movement had fallen 70% from pre-COVID levels. One-quarter of this decline is causally attrib…
▽ More
Accurately estimating the effectiveness of stay-at-home orders (SHOs) on reducing social contact and disease spread is crucial for mitigating pandemics. Leveraging individual-level location data for 10 million smartphones, we observe that by April 30th---when nine in ten Americans were under a SHO---daily movement had fallen 70% from pre-COVID levels. One-quarter of this decline is causally attributable to SHOs, with wide demographic differences in compliance, most notably by political affiliation. Likely Trump voters reduce movement by 9% following a local SHO, compared to a 21% reduction among their Clinton-voting neighbors, who face similar exposure risks and identical government orders. Linking social distancing behavior with an epidemic model, we estimate that reductions in movement have causally reduced SARS-CoV-2 transmission rates by 49%.
△ Less
Submitted 11 May, 2020;
originally announced May 2020.
-
Securing Smart Contract On The Fly
Authors:
Ao Li,
Jemin Andrew Choi,
Fan Long
Abstract:
We present Solythesis, a source to source Solidity compiler which takes a smart contract code and a user specified invariant as the input and produces an instrumented contract that rejects all transactions that violate the invariant. The design of Solythesis is driven by our observation that the consensus protocol and the storage layer are the primary and the secondary performance bottlenecks of E…
▽ More
We present Solythesis, a source to source Solidity compiler which takes a smart contract code and a user specified invariant as the input and produces an instrumented contract that rejects all transactions that violate the invariant. The design of Solythesis is driven by our observation that the consensus protocol and the storage layer are the primary and the secondary performance bottlenecks of Ethereum, respectively. Solythesis operates with our novel delta update and delta check techniques to minimize the overhead caused by the instrumented storage access statements. Our experimental results validate our hypothesis that the overhead of runtime validation, which is often too expensive for other domains, is in fact negligible for smart contracts. The CPU overhead of Solythesis is only 0.12% on average for our 23 benchmark contracts.
△ Less
Submitted 28 November, 2019;
originally announced November 2019.
-
Gaussian Temporal Awareness Networks for Action Localization
Authors:
Fuchen Long,
Ting Yao,
Zhaofan Qiu,
Xinmei Tian,
Jiebo Luo,
Tao Mei
Abstract:
Temporally localizing actions in a video is a fundamental challenge in video understanding. Most existing approaches have often drawn inspiration from image object detection and extended the advances, e.g., SSD and Faster R-CNN, to produce temporal locations of an action in a 1D sequence. Nevertheless, the results can suffer from robustness problem due to the design of predetermined temporal scale…
▽ More
Temporally localizing actions in a video is a fundamental challenge in video understanding. Most existing approaches have often drawn inspiration from image object detection and extended the advances, e.g., SSD and Faster R-CNN, to produce temporal locations of an action in a 1D sequence. Nevertheless, the results can suffer from robustness problem due to the design of predetermined temporal scales, which overlooks the temporal structure of an action and limits the utility on detecting actions with complex variations. In this paper, we propose to address the problem by introducing Gaussian kernels to dynamically optimize temporal scale of each action proposal. Specifically, we present Gaussian Temporal Awareness Networks (GTAN) --- a new architecture that novelly integrates the exploitation of temporal structure into an one-stage action localization framework. Technically, GTAN models the temporal structure through learning a set of Gaussian kernels, each for a cell in the feature maps. Each Gaussian kernel corresponds to a particular interval of an action proposal and a mixture of Gaussian kernels could further characterize action proposals with various length. Moreover, the values in each Gaussian curve reflect the contextual contributions to the localization of an action proposal. Extensive experiments are conducted on both THUMOS14 and ActivityNet v1.3 datasets, and superior results are reported when comparing to state-of-the-art approaches. More remarkably, GTAN achieves 1.9% and 1.1% improvements in mAP on testing set of the two datasets.
△ Less
Submitted 9 September, 2019;
originally announced September 2019.
-
vireoJD-MM at Activity Detection in Extended Videos
Authors:
Fuchen Long,
Qi Cai,
Zhaofan Qiu,
Zhijian Hou,
Yingwei Pan,
Ting Yao,
Chong-Wah Ngo
Abstract:
This notebook paper presents an overview and comparative analysis of our system designed for activity detection in extended videos (ActEV-PC) in ActivityNet Challenge 2019. Specifically, we exploit person/vehicle detections in spatial level and action localization in temporal level for action detection in surveillance videos. The mechanism of different tubelet generation and model decomposition me…
▽ More
This notebook paper presents an overview and comparative analysis of our system designed for activity detection in extended videos (ActEV-PC) in ActivityNet Challenge 2019. Specifically, we exploit person/vehicle detections in spatial level and action localization in temporal level for action detection in surveillance videos. The mechanism of different tubelet generation and model decomposition methods are studied as well. The detection results are finally predicted by late fusing the results from each component.
△ Less
Submitted 20 June, 2019;
originally announced June 2019.
-
Detecting Standard Violation Errors in Smart Contracts
Authors:
Ao Li,
Fan Long
Abstract:
We present SOLAR, a new analysis tool for automatically detecting standard violation errors in Ethereum smart contracts.Given the Ethereum Virtual Machine (EVM) bytecode of a smart contract and a user specified constraint or invariant derived from a technical standard such as ERC-20,SOLAR symbolically executes the contract, explores all possible execution paths, and checks whether it is possible t…
▽ More
We present SOLAR, a new analysis tool for automatically detecting standard violation errors in Ethereum smart contracts.Given the Ethereum Virtual Machine (EVM) bytecode of a smart contract and a user specified constraint or invariant derived from a technical standard such as ERC-20,SOLAR symbolically executes the contract, explores all possible execution paths, and checks whether it is possible to initiate a sequence of malicious transactions to violate the specified constraint or invariant. Our experimental results highlight the effectiveness of SOLAR in finding new errors in smart con-tracts. Out of the evaluated 779 ERC-20 and 310 ERC-721smart contracts, SOLAR found 255 standard violation errors in 197 vulnerable contracts with only three false positives.237 out of the 255 errors are zero-day errors that are not re-ported before. Our results sound the alarm on the prevalence of standard violation errors in critical smart contracts that manipulate publicly traded digital assets
△ Less
Submitted 18 February, 2019; v1 submitted 18 December, 2018;
originally announced December 2018.
-
Scaling Nakamoto Consensus to Thousands of Transactions per Second
Authors:
Chenxing Li,
Peilun Li,
Dong Zhou,
Wei Xu,
Fan Long,
Andrew Yao
Abstract:
This paper presents Conflux, a fast, scalable and decentralized blockchain system that optimistically process concurrent blocks without discarding any as forks. The Conflux consensus protocol represents relationships between blocks as a direct acyclic graph and achieves consensus on a total order of the blocks. Conflux then, from the block order, deterministically derives a transaction total order…
▽ More
This paper presents Conflux, a fast, scalable and decentralized blockchain system that optimistically process concurrent blocks without discarding any as forks. The Conflux consensus protocol represents relationships between blocks as a direct acyclic graph and achieves consensus on a total order of the blocks. Conflux then, from the block order, deterministically derives a transaction total order as the blockchain ledger. We evaluated Conflux on Amazon EC2 clusters with up to 20k full nodes. Conflux achieves a transaction throughput of 5.76GB/h while confirming transactions in 4.5-7.4 minutes. The throughput is equivalent to 6400 transactions per second for typical Bitcoin transactions. Our results also indicate that when running Conflux, the consensus protocol is no longer the throughput bottleneck. The bottleneck is instead at the processing capability of individual nodes.
△ Less
Submitted 31 August, 2018; v1 submitted 10 May, 2018;
originally announced May 2018.
-
An Analysis of the Search Spaces for Generate and Validate Patch Generation Systems
Authors:
Fan Long,
Martin Rinard
Abstract:
We present the first systematic analysis of the characteristics of patch search spaces for automatic patch generation systems. We analyze the search spaces of two current state-of-the-art systems, SPR and Prophet, with 16 different search space configurations. Our results are derived from an analysis of 1104 different search spaces and 768 patch generation executions. Together these experiments co…
▽ More
We present the first systematic analysis of the characteristics of patch search spaces for automatic patch generation systems. We analyze the search spaces of two current state-of-the-art systems, SPR and Prophet, with 16 different search space configurations. Our results are derived from an analysis of 1104 different search spaces and 768 patch generation executions. Together these experiments consumed over 9000 hours of CPU time on Amazon EC2.
The analysis shows that 1) correct patches are sparse in the search spaces (typically at most one correct patch per search space per defect), 2) incorrect patches that nevertheless pass all of the test cases in the validation test suite are typically orders of magnitude more abundant, and 3) leveraging information other than the test suite is therefore critical for enabling the system to successfully isolate correct patches.
We also characterize a key tradeoff in the structure of the search spaces. Larger and richer search spaces that contain correct patches for more defects can actually cause systems to find fewer, not more, correct patches. We identify two reasons for this phenomenon: 1) increased validation times because of the presence of more candidate patches and 2) more incorrect patches that pass the test suite and block the discovery of correct patches. These fundamental properties, which are all characterized for the first time in this paper, help explain why past systems often fail to generate correct patches and help identify challenges, opportunities, and productive future directions for the field.
△ Less
Submitted 17 February, 2016;
originally announced February 2016.
-
Relaxed Belief Propagation for MIMO Detection
Authors:
Feichi Long,
Tiejun Lv
Abstract:
In this paper, relaxed belief propagation (RBP) based detectors are proposed for multiple-input multiple-out (MIMO) system. The factor graph is leveraged to represent the MIMO channels, and based on which our algorithms are developed. Unlike the existing complicated standard belief propagation (SBP) detector that considers all the edges of the factor graph when updating messages, the proposed RBP…
▽ More
In this paper, relaxed belief propagation (RBP) based detectors are proposed for multiple-input multiple-out (MIMO) system. The factor graph is leveraged to represent the MIMO channels, and based on which our algorithms are developed. Unlike the existing complicated standard belief propagation (SBP) detector that considers all the edges of the factor graph when updating messages, the proposed RBP focuses on partial edges, which largely reduces computational complexity. In particular, relax degree is introduced in to determine how many edges to be selected, whereby RBP is a generalized edge selection based BP method and SBP is a special case of RBP having the smallest relax degree. Moreover, we propose a novel Gaussian approximation with feedback information mechanism to enable the proposed RBP detector. In order to further improve the detection performance, we also propose to cascade a minimum mean square error (MMSE) detector before the RBP detector, from which pseudo priori information is judiciously exploited. Convergence and complexity analyses, along with the numerical simulation results, verify that the proposed RBP outperform other BP methods having the similar complexity, and the MMSE cascaded RBP even outperform SBP at the largest relax degree in large MIMO systems.
△ Less
Submitted 4 May, 2011; v1 submitted 19 March, 2011;
originally announced March 2011.