Search | arXiv e-print repository

STRICT: Stress Test of Rendering Images Containing Text

Authors: Tianyu Zhang, Xinyu Wang, Zhenghan Tai, Lu Li, Jijun Chi, Jingrui Tian, Hailin He, Suyuchen Wang

Abstract: While diffusion models have revolutionized text-to-image generation with their ability to synthesize realistic and diverse scenes, they continue to struggle to generate consistent and legible text within images. This shortcoming is commonly attributed to the locality bias inherent in diffusion-based generation, which limits their ability to model long-range spatial dependencies. In this paper, we introduce STRICT, a benchmark designed to systematically stress-test the ability of diffusion models to render coherent and instruction-aligned text in images. Our benchmark evaluates models across multiple dimensions: (1) the maximum length of readable text that can be generated; (2) the correctness and legibility of the generated text; and (3) the rate at which models fail to follow instructions for generating text. We evaluate several state-of-the-art models, including proprietary and open-source variants, and reveal persistent limitations in long-range consistency and instruction-following capabilities. Our findings provide insights into architectural bottlenecks and motivate future research directions in multimodal generative modeling. We release our entire evaluation pipeline at https://github.com/tianyu-z/STRICT-Bench.

Submitted 25 May, 2025; originally announced May 2025.

Comments: 13 pages

arXiv:2504.14493 [pdf, ps, other]

FinSage: A Multi-aspect RAG System for Financial Filings Question Answering

Authors: Xinyu Wang, Jijun Chi, Zhenghan Tai, Tung Sum Thomas Kwok, Muzhi Li, Zhuhong Li, Hailin He, Yuchen Hua, Peng Lu, Suyuchen Wang, Yihong Wu, Jerry Huang, Jingrui Tian, Fengran Mo, Yufei Cui, Ling Zhou

Abstract: Leveraging large language models in real-world settings often requires domain-specific data and tools in order to comply with the complex regulations that govern acceptable use. Within financial sectors, modern enterprises increasingly rely on Retrieval-Augmented Generation (RAG) systems to address complex compliance requirements in financial document workflows. However, existing solutions struggle to account for the inherent heterogeneity of data (e.g., text, tables, diagrams) and the evolving nature of regulatory standards used in financial filings, leading to compromised accuracy in critical information extraction. We propose the FinSage framework as a solution, utilizing a multi-aspect RAG framework tailored for regulatory compliance analysis in multi-modal financial documents. FinSage introduces three innovative components: (1) a multi-modal pre-processing pipeline that unifies diverse data formats and generates chunk-level metadata summaries, (2) a multi-path sparse-dense retrieval system augmented with query expansion (HyDE) and metadata-aware semantic search, and (3) a domain-specialized re-ranking module fine-tuned via Direct Preference Optimization (DPO) to prioritize compliance-critical content. Extensive experiments demonstrate that FinSage achieves an impressive recall of 92.51% on 75 expert-curated questions and surpasses the best baseline method on the FinanceBench question-answering dataset by 24.06% in accuracy. Moreover, FinSage has been successfully deployed as a financial question-answering agent in online meetings, where it has already served more than 1,200 people.

Submitted 6 June, 2025; v1 submitted 20 April, 2025; originally announced April 2025.

arXiv:2406.13231 [pdf, other]

Tight Lower Bounds for Directed Cut Sparsification and Distributed Min-Cut

Authors: Yu Cheng, Max Li, Honghao Lin, Zi-Yi Tai, David P. Woodruff, Jason Zhang

Abstract: In this paper, we consider two fundamental cut approximation problems on large graphs. We prove new lower bounds for both problems that are optimal up to logarithmic factors. The first problem is to approximate cuts in balanced directed graphs. In this problem, the goal is to build a data structure that $(1 \pm ε)$-approximates cut values in graphs with $n$ vertices. For arbitrary directed graphs, such a data structure requires $Ω(n^2)$ bits even for constant $ε$. To circumvent this, recent works study $β$-balanced graphs, meaning that for every directed cut, the total weight of edges in one direction is at most $β$ times that in the other direction. We consider two models: the for-each model, where the goal is to approximate each cut with constant probability, and the for-all model, where all cuts must be preserved simultaneously. We improve the previous $Ω(n \sqrt{β/ε})$ lower bound to $\tilde{Ω}(n \sqrt{β}/ε)$ in the for-each model, and we improve the previous $Ω(n β/ε)$ lower bound to $Ω(n β/ε^2)$ in the for-all model. This resolves the main open questions of (Cen et al., ICALP, 2021). The second problem is to approximate the global minimum cut in a local query model, where we can only access the graph via degree, edge, and adjacency queries. We improve the previous $Ω\bigl(\frac{m}{k}\bigr)$ query complexity lower bound to $Ω\bigl(\min\{m, \frac{m}{ε^2 k}\}\bigr)$ for this problem, where $m$ is the number of edges, $k$ is the size of the minimum cut, and we seek a $(1+ε)$-approximation. In addition, we show that existing upper bounds with slight modifications match our lower bound up to logarithmic factors.

Submitted 19 June, 2024; originally announced June 2024.
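
The definition of a $β$-balanced graph in the abstract above can be made concrete with a small sketch. The following brute-force check is only an illustration for tiny graphs, not anything taken from the paper: the function names `cut_weight` and `is_beta_balanced` and the edge-list representation are assumptions chosen for this example, and the enumeration over all vertex subsets is exponential in the number of vertices.

```python
from itertools import combinations


def cut_weight(edges, source_side):
    """Total weight of edges that leave source_side (hypothetical helper)."""
    return sum(w for (u, v, w) in edges if u in source_side and v not in source_side)


def is_beta_balanced(vertices, edges, beta):
    """Brute-force check of the beta-balance condition from the abstract.

    For every directed cut (S, V \\ S), the weight going from S to V \\ S
    must be at most beta times the weight going back. Enumerating every
    nonempty proper subset S covers both directions of each cut.
    """
    vertices = list(vertices)
    all_vertices = set(vertices)
    for size in range(1, len(vertices)):
        for subset in combinations(vertices, size):
            s = set(subset)
            forward = cut_weight(edges, s)
            backward = cut_weight(edges, all_vertices - s)
            if forward > beta * backward:
                return False
    return True


# A directed triangle with unit weights: every cut has weight 1 each way.
triangle = [("a", "b", 1), ("b", "c", 1), ("c", "a", 1)]

# The same triangle with one heavier edge: some cuts now have ratio 3.
skewed = [("a", "b", 3), ("b", "c", 1), ("c", "a", 1)]

print(is_beta_balanced("abc", triangle, 1))
print(is_beta_balanced("abc", skewed, 2))
print(is_beta_balanced("abc", skewed, 3))
```

In the skewed triangle, the cut around vertex `a` sends weight 3 forward and weight 1 back, so the graph is 3-balanced but not 2-balanced.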

arXiv:2306.17174 [pdf, other]

Empowering NLG: Offline Reinforcement Learning for Informal Summarization in Online Domains

Authors: Zhi-Xuan Tai, Po-Chuan Chen

Abstract: Our research introduces an innovative Natural Language Generation (NLG) approach that aims to optimize user experience and alleviate the workload of human customer support agents. Our primary objective is to generate informal summaries for online articles and posts using an offline reinforcement learning technique. In our study, we compare our proposed method with existing approaches to text generation and provide a comprehensive overview of our architectural design, which incorporates crawling, reinforcement learning, and text generation modules. By presenting this original approach, our paper makes a valuable contribution to the field of NLG by offering a fresh perspective on generating natural language summaries for online content. Through the implementation of Empowering NLG, we are able to generate higher-quality replies in the online domain. The experimental results demonstrate a significant improvement in the average "like" score, increasing from 0.09954378 to 0.5000152. This advancement has the potential to enhance the efficiency and effectiveness of customer support services and elevate the overall user experience when consuming online content.

Submitted 17 June, 2023; originally announced June 2023.

Comments: 14 pages, 3 figures

arXiv:2208.13699 [pdf, other]

Graph Exploration with Embedding-Guided Layouts

Authors: Leixian Shen, Zhiwei Tai, Enya Shen, Jianmin Wang

Abstract: Node-link diagrams are widely used to visualize graphs. Most graph layout algorithms only use graph topology for aesthetic goals (e.g., minimize node occlusions and edge crossings) or use node attributes for exploration goals (e.g., preserve visible communities). Existing hybrid methods that bind the two perspectives still suffer from various generation restrictions (e.g., limited input types and required manual adjustments and prior knowledge of graphs) and the imbalance between aesthetic and exploration goals. In this paper, we propose a flexible embedding-based graph exploration pipeline to enjoy the best of both graph topology and node attributes. First, we leverage embedding algorithms for attributed graphs to encode the two perspectives into latent space. Then, we present an embedding-driven graph layout algorithm, GEGraph, which can achieve aesthetic layouts with better community preservation to support an easy interpretation of the graph structure. Next, graph explorations are extended based on the generated graph layout and insights extracted from the embedding vectors. Illustrated with examples, we build a layout-preserving aggregation method with Focus+Context interaction and a related nodes searching approach with multiple proximity strategies. Finally, we conduct quantitative and qualitative evaluations, a user study, and two case studies to validate our approach.

Submitted 19 January, 2023; v1 submitted 29 August, 2022; originally announced August 2022.

Comments: accepted by TVCG

arXiv:2205.03183 [pdf]

doi 10.1007/s41019-022-00195-3

Visual Data Analysis with Task-based Recommendations

Authors: Leixian Shen, Enya Shen, Zhiwei Tai, Yihao Xu, Jianmin Wang

Abstract: General visualization recommendation systems typically make design decisions for the dataset automatically. However, most of them can only prune meaningless visualizations but fail to recommend targeted results. This paper contributes TaskVis, a task-oriented visualization recommendation system that allows users to select their tasks precisely on the interface. We first summarize a task base with 18 classical analytic tasks by a survey both in academia and industry. On this basis, we maintain a rule base, which extends empirical wisdom with our targeted modeling of the analytic tasks. Then, our rule-based approach enumerates all the candidate visualizations through answer set programming. After that, the generated charts can be ranked by four ranking schemes. Furthermore, we introduce a task-based combination recommendation strategy, leveraging a set of visualizations to give a brief view of the dataset collaboratively. Finally, we evaluate TaskVis through a series of use cases and a user study.

Submitted 14 September, 2022; v1 submitted 6 May, 2022; originally announced May 2022.

Comments: 16 pages, 10 figures. Data Sci. Eng. (2022)

arXiv:2109.03506 [pdf, other]

doi 10.1109/TVCG.2022.3148007

Towards Natural Language Interfaces for Data Visualization: A Survey

Authors: Leixian Shen, Enya Shen, Yuyu Luo, Xiaocong Yang, Xuming Hu, Xiongshuai Zhang, Zhiwei Tai, Jianmin Wang

Abstract: Utilizing Visualization-oriented Natural Language Interfaces (V-NLI) as a complementary input modality to direct manipulation for visual analytics can provide an engaging user experience. It enables users to focus on their tasks rather than having to worry about how to operate visualization tools on the interface. In the past two decades, leveraging advanced natural language processing technologies, numerous V-NLI systems have been developed in academic research and commercial software, especially in recent years. In this article, we conduct a comprehensive review of the existing V-NLIs. In order to classify each paper, we develop categorical dimensions based on a classic information visualization pipeline with the extension of a V-NLI layer. The following seven stages are used: query interpretation, data transformation, visual mapping, view transformation, human interaction, dialogue management, and presentation. Finally, we also shed light on several promising directions for future work in the V-NLI community.

Submitted 4 February, 2022; v1 submitted 8 September, 2021; originally announced September 2021.

Comments: 20 pages, 15 figures, accepted by IEEE TVCG

arXiv:1803.10615 [pdf, other]

SqueezeNext: Hardware-Aware Neural Network Design

Authors: Amir Gholami, Kiseok Kwon, Bichen Wu, Zizheng Tai, Xiangyu Yue, Peter Jin, Sicheng Zhao, Kurt Keutzer

Abstract: One of the main barriers to deploying neural networks on embedded systems has been the large memory and power consumption of existing neural networks. In this work, we introduce SqueezeNext, a new family of neural network architectures whose design was guided by considering previous architectures such as SqueezeNet, as well as by simulation results on a neural network accelerator. This new network is able to match AlexNet's accuracy on the ImageNet benchmark with $112\times$ fewer parameters, and one of its deeper variants is able to achieve VGG-19 accuracy with only 4.4 million parameters ($31\times$ smaller than VGG-19). SqueezeNext also achieves better top-5 classification accuracy with $1.3\times$ fewer parameters as compared to MobileNet, but avoids using depthwise-separable convolutions that are inefficient on some mobile processor platforms. This wide range of accuracy gives the user the ability to make speed-accuracy tradeoffs, depending on the available resources on the target hardware. Using hardware simulation results for power and inference speed on an embedded system has guided us to design variations of the baseline model that are $2.59\times$/$8.26\times$ faster and $2.25\times$/$7.5\times$ more energy efficient as compared to SqueezeNet/AlexNet without any accuracy degradation.

Submitted 27 August, 2018; v1 submitted 23 March, 2018; originally announced March 2018.

Comments: 12 Pages

Journal ref: Design Automation Conference 2018 (and CVPR 2018 workshop)

arXiv:1802.01775 [pdf]

Multi-cavity ultra-stable laser towards $10^{-18}$

Authors: Zhaoyang Tai, Lulu Yan, Yanyan Zhang, Pan Zhang, Xiaofei Zhang, Wenge Guo, Shougang Zhang, Haifeng Jiang

Abstract: In this letter, we demonstrate a technique of making an ultrastable laser referenced to a multi-cavity, corresponding to a lower thermal noise limit due to the larger equivalent beam size. The multi-cavity consists of several pairs of mirrors and a common spacer. We can stabilize the laser frequencies on these cavities, and average the laser frequencies with a synthesizing technique. In comparison with a single cavity system, relative frequency instability of the synthesized laser can be improved by a factor of the square root of the cavity number ($n$). In addition, we perform an experiment to simulate a two-cavity system. Experimental results show that frequency instability of the synthesized laser is improved by a factor of 1.4, and discrimination of the laser frequency instability, introduced by the process of laser synthesizing, is negligible, and can reach a floor at the $10^{-18}$ level, limited by noise of currently used signal generators. This technique is comparable with other techniques; thus, it can gain a factor of the square root of $n$ on the frequency instability of an ultrastable laser to an unprecedented level.

Submitted 5 February, 2018; originally announced February 2018.

Comments: 5 pages, 4 figures
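
The square-root-of-$n$ factor quoted in the multi-cavity laser abstract above matches what simple averaging would give. The following is a sketch under stated assumptions, not the authors' derivation: it assumes the $n$ cavity-referenced frequencies have uncorrelated fluctuations $y_i$ with the same instability $\sigma_y$, and the symbols $y_i$, $\bar{y}$, and $\sigma_y$ are introduced here only for illustration.

```latex
\begin{align}
  \bar{y} &= \frac{1}{n} \sum_{i=1}^{n} y_i, \\
  \operatorname{Var}(\bar{y})
    &= \frac{1}{n^{2}} \sum_{i=1}^{n} \operatorname{Var}(y_i)
     = \frac{\sigma_y^{2}}{n}, \\
  \sigma_{\bar{y}} &= \frac{\sigma_y}{\sqrt{n}}.
\end{align}
% For n = 2: sqrt(2) is approximately 1.41, in line with the factor of 1.4
% reported for the simulated two-cavity system.
```

If the cavities shared a significant common noise source, the fluctuations would be correlated and the improvement would be smaller than $\sqrt{n}$.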

arXiv:1702.05865 [pdf, other]

Hemingway: Modeling Distributed Optimization Algorithms

Authors: Xinghao Pan, Shivaram Venkataraman, Zizheng Tai, Joseph Gonzalez

Abstract: Distributed optimization algorithms are widely used in many industrial machine learning applications. However, choosing the appropriate algorithm and cluster size is often difficult for users, as the performance and convergence rate of optimization algorithms vary with the size of the cluster. In this paper, we make the case for an ML-optimizer that can select the appropriate algorithm and cluster size to use for a given problem. To do this, we propose building two models: one that captures the system-level characteristics of how computation and communication change as we increase cluster sizes, and another that captures how convergence rates change with cluster sizes. We present preliminary results from our prototype implementation called Hemingway and discuss some of the challenges involved in developing such a system.

Submitted 20 February, 2017; originally announced February 2017.

Comments: Presented at ML Systems Workshop at NIPS, Dec 2016

Showing 1–10 of 10 results for author: Tai, Z