Search | arXiv e-print repository

arXiv:2504.19302 [pdf, other]

Wave Energy Is Conserved in a Spatially Varying and Inhomogeneously Moving Medium

Authors: Zhaohua Wu, Jie Sun, Zhe-Min Tan, Ming Cai, Yongyun Hu, Norden E. Huang

Abstract: Waves are propagating disturbances that redistribute energy across space. Previous studies have shown that for waves propagating through an inhomogeneously moving mean flow, the conserved quantity is wave action rather than wave energy, raising questions about the validity of energy conservation, which is one of the foundational principles of physics. In this study, we prove that wave action conservation is, in fact, an apparent form of wave energy conservation in spatially varying and inhomogeneously moving media, where waves undergo deformation during propagation. We further show that wave action conservation can be derived directly from the law of energy conservation. This result holds universally across all isolated wave systems in varying media, including hydrodynamic and non-hydrodynamic waves.

Submitted 27 April, 2025; originally announced April 2025.

Comments: 25 pages, 5 figures

arXiv:2504.19213 [pdf, other]

Measurements of branching fractions of $D^0\to K^- 3π^+2π^-$, $D^0\to K^- 2π^+π^-2π^0$ and $D^+\to K^- 3π^+π^-π^0$

Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, X. C. Ai, R. Aliberti, A. Amoroso, Q. An, Y. Bai, O. Bakina, Y. Ban, H. -R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere, A. Brueggemann, H. Cai , et al. (693 additional authors not shown)

Abstract: Utilizing $7.9\,\rm fb^{-1}$ of $e^+e^-$ collision data taken with the BESIII detector at the center-of-mass energy of 3.773 GeV, we report the measurements of absolute branching fractions of the hadronic decays $D^0\to K^- 3π^+2π^-$, $D^0\to K^- 2π^+π^-2π^0$ and $D^+\to K^- 3π^+π^-π^0$. The $D^0\to K^- 3π^+2π^-$ decay is measured with improved precision, while the latter two decays are observed with statistical significance higher than $5σ$ for the first time. The absolute branching fractions of these decays are determined to be ${\mathcal B}(D^0\to K^- 3π^+2π^-)=( 1.35\pm 0.23\pm 0.08 )\times 10^{-4}$, ${\mathcal B}(D^0\to K^- 2π^+π^-2π^0)=( 19.0\pm 1.1\pm 1.5)\times 10^{-4}$, and ${\mathcal B}(D^+\to K^- 3π^+π^-π^0)=( 6.57\pm 0.69\pm 0.33)\times 10^{-4}$, where the first uncertainties are statistical and the second systematic.

Submitted 27 April, 2025; originally announced April 2025.

Comments: 12 pages, 6 figures, 4 tables

Report number: BAM-00843

arXiv:2504.19099 [pdf, other]

VeriDebug: A Unified LLM for Verilog Debugging via Contrastive Embedding and Guided Correction

Authors: Ning Wang, Bingkun Yao, Jie Zhou, Yuchen Hu, Xi Wang, Nan Guan, Zhe Jiang

Abstract: Large Language Models (LLMs) have demonstrated remarkable potential in debugging for various programming languages. However, the application of LLMs to Verilog debugging remains insufficiently explored. Here, we present VeriDebug, an approach that integrates contrastive representation and guided correction capabilities for automated Verilog debugging. Unlike existing methods, VeriDebug employs an embedding-based technique to accurately retrieve internal information, followed by bug-fixing. VeriDebug unifies Verilog bug detection and correction through a shared parameter space. By simultaneously learning bug patterns and fixes, it streamlines debugging via contrastive embedding and guided correction. Empirical results show the efficacy of VeriDebug in enhancing Verilog debugging. Our VeriDebugLoc, Type model achieves 64.7 accuracy in bug fixing (Acc1), a significant improvement from the existing open-source SOTAs 11.3. This performance not only outperforms open-source alternatives but also exceeds larger closed-source models like GPT-3.5-turbo (36.6), offering a more accurate alternative to conventional debugging methods.

Submitted 27 April, 2025; originally announced April 2025.

arXiv:2504.19087 [pdf, ps, other]

Search for $η_{1}(1855)$ in $χ_{cJ}\toηηη^{\prime}$ decays

Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, X. C. Ai, R. Aliberti, A. Amoroso, Q. An, Y. Bai, O. Bakina, Y. Ban, H. -R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere, A. Brueggemann, H. Cai , et al. (697 additional authors not shown)

Abstract: Based on a sample of $2.7\times10^{9}$ $ψ(3686)$ events collected by the BESIII detector operating at the BEPCII collider, an analysis of the decay $ψ(3686)\toγχ_{cJ}, χ_{cJ}\toηηη^{\prime}$ is performed. The decay modes $χ_{c1}$ and $χ_{c2}\toηηη^{\prime}$ are observed for the first time, and their corresponding branching fractions are determined to be $\mathcal{B}(χ_{c1}\toηηη^{\prime}) = (1.39 \pm 0.13(\text{stat.}) \pm 0.09(\text{sys.})) \times 10^{-4}$ and $\mathcal{B}(χ_{c2}\toηηη^{\prime}) = (4.42 \pm 0.86(\text{stat.}) \pm 0.37(\text{sys.})) \times 10^{-5}$. An upper limit on the branching fraction of $χ_{c0}\toηηη^{\prime}$ is set as $2.64 \times 10^{-5}$ at 90\% confidence level (CL). A partial wave analysis (PWA) of the decay $χ_{c1}\toηηη^{\prime}$ is performed to search for the $1^{-+}$ exotic state $η_1(1855)$. The PWA result indicates that the structure in the $ηη^{\prime}$ mass spectrum is mainly attributed to the $f_0(1500)$, while in the $ηη$ mass spectrum, it is primarily the $0^{++}$ phase space. The upper limit of $\mathcal{B}(χ_{c1}\toη_{1}(1855)η) \cdot \mathcal{B}(η_{1}(1855)\toηη^{\prime})< 9.79 \times 10^{-5}$ is set based on the PWA at 90\% CL.

Submitted 26 April, 2025; originally announced April 2025.

arXiv:2504.18926 [pdf, other]

Critical Non-Hermitian Edge Modes

Authors: Kunling Zhou, Zihe Yang, Bowen Zeng, Yong Hu

Abstract: We unveil a unique critical phenomenon of topological edge modes in non-Hermitian systems, dubbed the critical non-Hermitian edge modes (CNHEM). Specifically, in the thermodynamic limit, the eigenvectors of edge modes jump discontinuously under infinitesimal on-site staggered perturbations. The CNHEM arises from the competition between the introduced on-site staggered potentials and size-dependent non-reciprocal coupling between edge modes, and are closely connected to the exceptional point (EP). As the system size increases, the coupling between edge modes decreases while the non-reciprocity is enhanced, causing the eigenvectors to gradually collapse toward the EP. However, when the on-site potentials dominate, this weakened coupling assists the eigenvectors to stay away from the EP. Such a critical phenomenon is absent in Hermitian systems, where the coupling between edge modes is reciprocal.

Submitted 26 April, 2025; originally announced April 2025.

Comments: 5 pages, 2 figures

arXiv:2504.18509 [pdf, other]

Eval3D: Interpretable and Fine-grained Evaluation for 3D Generation

Authors: Shivam Duggal, Yushi Hu, Oscar Michel, Aniruddha Kembhavi, William T. Freeman, Noah A. Smith, Ranjay Krishna, Antonio Torralba, Ali Farhadi, Wei-Chiu Ma

Abstract: Despite the unprecedented progress in the field of 3D generation, current systems still often fail to produce high-quality 3D assets that are visually appealing and geometrically and semantically consistent across multiple viewpoints. To effectively assess the quality of the generated 3D data, there is a need for a reliable 3D evaluation tool. Unfortunately, existing 3D evaluation metrics often overlook the geometric quality of generated assets or merely rely on black-box multimodal large language models for coarse assessment. In this paper, we introduce Eval3D, a fine-grained, interpretable evaluation tool that can faithfully evaluate the quality of generated 3D assets based on various distinct yet complementary criteria. Our key observation is that many desired properties of 3D generation, such as semantic and geometric consistency, can be effectively captured by measuring the consistency among various foundation models and tools. We thus leverage a diverse set of models and tools as probes to evaluate the inconsistency of generated 3D assets across different aspects. Compared to prior work, Eval3D provides pixel-wise measurement, enables accurate 3D spatial feedback, and aligns more closely with human judgments. We comprehensively evaluate existing 3D generation models using Eval3D and highlight the limitations and challenges of current models.

Submitted 25 April, 2025; originally announced April 2025.

Comments: CVPR 2025. Project page and codes: https://eval3d.github.io/

arXiv:2504.17789 [pdf, other]

Token-Shuffle: Towards High-Resolution Image Generation with Autoregressive Models

Authors: Xu Ma, Peize Sun, Haoyu Ma, Hao Tang, Chih-Yao Ma, Jialiang Wang, Kunpeng Li, Xiaoliang Dai, Yujun Shi, Xuan Ju, Yushi Hu, Artsiom Sanakoyeu, Felix Juefei-Xu, Ji Hou, Junjiao Tian, Tao Xu, Tingbo Hou, Yen-Cheng Liu, Zecheng He, Zijian He, Matt Feiszli, Peizhao Zhang, Peter Vajda, Sam Tsai, Yun Fu

Abstract: Autoregressive (AR) models, long dominant in language generation, are increasingly applied to image synthesis but are often considered less competitive than Diffusion-based models. A primary limitation is the substantial number of image tokens required for AR models, which constrains both training and inference efficiency, as well as image resolution. To address this, we present Token-Shuffle, a novel yet simple method that reduces the number of image tokens in Transformer. Our key insight is the dimensional redundancy of visual vocabularies in Multimodal Large Language Models (MLLMs), where low-dimensional visual codes from visual encoder are directly mapped to high-dimensional language vocabularies. Leveraging this, we consider two key operations: token-shuffle, which merges spatially local tokens along channel dimension to decrease the input token number, and token-unshuffle, which untangles the inferred tokens after Transformer blocks to restore the spatial arrangement for output. Jointly training with textual prompts, our strategy requires no additional pretrained text-encoder and enables MLLMs to support extremely high-resolution image synthesis in a unified next-token prediction way while maintaining efficient training and inference. For the first time, we push the boundary of AR text-to-image generation to a resolution of 2048x2048 with gratifying generation performance. In GenAI-benchmark, our 2.7B model achieves 0.77 overall score on hard prompts, outperforming AR models LlamaGen by 0.18 and diffusion models LDM by 0.15. Exhaustive large-scale human evaluations also demonstrate our prominent image generation ability in terms of text-alignment, visual flaw, and visual appearance. We hope that Token-Shuffle can serve as a foundational design for efficient high-resolution image generation within MLLMs.

Submitted 27 April, 2025; v1 submitted 24 April, 2025; originally announced April 2025.

Comments: Project Page: https://ma-xu.github.io/token-shuffle/ Add related works
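
Illustrative note on the abstract above: the token-shuffle and token-unshuffle operations it describes can be pictured as a pure rearrangement of a token grid. The sketch below is not the authors' implementation. It assumes tokens are plain lists of floats laid out row-major on a `height` x `width` grid, and it omits any learned layers the actual method may use around the rearrangement. The function names and the window-size parameter `s` are hypothetical.

```python
from typing import List

Token = List[float]


def token_shuffle(tokens: List[Token], height: int, width: int, s: int) -> List[Token]:
    """Merge each s x s window of tokens into one token by concatenating channels.

    Hypothetical sketch: the token count drops by a factor of s*s while the
    channel dimension grows by the same factor.
    """
    assert len(tokens) == height * width
    assert height % s == 0 and width % s == 0
    merged: List[Token] = []
    for by in range(0, height, s):          # top row of each window
        for bx in range(0, width, s):       # left column of each window
            fused: Token = []
            for dy in range(s):
                for dx in range(s):
                    fused.extend(tokens[(by + dy) * width + (bx + dx)])
            merged.append(fused)
    return merged


def token_unshuffle(merged: List[Token], height: int, width: int, s: int,
                    channels: int) -> List[Token]:
    """Invert token_shuffle: split each fused token back into its s*s originals."""
    tokens: List[Token] = [[] for _ in range(height * width)]
    blocks_per_row = width // s
    for i, fused in enumerate(merged):
        by, bx = divmod(i, blocks_per_row)  # window position on the coarse grid
        for k in range(s * s):
            dy, dx = divmod(k, s)           # offset of the token inside the window
            y, x = by * s + dy, bx * s + dx
            tokens[y * width + x] = fused[k * channels:(k + 1) * channels]
    return tokens
```

With a 4 x 4 grid and `s = 2`, the 16 input tokens become 4 fused tokens with four times as many channels each, and `token_unshuffle` restores the original grid exactly.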

arXiv:2504.17705 [pdf, other]

LUIDA: Large-scale Unified Infrastructure for Digital Assessments based on Commercial Metaverse Platform

Authors: Yong-Hao Hu, Sotaro Yokoi, Yuji Hatada, Yuichi Hiroi, Takuji Narumi, Takefumi Hiraki

Abstract: Online experiments using metaverse platforms have gained significant traction in Human-Computer Interaction and Virtual Reality (VR) research. However, current research workflows are highly fragmented, as researchers must use separate tools for system implementation, participant recruitment, experiment execution, and data collection, reducing consistency and increasing workload. We present LUIDA (Large-scale Unified Infrastructure for Digital Assessments), a metaverse-based framework that integrates these fragmented processes. LUIDA automatically allocates interconnected virtual environments for parallel experiment execution and provides implementation templates adaptable to various VR research domains, requiring minimal metaverse development expertise. Our evaluation included two studies using a prototype built on Cluster, the commercial metaverse platform. First, VR researchers using LUIDA to develop and run experiments reported high usability scores (SUS: 73.75) and moderate workload (NASA-TLX: 24.11) for overall usage, with interviews confirming streamlined workflows compared to traditional laboratory experiments. Second, we conducted three replicated experiments with public Cluster users, each recruiting approximately 200 participants within one week. These experiments produced results that closely matched the original studies, validating the experimental integrity of LUIDA across research domains. After technical refinements, we plan to release LUIDA as an open platform, providing a standardized protocol to improve research efficiency and experimental reproducibility in VR studies.

Submitted 24 April, 2025; originally announced April 2025.

arXiv:2504.17533 [pdf, other]

Relic gravitational waves from cosmological horizon radiation during de Sitter period: as zero-order approximation of inflation

Authors: Chen-Hao Wu, Xiao Liang, Ya-Peng Hu

Abstract: It is well known that the event horizon of the de Sitter universe can produce particles, and one can get sizable Hawking radiation by considering inflationary phases as de Sitter spacetimes with large Hubble rates. In this compact paper, we consider the graviton emission part of these radiations and assume that these graviton signals can exist in the current universe in the form of gravitational waves. We predict an energy density parameter of $\log_{10}(Ω_{\rm GW} h^2) \sim \mathscr{O}(-25) - \mathscr{O}(-30)$ and its associated peak frequency $\log_{10}(f_{\rm peak}^0) \sim \mathscr{O}(6)-\mathscr{O}(5)$, depending on the reheating temperature. These signals occupy a frequency band below the ultrahigh-frequency regime and possess a detectable energy density, offering a promising target for future gravitational wave observatories. We believe that the detection of such signals would provide a compelling test of Hawking's radiation theory in a cosmological context.

Submitted 24 April, 2025; originally announced April 2025.

Comments: 7 pages, 2 figures

arXiv:2504.17034 [pdf, other]

An extremely soft and weak fast X-ray transient associated with a luminous supernova

Authors: W. -X. Li, Z. -P. Zhu, X. -Z. Zou, J. -J. Geng, L. -D. Liu, Y. -H. Wang, R. -Z. Li, D. Xu, H. Sun, X. -F. Wang, Y. -W. Yu, B. Zhang, X. -F. Wu, Y. Yang, A. V. Filippenko, X. -W. Liu, W. -M. Yuan, D. Aguado, J. An, T. An, D. A. H. Buckley, A. J. Castro-Tirado, S. -Y. Fu, J. P. U. Fynbo, D. A. Howell , et al. (80 additional authors not shown)

Abstract: Long gamma-ray bursts (LGRBs), including their subclasses of low-luminosity GRBs (LL-GRBs) and X-ray flashes (XRFs) characterized by low spectral peak energies, are known to be associated with broad-lined Type Ic supernovae (SNe Ic-BL), which result from the core collapse of massive stars that lose their outer hydrogen and helium envelopes. However, the soft and weak end of the GRB/XRF population remains largely unexplored, due to the limited sensitivity to soft X-ray emission. Here we report the discovery of a fast X-ray transient, EP250108a, detected by the Einstein Probe (EP) in the soft X-ray band at redshift $z = 0.176$, which was followed up by extensive multiband observations. EP250108a shares similar X-ray luminosity as XRF\,060218, the prototype of XRFs, but it extends GRBs/XRFs down to the unprecedentedly soft and weak regimes, with its $E_{\rm peak} \lesssim 1.8\,\mathrm{keV}$ and $E_{\rm iso} \lesssim 10^{49}\, \mathrm{erg}$, respectively. Meanwhile, EP250108a is found to be associated with SN\,2025kg, one of the most luminous and possibly magnetar-powered SNe Ic-BL detected so far. Modeling of the well-sampled optical light curves favors a mildly relativistic outflow as the origin of this event. This discovery demonstrates that EP, with its unique capability, is opening a new observational window into the diverse outcomes of death of massive stars.

Submitted 23 April, 2025; originally announced April 2025.

Comments: 54 pages, 10 figures, submitted

arXiv:2504.16789 [pdf, other]

MLOps Monitoring at Scale for Digital Platforms

Authors: Yu Jeffrey Hu, Jeroen Rombouts, Ines Wilms

Abstract: Machine learning models are widely recognized for their strong performance in forecasting. To keep that performance in streaming data settings, they have to be monitored and frequently re-trained. This can be done with machine learning operations (MLOps) techniques under supervision of an MLOps engineer. However, in digital platform settings where the number of data streams is typically large and unstable, standard monitoring becomes either suboptimal or too labor intensive for the MLOps engineer. As a consequence, companies often fall back on very simple worse performing ML models without monitoring. We solve this problem by adopting a design science approach and introducing a new monitoring framework, the Machine Learning Monitoring Agent (MLMA), that is designed to work at scale for any ML model with reasonable labor cost. A key feature of our framework concerns test-based automated re-training based on a data-adaptive reference loss batch. The MLOps engineer is kept in the loop via key metrics and also acts, pro-actively or retrospectively, to maintain performance of the ML model in the production stage. We conduct a large-scale test at a last-mile delivery platform to empirically validate our monitoring framework.

Submitted 23 April, 2025; originally announced April 2025.

arXiv:2504.16214 [pdf, other]

Hexcute: A Tile-based Programming Language with Automatic Layout and Task-Mapping Synthesis

Authors: Xiao Zhang, Yaoyao Ding, Yang Hu, Gennady Pekhimenko

Abstract: Deep learning (DL) workloads mainly run on accelerators like GPUs. Recent DL quantization techniques demand a new matrix multiplication operator with mixed input data types, further complicating GPU optimization. Prior high-level compilers like Triton lack the expressiveness to implement key optimizations like fine-grained data pipelines and hardware-friendly memory layouts for these operators, while low-level programming models, such as Hidet, Graphene, and CUTLASS, require significant programming efforts. To balance expressiveness with engineering effort, we propose Hexcute, a tile-based programming language that exposes shared memory and register abstractions to enable fine-grained optimization for these operators. Additionally, Hexcute leverages task mapping to schedule the GPU program, and to reduce programming efforts, it automates layout and task mapping synthesis with a novel type-inference-based algorithm. Our evaluation shows that Hexcute generalizes to a wide range of DL operators, achieves 1.7-11.28$\times$ speedup over existing DL compilers for mixed-type operators, and brings up to 2.91$\times$ speedup in the end-to-end evaluation.

Submitted 30 April, 2025; v1 submitted 22 April, 2025; originally announced April 2025.

Comments: 17 pages, 24 figures

arXiv:2504.16074 [pdf, other]

PHYBench: Holistic Evaluation of Physical Perception and Reasoning in Large Language Models

Authors: Shi Qiu, Shaoyang Guo, Zhuo-Yang Song, Yunbo Sun, Zeyu Cai, Jiashen Wei, Tianyu Luo, Yixuan Yin, Haoxu Zhang, Yi Hu, Chenyang Wang, Chencheng Tang, Haoling Chang, Qi Liu, Ziheng Zhou, Tianyu Zhang, Jingtian Zhang, Zhangyi Liu, Minghao Li, Yuku Zhang, Boxuan Jing, Xianqi Yin, Yutong Ren, Zizhuo Fu, Weike Wang , et al. (27 additional authors not shown)

Abstract: We introduce PHYBench, a novel, high-quality benchmark designed for evaluating reasoning capabilities of large language models (LLMs) in physical contexts. PHYBench consists of 500 meticulously curated physics problems based on real-world physical scenarios, designed to assess the ability of models to understand and reason about realistic physical processes. Covering mechanics, electromagnetism, thermodynamics, optics, modern physics, and advanced physics, the benchmark spans difficulty levels from high school exercises to undergraduate problems and Physics Olympiad challenges. Additionally, we propose the Expression Edit Distance (EED) Score, a novel evaluation metric based on the edit distance between mathematical expressions, which effectively captures differences in model reasoning processes and results beyond traditional binary scoring methods. We evaluate various LLMs on PHYBench and compare their performance with human experts. Our results reveal that even state-of-the-art reasoning models significantly lag behind human experts, highlighting their limitations and the need for improvement in complex physical reasoning scenarios. Our benchmark results and dataset are publicly available at https://phybench-official.github.io/phybench-demo/.