Search | arXiv e-print repository

arXiv:2502.14748

Large Language Models Struggle to Describe the Haystack without Human Help: Human-in-the-loop Evaluation of Topic Models

Authors: Zongxia Li, Lorena Calvo-Bartolomé, Alexander Hoyle, Paiheng Xu, Alden Dima, Juan Francisco Fung, Jordan Boyd-Graber

Abstract: A common use of NLP is to facilitate the understanding of large document collections, with a shift from using traditional topic models to Large Language Models. Yet the effectiveness of using LLMs for large corpus understanding in real-world applications remains under-explored. This study measures the knowledge users acquire with unsupervised or supervised LLM-based exploratory approaches, or with traditional topic models, on two datasets. While LLM-based methods generate more human-readable topics and show higher average win probabilities than traditional models for data exploration, they produce overly generic topics for domain-specific datasets that do not easily allow users to learn much about the documents. Adding human supervision to the LLM generation process improves data exploration by mitigating hallucination and over-genericity but requires greater human effort. In contrast, traditional models like Latent Dirichlet Allocation (LDA) remain effective for exploration but are less user-friendly. We show that LLMs struggle to describe the haystack of large corpora without human help, particularly domain-specific data, and face scaling and hallucination limitations due to context length constraints.

Submitted 4 June, 2025; v1 submitted 20 February, 2025; originally announced February 2025.

Comments: 22 Pages. LLM for Data Exploration and content analysis, Topic Models. 63rd Annual Meeting of the Association for Computational Linguistics (2025)

arXiv:2311.08038

Linking QKD testbeds across Europe

Authors: Max Brauer, Rafael J. Vicente, Jaime S. Buruaga, Ruben B. Mendez, Ralf-Peter Braun, Marc Geitz, Piotr Rydlichkowski, Hans H. Brunner, Fred Fung, Momtchil Peev, Antonio Pastor, Diego Lopez, Vicente Martin, Juan P. Brito

Abstract: Quantum-key-distribution (QKD) networks are gaining importance and it has become necessary to analyze the most appropriate methods for their long-distance interconnection. In this paper, four different methods of interconnecting remote QKD networks are proposed. The methods are used to link three different QKD testbeds in Europe, located in Berlin, Madrid, and Poznan. Although long-distance QKD links are only emulated, the methods used can serve as a blueprint for a secure interconnection of distant QKD networks in the future. Specifically, the presented approaches combine, in a transparent way, different fiber and satellite physical media, as well as common standards of key-delivery interfaces. The testbed interconnections are designed to increase security by utilizing multipath techniques and multiple hybridizations of QKD and post-quantum cryptography (PQC) algorithms.

Submitted 10 January, 2024; v1 submitted 14 November, 2023; originally announced November 2023.

arXiv:2011.02600

Upwind summation by parts finite difference methods for large scale elastic wave simulations in 3D complex geometries

Authors: Kenneth Duru, Frederick Fung, Christopher Williams

Abstract: High-order accurate summation-by-parts (SBP) finite difference (FD) methods constitute efficient numerical methods for simulating large-scale hyperbolic wave propagation problems. Traditional SBP FD operators that approximate first-order spatial derivatives with central-difference stencils often have spurious unresolved numerical wave-modes in their computed solutions. Recently derived high-order accurate upwind SBP operators based on upwind FD stencils have the potential to suppress these poisonous spurious wave-modes on marginally resolved computational grids. In this paper, we demonstrate that not all high-order upwind SBP FD operators are applicable. Numerical dispersion relation analysis shows that odd-order upwind SBP FD operators also support spurious unresolved high frequencies on marginally resolved meshes. Meanwhile, even-order upwind SBP FD operators (of order 2, 4, 6) do not support spurious unresolved high-frequency wave modes and also have better numerical dispersion properties. We discretise the three space dimensional (3D) elastic wave equation on boundary-conforming curvilinear meshes. Using the energy method we prove that the semi-discrete approximation is stable and energy-conserving. We derive an a priori error estimate and prove the convergence of the numerical error. Numerical experiments for the 3D elastic wave equation in complex geometries corroborate the theoretical analysis.
Numerical simulations of the 3D elastic wave equation in heterogeneous media with complex non-planar free surface topography are given, including numerical simulations of community-developed seismological benchmark problems. Computational results show that even-order upwind SBP FD operators are more efficient, more robust, and less prone to numerical dispersion errors on marginally resolved meshes than the odd-order upwind and traditional SBP FD operators.

Submitted 24 July, 2021; v1 submitted 4 November, 2020; originally announced November 2020.

arXiv:2010.13342

Resiliency in Numerical Algorithm Design for Extreme Scale Simulations

Authors: Emmanuel Agullo, Mirco Altenbernd, Hartwig Anzt, Leonardo Bautista-Gomez, Tommaso Benacchio, Luca Bonaventura, Hans-Joachim Bungartz, Sanjay Chatterjee, Florina M. Ciorba, Nathan DeBardeleben, Daniel Drzisga, Sebastian Eibl, Christian Engelmann, Wilfried N. Gansterer, Luc Giraud, Dominik Goeddeke, Marco Heisig, Fabienne Jezequel, Nils Kohl, Xiaoye Sherry Li, Romain Lion, Miriam Mehl, Paul Mycek, Michael Obersteiner, Enrique S. Quintana-Orti, et al. (11 additional authors not shown)

Abstract: This work is based on the seminar titled "Resiliency in Numerical Algorithm Design for Extreme Scale Simulations", held March 1-6, 2020 at Schloss Dagstuhl, which was attended by all the authors. Naive versions of conventional resilience techniques will not scale to the exascale regime: with a main memory footprint of tens of Petabytes, synchronously writing checkpoint data all the way to background storage at frequent intervals will create intolerable overheads in runtime and energy consumption. Forecasts show that the mean time between failures could be lower than the time to recover from such a checkpoint, so that large calculations at scale might not make any progress if robust alternatives are not investigated. More advanced resilience techniques must be devised. The key may lie in exploiting both advanced system features and specific application knowledge. Research will face two essential questions: (1) what are the reliability requirements for a particular computation and (2) how do we best design the algorithms and software to meet these requirements? One avenue would be to refine and improve on system- or application-level checkpointing and rollback strategies in case an error is detected. Developers might use fault notification interfaces and flexible runtime systems to respond to node failures in an application-dependent fashion. Novel numerical algorithms or more stochastic computational approaches may be required to meet accuracy requirements in the face of undetectable soft errors.
The goal of this Dagstuhl Seminar was to bring together a diverse group of scientists with expertise in exascale computing to discuss novel ways to make applications resilient against detected and undetected faults. In particular, participants explored the role that algorithms and applications play in the holistic approach needed to tackle this challenge.

Submitted 26 October, 2020; originally announced October 2020.

Comments: 45 pages, 3 figures, submitted to The International Journal of High Performance Computing Applications

ACM Class: D.4.5; G.4; G.1; D.4.4

arXiv:1405.0198

No Superluminal Signaling Implies Unconditionally Secure Bit Commitment

Authors: H. F. Chau, C.-H. Fred Fung, H.-K. Lo

Abstract: Bit commitment (BC) is an important cryptographic primitive for an agent to convince a mutually mistrustful party that she has already made a binding choice of 0 or 1 but will only reveal her choice at a later time. Ideally, a BC protocol should be simple, reliable, easy to implement using existing technologies, and most importantly unconditionally secure in the sense that its security is based on an information-theoretic proof rather than a computational complexity assumption or the existence of a trustworthy arbitrator. Here we report such a provably secure scheme involving only one-way classical communications whose unconditional security is based on no superluminal signaling (NSS). Our scheme is inspired by the earlier works by Kent, who proposed two impractical relativistic protocols whose unconditional securities are yet to be established as well as several provably unconditionally secure protocols which rely on both quantum mechanics and NSS. Our scheme is conceptually simple and shows for the first time that quantum communication is not needed to achieve unconditional security for BC. Moreover, with purely classical communications, our scheme is practical and easy to implement with existing telecom technologies. This completes the cycle of study of unconditionally secure bit commitment based on known physical laws.

Submitted 18 November, 2014; v1 submitted 1 May, 2014; originally announced May 2014.

Comments: This paper has been withdrawn by the authors due to a crucial oversight on an earlier work by A. Kent

arXiv:quant-ph/0601115

doi 10.1103/PhysRevA.75.032314

Phase-Remapping Attack in Practical Quantum Key Distribution Systems

Authors: Chi-Hang Fred Fung, Bing Qi, Kiyoshi Tamaki, Hoi-Kwong Lo

Abstract: Quantum key distribution (QKD) can be used to generate secret keys between two distant parties. Even though QKD has been proven unconditionally secure against eavesdroppers with unlimited computation power, practical implementations of QKD may contain loopholes that may lead to the generated secret keys being compromised. In this paper, we propose a phase-remapping attack targeting two practical bidirectional QKD systems (the "plug & play" system and the Sagnac system). We showed that if the users of the systems are unaware of our attack, the final key shared between them can be compromised in some situations. Specifically, we showed that, in the case of the Bennett-Brassard 1984 (BB84) protocol with ideal single-photon sources, when the quantum bit error rate (QBER) is between 14.6% and 20%, our attack renders the final key insecure, whereas the same range of QBER values has been proved secure if the two users are unaware of our attack; also, we demonstrated three situations with realistic devices where positive key rates are obtained without the consideration of Trojan horse attacks but in fact no key can be distilled. We remark that our attack is feasible with only current technology. Therefore, it is very important to be aware of our attack in order to ensure absolute security. In finding our attack, we minimize the QBER over individual measurements described by a general POVM, which has some similarity with the standard quantum state discrimination problem.

Submitted 5 March, 2007; v1 submitted 17 January, 2006; originally announced January 2006.

Comments: 13 pages, 8 figures

Journal ref: Phys. Rev. A 75, 032314 (2007)

Showing 1–6 of 6 results for author: Fung, F