Showing 1–2 of 2 results for author: Sie, M

Search v0.5.6 released 2020-02-24

arXiv:2411.15425 [pdf, other]

quant-ph cs.CR

Efficient Bitcoin Address Classification Using Quantum-Inspired Feature Selection

Authors: Ming-Fong Sie, Yen-Jui Chang, Chien-Lung Lin, Ching-Ray Chang, Shih-Wei Liao

Abstract: Over 900 million Bitcoin transactions have been recorded, posing considerable challenges for machine learning in terms of both computation time and prediction accuracy. We propose an innovative approach using quantum-inspired algorithms implemented with Simulated Annealing (SA) and Quantum Annealing (QA) to address the challenge of local minima in solution spaces. This method efficiently identifies key features linked to mixer addresses, significantly reducing model training time. By categorizing Bitcoin addresses into six classes (exchanges, faucets, gambling, marketplaces, mixers, and mining pools) and applying supervised learning methods, our results demonstrate that feature selection with SA reduced training time by 30.3% compared to using all features in a random forest model, while maintaining a 91% F1-score for mixer addresses. This highlights the potential of quantum-inspired algorithms to swiftly and accurately identify high-risk Bitcoin addresses based on transaction features.

Submitted 22 November, 2024; originally announced November 2024.

Comments: 19 pages

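The annealing-based feature selection described in the abstract above can be illustrated with a minimal sketch. It assumes the selection problem is cast as a QUBO (quadratic unconstrained binary optimization), a common formulation for annealing-based feature selection; the abstract does not state the exact objective. The per-feature relevance scores, the pairwise redundancy matrix (e.g. mutual information) and the weight `alpha` are hypothetical inputs, and `build_feature_qubo` and `simulated_annealing` are illustrative helpers, not the authors' code. Downstream training (e.g. a random forest on the selected columns) is omitted.

```python
import math
import random


def qubo_energy(x, Q):
    """Energy x^T Q x for a binary vector x (list of 0/1) and square matrix Q."""
    n = len(x)
    return sum(Q[i][j] * x[i] * x[j] for i in range(n) for j in range(n))


def build_feature_qubo(relevance, redundancy, alpha=0.5):
    """Hypothetical QUBO: diagonal terms reward relevant features (lower energy),
    off-diagonal terms penalize selecting redundant pairs together."""
    n = len(relevance)
    Q = [[0.0] * n for _ in range(n)]
    for i in range(n):
        Q[i][i] = -relevance[i]
        for j in range(n):
            if i != j:
                Q[i][j] = alpha * redundancy[i][j]
    return Q


def simulated_annealing(Q, steps=5000, t_start=1.0, t_end=1e-3, seed=0):
    """Minimize x^T Q x over binary x with single-bit-flip simulated annealing."""
    rng = random.Random(seed)
    n = len(Q)
    x = [rng.randint(0, 1) for _ in range(n)]  # random initial feature mask
    energy = qubo_energy(x, Q)
    best_x, best_e = x[:], energy
    for step in range(steps):
        # Geometric cooling from t_start down to t_end.
        t = t_start * (t_end / t_start) ** (step / max(steps - 1, 1))
        i = rng.randrange(n)
        x[i] ^= 1  # propose adding or removing one feature
        new_energy = qubo_energy(x, Q)
        delta = new_energy - energy
        # Metropolis rule: always accept improvements, sometimes accept uphill moves.
        if delta <= 0 or rng.random() < math.exp(-delta / t):
            energy = new_energy
            if energy < best_e:
                best_x, best_e = x[:], energy
        else:
            x[i] ^= 1  # reject: undo the flip
    return best_x, best_e


if __name__ == "__main__":
    # Toy example: features 0 and 1 are highly redundant with each other.
    relevance = [0.9, 0.85, 0.1, 0.7]
    redundancy = [[0.0, 0.9, 0.0, 0.0], [0.9, 0.0, 0.0, 0.0], [0.0] * 4, [0.0] * 4]
    Q = build_feature_qubo(relevance, redundancy, alpha=0.5)
    selected, energy = simulated_annealing(Q)
    print("selected mask:", selected, "energy:", round(energy, 3))
```

Accepting occasional uphill moves while the temperature is high is what lets the search escape local minima, the issue the abstract cites as motivation; a quantum annealer would instead be handed the same `Q` matrix.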
arXiv:2408.09777 [pdf, other]

cs.CL

Summarizing long regulatory documents with a multi-step pipeline

Authors: Mika Sie, Ruby Beek, Michiel Bots, Sjaak Brinkkemper, Albert Gatt

Abstract: Due to their length and complexity, long regulatory texts are challenging to summarize. To address this, a multi-step extractive-abstractive architecture is proposed to handle lengthy regulatory documents more effectively. In this paper, we show that the effectiveness of a two-step architecture for summarizing long regulatory texts varies significantly depending on the model used. Specifically, the two-step architecture improves the performance of decoder-only models. For abstractive encoder-decoder models with short context lengths, the effectiveness of an extractive step varies, whereas for long-context encoder-decoder models, the extractive step worsens their performance. This research also highlights the challenges of evaluating generated texts, as evidenced by the differing results from human and automated evaluations. Most notably, human evaluations favoured language models pretrained on legal text, while automated metrics ranked general-purpose language models higher. The results underscore the importance of selecting the appropriate summarization strategy based on model architecture and context length.

Submitted 14 October, 2024; v1 submitted 19 August, 2024; originally announced August 2024.

Comments: Published in: Proceedings of the 6th Workshop on Natural Legal Language Processing (NLLP 2024)
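The extractive step of the two-step pipeline summarized above can be illustrated with a minimal sketch. The word-frequency sentence scoring, the small stopword list and the whitespace-word budget (a rough proxy for model tokens) are assumptions chosen for brevity, not the extractor used by the authors; `extractive_step` is a hypothetical helper whose output would then be passed to whichever abstractive model is being evaluated.

```python
import re
from collections import Counter

# Hypothetical minimal stopword list; a real pipeline would use a fuller one.
STOPWORDS = {"the", "a", "an", "of", "to", "and", "in", "is", "was", "for", "on"}


def split_sentences(text):
    """Naive sentence splitter on ., ! or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(text):
    """Lowercased alphabetic words with stopwords removed."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def extractive_step(document, token_budget):
    """Select high-scoring sentences that fit in token_budget whitespace words,
    returned in their original order so the abstractive model sees coherent text."""
    sentences = split_sentences(document)
    freqs = Counter(tokenize(document))

    def score(sentence):
        words = tokenize(sentence)
        # Average document frequency of the sentence's content words.
        return sum(freqs[w] for w in words) / len(words) if words else 0.0

    # sorted() is stable, so ties keep their original document order.
    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    chosen, used = [], 0
    for i in ranked:
        n_tokens = len(sentences[i].split())
        if used + n_tokens <= token_budget:  # greedily fill the context budget
            chosen.append(i)
            used += n_tokens
    return " ".join(sentences[i] for i in sorted(chosen))


if __name__ == "__main__":
    doc = ("Banks must report capital. Capital rules apply to banks. "
           "The weather was nice. Banks hold capital reserves.")
    print(extractive_step(doc, token_budget=12))
```

Restoring the original sentence order after ranking is a deliberate choice: the abstractive model receives a shortened but still readable excerpt rather than a list of disconnected high-scoring sentences.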
