Search | arXiv e-print repository

Extending the Applicability of Bloom Filters by Relaxing their Parameter Constraints

Authors: Paul Walther, Wejdene Mansour, Martin Werner

Abstract: These days, Key-Value Stores are widely used for scalable data storage. In this environment, Bloom filters serve as an efficient probabilistic data structure for the representation of sets of keys as they allow for set membership queries with controllable false positive rates and no false negatives. For optimal error rates, the right choice of the main parameters, namely the length of the Bloom filter array, the number of hash functions used to map an element to the array's indices, and the number of elements to be inserted in one filter, is crucial. However, these parameters are constrained: the number of hash functions is restricted to integer values, and the length of a Bloom filter is usually chosen to be a power of two to allow for efficient modulo operations using binary arithmetic. These modulo calculations are necessary to map from the output universe of the applied universal hash functions, like Murmur, to the set of indices of the Bloom filter. In this paper, we relax these constraints by proposing the Rational Bloom filter, which allows for non-integer numbers of hash functions. This results in optimized fraction-of-zero values for a known number of elements to be inserted. Based on this, we construct the Variably-Sized Block Bloom filters to allow for a flexible filter length, especially for large filters, while keeping computation efficient.

Submitted 17 April, 2025; v1 submitted 4 February, 2025; originally announced February 2025.

Comments: 18 pages, 7 figures
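
The two constraints named in this abstract (an integer hash count k, and a power-of-two length m so that `h mod m` reduces to a bit mask) can be made concrete with a small Python sketch. It computes the textbook real-valued optimum k = (m/n) ln 2 and the expected fraction of zero bits, and realizes a non-integer k in one hypothetical way: every key uses floor(k) hash functions, and a deterministic, hash-selected fraction frac(k) of keys uses one more. This is an illustration under those assumptions, not necessarily the paper's Rational Bloom filter construction; SHA-256 stands in for Murmur, and all names (`RationalBloomSketch`, `hash64`, `optimal_k`) are hypothetical.

```python
import hashlib
import math


def optimal_k(m: int, n: int) -> float:
    """Real-valued hash count minimizing false positives for m bits and n keys."""
    return (m / n) * math.log(2)


def expected_zero_fraction(m: int, n: int, k: float) -> float:
    """Expected fraction of bits still 0 after inserting n keys with k hashes each."""
    return (1.0 - 1.0 / m) ** (k * n)


def hash64(key: bytes, seed: int) -> int:
    """Seeded 64-bit hash; SHA-256 stands in for Murmur in this sketch."""
    digest = hashlib.sha256(seed.to_bytes(4, "little") + key).digest()
    return int.from_bytes(digest[:8], "little")


class RationalBloomSketch:
    """Hypothetical Bloom filter accepting a non-integer hash count k >= 1.

    Every key uses floor(k) hash functions. A key uses one extra hash function
    when an independent, deterministic hash of the key falls below frac(k),
    so on average k hashes are used per key. Lookups recompute the same
    decision, so inserted keys are never reported absent.
    """

    EXTRA_SEED = 0xFFFF  # seed reserved for the "use an extra hash?" decision

    def __init__(self, m: int, k: float):
        if m <= 0 or m & (m - 1):
            raise ValueError("this sketch assumes m is a power of two")
        if k < 1:
            raise ValueError("this sketch assumes k >= 1")
        self.mask = m - 1                # h % m == h & (m - 1) for power-of-two m
        self.k_int = math.floor(k)
        self.frac = k - self.k_int
        self.bits = bytearray(m)         # one byte per bit, for readability

    def _indices(self, key: bytes) -> list:
        count = self.k_int
        if hash64(key, self.EXTRA_SEED) / 2**64 < self.frac:
            count += 1
        return [hash64(key, i) & self.mask for i in range(count)]

    def add(self, key: bytes) -> None:
        for idx in self._indices(key):
            self.bits[idx] = 1

    def __contains__(self, key: bytes) -> bool:
        return all(self.bits[idx] for idx in self._indices(key))
```

For example, with m = 2^16 bits and n = 5000 keys, `optimal_k` returns about 9.09, so roughly 8.5% of keys receive a tenth hash, and at this k about half of the bits are expected to remain zero. The bit mask equals the modulo only because m is a power of two, which is exactly the length restriction the abstract says the Variably-Sized Block Bloom filters relax.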

arXiv:2403.06149 [pdf, other]

Can Large Language Models Automatically Score Proficiency of Written Essays?

Authors: Watheq Mansour, Salam Albatarni, Sohaila Eltanbouly, Tamer Elsayed

Abstract: Although several methods have been proposed to address the problem of automated essay scoring (AES) over the last 50 years, their effectiveness still leaves much to be desired. Large Language Models (LLMs) are transformer-based models that demonstrate extraordinary capabilities on various tasks. In this paper, we test the ability of LLMs, given their powerful linguistic knowledge, to analyze and effectively score written essays. We experimented with two popular LLMs, namely ChatGPT and Llama. We aim to check whether these models can perform this task and, if so, how their performance compares with state-of-the-art (SOTA) models at two levels: holistically and per individual writing trait. We used prompt-engineering tactics to design four different prompts that bring out the models' full potential on this task. Our experiments on the ASAP dataset revealed several interesting observations. First, choosing the right prompt depends highly on the model and the nature of the task. Second, the two LLMs exhibited comparable average performance in AES, with a slight advantage for ChatGPT. Finally, despite the performance gap between the two LLMs and SOTA models in terms of predictions, the LLMs provide feedback to enhance the quality of the essays, which can potentially help both teachers and students.

Submitted 15 April, 2024; v1 submitted 10 March, 2024; originally announced March 2024.

Comments: V2 (published version of LREC-COLING 2024)

arXiv:2109.12987 [pdf, other]

Overview of the CLEF--2021 CheckThat! Lab on Detecting Check-Worthy Claims, Previously Fact-Checked Claims, and Fake News

Authors: Preslav Nakov, Giovanni Da San Martino, Tamer Elsayed, Alberto Barrón-Cedeño, Rubén Míguez, Shaden Shaar, Firoj Alam, Fatima Haouari, Maram Hasanain, Watheq Mansour, Bayan Hamdan, Zien Sheikh Ali, Nikolay Babulkov, Alex Nikolov, Gautam Kishore Shahi, Julia Maria Struß, Thomas Mandl, Mucahid Kutlu, Yavuz Selim Kartal

Abstract: We describe the fourth edition of the CheckThat! Lab, part of the 2021 Conference and Labs of the Evaluation Forum (CLEF). The lab evaluates technology supporting tasks related to factuality, and covers Arabic, Bulgarian, English, Spanish, and Turkish. Task 1 asks to predict which posts in a Twitter stream are worth fact-checking, focusing on COVID-19 and politics (in all five languages). Task 2 asks to determine whether a claim in a tweet can be verified using a set of previously fact-checked claims (in Arabic and English). Task 3 asks to predict the veracity of a news article and its topical domain (in English). The evaluation is based on mean average precision or precision at rank k for the ranking tasks, and macro-F1 for the classification tasks. This was the most popular CLEF-2021 lab in terms of team registrations: 132 teams. Nearly one-third of them participated: 15, 5, and 25 teams submitted official runs for tasks 1, 2, and 3, respectively.

Submitted 23 September, 2021; originally announced September 2021.

Comments: Check-Worthiness Estimation, Fact-Checking, Veracity, Evidence-based Verification, Detecting Previously Fact-Checked Claims, Social Media Verification, Computational Journalism, COVID-19

MSC Class: 68T50

ACM Class: F.2.2; I.2.7

Journal ref: CLEF-2021
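
The three evaluation measures named in this abstract (mean average precision and precision at rank k for ranking, macro-F1 for classification) can be written down in a few lines of Python. This is a minimal sketch using the common textbook definitions with binary relevance; the official CheckThat! scorers may differ in details such as cutoffs, tie handling, or how unretrieved relevant items are counted, and the function names are hypothetical.

```python
from typing import Hashable, Sequence


def precision_at_k(relevance: Sequence[int], k: int) -> float:
    """Fraction of the top-k ranked items that are relevant (relevance is 0/1)."""
    return sum(relevance[:k]) / k


def average_precision(relevance: Sequence[int]) -> float:
    """Mean of precision@i over the ranks i that hold a relevant item.

    Assumes every relevant item appears somewhere in the ranking.
    """
    hits, total = 0, 0.0
    for i, rel in enumerate(relevance, start=1):
        if rel:
            hits += 1
            total += hits / i
    return total / hits if hits else 0.0


def mean_average_precision(rankings: Sequence[Sequence[int]]) -> float:
    """Average precision averaged over queries (one ranking per query)."""
    return sum(average_precision(r) for r in rankings) / len(rankings)


def macro_f1(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Unweighted mean of per-class F1 over classes seen in y_true or y_pred."""
    labels = set(y_true) | set(y_pred)
    f1s = []
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        f1s.append(2 * tp / denom if denom else 0.0)
    return sum(f1s) / len(f1s)
```

For a ranking with relevance pattern `[1, 0, 1, 0]`, precision at rank 2 is 0.5 and average precision is (1/1 + 2/3) / 2. Macro-F1 weights every class equally, which matters when check-worthy posts are a minority class.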

arXiv:2104.00411 [pdf, other]

Explaining COVID-19 and Thoracic Pathology Model Predictions by Identifying Informative Input Features

Authors: Ashkan Khakzar, Yang Zhang, Wejdene Mansour, Yuezhi Cai, Yawei Li, Yucheng Zhang, Seong Tae Kim, Nassir Navab

Abstract: Neural networks have demonstrated remarkable performance in classification and regression tasks on chest X-rays. To establish trust in clinical routine, the networks' prediction mechanism needs to be interpretable. One principal approach to interpretation is feature attribution: feature attribution methods identify the importance of input features for the output prediction. Building on the Information Bottleneck Attribution (IBA) method, for each prediction we identify the chest X-ray regions that have high mutual information with the network's output. The original IBA identifies input regions that have sufficient predictive information. We propose Inverse IBA to identify all informative regions. Thus, all predictive cues for pathologies are highlighted on the X-rays, a desirable property for chest X-ray diagnosis. Moreover, we propose Regression IBA for explaining regression models. Using Regression IBA, we observe that a model trained on cumulative severity score labels implicitly learns the severity of different X-ray regions. Finally, we propose Multi-layer IBA to generate higher-resolution and more detailed attribution/saliency maps. We evaluate our methods using both human-centric (ground-truth-based) interpretability metrics and human-independent feature importance metrics on the NIH Chest X-ray8 and BrixIA datasets. The code is publicly available.

Submitted 4 August, 2021; v1 submitted 1 April, 2021; originally announced April 2021.

Comments: Accepted at MICCAI 2021 (Medical Image Computing and Computer Assisted Intervention 2021). Project website: https://camp-explain-ai.github.io/CheXplain-IBA/
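
The IBA method this abstract builds on can be illustrated by its core mechanism, following our reading of the original IBA formulation (Schulz et al., ICLR 2020): an intermediate feature map R is replaced by Z = λR + (1 − λ)ε, with ε drawn from N(μ_R, σ_R²), and the information passed by each feature is estimated as a KL divergence against that noise distribution. The minimal Python sketch below computes only the noise injection and the per-feature information term in nats. It does not learn λ (the original optimizes it against the task loss plus a weighted information term), and it does not implement the paper's Inverse, Regression, or Multi-layer variants; function names are hypothetical.

```python
import math
import random
from typing import List


def bottleneck_sample(r: List[float], lam: List[float], mu: List[float],
                      sigma: List[float], rng: random.Random) -> List[float]:
    """Z = lam * R + (1 - lam) * eps, eps ~ N(mu, sigma^2), applied per feature."""
    return [w * x + (1.0 - w) * rng.gauss(m, s)
            for x, w, m, s in zip(r, lam, mu, sigma)]


def capacity_nats(r: List[float], lam: List[float], mu: List[float],
                  sigma: List[float]) -> List[float]:
    """Per-feature KL( N(lam*R + (1-lam)*mu, ((1-lam)*sigma)^2) || N(mu, sigma^2) ).

    It is 0 when lam = 0 (the feature is pure noise) and grows without bound
    as lam -> 1 (the feature passes through unchanged).
    """
    out = []
    for x, w, m, s in zip(r, lam, mu, sigma):
        if not 0.0 <= w < 1.0:
            raise ValueError("lambda must lie in [0, 1)")
        var_ratio = (1.0 - w) ** 2          # Var[Z | R] / sigma^2
        shift = w * (x - m) / s             # (E[Z | R] - mu) / sigma
        out.append(-math.log(1.0 - w) + (var_ratio + shift ** 2) / 2.0 - 0.5)
    return out
```

In an attribution map, features whose λ must stay close to 1 to preserve the prediction carry the most information, which is what the saliency maps visualize.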

arXiv:2010.15450 [pdf]

doi 10.1109/TNS.2021.3086416

FPGA Based Real-Time Image Manipulation and Advanced Data Acquisition For 2D-XRAY Detectors

Authors: Wassim Mansour, Rattana Biv, Cyril Ponchut, Raphael Ponsard, Nicolas Janvier, Pablo Fajardo

Abstract: Scientific experiments rely on measurements that provide the data required to extract the desired information or draw conclusions. Data production and analysis are therefore essential components at the heart of any scientific experimental application. Traditionally, efforts on detector development for photon sources have focused on the properties and performance of the detection front-ends. In many cases, the data acquisition chain, as well as data processing, is treated as a complementary component of the detector system and added at a late stage of the project. In most cases, data processing tasks are entrusted to CPUs, which meets the minimum bandwidth requirements and keeps the hardware functionally simple. This also minimizes design effort, complexity and implementation cost. This approach has been changing in recent years because it does not suit new high-performance detectors: FPGAs and GPUs are now used to perform complex image manipulation tasks such as image reconstruction, image rotation, accumulation, filtering, data analysis and many others. This frees up CPUs for simpler tasks. The objective of this paper is to present the implementation of real-time FPGA-based image manipulation techniques, as well as the performance of the ESRF data acquisition platform called RASHPA, in the back-end board of the SMARTPIX photon-counting detector developed at the ESRF.

Submitted 29 October, 2020; originally announced October 2020.
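
Two of the image manipulation tasks listed in this abstract, accumulation and rotation, can be stated precisely with a software reference model of the kind often used to check hardware output. The Python sketch below is purely illustrative and is not the paper's FPGA design: frames are lists of pixel rows, accumulation saturates at a hypothetical counter width `acc_bits`, and rotation is a 90° clockwise turn.

```python
from typing import List

Frame = List[List[int]]


def accumulate(frames: List[Frame], acc_bits: int = 32) -> Frame:
    """Sum frames pixel by pixel, saturating at 2**acc_bits - 1."""
    limit = (1 << acc_bits) - 1
    rows, cols = len(frames[0]), len(frames[0][0])
    acc = [[0] * cols for _ in range(rows)]
    for frame in frames:
        for y in range(rows):
            for x in range(cols):
                acc[y][x] = min(limit, acc[y][x] + frame[y][x])
    return acc


def rotate90_cw(frame: Frame) -> Frame:
    """Rotate 90 degrees clockwise: out[x][rows - 1 - y] = frame[y][x]."""
    rows = len(frame)
    return [[frame[rows - 1 - y][x] for y in range(rows)]
            for x in range(len(frame[0]))]
```

Saturating rather than wrapping keeps an overflowing pixel visibly at full scale instead of silently rolling over to a small count.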

arXiv:1806.08939 [pdf]

doi 10.1109/TNS.2019.2904118

FPGA Implementation of RDMA-Based Data Acquisition System Over 100 GbE

Authors: Wassim Mansour, Nicolas Janvier, Pablo Fajardo

Abstract: This paper presents an RDMA over Ethernet protocol for data acquisition systems, currently under development at the ESRF. The protocol is implemented on Xilinx UltraScale+ FPGAs using the 100G hard MAC IP. The proposed protocol is compared on equal terms with the well-known RoCEv2 protocol using a commercial network adapter from Mellanox. The results show that the proposed protocol outperforms RoCEv2 in terms of data throughput. Performance tests on the 100G link show that it can reach a maximum stable throughput of 90 Gbps for packet sizes greater than 1 KB and 95 Gbps for packet sizes greater than 32 KB.

Submitted 23 June, 2018; originally announced June 2018.

Comments: 6 pages, 10 figures, Real Time Conference 2018
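
One reason the throughput figures in this abstract sit below the 100 Gbps line rate, and improve with larger packets, is that every Ethernet frame carries fixed overhead on the wire. The Python sketch below computes an idealized framing bound: a message is split into frames of at most `max_payload` bytes, and each frame pays the standard Ethernet on-wire overhead (preamble and SFD, header, FCS, inter-frame gap) plus a per-frame protocol header whose size is a hypothetical parameter, since the abstract does not give the protocol's header format. It ignores host, PCIe, and protocol-processing limits and does not attempt to reproduce the paper's measurements.

```python
import math

# On-wire Ethernet overhead per frame, in bytes:
# preamble + SFD (8) + MAC header (14) + FCS (4) + minimum inter-frame gap (12).
ETH_WIRE_OVERHEAD = 8 + 14 + 4 + 12


def goodput_gbps(message_bytes: int, max_payload: int, proto_header: int,
                 line_rate_gbps: float = 100.0) -> float:
    """Idealized payload throughput when a message is split into Ethernet frames.

    proto_header is a hypothetical per-frame header size (for example IP/UDP
    plus transport headers); only framing overhead is modelled.
    """
    if message_bytes <= 0 or max_payload <= 0:
        raise ValueError("sizes must be positive")
    frames = math.ceil(message_bytes / max_payload)
    wire_bytes = message_bytes + frames * (ETH_WIRE_OVERHEAD + proto_header)
    return line_rate_gbps * message_bytes / wire_bytes
```

As the payload per frame grows, the fixed per-frame bytes become a smaller share of the wire time, so the bound approaches the line rate.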

arXiv:1806.08938 [pdf]

A Generic Data Acquisition Framework For High Performance 2D X-RAY Detectors

Authors: W. Mansour, N. Janvier, P. Fajardo

Abstract: This paper presents the design criteria and the current implementation of RASHPA, a generic and functionally rich data acquisition framework for high-performance detectors. The framework is based on the use of RDMA mechanisms for optimized data transfer and supports multiple destinations and simultaneous transfer operations through multiple parallel data links. Although RASHPA is to some extent agnostic with respect to the type of detector and can deal with different types of data and metadata, it implements selection and dispatching rules that are optimized for the efficient manipulation and distribution of images. For these reasons, RASHPA reaches its full potential when implemented in modular 2D detectors with very high data throughput, such as most of the advanced new area detectors under development for synchrotron and free-electron laser applications.

Submitted 23 June, 2018; originally announced June 2018.

Comments: 6 pages, 6 figures, Real Time conference 2018
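
The combination this abstract describes, several destinations plus several parallel data links with image-oriented dispatching rules, can be pictured with a toy transfer planner. The Python sketch below uses one hypothetical rule: frames go round-robin to destinations, and each frame is cut into contiguous row bands, one per link, so links carry disjoint parts of the same image concurrently. The names (`Transfer`, `dispatch`, the destination labels) are illustrative only; they are not RASHPA's API or its actual rules, and the RDMA transfers themselves are not modelled.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Transfer:
    frame: int
    destination: str
    link: int
    rows: Tuple[int, int]   # half-open row range [start, stop)


def dispatch(n_frames: int, rows_per_frame: int,
             destinations: List[str], n_links: int) -> List[Transfer]:
    """Hypothetical rule: round-robin frames over destinations, and split
    each frame's rows into contiguous bands, one band per parallel link."""
    band = -(-rows_per_frame // n_links)  # ceiling division
    plan = []
    for f in range(n_frames):
        dest = destinations[f % len(destinations)]
        for link in range(n_links):
            start = link * band
            stop = min(rows_per_frame, start + band)
            if start < stop:              # skip empty bands when rows < links
                plan.append(Transfer(f, dest, link, (start, stop)))
    return plan
```

Splitting by rows keeps each link's transfer a contiguous memory region, which suits RDMA writes into a destination image buffer.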

arXiv:hep-ph/9912366 [pdf, ps, other]

doi 10.1103/PhysRevD.61.111301

Mass Hierarchies and the Seesaw Neutrino Mixing

Authors: T. K. Kuo, Guo-Hong Wu, Sadek W. Mansour

Abstract: We give a general analysis of neutrino mixing in the seesaw mechanism with three flavors. Assuming that the Dirac and u-quark mass matrices are similar, we establish simple relations between the neutrino parameters and individual Majorana masses. They are shown to depend rather strongly on the physical neutrino mixing angles. We calculate explicitly the implied Majorana mass hierarchies for parameter sets corresponding to different solutions to the solar neutrino problem.

Submitted 20 March, 2000; v1 submitted 15 December, 1999; originally announced December 1999.

Comments: 11 pages, no figures, replaced with final version. Minor corrections and one typo corrected. Added one reference

Report number: PURD-TH-99-09

Journal ref: Phys.Rev.D61:111301,2000
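
The relation between light-neutrino parameters and Majorana masses described in this abstract can be made explicit with the standard type-I seesaw formula and its inversion. The sketch below assumes, for illustration only, that "similar to the up-quark mass matrix" means a diagonal Dirac matrix M_D ≈ diag(m_u, m_c, m_t) in a suitable basis; sign and phase conventions vary between references, and this is not necessarily the paper's exact basis. U is the leptonic mixing matrix and m_1, m_2, m_3 the light-neutrino masses.

```latex
\begin{aligned}
m_\nu &\simeq -\,M_D\,M_R^{-1}\,M_D^{T}
  \quad\Longrightarrow\quad
  M_R \simeq -\,M_D^{T}\,m_\nu^{-1}\,M_D ,\\
m_\nu &= U\,\operatorname{diag}(m_1,m_2,m_3)\,U^{T}
  \quad\Longrightarrow\quad
  m_\nu^{-1} = U^{*}\,\operatorname{diag}\!\left(m_1^{-1},m_2^{-1},m_3^{-1}\right)U^{\dagger},\\
M_D &\approx \operatorname{diag}(m_u,m_c,m_t)
  \quad\Longrightarrow\quad
  (M_R)_{ij} \simeq -\,m_{u_i}\,m_{u_j}\sum_{k=1}^{3}\frac{U^{*}_{ik}\,U^{*}_{jk}}{m_k}.
\end{aligned}
```

Because each entry of M_R is weighted by a product of up-type quark masses and by combinations of mixing-matrix elements, the resulting Majorana mass hierarchy is steep and sensitive to the mixing angles, consistent with the abstract's statement.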

arXiv:hep-ph/9907521 [pdf, ps, other]

doi 10.1016/S0370-2693(99)01162-4

Classification and Application of Triangular Quark Mass Matrices

Authors: T. K. Kuo, Sadek W. Mansour, Guo-Hong Wu

Abstract: The hierarchical structure in the quark masses and mixings allows the ten physical parameters to be most conveniently encoded in mass matrices of the upper triangular form. We classify these matrices in the hierarchical, minimal parameter basis where the mismatch between the weak and mass eigenstates involves only small mixing angles. Ten such pairs are obtained for the up and down quarks. This analysis can be used to classify texture zeros of general mass matrices. For Hermitian mass matrices with five texture zeros, this method immediately yields five pairs of textures with simple, analytic predictions for the quark mixings. Comparison with data indicates that, of the five pairs, three are disfavored, one is marginally acceptable, and the fifth fits well.

Submitted 4 October, 1999; v1 submitted 27 July, 1999; originally announced July 1999.

Comments: 18 pages, REVTeX

Report number: PURD-TH-99-06

Journal ref: Phys.Lett.B467:116-125,1999
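
The parameter count and the triangular form in this abstract can be summarized as follows. The ten physical parameters are the six quark masses, three mixing angles, and one CP phase. Any complex 3×3 mass matrix admits an RQ-type decomposition M = T W with T upper triangular and W unitary. In the Standard Model, W acts on right-handed fields and drops out of M M†, the combination whose diagonalization defines the left-handed mixing, so the observable content sits in T. This is a generic sketch; the paper's specific minimal-parameter basis and its classification into ten pairs are not reproduced here.

```latex
\begin{aligned}
&\underbrace{6}_{m_u,\,m_c,\,m_t,\,m_d,\,m_s,\,m_b}
 \;+\; \underbrace{3}_{\theta_{12},\,\theta_{23},\,\theta_{13}}
 \;+\; \underbrace{1}_{\delta} \;=\; 10,\\[4pt]
&M = T\,W,\qquad
T=\begin{pmatrix} a_{11} & a_{12} & a_{13}\\ 0 & a_{22} & a_{23}\\ 0 & 0 & a_{33}\end{pmatrix},
\qquad W^{\dagger}W = \mathbb{1},\\[4pt]
&M\,M^{\dagger} = T\,T^{\dagger}.
\end{aligned}
```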

arXiv:hep-ph/9907314 [pdf, ps, other]

doi 10.1103/PhysRevD.60.093004

Triangular Textures for Quark Mass Matrices

Authors: T. K. Kuo, Sadek W. Mansour, Guo-Hong Wu

Abstract: The hierarchical quark masses and small mixing angles are shown to lead to a simple triangular form for the U- and D-type quark mass matrices. In the basis where one of the matrices is diagonal, each matrix element of the other is, to a good approximation, the product of a quark mass and a CKM matrix element. The physical content of a general mass matrix can be easily deciphered in its triangular form. This parameterization could serve as a useful starting point for model building. Examples of mass textures are analyzed using this method.

Submitted 12 July, 1999; originally announced July 1999.

Comments: 10 pages, no figures

Report number: PURD-TH-99-05

Journal ref: Phys.Rev. D60 (1999) 093004
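
The statement that each element of the triangular matrix is approximately a quark mass times a CKM element can be checked at leading order in the mass hierarchy. The derivation below is our own sketch, not taken from the paper: work in the basis where M_U is diagonal, let V be the CKM matrix, D_d = diag(m_d, m_s, m_b) with m_k its k-th entry, take T_33 real, and use |V_tb| ≈ 1 in the standard phase convention. The third column follows directly, because the third row of an upper triangular matrix has only one nonzero entry; repeating the argument column by column, with the already-determined entries subtracted, gives the general pattern.

```latex
\begin{aligned}
M_D M_D^{\dagger} &= V\,D_d^{2}\,V^{\dagger} = T\,T^{\dagger},
  \qquad D_d=\operatorname{diag}(m_d,m_s,m_b),\\
(T T^{\dagger})_{33} &= |T_{33}|^{2} = \sum_{k}|V_{3k}|^{2}\,m_k^{2} \simeq m_b^{2},\\
(T T^{\dagger})_{i3} &= T_{i3}\,T_{33}^{*} = \sum_{k} V_{ik}\,m_k^{2}\,V_{3k}^{*}
  \simeq V_{i3}\,V_{33}^{*}\,m_b^{2}
  \;\;\Longrightarrow\;\; T_{i3} \simeq V_{i3}\,m_b ,\\
\text{and column by column:}\qquad T_{ij} &\simeq V_{ij}\,m_j \qquad (i \le j).
\end{aligned}
```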

arXiv:hep-ph/9810510 [pdf, ps, other]

doi 10.1103/PhysRevD.60.097301

Solar Neutrinos and the Violation of Equivalence Principle

Authors: S. W. Mansour, T. K. Kuo

Abstract: In this Brief Report, a non-standard solution to the solar neutrino problem is revisited. This solution assumes that neutrino flavors could have different couplings to gravity, so the equivalence principle is violated in this mechanism. The gravity-induced mixing has the potential to account for the current solar neutrino data from several experiments even for massless neutrinos. We fit this solution to the total rate of neutrino events in the SuperKamiokande detector together with the total rates from other detectors, and also to the most recent SuperKamiokande results for the recoil-electron spectrum.

Submitted 28 October, 1998; originally announced October 1998.

Comments: 6 pages, 4 figures, submitted to Phys.Rev.D

Journal ref: Phys. Rev. D 60, 097301 (1999)
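
What distinguishes gravity-induced mixing from mass-induced mixing in fits like this one is the energy dependence of the oscillation phase. In vacuum, with two flavors and natural units, the mass-induced phase scales as L/E, whereas for a violation of the equivalence principle the phase scales as E·L, with |φ| the gravitational potential, Δf the difference in the flavors' gravitational couplings, and θ_G the gravitational mixing angle. The constant κ is an order-one, convention-dependent factor that we leave unspecified rather than guess.

```latex
\begin{aligned}
\text{mass-induced:}\qquad
P_{\nu_e\to\nu_x} &= \sin^{2}2\theta\;\sin^{2}\!\left(\frac{\Delta m^{2}\,L}{4E}\right)
  &&\text{phase}\propto L/E,\\[4pt]
\text{gravity-induced (VEP):}\qquad
P_{\nu_e\to\nu_x} &= \sin^{2}2\theta_G\;\sin^{2}\!\left(\kappa\,|\phi|\,\Delta f\,E\,L\right)
  &&\text{phase}\propto E\,L .
\end{aligned}
```

Since no mass term appears in the second line, the mechanism can operate for massless neutrinos, and its distinct energy dependence is what a fit to the recoil-electron spectrum can probe.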

arXiv:hep-ph/9711424 [pdf, ps, other]

doi 10.1103/PhysRevD.58.013012

Supernova neutrinos in the light of FCNC

Authors: S. W. Mansour, T. K. Kuo

Abstract: We study the effect of including flavor-changing neutral currents (FCNC) in the analysis of the neutrino signal of a supernova burst. When we include the effect of FCNC beyond the standard model (SM) in the study of MSW resonant conversion, we obtain dramatic changes in the Δm²–sin²(2θ) probability contours for neutrino detection.

Submitted 16 February, 1998; v1 submitted 21 November, 1997; originally announced November 1997.

Comments: 8 pages in REVTeX, 3 figures. Revised manuscript submitted to Phys. Rev. D

Report number: PURD-TH-97-09

Journal ref: Phys.Rev.D58:013012,1998
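
How extra neutral-current terms can move the MSW resonance, and with it the Δm²–sin²(2θ) contours, can be sketched with the standard two-flavor matter mixing formula. The Python code below uses the charged-current potential V_CC = √2 G_F n_e ≈ 7.63×10⁻¹⁴ Y_e ρ eV (ρ in g/cm³) and adds two hypothetical FCNC-like terms, expressed as fractions of V_CC: `eps_diag` shifts the flavor-diagonal potential difference and `eps_off` adds a flavor-changing element. This parametrization is an illustrative assumption, not the paper's model; with both set to zero it reduces to standard MSW. It also assumes θ ≤ π/4.

```python
import math

# sqrt(2) * G_F * N_A in eV per (g/cm^3): V_CC = 7.63e-14 * Y_e * rho [eV]
V_CC_EV_PER_G_CM3 = 7.63e-14


def _vacuum_terms(dm2_ev2, sin2_2theta, energy_ev):
    # Assumes theta <= pi/4, so cos(2 theta) >= 0.
    delta = dm2_ev2 / (2.0 * energy_ev)            # Delta m^2 / 2E, in eV
    return delta, math.sqrt(sin2_2theta), math.sqrt(1.0 - sin2_2theta)


def matter_sin2_2theta(dm2_ev2, sin2_2theta, energy_ev, rho_g_cm3,
                       y_e=0.5, eps_diag=0.0, eps_off=0.0):
    """sin^2(2 theta_m) for two-flavor nu_e <-> nu_x mixing in matter.

    eps_diag, eps_off: HYPOTHETICAL FCNC-like potentials as fractions of V_CC
    (flavor-diagonal shift, flavor-changing term). Both zero -> standard MSW.
    """
    delta, s2, c2 = _vacuum_terms(dm2_ev2, sin2_2theta, energy_ev)
    v = V_CC_EV_PER_G_CM3 * y_e * rho_g_cm3
    off = delta * s2 + 2.0 * eps_off * v            # 2 x off-diagonal element
    diag = delta * c2 - (1.0 + eps_diag) * v        # H_xx - H_ee
    denom = off ** 2 + diag ** 2
    return off ** 2 / denom if denom else 0.0


def resonance_density(dm2_ev2, sin2_2theta, energy_ev, y_e=0.5, eps_diag=0.0):
    """Density (g/cm^3) at which H_xx - H_ee vanishes, i.e. the MSW resonance."""
    delta, _, c2 = _vacuum_terms(dm2_ev2, sin2_2theta, energy_ev)
    return delta * c2 / (V_CC_EV_PER_G_CM3 * y_e * (1.0 + eps_diag))
```

At zero density the function returns the vacuum value, at the resonance density mixing is maximal (sin²2θ_m = 1), and a positive `eps_diag` moves the resonance to a lower density. A flavor-changing `eps_off` term additionally changes the mixing away from resonance, which is one way such terms can reshape detection-probability contours.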

Showing 1–12 of 12 results for author: Mansour, W