-
Extending the Applicability of Bloom Filters by Relaxing their Parameter Constraints
Authors:
Paul Walther,
Wejdene Mansour,
Martin Werner
Abstract:
These days, Key-Value Stores are widely used for scalable data storage. In this environment, Bloom filters serve as an efficient probabilistic data structure for the representation of sets of keys as they allow for set membership queries with controllable false positive rates and no false negatives. For optimal error rates, the right choice of the main parameters, namely the length of the Bloom filter array, the number of hash functions used to map an element to the array's indices, and the number of elements to be inserted in one filter, is crucial. However, these parameters are constrained: The number of hash functions is bounded to integer values, and the length of a Bloom filter is usually chosen to be a power-of-two to allow for efficient modulo operations using binary arithmetics. These modulo calculations are necessary to map from the output universe of the applied universal hash functions, like Murmur, to the set of indices of the Bloom filter. In this paper, we relax these constraints by proposing the Rational Bloom filter, which allows for non-integer numbers of hash functions. This results in optimized fraction-of-zero values for a known number of elements to be inserted. Based on this, we construct the Variably-Sized Block Bloom filters to allow for a flexible filter length, especially for large filters, while keeping computation efficient.
Submitted 17 April, 2025; v1 submitted 4 February, 2025;
originally announced February 2025.
-
Can Large Language Models Automatically Score Proficiency of Written Essays?
Authors:
Watheq Mansour,
Salam Albatarni,
Sohaila Eltanbouly,
Tamer Elsayed
Abstract:
Although several methods were proposed to address the problem of automated essay scoring (AES) in the last 50 years, there is still much to desire in terms of effectiveness. Large Language Models (LLMs) are transformer-based models that demonstrate extraordinary capabilities on various tasks. In this paper, we test the ability of LLMs, given their powerful linguistic knowledge, to analyze and effectively score written essays. We experimented with two popular LLMs, namely ChatGPT and Llama. We aim to check if these models can do this task and, if so, how their performance is positioned among the state-of-the-art (SOTA) models across two levels, holistically and per individual writing trait. We utilized prompt-engineering tactics in designing four different prompts to bring their maximum potential to this task. Our experiments conducted on the ASAP dataset revealed several interesting observations. First, choosing the right prompt depends highly on the model and nature of the task. Second, the two LLMs exhibited comparable average performance in AES, with a slight advantage for ChatGPT. Finally, despite the performance gap between the two LLMs and SOTA models in terms of predictions, they provide feedback to enhance the quality of the essays, which can potentially help both teachers and students.
Submitted 15 April, 2024; v1 submitted 10 March, 2024;
originally announced March 2024.
-
Overview of the CLEF-2021 CheckThat! Lab on Detecting Check-Worthy Claims, Previously Fact-Checked Claims, and Fake News
Authors:
Preslav Nakov,
Giovanni Da San Martino,
Tamer Elsayed,
Alberto Barrón-Cedeño,
Rubén Míguez,
Shaden Shaar,
Firoj Alam,
Fatima Haouari,
Maram Hasanain,
Watheq Mansour,
Bayan Hamdan,
Zien Sheikh Ali,
Nikolay Babulkov,
Alex Nikolov,
Gautam Kishore Shahi,
Julia Maria Struß,
Thomas Mandl,
Mucahid Kutlu,
Yavuz Selim Kartal
Abstract:
We describe the fourth edition of the CheckThat! Lab, part of the 2021 Conference and Labs of the Evaluation Forum (CLEF). The lab evaluates technology supporting tasks related to factuality, and covers Arabic, Bulgarian, English, Spanish, and Turkish. Task 1 asks to predict which posts in a Twitter stream are worth fact-checking, focusing on COVID-19 and politics (in all five languages). Task 2 asks to determine whether a claim in a tweet can be verified using a set of previously fact-checked claims (in Arabic and English). Task 3 asks to predict the veracity of a news article and its topical domain (in English). The evaluation is based on mean average precision or precision at rank k for the ranking tasks, and macro-F1 for the classification tasks. This was the most popular CLEF-2021 lab in terms of team registrations: 132 teams. Nearly one-third of them participated: 15, 5, and 25 teams submitted official runs for tasks 1, 2, and 3, respectively.
Submitted 23 September, 2021;
originally announced September 2021.
-
Explaining COVID-19 and Thoracic Pathology Model Predictions by Identifying Informative Input Features
Authors:
Ashkan Khakzar,
Yang Zhang,
Wejdene Mansour,
Yuezhi Cai,
Yawei Li,
Yucheng Zhang,
Seong Tae Kim,
Nassir Navab
Abstract:
Neural networks have demonstrated remarkable performance in classification and regression tasks on chest X-rays. In order to establish trust in the clinical routine, the networks' prediction mechanism needs to be interpretable. One principal approach to interpretation is feature attribution. Feature attribution methods identify the importance of input features for the output prediction. Building on Information Bottleneck Attribution (IBA) method, for each prediction we identify the chest X-ray regions that have high mutual information with the network's output. Original IBA identifies input regions that have sufficient predictive information. We propose Inverse IBA to identify all informative regions. Thus all predictive cues for pathologies are highlighted on the X-rays, a desirable property for chest X-ray diagnosis. Moreover, we propose Regression IBA for explaining regression models. Using Regression IBA we observe that a model trained on cumulative severity score labels implicitly learns the severity of different X-ray regions. Finally, we propose Multi-layer IBA to generate higher resolution and more detailed attribution/saliency maps. We evaluate our methods using both human-centric (ground-truth-based) interpretability metrics, and human-independent feature importance metrics on NIH Chest X-ray8 and BrixIA datasets. The Code is publicly available.
Submitted 4 August, 2021; v1 submitted 1 April, 2021;
originally announced April 2021.
-
FPGA Based Real-Time Image Manipulation and Advanced Data Acquisition For 2D-XRAY Detectors
Authors:
Wassim Mansour,
Rattana Biv,
Cyril Ponchut,
Raphael Ponsard,
Nicolas Janvier,
Pablo Fajardo
Abstract:
Scientific experiments rely on some type of measurements that provides the required data to extract aimed information or conclusions. Data production and analysis are therefore essential components at the heart of any scientific experimental application. Traditionally, efforts on detector development for photon sources have focused on the properties and performance of the detection front-ends. In many cases, the data acquisition chain as well as data processing, are treated as a complementary component of the detector system and added at a late stage of the project. In most of the cases, data processing tasks are entrusted to CPUs; achieving thus the minimum bandwidth requirements and kept hardware relatively simple in term of functionalities. This also minimizes design effort, complexity and implementation cost. This approach is changing in the last years as it does not fit new high-performance detectors; FPGA and GPUs are now used to perform complex image manipulation tasks such as image reconstruction, image rotation, accumulation, filtering, data analysis and many others. This frees up CPUs for simpler tasks. The objective of this paper is to present both the implementation of real time FPGA-based image manipulation techniques, as well as, the performance of the ESRF data acquisition platform called RASHPA, into the back-end board of the SMARTPIX photon-counting detector developed at the ESRF.
Submitted 29 October, 2020;
originally announced October 2020.
-
FPGA Implementation of RDMA-Based Data Acquisition System Over 100 GbE
Authors:
Wassim Mansour,
Nicolas Janvier,
Pablo Fajardo
Abstract:
This paper presents an RDMA over Ethernet protocol used for data acquisition systems, currently under development at the ESRF. The protocol is implemented on Xilinx Ultrascale + FPGAs thanks to the 100G hard MAC IP. The proposed protocol is fairly compared with the well-known RoCE-V2 protocol using a commercial network adapter from Mellanox. Obtained results show the superiority of the proposed algorithm over RoCE-V2 in terms of data throughput. Performance tests on the 100G link show that it can reach a maximum stable link performance of 90 Gbps with minimum packets sizes greater than 1KB and 95Gbps for packet sizes greater than 32KB.
Submitted 23 June, 2018;
originally announced June 2018.
-
A Generic Data Acquisition Framework For High Performance 2D X-RAY Detectors
Authors:
W. Mansour,
N. Janvier,
P. Fajardo
Abstract:
This paper presents the design criteria and the current implementation of a generic and functionally rich data acquisition framework for high performance detectors called RASHPA. The framework is based on the use of RDMA mechanisms for optimized data transfer and supports multiple destinations and simultaneous transfer operations through multiple parallel data links. Although RASHPA is somehow agnostic in what respects to the type of detector and can deal with different types of data and metadata, it implements selection and dispatching rules that are optimized for the efficient manipulation and distribution of images. For all the previous reasons, the full potential of RASHPA comes up when implemented in very high data throughput modular 2D detectors as most of the advanced new area detectors that are in development for synchrotron and free-electron laser applications.
Submitted 23 June, 2018;
originally announced June 2018.
-
Mass Hierarchies and the Seesaw Neutrino Mixing
Authors:
T. K. Kuo,
Guo-Hong Wu,
Sadek W. Mansour
Abstract:
We give a general analysis of neutrino mixing in the seesaw mechanism with three flavors. Assuming that the Dirac and u-quark mass matrices are similar, we establish simple relations between the neutrino parameters and individual Majorana masses. They are shown to depend rather strongly on the physical neutrino mixing angles. We calculate explicitly the implied Majorana mass hierarchies for parameter sets corresponding to different solutions to the solar neutrino problem.
Submitted 20 March, 2000; v1 submitted 15 December, 1999;
originally announced December 1999.
-
Classification and Application of Triangular Quark Mass Matrices
Authors:
T. K. Kuo,
Sadek W. Mansour,
Guo-Hong Wu
Abstract:
The hierarchical structure in the quark masses and mixings allows its ten physical parameters to be most conveniently encoded in mass matrices of the upper triangular form. We classify these matrices in the hierarchical, minimal parameter basis where the mismatch between the weak and mass eigenstates involves only small mixing angles. Ten such pairs are obtained for the up and down quarks. This analysis can be used to classify texture zeros of general mass matrices. For hermitian mass matrices with five texture zeros, this method yields immediately five pairs of textures with simple, analytic predictions for the quark mixings. Comparison with data indicates that, of the five pairs, three are disfavored, one is marginally acceptable, while the fifth fits well.
Submitted 4 October, 1999; v1 submitted 27 July, 1999;
originally announced July 1999.
-
Triangular Textures for Quark Mass Matrices
Authors:
T. K. Kuo,
Sadek W. Mansour,
Guo-Hong Wu
Abstract:
The hierarchical quark masses and small mixing angles are shown to lead to a simple triangular form for the U- and D-type quark mass matrices. In the basis where one of the matrices is diagonal, each matrix element of the other is, to a good approximation, the product of a quark mass and a CKM matrix element. The physical content of a general mass matrix can be easily deciphered in its triangular form. This parameterization could serve as a useful starting point for model building. Examples of mass textures are analyzed using this method.
Submitted 12 July, 1999;
originally announced July 1999.
-
Solar Neutrinos and the Violation of Equivalence Principle
Authors:
S. W. Mansour,
T. K. Kuo
Abstract:
In this Brief Report, a non-standard solution to the solar neutrino problem is revisited. This solution assumes that neutrino flavors could have different couplings to gravity, hence, the equivalence principle is violated in this mechanism. The gravity induced mixing has the potential of accounting for the current solar neutrino data from several experiments even for massless neutrinos. We fit this solution to the total rate of neutrino events in the SuperKamiokande detector together with the total rate from other detectors and also with the most recent results of the SuperKamiokande results for the recoil-electron spectrum.
Submitted 28 October, 1998;
originally announced October 1998.
-
Supernova neutrinos in the light of FCNC
Authors:
S. W. Mansour,
T. K. Kuo
Abstract:
We study the effect of including flavor changing neutral currents (FCNC) in the analysis of the neutrino signal of a supernova burst. When we include the effect of the FCNC which are beyond the standard model (SM) in the study of the MSW resonant conversion, we obtain dramatic changes in the Δm^2-sin^2(2θ) probability contours for neutrino detection.
Submitted 16 February, 1998; v1 submitted 21 November, 1997;
originally announced November 1997.