-
Enhanced Outsourced and Secure Inference for Tall Sparse Decision Trees
Authors:
Andrew Quijano,
Spyros T. Halkidis,
Kevin Gallagher,
Kemal Akkaya,
Nikolaos Samaras
Abstract:
A decision tree is an easy-to-understand tool that has been widely used for classification tasks. On the one hand, due to privacy concerns, there has been an urgent need to create privacy-preserving classifiers that conceal the user's input from the classifier. On the other hand, with the rise of cloud computing, data owners are keen to reduce risk by outsourcing their model, but want security guarantees that third parties cannot steal their decision tree model. To address these issues, Joye and Salehi introduced a theoretical protocol that efficiently evaluates decision trees while maintaining privacy by leveraging their comparison protocol that is resistant to timing attacks. However, their approach was not only inefficient but also prone to side-channel attacks. Therefore, in this paper, we propose a new decision tree inference protocol in which the model is shared and evaluated among multiple entities. We partition our decision tree model by each level to be stored in a new entity we refer to as a "level-site." Utilizing this approach, we were able to gain improved average run time for classifier evaluation for a non-complete tree, while also having strong mitigations against side-channel attacks.
Submitted 4 May, 2025;
originally announced May 2025.
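The level-by-level partitioning described in the abstract above can be illustrated with a minimal plaintext sketch. It shows only how a tree might be split into per-level groups and why evaluation of a non-complete tree can stop early. It does not implement the encrypted comparison protocol or any side-channel mitigation. The names `Node`, `partition_by_level`, and `classify`, and the `<=` comparison direction, are assumptions made for illustration and are not taken from the paper.

```python
from collections import deque
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Node:
    # Hypothetical node layout: internal nodes carry a feature index and a
    # threshold, leaves carry only a label.
    feature: Optional[int] = None
    threshold: Optional[float] = None
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    label: Optional[str] = None


def partition_by_level(root: Node) -> List[List[Node]]:
    """Group nodes by depth; each inner list is what one level-site would hold."""
    levels: List[List[Node]] = []
    queue = deque([(root, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == len(levels):
            levels.append([])
        levels[depth].append(node)
        for child in (node.left, node.right):
            if child is not None:
                queue.append((child, depth + 1))
    return levels


def classify(root: Node, features: List[float]) -> Tuple[str, int]:
    """Plaintext evaluation; returns the label and the number of levels visited."""
    node, visited = root, 1
    while node.label is None:
        # Assumed convention: go left when the feature value is <= threshold.
        node = node.left if features[node.feature] <= node.threshold else node.right
        visited += 1
    return node.label, visited
```

In a tall sparse tree many leaves sit near the root, so an evaluation in this sketch touches only as many levels as the path to its leaf requires rather than the full depth of the tree.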
-
A Statistical Model of Word Rank Evolution
Authors:
Alex John Quijano,
Rick Dale,
Suzanne Sindi
Abstract:
The availability of large linguistic data sets enables data-driven approaches to study linguistic change. The Google Books corpus unigram frequency data set is used to investigate the word rank dynamics in eight languages. We observed the rank changes of the unigrams from 1900 to 2008 and compared them to a Wright-Fisher-inspired model that we developed for our analysis. The model simulates a neutral evolutionary process with the restriction of having no disappearing or added words. This work explains the mathematical framework of the model, written as a Markov chain with multinomial transition probabilities, to show how frequencies of words change in time. From our observations in the data and our model, word rank stability shows two types of characteristics: (1) the increase/decrease in rank is monotonic, or (2) the rank stays the same. Based on our model, high-ranked words tend to be more stable while low-ranked words tend to be more volatile. Some words change in rank in two ways: (a) by an accumulation of small increasing/decreasing rank changes in time and (b) by shocks of increase/decrease in rank. Most words in all of the languages we have looked at are rank stable, but not as stable as a neutral model would predict. The stopwords and Swadesh words are observed to be rank stable across eight languages, indicating linguistic conformity in established languages. These signatures suggest unigram frequencies in all languages have changed in a manner inconsistent with a purely neutral evolutionary process.
Submitted 14 February, 2022; v1 submitted 21 July, 2021;
originally announced July 2021.
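The neutral model mentioned in the abstract above (a Markov chain with multinomial transition probabilities and a fixed vocabulary) can be sketched roughly as follows. This is not the authors' implementation. In particular, the way the "no disappearing or added words" restriction is enforced here, by reserving one token per word and resampling only the remaining tokens, is an assumption chosen for illustration, as are the function names `wright_fisher_step`, `ranks`, and `simulate`.

```python
import random
from collections import Counter
from typing import List


def wright_fisher_step(counts: List[int], rng: random.Random) -> List[int]:
    """One generation of neutral resampling with a fixed vocabulary and corpus size."""
    vocab = range(len(counts))
    total = sum(counts)
    # Assumption: each word keeps one reserved token so it can never disappear;
    # the remaining tokens are a multinomial draw weighted by current counts.
    free_tokens = total - len(counts)
    draw = Counter(rng.choices(vocab, weights=counts, k=free_tokens))
    return [1 + draw[w] for w in vocab]


def ranks(counts: List[int]) -> List[int]:
    """Rank 1 is the most frequent word; ties are broken by word index."""
    order = sorted(range(len(counts)), key=lambda w: (-counts[w], w))
    rank = [0] * len(counts)
    for position, w in enumerate(order, start=1):
        rank[w] = position
    return rank


def simulate(initial_counts: List[int], generations: int, seed: int = 0) -> List[List[int]]:
    """Return the rank vector at generation 0 and after each resampling step."""
    rng = random.Random(seed)
    counts = list(initial_counts)
    history = [ranks(counts)]
    for _ in range(generations):
        counts = wright_fisher_step(counts, rng)
        history.append(ranks(counts))
    return history
```

Comparing rank trajectories from such a simulation with observed trajectories is one way to ask whether real words are more or less rank stable than neutral drift alone would produce.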
-
Radiation and Scattering of EM Waves in Large Plasmas Around Objects in Hypersonic Flight
Authors:
A. Scarabosio,
J. L. Araque Quijano,
J. Tobon,
M. Righero,
G. Giordanengo,
D. D'Ambrosio,
L. Walpot,
G. Vecchi
Abstract:
The hypersonic flight regime is conventionally defined as Mach numbers larger than 5; in these conditions, the flying object becomes enveloped in a plasma. This plasma is densest in thin surface layers, but in typical situations of interest it impacts electromagnetic wave propagation in an electrically large volume. We address this problem with a hybrid approach. We employ the Equivalence Theorem to separate the inhomogeneous plasma region from the surrounding free space via an equivalent (Huygens) surface, and the Eikonal approximation to Maxwell's equations in the large inhomogeneous region to obtain equivalent currents on the separating surface. Then, we obtain the scattered field via (exact) free-space radiation of these surface equivalent currents. The method is extensively tested against reference results and then applied to a real-life re-entry vehicle with full 3D plasma computed via Computational Fluid Dynamics (CFD) simulations. We address both scattering (RCS) from the entire vehicle and radiation from the on-board antennas. From our results, significant radio link path losses can be associated with plasma spatial variations (gradients) and collisional losses, to an extent that matches well the usually perceived blackout when crossing layers in cutoff. Furthermore, we find good agreement with existing literature concerning significant alterations of the radar response (RCS) due to the plasma envelope.
Submitted 6 July, 2021;
originally announced July 2021.
-
Grid Search Hyperparameter Benchmarking of BERT, ALBERT, and LongFormer on DuoRC
Authors:
Alex John Quijano,
Sam Nguyen,
Juanita Ordonez
Abstract:
The purpose of this project is to evaluate three language models, BERT, ALBERT, and LongFormer, on the question answering dataset DuoRC. The language model task has two inputs: a question and a context. The context is a paragraph or an entire document, while the output is the answer based on the context. The goal is to perform grid search hyperparameter fine-tuning using DuoRC. Pretrained weights of the models are taken from the Huggingface library. Different sets of hyperparameters are used to fine-tune the models using two versions of DuoRC, SelfRC and ParaphraseRC. The results show that the ALBERT model (pretrained using the SQuAD1 dataset) has an F1 score of 76.4 and an accuracy score of 68.52 after fine-tuning on the SelfRC dataset. The Longformer model (pretrained using the SQuAD and SelfRC datasets) has an F1 score of 52.58 and an accuracy score of 46.60 after fine-tuning on the ParaphraseRC dataset. The current results outperformed the results from the previous model by DuoRC.
Submitted 29 March, 2021; v1 submitted 15 January, 2021;
originally announced January 2021.
-
Maximum Covering Subtrees for Phylogenetic Networks
Authors:
Nathan Davidov,
Amanda Hernandez,
Justin Jian,
Patrick McKenna,
K. A. Medlin,
Roadra Mojumder,
Megan Owen,
Andrew Quijano,
Amanda Rodriguez,
Katherine St. John,
Katherine Thai,
Meliza Uraga
Abstract:
Tree-based phylogenetic networks, which may be roughly defined as leaf-labeled networks built by adding arcs only between the original tree edges, have elegant properties for modeling evolutionary histories. We answer an open question of Francis, Semple, and Steel about the complexity of determining how far a phylogenetic network is from being tree-based, including non-binary phylogenetic networks. We show that a phylogenetic tree covering the maximum number of nodes in a phylogenetic network can be computed in polynomial time via an encoding into a minimum-cost maximum flow problem.
Submitted 24 November, 2020; v1 submitted 25 September, 2020;
originally announced September 2020.
-
Server-side Fingerprint-Based Indoor Localization Using Encrypted Sorting
Authors:
Andrew Quijano,
Kemal Akkaya
Abstract:
GPS signals, the main source of navigation, are not functional in indoor environments. Therefore, Wi-Fi access points are increasingly used for localization and tracking inside buildings by relying on a fingerprint-based approach. However, these approaches raise several concerns regarding the privacy of users. Malicious individuals can determine a client's daily habits and activities by simply analyzing their wireless signals. While there are already efforts to incorporate privacy into the existing fingerprint-based approaches, they are limited by the characteristics of the homomorphic cryptographic schemes they employ. In this paper, we propose to enhance the performance of these approaches by exploiting another homomorphic algorithm, namely DGK, with its unique encrypted sorting capability, thus pushing most of the computations to the server side. We developed an Android app and tested our system within a Columbia University dormitory. Compared to existing systems, the results indicated that more power savings can be achieved at the client side and that DGK can be a viable option with more powerful server computation capabilities.
Submitted 26 August, 2020;
originally announced August 2020.