-
Literature-based Discovery for Landscape Planning
Authors:
David Marasco,
Ilya Tyagin,
Justin Sybrandt,
James H. Spencer,
Ilya Safro
Abstract:
This project demonstrates how medical corpus hypothesis generation, a knowledge discovery field of AI, can be used to derive new research angles for landscape and urban planners. The hypothesis generation approach combines deep learning with topic modeling, a probabilistic approach to natural language analysis that scans aggregated research databases for words that can be grouped together based on their subject matter commonalities; the resulting word groups form topics that can provide implicit connections between two general research terms. The hypothesis generation system AGATHA was used to identify likely conceptual relationships between emerging infectious diseases (EIDs) and deforestation. The objective is to give landscape planners guidelines for productive research directions, helping them formulate research hypotheses centered on deforestation and EIDs that will contribute to the broader health field that asserts causal roles of landscape-level issues. This research also serves as a partial proof-of-concept for the application of medical database hypothesis generation to medicine-adjacent hypothesis discovery.
Submitted 5 June, 2023;
originally announced June 2023.
-
Dissecting Malware in the Wild
Authors:
Hamish Spencer,
Wei Wang,
Ruoxi Sun,
Minhui Xue
Abstract:
With the increasingly rapid development of new malicious computer software by bad-faith actors, both commercial and research-oriented antivirus detectors have come to make greater use of machine learning tactics to identify such malware as harmful before end users are exposed to its effects. This, in turn, has spurred the development of tools that allow known malware to be manipulated so that it evades being classified as dangerous by these machine learning-based detectors while retaining its malicious functionality. These manipulations apply a set of changes to Windows programs that result in a different file structure and signature without altering the software's capabilities. Various proposals have been made for the most effective way of applying these alterations to input malware to deceive static malware detectors; the purpose of this research is to examine these proposals and test their implementations to determine which tactics tend to generate the most successful attacks.
Submitted 4 December, 2021; v1 submitted 27 November, 2021;
originally announced November 2021.
-
On-Line Balancing of Random Inputs
Authors:
Nikhil Bansal,
Joel H. Spencer
Abstract:
We consider an online vector balancing game where vectors $v_t$, chosen uniformly at random in $\{-1,+1\}^n$, arrive over time and a sign $x_t \in \{-1,+1\}$ must be picked immediately upon the arrival of $v_t$. The goal is to minimize the $L^\infty$ norm of the signed sum $\sum_t x_t v_t$. We give an online strategy for picking the signs $x_t$ that has value $O(n^{1/2})$ with high probability. Up to constants, this is the best possible even when the vectors are given in advance.
Submitted 12 July, 2020; v1 submitted 16 March, 2019;
originally announced March 2019.
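The online vector balancing game described in the abstract above can be simulated in a few lines. The sketch below is illustrative only: it uses a naive greedy rule (pick whichever sign gives the smaller $L^\infty$ norm of the new partial sum), which is not the strategy from the paper and carries no $O(n^{1/2})$ guarantee. The function names `greedy_sign` and `play`, the number of rounds `T`, and the `seed` parameter are assumptions introduced for this example.

```python
import random


def greedy_sign(partial_sum, v):
    """Return +1 or -1, whichever gives the smaller L-infinity norm.

    Naive baseline for illustration; NOT the strategy from the paper.
    """
    plus = max(abs(s + x) for s, x in zip(partial_sum, v))
    minus = max(abs(s - x) for s, x in zip(partial_sum, v))
    return 1 if plus <= minus else -1


def play(n, T, seed=0):
    """Play T rounds in dimension n and return the final L-infinity norm.

    T (number of rounds) and seed are parameters of this sketch only.
    """
    rng = random.Random(seed)
    total = [0] * n  # running signed sum: sum over t of x_t * v_t
    for _ in range(T):
        # v_t arrives, drawn uniformly at random from {-1, +1}^n.
        v = [rng.choice((-1, 1)) for _ in range(n)]
        # The sign x_t must be chosen immediately, before the next vector.
        x = greedy_sign(total, v)
        total = [s + x * vi for s, vi in zip(total, v)]
    return max(abs(s) for s in total)


if __name__ == "__main__":
    print(play(n=16, T=16))
```

Running `play` for several values of `n` gives a rough empirical sense of how a simple online rule behaves; comparing it against the bound stated in the abstract would require implementing the authors' strategy, which the abstract does not specify.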