-
Automatic Extraction of Disease Risk Factors from Medical Publications
Authors:
Maxim Rubchinsky,
Ella Rabinovich,
Adi Shraibman,
Netanel Golan,
Tali Sahar,
Dorit Shweiki
Abstract:
We present a novel approach to automating the identification of risk factors for diseases from medical literature, leveraging pre-trained models in the bio-medical domain, while tuning them for the specific task. Faced with the challenges of the diverse and unstructured nature of medical articles, our study introduces a multi-step system to first identify relevant articles, then classify them based on the presence of risk factor discussions and, finally, extract specific risk factor information for a disease through a question-answering model.
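The three-stage structure described above can be sketched as a small composable pipeline. This is a minimal sketch under stated assumptions, not the authors' implementation. The class name `RiskFactorPipeline`, the three stage callables and the question template are hypothetical placeholders. In practice each callable would wrap a fine-tuned biomedical model; here toy lambdas stand in for them.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class RiskFactorPipeline:
    """Hypothetical three-stage pipeline: filter, classify, then extract."""

    # Stage 1 (hypothetical interface): is this article about the disease?
    is_relevant: Callable[[str, str], bool]
    # Stage 2 (hypothetical interface): does the article discuss risk factors?
    discusses_risk_factors: Callable[[str], bool]
    # Stage 3 (hypothetical interface): extract answer spans for a question.
    answer: Callable[[str, str], List[str]]

    def run(self, articles: Iterable[str], disease: str) -> List[str]:
        # Assumed question template; the paper's actual formulation may differ.
        question = f"What are the risk factors for {disease}?"
        found: List[str] = []
        for article in articles:
            if not self.is_relevant(article, disease):
                continue
            if not self.discusses_risk_factors(article):
                continue
            for span in self.answer(question, article):
                if span not in found:  # deduplicate, keeping first-seen order
                    found.append(span)
        return found


# Toy stand-ins for the three models, for illustration only.
pipeline = RiskFactorPipeline(
    is_relevant=lambda article, disease: disease.lower() in article.lower(),
    discusses_risk_factors=lambda article: "risk" in article.lower(),
    answer=lambda question, article: [article.split()[0]],
)
articles = [
    "Smoking increases the risk of lung cancer.",
    "Lung cancer screening guidelines.",
    "Diet and type 2 diabetes.",
]
print(pipeline.run(articles, "lung cancer"))  # ['Smoking']
```

Passing the stages as plain callables keeps the control flow independent of any particular model library.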
Our contributions include the development of a comprehensive pipeline for the automated extraction of risk factors and the compilation of several datasets, which can serve as valuable resources for further research in this area. These datasets encompass a wide range of diseases, as well as their associated risk factors, meticulously identified and validated through a fine-grained evaluation scheme. We conducted both automatic and thorough manual evaluation, demonstrating encouraging results. We also highlight the importance of improving models and expanding dataset comprehensiveness to keep pace with the rapidly evolving field of medical research.
Submitted 10 July, 2024;
originally announced July 2024.
-
The Rank-Ramsey Problem and the Log-Rank Conjecture
Authors:
Gal Beniamini,
Nati Linial,
Adi Shraibman
Abstract:
A graph is called Rank-Ramsey if (i) its clique number is small, and (ii) the adjacency matrix of its complement has small rank. We initiate a systematic study of such graphs. Our main motivation is that their constructions, as well as proofs of their non-existence, are intimately related to the famous log-rank conjecture from the field of communication complexity. These investigations also open interesting new avenues in Ramsey theory.
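To make the two conditions concrete, the sketch below computes both parameters for small graphs. It finds the clique number by brute force and the complement rank by exact Gaussian elimination. Assumptions: rank is taken over the reals (computed exactly over the rationals, which gives the same value for 0/1 matrices). Graphs are given as symmetric 0/1 adjacency lists with zero diagonal. The function names are illustrative, not taken from the paper.

```python
from fractions import Fraction
from itertools import combinations


def clique_number(adj):
    """Size of the largest clique, by brute force (tiny graphs only)."""
    n = len(adj)
    best = 1 if n else 0
    for size in range(2, n + 1):
        if not any(all(adj[u][v] for u, v in combinations(group, 2))
                   for group in combinations(range(n), size)):
            break  # no clique of this size, hence none larger either
        best = size
    return best


def rank_over_rationals(matrix):
    """Exact rank via Gaussian elimination over the rationals."""
    rows = [[Fraction(x) for x in row] for row in matrix]
    rank = 0
    n_cols = len(rows[0]) if rows else 0
    for col in range(n_cols):
        pivot = next((r for r in range(rank, len(rows)) if rows[r][col] != 0), None)
        if pivot is None:
            continue
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        for r in range(len(rows)):
            if r != rank and rows[r][col] != 0:
                factor = rows[r][col] / rows[rank][col]
                rows[r] = [a - factor * b for a, b in zip(rows[r], rows[rank])]
        rank += 1
    return rank


def complement_rank(adj):
    """Rank of the adjacency matrix of the complement graph (zero diagonal)."""
    n = len(adj)
    comp = [[0 if i == j else 1 - adj[i][j] for j in range(n)] for i in range(n)]
    return rank_over_rationals(comp)


# The 5-cycle is triangle-free, and its complement (again a 5-cycle) has full rank.
c5 = [[1 if (i - j) % 5 in (1, 4) else 0 for j in range(5)] for i in range(5)]
print(clique_number(c5), complement_rank(c5))  # 2 5
```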
We construct two families of Rank-Ramsey graphs exhibiting polynomial separation between order and complement rank. Graphs in the first family have bounded clique number (as low as $41$). These are subgraphs of certain strong products, whose building blocks are derived from triangle-free strongly-regular graphs. Graphs in the second family are obtained by applying Boolean functions to Erdős-Rényi graphs. Their clique number is logarithmic, but their complement rank is far smaller than in the first family, about $\mathcal{O}(n^{2/3})$. A key component of this construction is our matrix-theoretic view of lifts.
We also consider lower bounds on the Rank-Ramsey numbers, and determine them in the range where the complement rank is $5$ or less. We consider connections between said numbers and other graph parameters, and find that the two best known explicit constructions of triangle-free Ramsey graphs turn out to be far from Rank-Ramsey.
Submitted 17 October, 2024; v1 submitted 12 May, 2024;
originally announced May 2024.
-
An improved protocol for ExactlyN with more than 3 players
Authors:
Lianna Hambardzumyan,
Toniann Pitassi,
Suhail Sherif,
Morgan Shirley,
Adi Shraibman
Abstract:
The ExactlyN problem in the number-on-forehead (NOF) communication setting asks $k$ players, each of whom can see every input but their own, whether the $k$ input numbers add up to $N$. Introduced by Chandra, Furst and Lipton in 1983, ExactlyN is important for its role in understanding the strength of randomness in communication complexity with many players. It is also tightly connected to the field of combinatorics: its $k$-party NOF communication complexity is related to the size of the largest corner-free subset in $[N]^{k-1}$.
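For orientation, here is the trivial protocol that improvements are measured against, written as a small simulation. One player computes the value its own input would need and sends it using about $\log_2 N$ bits. This is a baseline sketch, not the protocol of the paper. The helper names `view` and `trivial_exactly_n` and the restriction to nonnegative integer inputs are illustrative assumptions.

```python
def view(inputs, player):
    """What `player` sees in the NOF model: every input except its own."""
    return [x for j, x in enumerate(inputs) if j != player]


def trivial_exactly_n(inputs, N):
    """Trivial deterministic NOF protocol for ExactlyN (nonnegative inputs).

    Returns (answer, bits_communicated).
    """
    if len(inputs) < 2:
        raise ValueError("need at least two players")
    # Player 0 sees all other inputs and computes the only value of its own
    # input that would make the sum equal N.
    candidate = N - sum(view(inputs, 0))
    if candidate < 0:
        return False, 1  # player 0 can announce "no" with a single bit
    # Player 0 sends a flag bit plus the candidate, a number in [0, N].
    bits = 1 + N.bit_length()
    # Player 1 sees inputs[0] (the first entry of its view) and compares.
    answer = candidate == view(inputs, 1)[0]
    return answer, bits + 1  # player 1 broadcasts the one-bit verdict


print(trivial_exactly_n([3, 4, 5], 12))  # (True, 6)
print(trivial_exactly_n([3, 4, 6], 12))  # (False, 6)
```

The cost is $O(\log N)$ bits regardless of $k$, which is why the interesting question is how far below $\log N$ one can go.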
In 2021, Linial and Shraibman gave more efficient protocols for ExactlyN for 3 players. As an immediate consequence, this also gave a new construction of larger corner-free subsets in $[N]^2$. Later that year Green gave a further refinement to their argument. These results represent the first improvements to the highest-order term for $k=3$ since the famous work of Behrend in 1946. In this paper we give a corresponding improvement to the highest-order term for all $k>3$, the first since Rankin in 1961. That is, we give a more efficient protocol for ExactlyN as well as larger corner-free sets in higher dimensions.
Nearly all previous results in this line of research approached the problem from the combinatorics perspective, implicitly resulting in non-constructive protocols for ExactlyN. Approaching the problem from the communication complexity point of view and constructing explicit protocols for ExactlyN was key to the improvements in the $k=3$ setting. As a further contribution, we provide explicit protocols for ExactlyN for any number of players, which serve as the basis for our improvement.
Submitted 12 September, 2023;
originally announced September 2023.
-
Larger Corner-Free Sets from Better NOF Exactly-$N$ Protocols
Authors:
Nati Linial,
Adi Shraibman
Abstract:
A subset of the integer planar grid $[N] \times [N]$ is called corner-free if it contains no triple of the form $(x,y), (x+δ,y), (x,y+δ)$. It is known that such a set has a vanishingly small density, but how large this density can be remains unknown. The best previous construction was based on Behrend's large subset of $[N]$ with no $3$-term arithmetic progression. Here we provide the first substantial improvement to this lower bound in decades. Our approach to the problem is based on the theory of communication complexity.
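A direct check of the definition is useful for experimenting with small grids. The sketch below assumes $δ > 0$, since the abstract does not fix the sign. Passing `positive_only=False` also rejects the reflected corners with $δ < 0$. The function name is illustrative, and the check takes time quadratic in the size of the set.

```python
def is_corner_free(points, positive_only=True):
    """Return True if no (x, y), (x + d, y), (x, y + d) all lie in `points`."""
    pts = set(points)
    for x, y in pts:
        for x2, y2 in pts:
            d = x2 - x
            # The second point must share the row of (x, y) at a nonzero offset d.
            if y2 != y or d == 0 or (positive_only and d < 0):
                continue
            if (x, y + d) in pts:
                return False
    return True


print(is_corner_free({(0, 0), (1, 0), (0, 1)}))  # False: a corner with d = 1
print(is_corner_free({(1, 1), (0, 1), (1, 0)}))  # True when only d > 0 counts
```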
In the $3$-player exactly-$N$ problem the players need to decide whether $x+y+z=N$ for inputs $x,y,z$ and fixed $N$. This is the first problem considered in the multiplayer Number On the Forehead (NOF) model. Despite the basic nature of this problem, no progress has been made on it throughout the years. Only recently were explicit protocols found for the first time, yet no improvement in complexity has been achieved to date. The present paper offers the first improved protocol for the exactly-$N$ problem. This is also the first significant example where algorithmic ideas in communication complexity bear fruit in additive combinatorics.
Submitted 3 October, 2021; v1 submitted 31 January, 2021;
originally announced February 2021.
-
Algorithmic Number On the Forehead Protocols Yielding Dense Ruzsa-Szemerédi Graphs and Hypergraphs
Authors:
Noga Alon,
Adi Shraibman
Abstract:
We describe algorithmic Number On the Forehead protocols that provide dense Ruzsa-Szemerédi graphs. One protocol leads to a simple and natural extension of the original construction of Ruzsa and Szemerédi. The graphs induced by this protocol have $n$ vertices, $Ω(n^2/\log n)$ edges, and are decomposable into $n^{1+O(1/\log \log n)}$ induced matchings. Another protocol is an explicit (and slightly simpler) version of the construction of Alon, Moitra and Sudakov, producing graphs with similar properties. We also generalize the above protocols to more than three players, in order to construct dense uniform hypergraphs in which every edge lies in a positive small number of simplices.
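For intuition, the sketch below builds the classic tripartite graph from a set $A$ with no 3-term arithmetic progression, adding one triangle $x, x+a, x+2a$ for each $x$ and each $a \in A$. It then counts the triangles through each edge. When $A$ is 3-AP-free every edge lies in exactly one triangle, the standard property from which the decomposition into induced matchings follows. This is the textbook construction, not the NOF-protocol-based construction of the paper. The function names and vertex labels are illustrative.

```python
from itertools import combinations


def is_3ap_free(A):
    """True if no three distinct elements of A form an arithmetic progression."""
    s = set(A)
    return not any((a + c) % 2 == 0 and (a + c) // 2 in s
                   for a, c in combinations(sorted(s), 2))


def rs_graph(m, A):
    """For x in range(m) and a in A, add the triangle X:x, Y:x+a, Z:x+2a."""
    edges = set()
    for x in range(m):
        for a in A:
            X, Y, Z = ("X", x), ("Y", x + a), ("Z", x + 2 * a)
            edges |= {frozenset((X, Y)), frozenset((Y, Z)), frozenset((X, Z))}
    return edges


def triangles_per_edge(edges):
    """Map each edge to the number of triangles that contain it."""
    nbrs = {}
    for e in edges:
        u, v = tuple(e)
        nbrs.setdefault(u, set()).add(v)
        nbrs.setdefault(v, set()).add(u)
    counts = {}
    for e in edges:
        u, v = tuple(e)
        counts[e] = len(nbrs[u] & nbrs[v])  # each common neighbour closes a triangle
    return counts


A = [1, 2, 4, 5]  # contains no 3-term arithmetic progression
counts = triangles_per_edge(rs_graph(5, A))
print(is_3ap_free(A), len(counts), set(counts.values()))  # True 60 {1}
```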
Submitted 2 January, 2020;
originally announced January 2020.
-
Property testing of the Boolean and binary rank
Authors:
Michal Parnas,
Dana Ron,
Adi Shraibman
Abstract:
We present algorithms for testing if a $(0,1)$-matrix $M$ has Boolean/binary rank at most $d$, or is $ε$-far from Boolean/binary rank $d$ (i.e., at least an $ε$-fraction of the entries in $M$ must be modified so that it has rank at most $d$).
The query complexity of our testing algorithm for the Boolean rank is $\tilde{O}\left(d^4/ ε^6\right)$. For the binary rank we present a testing algorithm whose query complexity is $O(2^{2d}/ε)$.
Both algorithms are $1$-sided error algorithms that always accept $M$ if it has Boolean/binary rank at most $d$, and reject with probability at least $2/3$ if $M$ is $ε$-far from Boolean/binary rank $d$.
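Both rank notions can be computed exactly on tiny matrices by brute force, which is handy for sanity-checking a tester. The sketch uses the standard characterizations. A matrix has Boolean rank at most $d$ if every row is an OR of some of $d$ fixed 0/1 vectors. For binary rank, every row must instead be an integer sum of pairwise disjoint vectors from the set. This is not the testing algorithm of the paper. It is exponential in the number of columns, and the function names are illustrative.

```python
from itertools import combinations


def _masks(matrix):
    # Encode each 0/1 row as an integer bitmask (column j -> bit j).
    return [sum(bit << j for j, bit in enumerate(row)) for row in matrix]


def _boolean_cover(row, basis):
    # Boolean product (OR of ANDs): the row is expressible iff the OR of all
    # basis vectors contained in it equals the row.
    acc = 0
    for v in basis:
        if (v & ~row) == 0:
            acc |= v
    return acc == row


def _binary_cover(row, basis):
    # Integer product with 0/1 factors: the row must be a sum of pairwise
    # disjoint basis vectors contained in it (no entry may reach 2).
    inside = [v for v in basis if (v & ~row) == 0]
    for r in range(len(inside) + 1):
        for subset in combinations(inside, r):
            acc = 0
            for v in subset:
                if acc & v:
                    break  # overlap would create an entry equal to 2
                acc |= v
            else:
                if acc == row:
                    return True
    return False


def _min_rank(matrix, covers):
    rows = _masks(matrix)
    n_cols = len(matrix[0]) if matrix else 0
    vectors = range(1, 1 << n_cols)  # all nonzero 0/1 vectors of length n_cols
    d = 0
    while True:  # terminates: the distinct nonzero rows always form a valid basis
        if any(all(covers(r, basis) for r in rows)
               for basis in combinations(vectors, d)):
            return d
        d += 1


def boolean_rank(matrix):
    return _min_rank(matrix, _boolean_cover)


def binary_rank(matrix):
    return _min_rank(matrix, _binary_cover)


M = [[1, 1, 0],
     [0, 1, 1],
     [1, 1, 1]]
print(boolean_rank(M), binary_rank(M))  # 2 3
```

In the example the third row is the OR of the first two, so the Boolean rank is 2. Their sum has an entry equal to 2, and the matrix is nonsingular over the reals, so the binary rank is 3.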
Submitted 30 August, 2019;
originally announced August 2019.
-
On maximal isolation sets in the uniform intersection matrix
Authors:
Michal Parnas,
Adi Shraibman
Abstract:
Let $A_{k,t}$ be the matrix that represents the adjacency matrix of the intersection bipartite graph of all subsets of size $t$ of $\{1,2,...,k\}$. We give constructions of large isolation sets in $A_{k,t}$, where, for a large enough $k$, our constructions are the best possible.
We first prove that the largest identity submatrix in $A_{k,t}$ is of size $k-2t+2$. Then we provide constructions of isolation sets in $A_{k,t}$ for any $t\geq 2$, as follows: if $k = 2t+r$ and $0 \leq r \leq 2t-3$, there exists an isolation set of size $2r+3 = 2k-4t+3$; if $k \geq 4t-3$, there exists an isolation set of size $k$. The construction is maximal for $k\geq 4t-3$, since the Boolean rank of $A_{k,t}$ is $k$ in this case. As we prove, the construction is also maximal for $k = 2t, 2t+1$.
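To experiment with small cases, one can build $A_{k,t}$ explicitly and test the isolation-set condition. An isolation set is a set of 1-entries, no two of which lie in a common all-ones $2 \times 2$ submatrix; this also forces distinct rows and columns. Since an all-ones combinatorial rectangle contains at most one entry of an isolation set, the size of any isolation set is a lower bound on the Boolean rank. This is an illustrative sketch with hypothetical function names, not code from the paper.

```python
from itertools import combinations


def intersection_matrix(k, t):
    """A_{k,t}: rows/columns are t-subsets of {1,...,k}; entry 1 iff they intersect."""
    subsets = [frozenset(s) for s in combinations(range(1, k + 1), t)]
    return subsets, [[1 if r & c else 0 for c in subsets] for r in subsets]


def is_isolation_set(A, cells):
    """Cells (i, j) are 1-entries and no two lie in an all-ones 2x2 submatrix."""
    if any(A[i][j] != 1 for i, j in cells):
        return False
    return all(A[i][j2] == 0 or A[i2][j] == 0
               for (i, j), (i2, j2) in combinations(cells, 2))


subsets, A = intersection_matrix(4, 2)
pos = {s: i for i, s in enumerate(subsets)}
a, b = pos[frozenset({1, 2})], pos[frozenset({3, 4})]
# An identity submatrix of size k - 2t + 2 = 2 gives an isolation set.
print(is_isolation_set(A, [(a, a), (b, b)]))  # True
```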
Finally, we consider the problem of the maximal triangular isolation submatrix of $A_{k,t}$ that has ones in every entry on the main diagonal and below it, and zeros elsewhere. We give an optimal construction of such a submatrix of size $({2t \choose t}-1) \times ({2t \choose t}-1)$, for any $t \geq 1$ and a large enough $k$. This construction is tight, as there is a matching upper bound, which can be derived from a theorem of Frankl about skew matrices.
Submitted 26 July, 2019;
originally announced July 2019.
-
Nondeterministic Communication Complexity with Help and Graph Functions
Authors:
Adi Shraibman
Abstract:
We define nondeterministic communication complexity in the model of communication complexity with help of Babai, Hayes and Kimmel. We use it to prove logarithmic lower bounds on the NOF communication complexity of explicit graph functions, which are complementary to the bounds proved by Beame, David, Pitassi and Woelfel.
Submitted 25 October, 2017;
originally announced October 2017.
-
The Augmentation Property of Binary Matrices for the Binary and Boolean Rank
Authors:
Michal Parnas,
Adi Shraibman
Abstract:
We define the Augmentation property for binary matrices with respect to different rank functions. A matrix $A$ has the Augmentation property for a given rank function if, for any subset of column vectors $x_1,...,x_t$ for which the rank of $A$ does not increase when augmented separately with each of the vectors $x_i$, $1\leq i \leq t$, it also holds that the rank does not increase when augmenting $A$ with all vectors $x_1,...,x_t$ simultaneously. This property holds trivially for the usual linear rank over the reals, but as we show, things change significantly when considering the binary and Boolean rank of a matrix.
We prove a necessary and sufficient condition for this property to hold under the binary and Boolean rank of binary matrices. Namely, a matrix has the Augmentation property for these rank functions if and only if it has a unique base that spans all other bases of the matrix with respect to the given rank function. For the binary rank, we also present a concrete characterization of a family of matrices that has the Augmentation property. This characterization is based on the possible types of linear dependencies between rows of $V$, in optimal binary decompositions of the matrix as $A=U\cdot V$.
Furthermore, we use the Augmentation property to construct simple families of matrices for which there is a gap between their real and binary rank and between their real and Boolean rank.
Submitted 21 June, 2017;
originally announced June 2017.
-
A Note on Multiparty Communication Complexity and the Hales-Jewett Theorem
Authors:
Adi Shraibman
Abstract:
For integers $n$ and $k$, the density Hales-Jewett number $c_{n,k}$ is defined as the maximal size of a subset of $[k]^n$ that contains no combinatorial line. We show that for $k \ge 3$ the density Hales-Jewett number $c_{n,k}$ is equal to the maximal size of a cylinder intersection in the problem $Part_{n,k}$ of testing whether $k$ subsets of $[n]$ form a partition. It follows that the communication complexity of $Part_{n,k}$ in the Number On the Forehead (NOF) model is equal to the minimal size of a partition of $[k]^n$ into subsets that do not contain a combinatorial line. Thus, the bound in \cite{chattopadhyay2007languages} on $Part_{n,k}$ using the Hales-Jewett theorem is in fact tight, and the density Hales-Jewett number can be thought of as a quantity in communication complexity. This gives a new angle on this well-studied quantity.
As a simple application we prove a lower bound on $c_{n,k}$, similar to the lower bound in \cite{polymath2010moser} which is roughly $c_{n,k}/k^n \ge \exp(-O(\log n)^{1/\lceil \log_2 k\rceil})$. This lower bound follows from a protocol for $Part_{n,k}$. It is interesting to better understand the communication complexity of $Part_{n,k}$, as this will also lead to a better understanding of the Hales-Jewett number. The main purpose of this note is to motivate this study.
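The quantity $c_{n,k}$ can be computed by exhaustive search for very small parameters, which helps when sanity-checking protocols for $Part_{n,k}$ or lower-bound constructions. The sketch uses the standard definition of a combinatorial line. A line is given by a word over $[k]$ with at least one wildcard position, and its $k$ points come from filling every wildcard with the same symbol. Symbols are taken from $\{0,\dots,k-1\}$ for convenience. The function names are illustrative, and the search is exponential in $k^n$.

```python
from itertools import combinations, product


def combinatorial_lines(n, k):
    """Yield each combinatorial line in [k]^n (alphabet 0..k-1) as a list of k points."""
    # A template is a word over {0..k-1} plus a wildcard (None) used at least
    # once; substituting a = 0..k-1 for every wildcard gives the line's points.
    for template in product([None, *range(k)], repeat=n):
        if None in template:
            yield [tuple(a if c is None else c for c in template) for a in range(k)]


def has_line(points, n, k):
    pts = set(points)
    return any(all(p in pts for p in line) for line in combinatorial_lines(n, k))


def dhj_number(n, k):
    """c_{n,k} by exhaustive search; feasible only for very small n and k."""
    cube = list(product(range(k), repeat=n))
    for size in range(len(cube), -1, -1):
        if any(not has_line(s, n, k) for s in combinations(cube, size)):
            return size


print(dhj_number(1, 3), dhj_number(2, 3))  # 2 6
```

For $k = 3$ this reproduces the first values $c_{1,3} = 2$ and $c_{2,3} = 6$.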
Submitted 30 June, 2018; v1 submitted 7 June, 2017;
originally announced June 2017.
-
On The Communication Complexity of High-Dimensional Permutations
Authors:
Nati Linial,
Toniann Pitassi,
Adi Shraibman
Abstract:
We study the multiparty communication complexity of high dimensional permutations, in the Number On the Forehead (NOF) model. This model is due to Chandra, Furst and Lipton (CFL) who also gave a nontrivial protocol for the Exactly-n problem where three players receive integer inputs and need to decide if their inputs sum to a given integer $n$. There is a considerable body of literature dealing with the same problem, where $(\mathbb{N},+)$ is replaced by some other abelian group. Our work can be viewed as a far-reaching extension of this line of work.
We show that the known lower bounds for that group-theoretic problem apply to all high dimensional permutations. We introduce new proof techniques that appeal to recent advances in Additive Combinatorics and Ramsey theory. We reveal new and unexpected connections between the NOF communication complexity of high dimensional permutations and a variety of well known and thoroughly studied problems in combinatorics.
Previous protocols for Exactly-n all rely on the construction of large sets of integers without a 3-term arithmetic progression. No direct algorithmic protocol was previously known for the problem, and we provide the first such algorithm. This suggests new ways to significantly improve the CFL protocol.
Many new open questions are presented throughout.
Submitted 27 November, 2018; v1 submitted 7 June, 2017;
originally announced June 2017.
-
The Corruption Bound, Log Rank, and Communication Complexity
Authors:
Adi Shraibman
Abstract:
We prove upper bounds on deterministic communication complexity in terms of the logarithm of the rank and simple versions of the corruption bound.
Our bounds are a simplified version of the results of Gavinsky and Lovett, using the same set of tools. We also give an elementary proof for the upper bound on communication complexity in terms of rank proved by Lovett.
Submitted 22 September, 2017; v1 submitted 14 September, 2014;
originally announced September 2014.
-
An approximation algorithm for approximation rank
Authors:
Troy Lee,
Adi Shraibman
Abstract:
One of the strongest techniques available for showing lower bounds on quantum communication complexity is the logarithm of the approximation rank of the communication matrix, that is, the minimum rank of a matrix which is entrywise close to the communication matrix. This technique has two main drawbacks: it is difficult to compute, and it is not known to lower bound quantum communication complexity with entanglement.
Linial and Shraibman recently introduced a norm, called $\gamma_2^{\alpha}$, to quantum communication complexity, showing that it can be used to lower bound communication with entanglement. Here the parameter $\alpha$ is a measure of approximation which is related to the allowable error probability of the protocol. This bound can be written as a semidefinite program and gives bounds at least as large as many techniques in the literature, although it is smaller than the corresponding $\alpha$-approximation rank, $\mathrm{rk}_{\alpha}$. We show that in fact $\log \gamma_2^{\alpha}(A)$ and $\log \mathrm{rk}_{\alpha}(A)$ agree up to small factors. As corollaries we obtain a constant factor polynomial time approximation algorithm to the logarithm of approximate rank, and that the logarithm of approximation rank is a lower bound for quantum communication complexity with entanglement.
Submitted 7 March, 2021; v1 submitted 11 September, 2008;
originally announced September 2008.
-
Disjointness is hard in the multi-party number on the forehead model
Authors:
Troy Lee,
Adi Shraibman
Abstract:
We show that disjointness requires randomized communication $\Omega(n^{1/(k+1)}/2^{2^k})$ in the general $k$-party number-on-the-forehead model of complexity. The previous best lower bound for $k \geq 3$ was $\log(n)/(k-1)$. Our results give a separation between nondeterministic and randomized multiparty number-on-the-forehead communication complexity for up to $k = \log \log n - O(\log \log \log n)$ many players. Also by a reduction of Beame, Pitassi, and Segerlind, these results imply subexponential lower bounds on the size of proofs needed to refute certain unsatisfiable CNFs in a broad class of proof systems, including tree-like Lovász-Schrijver proofs.
Submitted 9 June, 2009; v1 submitted 27 December, 2007;
originally announced December 2007.