-
Sequence Reconstruction for Sticky Insertion/Deletion Channels
Authors:
Van Long Phuoc Pham,
Yeow Meng Chee,
Kui Cai,
Van Khu Vu
Abstract:
The sequence reconstruction problem for insertion/deletion channels has attracted significant attention owing to its recent applications in emerging data storage systems such as racetrack memories and DNA-based data storage. Our goal is to investigate the reconstruction problem for sticky-insdel channels, where both sticky-insertions and sticky-deletions occur. If there are only sticky-insertion errors, the reconstruction problem for the sticky-insertion channel is a special case of the reconstruction problem for the tandem-duplication channel, which has been well studied. In this work, we consider the $(t, s)$-sticky-insdel channel, in which at most $t$ sticky-insertion errors and at most $s$ sticky-deletion errors occur when a message is transmitted through the channel. For the reconstruction problem, we are interested in the minimum number of distinct channel outputs needed to uniquely recover the transmitted vector. We first provide a recursive formula for this minimum number of distinct outputs. Next, we provide an efficient algorithm to reconstruct the transmitted vector from the erroneous sequences.
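To make the channel model concrete, here is a minimal Python sketch under an assumed definition of sticky errors: a sticky insertion lengthens an existing run by one symbol, and a sticky deletion shortens a run by one symbol without removing it. The helper names `runs` and `reachable` are hypothetical and not taken from the paper; the sketch only checks whether an output is consistent with a given input and is not the paper's reconstruction algorithm.

```python
from itertools import groupby

def runs(x):
    """Run-length encoding: list of (symbol, run_length) pairs."""
    return [(sym, len(list(grp))) for sym, grp in groupby(x)]

def reachable(x, y, t, s):
    """True if y can be obtained from x with at most t sticky insertions and
    at most s sticky deletions (assumed model: errors only lengthen or shorten
    existing runs, and a run is never removed entirely)."""
    rx, ry = runs(x), runs(y)
    # Sticky errors neither create nor destroy runs, so the run "skeleton"
    # (number of runs and their symbols) must be identical.
    if len(rx) != len(ry) or any(a != b for (a, _), (b, _) in zip(rx, ry)):
        return False
    # Minimum number of insertions / deletions is the total positive /
    # negative change in run lengths.
    ins = sum(max(ly - lx, 0) for (_, lx), (_, ly) in zip(rx, ry))
    dels = sum(max(lx - ly, 0) for (_, lx), (_, ly) in zip(rx, ry))
    return ins <= t and dels <= s

print(reachable("0011100", "0001100", 1, 1))  # one run lengthened, one shortened
```

Because sticky errors (under this assumed model) preserve the run skeleton, consistency reduces to comparing run lengths, which suggests why run-length representations are a natural starting point for reconstruction.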
Submitted 27 April, 2025;
originally announced April 2025.
-
Yet Another Diminishing Spark: Low-level Cyberattacks in the Israel-Gaza Conflict
Authors:
Anh V. Vu,
Alice Hutchings,
Ross Anderson
Abstract:
We report empirical evidence of web defacement and DDoS attacks carried out by low-level cybercrime actors in the Israel-Gaza conflict. Our quantitative measurements indicate an immediate increase in such cyberattacks following the Hamas-led assault and the subsequent declaration of war. However, the surges waned quickly after a few weeks, with patterns resembling those observed in the aftermath of the Russian invasion of Ukraine. This time, the scale of both the attacks and the discussions within the hacking community was significantly lower than during the early days of the Russia-Ukraine war, and attacks have been prominently one-sided: many pro-Palestinian supporters have targeted Israel, while attacks on Palestine have been much less significant. Beyond targeting these two, attackers also defaced sites of other countries to express their support in the war. Their broader opinions are also largely disparate, with far more support for Palestine and many objections expressed toward Israel.
Submitted 16 May, 2025; v1 submitted 22 April, 2025;
originally announced April 2025.
-
HDC: Hierarchical Distillation for Multi-level Noisy Consistency in Semi-Supervised Fetal Ultrasound Segmentation
Authors:
Tran Quoc Khanh Le,
Nguyen Lan Vi Vu,
Ha-Hieu Pham,
Xuan-Loc Huynh,
Tien-Huy Nguyen,
Minh Huu Nhat Le,
Quan Nguyen,
Hien D. Nguyen
Abstract:
Transvaginal ultrasound is a critical imaging modality for evaluating cervical anatomy and detecting physiological changes. However, accurate segmentation of cervical structures remains challenging due to low contrast, shadow artifacts, and indistinct boundaries. While convolutional neural networks (CNNs) have demonstrated efficacy in medical image segmentation, their reliance on large-scale annotated datasets presents a significant limitation in clinical ultrasound imaging. Semi-supervised learning (SSL) offers a potential solution by utilizing unlabeled data, yet existing teacher-student frameworks often encounter confirmation bias and high computational costs. In this paper, we propose a novel semi-supervised segmentation framework, called HDC, that incorporates adaptive consistency learning with a single-teacher architecture. The framework introduces a hierarchical distillation mechanism with two objectives: a Correlation Guidance Loss for aligning feature representations and a Mutual Information Loss for stabilizing noisy student learning. The proposed approach reduces model complexity while enhancing generalization. Experiments on the fetal ultrasound datasets FUGC and PSFH demonstrate competitive performance with reduced computational overhead compared to multi-teacher models.
Submitted 16 April, 2025; v1 submitted 14 April, 2025;
originally announced April 2025.
-
Computational bottlenecks for denoising diffusions
Authors:
Andrea Montanari,
Viet Vu
Abstract:
Denoising diffusions sample from a probability distribution $μ$ in $\mathbb{R}^d$ by constructing a stochastic process $({\hat{\boldsymbol x}}_t:t\ge 0)$ in $\mathbb{R}^d$ such that ${\hat{\boldsymbol x}}_0$ is easy to sample, but the distribution of $\hat{\boldsymbol x}_T$ at large $T$ approximates $μ$. The drift ${\boldsymbol m}:\mathbb{R}^d\times\mathbb{R}\to\mathbb{R}^d$ of this diffusion process is learned by minimizing a score-matching objective.
Is every probability distribution $μ$, for which sampling is tractable, also amenable to sampling via diffusions? We provide evidence to the contrary by studying a probability distribution $μ$ for which sampling is easy, but the drift of the diffusion process is intractable, under a popular conjecture on information-computation gaps in statistical estimation. We show that there exist drifts that are superpolynomially close to the optimum value (among polynomial-time drifts) and yet yield samples whose distribution is very far from the target one.
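For intuition about the drift ${\boldsymbol m}$, the following Python sketch assumes one common formulation of denoising diffusions, stochastic localization, in which $y_t = t x + B_t$ and $dy_t = m(y_t, t)\,dt + dB_t$ with $m(y, t) = \mathbb{E}[x \mid y_t = y]$. It uses a scalar standard Gaussian target so that the drift has the closed form $y/(1+t)$; this toy target is an assumption and is unrelated to the distribution studied in the paper, for which (under the stated conjecture) the drift is intractable.

```python
import math
import random

def drift(y, t):
    """m(y, t) = E[x | t*x + B_t = y] for a scalar x ~ N(0, 1) (assumed toy target)."""
    return y / (1.0 + t)

def sample(rng, T=50.0, steps=1000):
    """Euler-Maruyama discretisation of dy_t = m(y_t, t) dt + dB_t with y_0 = 0.
    Returns y_T / T, which approximates a draw from the target for large T."""
    dt = T / steps
    sqrt_dt = math.sqrt(dt)
    y, t = 0.0, 0.0
    for _ in range(steps):
        y += drift(y, t) * dt + sqrt_dt * rng.gauss(0.0, 1.0)
        t += dt
    return y / T

rng = random.Random(0)
xs = [sample(rng) for _ in range(1000)]
mean = sum(xs) / len(xs)
var = sum((x - mean) ** 2 for x in xs) / len(xs)
print(f"mean = {mean:.3f}, variance = {var:.3f}")
```

For the exact process, $y_T/T$ has variance $1 + 1/T$, so the empirical variance should be close to 1; the Euler step size adds a small extra bias.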
Submitted 5 June, 2025; v1 submitted 11 March, 2025;
originally announced March 2025.
-
Assessing the Aftermath: the Effects of a Global Takedown against DDoS-for-hire Services
Authors:
Anh V. Vu,
Ben Collier,
Daniel R. Thomas,
John Kristoff,
Richard Clayton,
Alice Hutchings
Abstract:
Law enforcement and private-sector partners have in recent years conducted various interventions to disrupt the DDoS-for-hire market. Drawing on multiple quantitative datasets, including web traffic and ground-truth visits to seized websites, millions of DDoS attack records from academic, industry, and self-reported statistics, along with chats on underground forums and Telegram channels, we assess the effects of an ongoing global intervention against DDoS-for-hire services since December 2022. This is the most extensive booter takedown conducted to date, combining infrastructure targeting with digital influence tactics in a concerted effort by law enforcement across several countries, with two waves of website takedowns and the use of deceptive domains. We found that over half of the seized sites in the first wave returned within a median of one day, while all booters seized in the second wave returned within a median of two days. Re-emerged booter domains, despite closely resembling old ones, struggled to attract visitors (80-90% traffic reduction). While the first wave cut the global DDoS attack volume by 20-40% with a statistically significant effect specifically on UDP-based DDoS attacks (commonly attributed to booters), the impact of the second wave appeared minimal. Underground discussions indicated a cumulative impact, leading to changes in user perceptions of safety and causing some operators to leave the market. Despite the extensive intervention efforts, all DDoS datasets consistently suggest that the illicit market is fairly resilient, with an overall short-lived effect on the global DDoS attack volume lasting at most around six weeks.
Submitted 7 February, 2025;
originally announced February 2025.
-
Fast exact recovery of noisy matrix from few entries: the infinity norm approach
Authors:
BaoLinh Tran,
Van Vu
Abstract:
The matrix recovery (completion) problem, a central problem in data science and theoretical computer science, is to recover a matrix $A$ from a relatively small sample of entries.
While such a task is impossible in general, it has been shown that one can recover $A$ exactly in polynomial time, with high probability, from a random subset of entries, under three (basic and necessary) assumptions: (1) the rank of $A$ is very small compared to its dimensions (low rank), (2) $A$ has delocalized singular vectors (incoherence), and (3) the sample size is sufficiently large.
There are many different algorithms for the task, including convex optimization by Candes, Tao and Recht (2009), alternating projection by Hardt and Wootters (2014) and low rank approximation with gradient descent by Keshavan, Montanari and Oh (2009, 2010).
In applications, it is more realistic to assume that data is noisy. In this case, these approaches provide an approximate recovery with small root mean square error. However, it is hard to transform such an approximate recovery to an exact one.
Recently, results by Abbe et al. (2017) and Bhardwaj et al. (2023) concerning approximation in the infinity norm showed that we can achieve exact recovery even in the noisy case, given that the ground-truth matrix has bounded precision. Beyond the three basic assumptions above, they required either that the condition number of $A$ be small (Abbe et al.) or that the gap between consecutive singular values be large (Bhardwaj et al.).
In this paper, we remove these extra spectral assumptions. As a result, we obtain a simple algorithm for exact recovery in the noisy case, under only the three basic assumptions. This is the first such algorithm. To analyse this algorithm, we introduce a contour integration argument which is totally different from all previous methods and may be of independent interest.
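The step from an infinity-norm approximation to exact recovery can be illustrated in a few lines of Python. The sketch assumes that "bounded precision" means integer entries (any fixed grid works the same way after rescaling): if every entry of the estimate is within $1/2$ of the truth, entrywise rounding returns $A$ exactly, whereas a small root mean square error alone gives no such guarantee. The matrices are made-up examples, and the approximation step itself (the paper's algorithm) is not shown.

```python
def round_matrix(M):
    """Round every entry to the nearest integer (the assumed precision grid)."""
    return [[round(v) for v in row] for row in M]

def max_abs_error(M, A):
    """Infinity-norm (entrywise maximum) error."""
    return max(abs(m - a) for rm, ra in zip(M, A) for m, a in zip(rm, ra))

def rms_error(M, A):
    """Root mean square error over all entries."""
    errs = [(m - a) ** 2 for rm, ra in zip(M, A) for m, a in zip(rm, ra)]
    return (sum(errs) / len(errs)) ** 0.5

# Hypothetical rank-one ground truth with integer entries: A = u u^T.
u = [1, 2, 3, 4]
A = [[ui * uj for uj in u] for ui in u]

# Estimate 1: every entry off by 0.3, so the max error is below 1/2.
B_flat = [[a + 0.3 for a in row] for row in A]
# Estimate 2: smaller RMS error, but one entry off by 0.7 (above 1/2).
B_spiky = [row[:] for row in A]
B_spiky[3][3] += 0.7

print(round_matrix(B_flat) == A, round_matrix(B_spiky) == A)
```

The spiky estimate has the smaller RMS error yet fails exact recovery, which is why control in the infinity norm is the relevant guarantee here.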
Submitted 4 March, 2025; v1 submitted 31 January, 2025;
originally announced January 2025.
-
A New Construction of Non-Binary Deletion Correcting Codes and their Decoding
Authors:
Michael Schaller,
Beatrice Toesca,
Van Khu Vu
Abstract:
Non-binary codes correcting multiple deletions have recently attracted a lot of attention. In this work, we focus on multiplicity-free codes, a family of non-binary codes where all symbols are distinct. Our main contribution is a new explicit construction of such codes, based on set and permutation codes. We show that our multiplicity-free codes can correct multiple deletions and provide a decoding algorithm. We also show that, for a certain regime of parameters, our constructed codes have size larger than all the previously known non-binary codes correcting multiple deletions.
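Whether a given set of words corrects $t$ deletions can be checked by brute force with a classical criterion: two words of length $n$ can be confused after $t$ deletions exactly when they share a common subsequence of length $n - t$. The Python sketch below applies this criterion to small codes; it is a verification aid based on that standard criterion, not the paper's construction or decoder, and the example codewords are made up.

```python
from itertools import combinations

def lcs_length(a, b):
    """Length of a longest common subsequence (standard O(len(a)*len(b)) DP)."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b):
            cur.append(prev[j] + 1 if x == y else max(prev[j + 1], cur[j]))
        prev = cur
    return prev[-1]

def corrects_deletions(code, t):
    """Equal-length words correct t deletions iff no two distinct codewords
    share a common subsequence of length n - t."""
    n = len(next(iter(code)))
    return all(lcs_length(a, b) < n - t for a, b in combinations(code, 2))

def is_multiplicity_free(word):
    """True if all symbols of the word are distinct."""
    return len(set(word)) == len(word)

code = [(0, 1, 2, 3), (3, 2, 1, 0)]   # toy multiplicity-free code
print(all(map(is_multiplicity_free, code)), corrects_deletions(code, 2))
```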
Submitted 23 January, 2025;
originally announced January 2025.
-
Heterogeneous Hypergraph Embedding for Recommendation Systems
Authors:
Darnbi Sakong,
Viet Hung Vu,
Thanh Trung Huynh,
Phi Le Nguyen,
Hongzhi Yin,
Quoc Viet Hung Nguyen,
Thanh Tam Nguyen
Abstract:
Recent advancements in recommender systems have focused on integrating knowledge graphs (KGs) to leverage their auxiliary information. The core idea of KG-enhanced recommenders is to incorporate rich semantic information for more accurate recommendations. However, two main challenges persist: i) neglecting complex higher-order interactions in the KG-based user-item network, potentially leading to sub-optimal recommendations, and ii) dealing with the heterogeneous modalities of input sources, such as user-item bipartite graphs and KGs, which may introduce noise and inaccuracies. To address these issues, we present a novel Knowledge-enhanced Heterogeneous Hypergraph Recommender System (KHGRec). KHGRec captures group-wise characteristics of both the interaction network and the KG, modeling complex connections in the KG. Using a collaborative knowledge heterogeneous hypergraph (CKHG), it employs two hypergraph encoders to model group-wise interdependencies and ensure explainability. Additionally, it fuses signals from the input graphs with cross-view self-supervised learning and attention mechanisms. Extensive experiments on four real-world datasets show our model's superiority over various state-of-the-art baselines, with an average 5.18% relative improvement. Additional tests on noise resilience, missing data, and cold-start problems demonstrate the robustness of our KHGRec framework. Our model and evaluation datasets are publicly available at https://github.com/viethungvu1998/KHGRec.
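The group-wise message passing performed by hypergraph encoders can be illustrated with the standard normalized hypergraph convolution $D_v^{-1/2} H D_e^{-1} H^\top D_v^{-1/2} X$ (as in HGNN, here with unit hyperedge weights and no learned weight matrix). This is an assumed, generic layer written in plain Python for readability; KHGRec's actual encoders, its CKHG construction, and its attention and self-supervised components are not reproduced.

```python
import math

def hypergraph_conv(H, X):
    """One propagation step D_v^{-1/2} H D_e^{-1} H^T D_v^{-1/2} X.
    H: 0/1 incidence matrix (nodes x hyperedges); X: node features (nodes x f).
    Assumes every node lies in some hyperedge and no hyperedge is empty."""
    n, m, f = len(H), len(H[0]), len(X[0])
    dv = [sum(H[i]) for i in range(n)]                      # node degrees
    de = [sum(H[i][e] for i in range(n)) for e in range(m)]  # hyperedge sizes
    # Normalise node features by D_v^{-1/2}.
    Y = [[X[i][k] / math.sqrt(dv[i]) for k in range(f)] for i in range(n)]
    # Node -> hyperedge: average the members of each hyperedge.
    E = [[sum(H[i][e] * Y[i][k] for i in range(n)) / de[e] for k in range(f)]
         for e in range(m)]
    # Hyperedge -> node: sum incident hyperedge messages, then D_v^{-1/2}.
    return [[sum(H[i][e] * E[e][k] for e in range(m)) / math.sqrt(dv[i])
             for k in range(f)] for i in range(n)]

# Toy hypergraph: 3 nodes, hyperedges {0, 1} and {1, 2}.
print(hypergraph_conv([[1, 0], [1, 1], [0, 1]], [[1.0], [1.0], [1.0]]))
```

A single hyperedge mixes information among all of its members at once, which is what distinguishes group-wise modelling from pairwise graph convolution.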
Submitted 4 July, 2024;
originally announced July 2024.
-
Permutation and Multi-permutation Codes Correcting Multiple Deletions
Authors:
Shuche Wang,
The Nguyen,
Yeow Meng Chee,
Van Khu Vu
Abstract:
Permutation codes in the Ulam metric, which can correct multiple deletions, have recently been investigated extensively. In this work, we are interested in the maximum size of permutation codes in the Ulam metric and aim to design permutation codes that can correct multiple deletions with efficient decoding algorithms. We first present an improvement on the Gilbert--Varshamov bound on the maximum size of these permutation codes by analyzing the independence number of the auxiliary graph. This idea is widely used in various settings; our contribution in this section is to enumerate the triangles in the auxiliary graph and show that their number is small enough. Next, we design permutation codes correcting multiple deletions with a decoding algorithm. In particular, the constructed permutation codes can correct $t$ deletions with at most $(3t-1) \log n+o(\log n)$ bits of redundancy, where $n$ is the length of the code. Our construction is based on a new mapping which yields a new connection between permutation codes in the Hamming metric and permutation codes in various metrics. Furthermore, we construct permutation codes that correct multiple bursts of deletions using this new mapping. Finally, we extend the new mapping to multi-permutations and construct the best-known multi-permutation codes in the Ulam metric.
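The Ulam distance between two permutations of length $n$ is $n$ minus the length of their longest common subsequence, so a permutation code corrects $t$ deletions exactly when its minimum Ulam distance exceeds $t$. The Python sketch below computes this distance in $O(n \log n)$ time by relabelling one permutation by positions in the other and taking a longest increasing subsequence; the function names are illustrative, and the sketch does not implement the paper's mapping or construction.

```python
from bisect import bisect_left

def lis_length(seq):
    """Length of a longest strictly increasing subsequence (patience sorting)."""
    tails = []
    for v in seq:
        i = bisect_left(tails, v)
        if i == len(tails):
            tails.append(v)
        else:
            tails[i] = v
    return len(tails)

def ulam_distance(sigma, pi):
    """n - LCS(sigma, pi) for two permutations of the same symbols.
    The LCS of two permutations equals the LIS of sigma relabelled by
    each symbol's position in pi."""
    pos = {v: i for i, v in enumerate(pi)}
    return len(sigma) - lis_length([pos[v] for v in sigma])

def min_ulam_distance(code):
    """Minimum pairwise Ulam distance of a list of permutations."""
    return min(ulam_distance(a, b)
               for i, a in enumerate(code) for b in code[i + 1:])

print(ulam_distance((0, 1, 2, 3), (1, 2, 3, 0)))  # one symbol moved
```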
Submitted 9 December, 2024; v1 submitted 24 June, 2024;
originally announced June 2024.
-
Efficient designs for threshold group testing without gap
Authors:
Thach V. Bui,
Yeow Meng Chee,
Van Khu Vu
Abstract:
Given $d$ defective items in a population of $n$ items with $d \ll n$, in threshold group testing without gap, the outcome of a test on a subset of items is positive if the subset has at least $u$ defective items and negative otherwise, where $1 \leq u \leq d$. The basic goal of threshold group testing is to quickly identify the defective items via a small number of tests. In a non-adaptive design, all tests are designed independently and can be performed in parallel. The decoding time in the state-of-the-art non-adaptive work is polynomial in $(d/u)^u (d/(d-u))^{d - u}$, $d$, and $\log{n}$. In this work, we present a novel design that significantly reduces the number of tests and the decoding time to polynomials in $\min\{u^u, (d - u)^{d - u}\}$, $d$, and $\log{n}$. In particular, when $u$ is a constant, the number of tests and the decoding time are $O(d^3 (\log^2{n}) \log{(n/d)})$ and $O\big(d^3 (\log^2{n}) \log{(n/d)} + d^2 (\log{n}) \log^3{(n/d)} \big)$, respectively. For the special case $u = 2$, with a non-adaptive design, the number of tests and the decoding time are $O(d^3 (\log{n}) \log{(n/d)})$ and $O(d^2 (\log{n} + \log^2{(n/d)}))$, respectively. Moreover, with a 2-stage design, the number of tests and the decoding time are both $O(d^2 \log^2{(n/d)})$.
Submitted 9 May, 2024;
originally announced May 2024.
-
On de Bruijn Covering Sequences and Arrays
Authors:
Yeow Meng Chee,
Tuvi Etzion,
Hoang Ta,
Van Khu Vu
Abstract:
An $(m,n,R)$-de Bruijn covering array (dBCA) is a doubly periodic $M \times N$ array over an alphabet of size $q$ such that the set of all its $m \times n$ windows forms a covering code with radius $R$. An upper bound on the smallest array area of an $(m,n,R)$-dBCA is provided using a probabilistic technique similar to the one used for an upper bound on the length of a de Bruijn covering sequence. A folding technique to construct a dBCA from a de Bruijn covering sequence or a de Bruijn covering sequences code is presented. Several new constructions that yield shorter de Bruijn covering sequences and $(m,n,R)$-dBCAs with smaller areas are also provided. These constructions are mainly based on sequences derived from cyclic codes, self-dual sequences, primitive polynomials, an interleaving technique, folding, and mutual shifts of sequences with the same covering radius. Finally, constructions of de Bruijn covering sequences codes are also discussed.
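For small parameters, the covering radius of an array's windows can be computed by brute force, which gives a direct check of the dBCA definition. The Python sketch below reads all $m \times n$ windows cyclically (the doubly periodic condition) and takes, over all $q^{mn}$ patterns, the largest Hamming distance to the nearest window; a $1 \times N$ array corresponds to a de Bruijn covering sequence. The function names are hypothetical, and the search is exponential in $mn$, so it is only meant for verifying toy examples.

```python
from itertools import product

def windows(array, m, n):
    """All m x n windows of a doubly periodic array, read cyclically."""
    M, N = len(array), len(array[0])
    return {tuple(array[(i + a) % M][(j + b) % N]
                  for a in range(m) for b in range(n))
            for i in range(M) for j in range(N)}

def covering_radius(array, m, n, q):
    """Largest Hamming distance from any q-ary m x n pattern to its nearest window."""
    ws = windows(array, m, n)
    return max(min(sum(x != y for x, y in zip(word, w)) for w in ws)
               for word in product(range(q), repeat=m * n))

# The binary de Bruijn sequence 0011 covers every length-2 word exactly (R = 0).
print(covering_radius([[0, 0, 1, 1]], 1, 2, 2))
```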
Submitted 9 May, 2024; v1 submitted 21 April, 2024;
originally announced April 2024.
-
Maximum Length RLL Sequences in de Bruijn Graph
Authors:
Yeow Meng Chee,
Tuvi Etzion,
Tien Long Nguyen,
Duy Hoang Ta,
Vinh Duc Tran,
Van Khu Vu
Abstract:
Free-space quantum key distribution requires synchronizing the transmitted and received signals. A timing and synchronization system for this purpose based on a de Bruijn sequence has been proposed and studied recently for a channel associated with quantum communication that requires reliable synchronization. To avoid long periods without pulses in such a system, on-off pulses are used to represent a zero and on-on pulses are used to represent a one. However, these sequences have high redundancy and low rate. To reduce the redundancy and increase the rate, run-length limited sequences in the de Bruijn graph are proposed for the same purpose. The maximum length of such sequences in the de Bruijn graph is studied, and an efficient algorithm to construct a large set of these sequences is presented. Based on known algorithms and enumeration methods, a maximum-length sequence for which the position of each window can be computed efficiently is presented, and an enumeration of the number of such sequences is given.
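The redundancy of the on-off scheme is easy to see in code. The Python sketch below builds a binary de Bruijn sequence with the standard Lyndon-word concatenation (FKM) construction, measures its longest cyclic run of zeros (no-pulse period), and applies the on-off encoding described above, which caps that run at one but doubles the length, i.e. rate $1/2$. It illustrates only this baseline scheme; the run-length limited sequences and window-position algorithms of the paper are not implemented.

```python
def de_bruijn(k, n):
    """k-ary de Bruijn sequence of order n (one period), via Lyndon words."""
    a = [0] * (k * n)
    seq = []

    def db(t, p):
        if t > n:
            if n % p == 0:
                seq.extend(a[1:p + 1])
        else:
            a[t] = a[t - p]
            db(t + 1, p)
            for j in range(a[t - p] + 1, k):
                a[t] = j
                db(t + 1, t)

    db(1, 1)
    return seq

def on_off_encode(bits):
    """Zero -> (on, off), one -> (on, on); 1 = pulse, 0 = no pulse."""
    return [p for b in bits for p in ((1, 1) if b else (1, 0))]

def max_cyclic_zero_run(seq):
    """Longest run of zeros when seq is read cyclically."""
    if all(s == 0 for s in seq):
        return len(seq)
    best = run = 0
    for s in seq + seq:          # doubling the sequence handles wrap-around
        run = run + 1 if s == 0 else 0
        best = max(best, run)
    return best

B = de_bruijn(2, 3)
print(B, max_cyclic_zero_run(B), max_cyclic_zero_run(on_off_encode(B)))
```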
Submitted 8 November, 2024; v1 submitted 3 March, 2024;
originally announced March 2024.
-
GroundingGPT: Language Enhanced Multi-modal Grounding Model
Authors:
Zhaowei Li,
Qi Xu,
Dong Zhang,
Hang Song,
Yiqing Cai,
Qi Qi,
Ran Zhou,
Junting Pan,
Zefeng Li,
Van Tu Vu,
Zhida Huang,
Tao Wang
Abstract:
Multi-modal large language models have demonstrated impressive performance across various tasks in different modalities. However, existing multi-modal models primarily emphasize capturing global information within each modality while neglecting the importance of perceiving local information across modalities. Consequently, these models lack the ability to effectively understand the fine-grained details of input data, limiting their performance in tasks that require a more nuanced understanding. To address this limitation, there is a compelling need to develop models that enable fine-grained understanding across multiple modalities, thereby enhancing their applicability to a wide range of tasks. In this paper, we propose GroundingGPT, a language enhanced multi-modal grounding model. Beyond capturing global information like other multi-modal models, our proposed model excels at tasks demanding a detailed understanding of local information within the input. It demonstrates precise identification and localization of specific regions in images or moments in videos. To achieve this objective, we design a diversified dataset construction pipeline, resulting in a multi-modal, multi-granularity dataset for model training. The code, dataset, and demo of our model can be found at https://github.com/lzw-lzw/GroundingGPT.
Submitted 5 March, 2024; v1 submitted 11 January, 2024;
originally announced January 2024.
-
MedShapeNet -- A Large-Scale Dataset of 3D Medical Shapes for Computer Vision
Authors:
Jianning Li,
Zongwei Zhou,
Jiancheng Yang,
Antonio Pepe,
Christina Gsaxner,
Gijs Luijten,
Chongyu Qu,
Tiezheng Zhang,
Xiaoxi Chen,
Wenxuan Li,
Marek Wodzinski,
Paul Friedrich,
Kangxian Xie,
Yuan Jin,
Narmada Ambigapathy,
Enrico Nasca,
Naida Solak,
Gian Marco Melito,
Viet Duc Vu,
Afaque R. Memon,
Christopher Schlachta,
Sandrine De Ribaupierre,
Rajnikant Patel,
Roy Eagleson,
Xiaojun Chen
, et al. (132 additional authors not shown)
Abstract:
Prior to the deep learning era, shape was commonly used to describe objects. Nowadays, state-of-the-art (SOTA) algorithms in medical imaging are predominantly diverging from computer vision, where voxel grids, meshes, point clouds, and implicit surface models are used. This is evident from numerous shape-related publications in premier vision conferences as well as the growing popularity of ShapeNet (about 51,300 models) and Princeton ModelNet (127,915 models). For the medical domain, we present a large collection of anatomical shapes (e.g., bones, organs, vessels) and 3D models of surgical instruments, called MedShapeNet, created to facilitate the translation of data-driven vision algorithms to medical applications and to adapt SOTA vision algorithms to medical problems. As a unique feature, we directly model the majority of shapes on the imaging data of real patients. As of today, MedShapeNet includes 23 datasets with more than 100,000 shapes that are paired with annotations (ground truth). Our data is freely accessible via a web interface and a Python application programming interface (API) and can be used for discriminative, reconstructive, and variational benchmarks as well as various applications in virtual, augmented, or mixed reality, and 3D printing. As examples, we present use cases in the fields of classification of brain tumors, facial and skull reconstructions, multi-class anatomy completion, education, and 3D printing. In the future, we will extend the data and improve the interfaces. The project pages are: https://medshapenet.ikim.nrw/ and https://github.com/Jianningli/medshapenet-feedback
Submitted 12 December, 2023; v1 submitted 30 August, 2023;
originally announced August 2023.
-
Efficient Approximation of Quantum Channel Fidelity Exploiting Symmetry
Authors:
Yeow Meng Chee,
Hoang Ta,
Van Khu Vu
Abstract:
Determining the optimal fidelity for the transmission of quantum information over noisy quantum channels is one of the central problems in quantum information theory. Recently, [Berta-Borderi-Fawzi-Scholz, Mathematical Programming, 2021] introduced an asymptotically converging semidefinite programming hierarchy of outer bounds for this quantity. However, the size of the semidefinite programs (SDPs) grows exponentially with respect to the level of the hierarchy, thus making their computation unscalable. In this work, by exploiting the symmetries in the SDP, we show that, for a fixed output dimension of the quantum channel, we can compute the SDP in time polynomial with respect to the level of the hierarchy and input dimension. As a direct consequence of our result, the optimal fidelity can be approximated with an accuracy of $ε$ in $\mathrm{poly}(1/ε, \text{input dimension})$ time.
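Symmetry reductions of SDP hierarchies often exploit invariance under permuting tensor factors. As a simplified illustration (not the paper's actual reduction), the Python sketch below compares the dimension $d^k$ of $(\mathbb{C}^d)^{\otimes k}$ with the dimension $\binom{k+d-1}{d-1}$ of its symmetric subspace: the former is exponential in $k$, the latter polynomial in $k$ for fixed $d$. Here $d$ and $k$ loosely stand for a local dimension and the hierarchy level; the exact correspondence to the SDP's variables is an assumption.

```python
from math import comb

def full_dim(d, k):
    """Dimension of (C^d)^{tensor k}: exponential in k."""
    return d ** k

def sym_dim(d, k):
    """Dimension of the symmetric subspace of (C^d)^{tensor k}: the number of
    multisets of size k from d symbols, C(k + d - 1, d - 1), which is a
    polynomial of degree d - 1 in k for fixed d."""
    return comb(k + d - 1, d - 1)

for k in (2, 5, 10, 20):
    print(k, full_dim(3, k), sym_dim(3, k))
```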
Submitted 21 March, 2024; v1 submitted 30 August, 2023;
originally announced August 2023.
-
No Easy Way Out: the Effectiveness of Deplatforming an Extremist Forum to Suppress Hate and Harassment
Authors:
Anh V. Vu,
Alice Hutchings,
Ross Anderson
Abstract:
Legislators and policymakers worldwide are debating options for suppressing illegal, harmful and undesirable material online. Drawing on several quantitative data sources, we show that deplatforming an active community to suppress online hate and harassment, even with a substantial concerted effort involving several tech firms, can be hard. Our case study is the disruption of the largest and longest-running harassment forum, Kiwi Farms, in late 2022, which is probably the most extensive industry effort to date. Despite the active participation of a number of tech companies over several consecutive months, this campaign failed to shut down the forum and remove its objectionable content. While briefly raising public awareness, it led to rapid platform displacement and traffic fragmentation. Part of the activity decamped to Telegram, while traffic shifted from the primary domain to previously abandoned alternatives. The forum experienced intermittent outages for several weeks, after which the community leading the campaign lost interest, traffic was directed back to the main domain, users quickly returned, and the forum was back online and became even more connected. The forum members themselves stopped discussing the incident shortly thereafter, and the net effect was that forum activity, active users, threads, posts and traffic were all cut by about half. Deplatforming a community without a court order raises philosophical issues about censorship versus free speech; ethical and legal issues about the role of industry in online content moderation; and practical issues about the efficacy of private-sector versus government action. Deplatforming a dispersed community using a series of court orders against individual service providers appears unlikely to be very effective if the censor cannot incapacitate the key maintainers, whether by arresting them, enjoining them or otherwise deterring them.
Submitted 13 April, 2024; v1 submitted 14 April, 2023;
originally announced April 2023.
-
A Unified Taxonomy for Automated Vehicles: Individual, Cooperative, Collaborative, On-Road, and Off-Road
Authors:
Fredrik Warg,
Anders Thorsén,
Victoria Vu,
Carl Bergenhem
Abstract:
Various types of vehicle automation are increasingly used in a variety of environments, including road vehicles such as cars or automated shuttles, confined areas such as mines or harbours, and agriculture and forestry. In many use cases, the benefits are greater if several automated vehicles (AVs) cooperate to help each other reach their goals more efficiently, or collaborate to complete a common task. Taxonomies and definitions create a common framework that helps researchers and practitioners advance the field. However, most existing work focuses on road vehicles. In this paper, we review and extend taxonomies and definitions to encompass individually acting as well as cooperative and collaborative AVs for both on-road and off-road use cases. In particular, we introduce classes of collaborative vehicles not defined in existing literature, and define levels of automation suitable for vehicles where automation applies to functions beyond the driving task.
Submitted 5 April, 2023;
originally announced April 2023.
-
Matrix Perturbation: Davis-Kahan in the Infinity Norm
Authors:
Abhinav Bhardwaj,
Van Vu
Abstract:
Perturbation theory was developed to analyze the impact of noise on data and has been an essential part of numerical analysis. Recently, it has played an important role in the design and analysis of matrix algorithms. One of the most useful tools in this subject, the Davis-Kahan sine theorem, provides an $\ell_2$ error bound on the perturbation of the leading singular vectors (and spaces).
We focus on the case when the signal matrix has low rank and the perturbation is random, which occurs often in practice. In an earlier paper, O'Rourke, Wang, and the second author showed that in this case, one can obtain an improved theorem. In particular, the noise-to-gap ratio condition in the original setting can be weakened considerably.
In the current paper, we develop an infinity norm version of the O'Rourke-Vu-Wang result. The key ideas in the proof are a new bootstrapping argument and the so-called iterative leave-one-out method, which may be of independent interest.
Applying the new bounds, we develop new, simple, and quick algorithms for several well-known problems, such as finding hidden partitions and matrix completion. The core of these new algorithms is the fact that one is now able to quickly approximate certain key objects in the infinity norm, which has critical advantages over approximations in the $\ell_2$ norm, Frobenius norm, or spectral norm.
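To make the $\ell_2$ versus infinity-norm distinction concrete, the sketch below is a small numerical experiment, not the authors' method. It assumes a hypothetical rank-one symmetric signal $A = \lambda u u^T$ plus symmetric Gaussian noise $E$, checks the commonly used symmetric-matrix form of the Davis-Kahan bound $\sin\theta \le 2\|E\|/(\lambda_1-\lambda_2)$, and also reports the entrywise (infinity-norm) error of the leading eigenvector. The names `sin_theta_demo`, `lam` and `sigma` are illustrative; no infinity-norm bound from the paper is reproduced here.

```python
import numpy as np

def sin_theta_demo(n=200, lam=50.0, sigma=0.1, seed=0):
    """Compare sin(angle) between leading eigenvectors of A and A + E with 2||E|| / gap."""
    rng = np.random.default_rng(seed)
    # Hypothetical rank-one signal A = lam * u u^T, so lambda_1 - lambda_2 = lam.
    u = rng.standard_normal(n)
    u /= np.linalg.norm(u)
    A = lam * np.outer(u, u)
    # Symmetric Gaussian (Wigner-type) noise with entry scale sigma.
    G = sigma * rng.standard_normal((n, n))
    E = (G + G.T) / np.sqrt(2.0)
    _, vecs = np.linalg.eigh(A + E)       # eigenvalues come back in ascending order
    v_hat = vecs[:, -1]                   # eigenvector of the largest eigenvalue
    if u @ v_hat < 0:                     # eigenvectors are only defined up to sign
        v_hat = -v_hat
    sin_theta = float(np.sqrt(max(0.0, 1.0 - (u @ v_hat) ** 2)))
    l2_bound = 2.0 * np.linalg.norm(E, 2) / lam   # spectral norm of E over the gap
    linf_error = float(np.max(np.abs(v_hat - u)))  # entrywise error, not bounded here
    return sin_theta, l2_bound, linf_error
```

Calling `sin_theta_demo()` returns the observed angle, the classical $\ell_2$ bound, and the largest single-coordinate error; the paper's contribution concerns sharp bounds on that last quantity.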
Submitted 20 November, 2023; v1 submitted 1 April, 2023;
originally announced April 2023.
-
Codes for Correcting $t$ Limited-Magnitude Sticky Deletions
Authors:
Shuche Wang,
Van Khu Vu,
Vincent Y. F. Tan
Abstract:
Codes for correcting sticky insertions/deletions and limited-magnitude errors have attracted significant attention due to their applications in flash memories, racetrack memories, and DNA data storage systems. In this paper, we first consider the error type of $t$-sticky deletions with $\ell$-limited-magnitude and propose a non-systematic code for correcting this type of error with redundancy $2t(1-1/p)\cdot\log(n+1)+O(1)$, where $p$ is the smallest prime larger than $\ell+1$. Next, we present a systematic code construction with efficient encoding and decoding algorithms and redundancy $\frac{\lceil2t(1-1/p)\rceil\cdot\lceil\log p\rceil}{\log p} \log(n+1)+O(\log\log n)$, where $p$ is again the smallest prime larger than $\ell+1$.
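As a worked illustration of the first redundancy expression, the sketch below finds $p$ (the smallest prime larger than $\ell+1$) and evaluates the leading term $2t(1-1/p)\log(n+1)$, dropping the $O(1)$ term. It assumes logarithms are base 2; the function names are hypothetical.

```python
import math

def smallest_prime_greater_than(m):
    """Return the smallest prime strictly larger than m."""
    def is_prime(k):
        if k < 2:
            return False
        return all(k % d for d in range(2, math.isqrt(k) + 1))
    k = m + 1
    while not is_prime(k):
        k += 1
    return k

def nonsystematic_redundancy_leading_term(n, t, ell):
    """Return (p, 2t(1 - 1/p) * log2(n + 1)), ignoring the O(1) term (base 2 assumed)."""
    p = smallest_prime_greater_than(ell + 1)
    return p, 2 * t * (1 - 1 / p) * math.log2(n + 1)
```

For example, with $n = 1023$, $t = 2$ and $\ell = 1$ we get $p = 3$ and a leading term of $4 \cdot \tfrac{2}{3} \cdot 10 \approx 26.7$ bits.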
Submitted 6 February, 2023;
originally announced February 2023.
-
Codes for Correcting Asymmetric Adjacent Transpositions and Deletions
Authors:
Shuche Wang,
Van Khu Vu,
Vincent Y. F. Tan
Abstract:
Codes in the Damerau--Levenshtein metric have been extensively studied recently owing to their applications in DNA-based data storage. In particular, Gabrys, Yaakobi, and Milenkovic (2017) designed a length-$n$ code correcting a single deletion and $s$ adjacent transpositions with at most $(1+2s)\log n$ bits of redundancy. In this work, we consider a new setting where both asymmetric adjacent transpositions (also known as right-shifts or left-shifts) and deletions may occur. We present several constructions of codes correcting these errors in various cases. In particular, we design a code correcting a single deletion, $s^+$ right-shift, and $s^-$ left-shift errors with at most $(1+s)\log (n+s+1)+1$ bits of redundancy, where $s=s^{+}+s^{-}$. In addition, we investigate codes correcting $t$ $0$-deletions, $s^+$ right-shift, and $s^-$ left-shift errors with both uniquely-decoding and list-decoding algorithms. Our main contribution here is the construction of a list-decodable code with list size $O(n^{\min\{s+1,t\}})$ and with at most $(\max \{t,s+1\}) \log n+O(1)$ bits of redundancy, where $s=s^{+}+s^{-}$. Finally, we construct both non-systematic and systematic codes for correcting blocks of $0$-deletions with $\ell$-limited-magnitude and $s$ adjacent transpositions.
Submitted 29 June, 2023; v1 submitted 27 January, 2023;
originally announced January 2023.
-
Getting Bored of Cyberwar: Exploring the Role of Low-level Cybercrime Actors in the Russia-Ukraine Conflict
Authors:
Anh V. Vu,
Daniel R. Thomas,
Ben Collier,
Alice Hutchings,
Richard Clayton,
Ross Anderson
Abstract:
There has been substantial commentary on the role of cyberattacks carried out by low-level cybercrime actors in the Russia-Ukraine conflict. We analyse 358k website defacement attacks, 1.7M UDP amplification DDoS attacks, 1764 posts made by 372 users on Hack Forums mentioning the two countries, and 441 Telegram announcements (with 58k replies) of a volunteer hacking group for two months before and four months after the invasion. We find the conflict briefly but notably caught the attention of low-level cybercrime actors, with significant increases in online discussion and both types of attacks targeting Russia and Ukraine. However, there was little evidence of high-profile actions; the role of these players in the ongoing hybrid warfare is minor, and they should be separated from persistent and motivated 'hacktivists' in state-sponsored operations. Their involvement in the conflict appears to have been short-lived and fleeting, with a clear loss of interest in discussing the situation and carrying out both website defacement and DDoS attacks against either Russia or Ukraine after just a few weeks.
Submitted 13 April, 2024; v1 submitted 22 August, 2022;
originally announced August 2022.
-
Robust Rayleigh Regression Method for SAR Image Processing in Presence of Outliers
Authors:
B. G. Palm,
F. M. Bayer,
R. Machado,
M. I. Pettersson,
V. T. Vu,
R. J. Cintra
Abstract:
The presence of outliers (anomalous values) in synthetic aperture radar (SAR) data and the misspecification of statistical image models may result in inaccurate inferences. To avoid such issues, a Rayleigh regression model based on a robust estimation process is proposed as a more realistic approach to modeling this type of data. This paper aims to obtain Rayleigh regression model parameter estimators that are robust to the presence of outliers. The proposed approach is based on the weighted maximum likelihood method and was evaluated in numerical experiments using simulated and measured SAR images. Monte Carlo simulations were employed to assess numerically the performance of the proposed robust estimator for finite signal lengths, its sensitivity to outliers, and its breakdown point. For instance, the non-robust estimators show a relative bias $65$-fold larger than that of the robust approach in corrupted signals. In terms of sensitivity and breakdown point, the robust scheme reduced the mean absolute value of these measures by about $96\%$ and $10\%$, respectively, in comparison with the non-robust estimators. Moreover, two SAR data sets were used to compare the ground type and anomaly detection results of the proposed robust scheme with competing methods in the literature.
Submitted 29 July, 2022;
originally announced August 2022.
-
Autoregressive Model for Multi-Pass SAR Change Detection Based on Image Stacks
Authors:
B. G. Palm,
D. I. Alves,
V. T. Vu,
M. I. Pettersson,
F. M. Bayer,
R. J. Cintra,
R. Machado,
P. Dammert,
H. Hellsten
Abstract:
Change detection is an important synthetic aperture radar (SAR) application, usually used to detect changes in a ground scene from measurements taken at different moments in time. Traditionally, change detection algorithms (CDAs) are mainly designed for two SAR images retrieved at different instants. However, more images can be used to improve the algorithms' performance, which has emerged as a research topic in SAR change detection. Image stack information can be treated as a data series over time and can be modeled by autoregressive (AR) models. Thus, we present some initial findings on SAR change detection based on image stacks using AR models. Applying an AR model at each pixel position in the image stack, we obtain an estimated image of the ground scene, which can be used as a reference image for the CDA. The experimental results reveal that the ground scene estimates produced by the AR models are accurate and can be used for change detection applications.
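The per-pixel idea can be sketched as follows. This is a minimal illustration, not the authors' model: it assumes an AR(1) model without intercept, $x_t = a\,x_{t-1}$, fitted by least squares independently at each pixel over all but the last image, and a simple absolute-difference threshold as the change detector. The function names and the threshold are hypothetical.

```python
import numpy as np

def ar1_reference(history):
    """Fit x_t = a * x_{t-1} per pixel by least squares and predict the next image.

    history: array of shape (T, H, W) with T co-registered SAR amplitude images.
    """
    prev, curr = history[:-1], history[1:]
    a = np.sum(curr * prev, axis=0) / np.sum(prev * prev, axis=0)  # one coefficient per pixel
    return a * history[-1]

def change_map(stack, threshold):
    """Build a reference from all but the last image and flag large deviations in the last one."""
    reference = ar1_reference(stack[:-1])
    return np.abs(stack[-1] - reference) > threshold
```

For example, on a synthetic stack `1.0 + 0.01 * noise` of shape `(8, 5, 5)` where the last image has one pixel raised by 5, `change_map(stack, 0.5)` flags only that pixel. Higher-order AR models or a statistical test on the residual would be natural refinements.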
Submitted 5 June, 2022;
originally announced June 2022.
-
VinDr-Mammo: A large-scale benchmark dataset for computer-aided diagnosis in full-field digital mammography
Authors:
Hieu T. Nguyen,
Ha Q. Nguyen,
Hieu H. Pham,
Khanh Lam,
Linh T. Le,
Minh Dao,
Van Vu
Abstract:
Mammography, or breast X-ray, is the most widely used imaging modality to detect cancer and other breast diseases. Deep learning-based computer-assisted detection and diagnosis (CADe or CADx) tools have recently been developed to support physicians and improve the accuracy of mammography interpretation. However, most published mammography datasets are either limited in sample size or digitized from screen-film mammography (SFM), hindering the development of CADe and CADx tools based on full-field digital mammography (FFDM). To overcome this challenge, we introduce VinDr-Mammo, a new benchmark dataset of FFDM for detecting and diagnosing breast cancer and other diseases in mammography. The dataset consists of 5,000 mammography exams, each of which has four standard views and is double read, with disagreements (if any) resolved by arbitration. It supports the assessment of the Breast Imaging Reporting and Data System (BI-RADS) category and breast density at the breast level. In addition, the dataset provides the category, location, and BI-RADS assessment of non-benign findings. We make VinDr-Mammo publicly available on PhysioNet as a new imaging resource to promote advances in developing CADe and CADx tools for breast cancer screening.
Submitted 16 March, 2023; v1 submitted 20 March, 2022;
originally announced March 2022.
-
ExtremeBB: A Database for Large-Scale Research into Online Hate, Harassment, the Manosphere and Extremism
Authors:
Anh V. Vu,
Lydia Wilson,
Yi Ting Chua,
Ilia Shumailov,
Ross Anderson
Abstract:
We introduce ExtremeBB, a textual database of over 53.5M posts made by 38.5k users on 12 extremist bulletin board forums promoting online hate, harassment, the manosphere and other forms of extremism. It enables large-scale analyses of qualitative and quantitative historical trends going back two decades: measuring hate speech and toxicity; tracing the evolution of different strands of extremist ideology; tracking the relationships between online subcultures, extremist behaviours, and real-world violence; and monitoring extremist communities in near real time. This can shed light not only on the spread of problematic ideologies but also on the effectiveness of interventions. ExtremeBB comes with a robust ethical data-sharing regime that allows us to share data with academics worldwide. Since 2020, access has been granted to 49 licensees in 16 research groups from 12 institutions.
Submitted 20 August, 2023; v1 submitted 8 November, 2021;
originally announced November 2021.
-
Endurance-Limited Memories: Capacity and Codes
Authors:
Yeow Meng Chee,
Michal Horovitz,
Alexander Vardy,
Van Khu Vu,
Eitan Yaakobi
Abstract:
\emph{Resistive memories}, such as \emph{phase change memories} and \emph{resistive random access memories}, have attracted significant attention in recent years due to their better scalability, speed, and rewritability, while remaining non-volatile. However, their \emph{limited endurance} is still a major drawback that has to be addressed before they can be widely adopted in large-scale systems.
In this work, in order to reduce the wear-out of the cells, we propose a new coding scheme, called \emph{endurance-limited memories} (\emph{ELM}) codes, that increases the endurance of these memories by limiting the number of cell programming operations. Namely, an \emph{$\ell$-change $t$-write ELM code} is a coding scheme that allows writing $t$ messages into some $n$ binary cells while guaranteeing that each cell is programmed at most $\ell$ times. In case $\ell=1$, these codes coincide with the well-studied \emph{write-once memory} (\emph{WOM}) codes. We study several models of these codes, depending on whether the encoder knows, on each write, the number of times each cell was programmed, knows only the memory state, or does not know anything. For the decoder, we consider the same three cases. We fully characterize the capacity regions and the maximum sum-rates of three models where the encoder knows, on each write, the number of times each cell was programmed. In particular, it is shown that in these models the maximum sum-rate is $\log \sum_{i=0}^{\ell} {t \choose i}$. We also study and expose the capacity regions of the models where the decoder is informed of the number of times each cell was programmed. Finally, we present the most practical model, where the encoder reads the memory before encoding new data and the decoder has no information about the previous states of the memory.
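The maximum sum-rate formula can be evaluated directly. The sketch below assumes base-2 logarithms; `elm_max_sum_rate` is a hypothetical helper. Two sanity checks follow from the formula itself: for $\ell=1$ it reduces to $\log(t+1)$, the known maximum sum-rate of binary $t$-write WOM codes, and for $\ell \ge t$ the constraint is inactive and the sum-rate is $t$ (one bit per cell per write).

```python
import math

def elm_max_sum_rate(ell, t):
    """log2 of sum_{i=0}^{ell} C(t, i), the stated maximum sum-rate (base 2 assumed)."""
    # math.comb(t, i) is 0 for i > t, so ell >= t simply gives log2(2^t) = t.
    return math.log2(sum(math.comb(t, i) for i in range(ell + 1)))
```

For example, `elm_max_sum_rate(2, 4)` gives $\log_2(1+4+6) = \log_2 11 \approx 3.46$ bits per cell over four writes.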
Submitted 20 September, 2021;
originally announced September 2021.
-
VinDr-SpineXR: A deep learning framework for spinal lesions detection and classification from radiographs
Authors:
Hieu T. Nguyen,
Hieu H. Pham,
Nghia T. Nguyen,
Ha Q. Nguyen,
Thang Q. Huynh,
Minh Dao,
Van Vu
Abstract:
Radiographs are the most important imaging tool for identifying spine anomalies in clinical practice. The evaluation of spinal bone lesions, however, is a challenging task for radiologists. This work aims to develop and evaluate a deep learning-based framework, named VinDr-SpineXR, for the classification and localization of abnormalities from spine X-rays. First, we build a large dataset comprising 10,468 spine X-ray images from 5,000 studies, each of which is manually annotated by an experienced radiologist with bounding boxes around abnormal findings in 13 categories. Using this dataset, we then train a deep learning classifier to determine whether a spine scan is abnormal and a detector to localize 7 crucial findings among the 13. VinDr-SpineXR is evaluated on a test set of 2,078 images from 1,000 studies, which is kept separate from the training set. It achieves an area under the receiver operating characteristic curve (AUROC) of 88.61% (95% CI 87.19%, 90.02%) for the image-level classification task and a mean average precision (mAP@0.5) of 33.56% for the lesion-level localization task. These results serve as a proof of concept and set a baseline for future research in this direction. To encourage advances, the dataset, code, and trained deep learning models are made publicly available.
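The "@0.5" in mAP@0.5 refers to the intersection-over-union (IoU) threshold used to decide whether a predicted box matches a ground-truth box. The sketch below shows only that matching criterion, assuming boxes given as `(x1, y1, x2, y2)` and the common convention that IoU $\ge 0.5$ counts as a hit; the ranking and precision-recall averaging that produce mAP are not shown, and the helper names are hypothetical.

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)   # zero if the boxes do not overlap
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def is_true_positive(pred_box, gt_box, thr=0.5):
    """Detection counts as correct when IoU reaches the threshold (>= assumed)."""
    return iou(pred_box, gt_box) >= thr
```

For instance, two 2x2 boxes offset by one unit horizontally have IoU $2/6 = 1/3$ and would not count as a match at the 0.5 threshold.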
Submitted 24 June, 2021;
originally announced June 2021.
-
Multi-Agent Reinforcement Learning for Channel Assignment and Power Allocation in Platoon-Based C-V2X Systems
Authors:
Hung V. Vu,
Mohammad Farzanullah,
Zheyu Liu,
Duy H. N. Nguyen,
Robert Morawski,
Tho Le-Ngoc
Abstract:
We consider the problem of joint channel assignment and power allocation in underlaid cellular vehicle-to-everything (C-V2X) systems where multiple vehicle-to-network (V2N) uplinks share time-frequency resources with multiple vehicle-to-vehicle (V2V) platoons that enable groups of connected and autonomous vehicles to travel closely together. Due to high user mobility in vehicular environments, traditional centralized optimization approaches relying on global channel information might not be viable in C-V2X systems with a large number of users. Using a multi-agent reinforcement learning (RL) approach, we propose a distributed resource allocation (RA) algorithm to overcome this challenge. Specifically, we model the RA problem as a multi-agent system. Based solely on local channel information, each platoon leader acts as an agent, interacts collectively with the other agents, and accordingly selects the optimal combination of sub-band and power level to transmit its signals. Toward this end, we use the double deep Q-learning algorithm to jointly train the agents with the objectives of simultaneously maximizing the sum-rate of V2N links and satisfying the packet delivery probability of each V2V link within a desired latency limit. Simulation results show that our proposed RL-based algorithm achieves performance close to that of the well-known exhaustive search algorithm.
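The key step that distinguishes double deep Q-learning from plain deep Q-learning is how the bootstrap target is formed: the online network chooses the next action and the separate target network evaluates it, which reduces overestimation. The sketch below computes these targets for a batch of transitions. It assumes Q-values are already available as arrays (one row per transition, one column per action, where in this setting an action would be a hypothetical sub-band/power-level pair); the network architecture, reward design and training loop are not shown.

```python
import numpy as np

def double_dqn_targets(rewards, next_q_online, next_q_target, dones, gamma=0.99):
    """Double DQN targets: y = r + gamma * (1 - done) * Q_target(s', argmax_a Q_online(s', a))."""
    best_actions = np.argmax(next_q_online, axis=1)                       # chosen by online net
    next_values = next_q_target[np.arange(len(rewards)), best_actions]   # scored by target net
    return rewards + gamma * (1.0 - dones) * next_values
```

For example, with rewards `[1, 0]`, online Q-values `[[1, 2], [3, 0]]`, target Q-values `[[10, 20], [30, 40]]`, `dones = [0, 1]` and `gamma = 0.5`, the targets are `[11, 0]`.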
Submitted 19 June, 2022; v1 submitted 9 November, 2020;
originally announced November 2020.
-
Constrained de Bruijn Codes: Properties, Enumeration, Constructions, and Applications
Authors:
Yeow Meng Chee,
Tuvi Etzion,
Han Mao Kiah,
Alexander Vardy,
Van Khu Vu,
Eitan Yaakobi
Abstract:
The de Bruijn graph, its sequences, and their various generalizations have found many applications in information theory, including many new ones in the last decade. In this paper, motivated by a coding problem for emerging memory technologies, we define a set of sequences that generalize sequences in the de Bruijn graph. These sequences can also be defined and viewed as constrained sequences. Hence, they will be called constrained de Bruijn sequences, and a set of such sequences will be called a constrained de Bruijn code. Several properties and alternative definitions of such codes are examined, and they are analyzed as generalized sequences in the de Bruijn graph (and its generalization) and as constrained sequences. Various enumeration techniques are used to compute the total number of sequences for any given set of parameters. A construction method for such codes based on the theory of shift-register sequences is proposed. Finally, we show how these constrained de Bruijn sequences and codes can be applied in constructions of codes for correcting synchronization errors in the $\ell$-symbol read channel and in the racetrack memory channel. For this purpose, these codes are superior in size to previously known codes.
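A membership check makes the constraint concrete. The sketch below assumes the definition of a $(b,k)$-constrained de Bruijn sequence as one in which all length-$k$ windows starting within any $b$ consecutive positions are distinct, equivalently, two windows starting fewer than $b$ positions apart must differ. This is our reading of the definition, not text from the abstract; `is_constrained_de_bruijn` is a hypothetical helper.

```python
def is_constrained_de_bruijn(seq, b, k):
    """True if any two length-k windows of seq starting fewer than b positions apart differ."""
    windows = [tuple(seq[i:i + k]) for i in range(len(seq) - k + 1)]
    for i, w in enumerate(windows):
        # Compare against windows starting at i+1, ..., i+b-1 (the same b-length stretch).
        for j in range(i + 1, min(i + b, len(windows))):
            if windows[j] == w:
                return False
    return True
```

For example, `"00110"` has the four distinct 2-windows `00, 01, 11, 10`, so it satisfies the constraint for any $b$ with $k=2$, whereas `"0101"` satisfies it for $b=2$ but not for $b=3$, because the window `01` recurs two positions later.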
Submitted 6 May, 2020;
originally announced May 2020.
-
Mining Shape of Expertise: A Novel Approach Based on Convolutional Neural Network
Authors:
Mahdi Dehghan,
Hossein A. Rahmani,
Ahmad Ali Abin,
Viet-Vu Vu
Abstract:
Expert finding addresses the task of retrieving and ranking talented people on the subject of a user query. It is a practical issue in Community Question Answering networks. Recruiters looking for knowledgeable people for their job positions are the most important clients of expert finding systems. In addition to employee expertise, the cost of hiring new staff is another significant concern for organizations. An efficient solution to this concern is to hire T-shaped experts, who are cost-effective. In this study, we propose a new deep model for T-shaped expert finding based on Convolutional Neural Networks. The proposed model tries to match queries and users by extracting local and position-invariant features from their corresponding documents. In other words, it detects users' shape of expertise by learning patterns from documents of users and queries simultaneously. The proposed model contains two parallel CNNs that extract latent vectors of users and queries based on their corresponding documents and join them together in the last layer to match queries with users. Experiments on a large subset of Stack Overflow documents indicate the effectiveness of the proposed method against baselines in terms of the NDCG, MRR, and ERR evaluation metrics.
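Two of the evaluation metrics named above can be sketched in a few lines. This is an illustration of the standard definitions, not the paper's evaluation code: it assumes each ranked list of candidate experts is given as relevance labels in ranked order, uses binary relevance for MRR, and uses the common gain $(2^{rel}-1)/\log_2(\text{rank}+1)$ for NDCG. ERR is omitted, and the function names are hypothetical.

```python
import math

def mrr(ranked_lists):
    """Mean reciprocal rank; each list holds 0/1 relevance labels in ranked order."""
    total = 0.0
    for rels in ranked_lists:
        # Reciprocal rank of the first relevant item, or 0 if none is relevant.
        total += next((1.0 / (i + 1) for i, r in enumerate(rels) if r), 0.0)
    return total / len(ranked_lists)

def ndcg_at_k(rels, k):
    """NDCG@k with gain (2^rel - 1) and discount log2(rank + 1)."""
    def dcg(values):
        return sum((2 ** r - 1) / math.log2(i + 2) for i, r in enumerate(values[:k]))
    ideal = dcg(sorted(rels, reverse=True))
    return dcg(rels) / ideal if ideal > 0 else 0.0
```

For example, `mrr([[0, 1, 0], [1, 0, 0]])` is $(1/2 + 1)/2 = 0.75$, and ranking a relevant expert second instead of first gives `ndcg_at_k([0, 1], 2)` $= 1/\log_2 3 \approx 0.63$.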
Submitted 5 April, 2020;
originally announced April 2020.
-
Proceedings of the 11th Asia-Europe Workshop on Concepts in Information Theory
Authors:
A. J. Han Vinck,
Kees A. Schouhamer Immink,
Tadashi Wadayama,
Van Khu Vu,
Akiko Manada,
Kui Cai,
Shunsuke Horii,
Yoshiki Abe,
Mitsugu Iwamoto,
Kazuo Ohta,
Xingwei Zhong,
Zhen Mei,
Renfei Bu,
J. H. Weber,
Vitaly Skachek,
Hiroyoshi Morita,
N. Hovhannisyan,
Hiroshi Kamabe,
Shan Lu,
Hirosuke Yamamoto,
Kengo Hasimoto,
O. Ytrehus,
Shigeaki Kuzuoaka,
Mikihiko Nishiara,
Han Mao Kiah
, et al. (2 additional authors not shown)
Abstract:
This year, 2019, we celebrate 30 years of friendship between Asian and European scientists at AEW11 in Rotterdam, the Netherlands. Many of the 1989 participants are also present at the 2019 event. This year we have many participants from different parts of Asia and Europe, which shows the importance of this event. It is a good tradition to pay tribute to a special lecturer in our community. This year we selected Hiroyoshi Morita, a well-known information theorist with many original contributions.
Submitted 26 June, 2019;
originally announced July 2019.
-
Matrices with Gaussian noise: optimal estimates for singular subspace perturbation
Authors:
Sean O'Rourke,
Van Vu,
Ke Wang
Abstract:
The Davis-Kahan-Wedin $\sin \Theta$ theorem describes how the singular subspaces of a matrix change when subjected to a small perturbation. This classic result is sharp in the worst-case scenario. In this paper, we prove a stochastic version of the Davis-Kahan-Wedin $\sin \Theta$ theorem when the perturbation is a Gaussian random matrix. Under certain structural assumptions, we obtain an optimal bound that significantly improves upon the classic Davis-Kahan-Wedin $\sin \Theta$ theorem. One of our key tools is a new perturbation bound for the singular values, which may be of independent interest.
Submitted 29 December, 2023; v1 submitted 1 March, 2018;
originally announced March 2018.
-
Coding for Racetrack Memories
Authors:
Yeow Meng Chee,
Han Mao Kiah,
Alexander Vardy,
Van Khu Vu,
Eitan Yaakobi
Abstract:
Racetrack memory is a new technology that uses magnetic domains along a nanoscopic wire to obtain extremely high storage density. In racetrack memory, each magnetic domain can store a single bit of information, which can be sensed by a reading port (head). The memory has a tape-like structure that supports a shift operation, which moves the domains so that they can be read sequentially by the head. To increase the memory's speed, prior work studied how to minimize the latency of the shift operation, while the equally important reliability of this operation has received little attention.
In this work we design codes that combat shift errors in racetrack memory, called position errors. Namely, shifting the domains is not an error-free operation: the domains may be over-shifted or not shifted at all, which can be modeled as deletions and sticky insertions, respectively. While it is possible to use conventional deletion- and insertion-correcting codes, we tackle this problem by exploiting the special structure of racetrack memory, where the domains can be read by multiple heads. Each head outputs a noisy version of the stored data, and the multiple outputs are combined to reconstruct the data. Under this paradigm, we show that it is possible to correct $d$ deletions with $d+1$ heads, using at most a single bit of redundancy, if the heads are well-separated. Similar results are provided for bursts of deletions, sticky insertions, and combinations of deletions and sticky insertions.
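A small sketch shows why sticky insertions are milder than ordinary deletions. A sticky insertion repeats a symbol next to itself, so it only lengthens an existing run and leaves the sequence of run symbols (the "skeleton") unchanged; an ordinary deletion can erase a run of length one and merge its neighbours. The helpers below are hypothetical illustrations of the two error types on a single head's output; the multi-head reconstruction from the paper is not shown.

```python
from itertools import groupby

def runs(x):
    """Run-length representation of x as a list of (symbol, run length)."""
    return [(s, len(list(g))) for s, g in groupby(x)]

def skeleton(x):
    """Sequence of run symbols, ignoring run lengths."""
    return [s for s, _ in runs(x)]

def sticky_insert(x, i):
    """Repeat x[i] immediately after itself (the head reads the same domain twice)."""
    return x[:i + 1] + x[i:i + 1] + x[i + 1:]

def delete(x, i):
    """Ordinary deletion of the symbol at position i (the head skips a domain)."""
    return x[:i] + x[i + 1:]
```

For `x = "0011010"`, `sticky_insert(x, 4)` gives `"00110010"`, whose runs are `00, 11, 00, 1, 0`: only one run length changed. By contrast, `delete(x, 5)` removes the lone `1` and yields `"001100"`, whose skeleton shrinks from `0,1,0,1,0` to `0,1,0`.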
Submitted 24 January, 2017;
originally announced January 2017.
-
Anti-concentration for polynomials of independent random variables
Authors:
Raghu Meka,
Oanh Nguyen,
Van Vu
Abstract:
We prove anti-concentration results for polynomials of independent random variables with arbitrary degree. Our results extend the classical Littlewood-Offord result for linear polynomials, and improve several earlier estimates.
We discuss applications in two different areas. In complexity theory, we prove near-optimal lower bounds for computing Parity, addressing a challenge in complexity theory posed by Razborov and Viola, and also address a problem concerning OR functions. In random graph theory, we derive a general anti-concentration result on the number of copies of a fixed graph in a random graph.
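For reference, the linear case being extended can be checked by brute force. For independent uniform signs $\xi_i \in \{-1,+1\}$ and nonzero coefficients $a_i$, Erdős's sharp form of the Littlewood-Offord bound gives $\max_x \Pr[\sum_i a_i \xi_i = x] \le \binom{n}{\lfloor n/2 \rfloor}/2^n = O(1/\sqrt{n})$, with equality when all $a_i$ are equal. The sketch below enumerates all sign patterns for small $n$; it covers only degree-one polynomials, and the helper names are hypothetical. Integer coefficients are used so that sums are exact.

```python
from collections import Counter
from itertools import product
from math import comb

def max_atom_probability(coeffs):
    """max_x P(sum_i a_i * xi_i = x) for independent uniform signs xi_i in {-1, +1}."""
    counts = Counter(sum(a * s for a, s in zip(coeffs, signs))
                     for signs in product((-1, 1), repeat=len(coeffs)))
    return max(counts.values()) / 2 ** len(coeffs)

def erdos_bound(n):
    """Sharp Littlewood-Offord bound C(n, floor(n/2)) / 2^n for nonzero coefficients."""
    return comb(n, n // 2) / 2 ** n
```

With ten equal coefficients the maximum atom is $\binom{10}{5}/2^{10} = 252/1024$, meeting the bound exactly; spreading the coefficients out, for example `[1, 2, 3, 5, 8, 13, 21, 34]`, lowers the maximum atom probability.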
Submitted 7 August, 2015; v1 submitted 3 July, 2015;
originally announced July 2015.
-
Stochastic Block Model and Community Detection in the Sparse Graphs: A spectral algorithm with optimal rate of recovery
Authors:
Peter Chin,
Anup Rao,
Van Vu
Abstract:
In this paper, we present and analyze a simple and robust spectral algorithm for the stochastic block model with $k$ blocks, for any fixed $k$. Our algorithm works with graphs having constant edge density, under an optimal condition on the gap between the density inside a block and the density between the blocks. As a by-product, we settle an open question posed by Abbe et al. concerning censored block models.
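The basic spectral idea can be illustrated on the simplest case, $k=2$ equal blocks. The sketch below is a generic spectral baseline, not the authors' algorithm (which includes additional steps to handle sparse graphs and achieve the optimal recovery rate): it samples a dense two-block model with hypothetical densities $p$ inside and $q$ between blocks, then splits vertices by the sign of the eigenvector for the second-largest adjacency eigenvalue.

```python
import numpy as np

def sample_two_block_sbm(n, p, q, rng):
    """Symmetric adjacency matrix with two equal blocks: edge prob p inside, q across."""
    labels = np.repeat([0, 1], n // 2)
    probs = np.where(labels[:, None] == labels[None, :], p, q)
    upper = np.triu(rng.random((n, n)) < probs, k=1)   # independent edges above the diagonal
    adj = (upper | upper.T).astype(float)
    return adj, labels

def spectral_bipartition(adj):
    """Split vertices by the sign of the eigenvector of the second-largest eigenvalue."""
    _, vecs = np.linalg.eigh(adj)   # eigenvalues in ascending order
    return (vecs[:, -2] >= 0).astype(int)

def agreement(pred, labels):
    """Fraction of correctly labelled vertices, up to swapping the two block names."""
    acc = float(np.mean(pred == labels))
    return max(acc, 1.0 - acc)
```

The second eigenvector works here because the expected adjacency matrix has eigenvalues about $(p+q)n/2$ (all-ones direction) and $(p-q)n/2$ (block-indicator direction); when $(p-q)n$ is large relative to the noise, the block-indicator signs survive. For example, with `n=200`, `p=0.5`, `q=0.1` and `rng = np.random.default_rng(0)`, the agreement is expected to be close to 1.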
Submitted 24 June, 2015; v1 submitted 20 January, 2015;
originally announced January 2015.
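To make the idea of a spectral algorithm for the stochastic block model concrete, the sketch below shows a generic textbook method for the simplest case: two equal blocks in a dense graph, with vertices split by the sign of the eigenvector for the second-largest adjacency eigenvalue. This is an assumption-laden illustration, not the authors' algorithm, which handles $k$ blocks and sparse graphs; the function names and the parameters `p_in` and `p_out` are hypothetical.

```python
import numpy as np


def sample_sbm(n, p_in, p_out, rng):
    """Sample a symmetric adjacency matrix with two equal blocks (no self-loops)."""
    labels = np.repeat([0, 1], n // 2)
    probs = np.where(labels[:, None] == labels[None, :], p_in, p_out)
    upper = np.triu(rng.random((n, n)) < probs, k=1)  # keep the strictly upper triangle
    adj = (upper | upper.T).astype(float)
    return adj, labels


def spectral_two_blocks(adj):
    """Label vertices by the sign of the eigenvector for the second-largest eigenvalue."""
    _, vecs = np.linalg.eigh(adj)  # eigh returns eigenvalues in ascending order
    return (vecs[:, -2] > 0).astype(int)


def agreement(pred, labels):
    """Fraction of vertices labelled correctly, up to swapping the two block names."""
    acc = float(np.mean(pred == labels))
    return max(acc, 1.0 - acc)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    adj, labels = sample_sbm(400, 0.6, 0.2, rng)
    print(agreement(spectral_two_blocks(adj), labels))
```

With these dense parameters the second eigenvalue is far from both the top eigenvalue and the noise bulk, so the sign split is essentially exact; the sparse regime studied in the paper is considerably harder.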
-
A simple SVD algorithm for finding hidden partitions
Authors:
Van Vu
Abstract:
Finding a hidden partition in a random environment is a general and important problem that contains many famous questions as subproblems, such as finding a hidden clique, a hidden coloring, or a hidden bipartition.
In this paper, we provide a simple SVD algorithm for this purpose, answering a question of McSherry. This algorithm is very easy to implement and works for sparse graphs with optimal density.
Submitted 15 April, 2014;
originally announced April 2014.
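The general shape of an SVD approach to hidden partitions can be sketched as follows: project each row of the adjacency matrix onto the top-$k$ singular subspace, then group rows whose projections are close. The paper's actual procedure and its guarantees for sparse graphs are not reproduced here; the greedy grouping rule, the assumption that the grouping `radius` is known (half the expected distance between block centres), and all function names are hypothetical simplifications.

```python
import numpy as np


def sample_planted_partition(n, k, p, q, rng):
    """k equal hidden blocks; edge probability p inside a block and q across blocks."""
    labels = np.repeat(np.arange(k), n // k)
    probs = np.where(labels[:, None] == labels[None, :], p, q)
    upper = np.triu(rng.random((n, n)) < probs, k=1)
    return (upper | upper.T).astype(float), labels


def svd_partition(adj, k, radius):
    """Project rows onto the top-k right singular subspace, then group nearby rows greedily."""
    _, _, vt = np.linalg.svd(adj)  # singular values come back in descending order
    coords = adj @ vt[:k].T  # same pairwise distances as the rank-k projection of the rows
    labels = np.full(adj.shape[0], -1)
    next_label = 0
    for i in range(adj.shape[0]):
        if labels[i] >= 0:
            continue
        # every still-unassigned row within `radius` of row i joins its group
        near = np.linalg.norm(coords - coords[i], axis=1) < radius
        labels[near & (labels < 0)] = next_label
        next_label += 1
    return labels


if __name__ == "__main__":
    n, k, p, q = 300, 3, 0.7, 0.05
    rng = np.random.default_rng(1)
    adj, truth = sample_planted_partition(n, k, p, q, rng)
    radius = 0.5 * (p - q) * np.sqrt(2 * n / k)  # half the expected centre separation
    print(np.unique(svd_partition(adj, k, radius)).size)
```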
-
Resolution in Linguistic First Order Logic based on Linear Symmetrical Hedge Algebra
Authors:
Thi-Minh-Tam Nguyen,
Viet-Trung Vu,
The-Vinh Doan,
Duc-Khanh Tran
Abstract:
This paper focuses on resolution in linguistic first-order logic with truth values taken from a linear symmetrical hedge algebra. We build the basic components of linguistic first-order logic, including syntax and semantics. We present a resolution principle that allows our logic to resolve two clauses having contradictory linguistic truth values. Since linguistic information is uncertain, inference in our linguistic logic is approximate. Therefore, we introduce the concept of reliability to capture the inherently approximate nature of the resolution inference rule.
Submitted 30 March, 2014; v1 submitted 25 March, 2014;
originally announced March 2014.
-
Resolution in Linguistic Propositional Logic based on Linear Symmetrical Hedge Algebra
Authors:
Thi-Minh-Tam Nguyen,
Viet-Trung Vu,
The-Vinh Doan,
Duc-Khanh Tran
Abstract:
The paper introduces a propositional linguistic logic that serves as the basis for automated uncertain reasoning with linguistic information. First, we build a linguistic logic system whose truth value domain is based on a linear symmetrical hedge algebra. Then, we use Gödel's t-norm and t-conorm to define the logical connectives of our logic. Next, we present a resolution inference rule in which two clauses having contradictory linguistic truth values can be resolved. We also introduce the concept of reliability to capture the approximate nature of the resolution inference rule. Finally, we propose a resolution procedure with maximal reliability.
Submitted 30 July, 2013; v1 submitted 29 July, 2013;
originally announced July 2013.
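The connectives mentioned above can be illustrated on a small linearly ordered set of linguistic truth values: Gödel's t-norm is the minimum and its t-conorm is the maximum. The sketch below assumes a hypothetical seven-element scale and assumes that negation mirrors the scale around its neutral middle element; it does not reproduce the paper's hedge-algebra construction, its resolution rule, or its reliability measure.

```python
# Hypothetical linear scale of linguistic truth values, from least true to most true.
SCALE = ["VeryFalse", "False", "LittleFalse", "Neutral", "LittleTrue", "True", "VeryTrue"]
RANK = {value: i for i, value in enumerate(SCALE)}


def t_norm(x, y):
    """Goedel t-norm (conjunction): the smaller of the two truth values."""
    return SCALE[min(RANK[x], RANK[y])]


def t_conorm(x, y):
    """Goedel t-conorm (disjunction): the larger of the two truth values."""
    return SCALE[max(RANK[x], RANK[y])]


def negation(x):
    """Assumed symmetric negation: mirror the value around 'Neutral'."""
    return SCALE[len(SCALE) - 1 - RANK[x]]


if __name__ == "__main__":
    print(t_norm("True", "LittleFalse"), t_conorm("True", "LittleFalse"), negation("VeryTrue"))
```

Because this negation reverses the order and is its own inverse, De Morgan's laws hold between `t_norm` and `t_conorm` on the whole scale.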
-
Minimax Rates of Estimation for Sparse PCA in High Dimensions
Authors:
Vincent Q. Vu,
Jing Lei
Abstract:
We study sparse principal components analysis in the high-dimensional setting, where $p$ (the number of variables) can be much larger than $n$ (the number of observations). We prove optimal, non-asymptotic lower and upper bounds on the minimax estimation error for the leading eigenvector when it belongs to an $\ell_q$ ball for $q \in [0,1]$. Our bounds are sharp in $p$ and $n$ for all $q \in [0, 1]$ over a wide class of distributions. The upper bound is obtained by analyzing the performance of $\ell_q$-constrained PCA. In particular, our results provide convergence rates for $\ell_1$-constrained PCA.
Submitted 5 February, 2012; v1 submitted 3 February, 2012;
originally announced February 2012.
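For the $q = 0$ end of the range above, $\ell_q$-constrained PCA becomes a cardinality constraint: maximize $v^\top \Sigma v$ over unit vectors with at most $s$ nonzero entries. The sketch below solves this exactly by exhaustive search over supports, which is only feasible for small $p$. It is shown on a population covariance purely for illustration (an estimator would use the sample covariance), and it is not the paper's analysis or its $\ell_1$ case. The function name is hypothetical.

```python
from itertools import combinations

import numpy as np


def l0_constrained_pca(cov, s):
    """Exact leading s-sparse eigenvector via exhaustive search over supports of size s.

    By eigenvalue interlacing, enlarging a support never lowers the top eigenvalue
    of the principal submatrix, so searching supports of size exactly s suffices.
    """
    p = cov.shape[0]
    best_val, best_vec = -np.inf, None
    for support in combinations(range(p), s):
        idx = list(support)
        vals, vecs = np.linalg.eigh(cov[np.ix_(idx, idx)])  # ascending eigenvalues
        if vals[-1] > best_val:
            best_val = vals[-1]
            best_vec = np.zeros(p)
            best_vec[idx] = vecs[:, -1]
    return best_val, best_vec


if __name__ == "__main__":
    v = np.zeros(8)
    v[:3] = 1 / np.sqrt(3)
    cov = np.eye(8) + 4 * np.outer(v, v)  # spiked covariance with a 3-sparse leading eigenvector
    val, vec = l0_constrained_pca(cov, 3)
    print(round(val, 6), np.flatnonzero(np.abs(vec) > 1e-12))
```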
-
Information In The Non-Stationary Case
Authors:
Vincent Q. Vu,
Bin Yu,
Robert E. Kass
Abstract:
Information estimates such as the "direct method" of Strong et al. (1998) sidestep the difficult problem of estimating the joint distribution of response and stimulus by instead estimating the difference between the marginal and conditional entropies of the response. While this is an effective estimation strategy, it tempts the practitioner to ignore the role of the stimulus and the meaning of mutual information. We show here that, as the number of trials increases indefinitely, the direct (or "plug-in") estimate of marginal entropy converges (with probability 1) to the entropy of the time-averaged conditional distribution of the response, and the direct estimate of the conditional entropy converges to the time-averaged entropy of the conditional distribution of the response. Under joint stationarity and ergodicity of the response and stimulus, the difference of these quantities converges to the mutual information. When the stimulus is deterministic or non-stationary, the direct estimate of information no longer estimates mutual information, which is then no longer meaningful, but it remains a measure of the variability of the response distribution across time.
Submitted 18 July, 2008; v1 submitted 24 June, 2008;
originally announced June 2008.
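The two plug-in quantities described in this abstract can be sketched directly: pool all responses to estimate the entropy of the time-averaged response distribution, and average the per-time-bin entropies across trials to estimate the conditional entropy. The sketch assumes discrete response words arranged as `responses[trial][t]` with the same stimulus repeated on every trial; it omits any bias correction, and the function names are hypothetical.

```python
from collections import Counter
from math import log2


def plugin_entropy(symbols):
    """Plug-in (empirical) entropy in bits of a list of discrete symbols."""
    counts = Counter(symbols)
    total = sum(counts.values())
    return -sum(c / total * log2(c / total) for c in counts.values())


def direct_information(responses):
    """Direct-method estimate from responses[trial][t] (same stimulus on every trial)."""
    n_time = len(responses[0])
    # Marginal term: entropy of the responses pooled over trials and time bins.
    pooled = [r for trial in responses for r in trial]
    marginal = plugin_entropy(pooled)
    # Conditional term: entropy across trials within each time bin, averaged over time.
    conditional = sum(
        plugin_entropy([trial[t] for trial in responses]) for t in range(n_time)
    ) / n_time
    return marginal - conditional


if __name__ == "__main__":
    print(direct_information([[0, 1], [0, 1], [0, 1]]))
```

In the usage line the response is perfectly reproducible across trials but varies over time, so the estimate is 1 bit even though no stochastic stimulus is involved; this is the kind of across-time variability that, as the abstract notes, the direct estimate still measures when it no longer estimates mutual information.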