-
SEL-CIE: Knowledge-Guided Self-Supervised Learning Framework for CIE-XYZ Reconstruction from Non-Linear sRGB Images
Authors:
Shir Barzel,
Moshe Salhov,
Ofir Lindenbaum,
Amir Averbuch
Abstract:
Modern cameras typically offer two types of image states: a minimally processed linear raw RGB image representing the raw sensor data, and a highly-processed non-linear image state, such as the sRGB state. The CIE-XYZ color space is a device-independent linear space used as part of the camera pipeline and can be helpful for computer vision tasks, such as image deblurring, dehazing, and color recognition tasks in medical applications, where color accuracy is important. However, images are usually saved in non-linear states, and achieving CIE-XYZ color images using conventional methods is not always possible. To tackle this issue, classical methodologies have been developed that focus on reversing the acquisition pipeline. More recently, supervised learning has been employed, using paired CIE-XYZ and sRGB representations of identical images. However, obtaining a large-scale dataset of CIE-XYZ and sRGB pairs can be challenging. To overcome this limitation and mitigate the reliance on large amounts of paired data, self-supervised learning (SSL) can be utilized as a substitute for relying solely on paired data. This paper proposes a framework for using SSL methods alongside paired data to reconstruct CIE-XYZ images and re-render sRGB images, outperforming existing approaches. The proposed framework is applied to the sRGB2XYZ dataset.
Submitted 20 May, 2024;
originally announced May 2024.
-
TabADM: Unsupervised Tabular Anomaly Detection with Diffusion Models
Authors:
Guy Zamberg,
Moshe Salhov,
Ofir Lindenbaum,
Amir Averbuch
Abstract:
Tables are an abundant form of data with use cases across all scientific fields. Real-world datasets often contain anomalous samples that can negatively affect downstream analysis. In this work, we only assume access to contaminated data and present a diffusion-based probabilistic model effective for unsupervised anomaly detection. Our model is trained to learn the density of normal samples by utilizing a unique rejection scheme to attenuate the influence of anomalies on the density estimation. At inference, we identify anomalies as samples in low-density regions. We use real data to demonstrate that our method improves detection capabilities over baselines. Furthermore, our method is relatively stable to the dimension of the data and does not require extensive hyperparameter tuning.
Submitted 23 July, 2023;
originally announced July 2023.
-
Cross-boosting of WNNM Image Denoising method by Directional Wavelet Packets
Authors:
Amir Averbuch,
Pekka Neittaanmäki,
Valery Zheludev,
Moshe Salhov,
Jonathan Hauser
Abstract:
The paper presents an image denoising scheme by combining a method that is based on directional quasi-analytic wavelet packets (qWPs) with the state-of-the-art Weighted Nuclear Norm Minimization (WNNM) denoising algorithm. The qWP-based denoising method (qWPdn) consists of multiscale qWP transform of the degraded image, application of adaptive localized soft thresholding to the transform coefficients using the Bivariate Shrinkage methodology, and restoration of the image from the thresholded coefficients from several decomposition levels. The combined method consists of several iterations of qWPdn and WNNM algorithms in a way that at each iteration the output from one algorithm boosts the input to the other. The proposed methodology couples the qWPdn capabilities to capture edges and fine texture patterns even in the severely corrupted images with utilizing the non-local self-similarity in real images that is inherent in the WNNM algorithm.
Multiple experiments, which compared the proposed methodology with six advanced denoising algorithms, including WNNM, confirmed that the combined cross-boosting algorithm outperforms most of them in terms of both quantitative measure and visual perception quality.
Submitted 9 May, 2023; v1 submitted 9 June, 2022;
originally announced June 2022.
-
Imbalanced Classification via a Tabular Translation GAN
Authors:
Jonathan Gradstein,
Moshe Salhov,
Yoav Tulpan,
Ofir Lindenbaum,
Amir Averbuch
Abstract:
When presented with a binary classification problem where the data exhibits severe class imbalance, most standard predictive methods may fail to accurately model the minority class. We present a model based on Generative Adversarial Networks which uses additional regularization losses to map majority samples to corresponding synthetic minority samples. This translation mechanism encourages the synthesized samples to be close to the class boundary. Furthermore, we explore a selection criterion to retain the most useful of the synthesized samples. Experimental results using several downstream classifiers on a variety of tabular class-imbalanced datasets show that the proposed method improves average precision when compared to alternative re-weighting and oversampling techniques.
Submitted 19 April, 2022;
originally announced April 2022.
-
Imbalanced Classification via Explicit Gradient Learning From Augmented Data
Authors:
Bronislav Yasinnik,
Moshe Salhov,
Ofir Lindenbaum,
Amir Averbuch
Abstract:
Learning from imbalanced data is one of the most significant challenges in real-world classification tasks. In such cases, neural network performance is substantially impaired due to a preference towards the majority class. Existing approaches attempt to eliminate the bias through data re-sampling or re-weighting the loss in the learning process. Still, these methods tend to overfit the minority samples and perform poorly when the structure of the minority class is highly irregular. Here, we propose a novel deep meta-learning technique to augment a given imbalanced dataset with new minority instances. These additional data are incorporated in the classifier's deep-learning process, and their contributions are learned explicitly. The advantage of the proposed method is demonstrated on synthetic and real-world datasets with various imbalance ratios.
Submitted 28 October, 2022; v1 submitted 21 February, 2022;
originally announced February 2022.
-
Automated identification of transiting exoplanet candidates in NASA Transiting Exoplanets Survey Satellite (TESS) data with machine learning methods
Authors:
Leon Ofman,
Amir Averbuch,
Adi Shliselberg,
Idan Benaun,
David Segev,
Aron Rissman
Abstract:
A novel artificial intelligence (AI) technique that uses machine learning (ML) methodologies and combines several algorithms developed by ThetaRay, Inc. is applied to NASA's Transiting Exoplanets Survey Satellite (TESS) dataset to identify exoplanetary candidates. The AI/ML ThetaRay system is trained initially with Kepler exoplanetary data and validated with confirmed exoplanets before its application to TESS data. Existing and new features of the data, based on various observational parameters, are constructed and used in the AI/ML analysis by employing semi-supervised and unsupervised machine learning techniques. By applying the ThetaRay system to 10,803 light curves of threshold crossing events (TCEs) produced by the TESS mission, obtained from the Mikulski Archive for Space Telescopes, the algorithm yields about 50 targets for further analysis, and we uncover three new exoplanetary candidates by further manual vetting. This study demonstrates for the first time the successful application of this particular combination of AI/ML-based methodologies to a large astrophysical dataset for rapid automated classification of TCEs.
Submitted 27 August, 2021; v1 submitted 20 February, 2021;
originally announced February 2021.
-
$\ell_0$-based Sparse Canonical Correlation Analysis
Authors:
Ofir Lindenbaum,
Moshe Salhov,
Amir Averbuch,
Yuval Kluger
Abstract:
Canonical Correlation Analysis (CCA) models are powerful for studying the associations between two sets of variables. The canonically correlated representations, termed \textit{canonical variates} are widely used in unsupervised learning to analyze unlabeled multi-modal registered datasets. Despite their success, CCA models may break (or overfit) if the number of variables in either of the modalities exceeds the number of samples. Moreover, often a significant fraction of the variables measures modality-specific information, and thus removing them is beneficial for identifying the \textit{canonically correlated variates}. Here, we propose $\ell_0$-CCA, a method for learning correlated representations based on sparse subsets of variables from two observed modalities. Sparsity is obtained by multiplying the input variables by stochastic gates, whose parameters are learned together with the CCA weights via an $\ell_0$-regularized correlation loss. We further propose $\ell_0$-Deep CCA for solving the problem of non-linear sparse CCA by modeling the correlated representations using deep nets. We demonstrate the efficacy of the method using several synthetic and real examples. Most notably, by gating nuisance input variables, our approach improves the extracted representations compared to other linear, non-linear and sparse CCA-based models.
Submitted 8 June, 2021; v1 submitted 12 October, 2020;
originally announced October 2020.
-
The LDBC Social Network Benchmark
Authors:
Renzo Angles,
János Benjamin Antal,
Alex Averbuch,
Altan Birler,
Peter Boncz,
Márton Búr,
Orri Erling,
Andrey Gubichev,
Vlad Haprian,
Moritz Kaufmann,
Josep Lluís Larriba Pey,
Norbert Martínez,
József Marton,
Marcus Paradies,
Minh-Duc Pham,
Arnau Prat-Pérez,
David Püroja,
Mirko Spasić,
Benjamin A. Steer,
Dávid Szakállas,
Gábor Szárnyas,
Jack Waudby,
Mingxi Wu,
Yuchen Zhang
Abstract:
The Linked Data Benchmark Council's Social Network Benchmark (LDBC SNB) is an effort intended to test various functionalities of systems used for graph-like data management. For this, LDBC SNB uses the recognizable scenario of operating a social network, characterized by its graph-shaped data. LDBC SNB consists of two workloads that focus on different functionalities: the Interactive workload (interactive transactional queries) and the Business Intelligence workload (analytical queries). This document contains the definition of both workloads. This includes a detailed explanation of the data used in the LDBC SNB, a detailed description for all queries, and instructions on how to generate the data and run the benchmark with the provided software.
Submitted 7 September, 2024; v1 submitted 7 January, 2020;
originally announced January 2020.
-
Similarity Search Over Graphs Using Localized Spectral Analysis
Authors:
Yariv Aizenbud,
Amir Averbuch,
Gil Shabat,
Guy Ziv
Abstract:
This paper provides a new similarity detection algorithm. Given an input set of multi-dimensional data points and an additional reference data point for similarity finding, the algorithm uses a kernel method that embeds the data points into a low-dimensional manifold. Unlike other kernel methods, which consider the entire data for the embedding, our method selects a specific set of kernel eigenvectors. The eigenvectors are chosen to separate between the data points and the reference data point so that similar data points can be easily identified as being distinct from most of the members in the dataset.
Submitted 11 July, 2017;
originally announced July 2017.
-
Kernel Scaling for Manifold Learning and Classification
Authors:
Ofir Lindenbaum,
Moshe Salhov,
Arie Yeredor,
Amir Averbuch
Abstract:
Kernel methods play a critical role in many machine learning algorithms. They are useful in manifold learning, classification, clustering and other data analysis tasks. Setting the kernel's scale parameter, also referred to as the kernel's bandwidth, highly affects the performance of the task in hand. We propose to set a scale parameter that is tailored to one of two types of tasks: classification and manifold learning. For manifold learning, we seek a scale which is best at capturing the manifold's intrinsic dimension. For classification, we propose three methods for estimating the scale, which optimize the classification results in different senses. The proposed frameworks are simulated on artificial and on real datasets. The results show a high correlation between optimal classification rates and the estimated scales. Finally, we demonstrate the approach on a seismic event classification task.
Submitted 4 June, 2019; v1 submitted 4 July, 2017;
originally announced July 2017.
-
Multi-View Kernels for Low-Dimensional Modeling of Seismic Events
Authors:
Ofir Lindenbaum,
Yuri Bregman,
Neta Rabin,
Amir Averbuch
Abstract:
The problem of learning from seismic recordings has been studied for years. There is a growing interest in developing automatic mechanisms for identifying the properties of a seismic event. One main motivation is the ability to reliably identify man-made explosions. The availability of multiple high-dimensional observations has increased the use of machine learning techniques in a variety of fields. In this work, we propose to use a kernel-fusion based dimensionality reduction framework for generating meaningful seismic representations from raw data. The proposed method is tested on 2023 events that were recorded in Israel and in Jordan. The method achieves promising results in classification of event type as well as in estimating the location of the event. The proposed fusion and dimensionality reduction tools may be applied to other types of geophysical data.
Submitted 6 June, 2017;
originally announced June 2017.
-
Incomplete Pivoted QR-based Dimensionality Reduction
Authors:
Amit Bermanis,
Aviv Rotbart,
Moshe Salhov,
Amir Averbuch
Abstract:
High-dimensional big data appears in many research fields such as image recognition, biology and collaborative filtering. Often, the exploration of such data by classic algorithms encounters difficulties due to the `curse of dimensionality' phenomenon. Therefore, dimensionality reduction methods are applied to the data prior to its analysis. Many of these methods are based on principal components analysis, which is statistically driven, namely they map the data into a low-dimensional subspace that preserves significant statistical properties of the high-dimensional data. As a consequence, such methods do not directly address the geometry of the data, reflected by the mutual distances between multidimensional data points. Thus, operations such as classification, anomaly detection or other machine learning tasks may be affected.
This work provides a dictionary-based framework for geometrically driven data analysis that includes dimensionality reduction, out-of-sample extension and anomaly detection. It embeds high-dimensional data in a low-dimensional subspace. This embedding preserves the original high-dimensional geometry of the data up to a user-defined distortion rate. In addition, it identifies a subset of landmark data points that constitute a dictionary for the analyzed dataset. The dictionary enables a natural extension of the low-dimensional embedding to out-of-sample data points, which gives rise to a distortion-based criterion for anomaly detection. The suggested method is demonstrated on synthetic and real-world datasets and achieves good results for classification, anomaly detection and out-of-sample tasks.
Submitted 12 July, 2016;
originally announced July 2016.
-
Multi-View Kernel Consensus For Data Analysis
Authors:
Moshe Salhov,
Ofir Lindenbaum,
Yariv Aizenbud,
Avi Silberschatz,
Yoel Shkolnisky,
Amir Averbuch
Abstract:
The input feature set for many data-driven tasks is high-dimensional while the intrinsic dimension of the data is low. Data analysis methods aim to uncover the underlying low-dimensional structure imposed by the low-dimensional hidden parameters by utilizing distance metrics that consider the set of attributes as a single monolithic set. However, the transformation of the low-dimensional phenomena into the measured high-dimensional observations might distort the distance metric. This distortion can affect the desired estimated low-dimensional geometric structure. In this paper, we suggest utilizing the redundancy in the attribute domain by partitioning the attributes into multiple subsets we call views. The proposed methods utilize the agreement, also called consensus, between different views to extract valuable geometric information that unifies multiple views about the intrinsic relationships among several different observations. This unification enhances the information that a single view or a simple concatenation of views provides.
Submitted 29 January, 2019; v1 submitted 28 June, 2016;
originally announced June 2016.
-
Gaussian Process Regression for Out-of-Sample Extension
Authors:
Oren Barkan,
Jonathan Weill,
Amir Averbuch
Abstract:
Manifold learning methods are useful for high dimensional data analysis. Many of the existing methods produce a low dimensional representation that attempts to describe the intrinsic geometric structure of the original data. Typically, this process is computationally expensive and the produced embedding is limited to the training data. In many real life scenarios, the ability to produce embedding of unseen samples is essential. In this paper we propose a Bayesian non-parametric approach for out-of-sample extension. The method is based on Gaussian Process Regression and independent of the manifold learning algorithm. Additionally, the method naturally provides a measure for the degree of abnormality for a newly arrived data point that did not participate in the training process. We derive the mathematical connection between the proposed method and the Nystrom extension and show that the latter is a special case of the former. We present extensive experimental results that demonstrate the performance of the proposed method and compare it to other existing out-of-sample extension methods.
Submitted 5 June, 2016; v1 submitted 7 March, 2016;
originally announced March 2016.
-
Diffusion Representations
Authors:
Moshe Salhov,
Amit Bermanis,
Guy Wolf,
Amir Averbuch
Abstract:
Diffusion Maps framework is a kernel based method for manifold learning and data analysis that defines diffusion similarities by imposing a Markovian process on the given dataset. Analysis by this process uncovers the intrinsic geometric structures in the data. Recently, it was suggested to replace the standard kernel by a measure-based kernel that incorporates information about the density of the data. Thus, the manifold assumption is replaced by a more general measure-based assumption.
The measure-based diffusion kernel incorporates two separate independent representations. The first determines a measure that correlates with a density that represents normal behaviors and patterns in the data. The second consists of the analyzed multidimensional data points.
In this paper, we present a representation framework for data analysis that is based on a closed-form decomposition of the measure-based kernel. The proposed representation preserves pairwise diffusion distances that does not depend on the data size while being invariant to scale. For stationary data, no out-of-sample extension is needed for embedding newly arrived data points in the representation space. Several aspects of the presented methodology are demonstrated on analytically generated data.
Submitted 19 November, 2015;
originally announced November 2015.
-
MultiView Diffusion Maps
Authors:
Ofir Lindenbaum,
Arie Yeredor,
Moshe Salhov,
Amir Averbuch
Abstract:
In this paper, we address the challenging task of achieving multi-view dimensionality reduction. The goal is to effectively use the availability of multiple views for extracting a coherent low-dimensional representation of the data. The proposed method exploits the intrinsic relation within each view, as well as the mutual relations between views. The multi-view dimensionality reduction is achieved by defining a cross-view model in which an implied random walk process is restrained to hop between objects in the different views. The method is robust to scaling and insensitive to small structural changes in the data. We define new diffusion distances and analyze the spectra of the proposed kernel. We show that the proposed framework is useful for various machine learning applications such as clustering, classification, and manifold learning. Finally, by fusing multi-sensor seismic data we present a method for automatic identification of seismic events.
Submitted 4 June, 2019; v1 submitted 22 August, 2015;
originally announced August 2015.
-
Randomized LU decomposition: An Algorithm for Dictionaries Construction
Authors:
Aviv Rotbart,
Gil Shabat,
Yaniv Shmueli,
Amir Averbuch
Abstract:
In recent years, distinctive-dictionary construction has gained importance due to its usefulness in data processing. Usually, one or more dictionaries are constructed from training data and then used to classify signals that did not participate in the training process. A new dictionary construction algorithm is introduced. It is based on a low-rank matrix factorization achieved by applying the randomized LU decomposition to training data. This method is fast, scalable, parallelizable, consumes low memory, outperforms SVD in these categories and also works extremely well on large sparse matrices. In contrast to existing methods, the randomized LU decomposition constructs an under-complete dictionary, which simplifies both the construction and the classification processes of newly arrived signals. The dictionary construction is generic and fits different applications. We demonstrate the capabilities of this algorithm for file type identification, which is a fundamental task in the digital security arena, performed nowadays, for example, by sandboxing mechanisms, deep packet inspection, firewalls and anti-virus systems. We propose a content-based method that detects file types and depends neither on file extension nor on metadata. Such an approach is harder to deceive and we show that only a few file fragments from a whole file are needed for a successful classification. Based on the constructed dictionaries, we show that the proposed method can effectively identify execution code fragments in PDF files.
$\textbf{Keywords.}$ Dictionary construction, classification, LU decomposition, randomized LU decomposition, content-based file detection, computer security.
Submitted 27 January, 2018; v1 submitted 17 February, 2015;
originally announced February 2015.
-
Efficient construction of broadcast graphs
Authors:
A. Averbuch,
R. Hollander Shabtai,
Y. Roditty
Abstract:
A broadcast graph is a connected graph, $G=(V,E)$, $ |V |=n$, in which each vertex can complete broadcasting of one message within at most $t=\lceil \log n\rceil$ time units. A minimum broadcast graph on $n$ vertices is a broadcast graph with the minimum number of edges over all broadcast graphs on $n$ vertices. The cardinality of the edge set of such a graph is denoted by $B(n)$. In this paper we construct a new broadcast graph with
$B(n) \le (k+1)N -(t-\frac{k}{2}+2)2^{k}+t-k+2$, for $n=N=(2^{k}-1)2^{t+1-k}$ and
$B(n) \le (k+1-p)n -(t-\frac{k}{2}+p+2)2^{k}+t-k -(p-2)2^{p}$, for $2^{t} < n<(2^{k}-1)2^{t+1-k}$, where $t \geq 7$, $2 \le k \le \lfloor t/2 \rfloor -1$ for even $n$ and $2 \le k \le \lceil t/2 \rceil -1$ for odd $n$, $d=N-n$, $x= \lfloor \frac{d}{2^{t+1-k}} \rfloor$ and $ p = \lfloor \log_{2}{(x+1)} \rfloor$ if $x>0$ and $p=0$ if $x=0$.
The new bound is an improvement upon the bound presented by Harutyunyan and Liestman (2012) for odd values of $n$.
Submitted 5 December, 2013;
originally announced December 2013.
-
Video Segmentation via Diffusion Bases
Authors:
Dina Dushnik,
Alon Schclar,
Amir Averbuch
Abstract:
Identifying moving objects in a video sequence, which is produced by a static camera, is a fundamental and critical task in many computer-vision applications. A common approach performs background subtraction, which identifies moving objects as the portion of a video frame that differs significantly from a background model. A good background subtraction algorithm has to be robust to changes in the illumination and it should avoid detecting non-stationary background objects such as moving leaves, rain, snow, and shadows. In addition, the internal background model should quickly respond to changes in background such as objects that start to move or stop. We present a new algorithm for video segmentation that processes the input video sequence as a 3D matrix where the third axis is the time domain. Our approach identifies the background by reducing the input dimension using the \emph{diffusion bases} methodology. Furthermore, we describe an iterative method for extracting and deleting the background. The algorithm has two versions and thus covers the complete range of backgrounds: one for scenes with static backgrounds and the other for scenes with dynamic (moving) backgrounds.
Submitted 1 May, 2013;
originally announced May 2013.
-
Missing Entries Matrix Approximation and Completion
Authors:
Gil Shabat,
Yaniv Shmueli,
Amir Averbuch
Abstract:
We describe several algorithms for matrix completion and matrix approximation when only some of the matrix entries are known. The approximation constraint can be any constraint whose approximate solution is known for the full matrix. For low-rank approximations, similar algorithms have appeared recently in the literature under different names. In this work, we introduce new theorems for matrix approximation and show that these algorithms can be extended to handle constraints other than low rank, such as nuclear norm, spectral norm, orthogonality constraints, and more. As the algorithms can be viewed from an optimization point of view, we discuss their convergence to a global solution in the convex case. We also discuss the optimal step size and show that it is fixed in each iteration. In addition, the derived matrix completion flow is robust and does not require any parameters. This matrix completion flow is applicable to different spectral minimizations and can be applied to problems in physics, mathematics, and electrical engineering, such as reconstruction of images and of data coming from PDEs such as the Helmholtz equation used for electromagnetic waves.
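To make the idea of completing a matrix by reusing a full-matrix approximation concrete, here is a minimal sketch for the low-rank case: fill the unknown entries with the current estimate, then apply the best rank-r approximation (truncated SVD) to the filled matrix, and repeat. This is a generic alternating scheme assumed for illustration and is not claimed to be the authors' algorithm; the names `project_rank`, `complete`, and `observed_residual` and the test data are hypothetical.

```python
import numpy as np

def project_rank(X, r):
    """Best rank-r approximation of X in Frobenius norm (truncated SVD)."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (U[:, :r] * s[:r]) @ Vt[:r]

def complete(M, mask, r, iters):
    """Estimate a rank-r matrix agreeing with M where mask is True."""
    X = np.where(mask, M, 0.0)            # start from a zero-filled matrix
    for _ in range(iters):
        filled = np.where(mask, M, X)     # keep known entries, guess the rest
        X = project_rank(filled, r)       # full-matrix approximation step
    return X

def observed_residual(X, M, mask):
    """Frobenius norm of the error on the known entries only."""
    return np.linalg.norm((X - M)[mask])

# Synthetic rank-2 matrix with roughly 70% of its entries observed.
rng = np.random.default_rng(0)
A = rng.standard_normal((20, 2)) @ rng.standard_normal((2, 20))
mask = rng.random(A.shape) < 0.7
x1 = complete(A, mask, r=2, iters=1)
x200 = complete(A, mask, r=2, iters=200)
```

Swapping `project_rank` for a different full-matrix approximation (for example, one enforcing a norm bound) changes the constraint without changing the loop, which is the kind of extension the abstract describes.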
Submitted 29 June, 2014; v1 submitted 27 February, 2013;
originally announced February 2013.
-
Partitioning Graph Databases - A Quantitative Evaluation
Authors:
Alex Averbuch,
Martin Neumann
Abstract:
Electronic data is growing at increasing rates, in both size and connectivity: the increasing presence of, and interest in, relationships between data. An example is the Twitter social network graph. Due to this growth, demand is increasing for technologies that can process such data. Currently relational databases are the predominant technology, but they are poorly suited to processing connected data as they are optimized for index-intensive operations. Conversely, graph databases are optimized for graph computation. They link records by direct references, avoiding index lookups and enabling retrieval of adjacent elements in constant time, regardless of graph size. However, as data volume increases, these databases outgrow the resources of one computer and data partitioning becomes necessary. We evaluate the viability of using graph partitioning algorithms to partition graph databases. A prototype partitioned database was developed. Three partitioning algorithms were explored and one was implemented. Three graph datasets were used: two real and one synthetically generated. These were partitioned in various ways and the impact on database performance was measured. We defined one synthetic access pattern per dataset and executed each on the partitioned datasets. Evaluation took place in a simulation environment, ensuring repeatability and allowing measurement of metrics like network traffic and load balance. Results show that, compared to random partitioning, the partitioning algorithm reduced traffic by 40-90%. Executing the algorithm intermittently during usage maintained partition quality, while requiring only 1% of the computation of the initial partitioning. Strong correlations were found between theoretic quality metrics and generated network traffic under non-uniform access patterns.
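One way to see why partition choice affects network traffic is to count the edges whose endpoints land on different machines, since traversing such an edge requires a remote hop. The sketch below assumes edge cut as the quality metric (the abstract does not name its metrics) and uses a hypothetical toy graph of two 4-node cliques joined by a single bridge edge.

```python
from itertools import combinations

def edge_cut(edges, part):
    """Number of edges whose endpoints are assigned to different partitions."""
    return sum(1 for u, v in edges if part[u] != part[v])

# Toy graph: cliques {0,1,2,3} and {4,5,6,7}, joined by the bridge (3, 4).
clique_a = list(combinations(range(0, 4), 2))
clique_b = list(combinations(range(4, 8), 2))
edges = clique_a + clique_b + [(3, 4)]

# Structure-aware assignment: one clique per partition.
by_cluster = {n: 0 if n < 4 else 1 for n in range(8)}
# Structure-blind assignment: alternate nodes between the two partitions.
by_parity = {n: n % 2 for n in range(8)}

cut_cluster = edge_cut(edges, by_cluster)  # 1 (only the bridge is cut)
cut_parity = edge_cut(edges, by_parity)    # 9 (most clique edges are cut)
```

Both assignments are perfectly load-balanced at four nodes each, so the difference in cut edges isolates the effect of respecting graph structure.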
Submitted 22 January, 2013;
originally announced January 2013.