-
Spectral properties of the Laplacian of Scale-Free Percolation models
Authors:
Rajat Subhra Hazra,
Nandan Malhotra
Abstract:
We consider scale-free percolation on a discrete torus $\mathbf{V}_N$ of size $N$. Conditionally on an i.i.d. sequence of Pareto weights $(W_i)_{i\in \mathbf{V}_N}$ with tail exponent $τ-1>0$, we connect any two points $i$ and $j$ on the torus with probability
$$p_{ij}= \frac{W_iW_j}{\|i-j\|^α} \wedge 1$$ for some parameter $α>0$.
We focus on the (centred) Laplacian operator of this random graph and study its empirical spectral distribution. We explicitly identify the limiting distribution when $α<1$ and $τ>3$, in terms of the spectral distribution of some non-commutative unbounded operators.
Submitted 24 April, 2025;
originally announced April 2025.
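The connection rule in the abstract above can be illustrated with a minimal sketch. It assumes a one-dimensional torus, Pareto weights with scale 1 (so that $P(W>t)=t^{-(τ-1)}$ for $t\ge 1$), and the sign convention $D-A$ for the Laplacian; none of these choices is stated in the abstract, and the centring step is left out. All function names are hypothetical.

```python
import random


def torus_distance(i, j, n):
    """Distance between sites i and j on a 1-d discrete torus of size n (assumed geometry)."""
    d = abs(i - j)
    return min(d, n - d)


def pareto_weight(tau, rng):
    """Sample W with P(W > t) = t^-(tau-1), t >= 1, by inverse-CDF (scale 1 is an assumption)."""
    u = 1.0 - rng.random()  # uniform on (0, 1]
    return u ** (-1.0 / (tau - 1.0))


def connection_probability(w_i, w_j, dist, alpha):
    """p_ij = (W_i * W_j / dist^alpha) capped at 1."""
    return min(1.0, w_i * w_j / dist ** alpha)


def sample_graph(n, tau, alpha, seed=0):
    """Draw the weights, then connect each pair independently with probability p_ij."""
    rng = random.Random(seed)
    weights = [pareto_weight(tau, rng) for _ in range(n)]
    adj = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            dist = torus_distance(i, j, n)
            if rng.random() < connection_probability(weights[i], weights[j], dist, alpha):
                adj[i][j] = adj[j][i] = 1
    return weights, adj


def laplacian(adj):
    """Graph Laplacian D - A (sign convention assumed; not centred)."""
    n = len(adj)
    return [[(sum(adj[i]) if i == j else 0) - adj[i][j] for j in range(n)]
            for i in range(n)]
```

For example, `weights, adj = sample_graph(200, tau=3.5, alpha=0.5)` followed by `laplacian(adj)` gives a matrix whose eigenvalues could be passed to any numerical eigensolver to inspect the empirical spectral distribution.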
-
The spectrum of dense kernel-based random graphs
Authors:
Alessandra Cipriani,
Rajat Subhra Hazra,
Nandan Malhotra,
Michele Salvi
Abstract:
Kernel-based random graphs (KBRGs) are a broad class of random graph models that account for inhomogeneity among vertices. We consider KBRGs on a discrete $d$-dimensional torus $\mathbf{V}_N$ of size $N^d$. Conditionally on an i.i.d. sequence of Pareto weights $(W_i)_{i\in \mathbf{V}_N}$ with tail exponent $τ-1>0$, we connect any two points $i$ and $j$ on the torus with probability
$$p_{ij}= \frac{κ_σ(W_i,W_j)}{\|i-j\|^α} \wedge 1$$ for some parameter $α>0$ and $κ_σ(u,v)= (u\vee v)(u \wedge v)^σ$ for some $σ\in(0,τ-1)$.
We focus on the adjacency operator of this random graph and study its empirical spectral distribution. For $α<d$ and $τ>2$, we show that a non-trivial limiting distribution exists as $N\to\infty$ and that the corresponding measure $μ_{σ,τ}$ is absolutely continuous with respect to the Lebesgue measure. $μ_{σ,τ}$ is given by an operator-valued semicircle law, whose Stieltjes transform is characterised by a fixed point equation in an appropriate Banach space. We analyse the moments of $μ_{σ,τ}$ and prove that the second moment is finite even when the weights have infinite variance. In the case $σ=1$, corresponding to the so-called scale-free percolation random graph, we can explicitly describe the limiting measure and study its tail.
Submitted 14 March, 2025; v1 submitted 13 February, 2025;
originally announced February 2025.
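The kernel $κ_σ$ in the abstract above is simple enough to write down directly. The sketch below is illustrative only: the function names are hypothetical, and the distance is passed in as a number because the abstract does not say which norm is used on the torus.

```python
def kappa(u, v, sigma):
    """kappa_sigma(u, v) = (u ∨ v) * (u ∧ v)^sigma, i.e. max times min to the power sigma."""
    return max(u, v) * min(u, v) ** sigma


def kbrg_connection_probability(w_i, w_j, dist, alpha, sigma):
    """p_ij = (kappa_sigma(W_i, W_j) / dist^alpha) capped at 1."""
    return min(1.0, kappa(w_i, w_j, sigma) / dist ** alpha)
```

With `sigma=1` the kernel reduces to the product `u * v`, which matches the abstract's remark that $σ=1$ corresponds to scale-free percolation.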
-
HateDay: Insights from a Global Hate Speech Dataset Representative of a Day on Twitter
Authors:
Manuel Tonneau,
Diyi Liu,
Niyati Malhotra,
Scott A. Hale,
Samuel P. Fraiberger,
Victor Orozco-Olvera,
Paul Röttger
Abstract:
To address the global challenge of online hate speech, prior research has developed detection models to flag such content on social media. However, due to systematic biases in evaluation datasets, the real-world effectiveness of these models remains unclear, particularly across geographies. We introduce HateDay, the first global hate speech dataset representative of social media settings, constructed from a random sample of all tweets posted on September 21, 2022 and covering eight languages and four English-speaking countries. Using HateDay, we uncover substantial variation in the prevalence and composition of hate speech across languages and regions. We show that evaluations on academic datasets greatly overestimate real-world detection performance, which we find is very low, especially for non-European languages. Our analysis identifies key drivers of this gap, including models' difficulty in distinguishing hate from offensive speech and a mismatch between the target groups emphasized in academic datasets and those most frequently targeted in real-world settings. We argue that poor model performance makes public models ill-suited for automatic hate speech moderation and find that high moderation rates are only achievable with substantial human oversight. Our results underscore the need to evaluate detection systems on data that reflects the complexity and diversity of real-world social media.
Submitted 3 June, 2025; v1 submitted 23 November, 2024;
originally announced November 2024.
-
Limiting Spectra of inhomogeneous random graphs
Authors:
Luca Avena,
Rajat Subhra Hazra,
Nandan Malhotra
Abstract:
We consider sparse inhomogeneous Erdős-Rényi random graph ensembles where edges are connected independently with probability $p_{ij}$. We assume that $p_{ij}= \varepsilon_N f(w_i, w_j)$ where $(w_i)_{i\ge 1}$ is a sequence of deterministic weights, $f$ is a bounded function and $N\varepsilon_N\to λ\in (0,\infty)$. We characterise the limiting moments in terms of graph homomorphisms and also classify the contributing partitions. We present an analytic way to determine the Stieltjes transform of the limiting measure. The convergence of the empirical distribution function follows from the theory of local weak convergence in many examples but we do not rely on this theory and exploit combinatorial and analytic techniques to derive some interesting properties of the limit. We extend the methods of Khorunzhy et al. (2004) and show that a fixed point equation determines the limiting measure. The limiting measure crucially depends on $λ$ and it is known that in the homogeneous case, if $λ\to\infty$, the measure converges weakly to the semicircular law (Jung and Lee (2018)). We extend this result of interpolating between the sparse and dense regimes to the inhomogeneous setting and show that as $λ\to \infty$, the measure converges weakly to a measure which is known as the operator-valued semicircular law.
Submitted 5 December, 2023;
originally announced December 2023.
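The sparse ensemble in the abstract above can be sampled with a few lines of code. This is a minimal sketch under stated assumptions: it takes $\varepsilon_N = λ/N$ exactly (the abstract only requires $N\varepsilon_N\to λ$), and it caps the probability at 1 in case the user-supplied `f` is large. The function name and its signature are hypothetical.

```python
import random


def sample_sparse_graph(weights, f, lam, seed=0):
    """Sample edges independently with p_ij = eps_N * f(w_i, w_j), where eps_N = lam / N."""
    n = len(weights)
    eps = lam / n  # assumption: N * eps_N equals lam exactly
    rng = random.Random(seed)
    edges = []
    for i in range(n):
        for j in range(i + 1, n):
            # Cap at 1 as a safeguard; not part of the abstract's definition.
            p = min(1.0, eps * f(weights[i], weights[j]))
            if rng.random() < p:
                edges.append((i, j))
    return edges
```

In the homogeneous case `f = lambda x, y: 1.0` this reduces to a sparse Erdős-Rényi graph with edge probability `lam / n`, so the expected degree stays close to `lam` as `n` grows.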
-
Detecting key Soccer match events to create highlights using Computer Vision
Authors:
Narayana Darapaneni,
Prashant Kumar,
Nikhil Malhotra,
Vigneswaran Sundaramurthy,
Abhaya Thakur,
Shivam Chauhan,
Krishna Chaitanya Thangeda,
Anwesh Reddy Paduri
Abstract:
The research and data science community has been fascinated with the development of automatic systems for the detection of key events in a video. Special attention in this field is given to sports video analytics, which could help in identifying key events during a match and in preparing a strategy for future games. For this paper, we chose football (soccer) as the sport and built a computer vision model that identifies important events in a match video in order to create highlights of the match. We built the models based on the Faster RCNN and YoloV5 architectures and noticed that, for the amount of data we used for training, Faster RCNN did better than YoloV5 in detecting the events in the match, though it was much slower. Within Faster RCNN, using ResNet50 as the base model gave a better class accuracy of 95.5%, compared to 92% with VGG16 as the base model, completely outperforming YoloV5 on our training dataset. We tested with an original video of 23 minutes, and our model reduced it to 4:50 minutes of highlights capturing almost all important events in the match.
Submitted 6 April, 2022;
originally announced April 2022.
-
Improving IT Support by Enhancing Incident Management Process with Multi-modal Analysis
Authors:
Atri Mandal,
Shivali Agarwal,
Nikhil Malhotra,
Giriprasad Sridhara,
Anupama Ray,
Daivik Swarup
Abstract:
The IT support services industry is going through a major transformation with AI becoming commonplace. There has been a lot of effort in the direction of automation at every human touchpoint in the IT support processes. Incident management is one such process, and it has been a beacon process for AI-based automation. The vision is to automate the process from the time an incident/ticket arrives until it is resolved and closed. While text is the primary mode of communicating incidents, there has been a growing trend of using alternate modalities such as images to communicate the problem. A large fraction of IT support tickets today contain attached image data in the form of screenshots, log messages, invoices and so on. These attachments help explain the problem better, which aids faster resolution. For anybody who aspires to provide AI-based IT support, it is essential to build systems that can handle multi-modal content. In this paper we present how incident management in the IT support domain can be made much more effective using multi-modal analysis. The information extracted from different modalities is correlated to enrich the information in the ticket and used for better ticket routing and resolution. We evaluate our system using about 25000 real tickets containing attachments from selected problem areas. Our results demonstrate significant improvements in both routing and resolution with the use of multi-modal ticket analysis compared to text-only analysis.
Submitted 4 August, 2019;
originally announced August 2019.
-
Cognitive system to achieve human-level accuracy in automated assignment of helpdesk email tickets
Authors:
Atri Mandal,
Nikhil Malhotra,
Shivali Agarwal,
Anupama Ray,
Giriprasad Sridhara
Abstract:
Ticket assignment/dispatch is a crucial part of the service delivery business, with a lot of scope for automation and optimization. In this paper, we present an end-to-end automated helpdesk email ticket assignment system, which is also offered as a service. The objective of the system is to determine the nature of the problem mentioned in an incoming email ticket and then automatically dispatch it to an appropriate resolver group (or team) for resolution.
The proposed system uses an ensemble classifier augmented with a configurable rule engine. While designing an accurate classifier is one of the main challenges, we also need to design a system that is robust and adaptive to changing business needs. We discuss some of the main design challenges associated with email ticket assignment automation and how we solve them. The design decisions for our system are driven by high accuracy, coverage, business continuity, scalability and optimal usage of computational resources.
Our system has been deployed in production at three major service providers and is currently assigning over 40,000 emails per month, on average, with an accuracy close to 90% and covering at least 90% of email tickets. This translates to achieving human-level accuracy and results in a net saving of about 23000 man-hours of effort per annum.
Submitted 9 August, 2018; v1 submitted 8 August, 2018;
originally announced August 2018.