-
VANPY: Voice Analysis Framework
Authors:
Gregory Koushnir,
Michael Fire,
Galit Fuhrmann Alpert,
Dima Kagan
Abstract:
Voice data is increasingly being used in modern digital communications, yet there is still a lack of comprehensive tools for automated voice analysis and characterization. To this end, we developed the VANPY (Voice Analysis in Python) framework for automated pre-processing, feature extraction, and classification of voice data. VANPY is an open-source, end-to-end framework developed for speaker characterization from voice data. The framework is designed with extensibility in mind, allowing for easy integration of new components and adaptation to various voice analysis applications. It currently incorporates over fifteen voice analysis components, including music/speech separation, voice activity detection, speaker embedding, vocal feature extraction, and various classification models.
Four of VANPY's components were developed in-house and integrated into the framework to extend its speaker characterization capabilities: gender classification, emotion classification, age regression, and height regression. The models demonstrate robust performance across various datasets, although they do not surpass state-of-the-art performance.
As a proof of concept, we demonstrate the framework's ability to extract speaker characteristics on a use-case challenge of analyzing character voices from the movie "Pulp Fiction." The results illustrate the framework's capability to extract multiple speaker characteristics, including gender, age, height, emotion type, and emotion intensity measured across three dimensions: arousal, dominance, and valence.
Submitted 4 May, 2025; v1 submitted 17 February, 2025;
originally announced February 2025.
-
Short Run Transit Route Planning Decision Support System Using a Deep Learning-Based Weighted Graph
Authors:
Nadav Shalit,
Michael Fire,
Dima Kagan,
Eran Ben-Elia
Abstract:
Public transport routing plays a crucial role in transit network design, ensuring a satisfactory level of service for passengers. However, current routing solutions rely on traditional operational research heuristics, which can be time-consuming to implement and lack the ability to provide quick solutions. Here, we propose a novel deep learning-based methodology for a decision support system that enables public transport (PT) planners to identify short-term route improvements rapidly. By seamlessly adjusting specific sections of routes between two stops during specific times of the day, our method effectively reduces times and enhances PT services. Leveraging diverse data sources such as GTFS and smart card data, we extract features and model the transportation network as a directed graph. Using self-supervision, we train a deep learning model for predicting lateness values for road segments.
These lateness values are then used as edge weights in the transportation graph, enabling efficient path searching. Evaluating the method on Tel Aviv, we were able to reduce times on more than 9% of the routes. The improved routes included both intraurban and suburban routes, which highlights the model's versatility. The findings emphasize the potential of our data-driven decision support system to enhance public transport and city logistics, promoting greater efficiency and reliability in PT services.
Submitted 24 August, 2023;
originally announced August 2023.
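The abstract above describes using predicted lateness values as edge weights for path searching in a directed transportation graph. A minimal sketch of that idea follows, assuming non-negative weights and plain Dijkstra search; the abstract does not state which search algorithm the authors use, and the function name, stop names, and lateness values here are hypothetical.

```python
import heapq


def least_lateness_path(graph, source, target):
    """Dijkstra search over a directed graph given as {stop: {next_stop: lateness}}.

    Assumes every lateness weight is non-negative (an assumption of this sketch).
    Returns (path, total_lateness), or (None, inf) if the target is unreachable.
    """
    best = {source: 0.0}   # lowest known cost to reach each stop
    previous = {}          # predecessor of each stop on its best path
    queue = [(0.0, source)]
    while queue:
        cost, stop = heapq.heappop(queue)
        if stop == target:
            break
        if cost > best.get(stop, float("inf")):
            continue  # stale queue entry
        for neighbor, lateness in graph.get(stop, {}).items():
            new_cost = cost + lateness
            if new_cost < best.get(neighbor, float("inf")):
                best[neighbor] = new_cost
                previous[neighbor] = stop
                heapq.heappush(queue, (new_cost, neighbor))
    if target not in best:
        return None, float("inf")
    # Walk the predecessor links back from the target to rebuild the path.
    path = [target]
    while path[-1] != source:
        path.append(previous[path[-1]])
    path.reverse()
    return path, best[target]


# Made-up example: four stops with illustrative lateness weights.
toy_graph = {
    "A": {"B": 4.0, "C": 1.0},
    "C": {"B": 1.5, "D": 6.0},
    "B": {"D": 2.0},
    "D": {},
}
path, total = least_lateness_path(toy_graph, "A", "D")
print(path, total)
```

In this toy graph the detour through C accumulates less lateness than the direct A to B segment, which is the kind of short-term rerouting between two stops that the abstract refers to.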
-
Interruptions detection in video conferences
Authors:
Shmuel Horowitz,
Dima Kagan,
Galit Fuhrmann Alpert,
Michael Fire
Abstract:
In recent years, the popularity of video conferencing (VC) has skyrocketed for a wide range of activities, and the number of VC users has surged. This sharp increase in VC usage has been accompanied by various newly emerging privacy and security challenges. VC meetings became a target for various security attacks, such as Zoombombing. Other VC-related challenges also emerged. For example, during COVID lockdowns, educators had to teach in online environments, struggling to keep students engaged for extended periods. In parallel, the amount of available VC videos has grown exponentially. Thus, users and companies are limited in finding abnormal segments in VC meetings within the converging volumes of data. Such abnormal events that affect most meeting participants may be indicators of interesting points in time, including security attacks or other changes in meeting climate, like someone joining a meeting or sharing dramatic content. Here, we present a novel algorithm for detecting abnormal events in VC data. We curated publicly available VC recordings, including meetings with interruptions. We analyzed the videos using our algorithm, extracting time windows where abnormal occurrences were detected. Our algorithm is a pipeline that combines multiple methods in several steps to detect users' faces in each video frame, track face locations during the meeting, and generate vector representations of the facial expression for each face in each frame. Vector representations are used to monitor changes in facial expressions throughout the meeting for each participant. The overall change in meeting climate is quantified using those parameters across all participants and translated into event anomaly detection. This is the first open pipeline for automatically detecting anomalous events in VC meetings. Our model detects abnormal events with 92.3% precision over the collected dataset.
Submitted 25 February, 2023;
originally announced March 2023.
-
Ethnic Representation Analysis of Commercial Movie Posters
Authors:
Dima Kagan,
Mor Levy,
Michael Fire,
Galit Fuhrmann Alpert
Abstract:
In recent decades, global awareness of the importance of diverse representation has been increasing. Lack of diversity and discrimination toward minorities have not spared the film industry. Here, we examine ethnic bias in the film industry through commercial posters, the industry's primary advertisement medium for decades. Movie posters are designed to establish the viewer's initial impression. We developed a novel approach for evaluating ethnic bias in the film industry by analyzing nearly 125,000 posters using state-of-the-art deep learning models. Our analysis shows that while ethnic biases still exist, there is a trend of reduction of bias, as seen by several parameters. Particularly in English-language movies, the ethnic distribution of characters on posters from the last couple of years is approaching the actual ethnic composition of the US population. An automatic approach to monitor ethnic diversity in the film industry, potentially integrated with financial value, may be of significant use for producers and policymakers.
Submitted 17 July, 2022;
originally announced July 2022.
-
Large-Scale Shill Bidder Detection in E-commerce
Authors:
Michael Fire,
Rami Puzis,
Dima Kagan,
Yuval Elovici
Abstract:
User feedback is one of the most effective methods to build and maintain trust in electronic commerce platforms. Unfortunately, dishonest sellers often bend over backward to manipulate users' feedback or place phony bids in order to increase their own sales and harm competitors. The black market of user feedback, supported by a plethora of shill bidders, prospers on top of legitimate electronic commerce. In this paper, we investigate the ecosystem of shill bidders based on large-scale data by analyzing hundreds of millions of users who performed billions of transactions, and we propose a machine-learning-based method for identifying communities of users that methodically provide dishonest feedback. Our results show that (1) shill bidders can be identified with high precision based on their transaction and feedback statistics; and (2) in contrast to legitimate buyers and sellers, shill bidders form cliques to support each other.
Submitted 21 April, 2022; v1 submitted 5 April, 2022;
originally announced April 2022.
-
Co-Membership-based Generic Anomalous Communities Detection
Authors:
Shay Lapid,
Dima Kagan,
Michael Fire
Abstract:
Nowadays, detecting anomalous communities in networks is an essential task in research, as it helps discover insights into community-structured networks. Most of the existing methods leverage either information regarding attributes of vertices or the topological structure of communities. In this study, we introduce the Co-Membership-based Generic Anomalous Communities Detection Algorithm (referred to as CMMAC), a novel and generic method that utilizes the information of vertices' co-membership in multiple communities. CMMAC is domain-free and almost unaffected by communities' sizes and densities. Specifically, we train a classifier to predict the probability of each vertex in a community being a member of the community. We then rank the communities by the aggregated membership probabilities of each community's vertices. The lowest-ranked communities are considered to be anomalous. Furthermore, we present an algorithm for generating a community-structured random network enabling the infusion of anomalous communities to facilitate research in the field. We utilized it to generate two datasets, composed of thousands of labeled anomaly-infused networks, and published them. We experimented extensively on thousands of simulated and real-world networks infused with artificial anomalies. CMMAC outperformed other existing methods in a range of settings. Additionally, we demonstrated that CMMAC can identify abnormal communities in real-world unlabeled networks in different domains, such as Reddit and Wikipedia.
Submitted 30 March, 2022;
originally announced March 2022.
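The ranking step described in the abstract above (aggregate each community's per-vertex membership probabilities, then treat the lowest-ranked communities as anomalous) can be sketched as follows. This is an illustration only: the probabilities are assumed to come from an already-trained classifier, the mean is used as the aggregation function because the abstract does not specify one, and the function name and toy values are hypothetical.

```python
from statistics import mean


def rank_communities(membership_probs):
    """Rank communities from most to least anomalous.

    membership_probs maps community -> {vertex: predicted membership probability}.
    Aggregation by mean is an assumption of this sketch, not taken from the paper.
    Returns a list of (community, score) pairs, lowest score first.
    """
    scores = {
        community: mean(list(probs.values()))
        for community, probs in membership_probs.items()
    }
    return sorted(scores.items(), key=lambda item: item[1])


# Made-up classifier outputs for three communities.
toy_probs = {
    "c1": {"u": 0.9, "v": 0.8, "w": 0.7},
    "c2": {"x": 0.2, "y": 0.4},
    "c3": {"u": 0.6, "z": 0.5},
}
ranking = rank_communities(toy_probs)
print(ranking)
```

Note that vertex "u" appears in two communities with different probabilities, which reflects the co-membership setting the abstract describes: the same vertex is scored separately for each community it belongs to.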
-
Automatic Large Scale Detection of Red Palm Weevil Infestation using Aerial and Street View Images
Authors:
Dima Kagan,
Galit Fuhrmann Alpert,
Michael Fire
Abstract:
The spread of the Red Palm Weevil has dramatically affected date growers, homeowners and governments, forcing them to deal with a constant threat to their palm trees. Early detection of palm tree infestation has been proven to be critical in order to allow treatment that may save trees from irreversible damage, and is most commonly performed by local physical access for individual tree monitoring. Here, we present a novel method for surveillance of Red Palm Weevil infested palm trees utilizing state-of-the-art deep learning algorithms, with aerial and street-level imagery data. To detect infested palm trees, we analyzed over 100,000 aerial and street-level images, mapping the location of palm trees in urban areas. Using this procedure, we discovered and verified infested palm trees at various locations.
Submitted 9 April, 2021; v1 submitted 6 April, 2021;
originally announced April 2021.
-
Zooming Into Video Conferencing Privacy and Security Threats
Authors:
Dima Kagan,
Galit Fuhrmann Alpert,
Michael Fire
Abstract:
The COVID-19 pandemic outbreak, with its related social distancing and shelter-in-place measures, has dramatically affected ways in which people communicate with each other, forcing people to find new ways to collaborate, study, celebrate special occasions, and meet with family and friends. One of the most popular solutions that have emerged is the use of video conferencing applications to replace face-to-face meetings with virtual meetings. This resulted in unprecedented growth in the number of video conferencing users. In this study, we explored privacy issues that may arise from attending virtual meetings. We extracted private information from collage images of meeting participants that are publicly posted on the Web. We used image processing, text recognition tools, as well as social network analysis to explore our web-crawled, curated dataset of over 15,700 collage images and over 142,000 face images of meeting participants. We demonstrate that video conference users are facing prevalent security and privacy threats. Our results indicate that it is relatively easy to collect thousands of publicly available images of video conference meetings and extract personal information about the participants, including their face images, age, gender, usernames, and sometimes even full names. This type of extracted data can vastly and easily jeopardize people's security and privacy both online and in the real world, affecting not only adults but also more vulnerable segments of society, such as young children and older adults. Finally, we show that cross-referencing facial image data with social network data may put participants at additional privacy risks they may not be aware of and that it is possible to identify users who appear in several video conference meetings, thus providing a potential to maliciously aggregate different sources of information about a target individual.
Submitted 2 July, 2020;
originally announced July 2020.
-
Using Data Science to Understand the Film Industry's Gender Gap
Authors:
Dima Kagan,
Thomas Chesney,
Michael Fire
Abstract:
Data science can offer answers to a wide range of social science questions. Here we turn attention to the portrayal of women in movies, an industry that has a significant influence on society, impacting such aspects of life as self-esteem and career choice. To this end, we fused data from the online movie database IMDb with a dataset of movie dialogue subtitles to create the largest available corpus of movie social networks (15,540 networks). Analyzing this data, we investigated gender bias in on-screen female characters over the past century. We find a trend of improvement in all aspects of women's roles in movies, including a constant rise in the centrality of female characters. There has also been an increase in the number of movies that pass the well-known Bechdel test, a popular, albeit flawed, measure of women in fiction. Here we propose a new and better alternative to this test for evaluating female roles in movies. Our study introduces fresh data, an open-code framework, and novel techniques that present new opportunities in the research and analysis of movies.
Submitted 6 August, 2019; v1 submitted 15 March, 2019;
originally announced March 2019.
-
Generic Anomalous Vertices Detection Utilizing a Link Prediction Algorithm
Authors:
Dima Kagan,
Yuval Elovici,
Michael Fire
Abstract:
In the past decade, network structures have penetrated nearly every aspect of our lives. The detection of anomalous vertices in these networks has become increasingly important, such as in exposing computer network intruders or identifying fake online reviews. In this study, we present a novel unsupervised two-layered meta-classifier that can detect irregular vertices in complex networks solely by using features extracted from the network topology. Following the reasoning that a vertex with many improbable links has a higher likelihood of being anomalous, we employed our method on 10 networks of various scales, from a network of several dozen students to online social networks with millions of users. In every scenario, we were able to identify anomalous vertices with lower false positive rates and higher AUCs compared to other prevalent methods. Moreover, we demonstrated that the presented algorithm is efficient both in revealing fake users and in disclosing the most influential people in social networks.
Submitted 6 June, 2017; v1 submitted 24 October, 2016;
originally announced October 2016.
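The reasoning in the abstract above, that a vertex with many improbable links is more likely to be anomalous, can be illustrated with a much simpler scoring rule than the paper's two-layered meta-classifier. The sketch below assumes that some link predictor has already assigned each existing edge a probability of being a genuine link, and scores each vertex by the mean improbability of its incident edges; the function name, the aggregation rule, and the toy probabilities are all hypothetical.

```python
from collections import defaultdict


def vertex_anomaly_scores(edge_probs):
    """Score vertices by the mean improbability (1 - p) of their incident edges.

    edge_probs maps an undirected edge (u, v) to a link probability in [0, 1],
    assumed to come from a separately trained link predictor.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for (u, v), probability in edge_probs.items():
        for vertex in (u, v):
            totals[vertex] += 1.0 - probability
            counts[vertex] += 1
    return {vertex: totals[vertex] / counts[vertex] for vertex in totals}


# Made-up example: a, b, c form a plausible triangle; x attaches via unlikely links.
toy_edges = {
    ("a", "b"): 0.9,
    ("b", "c"): 0.8,
    ("a", "c"): 0.9,
    ("x", "a"): 0.1,
    ("x", "b"): 0.2,
    ("x", "c"): 0.1,
}
scores = vertex_anomaly_scores(toy_edges)
most_anomalous = max(scores, key=scores.get)
print(most_anomalous, round(scores[most_anomalous], 3))
```

Because the rule uses only edge-level probabilities derived from topology, it needs no vertex attributes, which matches the abstract's claim of relying solely on features extracted from the network structure.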
-
Facebook Applications' Installation and Removal: A Temporal Analysis
Authors:
Dima Kagan,
Michael Fire,
Aviad Elyashar,
Yuval Elovici
Abstract:
Facebook applications are one of the reasons for Facebook's attractiveness. Unfortunately, numerous users are not aware of the fact that many malicious Facebook applications exist. To educate users, to raise users' awareness and to improve Facebook users' security and privacy, we developed a Firefox add-on that alerts users to the number of installed applications on their Facebook profiles. In this study, we present the temporal analysis of the Facebook applications' installation and removal dataset collected by our add-on. This dataset consists of information from 2,945 users, collected during a period of over a year. We used linear regression to analyze our dataset and discovered a linear connection between the average percentage change of newly installed Facebook applications and the number of days passed since the user initially installed our add-on. Additionally, we found that users who used our Firefox add-on became more aware of their security and privacy, installing on average fewer new applications. Finally, we discovered that on average 86.4% of Facebook users install an additional application every 4.2 days.
Submitted 16 September, 2013;
originally announced September 2013.
-
Friend or Foe? Fake Profile Identification in Online Social Networks
Authors:
Michael Fire,
Dima Kagan,
Aviad Elyashar,
Yuval Elovici
Abstract:
The amount of personal information unwillingly exposed by users on online social networks is staggering, as shown in recent research. Moreover, recent reports indicate that these networks are infested with tens of millions of fake user profiles, which may jeopardize the users' security and privacy. To identify fake users in such networks and to improve users' security and privacy, we developed the Social Privacy Protector software for Facebook. This software contains three protection layers, which improve user privacy by implementing different methods. The first layer identifies a user's friends who might pose a threat and then restricts these "friends'" exposure to the user's personal information. The second layer is an expansion of Facebook's basic privacy settings based on different types of social network usage profiles. The third layer alerts users about the number of installed applications on their Facebook profile, which have access to their private information. An initial version of the Social Privacy Protector software received high media coverage, and more than 3,000 users from more than twenty countries have installed the software, out of which 527 used the software to restrict more than nine thousand friends. In addition, we estimate that more than a hundred users accepted the software's recommendations and removed at least 1,792 Facebook applications from their profiles. By analyzing the unique dataset obtained by the software in combination with machine learning techniques, we developed classifiers that are able to predict which Facebook profiles have high probabilities of being fake and therefore threaten the user's well-being. Moreover, in this study, we present statistics on users' privacy settings and statistics of the number of applications installed on Facebook profiles...
Submitted 15 March, 2013;
originally announced March 2013.
-
Social Network Based Search for Experts
Authors:
Yehonatan Bitton,
Michael Fire,
Dima Kagan,
Bracha Shapira,
Lior Rokach,
Judit Bar-Ilan
Abstract:
Our system illustrates how information retrieved from social networks can be used for suggesting experts for specific tasks. The system is designed to facilitate the task of finding the appropriate person(s) for a job, as a conference committee member, an advisor, etc. This short description will demonstrate how the system works in the context of the HCIR2012 published tasks.
Submitted 14 December, 2012;
originally announced December 2012.