-
Building Knowledge Graphs Towards a Global Food Systems Datahub
Authors:
Nirmal Gelal,
Aastha Gautam,
Sanaz Saki Norouzi,
Nico Giordano,
Claudio Dias da Silva Jr,
Jean Ribert Francois,
Kelsey Andersen Onofre,
Katherine Nelson,
Stacy Hutchinson,
Xiaomao Lin,
Stephen Welch,
Romulo Lollato,
Pascal Hitzler,
Hande Küçük McGinty
Abstract:
Sustainable agricultural production aligns with several sustainability goals established by the United Nations (UN). However, there is a lack of studies that comprehensively examine sustainable agricultural practices across various products and production methods. Such research could provide valuable insights into the diverse factors influencing the sustainability of specific crops and produce while also identifying practices and conditions that are universally applicable to all forms of agricultural production. While this research might help us better understand sustainability, the community would still need a consistent set of vocabularies. These consistent vocabularies, which represent the underlying datasets, can then be stored in a global food systems datahub. The standardized vocabularies might help encode important information in the datasets for further statistical analyses and AI/ML approaches, supporting research that targets sustainable agricultural production. A structured method of representing information in sustainability, especially for wheat production, is currently unavailable. To address this gap, we are building a set of ontologies and Knowledge Graphs (KGs) that encode knowledge associated with sustainable wheat production using formal logic. The data for this set of knowledge graphs are collected from public data sources, experimental results from our experiments at Kansas State University, and a Sustainability Workshop that we organized earlier in the year, which helped us collect input from different stakeholders throughout the wheat value chain. The modeling of the ontology (i.e., the schema) for the Knowledge Graph has been in progress with the help of our domain experts, following a modular structure using the KNARM methodology. In this paper, we present our preliminary results and the schemas of our Knowledge Graph and ontologies.
Submitted 26 February, 2025;
originally announced February 2025.
-
Understanding Twitter Engagement with a Click-Through Rate-based Method
Authors:
Andrea Fiandro,
Jeanpierre Francois,
Isabeau Oliveri,
Simone Leonardi,
Matteo A. Senese,
Giorgio Crepaldi,
Alberto Benincasa,
Giuseppe Rizzo
Abstract:
This paper presents the POLINKS solution to the RecSys Challenge 2020, which ranked 6th on the final leaderboard. We analyze the performance of our solution, which uses the click-through rate value to address the challenge task, compare it with a gradient boosting model, and report the quality indicators used to compute the final leaderboard.
Submitted 30 September, 2020;
originally announced October 2020.
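
As a rough illustration of what a click-through-rate-based predictor might look like, the sketch below estimates the engagement probability for a group as the smoothed fraction of past impressions that led to an engagement. The grouping key, the smoothing prior, and every name in the code are assumptions made for illustration; the abstract does not describe the POLINKS implementation at this level of detail.

```python
from collections import defaultdict


def fit_ctr(rows, prior_rate=0.5, prior_weight=1.0):
    """Build a click-through-rate predictor from (key, engaged) pairs.

    `key` is a hypothetical grouping value (for example an author id) and
    `engaged` is True when the impression led to an engagement.
    `prior_rate` and `prior_weight` are assumed smoothing parameters.
    """
    shown = defaultdict(int)    # impressions per key
    clicked = defaultdict(int)  # engagements per key
    for key, engaged in rows:
        shown[key] += 1
        clicked[key] += int(engaged)

    def predict(key):
        # Smoothed rate: unseen keys fall back to prior_rate.
        num = clicked.get(key, 0) + prior_rate * prior_weight
        den = shown.get(key, 0) + prior_weight
        return num / den

    return predict


if __name__ == "__main__":
    history = [("a", True), ("a", False), ("a", True), ("b", False)]
    predict = fit_ctr(history)
    print(predict("a"), predict("b"), predict("unseen"))
```

The smoothing term keeps the estimate away from 0 or 1 for keys with very few impressions, which is one common way to stabilize rate-based predictions.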
-
Early Identification of Services in HTTPS Traffic
Authors:
Wazen M. Shbair,
Thibault Cholez,
Jerome Francois,
Isabelle Chrisment
Abstract:
Traffic monitoring is essential for network management tasks that ensure security and QoS. However, the continuous increase of HTTPS traffic undermines the effectiveness of current service-level monitoring, which must either rely on unreliable parameters from the TLS handshake (X.509 certificate, SNI) or decrypt the traffic. We propose a new machine learning-based method to identify HTTPS services without decryption. By extracting statistical features from TLS handshake packets and from a small number of application data packets, we can identify HTTPS services very early in the session. Extensive experiments performed over a significant and open dataset show that our method offers good accuracy, and a prototype implementation confirms that early identification of HTTPS services is achieved.
Submitted 19 August, 2020;
originally announced August 2020.
-
A Survey of HTTPS Traffic and Services Identification Approaches
Authors:
Wazen M. Shbair,
Thibault Cholez,
Jerome Francois,
Isabelle Chrisment
Abstract:
HTTPS is quickly rising alongside the need of Internet users to benefit from security and privacy when accessing the Web, and it is becoming the predominant application protocol on the Internet. This migration towards a secure Web using HTTPS comes with important challenges related to the management of HTTPS traffic to guarantee basic network properties such as security, QoS, and reliability. But encryption undermines the effectiveness of standard monitoring techniques and makes it difficult for ISPs and network administrators to properly identify and manage the services behind HTTPS traffic. This survey details the techniques used to monitor HTTPS traffic, from the most basic level of protocol identification (TLS, HTTPS) to the finest identification of precise services. We show that protocol identification is well mastered, while more precise levels remain challenging despite recent advances. We also describe practical solutions that lead us to discuss the trade-off between security and privacy and the research directions to guarantee both of them.
Submitted 19 August, 2020;
originally announced August 2020.
-
Exploratory Data Analysis of a Network Telescope Traffic and Prediction of Port Probing Rates
Authors:
Mehdi Zakroum,
Abdellah Houmz,
Mounir Ghogho,
Ghita Mezzour,
Abdelkader Lahmadi,
Jérôme François,
Mohammed El Koutbi
Abstract:
Understanding the properties exhibited by large-scale network probing traffic would improve cyber threat intelligence. In addition, the prediction of probing rates is a key feature for security practitioners in their endeavors to make better operational decisions and to enhance their defense strategies. In this work, we study different aspects of the traffic captured by a /20 network telescope. First, we perform an exploratory data analysis of the collected probing activities. The investigation includes probing rates at the port level, the services of interest to the top network probers, and the distribution of probing rates by geolocation. Second, we extract the network probers' exploration patterns. We model these behaviors using transition graphs decorated with the probabilities of switching from one port to another. Finally, we assess the capacity of Non-stationary Autoregressive and Vector Autoregressive models to predict port probing rates, as a first step towards using more robust models for better forecasting performance.
Submitted 27 April, 2019; v1 submitted 23 December, 2018;
originally announced December 2018.
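
The transition graphs mentioned in this abstract can be pictured with a minimal sketch: count how often a prober moves from one port to the next, then normalize the counts leaving each port into probabilities. The input format (one list of probed ports per prober, in time order) and the function name are assumptions for illustration, not the authors' pipeline.

```python
from collections import Counter, defaultdict


def transition_probabilities(port_sequences):
    """Estimate port-to-port switching probabilities.

    `port_sequences` is assumed to be an iterable of lists, each holding
    the ports probed by one source in chronological order.
    Returns {src_port: {dst_port: probability}}.
    """
    counts = defaultdict(Counter)
    for seq in port_sequences:
        # Each consecutive pair (src, dst) is one observed transition.
        for src, dst in zip(seq, seq[1:]):
            counts[src][dst] += 1

    graph = {}
    for src, dsts in counts.items():
        total = sum(dsts.values())
        graph[src] = {dst: n / total for dst, n in dsts.items()}
    return graph


if __name__ == "__main__":
    observed = [[22, 23, 80], [22, 80], [22, 23]]
    for src, edges in transition_probabilities(observed).items():
        print(src, edges)
```

Ports that only ever appear at the end of a sequence have no outgoing edges and are therefore absent from the returned dictionary.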
-
Finding undetected protein associations in cell signaling by belief propagation
Authors:
M. Bailly-Bechet,
C. Borgs,
A. Braunstein,
J. Chayes,
A. Dagkessamanskaia,
J.-M. François,
R. Zecchina
Abstract:
External information propagates in the cell mainly through signaling cascades and transcriptional activation, allowing it to react to a wide spectrum of environmental changes. High-throughput experiments identify numerous molecular components of such cascades that may, however, interact through unknown partners. Some of them may be detected using data coming from the integration of a protein-protein interaction network and mRNA expression profiles. This inference problem can be mapped onto the problem of finding appropriate optimal connected subgraphs of a network defined by these datasets. The optimization procedure turns out to be computationally intractable in general. Here we present a new distributed algorithm for this task, inspired by statistical physics, and apply this scheme to alpha factor and drug perturbation data in yeast. We identify the role of the COS8 protein, a member of a gene family of previously unknown function, and validate the results by genetic experiments. The algorithm we present is especially suited for very large datasets, can run in parallel, and can be adapted to other problems in systems biology. On renowned benchmarks it outperforms other algorithms in the field.
Submitted 24 January, 2011;
originally announced January 2011.
-
Predictable Disruption Tolerant Networks and Delivery Guarantees
Authors:
Jean-Marc Francois,
Guy Leduc
Abstract:
This article studies disruption tolerant networks (DTNs) where each node knows the probability distribution of contacts with other nodes. It proposes a framework that allows one to formalize the behaviour of such a network. It generalizes extreme cases that have been studied before, where either (a) nodes only know their contact frequency with each other or (b) they have perfect knowledge of who meets whom and when. This paper then gives an example of how this framework can be used: it shows how one can find a packet forwarding algorithm optimized to meet the 'delay/bandwidth consumption' trade-off. Packets are duplicated enough to (statistically) guarantee a given delay or delivery probability, but no more than necessary, so as to limit bandwidth, energy, and memory consumption.
Submitted 6 December, 2006;
originally announced December 2006.
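
The duplication trade-off described in this abstract can be made concrete with a deliberately simplified sketch. It assumes each copy of a packet is delivered independently with the same probability `p`, which is an assumption made here for illustration and not the paper's contact-distribution framework; under it, `n` copies are delivered with probability `1 - (1 - p)**n`, and the smallest sufficient `n` limits resource use.

```python
def min_copies(p, target, max_copies=1000):
    """Smallest number of copies whose combined delivery probability
    reaches `target`, assuming independent deliveries with probability `p`.

    `max_copies` is a hypothetical safety cap; None is returned if the
    target cannot be reached within it.
    """
    if not (0.0 < p <= 1.0) or not (0.0 < target < 1.0):
        raise ValueError("p must be in (0, 1] and target in (0, 1)")
    miss = 1.0  # probability that every copy sent so far is lost
    for n in range(1, max_copies + 1):
        miss *= 1.0 - p
        if 1.0 - miss >= target:
            return n
    return None


if __name__ == "__main__":
    print(min_copies(0.5, 0.9))
    print(min_copies(0.3, 0.9))
```

Sending more copies than this minimum raises the delivery probability further but spends extra bandwidth, energy, and memory, which is the tension the abstract describes.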