-
Challenges for Predictive Modeling with Neural Network Techniques using Error-Prone Dietary Intake Data
Authors:
Dylan Spicker,
Amir Nazemi,
Joy Hutchinson,
Paul Fieguth,
Sharon I. Kirkpatrick,
Michael Wallace,
Kevin W. Dodd
Abstract:
Dietary intake data are routinely drawn upon to explore diet-health relationships. However, these data are often subject to measurement error, distorting the true relationships. Beyond measurement error, there are likely complex synergistic and sometimes antagonistic interactions between different dietary components, complicating the relationships between diet and health outcomes. Flexible models are required to capture the nuance that these complex interactions introduce. This complexity makes research on diet-health relationships an appealing candidate for the application of machine learning techniques, and in particular, neural networks. Neural networks are computational models that are able to capture highly complex, nonlinear relationships so long as sufficient data are available. While these models have been applied in many domains, the impacts of measurement error on the performance of predictive modeling have not been systematically investigated. However, dietary intake data are typically collected using self-report methods and are prone to large amounts of measurement error. In this work, we demonstrate the ways in which measurement error erodes the performance of neural networks, and illustrate the care that is required for leveraging these models in the presence of error. We demonstrate the role that sample size and replicate measurements play in model performance, indicate a motivation for the investigation of transformations to additivity, and illustrate the caution required to prevent model overfitting. While the past performance of neural networks across various domains makes them an attractive candidate for examining diet-health relationships, our work demonstrates that substantial care and further methodological development are both required to observe increased predictive performance when applying these techniques, compared to more traditional statistical procedures.
Submitted 15 November, 2023;
originally announced November 2023.
-
NutritionVerse: Empirical Study of Various Dietary Intake Estimation Approaches
Authors:
Chi-en Amy Tai,
Matthew Keller,
Saeejith Nair,
Yuhao Chen,
Yifan Wu,
Olivia Markham,
Krish Parmar,
Pengcheng Xi,
Heather Keller,
Sharon Kirkpatrick,
Alexander Wong
Abstract:
Accurate dietary intake estimation is critical for informing policies and programs to support healthy eating, as malnutrition has been directly linked to decreased quality of life. However, self-reporting methods such as food diaries suffer from substantial bias. Other conventional dietary assessment techniques and emerging alternative approaches such as mobile applications incur high time costs and may necessitate trained personnel. Recent work has focused on using computer vision and machine learning to automatically estimate dietary intake from food images, but the lack of comprehensive datasets with diverse viewpoints, modalities, and food annotations hinders the accuracy and realism of such methods. To address this limitation, we introduce NutritionVerse-Synth, the first large-scale dataset of 84,984 photorealistic synthetic 2D food images with associated dietary information and multimodal annotations (including depth images, instance masks, and semantic masks). Additionally, we collect a real image dataset, NutritionVerse-Real, containing 889 images of 251 dishes to evaluate realism. Leveraging these novel datasets, we develop and benchmark NutritionVerse, an empirical study of various dietary intake estimation approaches, including indirect segmentation-based and direct prediction networks. We further fine-tune models pretrained on synthetic data with real images to provide insights into the fusion of synthetic and real data. Finally, we release both datasets (NutritionVerse-Synth, NutritionVerse-Real) on https://www.kaggle.com/nutritionverse/datasets as part of an open initiative to accelerate machine learning for dietary sensing.
Submitted 1 September, 2024; v1 submitted 14 September, 2023;
originally announced September 2023.
-
Hard Optimization Problems have Soft Edges
Authors:
Raffaele Marino,
Scott Kirkpatrick
Abstract:
Finding a Maximum Clique is a classic property test from graph theory: find any one of the largest complete subgraphs in an Erdős-Rényi G(N, p) random graph. We use Maximum Clique to explore the structure of the problem as a function of N, the graph size, and K, the clique size sought. It displays a complex phase boundary, a staircase of steps at each of which $2\log_2 N$ and $K_{\max}$, the maximum size of a clique that can be found, increase by 1. Each of its boundaries has a finite width, and these widths allow local algorithms to find cliques beyond the limits defined by the study of infinite systems. We explore the performance of a number of extensions of traditional fast local algorithms, and find that much of the "hard" space remains accessible at finite N. The "hidden clique" problem embeds a clique somewhat larger than those which occur naturally in a G(N, p) random graph. Since such a clique is unique, we find that local searches which stop early, once evidence for the hidden clique is found, may outperform the best message passing or spectral algorithms.
Submitted 25 May, 2023; v1 submitted 11 September, 2022;
originally announced September 2022.
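The "fast local algorithms" mentioned in this abstract are not specified in the listing. As a rough illustration only, the simplest member of that family, a plain greedy clique builder on a G(N, p) random graph, might look like the sketch below. The function names `random_graph` and `greedy_clique` are hypothetical, and this is not the authors' extended method.

```python
import random


def random_graph(n, p, seed=0):
    """Build an Erdős-Rényi G(n, p) graph as an adjacency dict (illustrative helper)."""
    rng = random.Random(seed)
    adj = {v: set() for v in range(n)}
    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < p:
                adj[u].add(v)
                adj[v].add(u)
    return adj


def greedy_clique(adj, order=None):
    """Scan vertices in the given order, keeping each one adjacent to all kept so far."""
    clique = []
    candidates = set(adj)  # vertices adjacent to every vertex already in the clique
    for v in (order if order is not None else sorted(adj)):
        if v in candidates:
            clique.append(v)
            candidates &= adj[v]  # v itself drops out because v is not in adj[v]
    return clique


def is_clique(adj, vertices):
    """Check that every pair of the given vertices is connected."""
    return all(v in adj[u] for i, u in enumerate(vertices) for v in vertices[i + 1:])


# Small deterministic example: a triangle 0-1-2 with a pendant vertex 3 on vertex 0.
toy = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1}, 3: {0}}
print(greedy_clique(toy))  # [0, 1, 2]
```

A single greedy pass of this kind returns a maximal clique, not necessarily a maximum one. In G(N, 1/2) it is known to reach typically about log2 N vertices, roughly half of the 2 log2 N scale quoted in the abstract, which is the gap that more elaborate local searches try to close.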
-
Large independent sets on random $d$-regular graphs with fixed degree $d$
Authors:
Raffaele Marino,
Scott Kirkpatrick
Abstract:
This paper presents a linear prioritized local algorithm that computes large independent sets on a random $d$-regular graph with small and fixed degree $d$. We studied experimentally the independence ratio obtained by the algorithm when $ d \in [3,100]$. For all $d \in [5,100]$, our results are larger than lower bounds calculated by exact methods, thus providing improved estimates of lower bounds.
Submitted 17 August, 2021; v1 submitted 27 March, 2020;
originally announced March 2020.
-
From Megabits to CPU Ticks: Enriching a Demand Trace in the Age of MEC
Authors:
Francesco Malandrino,
Carla Fabiana Chiasserini,
Giuseppe Avino,
Marco Malinverno,
Scott Kirkpatrick
Abstract:
All the content consumed by mobile users, be it a web page or a live stream, undergoes some processing along the way; as an example, web pages and videos are transcoded to fit each device's screen. The recent multi-access edge computing (MEC) paradigm envisions performing such processing within the cellular network, as opposed to resorting to a cloud server on the Internet. Designing a MEC network, i.e., placing and dimensioning the computational facilities therein, requires information on how much computational power is required to produce the contents needed by the users. However, real-world demand traces only contain information on how much data is downloaded. In this paper, we demonstrate how to enrich demand traces with information about the computational power needed to process the different types of content, and we show the substantial benefit that can be obtained from using such enriched traces for the design of MEC-based networks.
Submitted 23 September, 2018;
originally announced September 2018.
-
Revisiting the Challenges of MaxClique
Authors:
Raffaele Marino,
Scott Kirkpatrick
Abstract:
The MaxClique problem, finding the largest complete subgraph in an Erdős-Rényi $G(N,p)$ random graph in the large $N$ limit, is a well-known example of a simple problem for which finding any approximate solution within a factor of $2$ of the known, probabilistically determined limit, appears to require P = NP. This type of search has practical importance in very large graphs. Algorithmic approaches run into phase boundaries long before they reach the size of the largest likely solutions. And, most intriguing, there is an extensive literature of challenges posed for concrete methods of finding maximum naturally occurring as well as artificially hidden cliques, with computational costs that are at most polynomial in the size of the problem.
We use the probabilistic approach in a novel way to provide a more insightful test of constructive algorithms for this problem. We show that extensions of existing methods of greedy local search will be able to meet the challenges for practical problems of size $N$ as large as $10^{10}$ and perhaps more. Experiments with spectral methods that treat a single large clique of size $\alpha N^{1/2}$ planted in the graph as an impurity level in a tight binding energy band show that such a clique can be detected when $\alpha \gtrsim 1.0$. Belief propagation using a recent approximate message passing (AMP) scheme of inference pushes this limit down to $\alpha \sim \sqrt{1/e}$. Exhaustive local search (with early stopping when the planted clique is found) does even better on problems of practical size, and proves to be the fastest solution method for this problem.
Submitted 8 May, 2019; v1 submitted 24 July, 2018;
originally announced July 2018.
-
Mining the Air -- for Research in Social Science and Networking Measurement
Authors:
Scott Kirkpatrick,
Ron Bekkerman,
Adi Zmirli,
Francesco Malandrino
Abstract:
Smartphone apps provide a vitally important opportunity for monitoring human mobility, human experience of ubiquitous information aids, and human activity in our increasingly well-instrumented spaces. As wireless data capabilities move steadily up in performance, from 2G and 3G to 4G (today's LTE) and 5G, it has become more important to measure human activity in this connected world from the phones themselves. The newer protocols serve larger areas than ever before and a wider range of data, not just voice calls, so only the phone can accurately measure its location. Access to the application activity not only permits monitoring the performance and spatial coverage with which the users are served but also, as a crowd-sourced, unbiased background source of input on all these subjects, becomes a uniquely valuable resource for social science and government as well as telecom providers.
Submitted 19 June, 2018;
originally announced June 2018.
-
Cellular Network Traces Towards 5G: Usage, Analysis and Generation
Authors:
Francesco Malandrino,
Carla-Fabiana Chiasserini,
Scott Kirkpatrick
Abstract:
Deployment and demand traces are a crucial tool to study today's LTE systems, as well as their evolution toward 5G. In this paper, we use a set of real-world, crowdsourced traces, coming from the WeFi and OpenSignal apps, to investigate how present-day networks are deployed, and the load they serve. Given this information, we present a way to generate synthetic deployment and demand profiles, retaining the same features as their real-world counterparts. We further discuss a methodology using traces (both real-world and synthetic) to assess (i) to what extent the current deployment is adequate for the current and future demand, and (ii) the effectiveness of the existing strategies to improve network capacity. Applying our methodology to real-world traces, we find that present-day LTE deployments consist of multiple, entangled, medium- to large-sized cells. Furthermore, although today's LTE networks are overprovisioned when compared to the present traffic demand, they will need substantial capacity improvements in order to face the load increase forecast between now and 2020.
Submitted 14 April, 2018;
originally announced April 2018.
-
How Close to the Edge? Delay/utilization tradeoffs in MEC
Authors:
Francesco Malandrino,
Scott Kirkpatrick,
Carla-Fabiana Chiasserini
Abstract:
Virtually all of the rapidly increasing data traffic consumed by mobile users requires some kind of processing, normally performed at cloud servers. A recent thrust, mobile edge computing, moves such processing to servers within the cellular mobile network. The large temporal and spatial variations to which mobile data usage is subject could make the reduced latency that edge clouds offer come at an unacceptable cost in redundant and underutilized infrastructure. We present some first empirical results on this question, based on large scale sampled crowd-sourced traces from several major cities spanning multiple operators and identifying the applications in use. We find opportunities to obtain both high server utilization and low application latency, but the best approaches will depend on the individual network operator's deployment strategy and geographic specifics of the cities we study.
Submitted 25 November, 2016;
originally announced November 2016.
-
The Impact of Vehicular Traffic Demand on 5G Caching Architectures: a Data-Driven Study
Authors:
Francesco Malandrino,
Carla-Fabiana Chiasserini,
Scott Kirkpatrick
Abstract:
The emergence of in-vehicle entertainment systems and self-driving vehicles, and the latter's need for high-resolution, up-to-date maps, will bring a further increase in the amount of data vehicles consume. Considering how difficult WiFi offloading in vehicular environments is, the bulk of this additional load will be served by cellular networks. Cellular networks, in turn, will resort to caching at the network edge in order to reduce the strain on their core network, an approach also known as mobile edge computing, or fog computing. In this work, we exploit a real-world, large-scale trace coming from the users of the We-Fi app in order to (i) understand how significant the contribution of vehicular users is to the global traffic demand; (ii) compare the performance of different caching architectures; and (iii) study how such performance is influenced by recommendation systems and content locality. We express the price of fog computing through a metric called price-of-fog, accounting for the extra caches to deploy compared to a traditional, centralized approach. We find that fog computing allows a very significant reduction of the load on the core network, and the price thereof is low in all cases and becomes negligible if content demand is location specific. We can therefore conclude that vehicular networks make an excellent case for the transition to mobile-edge caching: thanks to the peculiar features of vehicular demand, we can obtain all the benefits of fog computing, including a reduction of the load on the core network, while reducing the disadvantages to a minimum.
Submitted 24 November, 2016;
originally announced November 2016.
-
What is LTE actually used for? An answer through multi-operator, crowd-sourced measurement
Authors:
Francesco Malandrino,
Scott Kirkpatrick,
Danny Bickson
Abstract:
LTE networks are commonplace nowadays; however, comparatively little is known about where (and why) they are deployed, and the demand they serve. We shed some light on these issues through large-scale, crowd-sourced measurement. Our data, collected by users of the WeFi app, spans multiple operators and multiple cities, allowing us to observe a wide variety of deployment patterns. Surprisingly, we find that LTE is frequently used to improve the coverage of the network rather than the capacity thereof, and that there is no evidence that video traffic is a primary driver for its deployment. Our insights suggest that such factors as pre-existing networks and commercial policies have a deeper impact on deployment decisions than purely technical considerations.
Submitted 23 November, 2016;
originally announced November 2016.
-
The Price of Fog: a Data-Driven Study on Caching Architectures in Vehicular Networks
Authors:
Francesco Malandrino,
Carla-Fabiana Chiasserini,
Scott Kirkpatrick
Abstract:
Vehicular users are expected to consume large amounts of data, for both entertainment and navigation purposes. This will put a strain on cellular networks, which will be able to cope with such a load only if proper caching is in place; this in turn raises the question of which caching architecture is best suited to deal with vehicular content consumption. In this paper, we leverage a large-scale, crowd-collected trace to (i) characterize the vehicular traffic demand, in terms of overall magnitude and content breakup, (ii) assess how different caching approaches perform against such a real-world load, and (iii) study the effect of recommendation systems and local contents. We define a price-of-fog metric, expressing the additional caching capacity to deploy when moving from traditional, centralized caching architectures to a "fog computing" approach, where caches are closer to the network edge. We find that for location-specific contents, such as the ones that vehicular users are most likely to request, such a price almost disappears. Vehicular networks thus make a strong case for the adoption of mobile-edge caching, as we are able to reap the benefit thereof -- including a reduction in the distance traveled by data within the core network -- with little or none of the associated disadvantages.
Submitted 20 May, 2016;
originally announced May 2016.
-
Social Networks and Spin Glasses
Authors:
Scott Kirkpatrick,
Alex Kulakovsky,
Manuel Cebrian,
Alex Pentland
Abstract:
The networks formed from the links between telephones observed in a month's call detail records (CDRs) in the UK are analyzed, looking for the characteristics thought to identify a communications network or a social network. Some novel methods are employed. We find similarities to both types of network. We conclude that, just as analogies to spin glasses have proved fruitful for optimization of large scale practical problems, there will be opportunities to exploit a statistical mechanics of the formation and dynamics of social networks in today's electronically connected world.
Submitted 31 October, 2011; v1 submitted 7 August, 2010;
originally announced August 2010.
-
New Model of Internet Topology Using k-shell Decomposition
Authors:
Shai Carmi,
Shlomo Havlin,
Scott Kirkpatrick,
Yuval Shavitt,
Eran Shir
Abstract:
We introduce and use k-shell decomposition to investigate the topology of the Internet at the AS level. Our analysis separates the Internet into three sub-components: (a) a nucleus, which is a small (~100 nodes), very well connected, globally distributed subgraph; (b) a fractal sub-component that is able to connect the bulk of the Internet without congesting the nucleus, with self-similar properties and critical exponents; and (c) dendrite-like structures, usually isolated nodes that are connected to the rest of the network through the nucleus only. This unique decomposition is robust, and provides insight into the underlying structure of the Internet and its functional consequences. Our approach is general and is also useful when studying other complex networks.
Submitted 17 July, 2006;
originally announced July 2006.
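For readers unfamiliar with the k-shell decomposition named in this abstract, a minimal sketch of the standard pruning procedure follows: repeatedly remove nodes of degree at most k, assign them shell index k, and raise k when no such nodes remain. The function name `k_shell_decomposition` and the edge-list input format are assumptions made for illustration; the listing does not show the authors' implementation.

```python
from collections import defaultdict


def k_shell_decomposition(edges):
    """Return {node: shell index} for an undirected graph given as (u, v) pairs."""
    remaining = defaultdict(set)  # node -> neighbours that have not been pruned yet
    for u, v in edges:
        if u != v:  # ignore self-loops
            remaining[u].add(v)
            remaining[v].add(u)
    remaining = dict(remaining)

    shell = {}
    k = 0
    while remaining:
        # Seed the pruning with every node whose current degree is at most k.
        to_remove = [n for n, nbrs in remaining.items() if len(nbrs) <= k]
        if not to_remove:
            k += 1  # nothing left at this level, so move to the next shell
            continue
        while to_remove:
            n = to_remove.pop()
            if n not in remaining:
                continue  # already pruned via another neighbour
            shell[n] = k
            for m in remaining.pop(n):
                remaining[m].discard(n)
                if len(remaining[m]) <= k:
                    to_remove.append(m)  # pruning n pulled m down into shell k
    return shell


# A triangle a-b-c with a pendant node d attached to a.
print(k_shell_decomposition([("a", "b"), ("b", "c"), ("a", "c"), ("a", "d")]))
```

In this toy graph the pendant node falls in shell 1 and the triangle in shell 2. In the terminology of the abstract, the nucleus would correspond to the innermost (highest-index) shell, though how the authors separate the fractal and dendrite components is not described in this listing.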
-
Selfish vs. Unselfish Optimization of Network Creation
Authors:
Johannes J. Schneider,
Scott Kirkpatrick
Abstract:
We investigate several variants of a network creation model: a group of agents builds up a network between them while trying to keep the costs of this network small. The cost function consists of two addends, namely (i) a constant amount for each edge an agent buys and (ii) the minimum number of hops it takes to send messages to other agents. Despite the simplicity of this model, various complex network structures emerge depending on the weight between the two addends of the cost function and on the selfish or unselfish behaviour of the agents.
Submitted 3 August, 2005;
originally announced August 2005.
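The two-part cost function described in this abstract can be made concrete with a small sketch. It assumes that the hop term is the sum of shortest-path hop counts from an agent to every other agent, and that a single weight `alpha` prices each purchased edge; both are illustrative assumptions, as are the names `hop_distances` and `agent_cost`.

```python
from collections import deque


def hop_distances(adj, source):
    """Breadth-first search: minimum hop count from source to each reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def agent_cost(agent, bought, alpha):
    """Cost of one agent: alpha per edge it bought plus total hops to all other agents.

    bought maps each agent to the set of agents it paid for an edge to;
    edges are usable in both directions regardless of who paid (an assumption).
    """
    adj = {a: set() for a in bought}
    for a, targets in bought.items():
        for b in targets:
            adj.setdefault(b, set()).add(a)
            adj[a].add(b)
    dist = hop_distances(adj, agent)
    if len(dist) < len(adj):
        return float("inf")  # some agent is unreachable
    return alpha * len(bought.get(agent, ())) + sum(dist.values())


# Star network: the centre c buys all three edges.
star = {"c": {"x", "y", "z"}, "x": set(), "y": set(), "z": set()}
print(agent_cost("c", star, alpha=2))  # 9
print(agent_cost("x", star, alpha=2))  # 5
```

In the star example the centre pays 2 * 3 for its edges plus 3 hops, while a leaf pays nothing for edges but 1 + 2 + 2 hops. A selfish agent would minimise its own `agent_cost`, whereas an unselfish variant would presumably minimise the sum over all agents.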