-
Building Your Own Product Copilot: Challenges, Opportunities, and Needs
Authors:
Chris Parnin,
Gustavo Soares,
Rahul Pandita,
Sumit Gulwani,
Jessica Rich,
Austin Z. Henley
Abstract:
A race is underway to embed advanced AI capabilities into products. These product copilots enable users to ask questions in natural language and receive relevant responses that are specific to the user's context. In fact, virtually every large technology company is looking to add these capabilities to their software products. However, for most software engineers, this is often their first encounter with integrating AI-powered technology. Furthermore, software engineering processes and tools have not caught up with the challenges and scale involved with building AI-powered applications. In this work, we present the findings of an interview study with 26 professional software engineers responsible for building product copilots at various companies. From our interviews, we found pain points at every step of the engineering process and the challenges that strained existing development practices. We then conducted group brainstorming sessions to collaborate on opportunities and tool designs for the broader software engineering community.
Submitted 21 December, 2023;
originally announced December 2023.
-
Electrical Vehicle Fleet Routing Accounting for Dynamic Battery Degradation
Authors:
Daniel Gebbran,
Jeppe Rich,
Tomislav Dragicevic
Abstract:
The increasing uptake of electrical vehicles (EVs) has increased the awareness of battery degradation costs and how they can be minimized. However, from a planning perspective it is difficult to integrate battery degradation models into existing route planning models and to assess how policies that aim at reducing battery degradation affect route planning costs and degradation across the fleet. In this paper, a simple transportation vehicle routing problem (VRP) is formulated as a mixed-integer nonlinear problem (MINLP), with a modification that allows monitoring the maximum and minimum depth-of-discharge (DoD) of the entire fleet. This allows us to measure the battery health degradation during the online optimization process. The results show that accounting for the impact of different route characteristics on battery degradation can have an impact on the route planning of the entire fleet as well as the battery degradation for all vehicles. The latter is achieved by forcing vehicles to adapt to certain DoD boundaries in the long term.
Submitted 9 March, 2022;
originally announced March 2022.
-
Large Scale Passenger Detection with Smartphone/Bus Implicit Interaction and Multisensory Unsupervised Cause-effect Learning
Authors:
Valentino Servizi,
Dan R. Persson,
Francisco C. Pereira,
Hannah Villadsen,
Per Bækgaard,
Jeppe Rich,
Otto A. Nielsen
Abstract:
Intelligent Transportation Systems (ITS) underpin the concept of Mobility as a Service (MaaS), which requires universal and seamless users' access across multiple public and private transportation systems while allowing operators' proportional revenue sharing. Current user sensing technologies such as Walk-in/Walk-out (WIWO) and Check-in/Check-out (CICO) have limited scalability for large-scale deployments. These limitations prevent ITS from supporting analysis, optimization, calculation of revenue sharing, and control of MaaS comfort, safety, and efficiency. We focus on the concept of implicit Be-in/Be-out (BIBO) smartphone-sensing and classification.
To close the gap and enhance smartphones towards MaaS, we developed a proprietary smartphone-sensing platform collecting contemporary Bluetooth Low Energy (BLE) signals from BLE devices installed on buses and Global Positioning System (GPS) locations of both buses and smartphones. To enable the training of a model based on GPS features against the BLE pseudo-label, we propose the Cause-Effect Multitask Wasserstein Autoencoder (CEMWA). CEMWA combines and extends several frameworks around Wasserstein autoencoders and neural networks. As a dimensionality reduction tool, CEMWA obtains an auto-validated representation of a latent space describing users' smartphones within the transport system. This representation allows BIBO clustering via DBSCAN.
We perform an ablation study of CEMWA's alternative architectures and benchmark against the best available supervised methods. We analyze performance's sensitivity to label quality. Under the naïve assumption of accurate ground truth, XGBoost outperforms CEMWA. Although XGBoost and Random Forest prove to be tolerant to label noise, CEMWA is agnostic to label noise by design and provides the best performance with an 88% F1 score.
Submitted 24 February, 2022;
originally announced February 2022.
-
Estimating Causal Effects with the Neural Autoregressive Density Estimator
Authors:
Sergio Garrido,
Stanislav S. Borysov,
Jeppe Rich,
Francisco C. Pereira
Abstract:
Estimation of causal effects is fundamental in situations where the underlying system will be subject to active interventions. Part of building a causal inference engine is defining how variables relate to each other, that is, defining the functional relationship between variables given conditional dependencies. In this paper, we deviate from the common assumption of linear relationships in causal models by making use of neural autoregressive density estimators and use them to estimate causal effects within Pearl's do-calculus framework. Using synthetic data, we show that the approach can retrieve causal effects from non-linear systems without explicitly modeling the interactions between the variables.
Submitted 1 March, 2021; v1 submitted 17 August, 2020;
originally announced August 2020.
-
Prediction of rare feature combinations in population synthesis: Application of deep generative modelling
Authors:
Sergio Garrido,
Stanislav S. Borysov,
Francisco C. Pereira,
Jeppe Rich
Abstract:
In population synthesis applications, when considering populations with many attributes, a fundamental problem is the estimation of rare combinations of feature attributes. Unsurprisingly, it is notably more difficult to reliably represent the sparser regions of such multivariate distributions and in particular combinations of attributes which are absent from the original sample. In the literature this is commonly known as sampling zeros for which no systematic solution has been proposed so far. In this paper, two machine learning algorithms, from the family of deep generative models, are proposed for the problem of population synthesis and with particular attention to the problem of sampling zeros. Specifically, we introduce the Wasserstein Generative Adversarial Network (WGAN) and the Variational Autoencoder (VAE), and adapt these algorithms for a large-scale population synthesis application. The models are implemented on a Danish travel survey with a feature-space of more than 60 variables. The models are validated in a cross-validation scheme and a set of new metrics for the evaluation of the sampling-zero problem is proposed. Results show how these models are able to recover sampling zeros while keeping the estimation of truly impossible combinations, the structural zeros, at a comparatively low level. Particularly, for a low dimensional experiment, the VAE, the marginal sampler and the fully random sampler generate 5%, 21% and 26%, respectively, more structural zeros per sampling zero generated by the WGAN, while for a high dimensional case, these figures escalate to 44%, 2217% and 170440%, respectively. This research directly supports the development of agent-based systems and in particular cases where detailed socio-economic or geographical representations are required.
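To make the distinction between sampling zeros and structural zeros concrete, the following is a minimal sketch and not the paper's evaluation metrics. It assumes a toy setting where the full population is known, so a generated attribute combination can be labeled as a recovered sampling zero (present in the population but absent from the training sample) or a structural zero (absent from the population). The function name `classify_generated` and the toy attribute values are hypothetical.

```python
def classify_generated(train, population, generated):
    """Split generated attribute combinations into recovered sampling zeros
    and structural zeros, given a (toy) fully known population."""
    train_set = set(train)
    population_set = set(population)
    generated_set = set(generated)
    # Sampling zeros: valid combinations the training sample happened to miss.
    sampling_zeros = (generated_set & population_set) - train_set
    # Structural zeros: combinations that do not exist in the population at all.
    structural_zeros = generated_set - population_set
    return sampling_zeros, structural_zeros


# Hypothetical toy data: (age group, occupation) pairs.
population = [
    ("child", "student"),
    ("adult", "employed"),
    ("adult", "student"),
    ("senior", "retired"),
]
train = [("child", "student"), ("adult", "employed")]
generated = [("adult", "student"), ("child", "retired"), ("adult", "employed")]

sampling_zeros, structural_zeros = classify_generated(train, population, generated)
print("recovered sampling zeros:", sampling_zeros)
print("generated structural zeros:", structural_zeros)
```

In practice the true population is not observed, which is why the abstract describes a cross-validation scheme; the hold-out data would then play the role that `population` plays in this toy sketch.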
Submitted 17 September, 2019;
originally announced September 2019.
-
Introducing Super Pseudo Panels: Application to Transport Preference Dynamics
Authors:
Stanislav S. Borysov,
Jeppe Rich
Abstract:
We propose a new approach for constructing synthetic pseudo-panel data from cross-sectional data. The pseudo panel and the preferences it intends to describe are constructed at the individual level and are not affected by aggregation bias across cohorts. This is accomplished by creating a high-dimensional probabilistic model representation of the entire data set, which allows sampling from the probabilistic model in such a way that all of the intrinsic correlation properties of the original data are preserved. The key to this is the use of deep learning algorithms based on the Conditional Variational Autoencoder (CVAE) framework. From a modelling perspective, the concept of a model-based resampling creates a number of opportunities in that data can be organized and constructed to serve very specific needs of which the forming of heterogeneous pseudo panels represents one. The advantage, in that respect, is the ability to trade a serious aggregation bias (when aggregating into cohorts) for an unsystematic noise disturbance. Moreover, the approach makes it possible to explore high-dimensional sparse preference distributions and their linkage to individual specific characteristics, which is not possible if applying traditional pseudo-panel methods. We use the presented approach to reveal the dynamics of transport preferences for a fixed pseudo panel of individuals based on a large Danish cross-sectional data set covering the period from 2006 to 2016. The model is also utilized to classify individuals into 'slow' and 'fast' movers with respect to the speed at which their preferences change over time. It is found that the prototypical fast mover is a young woman who lives as a single in a large city whereas the typical slow mover is a middle-aged man with high income from a nuclear family who lives in a detached house outside a city.
Submitted 1 March, 2019;
originally announced March 2019.
-
Scalable Population Synthesis with Deep Generative Modeling
Authors:
Stanislav S. Borysov,
Jeppe Rich,
Francisco C. Pereira
Abstract:
Population synthesis is concerned with the generation of synthetic yet realistic representations of populations. It is a fundamental problem in the modeling of transport where the synthetic populations of micro-agents represent a key input to most agent-based models. In this paper, a new methodological framework for how to 'grow' pools of micro-agents is presented. The model framework adopts a deep generative modeling approach from machine learning based on a Variational Autoencoder (VAE). Compared to the previous population synthesis approaches, including Iterative Proportional Fitting (IPF), Gibbs sampling and traditional generative models such as Bayesian Networks or Hidden Markov Models, the proposed method allows fitting the full joint distribution for high dimensions. The proposed methodology is compared with a conventional Gibbs sampler and a Bayesian Network by using a large-scale Danish trip diary. It is shown that, while these two methods outperform the VAE in the low-dimensional case, they both suffer from scalability issues when the number of modeled attributes increases. It is also shown that the Gibbs sampler essentially replicates the agents from the original sample when the required conditional distributions are estimated as frequency tables. In contrast, the VAE allows addressing the problem of sampling zeros by generating agents that are virtually different from those in the original data but have similar statistical properties. The presented approach can support agent-based modeling at all levels by enabling richer synthetic populations with smaller zones and more detailed individual characteristics.
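As an illustration of one of the earlier approaches named in the abstract, here is a minimal sketch of two-dimensional Iterative Proportional Fitting (IPF) in plain Python. It is not the paper's method or code; the function name `ipf`, the seed table, and the target margins are hypothetical. The sketch also shows why the sampling-zero problem arises for IPF: because each step only rescales cells, a combination with a zero count in the seed sample stays at zero.

```python
def ipf(seed, row_targets, col_targets, iterations=200):
    """Rescale a 2-D seed table so its margins match the target margins."""
    table = [[float(value) for value in row] for row in seed]
    for _ in range(iterations):
        # Scale each row to match its target total.
        for i, target in enumerate(row_targets):
            total = sum(table[i])
            table[i] = [value * target / total for value in table[i]]
        # Scale each column to match its target total.
        for j, target in enumerate(col_targets):
            total = sum(row[j] for row in table)
            for row in table:
                row[j] *= target / total
    return table


# Hypothetical seed sample with one unobserved combination (the 0 cell).
seed = [[10, 0],
        [5, 5]]
row_targets = [60, 40]  # known totals for the first attribute
col_targets = [70, 30]  # known totals for the second attribute

fitted = ipf(seed, row_targets, col_targets)
for row in fitted:
    print([round(value, 3) for value in row])
```

The fitted table reproduces both sets of margins, but the cell that was empty in the seed remains empty, whatever the targets are.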
Submitted 1 May, 2019; v1 submitted 21 August, 2018;
originally announced August 2018.
-
Data science for urban equity: Making gentrification an accessible topic for data scientists, policymakers, and the community
Authors:
Bernease Herman,
Gundula Proksch,
Rachel Berney,
Hillary Dawkins,
Jacob Kovacs,
Yahui Ma,
Jacob Rich,
Amanda Tan
Abstract:
The University of Washington eScience Institute runs an annual Data Science for Social Good (DSSG) program that selects four projects each year to train students from a wide range of disciplines while helping community members execute social good projects, often with an urban focus.
We present observations and deliberations of one such project, the DSSG 2017 'Equitable Futures' project, which investigates the ongoing gentrification process and the increasingly inequitable access to opportunities in Seattle. Similar processes can be observed in many major cities. The project connects issues usually analyzed in the disciplines of the built environment, geography, sociology, economics, social work and city governments with data science methodologies and visualizations.
Submitted 6 October, 2017;
originally announced October 2017.
-
Towards Bottom-Up Analysis of Social Food
Authors:
Jaclyn Rich,
Hamed Haddadi,
Timothy M. Hospedales
Abstract:
Social media provide a wealth of information for research into public health by providing a rich mix of personal data, location, hashtags, and social network information. Among these, Instagram has recently been the subject of many computational social science studies. However, despite Instagram's focus on image sharing, most studies have exclusively focused on the hashtag and social network structure. In this paper we perform the first large scale content analysis of Instagram posts, addressing both the image and the associated hashtags, aiming to understand the content of partially-labelled images taken in-the-wild and the relationship with hashtags that individuals use as noisy labels. In particular, we explore the possibility of learning to recognise food image content in a data driven way, discovering both the categories of food, and how to recognise them, purely from social network data. Notably, we demonstrate that our approach to food recognition can often achieve accuracies greater than 70% in recognising popular food-related image categories, despite using no manual annotation. We highlight the current capabilities and future challenges and opportunities for such data-driven analysis of image content and the relation to hashtags.
Submitted 14 March, 2016;
originally announced March 2016.