-
Quantized symbolic time series approximation
Authors:
Erin Carson,
Xinye Chen,
Cheng Kang
Abstract:
Time series are ubiquitous in numerous science and engineering domains, e.g., signal processing, bioinformatics, and astronomy. Previous work has verified the efficacy of symbolic time series representation in a variety of engineering applications due to its storage efficiency and numerosity reduction. The most recent symbolic aggregate approximation technique, ABBA, has been shown to preserve essential shape information of time series and improve downstream applications, e.g., neural network inference regarding prediction and anomaly detection in time series.
Motivated by the emergence of high-performance hardware which enables efficient computation for low bit-width representations, we present a new quantization-based ABBA symbolic approximation technique, QABBA, which exhibits improved storage efficiency while retaining the original speed and accuracy of symbolic reconstruction. We prove an upper bound for the error arising from quantization and discuss how the number of bits should be chosen to balance this with other errors.
An application of QABBA with large language models (LLMs) for time series regression is also presented, and its utility is investigated. By representing the symbolic chain of patterns on time series, QABBA not only avoids training embeddings from scratch, but also achieves a new state-of-the-art on the Monash regression dataset. The symbolic approximation to the time series offers a more efficient way to fine-tune LLMs on the time series regression task, which spans various application domains. We further present a set of extensive experiments performed across various well-established datasets to demonstrate the advantages of the QABBA method for symbolic approximation.
Submitted 9 April, 2025; v1 submitted 20 November, 2024;
originally announced November 2024.
-
Association between built environment characteristics and school run traffic congestion in Beijing, China
Authors:
Chaogui Kang,
Xiaxin Wu,
Jialei Shi,
Chao Yang
Abstract:
School-escorted trips are a significant contributor to traffic congestion. Existing studies mainly compare road traffic during student pick-up/drop-off hours with off-peak times, often overlooking the fact that school-run traffic congestion is unevenly distributed across areas with different built environment characteristics. We examine the relationship between the built environment and school-run traffic congestion, using Beijing, China, as a case study. First, we use multi-source geospatial data to assess the built environment characteristics around schools across five dimensions: spatial concentration, transportation infrastructure, street topology, spatial richness, and scenescapes. Second, employing a generalized ordered logit model, we analyze how traffic congestion around schools varies during peak hours on school days, regular non-school days, and national college entrance exam days. Lastly, we identify the built environment factors contributing to school-run traffic congestion through multivariable linear regression and Shapley value explanations. Our findings reveal that: (1) School runs significantly exacerbate traffic congestion around schools, reducing the likelihood of free-flow by 8.34% during school run times; (2) School-run traffic congestion is more severe in areas with multiple schools, bus stops, and scenescapes related to business and financial functions. These insights can inform the planning of new schools and urban upgrade strategies aimed at reducing traffic congestion.
Submitted 18 November, 2024;
originally announced November 2024.
-
Data-driven Power Flow Linearization: Theory
Authors:
Mengshuo Jia,
Gabriela Hug,
Ning Zhang,
Zhaojian Wang,
Yi Wang,
Chongqing Kang
Abstract:
This two-part tutorial dives into the field of data-driven power flow linearization (DPFL), a domain gaining increased attention. DPFL stands out for its higher approximation accuracy, wide adaptability, and better ability to implicitly incorporate the latest system attributes. This renders DPFL a potentially superior option for managing the significant fluctuations from renewable energy sources, a step towards realizing a more sustainable energy future, by translating the higher model accuracy into increased economic efficiency and lower energy losses. To conduct a deep and rigorous reexamination, this tutorial first classifies existing DPFL methods into DPFL training algorithms and supportive techniques. Their mathematical models, analytical solutions, capabilities, limitations, and generalizability are systematically examined, discussed, and summarized. In addition, this tutorial reviews existing DPFL experiments, examining the settings of test systems, the fidelity of datasets, and the comparison made among a limited number of DPFL methods. Further, this tutorial implements extensive numerical comparisons of all existing DPFL methods (40 methods in total) and four classic physics-driven approaches, focusing on their generalizability, applicability, accuracy, and computational efficiency. Through these simulations, this tutorial aims to reveal the actual performance of all the methods (including the performances exposed to data noise or outliers), guiding the selection of appropriate linearization methods. Furthermore, this tutorial discusses future directions based on the theoretical and numerical insights gained. As the first part, this paper reexamines DPFL theories, covering all the training algorithms and supportive techniques. Capabilities, limitations, and aspects of generalizability, which were previously unmentioned in the literature, have been identified.
Submitted 10 June, 2024;
originally announced July 2024.
-
Data-driven Power Flow Linearization: Simulation
Authors:
Mengshuo Jia,
Gabriela Hug,
Ning Zhang,
Zhaojian Wang,
Yi Wang,
Chongqing Kang
Abstract:
Building on the theoretical insights of Part I, this paper, as the second part of the tutorial, dives deeper into data-driven power flow linearization (DPFL), focusing on comprehensive numerical testing. The necessity of these simulations stems from the theoretical analysis's inherent limitations, particularly the challenge of identifying the differences in real-world performance among DPFL methods with overlapping theoretical capabilities and/or limitations. The absence of a comprehensive numerical comparison of DPFL approaches in the literature also motivates this paper, especially given the fact that over 95% of existing DPFL studies have not provided any open-source code. To bridge the gap, this paper first reviews existing DPFL experiments, examining the adopted test scenarios, load fluctuation settings, data sources, considerations for data noise/outliers, and the comparison made so far. Subsequently, this paper evaluates a total of 44 methods, containing over 30 existing DPFL approaches, some innovative DPFL techniques, and several classic physics-driven power flow linearization methods for benchmarking. The evaluation spans various dimensions, including generalizability, applicability, accuracy, and computational efficiency, using test cases scaling from 9-bus to 1354-bus systems. The numerical analysis identifies and examines significant trends and consistent findings across all methods under various test cases. It also offers theoretical insights into phenomena like under-performance, failure, excessive computation times, etc. Overall, this paper identifies the differences in the performances of the wide range of DPFL methods, reveals gaps not evident from theoretical discussions, assists in method selection for real-world applications, and provides thorough discussions on open questions within DPFL research, indicating ten potential future directions.
Submitted 10 June, 2024;
originally announced June 2024.
-
The impact of complexity in the built environment on vehicular routing behavior: Insights from an empirical study of taxi mobility in Beijing, China
Authors:
Chaogui Kang,
Zheren Liu
Abstract:
The modeling of disaggregated vehicular mobility and its associations with the ambient urban built environment is essential for developing operative transport intervention and urban optimization plans. However, established vehicular route choice models failed to fully consider the bounded behavioral rationality and the complex characteristics of the urban built environment affecting drivers' route choice preference. Therefore, the spatio-temporal characteristics of vehicular mobility patterns were not fully explained, which limited the granular implementation of relevant transport interventions. To address this limitation, we proposed a vehicular route choice model that mimics the anchoring effect and the exposure preference while driving. The proposed model enables us to quantitatively examine the impact of the built environment on vehicular routing behavior, which has been largely neglected in previous studies. Results show that the proposed model performs 12% better than the conventional vehicular route choice model based on the shortest path principle. Our empirical analysis of taxi drivers' routing behavior patterns in Beijing, China uncovers that drivers are inclined to choose routes with shorter time duration and with less loss at traversal intersections. Counterintuitively, we also found that drivers heavily rely on circuitous ring roads and expressways to deliver passengers, which are unexpectedly longer than the shortest paths. Moreover, characteristics of the urban built environment including road eccentricity, centrality, average road length, land use diversity, sky visibility, and building coverage can affect drivers' route choice behaviors, accounting for about 5% of the increase in the proposed model's performance. We also refine the above explorations according to the modeling results of trips that differ in departure time, travel distance, and occupation status.
Submitted 12 October, 2024; v1 submitted 23 April, 2024;
originally announced April 2024.
-
ViewFool: Evaluating the Robustness of Visual Recognition to Adversarial Viewpoints
Authors:
Yinpeng Dong,
Shouwei Ruan,
Hang Su,
Caixin Kang,
Xingxing Wei,
Jun Zhu
Abstract:
Recent studies have demonstrated that visual recognition models lack robustness to distribution shift. However, current work mainly considers model robustness to 2D image transformations, leaving viewpoint changes in the 3D world less explored. In general, viewpoint changes are prevalent in various real-world applications (e.g., autonomous driving), making it imperative to evaluate viewpoint robustness. In this paper, we propose a novel method called ViewFool to find adversarial viewpoints that mislead visual recognition models. By encoding real-world objects as neural radiance fields (NeRF), ViewFool characterizes a distribution of diverse adversarial viewpoints under an entropic regularizer, which helps to handle the fluctuations of the real camera pose and mitigate the reality gap between the real objects and their neural representations. Experiments validate that the common image classifiers are extremely vulnerable to the generated adversarial viewpoints, which also exhibit high cross-model transferability. Based on ViewFool, we introduce ImageNet-V, a new out-of-distribution dataset for benchmarking viewpoint robustness of image classifiers. Evaluation results on 40 classifiers with diverse architectures, objective functions, and data augmentations reveal a significant drop in model performance when tested on ImageNet-V, which provides a possibility to leverage ViewFool as an effective data augmentation strategy to improve viewpoint robustness.
Submitted 7 October, 2022;
originally announced October 2022.
-
Estimating Demand Flexibility Using Siamese LSTM Neural Networks
Authors:
Guangchun Ruan,
Daniel S. Kirschen,
Haiwang Zhong,
Qing Xia,
Chongqing Kang
Abstract:
There is an opportunity in modern power systems to explore demand flexibility by incentivizing consumers with dynamic prices. In this paper, we quantify demand flexibility using an efficient tool called time-varying elasticity, whose value may change depending on the prices and decision dynamics. This tool is particularly useful for evaluating the demand response potential and system reliability. Recent empirical evidence has highlighted some abnormal features when studying demand flexibility, such as delayed responses and vanishing elasticities after price spikes. Existing methods fail to capture these complicated features because they heavily rely on some predefined (often over-simplified) regression expressions. Instead, this paper proposes a model-free methodology to automatically and accurately derive the optimal estimation pattern. We further develop a two-stage estimation process with Siamese long short-term memory (LSTM) networks. Here, one LSTM network encodes the price response, while the other network estimates the time-varying elasticities. In the case study, the proposed framework and models are validated to achieve higher overall estimation accuracy and a better description of various abnormal features when compared with the state-of-the-art methods.
Submitted 2 September, 2021;
originally announced September 2021.
-
Transportation Interventions Reshaping NYC Commute: the Probabilistic Simulation Framework Assessing the Impacts of Ridesharing and Manhattan Congestion Surcharge
Authors:
Devashish Khulbe,
Chaogui Kang,
Stanislav Sobolevsky
Abstract:
Understanding the holistic impact of planned transportation solutions and interventions on urban systems is challenged by their complexity but critical for decision making. The cornerstone for such impact assessments is estimating the transportation mode-shift resulting from the intervention. While transportation planning has well-established models for the mode-choice assessment, such as the nested multinomial logit model, an individual choice simulation could be better suited for addressing the mode-shift because it can consistently account for individual preferences. In addition, no model perfectly represents reality, while the available ground truth data on the actual transportation choices needed to infer the model is often incomplete or inconsistent. The present paper addresses those challenges by offering an individual mode-choice and mode-shift simulation model and a Bayesian inference framework. It accounts for uncertainties in the data as well as the model estimate and translates them into uncertainties of the resulting mode-shift and the impacts. The framework is evaluated on two intervention cases: introducing ride-sharing for-hire-vehicles in NYC as well as the recent introduction of the Manhattan Congestion Surcharge. Being successfully evaluated on the cases above, the framework can be used for assessing mode-shift and the resulting economic, social and environmental implications for any future urban transportation solutions and policies being considered by decision-makers or transportation companies.
Submitted 13 October, 2020;
originally announced October 2020.
-
Sparse Oblique Decision Tree for Power System Security Rules Extraction and Embedding
Authors:
Qingchun Hou,
Ning Zhang,
Daniel S. Kirschen,
Ershun Du,
Yaohua Cheng,
Chongqing Kang
Abstract:
Increasing the penetration of variable generation has a substantial effect on the operational reliability of power systems. The higher level of uncertainty that stems from this variability makes it more difficult to determine whether a given operating condition will be secure or insecure. Data-driven techniques provide a promising way to identify security rules that can be embedded in economic dispatch models to keep power system operating states secure. This paper proposes using a sparse weighted oblique decision tree to learn accurate, understandable, and embeddable security rules that are linear and can be extracted as sparse matrices using a recursive algorithm. These matrices can then be easily embedded as security constraints in power system economic dispatch calculations using the Big-M method. Tests on several large datasets with high renewable energy penetration demonstrate the effectiveness of the proposed method. In particular, the sparse weighted oblique decision tree outperforms the state-of-the-art weighted oblique decision tree while keeping the security rules simple. When embedded in the economic dispatch, these rules significantly increase the percentage of secure states and reduce the average solution time.
Submitted 20 April, 2020;
originally announced April 2020.
-
Bounding Regression Errors in Data-driven Power Grid Steady-state Models
Authors:
Yuxiao Liu,
Bolun Xu,
Audun Botterud,
Ning Zhang,
Chongqing Kang
Abstract:
Data-driven models analyze power grids under incomplete physical information, and their accuracy has been mostly validated empirically using certain training and testing datasets. This paper explores error bounds for data-driven models under all possible training and testing scenarios, and proposes an evaluation implementation based on Rademacher complexity theory. We answer key questions for data-driven models: how much training data is required to guarantee a certain error bound, and how partial physical knowledge can be utilized to reduce the required amount of data. Our results are crucial for the evaluation and application of data-driven models in power grid analysis. We demonstrate the proposed method by finding generalization error bounds for two applications, i.e. branch flow linearization and external network equivalent under different degrees of physical knowledge. Results identify how the bounds decrease with additional power grid physical knowledge or more training data.
Submitted 26 May, 2020; v1 submitted 29 October, 2019;
originally announced October 2019.
-
Probabilistic duck curve in high PV penetration power system: Concept, modeling, and empirical analysis in China
Authors:
Qingchun Hou,
Ning Zhang,
Ershun Du,
Miao Miao,
Fei Peng,
Chongqing Kang
Abstract:
The high penetration of photovoltaic (PV) is reshaping the electricity net-load curve and has a significant impact on power system operation and planning. The concept of the duck curve is widely used to describe the timing imbalance between peak demand and PV generation. The traditional duck curve is deterministic and only shows a single extreme or typical scenario during a day. Thus, it cannot capture both the probability of that scenario and the uncertainty of PV generation and loads. These weaknesses limit the application of the duck curve to power system planning under high PV penetration. To address this issue, the novel concepts of probabilistic duck curve (PDC) and probabilistic ramp curve (PRC) are proposed to accurately model the uncertainty and variability of electricity net load and ramp under high PV penetration. An efficient method is presented for modeling PDC and PRC using kernel density estimation, copula function, and dependent discrete convolution. Several indices are designed to quantify the characteristics of the PDC and PRC. For the application, we demonstrate how the PDC and PRC will benefit flexible resource planning. Finally, an empirical study on the Qinghai provincial power system of China validates the effectiveness of the presented method. The results of PDC and PRC intuitively illustrate that the ramp demand and the valley of net load face considerable uncertainty under high PV penetration. The results of flexible resource planning indicate that retrofitting coal-fired units performs remarkably well in enhancing the power system flexibility in Qinghai. On average, reducing the minimal output of coal-fired units by 1 MW will increase PV accommodation by over 4 MWh each day.
Submitted 25 September, 2019;
originally announced September 2019.
-
Combining Probabilistic Load Forecasts
Authors:
Yi Wang,
Ning Zhang,
Yushi Tan,
Tao Hong,
Daniel Kirschen,
Chongqing Kang
Abstract:
Probabilistic load forecasts provide comprehensive information about future load uncertainties. In recent years, many methodologies and techniques have been proposed for probabilistic load forecasting. Forecast combination, a widely recognized best practice in point forecasting literature, has never been formally adopted to combine probabilistic load forecasts. This paper proposes a constrained quantile regression averaging (CQRA) method to create an improved ensemble from several individual probabilistic forecasts. We formulate the CQRA parameter estimation problem as a linear program with the objective of minimizing the pinball loss, with the constraints that the parameters are nonnegative and sum to one. We demonstrate the effectiveness of the proposed method using two publicly available datasets, the ISO New England data and Irish smart meter data. Compared with the best individual probabilistic forecast, the ensemble can reduce the pinball score by 4.39% on average. The proposed ensemble also demonstrates superior performance over nine other benchmark ensembles.
Submitted 18 March, 2018;
originally announced March 2018.
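
For readers unfamiliar with the objective named in the abstract above, the following is a minimal sketch of the standard pinball (quantile) loss and of a weighted forecast combination whose weights are nonnegative and sum to one. It is not the authors' CQRA implementation: the function names `pinball_loss` and `combine_forecasts` are hypothetical, and the linear program that estimates the weights is not shown.

```python
import numpy as np


def pinball_loss(y_true, y_pred, q):
    """Mean pinball loss of quantile forecasts y_pred at quantile level q in (0, 1)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    diff = y_true - y_pred
    # Under-forecasts (diff > 0) cost q * diff; over-forecasts cost (1 - q) * |diff|.
    return float(np.mean(np.maximum(q * diff, (q - 1.0) * diff)))


def combine_forecasts(forecasts, weights):
    """Weighted average of individual quantile forecasts.

    forecasts: array of shape (n_models, n_observations)
    weights:   array of shape (n_models,), nonnegative and summing to one
    """
    forecasts = np.asarray(forecasts, dtype=float)
    weights = np.asarray(weights, dtype=float)
    if np.any(weights < 0) or not np.isclose(weights.sum(), 1.0):
        raise ValueError("weights must be nonnegative and sum to one")
    return weights @ forecasts


# Hypothetical data: two models forecasting the 0.9 quantile of two observations.
observed = np.array([10.0, 10.0])
individual = np.array([[8.0, 12.0], [12.0, 8.0]])
ensemble = combine_forecasts(individual, [0.5, 0.5])
ensemble_loss = pinball_loss(observed, ensemble, q=0.9)
```

In this sketch the weights are fixed by hand; in the formulation the abstract describes, they would instead be chosen by a solver to minimize the pinball loss subject to the same two constraints.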
-
Inequality Constraints in Causal Models with Hidden Variables
Authors:
Changsung Kang,
Jin Tian
Abstract:
We present a class of inequality constraints on the set of distributions induced by local interventions on variables governed by a causal Bayesian network, in which some of the variables remain unmeasured. We derive bounds on causal effects that are not directly measured in randomized experiments. We derive instrumental inequality type of constraints on nonexperimental distributions. The results have applications in testing causal models with observational or experimental data.
Submitted 27 June, 2012;
originally announced June 2012.
-
Polynomial Constraints in Causal Bayesian Networks
Authors:
Changsung Kang,
Jin Tian
Abstract:
We use the implicitization procedure to generate polynomial equality constraints on the set of distributions induced by local interventions on variables governed by a causal Bayesian network with hidden variables. We show how we may reduce the complexity of the implicitization problem and make the problem tractable in certain causal Bayesian networks. We also show some preliminary results on the algebraic structure of polynomial constraints. The results have applications in distinguishing between causal models and in testing causal models with combined observational and experimental data.
Submitted 20 June, 2012;
originally announced June 2012.