-
AugMapNet: Improving Spatial Latent Structure via BEV Grid Augmentation for Enhanced Vectorized Online HD Map Construction
Authors:
Thomas Monninger,
Md Zafar Anwar,
Stanislaw Antol,
Steffen Staab,
Sihao Ding
Abstract:
Autonomous driving requires an understanding of the infrastructure elements, such as lanes and crosswalks. To navigate safely, this understanding must be derived from sensor data in real-time and needs to be represented in vectorized form. Learned Bird's-Eye View (BEV) encoders are commonly used to combine a set of camera images from multiple views into one joint latent BEV grid. Traditionally, fr…
▽ More
Autonomous driving requires an understanding of the infrastructure elements, such as lanes and crosswalks. To navigate safely, this understanding must be derived from sensor data in real-time and needs to be represented in vectorized form. Learned Bird's-Eye View (BEV) encoders are commonly used to combine a set of camera images from multiple views into one joint latent BEV grid. Traditionally, from this latent space, an intermediate raster map is predicted, providing dense spatial supervision but requiring post-processing into the desired vectorized form. More recent models directly derive infrastructure elements as polylines using vectorized map decoders, providing instance-level information. Our approach, Augmentation Map Network (AugMapNet), proposes latent BEV grid augmentation, a novel technique that significantly enhances the latent BEV representation. AugMapNet combines vector decoding and dense spatial supervision more effectively than existing architectures while remaining as straightforward to integrate and as generic as auxiliary supervision. Experiments on nuScenes and Argoverse2 datasets demonstrate significant improvements in vectorized map prediction performance up to 13.3% over the StreamMapNet baseline on 60m range and greater improvements on larger ranges. We confirm transferability by applying our method to another baseline and find similar improvements. A detailed analysis of the latent BEV grid confirms a more structured latent space of AugMapNet and shows the value of our novel concept beyond pure performance improvement. The code will be released soon.
△ Less
Submitted 17 March, 2025;
originally announced March 2025.
-
MedCalc-Bench: Evaluating Large Language Models for Medical Calculations
Authors:
Nikhil Khandekar,
Qiao Jin,
Guangzhi Xiong,
Soren Dunn,
Serina S Applebaum,
Zain Anwar,
Maame Sarfo-Gyamfi,
Conrad W Safranek,
Abid A Anwar,
Andrew Zhang,
Aidan Gilson,
Maxwell B Singer,
Amisha Dave,
Andrew Taylor,
Aidong Zhang,
Qingyu Chen,
Zhiyong Lu
Abstract:
As opposed to evaluating computation and logic-based reasoning, current benchmarks for evaluating large language models (LLMs) in medicine are primarily focused on question-answering involving domain knowledge and descriptive reasoning. While such qualitative capabilities are vital to medical diagnosis, in real-world scenarios, doctors frequently use clinical calculators that follow quantitative e…
▽ More
As opposed to evaluating computation and logic-based reasoning, current benchmarks for evaluating large language models (LLMs) in medicine are primarily focused on question-answering involving domain knowledge and descriptive reasoning. While such qualitative capabilities are vital to medical diagnosis, in real-world scenarios, doctors frequently use clinical calculators that follow quantitative equations and rule-based reasoning paradigms for evidence-based decision support. To this end, we propose MedCalc-Bench, a first-of-its-kind dataset focused on evaluating the medical calculation capability of LLMs. MedCalc-Bench contains an evaluation set of over 1000 manually reviewed instances from 55 different medical calculation tasks. Each instance in MedCalc-Bench consists of a patient note, a question requesting to compute a specific medical value, a ground truth answer, and a step-by-step explanation showing how the answer is obtained. While our evaluation results show the potential of LLMs in this area, none of them are effective enough for clinical settings. Common issues include extracting the incorrect entities, not using the correct equation or rules for a calculation task, or incorrectly performing the arithmetic for the computation. We hope our study highlights the quantitative knowledge and reasoning gaps in LLMs within medical settings, encouraging future improvements of LLMs for various clinical calculation tasks.
△ Less
Submitted 30 June, 2024; v1 submitted 17 June, 2024;
originally announced June 2024.
-
TempBEV: Improving Learned BEV Encoders with Combined Image and BEV Space Temporal Aggregation
Authors:
Thomas Monninger,
Vandana Dokkadi,
Md Zafar Anwar,
Steffen Staab
Abstract:
Autonomous driving requires an accurate representation of the environment. A strategy toward high accuracy is to fuse data from several sensors. Learned Bird's-Eye View (BEV) encoders can achieve this by mapping data from individual sensors into one joint latent space. For cost-efficient camera-only systems, this provides an effective mechanism to fuse data from multiple cameras with different vie…
▽ More
Autonomous driving requires an accurate representation of the environment. A strategy toward high accuracy is to fuse data from several sensors. Learned Bird's-Eye View (BEV) encoders can achieve this by mapping data from individual sensors into one joint latent space. For cost-efficient camera-only systems, this provides an effective mechanism to fuse data from multiple cameras with different views. Accuracy can further be improved by aggregating sensor information over time. This is especially important in monocular camera systems to account for the lack of explicit depth and velocity measurements. Thereby, the effectiveness of developed BEV encoders crucially depends on the operators used to aggregate temporal information and on the used latent representation spaces. We analyze BEV encoders proposed in the literature and compare their effectiveness, quantifying the effects of aggregation operators and latent representations. While most existing approaches aggregate temporal information either in image or in BEV latent space, our analyses and performance comparisons suggest that these latent representations exhibit complementary strengths. Therefore, we develop a novel temporal BEV encoder, TempBEV, which integrates aggregated temporal information from both latent spaces. We consider subsequent image frames as stereo through time and leverage methods from optical flow estimation for temporal stereo encoding. Empirical evaluation on the NuScenes dataset shows a significant improvement by TempBEV over the baseline for 3D object detection and BEV segmentation. The ablation uncovers a strong synergy of joint temporal aggregation in the image and BEV latent space. These results indicate the overall effectiveness of our approach and make a strong case for aggregating temporal information in both image and BEV latent spaces.
△ Less
Submitted 18 September, 2024; v1 submitted 17 April, 2024;
originally announced April 2024.
-
Privacy Limits in Power-Law Bipartite Networks under Active Fingerprinting Attacks
Authors:
M. Shariatnasab,
F. Shirani,
Z. Anwar
Abstract:
This work considers the fundamental privacy limits under active fingerprinting attacks in power-law bipartite networks. The scenario arises naturally in social network analysis, tracking user mobility in wireless networks, and forensics applications, among others. A stochastic growing network generation model -- called the popularity-based model -- is investigated, where the bipartite network is g…
▽ More
This work considers the fundamental privacy limits under active fingerprinting attacks in power-law bipartite networks. The scenario arises naturally in social network analysis, tracking user mobility in wireless networks, and forensics applications, among others. A stochastic growing network generation model -- called the popularity-based model -- is investigated, where the bipartite network is generated iteratively, and in each iteration vertices attract new edges based on their assigned popularity values. It is shown that using the appropriate choice of initial popularity values, the node degree distribution follows a power-law distribution with arbitrary parameter $α>2$, i.e. fraction of nodes with degree $d$ is proportional to $d^{-α}$. An active fingerprinting deanonymization attack strategy called the augmented information threshold attack strategy (A-ITS) is proposed which uses the attacker's knowledge of the node degree distribution along with the concept of information values for deanonymization. Sufficient conditions for the success of the A-ITS, based on network parameters, are derived. It is shown through simulations that the proposed attack significantly outperforms the state-of-the-art attack strategies.
△ Less
Submitted 11 February, 2022;
originally announced February 2022.
-
Position-Sensing Graph Neural Networks: Proactively Learning Nodes Relative Positions
Authors:
Zhenyue Qin,
Yiqun Zhang Saeed Anwar,
Dongwoo Kim,
Yang Liu,
Pan Ji,
Tom Gedeon
Abstract:
Most existing graph neural networks (GNNs) learn node embeddings using the framework of message passing and aggregation. Such GNNs are incapable of learning relative positions between graph nodes within a graph. To empower GNNs with the awareness of node positions, some nodes are set as anchors. Then, using the distances from a node to the anchors, GNNs can infer relative positions between nodes.…
▽ More
Most existing graph neural networks (GNNs) learn node embeddings using the framework of message passing and aggregation. Such GNNs are incapable of learning relative positions between graph nodes within a graph. To empower GNNs with the awareness of node positions, some nodes are set as anchors. Then, using the distances from a node to the anchors, GNNs can infer relative positions between nodes. However, P-GNNs arbitrarily select anchors, leading to compromising position-awareness and feature extraction. To eliminate this compromise, we demonstrate that selecting evenly distributed and asymmetric anchors is essential. On the other hand, we show that choosing anchors that can aggregate embeddings of all the nodes within a graph is NP-complete. Therefore, devising efficient optimal algorithms in a deterministic approach is practically not feasible. To ensure position-awareness and bypass NP-completeness, we propose Position-Sensing Graph Neural Networks (PSGNNs), learning how to choose anchors in a back-propagatable fashion. Experiments verify the effectiveness of PSGNNs against state-of-the-art GNNs, substantially improving performance on various synthetic and real-world graph datasets while enjoying stable scalability. Specifically, PSGNNs on average boost AUC more than 14% for pairwise node classification and 18% for link prediction over the existing state-of-the-art position-aware methods. Our source code is publicly available at: https://github.com/ZhenyueQin/PSGNN.
△ Less
Submitted 3 November, 2024; v1 submitted 24 May, 2021;
originally announced May 2021.
-
Leveraging Social-Network Infrastructure to Improve Peer-to-Peer Overlay Performance: Results from Orkut
Authors:
Zahid Anwar,
William Yurcik,
Vivek Pandey,
Asim Shankar,
Indranil Gupta,
Roy H. Campbell
Abstract:
Application-level peer-to-peer (P2P) network overlays are an emerging paradigm that facilitates decentralization and flexibility in the scalable deployment of applications such as group communication, content delivery, and data sharing. However the construction of the overlay graph topology optimized for low latency, low link and node stress and lookup performance is still an open problem. We pr…
▽ More
Application-level peer-to-peer (P2P) network overlays are an emerging paradigm that facilitates decentralization and flexibility in the scalable deployment of applications such as group communication, content delivery, and data sharing. However the construction of the overlay graph topology optimized for low latency, low link and node stress and lookup performance is still an open problem. We present a design of an overlay constructed on top of a social network and show that it gives a sizable improvement in lookups, average round-trip delay and scalability as opposed to other overlay topologies. We build our overlay on top of the topology of a popular real-world social network namely Orkut. We show Orkuts suitability for our purposes by evaluating the clustering behavior of its graph structure and the socializing pattern of its members.
△ Less
Submitted 28 September, 2005;
originally announced September 2005.