-
MojoFrame: Dataframe Library in Mojo Language
Authors:
Shengya Huang,
Zhaoheng Li,
Derek Werner,
Yongjoo Park
Abstract:
Mojo is an emerging programming language built on MLIR (Multi-Level Intermediate Representation) and JIT compilation. It enables transparent optimizations with respect to the underlying hardware (e.g., CPUs, GPUs), while allowing users to express their logic using Python-like user-friendly syntax. Mojo has been shown to offer great performance in tensor operations; however, its performance has not been tested for relational operations (e.g., filtering, join, and group-by), which are common in data science workflows. To date, no dataframe implementation exists in the Mojo ecosystem.
In this paper, we introduce the first Mojo-native dataframe library, called MojoFrame, that supports core relational operations and user-defined functions (UDFs). MojoFrame is built on top of Mojo's tensor to achieve fast operations on numeric columns, while utilizing a cardinality-aware approach to effectively integrate non-numeric columns for flexible data representation. To achieve high efficiency, MojoFrame takes significantly different approaches than existing libraries. MojoFrame supports all operations for TPC-H queries, and achieves up to 2.97x speedup versus existing dataframe libraries in other programming languages. Nevertheless, there remain optimization opportunities for MojoFrame (and the Mojo language), particularly in data loading and dictionary operations.
Submitted 6 May, 2025;
originally announced May 2025.
-
Contact tracing Inspired Efficient Computation by Energy Tracing
Authors:
Wending Mai,
Ronald P. Jenkins,
Yifan Chen,
Douglas H. Werner
Abstract:
Inspired by the epidemic contact tracing technique, we propose a method to efficiently solve electromagnetics problems by tracing the energy distribution. The computational domain is adaptively decomposed, and the available computational resources are focused on the energy-active (infected) domains and their adjacent (exposed) domains, while avoiding unnecessary computation on energy-null (unexposed) domains. As an example, we employ this method to solve several optics problems. The proposed method shows high efficiency while maintaining good accuracy. The energy tracing method is based on the causality principle and is therefore potentially transferable to other areas of computational physics and their associated algorithms.
Submitted 8 August, 2022; v1 submitted 9 July, 2022;
originally announced July 2022.
-
Fixed Parameter Complexity and Approximability of Norm Maximization
Authors:
Christian Knauer,
Stefan König,
Daniel Werner
Abstract:
The problem of maximizing the $p$-th power of a $p$-norm over a halfspace-presented polytope in $\mathbb{R}^d$ is a convex maximization problem which plays a fundamental role in computational convexity. It was shown in 1986 that this problem is NP-hard for all values $p \in \mathbb{N}$ if the dimension $d$ of the ambient space is part of the input. In this paper, we use the theory of parameterized complexity to analyze how heavily the hardness of norm maximization relies on the parameter $d$.
More precisely, we show that for $p=1$ the problem is fixed parameter tractable but that for all $p \in \mathbb{N} \setminus \{1\}$ norm maximization is W[1]-hard.
Concerning approximation algorithms for norm maximization, we show that for fixed accuracy, there is a straightforward approximation algorithm for norm maximization in FPT running time, but there is no FPT approximation algorithm whose running time depends polynomially on the accuracy.
As with the NP-hardness of norm maximization, the W[1]-hardness immediately carries over to various radius computation tasks in computational convexity.
Submitted 24 July, 2013;
originally announced July 2013.
-
Ontology-based Recommender System of Economic Articles
Authors:
David Werner,
Christophe Cruz,
Christophe Nicolle
Abstract:
Decision makers need economic information to drive their decisions. The company Actualis SARL specializes in the production and distribution of a press review about French regional economic actors. For a client, this economic review serves as a prospecting tool on partners and competitors. To reduce the overload of useless information, the company is moving towards a customized review for each customer. Three issues arise in achieving this goal. First, how to identify the elements in the text in order to extract objects that match the presented recommendation criteria? Second, how to define the structure of these objects, relationships and articles in order to provide a source of knowledge usable by the extraction process to produce new knowledge from articles? The third issue is how to use feedback on customer experience to identify the quality of distributed information in real time and to improve the relevance of the recommendations. This paper presents a new type of recommendation based on the semantic description of both articles and user profiles.
Submitted 21 January, 2013;
originally announced January 2013.
-
A Lower Bound for Shallow Partitions
Authors:
Wolfgang Mulzer,
Daniel Werner
Abstract:
Let P be a planar n-point set. A k-partition of P is a subdivision of P into n/k parts of roughly equal size and a sequence of triangles such that each part is contained in a triangle. A line is k-shallow if it has at most k points of P below it.
The crossing number of a k-partition is the maximum number of triangles in the partition that any k-shallow line intersects. We give a lower bound of Omega(log(n/k)/loglog(n/k)) for this crossing number, answering a 20-year-old question of Matoušek.
Submitted 2 February, 2012; v1 submitted 11 January, 2012;
originally announced January 2012.
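
The definition of a k-shallow line in the entry above can be illustrated with a minimal Python sketch. It is not taken from the paper: the function names are hypothetical, lines are assumed to be non-vertical and given as y = slope * x + intercept, and "below" is assumed to mean strictly below.

```python
from typing import Iterable, Tuple

Point = Tuple[float, float]


def points_below(points: Iterable[Point], slope: float, intercept: float) -> int:
    """Count the points lying strictly below the line y = slope * x + intercept."""
    return sum(1 for x, y in points if y < slope * x + intercept)


def is_k_shallow(points: Iterable[Point], slope: float, intercept: float, k: int) -> bool:
    """A line is k-shallow if at most k points of the set lie below it."""
    return points_below(points, slope, intercept) <= k
```

For example, with P = [(0, 0), (1, 1), (2, 0), (3, 5)], the horizontal line y = 0.5 has two points below it, so it is 2-shallow but not 1-shallow.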
-
Erdős-Szekeres and Testing Weak epsilon-Nets are NP-hard in 3 dimensions - and what now?
Authors:
Christian Knauer,
Daniel Werner
Abstract:
We consider the computational versions of the Erdős-Szekeres theorem and related problems in 3 dimensions. We show that, in contrast to the planar case, no polynomial time algorithm exists for determining the largest (empty) convex subset among a set of points (unless P=NP), by proving that the corresponding decision problem is NP-hard. This answers a question by Dobkin, Edelsbrunner and Overmars from 1990.
As a corollary, we derive a similar result for the closely related problem of testing weak epsilon-nets in R^3. Answering a question by Chazelle et al. from 1995, our reduction shows that the problem is co-NP-hard.
This is work in progress - we are still trying to find a smart approximation algorithm for the problems.
Submitted 25 November, 2011;
originally announced November 2011.
-
Approximating Tverberg Points in Linear Time for Any Fixed Dimension
Authors:
Wolfgang Mulzer,
Daniel Werner
Abstract:
Let P be a d-dimensional n-point set. A Tverberg-partition of P is a partition of P into r sets P_1, ..., P_r such that the convex hulls conv(P_1), ..., conv(P_r) have non-empty intersection. A point in the intersection of the conv(P_i)'s is called a Tverberg point of depth r for P. A classic result by Tverberg implies that there always exists a Tverberg partition of size n/(d+1), but it is not known how to find such a partition in polynomial time. Therefore, approximate solutions are of interest.
We describe a deterministic algorithm that finds a Tverberg partition of size n/(4(d+1)^3) in time d^{O(log d)} n. This means that for every fixed dimension we can compute an approximate Tverberg point (and hence also an approximate centerpoint) in linear time. Our algorithm is obtained by combining a novel lifting approach with a recent result by Miller and Sheehy (2010).
Submitted 30 June, 2020; v1 submitted 1 July, 2011;
originally announced July 2011.
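
To make the notion of a Tverberg partition in the entry above concrete, here is a minimal Python sketch for the one-dimensional case only (d = 1), where convex hulls are intervals. It is not the paper's algorithm; the function names are hypothetical. Pairing the i-th smallest point with the i-th largest gives ceil(n/2) parts, matching the n/(d+1) bound for d = 1, and every resulting interval contains the median.

```python
from typing import List, Optional, Sequence


def tverberg_partition_1d(points: Sequence[float]) -> List[List[float]]:
    """Pair the i-th smallest with the i-th largest point (1D case only)."""
    pts = sorted(points)
    n = len(pts)
    parts = [[pts[i], pts[n - 1 - i]] for i in range(n // 2)]
    if n % 2 == 1:
        # The median forms its own part when n is odd.
        parts.append([pts[n // 2]])
    return parts


def common_point_1d(parts: Sequence[Sequence[float]]) -> Optional[float]:
    """Return a point lying in every interval conv(P_i), or None if none exists."""
    if not parts:
        return None
    lo = max(min(p) for p in parts)  # largest left endpoint
    hi = min(max(p) for p in parts)  # smallest right endpoint
    return lo if lo <= hi else None
```

For the points 5, 1, 4, 2, 3 the sketch yields the parts [1, 5], [2, 4], [3], whose intervals all contain the point 3.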
-
Hardness of discrepancy computation and epsilon-net verification in high dimension
Authors:
Panos Giannopoulos,
Christian Knauer,
Magnus Wahlström,
Daniel Werner
Abstract:
Discrepancy measures how uniformly distributed a point set is with respect to a given set of ranges. There are two notions of discrepancy, namely continuous discrepancy and combinatorial discrepancy. Depending on the ranges, several possible variants arise, for example star discrepancy, box discrepancy, and discrepancy of half-spaces. In this paper, we investigate the hardness of these problems with respect to the dimension d of the underlying space.
All these problems are solvable in time n^{O(d)}, but such a time dependency quickly becomes intractable for high-dimensional data. Thus it is interesting to ask whether the dependency on d can be moderated.
We answer this question negatively by proving that the canonical decision problems are W[1]-hard with respect to the dimension. This is done via a parameterized reduction from the Clique problem. As the parameter stays linear in the input parameter, the results moreover imply that these problems require n^{Ω(d)} time, unless 3-Sat can be solved in 2^{o(n)} time.
Further, we derive that testing whether a given set is an ε-net with respect to half-spaces takes n^{Ω(d)} time under the same assumption. As intermediate results, we discover the W[1]-hardness of other well known problems, such as determining the largest empty star inside the unit cube. For this, we show that it is even hard to approximate within a factor of 2^n.
Submitted 23 March, 2011;
originally announced March 2011.
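
The n^{O(d)} running time mentioned in the entry above can be illustrated with a brute-force Python sketch for the star discrepancy. It is not the paper's construction; the function name is hypothetical, and the points are assumed to lie in the unit cube [0,1]^d. It relies on the standard observation that the extremal anchored box has a corner whose coordinates are point coordinates or 1, so it suffices to scan a grid of at most (n+1)^d corners, counting points once with closed and once with open boundaries.

```python
from itertools import product
from math import prod
from typing import Sequence


def star_discrepancy(points: Sequence[Sequence[float]]) -> float:
    """Exact star discrepancy of points in [0,1]^d by scanning a grid of corners."""
    n = len(points)
    d = len(points[0])
    # Candidate corner coordinates per axis: all point coordinates plus 1.0.
    axes = [sorted({p[j] for p in points} | {1.0}) for j in range(d)]
    best = 0.0
    for corner in product(*axes):
        volume = prod(corner)
        # Points in the closed box [0, corner] and in the half-open box [0, corner).
        closed = sum(all(p[j] <= corner[j] for j in range(d)) for p in points)
        opened = sum(all(p[j] < corner[j] for j in range(d)) for p in points)
        best = max(best, closed / n - volume, volume - opened / n)
    return best
```

The loop visits up to (n+1)^d corners and spends O(nd) time on each, which is why this direct approach becomes impractical as the dimension grows.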
-
Polynomial Bounds on the Slicing Number
Authors:
Daniel Werner,
Matthias Lenz
Abstract:
NOTE: Unfortunately, most of the results mentioned here were already known under the name of "d-separated interval piercing". The result that T_d(m) exists was first proved by Gyárfás and Lehel in 1970, see [5]. Later, the result was strengthened by Károlyi and Tardos [9] to match our result. Moreover, their proof (in a different notation, of course) uses ideas very similar to ours and leads to a similar recurrence. Also, our conjecture turns out to be right and was proved for the 2-dimensional case by Tardos and for the general case by Kaiser [8]. An excellent survey article ("Transversals of d-intervals") is available on http://www.renyi.hu/~tardos.
Still, we leave this paper available to the public on http://page.mi.fu-berlin.de/dawerner, also because one might find the references useful.
-----
We study the following Gallai-type problem: Assume that we are given a family X of convex objects in R^d such that among any subset of size m, there is an axis-parallel hyperplane intersecting at least two of the objects. What can we say about the number of axis-parallel hyperplanes that suffice to intersect all sets in the family?
In this paper, we show that this number T_d(m) exists, i.e., depends only on m and the dimension d, but not on the size of the set X. First, we derive a very weak super-exponential bound. Using this result, by a simple proof we are able to show that this number is even polynomially bounded for any fixed d.
We partly answer open problem 74 on http://maven.smith.edu/~orourke/TOPP/, where the planar case is considered, by improving the best known exponential bound to O(m^2).
Submitted 2 August, 2010; v1 submitted 20 April, 2010;
originally announced April 2010.
-
Fixed-parameter tractability and lower bounds for stabbing problems
Authors:
Panos Giannopoulos,
Christian Knauer,
Gunter Rote,
Daniel Werner
Abstract:
We study the following general stabbing problem from a parameterized complexity point of view: Given a set $\mathcal S$ of $n$ translates of an object in $\mathbb{R}^d$, find a set of $k$ lines with the property that every object in $\mathcal S$ is "stabbed" (intersected) by at least one line.
We show that when $\mathcal S$ consists of axis-parallel unit squares in $\mathbb{R}^2$ the (decision) problem of stabbing $\mathcal S$ with axis-parallel lines is W[1]-hard with respect to $k$ (and thus, not fixed-parameter tractable unless FPT=W[1]) while it becomes fixed-parameter tractable when the squares are disjoint. We also show that the problem of stabbing a set of disjoint unit squares in $\mathbb{R}^2$ with lines of arbitrary directions is W[1]-hard with respect to $k$. Several generalizations to other types of objects and lines with arbitrary directions are also presented. Finally, we show that deciding whether a set of unit balls in $\mathbb{R}^d$ can be stabbed by one line is W[1]-hard with respect to the dimension $d$.
Submitted 21 June, 2009;
originally announced June 2009.
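
The decision problem in the entry above (stabbing axis-parallel unit squares with $k$ axis-parallel lines) can be illustrated with a brute-force Python sketch. It is not the paper's algorithm; the names and the representation are assumptions: each square is given by its lower-left corner and is treated as closed, and a line is a pair (axis, coordinate). Any stabbing line can be shifted until it touches the right (or top) edge of a square it stabs without losing a square, so only those coordinates are tried, giving roughly n^{O(k)} time.

```python
from itertools import combinations
from typing import Sequence, Tuple

Square = Tuple[float, float]  # lower-left corner of a closed unit square
Line = Tuple[str, float]      # ('x', c) is the line x = c; ('y', c) is y = c


def stabs(line: Line, square: Square) -> bool:
    """Check whether an axis-parallel line intersects a closed unit square."""
    axis, c = line
    lo = square[0] if axis == "x" else square[1]
    return lo <= c <= lo + 1


def can_stab(squares: Sequence[Square], k: int) -> bool:
    """Decide by exhaustive search whether k axis-parallel lines stab all squares."""
    # Candidate lines: one through the right edge and one through the top edge of each square.
    candidates = sorted({("x", x + 1) for x, _ in squares} | {("y", y + 1) for _, y in squares})
    size = min(k, len(candidates))
    return any(
        all(any(stabs(line, sq) for line in chosen) for sq in squares)
        for chosen in combinations(candidates, size)
    )
```

The exhaustive search over k-subsets of candidate lines is exponential in $k$, which is consistent with (though of course does not prove) the hardness result stated in the abstract.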
-
The parameterized complexity of some geometric problems in unbounded dimension
Authors:
Panos Giannopoulos,
Christian Knauer,
Gunter Rote,
Daniel Werner
Abstract:
We study the parameterized complexity of the following fundamental geometric problems with respect to the dimension $d$: i) Given $n$ points in $\mathbb{R}^d$, compute their minimum enclosing cylinder. ii) Given two $n$-point sets in $\mathbb{R}^d$, decide whether they can be separated by two hyperplanes. iii) Given a system of $n$ linear inequalities with $d$ variables, find a maximum-size feasible subsystem. We show that (the decision versions of) all these problems are W[1]-hard when parameterized by the dimension $d$, and hence not solvable in $O(f(d)n^c)$ time for any computable function $f$ and constant $c$ (unless FPT=W[1]). Our reductions also give an $n^{\Omega(d)}$-time lower bound (under the Exponential Time Hypothesis).
Submitted 18 June, 2009;
originally announced June 2009.