List of open questions

This page is a collection of open problems in theoretical computer science. I have not investigated all of them thoroughly, but I find them interesting. It also features a list of other lists of open problems.

Some problems are very precisely formulated, others are fuzzier. The categories are here for convenience but they are mostly random.

Some of these problems are existing conjectures, though I avoided those that everyone should know, like P vs NP. For problems which are well-known conjectures with existing descriptions that are better than what I could write, I have a dedicated category. Other problems are open problems left as future work in publications that I found interesting. Yet other problems come from my research: however I usually do not list problems that I am working on right now, because the phrasing of the problem (or my understanding of possible solutions) could change too quickly for me to keep this page in sync. Also, some of my questions on TCS.SE may not be immediately reflected here, as I first wait to see whether I get an answer.

For more information about directions for new research (featuring questions that are less precise but more related to my interests), you can look at my internship and PhD offers. If you are a student interested in working with me, not all problems on that page are a good choice: the problems in this list on which I believe I could reasonably supervise someone are marked with an (*).

If you know the answer to a problem, or if you have new information about it, I would be very interested to know! You can write me at a3nm<REMOVETHIS>@a3nm.net.

This document is likely to contains errors of various kinds, and I make no promises about its correctness. Please report any mistakes to the address above.

Posets
Graphs
Databases and logic
Boolean functions and circuits
Formal languages
Miscellaneous
Well-known conjectures
Other lists of open problems
Solved problems
Retired problems

Posets ¶

Complexity of counting antichains in restricted poset classes ¶

With Yael Amsterdamer and Tova Milo, we worked on the problem of learning monotone predicates on lattices via crowd queries. We used the result that it is #P-hard to count the number of antichains in a poset¹. However, our results could be strengthened if we knew that hardness also held for distributive lattices. Is it also #P-hard to count antichains in distributive lattices?

More generally, the question is: on which classes of posets is it #P-hard to count antichains?

TCS.SE question on distributive lattices

Tradeoffs between computational and oracle complexity to learn monotone predicates on posets ¶

Our work about learning monotone predicates on lattices via crowd queries shows that (under some formulations of the problem) it is computationally intractable to do so in a way that minimizes the oracle complexity (number of oracle queries). However, we were unable to analyse the oracle performance of non-optimal strategies which are computationally inexpensive.

Formally, the task is the following: we are given a poset $(V, <)$ and we wish to learn exactly a Boolean predicate $P : V \mapsto {0, 1}$ which is monotone: if $P (x) = 1$ and $x < y$ then $P (y) = 1$ . A simple algorithm is to pick (uniformly at random) an element $x$ such that $P (x)$ is unknown, evaluate $P (x)$ with the oracle, integrate the consequences of the answer (set $P (y) = 0$ for all $y < x$ if $P (x) = 0$ , set $P (y) = 1$ for all $y > x$ if $P (x) = 1$ ), and repeat while there are unclassified element. Are there posets where the performance of this strategy (expected number of oracle queries) is significantly worse than the (computationally intractable) optimal strategy?

More generally, are there learning algorithms for this task which are computationally tractable and yet have near-optimal oracle complexity?

Complexity of finding linear extensions of a labeled poset in a regular language (*)¶

Our work on order-incomplete data studies labeled posets, which consist of a poset $(P, <)$ and a function $μ$ from $P$ to some finite alphabet $Σ$ . The label of a linear extension $x_{1}, \dots, x_{n}$ of $(P, <, μ)$ is the word $μ (x_{1}) \dots μ (x_{n})$ . A simple question is whether we can efficiently determine, given a labeled poset $(P, <, μ)$ and a word $w$ in $Σ^{*}$ , whether $(P, <, μ)$ has a linear extension of label $w$ . Our work shows that this problem is hard even for comparatively simple $P$ , and shows classes of labeled posets where it is tractable. It also studies a generalization of this question where you do not consider just the free monoid $Σ^{*}$ on $Σ$ , but an arbitrary monoid structure: given a monoid element $w$ and a labeled poset, we ask whether there is a linear extension whose label evaluates to $w$ according to the monoid law.

Of course, another variant of this problem is to consider a language $L$ on $Σ$ , for instance a regular language, and ask whether the labeled poset has a linear extension whose label is in $L$ . The more general question is: for which regular languages is this class tractable? Could one show a dichotomy result separating tractable and intractable languages?

For instance, with the alphabet $Σ = {a, b}$ and language $L = (a b)^{*}$ , I do not know whether it is tractable, given an input poset $(P, <, μ)$ , to test whether it has a linear extension where we alternate between elements labeled $a$ and elements labeled $b$ .

Our paper about the problem, which settles various special cases
Page keeping track of the remaining open problems from the paper
TCS.SE question about the case where the language follows the law of a finite group
TCS.SE question about the variant of finding the lexicographically minimal topological sort
TCS.SE question about hardness of a related problem
TCS.SE question about hardness of a related problem on restricted poset classes

A related question is to characterize the complexity of enumerating linear extensions of labeled posets: the problem can be done in constant amortized delay for classical posets³, but I am not aware of analogous results for labeled posets.

TCS.SE question about this

Representing the result of duplicate consolidation in labeled posets ¶

With M. Lamine Ba, Daniel Deutch and Pierre Senellart, we worked on how to represent and query order-incomplete data. In this context, we studied labeled posets (defined as above) and study the question of how to consolidate duplicates in a labeled poset. By this, we mean that we want to compute a labeled poset where the element labels are unique (all elements with the same label are collapsed to the same element), and the remaining order relations between the collapsed elements are sensible.

However, if you see a labeled poset as a way to represent a set of possible orders (its linear extensions), and if you want the duplicate removal operation to act consistently when looking at each linear extension (i.e., the linear extensions of the duplicate consolidation should be the result of removing duplicates in the linear extensions of the labeled posets), then it is not clear at all how to proceed. In fact, the result of duplicate consolidation may even not be representable as a labeled poset.

Is there a formalism that can represent the result of duplicate consolidation on labeled posets?

Testing whether a set of words can be the set of linear extensions of a labeled poset ¶

Define labeled posets as in the previous section. Consider a sequence $S$ of words given as input. A labeled poset $(P, <, μ)$ represents $S$ if $S$ is exactly the set of labels that can be achieved by the linear extensions of the poset. Of course, a sequence $S$ cannot be thus represented unless all its sequences are built using the same multiset of elements, which should be the image of $μ$ (seen as a multiset). However, if $S$ respects this condition, it does not seem easy to determine whether it can be represented or not.

What is the complexity of determining, given a set of sequences, whether it can be represented by a labeled poset in this sense?

Stable interpolation on partial orders ¶

In our work with Yael Amsterdamer, Tova Milo, and Pierre Senellart, we study an interpolation scheme on partial orders, defined as the center of mass of the convex polytope defined by the order constraints. As it turns out, however, the scheme is not stable: if we fix some variables to their interpolation result, the interpolation of other variables may change.

Is there a principled stable interpolation scheme on posets? Is it unique?

More general TCS.SE question

Representing bounded treewidth partial orders ¶

In our work with Yael Amsterdamer, Tova Milo, and Pierre Senellart, we consider partial orders whose Hasse diagram (or its reverse) is a directed tree. We do not know whether our study would extend to partial orders whose Hasse diagram has bounded treewidth.

Is there a structure on the set of linear extensions of a poset whose Hasse diagram has bounded treewidth? Alternatively, is there a structure on the convex polytope defined by such order constraints?

Related paper which shows that it is PTIME to count the number of linear extensions of posets whose Hasse diagram has bounded treewidth.

A related question: are there languages for which constrained topological sorting is tractable on partial orders whose Hasse diagram is a tree, or has bounded treewidth, but is hard on arbitrary DAGs?

Smallest posets with prescribed number of linear extensions ¶

For all $n \in N$ , what is the minimal number of elements of a poset with exactly $n$ linear extensions, if one exists? What is the asymptotic growth of this function?

Math.SE question

The analogous question for antichains has been studied⁴, with posets of size logarithmic in the desired number of antichains. However, the argument is much simpler, because the posets can be easily constructed from smaller posets. By contrast, the number of linear extensions of a poset seems hard to determine from elementary "constituent parts".

Graphs ¶

Detour problem ¶

Given a directed graph $G$ , a source vertex $s$ , and a sink vertex $t$ reachable from $s$ , we want to know if there is a simple directed path from $s$ to $t$ whose length is greater than that of the shortest path from $s$ to $t$ . Is this problem in polynomial time?

Last seen open: 2023⁵

A seemingly related question is that of simple paths on DAGs with backward edges. Consider an input DAG $G$ , two vertices $s$ and $t$ , and additional back edges such that whenever $(u, v)$ is an additional edge then $u$ is reachable from $v$ in $G$ . Is it NP-hard to determine whether there is a simple path from $s$ to $t$ that uses at least one additional edge?

TCS.SE question

FPRAS for the probability of obtaining a cyclic subgraph of a directed graph ¶

We are given a directed graph $G$ whose edges are annotated with independent probabilities of existence, and we want to estimate the probability of obtaining a subgraph of $G$ which contains a directed cycle. Is there an FPRAS for this task?

CStheory question (2023)
Paper leaving the problem open (to appear in 2024)

Simple paths of fixed modulo length on undirected graphs (*)¶

Given an undirected graph and two nodes, can one decide in polynomial time whether there is a simple path of length 0 modulo 3 between the two nodes?

Last seen open: 2022⁶.

The problem without the "simple" requirement is clearly in PTIME, and the same problem on a directed graph is NP-hard by a reduction from the 2 disjoint paths problem. The question is also open for paths of length $p$ modulo $q$ for other values of $p$ and $q$ . It can be determined in PTIME if all simple paths between two vertices of an undirected graphs are of length p modulo q, solving the problem for $q = 2$ ⁷.

Up to subdividing each edge by 2 and connecting the source and target nodes by an edge, we can reduce to the problem of deciding if an undirected graph has a simple cycle of length $2 p + 1$ mod $2 q$ . This question is also open: a related comment is at the end of these slides (2015). A sufficient condition for tractability was given in 2004⁸, but it does not cover the right cases, so the problem remains open.

I have written (in 2023) a more detailed writeup on the related work around this problem.

The monk problem (*)¶

The so-called monk problem⁹ is a pursuit-evasion game played on a directed graph. A strategy for $k$ pursuers is a sequence $s_{1}, \dots, s_{n}$ of subsets of $k$ vertices. A strategy is winning if, for every (non-simple) walk $v_{1}, \dots, v_{n}$ in the graph, there is $i$ such that $v_{i} \in s_{i}$ . Intuitively, the evader is walking on the graph (it has to move at each turn by following exactly one edge) and must avoid the $k$ vertices that the pursuers examine at each time step.

The evasion number of a digraph $G$ is the smallest $k$ for which the pursuers have a winning strategy. What is the computational complexity, given an input digraph, of computing its evasion number? This relates to the question of whether graphs with evasion number $\geq k$ can be somehow characterised, similarly to the links between treewidth and pursuit-evasion via havens. It also relates to the question of how the maximal length of a strategy can be bounded as a function of the graph size and of $k$ .

TCS.SE question
Bruteforce implementation
Paper which characterizes the undirected graphs with evasion number $k = 1$ (they are the unions of lobster graphs, and can be recognized in linear time), and determines explicit strategies for them. In this paper the pursuer is called a prince and the evader is called a princess. See also this Reddit discussion, where the pursuer is a vampire hunter and the evader is a vampire.
Paper on the problem for undirected graphs, which does not determine the complexity. In this paper the pursuers are called hunters, the evader is called rabbit, and the evasion number is called hunter number. Some other recent papers on the topic (on undirected graphs, and not determining the complexity):
- Catching a mouse on a tree, where the pursuers are called cats and the evader mouse;
- Hunting rabbits on the hypercube, which follows the hunter/rabbit terminology.

Computing the treewidth of planar graphs ¶

Treewidth measures how much an undirected graph is close to a tree. It is known that, for any fixed $k \in N$ , we can check in linear time in an input graph $G$ whether its treewidth is $\leq k$ ; but that, when both $k$ and $G$ are given as input, it is NP-hard to determine whether $G$ has treewidth $\leq k$ .

Is this last statement still true if $G$ is planar, or can the treewidth of a planar graph $G$ be computed in PTIME?

TCS.SE question

Computing cliquewidth ¶

The parameterized cliquewidth computation problem for $k \in N$ asks, given an input graph $G$ , whether $G$ has cliquewidth $\leq k$ . Is this problem in PTIME for any fixed $k$ ? Is there $k \in N$ such that the problem is NP-hard?

Membership in PTIME was recently shown¹⁶ for $k \leq 3$ , but the problem is open for larger $k$ . In particular, we do not know whether the problem is in FPT. Membership in FPT is known for some restricted classes of graphs of unbounded cliquewidth¹⁷, and membership in cubic FPT is known for the related parameter of rankwidth¹⁸.

Last seen open: 2012¹⁶.

Optimal bound on the size of a grid minor ¶

The grid minor theorem¹⁹ of Robertson and Seymour shows that, if a family of graphs has unbounded treewidth, then one can find arbitrary large grid graphs as minors of the family. Specifically, the result can be stated as follows²⁰: there exists a function $f : N_{> 0} \to N_{> 0}$ such that, for every $g \in N_{> 0}$ , every graph of treewidth $\geq g$ has the $(g \times g)$ -grid as a minor.

The best known upper bound²⁰ on $f$ is $~ O (g^{19})$ , where the $~ O$ notation neglects polylogarithmic factors. The best known lower bound²¹ is $Ω (g^{2} log g)$ .

What is the correct bound on the function $f$ ?

Last seen open: 2016²⁰. See²⁰ for discussion of conjectured bounds.

Extracting a polynomial grid minor in PTIME ¶

Continuing on the grid minor theorem from the previous entry, the following result is known²²: there is $c \in N$ such that for any $n \in N$ , for any graph $G$ of treewidth $\geq n^{c}$ , one can find the $n \times n$ grid as a minor of $G$ . This work shows how the minor can be found in randomized polynomial time.

Is it possible to extract the minor in deterministic PTIME?

Last seen open: 2015²³.

Monadic second-order logic with cardinality predicates ¶

We extend monadic second-order logic (MSO) over graphs with constructs to check the equality of the cardinalities of second-order variables, and check that their cardinalities are in fixed recursive sets. We fix a formula $Φ$ in this logic and a treewidth bound $k \in N$ . Is it PTIME in an input graph $G$ of treewidth $\leq k$ to check whether it satisfies $Φ$ ?

Courcelle's theorem states that this is true (and in linear time) for normal MSO without the extension.

The question on Open Problem Garden

Graph reachability labellings with total orders ¶

A labeling of a directed acyclic graph (DAG) $G = (V, E)$ is a function $μ$ from $V$ to some set $Σ$ of labels such that the labels of a pair of vertices suffice to test reachability. Formally, there is a decoding function $f : Σ^{2} \to {0, 1}$ independent of $G$ such that for any two vertices $u, v \in V$ , we have $f (μ (u), μ (v)) = 1$ iff $v$ is reachable from $u$ in $G$ .

I am interested in labeling schemes where $Σ$ is $N^{d}$ for a certain $d$ , and where for simplicity we assume that $μ$ cannot use the same number twice: there are no $u \neq v$ in $V$ such that, letting $ℓ = μ (u)$ and $ℓ^{'} = μ (v)$ , we have $ℓ_{i} = ℓ_{j}^{'}$ for some $1 \leq i < j \leq d$ . Further, the decoding function $f$ can only use the labels to do comparisons on their components, i.e., $f$ , when applied on $(ℓ, ℓ^{'})$ can only see whether $ℓ_{i} < ℓ_{j}^{'}$ for all $1 \leq i, j \leq d$ . Given $d \in N$ and a choice of decoding function $f$ , the labelable graphs of $f$ are those for which there is a reachability labeling scheme for $f$ .

These definitions generalize several known notions:

The case $d = 1$ is uninteresting: there is only a single possible scheme $f$ , and the labelable graphs are the directed path graphs.
For $d = 2$ , considering the decoding function $f (ℓ, ℓ^{'})$ that checks whether $ℓ_{2} < ℓ_{1}^{'}$ , the labelable graphs are those that represent interval orders
For $d = 2$ , considering the decoding function $f (ℓ, ℓ^{'})$ which is true iff $ℓ_{1} < ℓ_{1}^{'}$ and $ℓ_{2} < ℓ_{2}^{'}$ , the labelable graphs are those that represent posets with order dimension at most two. More generally, for arbitrary $d \in N$ , for the function that takes the AND of the pointwise comparability relations, the labelable graphs are those that represent a partial order with dimension $\leq d$ .

For other values of $d$ and decoding functions, how can we characterize the labelable graphs? In particular, for a given value of $d$ , are some decoding functions more expressive than others? (For $d = 2$ and the two decoding functions that I presented, interval orders and orders of dimension $\leq 2$ are incomparable.) Further, is there $d \in N$ and some decoding function such that any graph has a reachability labeling with $N^{d}$ ? (This sounds extremely unlikely but I'm not sure of how to prove it.)

Preliminary notes about these problems
It appears that the notion of labeling schemes with decoding function presented here, up to some minor differences, has been introduced in this paper³⁰ (see conclusion) under the name local presentation. In their terminology, a local presentation of a DAG is a function mapping each vertex to a $d$ -tuple of integers, with the decoding function being a Boolean function of equalities and inequalities on the tuple components: this is exactly the definition above except that the same number can be used multiple times and we can test for equality. The notion does not seem to have been studied again since that paper, though there have been studies of the related notion of Boolean dimension: this is the case where we can only do comparisons between $ℓ_{i}$ and $ℓ_{i}^{'}$ , i.e., between the same components of the labels. In particular I didn't find an example of a poset family where the size $d$ of local presentations was unbounded.

Boolean dimension of planar posets ¶

Let $G = (V, E)$ be a directed acyclic graph (DAG). A Boolean realizer of $G$ is intuitively a way to label each vertex of $G$ with integers, such that reachability in $G$ can be determined only by looking at some Boolean function over pairwise comparisons of the vertex labels. Formally, a Boolean realizer of $G$ of dimension $d \in N$ consists of a Boolean function $ϕ$ on $d$ variables, and of $d$ labellings $ℓ_{1}, \dots, ℓ_{d}$ , each of which is an injective function from $V$ to $N$ : we require that for any two vertices $u, v \in V$ , letting $x_{i}$ for all $1 \leq i \leq d$ be $1$ if $ℓ_{i} (u) < ℓ_{i} (v)$ and $0$ otherwise, we have $ϕ (x_{1}, \dots, x_{d}) = 1$ iff $v$ is reachable from $u$ in $G$ . The Boolean dimension of $G$ is then the smallest $d$ for which $G$ has a Boolean realizer of dimension $d$ .

Is the Boolean dimension of planar DAGs bounded? In other words, is there a constant $k \in N$ such that, whenever $G$ is planar (i.e., it can be drawn in a way such that no two edges cross), then the dimension of $G$ is no greater than $k$ ?

Posed in: 1989 (cited in³¹)
Last seen open: 2018³¹

Complexity of computing a simplicial decomposition ¶

A simplicial decomposition of a graph $G$ is a tree decomposition such that, for any two adjacent bags $b$ and $b^{'}$ whose set of vertices is respectively $X$ and $X^{'}$ , the subgraph induced by $G$ on $X \cap X^{'}$ is a clique. The width of the decomposition is the size of the largest bag minus one, as usual for treewidth. The simplicial width of $G$ is the smallest possible width of a simplicial decomposition. What is the complexity, given a graph $G$ , of computing its simplicial width and a simplicial decomposition? One can ask the question when allowing arbitrary $G$ , or when assuming that the simplicial width of the input graph is bounded by a constant.

Note that there are known results³² on the complexity of computing a clique minimal separator decomposition, i.e., a simplicial decomposition where we minimize the size of the cliques, not of the bags.

Paper of mine which uses simplicial decompositions.

Covering an undirected graph with cycles of length at least 5 ¶

What is the complexity of determining if an input undirected graph can be covered by cycles, each of which have length at least 5? The problem is known to be PTIME for a length constraint of "at least 3" or "at least 4", and NP-hard for a length constraint of "at least $k$ " for all $k \geq 6$ .

Source

Databases and logic ¶

Oracle complexity of skyline queries ¶

Consider a set of $d$ -dimensional vectors $V = v^{1}, \dots, v^{n}$ of numbers (or indeed any totally ordered set). Let us assume for simplicity that all the $v_{l}^{i}$ are distinct for $1 \leq i \leq n$ and $1 \leq l \leq d$ . In this context, we say that $v^{i}$ dominates $v^{j}$ if $v_{l}^{i} \geq v_{l}^{j}$ for all $1 \leq l \leq d$ . The skyline (or Pareto frontier) of $V$ is the set of maximal vectors for the domination relation.

Assume that the only way you can access the vectors of $V$ is by atomic comparisons, i.e., you can evaluate whether $v_{l}^{i} < v_{m}^{j}$ for any $1 \leq i, j \leq n$ , $1 \leq l, m \leq d$ . What are bounds on the minimal number of comparisons required to determine the skyline? Such bounds can be stated as a function of $k$ and $n$ , or as a function of the output size (the actual skyline).

Groz and Milo¹⁰ have studied this problem when the comparisons are additionally noisy, but good bounds for $d > 3$ are not yet known, even without noise.

Constraint classes where separability is decidable ¶

Consider a set $Σ$ of tuple-generating dependencies (TGDs) and a set $Φ$ of equality-generating dependencies (EGDs). We say that a Boolean conjunctive query (CQ) $Q$ is entailed by a relational database instance $I$ and by $Σ \land Φ$ if, for every superinstance $I^{'} \supseteq I$ such that $I^{'}$ satisfies $Σ \land Φ$ , it is the case that $I^{'}$ satisfies $Q$ . Intuitively, $Q$ is implied by the instance $I$ and the constraints $Σ \land Φ$ under open-world semantics: $Q$ is true on all superinstances of $I$ that satisfy the constraints $Σ \land Φ$ .

We call $Σ$ and $Φ$ separable if, for any Boolean CQ $Q$ and instance $I$ that satisfies $Φ$ , the following equivalence holds: $Q$ is entailed by $I$ and $Σ \land Φ$ iff $Q$ is entailed by $I$ and $Σ$ . In other words, the equality-generating dependencies $Φ$ can be checked separately on $I$ , and have no impact afterwards in terms of entailment.

For general TGDs and EGDs, the problem is undecidable¹¹, but this is not so surprising as reasoning with TGDs by themselves is also undecidable if arbitrary TGDs are allowed. Are there less expressive languages for which separability is decidable? One could think, e.g., of inclusion dependencies and functional dependencies, maybe with some restrictions.

Open-world query answering with linear rules and transitivity assertions ¶

A linear rule is a tuple-generating dependency (TGD) with exactly one atom in the body and one atom in the head. A transitivity assertion on an arity-two relation $R$ is a TGD of the form $\forall x y z R (x, y) \land R (y, z) \Rightarrow R (x, z)$ .

It is decidable²⁴ whether an instance $I$ and constraints $Σ$ of linear rules and transitivity assertions entail a CQ $Q$ , in the sense of the previous problem, but under some restrictions (all relations have arity at most two, or $Q$ consists of a single atom, or an additional condition holds). Does decidability hold in the general case of arbitrary CQ and arbitrary arity signatures for linear rules and transitivity assertions?

Decidable unary language with number restrictions ¶

With Michael Benedikt we introduced a language formed of expressive constraints on an arity-two signature (including number restrictions, namely counting quantifiers) and tuple-generating dependencies on arbitrary arity predicates, also supporting functional dependencies of a restricted kind on such predicates. We showed that it was decidable to determine (in the sense above) whether a query is entailed by an instance and such constraints, which we call the open-world query answering (OWQA) problem.

This raises the question of whether a more expressive and more uniform such language could be designed. It should also have decidable entailment, and should also feature (1) arbitrary arity constraints, (2) number restrictions such as functional dependencies, and (3) expressive logical operators such as disjunction. Can such a language be designed? Relevant results include the following (beyond the ones shown in the paper mentioned above):

OWQA is decidable for the two-variable guarded fragment of first-order logic with counting quantifiers²⁵. This covers (2) and (3).
OWQA is decidable for the guarded fragment of first-order logic²⁶: this covers (1) and (3).
OWQA is decidable with functional dependencies and unary functional dependencies, thanks to the fact that they satisfy the non-conflicting condition²⁷. The same holds for other classes of tuple-generating dependencies for which OWQA is decidable (e.g., guarded tuple dependencies) and functional dependencies. This covers (1) and (2).
OWQA is not decidable for inclusion dependencies and functional dependencies in general, because their implication problem is undecidable²⁸, and it reduces²⁹ to OWQA. This means we cannot have (1) and (2) without restrictions.

A general idea would be to design a language that generalizes frontier-one TGDs to have disjunction and some kind of "non-conflicting" counting quantification, and establish decidability.

Following this paper, another natural direction would be to study finite model reasoning for such a language. However, one should first show the decidability of the finite implication problem for it: decide whether a set of constraints of the language implies other constraints. Yet, as far as I know, the only decidability result for finite implication on an arbitrary-arity language that features number restrictions only covers³⁹ unary inclusion dependencies and functional dependencies. Can finite implication be decided for a more expressive language of this form? E.g., in the spirit of existing works⁴⁰ but for arbitrary arity constraints.

Decidability of finite query answering with path-functional dependencies and two-variable guarded constraints ¶

This question uses the definition of OWQA from the previous question. It was recently shown⁴¹ that the finite satisfiability problem is decidable for the two-variable guarded fragment of first-order logic with counting quantifiers (GC²) to which one adds path-functional dependencies. Is the same true of the finite OWQA problem? This is already known if there are no path dependencies²⁵.

If this is true, it would imply an independent proof of the result of this paper when assuming that all functional dependencies are unary. Indeed, unary inclusion dependencies and functional dependencies on arbitrary signatures can clearly be equivalently translated to GC² plus path functional dependencies (with paths of length 2). To re-prove this result for arbitrary functional dependencies, one would need some generalization of path functional dependencies (to capture the image of this translation).

Decidability of conjunctive query containment under bag semantics ¶

Is conjunctive query containment under bag semantics decidable, and what is its complexity?

Slides (2013) that define the problem.

Determinacy of path queries by unions of path views ¶

For $k \in N$ , the path query $q_{k}$ of length $k$ on a directed graph $G = (V, E)$ returns the set of pairs of nodes $(u, v) \in V^{2}$ such that there is a path of length $k$ from $u$ to $v$ . A union of path queries $q_{S}$ is defined by a (possibly infinite) set $S$ of integers and returns the sets of pairs of nodes $(u, v)$ such that there is a path from $u$ to $v$ whose length is in $S$ .

A set of union of path queries $Q = q_{S_{1}}, \dots, q_{S_{n}}$ determines a path query $q_{k}$ if, for all finite graphs $G$ and $G^{'}$ , if $q_{S_{i}} (G) = q_{S_{i}} (G^{'})$ for all $1 \leq i \leq n$ , then $q_{k} (G) = q_{k} (G^{'})$ . Under mild assumptions on the representation of infinite sets in union of path queries, the determinacy problem was shown⁴³ to be decidable but only for path queries $q_{k}$ where $k$ is larger than some function of $Q$ . Is it decidable to determine whether a path query is determined by a set of union of path queries, without this restriction?

Another question is whether this result extends to more general query languages. For instance, is it decidable to determine whether a union of path queries is determined by a set of union of path queries?

A natural further generalization of this problem is going from paths of a fixed length to paths labeled by some language, when edges are labeled by letters of some alphabet. Formally, we define a regular path query $q_{L}$ as a query on edge-labeled graphs that returns the pairs of vertices connected by a path in $L$ , where $L$ is a regular language over the alphabet of edge labels. In this context, however, it was shown that it was undecidable whether a regular path query was determined by a set of regular path queries⁴⁴, and it is even undecidable whether using only finite regular languages⁴⁵ (or, equivalently, union of labeled path queries, i.e., unions of queries returning all pairs of vertices separated by a path labeled by some finite word). So with these generalisations of the above problem being undecidable, it is not clear whether we should expect decidability for (unlabeled) path queries, or unions thereof.

Thanks to Nadime Francis and to Bartosz Bednarczyk for helping me to prepare this entry.

Do tractable queries on probabilistic instances have tractable lineages? (*)¶

A tuple-independent database (TID) is a probabilistic database consisting of a relational database $I$ where each tuple is given a probability in $[0, 1]$ . The semantics is a probability distribution on subsets of $I$ (the possible worlds), obtained by considering that each tuple is either kept or removed with the indicated probability, independently across tuples.

A Boolean union of conjunctive queries (UCQ) is a disjunction of conjunctive queries with no free variables. The probability evaluation problem for a fixed Boolean UCQ $q$ asks, given a TID $I$ , what is the probability that $I$ satisfies $q$ , meaning, what is the total probability mass of possible worlds of $I$ that satisfy $q$ . A dichotomy result is known⁴⁶: Boolean UCQs are partitioned between those for which the probability evaluation problem can be solved in polynomial time, and those where it is #P-hard.

The lineage of a Boolean query $q$ on a TID instance $I$ is a Boolean formula whose variables are the facts of $I$ , such that a valuation makes the circuit evaluate to true iff the corresponding possible world (defined by keeping exactly the facts set to true) satisfies $q$ . One way to show that a Boolean UCQ is tractable is to show that, on any instance, one can represent its lineage in a form that allows for tractable probability evaluation.

A d-DNNF $^{\neg}$ is a Boolean circuit (with AND, OR, NOT gates, and input gates) such that every AND gate is on disjoint set of variables (meaning, there is no input gate reachable from two distinct children of an AND gate), and every OR gate is exclusive (meaning, there is no valuation of the input gates that makes two distinct children of an OR gate evaluate to true). A Boolean UCQ has polynomial-size d-DNNF $^{\neg}$ if there is a constant $c \in N$ such that, for any TID instance $I$ , there is a d-DNNF $^{\neg}$ of size $O ({| I |}^{c})$ that expresses the lineage of $q$ on $I$ .

It is known⁴⁷ that many Boolean UCQs for which probabilistic query evaluation is in PTIME have polynomial-size d-DNNF $^{\neg}$ , but the converse is open. Is there a Boolean UCQ for which probabilistic query evaluation is in PTIME, but that has no polynomial-size d-DNNF $^{\neg}$ ? In particular, is this the case of the conjectured⁴⁷ counterexample $Q_{9}$ ? If this can be shown, is it possible to design a generalization of d-DNNF $^{\neg}$ that still enjoys probabilistic query evaluation, and is such that any Boolean UCQ with PTIME probabilistic query evaluation has a polynomial-size lineage in this generalization?

Update: the most recent work in this area is this paper by Mikaël Monet

Does bounded derivation depth imply finite controllability?¶

We talk of a Boolean conjunctive query $Q$ being entailed by an instance $I$ and set $Σ$ of tuple-generating dependencies (TGDs) if for any instance $I^{'} \supseteq I$ such that $I^{'} ⊨ Σ$ , we have $I^{'} ⊨ Q$ .

The set $Σ$ of tuple-generating dependencies (TGDs) has bounded derivation depth if, for any conjunctive query $Q$ , there exists a union of conjunctive queries $Q^{'}$ such that, for any relational instance $I$ , the following equivalence holds: $Q$ is entailed by $I$ and $Σ$ iff $I ⊨ Q^{'}$ .

The set $Σ$ of TGDs is finitely controllable iff for any instance $I$ and Boolean conjunctive query $Q$ , the following equivalence holds: $Q$ is entailed by $I$ and $Σ$ iff $Q$ is entailed by $I$ and $Σ$ over finite models (impose that $I^{'}$ is finite in the definition above).

Is it the case that if a set of TGDs has bounded derivation depth then it is finitely controllable? In the work where this conjecture was posed⁴⁸, it was solved for signatures of arity two, but the general question remains open.

What is the complexity of testing if a query is safe?¶

This question refers to the dichotomy⁴⁶ on the Boolean UCQs for which the probability evaluation problem is in PTIME: see this question for background. The queries which are tractable according to this dichotomy are safe.

What is the complexity, given a Boolean UCQ, to determine whether it is safe?

The best known algorithm for this is super-exponential¹⁴.

Last seen open: 2011¹⁴

Complexity of query evaluation parameterized by treewidth ¶

The query evaluation problem asks, given a Boolean conjunctive query $Q$ and a relational database $I$ , whether $Q$ holds on $I$ . The treewidth of $I$ is the treewidth of its Gaifman graph, and the treewidth of $Q$ is that of its canonical instance (i.e., we just see the query as a database on its variables). What is the parameterized complexity of the query evaluation problem when parameterized by the treewidth of $I$ and of $Q$ ? (In other words, the parameter of the problem is an upper bound on the treewidth of both $I$ and $Q$ .)

CStheory question where the problem is phrased in terms of graph homomorphisms (query evaluation can be understood as deciding the existence of a homomorphism between labeled hypergraphs). Also includes pointers to related work.

How can one strengthen lower bounds for probabilistic query evaluation on unbounded-treewidth families? (*)¶

In this paper with Pierre Bourhis and Pierre Senellart, we showed (Theorem 4.2) that there is a query for which the probabilistic query evaluation problem (see this question for the definition) is intractable on any unbounded-treewidth instance family. However, the lower bound of this result has some limitations, and I do not know whether they can be lifted:

Does the result generalize to arbitrary arity signatures rather than arity-2 signatures?
Can the query be in a weaker language than MSO? For instance, can it be a union of conjunctive queries with inequalities ( ${UCQ}^{\neq}$ )?
Does the result hold even if we are not completely free to choose the probability valuation of the input instance?
Is it possible to use PTIME reductions rather than RP reductions? This is asked separately as this question.

Lower bounds on lineage sizes ¶

In the same paper as in the previous question, we showed (in Lemma 8.2) that, for any graph signature, there is a UCQ with inequalities $Q$ such that, given any instance $I$ , the width of any OBDD representing the lineage of $Q$ on $I$ (see this question for the definition) is bounded by an exponential function of the treewidth of $I$ : specifically, there is a constant $c \in N_{> 0}$ such that it is in $Ω (2^{(width (I))^{1 / c}})$ .

Can the same result be shown for other lineage representations? In particular, can it be shown for d-DNNFs?

Update: our ICDT'19 paper with Mikaël Monet and Pierre Senellart shows a similar lower bound for the class of d-SDNNF (structured d-DNNF). The question of showing a similar bound for d-DNNF is very challenging and relates to the open problem of separating DNFs in general and d-DNNFs.

Boolean functions and circuits ¶

Monotone dualization ¶

The prime CNF of a monotone Boolean function over variables $x_{1}, \dots, x_{n}$ is the (unique up to order) expression of the function as a conjunction of disjunctions of the $x_{i}$ from which no variable occurrence can be removed. The dual of a monotone Boolean function $f$ is the monotone Boolean function mapping $x_{1}, \dots, x_{n}$ to $\neg f (\neg x_{1}, \dots, \neg x_{n})$ . By De Morgan's laws, a Boolean expression for the dual can be obtained by replacing $\land$ 's by $\lor$ 's and vice-versa in the prime CNF of $f$ .

The Dual problem asks, given two prime CNFs, whether the functions defined by these CNFs are dual of one another. It is known to be in quasipolynomial time¹⁵. Is the Dual problem in PTIME?

Explicit Boolean functions with supralinear circuits ¶

Shannon proved in 1949 with a counting argument that most Boolean functions cannot be represented by a circuit of linear size. However, we do not know yet of any explicit Boolean function for which no linear size circuit exists. Can we construct such a function?

Conciseness gap between formulae and circuits ¶

Boolean functions can be represented as circuits or as formulae. Circuits seem much more concise, because they can reuse common subexpressions. Yet the best conciseness gap known is the following: there are Boolean functions that can be represented by linear circuits but for which any formula representation has size at least $n^{3 - o (1)}$ (over circuits with AND, OR, and NOT). Can we do better?

TCS.SE question

Size bounds on smoothing structured circuit representations ¶

A Boolean circuit is structured if there is a fixed full binary tree whose leaves are labeled by the variables (the vtree) and a mapping from the gates of the circuit to the nodes of the tree such that every variable is mapped to the leaf corresponding to itself and the inputs to every gate $g$ are mapped to descendants of the node to which $g$ is mapped. A Boolean circuit is smooth if, intuitively, no variable is omitted, i.e., whenever we take the disjunction of two gates then the set of variables reachable from each gate is the same.

Given a Boolean structured circuit, what is the complexity of computing an equivalent circuit which is structured but still smooth? What about the same question while preserving other desirable circuit properties such as being deterministic?

Paper with complete definitions and a partial result (Proposition 7.1)

A circuit is decomposable if the inputs to every AND-gate depend on pairwise disjoint sets of variables. What can be said about the same question for smoothing decomposable circuits? There is a partial result (applying only to a certain class of "smoothing-gate algorithms") as Theorem 5.2 in the paper above.

Formal languages ¶

Shortest superpermutation ¶

An $n$ -superpermutation is a word $w$ over ${1, \dots, n}$ such that each permutation of ${1, \dots, n}$ occurs as a subsequence of $w$ . What is the length of the shortest $n$ -superpermutations as a function of $n$ ?

Languages recognized by polynomial-size DFAs ¶

Which languages can be recognized by a (not necessarily uniform) family of deterministic finite automata of polynomial size?

TCS.SE question

Context-freeness of primitive words ¶

A primitive word is a word that cannot be represented as a power of another word. Is it true that, on any alphabet with more than one letter, the set of all primitive words is not context-free?

There is a book about the question (not available online): P. Domosi, M. Ito, Context-Free Languages and Primitive Words

Words without shuffle squares (*)¶

Fix an alphabet $Σ$ . We say that a word $w \in Σ^{*}$ is the shuffle of two words $u, v \in Σ^{*}$ if we can obtain $w$ by interleaving $u$ and $v$ . A word $w \in Σ^{*}$ is a shuffle square if there is a word $u \in Σ^{*}$ such that we can obtain $w$ as the shuffle of $u$ and $u$ . (Note that this implies that, for each letter $a \in Σ$ , there is an even number of occurrences of $a$ in $w$ . While the number of occurrences of each letter in $u$ are uniquely defined from $w$ , the order of the letters is not.) We say that a word is shuffle-square-free if it contains no substring which is a shuffle-square.

If the alphabet $Σ$ has at least $6$ letters then it is known that there exist infinitely many shuffle-square-free words⁴⁹, following an earlier result⁵⁰ on alphabet size $7$ . If the alphabet $Σ$ has 3 letters or less, then it is known⁵¹ that such words do not exist: the longest such words on $Σ = {a, b, c}$ are abcacbacabc, acbabcabacb, bacbcabcbac, bcabacbabca, cabcbacbcab, cbacabcacba, of length 11 each. (I'm adding them to this entry to ensure that people interested in this computation can easily find the page. ;)) The question is open for alphabets of size 4 and 5.

Are there infinitely many shuffle-square-free words on an alphabet of size 4?

Source code repository which we used to bruteforce the computation of large shuffle-square-free words using a SAT solver. Some documentation is provided as a README. This is joint work with Charles Paperman.

Which regular tree languages can be recognised by a word automaton?¶

Given a tree alphabet $Σ$ , we define the alphabet $¯ Σ$ to consist of ${a ∣ a \in Σ} \cup {¯ a ∣ a \in Σ}$ . Given an unranked tree $T$ on a tree alphabet $Σ$ , the XML representation of $T$ is the word on $¯ Σ$ recursively defined as follows: the coding of a leaf labeled $a \in Σ$ is $a ¯ a$ , and the coding of an internal node labeled $a \in Σ$ with children $n_{1}, \dots, n_{k}$ is $a c_{1} \dots c_{k} ¯ a$ where each $c_{i}$ is the coding of $n_{i}$ .

Given a tree automaton $A$ on unranked trees on $Σ$ , it defines a so-called regular tree language $L (A)$ , and we denote by $X M L (L (A))$ the word language on $¯ Σ$ of the XML codings of trees of $L (A)$ . Of course, for all regular tree languages except finite ones, $X M L (L (A))$ is not regular as a word language, because a word automaton cannot check if the opening and closing tags (i.e., the $a$ 's and $¯ a$ 's) are properly nested. The question is intuitively to understand the regular tree languages for which matching the opening and closing tags is the only difficulty.

Formally, we say that a word automaton $A$ weakly recognises a tree language $L$ if, given a word $w$ on $¯ Σ$ which is the XML representation of some tree $T$ , then $A$ accepts $w$ iff $T \in L$ . (The behavior of $A$ on words of $¯ Σ$ that do not represent any tree is not specified.)

Given a tree automaton $A$ , is it decidable to determine if there exists a word automaton that weakly recognizes the language $X M L (L (A))$ ?

The problem, in the specific case of DTDs (a special case of tree automata), has been shown⁵² to be equivalent to a variant of the word problem for groups, whose decidability status is open. The status of the general problem is also open.

Thanks to Bartosz Bednarczyk for pointing out this problem to me, and to Charles Paperman for the problem phrasing used here.

Miscellaneous ¶

Lower bounds on representations of provenance ¶

The Why-provenance⁵³ of a Boolean query $q$ in the positive relational algebra on a relational instance $I$ is a Boolean function $ϕ$ defined on the facts of $I$ such that, for any valuation $ν$ mapping each fact of $I$ to true or false, $ν (ϕ)$ is true iff ${F \in I ∣ ν (F) is true}$ satisfies $q$ .

It is known⁵³ that, for any such fixed query $q$ , its Why-provenance on $I$ can be represented as a Boolean formula of polynomial size in the instance $I$ . Are there instance families with known lower bounds on the representation of Why-provenance? In particular, is there a query $q$ and family of relational instances $I_{1}, \dots, I_{n}, \dots$ , such that, letting $ϕ_{i}$ for all $i$ be the Why-provenance of $q$ on $I_{n}$ , the size of $ϕ_{i}$ is superlinear in $I_{i}$ , namely, there is no constant $K$ such that, for all $i$ , the provenance $ϕ_{i}$ can be written as a Boolean formula of size less than $K \cdot | I |$ , where $| I |$ is the number of facts of $I$ .

The question generalizes to other representations of provenance, such as provenance circuits⁵⁴, or to provenance expressed in different provenance semirings⁵³, such as the universal semiring $N [X]$ of polynomials in $X$ (standing for the facts of $I$ ).

Complexity of multi-machine scheduling of jobs with start dates, end dates, and equal duration ¶

We have a certain number $m$ of machines, and a number $n$ of jobs. All jobs have the same duration (an integer $p$ ), and each job has a minimum start date and a maximal end date, both of which are integers. The scheduling problem asks whether there is a way to schedule all $n$ jobs on the $m$ machines. We can phrase it in two variants: the decision variant simply asks whether it is possible, and the computation variant asks us to compute a possible schedule, i.e., a partition of the $n$ jobs into $m$ sequences such that each machine can perform its sequence of $m$ jobs in a way that respects the start date and end dates of all jobs, and the job duration. The problem can equivalently be phrased with jobs of unit duration and start and end dates which are rationals.

Somewhat surprisingly (to me at least), this problem can be solved in PTIME. The best known algorithm⁵⁹ (which additionally optimizes an additional criterion) runs in time $O (m i n (1, \frac{p}{m}) n^{2})$ . What is the most efficient algorithm for this problem?

For the case $m = 1$ , a more efficient algorithm is known⁶⁰, which runs in $O (n log n)$ . In this setting, there is a clear lower bound of $Ω (n log n)$ for the computation problem, as the scheduling problem for jobs with disjoint intervals amounts to sorting their bounds; but for the decision problem, this is unclear.

Thanks to the judges of the 2017 ACM-ICPC World Finals for bringing this problem to my attention (see Problem H, "Scenery").

Which convex polytopes have volumes of polynomial bit-length?¶

In this question, all numbers are represented as rationals. Consider a convex polytope described as an intersection of half-spaces, each of which is described as inequalities between a linear function of the coordinates and a constant (e.g., $x_{1} + \frac{7}{42} x_{2} \leq 3$ ). We assume that the polytope is bounded, i.e., it is not infinite. Given such a representation, we wish to compute the volume of the polytope. It turns out⁶¹ that this cannot be done in polynomial time (or even in nondeterministic PTIME) because the volume is not always of polynomial length in the input description. Hence the question: For which classes of polytopes is the volume always of polynomial length in the input?

TCS.SE question

Well-known conjectures ¶

The 1/3-2/3 conjecture: In every non-totally-ordered poset $P$ , there are $x$ and $y$ such that $x < y$ in at least 1/3 and at most 2/3 of the linear extensions of $P$ .
The removable pair conjecture: Can you always remove two elements from a poset of $\geq 3$ elements such the order dimension drops by at most 1?
The union-closed sets conjecture: calling a family of sets union-closed if the union of any sets in the family remains in the family, is it true that any union-closed finite family of finite sets must have an element occurring in at least half the sets of the family? (except the family containing just the empty set)
The reconstruction conjecture: Given a graph $G$ with $n$ vertices, its deck is the multiset of the $n$ subgraphs obtained, for each vertex, by deleting this vertex and its incident edges. For $n \geq 3$ , if two graphs have the same deck (up to isomorphism), are they necessarily isomorphic?
What is the complexity of matrix multiplication?
Can integer factorization be done in polynomial time?
The Zarankiewicz problem: given $m, n, s, t$ , how many edges can you put into a bipartite graph between $m$ vertices and $n$ vertices, such that there is no complete bipartite subgraph between $s$ vertices or $t$ vertices?
The strong Aanderaa–Karp–Rosenberg conjecture
The Erdős–Gyárfás conjecture: does every 3-regular graph have a simple cycle whose length is a power of two?
Singmaster's conjecture: is there a constant bounding the maximal number of times that a number different from 1 may appear in Pascal's triangle?
Tuza's conjecture: is it true that the smallest size of a subset of edges that hits all triangles of a graph is at most two times the largest number of edge-disjoint triangles that can be packed in the graph?
Is it possible to solve graph coloring and computing maximum independent sets in polynomial time on even-hole-free graphs?

Solved problems ¶

This section regroups problems that occurred in a previous version of this list, and were solved since then.

Complexity of an assignment problem with subsets ¶

It is known that a maximum matching can be found in PTIME. In particular, this is the case of bipartite graphs. However, what about the situation where we are allowed to remove vertices in one of the parts, and where we want to maximize the number of vertices that must remain unmatched in a maximum matching? What is the complexity of this problem?

TCS.SE question with precise statement

Proved to be in PTIME by Chao Xu in this TCS.SE answer. Thanks!

Decidability of conjunctive query determinacy in the finite ¶

The query determinacy problem asks, given a set $Q_{1}, \dots, Q_{n}$ of conjunctive queries (CQs) and an additional CQ $Q$ , whether the $Q_{i}$ determine $Q$ , i.e., whether, for any two (possibly infinite) relational databases $I$ and $I^{'}$ , having $Q_{i} (I) = Q_{i} (I^{'})$ for all $1 \leq i \leq n$ implies we must have $Q (I) = Q (I^{'})$ . The query determinacy problem was shown⁴² to be undecidable but this work left open the problem of finite determinacy, which is defined as determinacy except that $I$ and $I^{'}$ must be finite.

Is finite determinacy also undecidable for conjunctive queries?

Proved by the authors of the original work⁴² in a new paper. As this was accepted for publication at PODS 2016, I think we can say that this has been accepted by the community, although I haven't personally checked the proof.

Steinberg's conjecture ¶

Grötzch's theorem says that planar graphs without 3-cycles (triangles) are 3-colorable. Is every planar graph without 4-cycles and 5-cycles 3-colorable?

The question on Open Problem Garden

Disproved in this paper. I haven't personally checked it, but it has been published in the Journal of Combinatorial Theory, so I'm assuming it is correct.

Feder-Vardi conjecture ¶

Letting $Γ$ be a digraph, the CSP for $Γ$ , written $CSP (Γ)$ , is the following problem: given a digraph $G$ , decide whether there is a homomorphism from $G$ to $Γ$ . The Feder-Vardi conjecture³³ asks: Is there a dichotomy on $Γ$ , with $CSP (Γ)$ being always either in PTIME or NP-hard?

An analogous result is already known for undirected graphs³⁴, for some restricted classes of directed graphs³⁵, and when $Γ$ has two elements³⁶ or more recently three elements³⁷.

Proved: two recent preprints by Bulatov and by Zhuk claim a proof of this result. The papers were accepted for publication at FOCS 2017 so I'm classifying this as solved (accepted by the community), even though it may still be a good idea to verify these proofs in more detail (see this post). I have not personally checked any of this.

Thanks to Mikaël Monet for pointing out that the conjecture for digraphs implies³⁸ the conjecture for relational structures.

Complexity of counting linear extensions in posets of height 2 ¶

It is #P-hard to count the number of linear extensions of an input poset². The proof uses posets of height 3, and leaves open the natural question: is it #P-hard to count linear extensions in posets of height 2?

An equivalent formulation is in terms of directed bipartite graphs (all edges go from one part to the other part): given a directed bipartite graph as input, is it #P-hard to count the number of topological sorts of the graph?

Solved in this paper (shown to be #P-hard). Note that I haven't personally checked it, and it does not seem to have been peer-reviewed yet. Thanks to Kuldeep S. Meel for pointing me to that paper.

Complexity of counting linear extensions in posets of dimension 2 ¶

This is similar to the question above, but for order dimension: is it #P-hard to count linear extensions in posets of dimension 2?

Solved in the same paper as above (shown to be #P-hard).

Retired problems ¶

This section contains former questions which I no longer think are interesting or sensible, or for which my interest has waned.

What is the connection between the fluted fragment and inversion-free queries?¶

The fluted fragment is a fragment of function-free first-order logic for which the satisfiability problem is decidable¹² (but non-elementary). This definition is reminiscent of inversion-free queries, which are known¹³ to be exactly the UCQs that admit OBDD lineages on TID databases (see this question for relevant definitions).

Is there any connection between these two classes?

The original version of this question incorrectly claimed that inversion-free queries are the positive existential fragment of fluted queries, but this is in fact not correct: there are non-hierarchical CQs, e.g., $\exists x y R (x), S (x, y), T (y)$ , which are fluted. So while there is still a connection in the definitions, the link is no longer as compelling, and I don't think the question is so interesting anymore.

Thanks to the anonymous Reviewer 3 of our paper at PODS 2016 for pointing out this connection. Thanks to Bartosz Bednarczyk for bringing back this problem to my attention, which made me realize the error.

Compactness difference between probabilistic XML document formalisms ¶

The PrXML formalism is a way to represent probability distribution on labeled, unranked, unordered trees⁵⁵. It does so by representing uncertainty with special kinds of nodes:

ind nodes have each child labeled with a probability in $[0, 1]$ indicating the probability (independently across draws) that the child node is kept, or that it is discarded along with its descendants; the ind node is replaced by the collection of its kept children;
mux nodes have each child labeled with a probability in $[0, 1]$ , the probabilities summing to at most $1$ , and one of the children is kept by drawing according to the probabilities (if the sum is $< 1$ there is a probability of keeping no child); the children that are not kept are removed with their descendants, and the mux node is replaced by its kept child (if any);
det nodes are deterministically replaced by the collection of their children;
cie nodes have their children labeled by a conjunction of Boolean variables from a global set of variables, where each variable has some probability in $[0, 1]$ of being true (the same variables can be used in multiple cie nodes); the semantics is that a valuation for the variables is drawn by setting each variable as true or false with the indicated probabilities, independently, and replacing each cie node by the children whose edge annotation evaluates to true under the chosen variable valuation;
exp nodes are annotated with an explicit probability distribution: a set of pairs of a subset of children of the nodes and a probability, the probabilities summing to 1.

We give names to families of probabilistic XML documents according to the nodes that are allowed, e.g., PrXML^{ind, mux} refers to probabilistic documents where nodes of type ind and mux are allowed. A family is efficiently translatable in another if any document of the first family can be rewritten to a document in the second. Some families are known to be efficiently translatable in others⁵⁶, but open questions remain: is PrXML^exp efficiently translatable in PrXML^mux,det, or even to PrXML^cie?

Thanks to Pierre Senellart for helping me to prepare this entry.

Complexity of testing the equivalence of PrXML^cie ¶

Using the previous definitions, the semantic equivalence problem for two PrXML documents asks whether the two documents define the same probability distribution on outcomes (same possible worlds, with same probabilities). Given two PrXML^cie documents, what is the probability of deciding whether they are semantically equivalent? The problem is in EXPTIME but no lower bound or hardness result is known⁵⁷. What is the complexity of this problem?

The structural equivalence problem asks, given two PrXML^cie documents, whether, for each valuation of the variables, the corresponding valuations of both documents are the same. (Note that the probabilities assigned to the events is irrelevant.) The problem is in coNP, but no lower bound or hardness result is known⁵⁸. What is the complexity of this problem?

Thanks to Pierre Senellart for helping me to prepare this entry.

J. S. Provan, M. O. Ball. The Complexity of Counting Cuts and of Computing the Probability that a Graph is Connected, 1983. ↩
G. Brightwell, P. Winkler. Counting Linear Extensions is #P-Complete, 1991. ↩
G. Pruesse, F. Ruskey. Generating Linear Extensions Fast, 1994. (closed access) ↩
K. Ragnarsson, B. Eileen Tenner. Obtainable Sizes of Topologies on Finite Sets, 2008. To understand the connection, remember that a topology with $m$ points and $k$ open sets is the same as a poset with $m$ elements and $k$ order ideals (as they state in the abstract), and this is the same as a poset with $m$ elements and $k$ antichains (take the set of maximal elements of each order ideal). Their Theorem on page 2 gives the logarithmic bound. We independently derived a similar result with a similar proof but a worse bound in our paper with Yael Amsterdamer and Tova Milo, Lemma B.2, and the size of the posets is in $O ({log}^{2} n)$ . ↩
S. Akmal, V. V. Williams, R. Williams, Z. Xu., Faster Detours in Undirected Graphs, 2023. Look for "it remains open whether k-Longest Detour is in P even for the special case of k = 1!" ↩
W. Martens, T. Popp. The Complexity of Regular Trail and Simple Path Queries on Undirected Graphs, 2022. ↩
E. M. Arkin, C. H. Papadimitriou, M. Yannakakis. Modularity of Cycles and Paths in Graphs, 1991. ↩
E. Hemaspaandra, H. Spakowski, M. Thakur. Complexity of Cycle Length Modularity Problems in Graphs, 2004. ↩
B. Fredriksson, E. Lundberg. The Monk Problem: Verifier, heuristics and graph decompositions for a pursuit-evasion problem with a node-located evader, 2015. ↩
B. Groz, T. Milo. Skyline Queries with Noisy Comparisons, 2015. ↩
A. Calì, G. Gottlob, G. Orsi, A. Pieris. On the Interaction of Existential Rules and Equality Constraints in Ontology Querying, 2012. ↩
I. Pratt-Hartmann, W. Szwast, L. Tendera. Quine’s Fluted Fragment is Non-elementary, 2016. ↩
A. Jha, D. Suciu. Knowledge Compilation Meets Database Theory: Compiling Queries to Decision Diagrams, 2011. See Theorem 4.2. ↩
D. Suciu, D. Olteanu, C. Ré, C. Koch. Probabilistic Databases, 2011. See end of Section 4.1.6.1. ↩↩
T. Eiter, K. Makino, G. Gottlob. Computational Aspects of Monotone Dualization: A Brief Survey, 2008. ↩
D. G. Corneil, M. Habib, J.-M. Lalignel, B. Reed, U. Rotics. Polynomial-time recognition of clique-width ≤ 3 graphs, 2012 (only available behind a paywall). ↩↩
P. Heggernes, D. Meister, U. Rotics. Computing the clique-width of large path powers in linear time via a new characterisation of clique-width), ↩
P. Hliněný, S. Oum. Finding Branch-Decompositions and Rank-Decompositions, 2007. ↩
N. Roberston, P. D. Seymour. Graph Minors. V. Excluding a Planar Graph, 1986. ↩
J. Chuzhoy, Improved Bounds for the Excluded Grid Theorem, 2016. See Theorem 1.1, and the three following paragraphs. ↩↩↩↩
N. Roberston, P. D. Seymour. R. Thomas. Quickly Excluding a Planar Graph, 1994. See last paragraph of Section 1. ↩
C. Chekuri, J. Chuzhoy. Polynomial Bounds for the Grid-Minor Theorem, 2014. ↩
C. Chekuri, personal communication, 2015. (That's academic lingo to say that we emailed them and they don't know yet.) ↩
J.-F. Baget, M. Bienvenu, M.-L. Mugnier, S. Rocher. Combining Existential Rules and Transitivity: Next Steps, 2015. ↩
I. Pratt-Hartmann. Data-complexity of the two-variable fragment with counting quantifiers, 2009. ↩↩
V. Bárány, G. Gottlob, M. Otto. Querying the Guarded Fragment, 2014. ↩
A. Calì, G. Gottlob, A. Pieris. Towards more expressive ontology languages: The query answering problem, 2012. See Section 6.2.2. ↩
J. Mitchell. The Implication Problem for Functional and Inclusion Dependencies, 1983 ↩
A. Calì, D. Lembo, R. Rosati. Query rewriting and answering under constraints in data integration systems, Theorem 3.4. However, the proof of this result has a slight inaccuracy; see Appendix A.2 of this paper for details. ↩
G. Gambosi, J. Nešetřil, M. Talamo. On locally presented posets, 1990. ↩
S. Felsner, T. Mészáros, P. Micek. Boolean dimension and tree-width, 2018. ↩↩
H.-G. Leimer. Optimal decomposition by clique separators, 1993. ↩
T. Feder, M. Y. Vardi. The Computational Structure of Monotone Monadic SNP and Constraint Satisfaction: A Study through Datalog and Group Theory, 1998. ↩
P. Hell, J. Nešetřil. On the complexity of H-coloring, 1990. ↩
L. Barto, M. Kozik, M. Maróti, T. Niven. CSP dichotomy for special triads, 2009. ↩
T. J. Schaeffer. The complexity of satisfiability problems, 1978. ↩
A. A. Bulatov. A dichotomy theorem for constraints on a three-element set, 2002. ↩
T. Feder, M. Y. Vardi. The Computational Structure of Monotone Monadic SNP and Constraint Satisfaction: A Study through Datalog and Group Theory, 1998. See Theorem 10 and subsequent theorems. ↩
S. S. Cosmadakis, P. C. Kanellakis, M. Y. Vardi. Polynomial-time implication problems for unary inclusion dependencies (only available behind a paywall), 1990. ↩
Y. Ibañez-García, C. Lutz, T. Schneider. Finite Model Reasoning in Horn Description Logics, 2014. ↩
G. Kourtis, I. Pratt-Hartmann. Adding Path-Functional Dependencies to the Guarded Two-Variable Fragment with Counting, 2016. ↩
T. Gogacz, J. Marcinkowski. The Hunt for a Red Spider: Conjunctive Query Determinacy Is Undecidable, 2015. ↩↩
N. Francis. Asymptotic Determinacy of Path Queries using Union-of-Paths Views, 2015. ↩
G. Głuch, J. Marcinkowski, P. Ostropolski-Nalewaja. Can One Escape Red Chains? Regular Path Queries Determinacy is Undecidable, 2018. ↩
G. Głuch, J. Marcinkowski, P. Ostropolski-Nalewaja. The First Order Truth behind Undecidability of Regular Path Queries Determinacy, 2019. ↩
N. Dalvi, D. Suciu. The Dichotomy of Probabilistic Inference for Unions of Conjunctive Queries, 2012. ↩↩
A. Jha, D. Suciu. Knowledge Compilation Meets Database Theory: Compiling Queries to Decision Diagrams, 2011. ↩↩
T. Gogacz, J. Marcinkowski. On the BDD/FC Conjecture, 2014. ↩
L. Bulteau, V. Jugé, S. Vialette. On shuffled-square-free words, 2023. ↩
G. Guégan, P. Ochem. A Short Proof that Shuffle Squares are 7-Avoidable, 2016. ↩
L. Bulteau, S. Vialette. Recognizing Binary Shuffle Squares is NP-Hard, 2019 (only available behind [a paywall}(https://www.sciencedirect.com/science/article/pii/S0304397519300258)). Look for "Exhaustive enumeration shows..." ↩
L. Segoufin, C. Sirangelo. Constant-Memory Validation of Streaming XML Documents Against DTDs, 2007. ↩
T. Green, G. Karvounarakis, V. Tannen. Provenance Semirings, 2007. ↩↩↩
D. Deutch, T. Milo, S. Roy, V. Tannen. Circuits for Datalog Provenance, 2014. ↩
B. Kimelfeld, P. Senellart. Probabilistic XML: Models and Complexity, 2013. ↩
E. Kharlamov, W. Nutt, P. Senellart. Updating Probabilistic XML, 2010. See in particular Figure 3. ↩
P. Senellart. Understanding the Hidden Web. PhD thesis, 2007. See Section 2.6.2. ↩
P. Senellart. Understanding the Hidden Web. PhD thesis, 2007. See Section 2.4. ↩
A. López-Ortiz, C.-G. Quimper. A Fast Algorithm for Multi-Machine Scheduling Problems with Jobs of Equal Processing Times, 2011. ↩
M.R. Garey, D.S. Johnson, B.B. Simons, R.E. Tarjan. Scheduling Unit–Time Tasks with Arbitrary Release Times and Deadlines, 1981 (only available behind a paywall). ↩
J. Lawrence. Polytope volume computation, 1991. See pages 11 and 12. ↩