Constraint Programming and Database Query Languages
Paris C Kanellakis ? and Dina Q Goldin ?
Brown University, Providence, RI 02192, USA
Abstract. The declarative programming paradigms used in constraint languages can lead to powerful extensions of Codd's relational data model. The development of constraint database query languages from logical database query languages has many similarities with the development of constraint logic programming from logic programming, but with the additional requirements of data e cient, setatatime, and bottomup evaluation. In this overview of constraint query languages (CQLs) we rst present the framework of 41]. The principal idea is that: \the ktuple (or record) data type can be generalized by a conjunction of quanti erfree constraints over k variables". The generalization must preserve various language properties of the relational data model, e.g., the calculus/algebra equivalence, and have time complexity polynomial in the size of the data. We next present an algebra for dense order constraints that is simpler to evaluate than the calculus described in 41], and we sharpen some of the related data complexity bounds. We note that CQLs are applicable to spatial databases. This is because these languages have \spatial point set" as the semantics of their record data type and because existing multidimensional searching data structures can support I/O e cient access to sets of records. Finally, we observe that CQLs can be augmented with complex object data types, aggregate operations, and nullvalues, just like the relational data model.
1 From Constraint Programming to Database Querying
1.1 Declarative Programming with Constraints
Constraint programming paradigms are inherently declarative, since they implicitly describe computations by specifying how these computations are constrained. Programming with constraints as primitives (or constraint programming) is appealing because constraints are the normal language of discourse for many highlevel applications. Pioneering work in constraint programming goes back to the early 1960's, e.g., Sutherland's SKETCHPAD 67]. The theme has been investigated since the 1970's, e.g., in arti cial intelligence 31, 54, 55, 66], in graphicalinterfaces 10], and in logic programming languages 37, 22, 27]. The relevant literature and contributions are too large to attempt a survey. Instead we will limit our exposition to recent applications in databases.
? Research was supported by ONR grant N0001491J4052 ARPA Order No. 8225.
Perhaps one of the most important advances in constraint programming in the 1980's has been the development of Constraint Logic Programming (CLP) as a generalpurpose framework for computations, e.g., in CLP(<) 37], in Prolog III 22], and in CHIP 27, 72]. For more information on CLP see the surveys in 21, 51], the proceedings of logic programming conferences and of recent workshops on the subject, e.g., 42]. In particular, the advances in CLP are very relevant for database applications (which brings us to the subject of this paper). One reason is the many connections between databases and logic programming. Starting with the Japanese 5th Generation Project, these connections were explored in depth during the last decade. The declarative style of database query languages is an important aspect of database systems, that has been at the core of the relational data model since Codd's pioneering work 20]. Indeed, having such a language for adhoc database querying is a requirement in today's commercial relational technology. Querying in Codd's relational calculus is a form of limited (but very successful) constraint programming. It is rather surprising that other advances in constraint programming have not yet in uenced database query language design. However, we believe that this will soon change and there will be (in the 1990's) integration of constraint programming and databases. One intuitive reason for the success of CLP is as follows. A strength of a typical logic programming language (e.g., Prolog) is its topdown, depth rst search strategy. The operation of rstorder term uni cation, at the forefront of this search, is a special form of e cient constraint solving. Additional constraint solving increases the depth of the search and, thus, the e ectiveness of the approach. Unfortunately, the bottomup and setatatime style of evaluation emphasized in relational databases, and more recently in knowledge bases, seems to contradict the topdown, depth rst intuition behind CLP. The main contribution of the Constraint Query Language (CQL) framework in 41] was to demonstrate that it is possible to bridge the gap between: bottomup, e cient, declarative database programming and e cient constraint solving. One key intuition came from CLP: a conjunction of quanti erfree constraints over k variables, where k depends on the database schema and not the instance, is the correct generalization of the ktuple or record or ground fact. The technical tools for this integration were: data complexity 17, 73] from database theory, and quanti er elimination methods from mathematical logic 28, 29, 68, 23, 8, 48, 60]. More speci cally, nite relations are generalized to nitely representable relations and appropriate algebras for their data manipulation can be developed in this framework. In many cases, the algebras have polynomial time data complexity. This use of data complexity (a common tool for studying expressibility in nite model theory) distinguishes the CQL framework from arbitrary (and inherently exponential) theorem proving 30, 9, 15].
Example 1. Let us illustrate database querying and introduce some of its no
tation. The meaning and use is (or rather should be) intuitively obvious. For formal de nitions we refer the reader to 40, 69]. To see that constraints should be part of any database query language consider an example from 41].
f is amount spent for food, r for rent, m for miscellaneous, s for transfer to savings, w for wages, and i for interest. The intended output of this query is the list of userids whose checkbooks balance. As is typical in data processing applications, we assume that the input relations are much larger than the query program and they might reside on secondary storage. In nonrecursive Datalog notation and using a single linear equation constraint we express the following \balanced checkbook" query. Balanced(x) : Expenses(x; f; r; m); Savings(x; s); Income(x; w; i); C C (f + r + m + s = w + i) In relational technology even a little arithmetic, like the linear equation of this example, is outside the data model and requires special adhoc processing. Linear equations, although quite simple theoretically, can be interesting from a practical point of view. Also, as illustrated in 41], the simple case of linear equations can be related to the relational database theory theme of query homomorphisms. 2 One point of the above example is that constraints integrate naturally in the core database query syntax (called tableaux in the relational databases of the 1970's and nonrecursive Datalog in the knowledge bases of the 1980's). Another point is that this constraint program has a natural analog in constraint maintenance. We can think of a scenario where the checkbook should always balance, and any update to the database would have to preserve the constraint C. Constraint Programming vs Maintenance: In databases the term constraints is most often used in connection with \integrity constraints" and with their maintenance. Note the di erence between constraint maintenance, where an update might trigger some additional computation (like in spreadsheets) and database retrieval, where a new relation is de ned based on constraints. Constraint programming is a highlevel programming style, which is often supported by integrity constraint maintenance algorithms. 2 The CQL framework of 41] provided a uni ed view of some previous database research: for example, on the power of constraints for the implicit speci cation of temporal data 19], for extending relational algebra 33, 46], for magic set evaluation 58], and for logic program analysis 13]. CQLs also extend the approach to safe queries of 4, 35, 44, 58]. The example CQLs of 41] are applicable to spatial databases, where dense order is present. Temporal databases require the development of analogous frameworks for the theory of discrete order with constants see 39]. Complete vs Partial Information: The key concept in CQL, illustrated in the next subsection is that constraints describe pointsets, such that all their points are in the database. With the appropriate constraint theory these pointsets are accurate (and perhaps the most intuitive) representations of spatial objects. For e ciency reasons we require that these constraints be quanti erfree, just like the CLP axioms are quanti erfree. The 41] framework is thus one of complete information.
This is a query program with Expenses, Savings and Income input relations, Balanced output relation, and a single linear equation constraint: x is userid,
The problem of managing partial information in databases has received a fair amount of attention, see 40] for pointers to the database literature. For example, formulas with variables called nullvalues can be thought of as formulas representing many possible states (of which one is true). They would correspond to constraints that have existential quanti ers. In this sense, constraint programming has already been applied extensively to incomplete information databases. However, it is well known, that the existential quanti cation results in a number of di culties with the e cient processing of nullvalues and partial information in general. It is easy to see that, just like the relational data model, constraint frameworks can be augmented with partial information constructs, such as nullvalues, but at the expense of data complexity. Constraint logic programming paradigms are currently attracting a great deal of attention in languages for operations research applications 72] and have also impacted the eld of concurrent programming language design 64]. The use of constraints for operations research and for concurrency is sometimes semantically di erent from their use here. For example: Constraints can be used to represent the many possible states (of which one is true) of a set of concurrent processes. Each individual concurrent process maintains and manipulates constraints that describe the partial information it has about the state of all processes. These semantic di erences are due to the use of complete vs partial information. 2 Manipulation of spatial data is an important application area (e.g., spatial or geographic databases) that requires both relational query language techniques and arithmetic calculations. Indexes for range searching and modeling of complex structures have been used to bridge the gap between declarative accessing of large volumes of spatial data and performing common computational geometry tasks. The resulting extended relational systems are somewhat adhoc. Objectoriented database technology has also been applied to spatial databases with mixed success. Although, objectoriented databases emphasize rich and exible data types, they do not incorporate the geometry of point sets in their data models. This geometry is implicit in constraints and a particular strength of CQLs. To illustrate our goals for integration of application, language paradigm, and implementation, let us revisit the canonical successful case of integration in the relational data model. In this data model, an important application area (data processing) is described in a declarative style (relational calculus) so that it can be automatically and e ciently translated into procedural style (relational algebra). Program evaluation is bottomup and setatatime as opposed to topdown and tupleatatime, because the applications involve massive amounts of structured data. This evaluation may be optimized, e.g., via algebraic transformations, selection propagation etc. It may be performed incore in polynomial time, because of the low complexity of the calculations expressed. Most importantly, it may be implemented e ciently with large amounts of data in secondary storage via indexing and hashing.
1.2 Spatial Database Applications
r
(a2 ; d2 )
r
r
(c2 ; d2 )
r
(a3 ; d3 ) (a3 ; b3 )
r
(c3 ; d3 ) (c3 ; b3 )
r
(a1 ; d1 ) (a2 ; b2 )
r
(c1 ; d1 )
r r
r
(c2 ; b2 )
r
(a4 ; d4 ) (a4 ; b4 )
r
(c4; d4 ) (c4 ; b4 )
r
(a1 ; b1 )
r
(c1; b1 )
r
r
Fig.1. Rectangle intersection
We believe that by generalizing relational formalismsto constraint formalisms it is, in principle, possible to generalize all the key features of the relational data model to spatial data models. (1) The language framework preserves the declarative style and the e ciency of relational database languages. (2) The possible applications of constraint databases include both data processing and numerical processing of spatial data. (3) The implementation technology of spatial access methods (see 62, 63]) naturally matches the new formalism.
Example 2. Let us try to illustrate these points (13) with an example. A very common task from computational geometry and spatial databases is the problem of computing all rectangle intersections 57, 63]. The database consists of a set of rectangles in the plane (the four rectangles of Figure 1) and we want to compute all pairs of distinct intersecting rectangles. Note that the theory of constraints used in this simple, but very common, example is the theory of dense order with constants. This query is expressible in a relational data model that has a interpreted predicate. One possibility is to store the data in a 5ary relation named R. This relation will contain tuples of the form (n; a; b; c; d), and such a tuple will mean that n is the name of the rectangle with corners at (a; b), (a; d), (c; b) and (c; d). We can express the intersection query in relational calculus or rstorder logic over nite structures as follows:
f(n1; n2)jn1 6= n2 ^ (9a1 ; a2; b1; b2; c1; c2; d1; d2) (R(n1; a1; b1; c1; d1) ^ R(n2 ; a2; b2; c2; d2) ^(9x; y 2 fa1 ; a2; b1; b2; c1; c2; d1; d2g) (a1 x c1 ^ b1 y d1 ^ a2 x c2 ^ b2 y d2 ))g
To see that this query expresses rectangle intersection note the following: the two rectangles n1 and n2 share a point if and only if they share a point whose coordinates belong to the set fa1; a2; b1; b2; c1; c2; d1; d2g. This can be shown by exhaustively examining all possible intersecting con gurations. One could eliminate the (9x; y) quanti cation altogether and replace it by a boolean combination of atomic formulas, involving the various cases of intersecting rectangles. The above query program is particular to rectangles and does not work for triangles or for interiors of rectangles. Recall that, in the relational data model quanti cation is over constants that appear in the database. By contrast, if we use generalized relations the query can be expressed very simply (without case analysis) and applies to more general shapes. Let RC(z; x; y) be a ternary relation. Consider a CQL (to be de ned more formally in the next subsection) where we interpret RC(z; x; y) to mean that (x; y) is a point in the rectangle with name z. The rectangle that was stored above by (n; a; b; c; d), would now be stored as the generalized tuple (z = n)^(a x c) ^ (b y d). The meaning of this generalized tuple is an in nite set of points. The set of all intersecting rectangles can now be expressed as: Intersects(n1 ; n2) : n1 6= n2 ; RC(n1; x; y); RC(n2; x; y) The simplicity of this program is due to the ability in CQL to describe and name pointsets using constraints. The same program can be used for intersecting triangles. 2 Of course the ease of expression in the above example is there because the (implicit) existential quanti cation of x; y in the above intersection rule is over an in nite domain. But, despite this, there are e cient evaluation techniques. Example 3. In the above example we used relational calculus notation and nonrecursive Datalog notation. Let us continue with a calculation of the pairs of rectangles that are reachable from each other in recursive Datalog notation. Note that again the x; y are quanti ed over an in nite domain. Reachable(n1 ; n2) : Reachable(n1 ; n3); Reachable(n3 ; n2) Reachable(n1 ; n2) : RC(n1 ; x; y); RC(n2; x; y) As shown in 41] recursive Datalog with dense order is evaluable in polynomial time. The algorithm used there was a bottomup version of CLP(<). In general, more complex constraints and recursion lead to languages that can compute all partial recursive functions. However, one could limit recursion to nite domains as in this example, where recursion happens to be over the nite set of rectangle names. 2 The above examples may be complemented with many other computational geometry tasks, such as computing convex hulls and Voronoi diagrams 57]. In these examples the typical computational geometry input of \N arbitrary
points" becomes an input database of size N and the task speci cation becomes a xed size CQL program. Most natural computational geometry tasks are expressible declaratively using CQLs that combine rstorder logic with simple arithmetic constraints. They can then be evaluated in a procedural fashion using the generalpurpose evaluation method that is a CQL intrepreter. Special computational geometry algorithms would be used by a CQL compiler as optimizations additional to this generalpurpose evaluation method. A very important optimization involves indexing particular variables in the generalized tuples. In the above examples such indexing would correspond to I/O e cient processing of the projections of the rectangles on the x and y dimensions. More speci cally, as shown in (the journal version of) 41], if in a CQL the projection of any generalized tuple on x is an interval, then the problem of 1dimensional searching on x in the constraint data model is a special case of 2dimensional searching in the relational data model. We will treat this optimization in detail in Section 4 of this paper because we believe that it is the most practical one for spatial databases. Fortunately, current spatial database access methods are applicable to indexing in the CQL framework. Let us now describe this framework as presented in 41].
1.3 The CQL Framework
(1) A generalized ktuple is a quanti erfree conjunction of constraints on k variables, which range over a set D. There are many kinds of generalized tuples depending on the kind of constraints used. In all cases equality constraints among individual variables and constants are allowed. In the relational database model R(3; 4) is a tuple of arity 2 or ground fact. It is a single point in 2dimensional space representable also as R(x; y) with x = 3 and y = 4, where x; y range over some nite set. R(x; y) with (x = y ^ x < 2) is a generalized tuple of arity 2 and so is R(x; y) with x + y = 2:5, where x; y range over the rational or the real numbers. Hence, a generalized tuple of arity k is a nite representation of a possibly in nite set of tuples of arity k (or points in kdimensional space Dk ). (2) A generalized relation of arity k is a nite set of generalized ktuples, with each ktuple over the same variables. It is a rstorder formula in disjunctive normal form (DNF) of constraints, which uses at most k variables ranging over set D. Each generalized relation describes a possibly in nite set of arity k tuples (or points in kdimensional space Dk ). A generalized database is a nite set of generalized relations. (3) The syntax of a CQL is the union of an existing database query language and a decidable logical theory. For example: Relational calculus 20] + the theory of real closed elds 68, 23, 8, 48, 60]; Relational calculus 20] + the theory of real (or Presburger) addition 29, 30, 9, 15]; Relational calculus 20] + the theory of discrete order with constants. In ationary Datalog 3, 32, 47] + the theory of dense order with constants 28]; In ationary Datalog + the theory of equality on an in nite domain with constants.
: :
(4) The semantics of a CQL is based on that of the decidable logical theory, by interpreting database atoms as shorthands for formulas of the theory. Let
= (x1 ; : : :; xm ) be a CQL query program using free variables x1; : : :; xm . Let predicate symbols R1, : : :, Rn in name the input generalized relations and let r1 ; : : :; rn be corresponding input generalized relations. We interpret the program in the context of such an input. Let r1=R1; : : :; rn=Rn] be the formula of the theory that is obtained by replacing in each database atom Ri(z1 ; : : :; zk ) by the DNF formula for input generalized relation ri , with its variables appropriately renamed to z1 , : : :, zk . The output is the possibly in nite set of points in mdimensional space Dm , such that instantiating the free variables x1 ; : : :; xm of formula r1=R1; : : :; rn=Rn] to any one of these points makes the formula true. (Wlog an occurrence of a database atom in is of the form Ri(z1 ; : : :; zk ) 1 i n, where Ri is of arity k and z1, : : :, zk are distinct variables; this is because in our framework we can always use equality constraints). (5) For each input, the queries must be evaluable in closed form and bottomup. By closed form we mean that the output of any query program applied to any input generalized relations must be a generalized relation. The analogue for the relational model is that relations are nite structures, and queries are supposed to preserve this niteness. This is a requirement that creates various \safety" problems in relational databases 20, 69]. The precise analogue in relational databases is the notion of weak safety of 4]. Evaluation of a query corresponds to an instance of a decision problem. Quanti er elimination procedures realize the goal of closed form and use induction on the structure of formulas, which leads to bottomup evaluation. (6) For each input, the queries must be evaluable e ciently in the input size, i.e., with at least PTIME data complexity. Database atomic formulas indicate, in the declarative query language itself, the parts that can grow asymptotically versus the parts that are constantsize. By xing the program size and letting the database grow, one can prove that the evaluation can be performed in PTIME or in NC or in LOGSPACE, depending on the constraints considered (for the various complexity classes see 38]).
1.4 Languages, e ciency, e ciency,
:::
In the remainder of this paper we explore CQLs with an emphasis on quantifying language e ciency. In Section 2, we present an algebra for dense order constraints (due to Goldin) which is simpler to evaluate than the calculus described in 41]. In Section 3, we sharpen some of the related data complexity bounds and comment on the current stateoftheart of data complexity analysis. We also observe that complex object data types can easily be added to the CQL framework. In Section 4, we describe how the CQL framework can be supported by existing multidimensional searching data structures. We close, in Section 5, with some open questions and some recent developments.
2 An Algebra for Dense Order Constraints
Dense order inequality constraints are all formulas of the form x y and x c, where x; y are variables, c is a constant, and is one of =, , < (or its negation 6=, >, ). We assume that these constants are interpreted over a countably in nite set D with a binary relation which is a dense order (e.g., we can take D to be the rationals). Constants, =, , and < are interpreted respectively as elements, equality, the dense order, and the irre exive dense order of D. For the rstorder theory of dense order see 28] and for its data complexity (with and without recursion) see 41]. In this section we present a rstorder algebra for dense order constraints, which is equivalent to the calculus of 41] and is simpler. We rst assume that the generalized tuples have a speci c form, beyond being conjunctions of constraints, this is De nition 1. It is easy to see that any conjuction of constraints can be transformed into a union of such generalized tuples. In our algebra, we use generalized tuples that are \tighest" or canonical (Subsection 2.1). We show that this algebra has the desirable closure property (Subsection 2.2).
sists of two sequences l = (l1 ; : : :ln ) and u = (u1; : : :; un), and an n by n matrix = ( 1;1; 1;2; : : :; n;n), where: (a) the li 's are constants in D f?1g; (b) the ui's are constants in D f+1g; (c) for all i, li ui; (d) for all i; j, i;j is one of =, <, >, ?. W.l.o.g., we can assume that the entries of along the diagonal are =, and is antisymmetric (e.g., if x < y then y > x). 2 De nition2. Each generalized ntuple over variables X = fx1; : : :; xng de nes a formula with free variables X, and a set of points in Dn . This pointset is the semantics of the generalized tuple. 1. For a generalized ntuple t = (l; u; ), the formula F(t), with n free variables fx1; : : :; xng, is the conjunction of n2 + n constraints i and i;j : (a) for all tuples (li ; ui); i is (xi = li ) if li = ui , and (li < xi < ui ) otherwise; (b) for all entries i;j ; i;j is (xi i;j xj ) if i;j is not ?, and TRUE otherwise. 2. For a generalized ntuple t = (l; u; ), the corresponding set of points P(t) is the set of all points x = (x1 ; : : :; xn) satisfying F(t). 2 Example 4. The following is a tuple t on variables (A; B; C) whose formula is (?1 < A < 5 ^ 1 < B < 7 ^ C = 6 ^ A > B). 2
2.1 Generalized Tuples and Relations in Canonical Form De nition1. A generalized ntuple t = (l; u; ) over variables x , : : :, xn con1
In tabular form we represent generalized tuples as follows: t A BC l ?1 1 6 u 5 7 6 = > ? < = ? ? ?= We can de ne a partial order on all tuples that share the same set of variables: De nition3. Let t and t be generalized tuples over variables X, with the 's and 's as in De nition 2. t is tighter than t if: 8i; i is tighter than i and 8i; j; i;j is tighter than i;j , where constraint is tighter than constraint i the universal closure of ( ) ) is true in D. 2 If t is tighter than t, then the universal closure of (F(t ) ) F(t)) will also be true, so P(t ) P(t). We can also de ne equivalence classes over generalized tuples w.r.t. their sets of points: De nition4. Let t1 ; t2 be generalized tuples over variables X. Then, t1 and t2 belong to the same equivalence class E i P(t1 ) = P(t2 ). Proposition5. For each class E (except for the class of tuples whose point set is empty) there is a unique member tc such that 8t 2 E , tc is tighter than t. We will refer to this tuple tc as the canonical representation of all members of E. It can be constructed e ciently as follows: Proposition6. Let t be a generalized tuple such that F(t) is satis able. Then, t is a canonical representation of t if: (a) 8i; i is the tightest constraint on constants in t such that F(t) ) i ; (b) 8i; j; i;j is the tightest constraint such that F(t) ) i;j . 2 It is easy to show that the canonical representation for any generalized ntuple can be found e ciently: Proposition7. Given a generalized ntuple t, it is possible in time T = O(nc) for some constant c to determine whether t is satis able, and if so, to nd the canonical representation of t. We will call a generalized tuple t canonical if P(t) is not empty and t is its own canonical representation. Example 5. The tuple t in the previous example is not canonical because:
0 0 0 0 0 0 0 0 0 0 0 0 0 0
(A < 5 ^ A > B) ) B < 5, but l2 = 7. The following tuple t is a canonical representation of t:
0
t ABC l 116 u 556 =>< <=< >>=
0 0 0 0
F(t ) = (1 < A < 5 ^ 1 < B < 5 ^ C = 6 ^ A > B ^ A < C ^ B < C). 2 Having de ned canonical tuples, we are ready to de ne generalized relations and generalized databases consisting of such tuples.
0
De nition8. 1. A generalized relation over variables X = (x ; : : :; xn) is a
1
nite set of canonical generalized tuples over X. There exists a natural mapping from generalized relations over X to DNF formulas over X, and a corresponding mapping from generalized relations to ( nite or in nite) relations over Dn : (r) =
_
2. A generalized database is a nite set of generalized relations. 2 To conclude this subsection, we would like to compare canonical tuples with the di erent tuple construct used in 41], called rcon gurations. There is a de nite similarity in the syntax of canonical tuples here and rcon gurations, and in fact: For a given nite subset D of D, an rcon guration is any canonical generalized tuple t = (l; u; ) where (a) 8i, there does not exist c 2 D such that li < c < ui, and (b) 8i; j; i;j is one of f<; >; =g. tuple is canonical. Therefore an advantage of the canonical tuples here over the rcon gurations of 41] is that inserts are more e cient. Furthermore, for an arbitrary conjunction of constraints over free variables X, the size of the smallest set of canonical tuples r over X such that W F(t) is only dependent on the length of , whereas it would grow to be at t r least linear in the size of D if r was restricted to rcon gurations. Nevertheless, if we choose to restrict generalized relations to sets of rcon gurations, it is important to note that all of the de nitions and lemmas in the next subsection remain valid.
2
t2r
F(t); (r) =
t2r
P(t)
not constant in the size of D , whereas it is constant for checking whether a
The time of checking whether a generalized tuple is an rcon guration is
2.2 The Algebraic Operations
We can now introduce the syntax and the semantics of an algebra over generalized relations. As in Codd's relational algebra 20, 40], we will de ne basic algebraic operations projection ( ), selection (&), natural join (1), union ( ), di erence (?), and renaming (%), which map one or more generalized relations to a new generalized relation. Algebraic queries over generalized databases are composed of these basic operations. For each operation OP on generalized relations, we will claim that the resulting generalized relation has the following closure property: if r = OP(r1 ; : : :; rn ), then (r ) = OP( (r 1 ); : : :; (rn )), where OP is the operation of the same name in the relational algebra in 40], and is de ned in De nition 8.
0 0
Remark: In our de nitions of operators, we use the notation (r) for the vari
ables of a generalized relation r. To prove the closure theorem of this section (see Figure below) we compose the closure properties of operations.
r1 r 2 : : : r n
QUERY(r ; r ; : : : ; rn )
1 2

r n+1
r1 r2 : : : r n
?
QUERY(r1 ; r2 ; : : : ; rn )

rn+1
?
Fig. 2. The Closure Condition
Projection. If r is a generalized relation over variables X, and Z is a subset of X, then Z (r) is the projection of r on Z: (1) ( Z (r)) = Z, (2) Z (r) = f(t # Z) : t 2 rg, where t = (t # Z) is the restriction of t to variables Z, i.e., the variables in t have the same bounds and the same constraints as the corresponding variables in t, and all other bounds and constraints are dropped.
0 0
Lemma 9. Let r = Z (r). Then, (r ) = Z ( (r)). Example 6. Let r = ft ; t g be a generalized relation over variables (A; B; C). We compute A;C (r) = ft ; t g (see gure on Projection). 2
0 0
1
2
(
)
0
0
1
2
t1 A B C l1 1 1 6 u1 5 5 6 1 = > < <=< >>=
t2 A B C l2 4 1 3 u2 7 1 1 2 = > ? <=< ? >=
(
A;C )

t1 A C l1 1 6 u1 5 6 1 = < >=
0 0 0 0
t2 A C l2 4 3 u2 7 1
0 0 0 0
2
= ? ? =
Fig.3. Projection
Selection. If r is a generalized relation over variables X, Z is a subset of X, and t0 is a generalized tuple over variables Z, then &F (t0 ) (r) is the selection on r by F(t0): (1) (&F (t0 ) (r)) = X, (2) &F (t0 ) (r) = ft : for some t 2 r; t is the common tuple of (t0 " X) and t g, where (t0 " X) and the notion of common tuple are de ned below. De nition10. Let t be a canonical generalized tuple over variables X, and Z be a superset of X. Then, t is the extension of t to variables Z, written (t " Z) if t is the most general tuple over variables Z such that t = (t # X). In particular, (a) 8i such that zi 62 X; i = (?1 < xi < +1); (b) 8i; j such that either zi 62 X or zj 62 X, i;j = TRUE. 2
0 0 0 0 0 0 0
Then, t is the common tuple of t1 and t2 if t is a canonical tuple over variables X such that F(t) (F(t1 ) ^ F(t2)), or alternately, P(t) = P(t1 ) \ P(t2). In particular, t is the canonical representation of t , where: 1 2 8i; i ( i1 ^ i2 ), and 8i; j; i;j ( i;j ^ i;j ). If some i or i;j does not exist, then P(t1 ) \ P(t2) is empty and there is no common tuple. 2 Lemma 12. Let r = &F (t0 )(r). Then, (r ) = &F (t0) ( (r)). Example 7. Let r = ft1; t2g be a generalized relation over variables (A; B; C), and let t0 be a canonical tuple over variables (B; C), where F(t0 ) = (2 < B < 4 ^ 3 < C < 7). We compute &F (t0 ) (r) = ft1 g (see gure on Selection). 2
0 0 0 0 0 0 0 0
De nition11. Let t and t be canonical generalized tuples over variables X.
1 2
t1 A B C l1 1 1 6 u1 5 5 6 1 = > < <=< >>=
t2 A B C l2 4 1 3 u2 7 1 1 2 = > ? <=< ? >=
&F (t0)
t1 A B C l1 1 2 6 u1 5 4 6 1 = > < <=< >>=
0 0 0 0
Fig. 4. Selection
Natural Join. If r1 is a generalized relation over variables X, r2 is a generalized relation over variables Y , and Z = X Y , then r1 1r2 is the natural join of r1 and r2 :
(1) (r1 1r2) = Z, (2) r1 1r2 = ft : for some t1 2 r1 ; t2 2 r2; t is the common tuple of (t1 " Z) and (t2 " Z)g.
Lemma 13. Let r = r 1r . Then, (r ) = (r )1 (r ).
0
1
2
0
1
2
Example 8. Let r1 = ft1 ; t2 g be a generalized relation over variables (A; B; C), and r2 = ft3 ; t4 g be a generalized relation over variables (B; D). We compute r1 1r2 (see gure on Join). 2
t1 A B C l1 1 1 6 u1 5 5 6 1 = > < <=< >>=
t2 A B C l2 4 1 3 u2 7 1 1 2 = > ? <=< ? >=
t3 B D l3 1 2 u3 4 6
3
= ? ? =
t4 B D l4 0 7 u4 1 7 4 = > <=
?
t1 A B C D l1 1 1 6 2 u1 5 4 6 6 1 = > < ? <=< ? >>=> ? ? <=
0 0 0 0
1
t2 A B C D l2 4 1 3 2 u2 7 1 1 6 2 = > ? > <=<< ? >= ? <> ? =
0 0 0 0
Fig. 5. Join
Union. If r1 and r2 are generalized relation over variables X, then r1 r2 is the union of r1 and r2 : (1) (r1 r2) = X, (2) r1 r2 = ft : t 2 r1 or t 2 r2g. Lemma 14. Let r = r1 r2 . Then, (r ) = (r1) (r2 ). Difference. If r1 and r2 are generalized relations over variables X, then r1 ?r2 is the di erence of r1 and r2. The de nition of r1?r2 is recursive, as follows: (1) (r1 ?r2 ) = X, (2) If size of r1 > 1; r1 ?r2 = t r 1 (ftg?r2), (3) If size of r1 = 0; r1 ?r2 = ;, (4) If size of r1 = 1 (i.e., r1 = ftg), r1 ?r2 = (for some t 2 r2 , (tuple di erence of t and t ) ? (r2 ? ft g)), where tuple di erence is a set of tuples de ned below. De nition15. Let t and t be canonical generalized tuples over variables X. 1. If t = t , tuple di erence of t and t is ;. 2. If 9i such that i ^ i is not satis able, or 9i; j such that i;j ^ i;j is not satis able, tuple di erence of t and t is ftg. 3. Let be a constraint in F(t), either or , such that 6= , where is the corresponding constraint in F(t ). 4. Let _( 1 ; : : :; k ), where 1 ( ^ ), all i 's are disjoint, and k 1 is the smallest possible. 5. Let (t1 ; : : :; tk ) be obtained by replacing in t by each i and putting the result into canonical form. Let t1 be obtained by replacing in t by 1 , and putting the result into canonical form. 6. Then, tuple di erence of t and t is de ned recursively as ft2 ; : : :; tk g (tuple di erence of t1 and t1 ). Lemma 16. Let r = r1 ?r2 . Then, (r ) = (r 1) ? (r2 ). Renaming. If r is a generalized relation over variables X = (x1; : : :; xn), and y is a variable not in X, then %x y (r) is the renaming in r of xi to y: (1) (%x y (r)) = (X ? fxig) fyg, (2) %x y (r) = r. Lemma 17. Let r = %x y (r). Then, (r ) = %x y ( (r)). Let us now consider the rstorder calculus CQL of 41] for dense order. This is relational calculus combined with the decidable theory of dense order with constants. Every query can be expressed semantically as a relational calculus or algebra query on unrestricted ( nite or in nite) relations over Dn . The above de nitions and lemmas compose to prove the following closure property:
0 0 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
ij
ij
ij
0
ij
0
ij
Theorem 18. For every relational algebra QUERY on unrestricted relations over Dn , that are nitely representable, the constraint algebra QUERY, that uses OPs instead of OPs, has the property that: (QUERY(r ; : : :; rn)) = QUERY ( (r ); : : :; (rn ))
1 1
.
3 PTIME Expressibility
The notion of data complexity arose from a study of the expressibility of computations over nite structures. It corresponds nicely to the intuition that the database size is much larger than the query program size. It seems reasonable to limit computations to e cient ones, i.e., PTIME manipulations of the data. There are characterizations of PTIME 36, 73] using Datalog on nite, ordered structures. So in terms of expressibility , Datalog would su ce. However, various other CQLs allow for more \natural" and \ exible" expression of queries.
: :
A query Q has data complexity in PTIME (resp. LOGSPACE, NC) if there is a TM (resp. TM, PRAM) which given input generalized relations d produces some generalized relation representing the output of Q(d) and uses polynomial time (resp. logarithmic space on the work tape, polynomial number of processors running in polylogarithmic parallel time). See 38] for de nitions of TMs and PRAMs. We assume a standard binary encoding of generalized relations. The CQL framework is interesting because many combinations of database query languages and decidable theories have PTIME data complexity. >From Codd's original work 20] it follows that: relational calculus on nite sets can be evaluated bottomup in closed form and LOGSPACE data complexity. The LOGSPACE data complexity analysis is from 17]. In the following table we list some theories with upper bounds on their data complexities.
3.1 Data Complexity
In 41] it was shown that relational calculus (In ationary Datalog ) with dense order constraints has LOGSPACE (PTIME) data complexity. Also, by a slight modi cation of 36, 73] In ationary Datalog with dense linear order expresses exactly PTIME (this is the = in the table above). It is possible to strengthen the LOGSPACE bounds to uniform AC0 , using randomaccess Alternating TMs 5, 16] with a xed number of alternations and a logarithmic time bound. The same applies to relational calculus (In ationary Datalog ) with equality constraints over an in nite domain. Let us now look at a fairly rich constraint theory. Relational calculus with real polynomial inequality constraints can be evaluated bottomup in closed form
: : :
Polynomial Dense Order Equality Relational Calculus NC AC AC Datalog Not closed =PTIME PTIME
0 0
:
and PTIME data complexity. This is a direct consequence of 23]. More recent results 8, 48, 60] place the data complexity in NC. The precise expressibility (in terms of data complexity within NC) for real polynomial constraints is still open. Clarifying the relationship with the data complexity of weaker theories involving addition and multiplication by constants is a topic of interest 25]. Unfortunately, arithmetic (even just integer order) with recursion easily leads to computability of all partial recursive functions. One implication is that there are qualitatively multiple ways of encoding computations in CLP (e.g., 26]. Su cient conditions that allow closure under recursion are an interesting topic 61]. Note that, nonclosure is very easy to show: Let be the query program that consists of the rules S(x; y) : R(x; y) and S(x; y) : R(x; z); S(z; y) (i.e., S is the transitive closure of R). If the input r for R consists of the generalized relation y = 2 x, then the result of the query is the set of all points (x; y) that satisfy y = 2i x for some i > 0. This set is not nitely representable in the language of polynomial inequality constraints.
3.2 Language Extensions
The relational data model has been extended with a number of useful features, e.g., aggregates and complex objects. Are these extensions possible for CQLs and how complicated are they? First, we would like to observe that the CQL framework allows us (via equality constraints) to embed relational and complex object databases directly in constraint databases. Consider the rectangle intersection example from Section 1. In a CQL we can have both a relation of 5tuples ranging over a nite set and denoting endpoints of rectangles and a genaralized relation of 3tuples ranging over an in nite set of named rectangle points. We can even link the two relations together with an \integrity constraint" that relates every 5tuple with a generalized 3tuple and viceversa. This \naive" solution can be used to add complex objects or aggregates to a CQL by just adding them to the relational data model embedded in the CQL. It is useful because it allows posing queries about the representation itself and not only about the in nite relations represented. There are however some subtle issues that should be researched further. Aggregates: The rst important extension involves aggregate operations (e.g., maximum, average etc), see 45]. Introducing aggregates in a CQL is identical with the relational data model if an attribute ranges over a nite set. Many aggregate primitives have low data complexity 5]. Introducing aggregates in a CQL for attributes ranging over an in nite set is harder, since it typically involves integration, and is a topic of current research 50]. Complex Objects: The second extension involves complex objects. As described in 1, 2] complex objects are built using tuple and niteset data type constructors. Generalized tuples are tuple constructors, but they also have some of the properties of niteset constructors. This subtelty is best illustrated by an example.
Example 9. Consider a convex polygon in 2dimensional space. It can be char
acterized by its vertices or by its facets, which are linear inequalities of the form a x + b y c. In the relational data model extended with complex objects one named convex polygon is a object of type: tuple(D, niteset(tuple(D,D,D))) In a CQL the same named convex polygon is a single generalized tuple, with one equality constraint for the name and linear inequality constraints for the facets.
4 I/O E ciency in Constraint Databases
The language framework of the relational data model does have low data complexity, but does not account for searches that are logarithmic or faster in the sizes of input relations. Without the ability to perform such searches relational databases on secondary storage would have been impractical. I/O e cient (i.e., logarithmic or constant) use of secondary storage is an additional requirement, beyond low data complexity, whose satisfaction greatly contributes to relational technology. Btrees and their variants B+ trees, 7, 24], are examples of important data structures for implementing relational databases. In particular, let each secondary memory access transmit B units of data, let r be a relation with N tuples, and let us have a B+ tree on the attribute x of r. The space used in this case is O(N=B). The following operations de ne (dynamic) 1dimensional searching on relational database attribute x, with the corresponding performance bounds using a B+ tree on x: (i) Find all tuples such that for their x attribute (a1 x a2 ). If the output size is K tuples, then this range searching is in worstcase O(logB N + K=B) secondary memory accesses. If a1 = a2 and x is a key, then this is keybased searching. (ii) Insert or delete a given tuple. These are in worstcase O(logB N) secondary memory accesses. The problem of kdimensional searching on relational database attributes x1 ; : : :; xk generalizes 1dimensional searching to k attributes, with range searching on kdimensional intervals. It is a central problem in spatial databases for which there are many solutions with good secondary memory access performance, e.g., grid les, quadtrees, Rtrees, hBtrees, kdBtrees etc (see the surveys 62, 63]). For generalized databases we can de ne an analogous problem of 1dimensional searching on generalized database attribute x using the operations: (i) Find a
4.1 Indexing Relational Databases
4.2 Indexing Constraint Databases
generalized database that represents all tuples of the input generalized database such that their x attribute satis es (a1 x a2 ). (ii) Insert or delete a given generalized tuple. (Note that, this insertion/deletion is syntactic).
If (a1 x a2 ) is a constraint of our CQL then there is a trivial, but ine cient, solution to the problem of 1dimensional searching on generalized database attribute x. One can add constraint (a1 x a2) to every generalized tuple (i.e., conjunction of constraints) and naively insert or delete generalized tuples in a table. This would involve a linear scan of the generalized relation and introduces a lot of redundancy in the representation. In many cases, the projection of any generalized tuple on x is one interval (a x a ). This is true for Example 2, for our CQL's with dense order, for relational calculus with linear inequalities over the reals, and in general when a generalized tuple represents a convex set. Under such natural assumptions, there is a better solution for 1dimensional searching on generalized database attribute x.
0
associated with a generalized tuple. Each interval (a x a ) in the index is the projection on x of its associated generalized tuple. The two endpoint a; a representation of an interval is a xed length generalized key. { Finding a generalized database, that represents all tuples of the input generalized database such that their x attribute satis es (a1 x a2 ), can be performed by adding constraint (a1 x a2 ) to only those generalized tuples whose generalized keys have a nonempty intersection with it. { Inserting or deleting a given generalized tuple are performed by computing its projection and inserting or deleting intervals from a set of intervals.
0 0
{ A generalized 1dimensional index is a set of intervals, where each interval is
Unlike Btrees that o er worstcase guarantees for their performance, many data structures (like the grid le, Rtree, hBtree, kdBtree, quadtree etc) that have been used to solve the problem of dynamic interval management in secondary storage don't o er optimal worstcase times. These data structures work reasonably well on the average, but with certain simple cases, their performance can deteriorate badly. We would like to o er worstcase guarantees like the Btrees do for this problem.
In summary: generalized 1dimensional indexing is dynamic interval management on secondary storage. It can be implemented with good average I/O e ciency using existing generalpurpose spatial data strutures.
The use of generalized 1dimensional indexes reduces redundancy of representation and transforms 1dimensional searching on generalized database attribute x into the problem of dynamic interval management. This is a wellknown problem with many elegant solutions from computational geometry 57]. It is a special case of 2dimensional searching in relational databases, called 1.5dimensional searching in 53]. For example, the priority search trees of 53] are a linear space data structure with logarithmictime update and search algorithms for incore processing. Grid les, Rtrees, quadtrees etc, have all been used for solving this problem with good secondary memory access performance.
Query Interval
6
2 Data Intervals 4
1 3
Stabbing Query
?
6 ? ? ? ? ? ? ?? ?
s
(x1 ; x2 )
x1
Stabbing Query
x2
Fig. 6. Showing the relation between interval management and 2dimensional range searching
The rst I/O optimal solution for the problem of static interval management in secondary memory was published in 43]. Optimal incore dynamic interval management is one of the basic tools of computational geometry. However, I/O optimal solutions are nontrivial, even for the static case. Let us describe the relationship of interval management on secondary storage and 2dimensional searching. Intervals that intersect a query interval fall into four categories as shown in Figure 6. Categories (1) and (2) can be easily located by sorting all the intervals on their rst endpoint and using a B+ tree to locate all intervals whose rst endpoint lies in the query interval. So the main di culty is handling categories (3) and (4). As shown in the gure, categories (3) and (4) can be located by nding all data intervals which contain the rst endpoint of the query interval. (This is called a stabbing query at that point.) Stabbing queries can be mapped to a special case of 2dimensional range searching. This is done by mapping an interval x1; x2] to the point (x1; x2) in two dimensions X; Y . It is easy to see that all points corresponding to intervals will lie above the line X = Y . A stabbing query now translates into a two sided query. That is, an interval x1; x2] belongs to a stabbing query at x if and only if the corresponding point (x1 ; x2) is inside the box generated by the lines X = 0, X = x, Y = x, and Y = 1. In fact, such two sided queries will always have their corner on the line X = Y . Such queries are called diagonal corner queries in 43]. An optimal solution for answering diagonal corner queries in secondary storage is presented in 43]. 43] presents a semidynamic data structure for this problem called the metablock tree that has optimal query I/O time O(logB N+K=B), uses optimal storage O(N=B). The metablock data structure also allows inserts in O(logB N + (log2 N)=B) amortized I/O time; it is fairly involved and does B not allow data points to be deleted. Spacetime tradeo s for this and other related 2dimensional range searching problems are examined in 59]. In this paper it is shown that many elegant data structures like the priority search tree, segment tree and the interval treeall of which have been used for interval management incore can be im
4.3 Optimal Static Indexing for Constraints
plemented in secondary memory with small space overheads. In particular, two sided queries (a generalization of diagonal queries) can be answered in optimal I/O time O(logB N + K=B), using space O( N log2 log2 B). This solution can B bemade fully dynamic in optimal O(logB N) amortized time.
5 Conclusions and Open Questions
The appeal of constraint programming in the 1970's and early 1980's was tempered by the cost of storing and solving constraints, and by the lack of e cient, robust, generalpurpose implementations. Advances in hardware, graphical user interfaces, and programming language technology (mainly CLP in the 1980's) have brought constraintbased computing closer to the e ciency of more conventional computing formalisms. The combination of constraint programming and database querying is one area where signi cant progress can be made in the 1990's. E cient declarative programming, both in terms of languages and of grahical user interface, can be achieved for an extensive range of applications. The most challenging task is the implementation and testing of declarative and e ciently evaluable CQLs for spatial applications. The results presented here should be viewed as positive arguments for the feasibility of such an e ort. In the context of such a software systems e ort, many interesting theoretical questions remain: (1) Dynamic interval management on secondary storage is the key algorithmic problem for backend support of constraint databases. Existing spatial data structures solve this problem with reasonable e ciency, but there is much room for improvement. While 43] and 59] represent advances in this context, the truly elegant and seemingly di cult problem remains open. Can dynamic interval management on secondary storage be achieved optimally in O(N=B) pages, dynamic query I/O time O(logB N +t=B) and worstcase update time O(logB N)? (2) Indexing is only one possible optimization. How do various optimization methods (such as selection propagation, join ordering and other algebraic transformations, magic sets etc) combine with CQLs? This would involve extending 58]. For some recent research in this direction we refer to 34, 52, 56, 65, 12]. Constraint use for database logic program analysis is also relevant here, e.g., 13, 14, 70, 71]. (3) We have addressed the spatial applications of CQLs. Constraints o er useful approaches to temporal databases as well. For recent developments in temporal databases we refer to 6, 18, 61]. The subject of temporal constraints has also been addressed extensively in the arti cial intelligence literature (which we unfortunately do not have the space or expertise to survey here). For some recent work that connects CQLs to arti cial intelligence approaches see 49]. (4) As we argued in this paper, extensions to the relational model, involving partial information and complex object data types, can also be applied to CQLs. For example, null values can be introduced with existential quanti cation in constraints. Finite complex objects 1, 2] and aggregation 45] over nite sets can coexist with in nite relations. Are there more subtle ways of adding these
extensions to CQLs? For example, adding aggregation operations over in nite sets is motivated by querying about area size and is a challenging problem 50]. (5) Linear constraint databases 11] are among the most applicable CQLs, and will certainly be among the rst to be implemented. Even for this theoretically simple case the expressive power is not fully understood 25]. What are the right linear programming algorithms for these CQLs? Both linear and integer programming with a xed number of variables are relevant combinatorial problems in this context. How can recursion be combined with linear constraints in a safe fashion? (6) The technology of algorithms for logical theories is still rather complex, but much progress has been accomplished in recent years. For example, see 8, 48, 60] for the stateoftheart in real closed elds. What is the precise data complexity of CQLs for the various decidable logical theories? Are there interesting syntactic subsets of these theories that are motivated by constraint databases and for which simpler algorithmic techniques can be used? These would be analogous to projectselectjoin query programs in the relational model. (7) Finally, can constraint query languages be designed in an extensible fashion? Are there objectoriented CQLs? An example use of such extensibility, would be calling computational geometry algorithms from a CQL interpreter.
References
1. S. Abiteboul, C. Beeri. On the Power of Languages for the Manipulation of Complex Objects. INRIA Research Report 846, 1988. 2. S. Abiteboul and P. Kanellakis. Database Theory Column: Query Languages for Complex Object Databases. SIGACT News, 21, pp. 9{18, 1990. 3. S. Abiteboul and V. Vianu. Datalog Extensions for Database Queries and Updates. J. Comput. System Sci., 43 (1991), pp. 62{124. 4. A.K. Aylamazyan, M.M. Gilula, A.P. Stolboushkin, G.F. Schwartz. Reduction of the Relational Model with In nite Domain to the Case of Finite Domains. Proc. USSR Acad. of Science (Doklady), 286(2):308{311, 1986. 5. D.A. Barrington, N. Immerman, H. Straubing. On Uniformity within NC1 . JCSS, 41:274{306,1990. 6. M. Baudinet, M. Niezette, P. Wolper. On the Representation of In nite Temporal Data and Queries. Proc. 10th ACM PODS, 280{290, 1991. 7. R. Bayer, E. McCreight. Organization of Large Ordered Indexes. Acta Informatica, 1:173{189, 1972. 8. M. BenOr, D. Kozen, J. Reif. The Complexity of Elementary Algebra and Geometry. JCSS, 32:251{264, 1986. 9. L. Berman. Precise Bounds for Presburger Arithmetic and the Reals with Addition. Proc. 18th IEEE FOCS, pp. 9599, 1977. 10. A.H. Borning. The Programming Language Aspects of ThingLab, A ConstraintOriented Simulation Laboratory. ACM TOPLAS 3:4:353{387, 1981. 11. A. Brodsky, J. Ja ar, M.J. Maher. Toward Practical Constraint Databases. Proc. 19th VLDB, 322{331, 1993. 12. A. Brodsky, C. Lassez. Separability of Polyhedra and a New Approach to Spatial Storage. in 42].
13. A. Brodsky, Y. Sagiv. Inference of Monotonicity Constraints in Datalog Programs. Proc. 8th ACM PODS, 190{200, 1989. 14. A. Brodsky, Y. Sagiv. Inference of Inequality Constraints in Logic Programs. Proc. 10th ACM PODS, 227{241, 1991. 15. A.R. Bruss, A.R. Meyer. On TimeSpace Classes and their Relation to the Theory of Real Addition. Proc. 10th ACM STOC, pp. 233239, 1978. 16. S.R. Buss. The Formula Value Problem is in ALOGTIME. Proc. 19th ACM STOC, pp. 123131, 1987. 17. A.K. Chandra, D. Harel. Structure and Complexity of Relational Queries. JCSS, 25:1:99{128, 1982. 18. J. Chomicki. Ptime Query Processing in Temporal Deductive Databases. Proc. 9th ACM PODS, 379{391, 1990. 19. J. Chomicki, T. Imielinski. Relational Speci cations of In nite Query Answers. Proc. ACM SIGMOD, 174{183, 1989. 20. E.F. Codd. A Relational Model for Large Shared Data Banks. CACM, 13:6:377{ 387, 1970. 21. J. Cohen. Constraint Logic Programming Languages. CACM, 33:7:52{68, 1990. 22. A. Colmerauer. An Introduction to Prolog III. CACM, 33:7:69{90, 1990. 23. G.E. Collins. Quanti er Elimination for Real Closed Fields by Cylindrical Algebraic Decomposition. Proc. 2nd GI conference on Automata Theory and Languages, LNCS 33, pp. 512532, SpringerVerlag, 1975. 24. D. Comer. The Ubiquitous BTree. Computing Surveys, 11:2:121{137, 1979. 25. S.S. Cosmadakis, G.M. Kuper Expressiveness of FirstOrder Constraint Languages. Manuscript, December 1993. 26. J. Cox, K. McAloon, C. Tretko . Computational Complexity and Constraint Logic Programming. Annals of Math. and AI, 5:163{190, 1992. 27. M. Dincbas, P. Van Hentenryck, H. Simonis, A. Aggoun, T. Graf, F. Berthier. The Constraint Logic Programming Language CHIP. Proc. Fifth Generation Computer Systems, Tokyo Japan, 1988. 28. J. Ferrante, J.R. Geiser. An E cient Decision Procedure for the Theory of Rational Order. Theoretical Computer Science, 4:227{233, 1977. 29. J. Ferrante, C. Racko . A Decision Procedure for the First Order Theory of Real Addition with Order. SICOMP, 4:1:69{76, 1975. 30. M.J. Fischer, M.O. Rabin. SuperExponential Complexity of Presburger Arithmetic. SIAMAMS Proc. volume VII, American Mathematical Society, 1974. 31. E. Freuder. Synthesizing Constraint Expressions. CACM, 21:11, 1978. 32. Y. Gurevich, S. Shelah. FixedPoint Extensions of FirstOrder Logic. Annals of Pure and Applied Logic, 32, 265{280, 1986. 33. M.R. Hansen, B.S. Hansen, P. Lucas, P. van Emde Boas. Integrating Relational Databases and Constraint Languages. Computer Languages, 14:2:63{82, 1989. 34. R. Helm, K. Marriott, M. Odersky. Constraintbased Query Optimization for Spatial Databases. Proc. 10th ACM PODS, 181{191, 1991. 35. R. Hull, J. Su. Domain Independence and the Relational Calculus. Technical Report 88{64, University of Southern California. 36. N. Immerman. Relational Queries Computable in Polynomial Time. Information and Control, 68:86104, 1986. 37. J. Ja ar, J.L. Lassez. Constraint Logic Programming. Proc. 14th ACM POPL, 111{119, 1987. 38. D.S. Johnson. A Catalogue of Complexity Classes. Handbook of Theoretical Computer Science, Vol. A, chapter 2, (J. van Leeuwen editor), NorthHolland, 1990.
39. F. Kabanza, JM. Stevenne, P. Wolper. Handling In nite Temporal Data. Proc. 9th ACM PODS, 392{403, 1990. 40. P.C. Kanellakis. Elements of Relational Database Theory. Handbook of Theoretical Computer Science, Vol. B, chapter 17, (J. van Leeuwen editor), NorthHolland, 1990. 41. P. C. Kanellakis, G. M. Kuper, P. Z. Revesz. Constraint Query Languages. Proc. 9th ACM PODS, 299{313, 1990. Full version available as Brown Univ. Tech. Rep. CS9250. To appear in JCSS. 42. P.C. Kanellakis, J.L. Lassez, V.J. Saraswat. Proceedings of the Workshop on the Principles and Practice of Constraint Programming (PPCP93). Newport RI, April 1993. Preliminary version available via anonymous ftp from wilma.cs.brown.edu (128.148.33.66) in the directory /pub/ppcp93. To appear from MIT Press. 43. P. C. Kanellakis, S. Ramaswamy, D. E. Vengro , J. S. Vitter. Indexing for Data Models with Constraints and Classes. Proc. 12th ACM PODS, 233{243, 1993. 44. M. Kifer. On Safety, Domain Independence, and Capturability of Database Queries. Proc. International Conference on Databases and Knowledge Bases, Jerusalem Israel, 1988. 45. A. Klug. Equivalence of Relational Algebra and Relational Calculus Query Languages having Aggregate Functions. JACM, 29:3:699{717, 1982. 46. A. Klug. On Conjunctive Queries Containing Inequalities. JACM, 35:1:146{160, 1988. 47. P. Kolaitis, C.H. Papadimitriou. Why not Negation by Fixpoint? Proc. 7th ACM PODS, 231{239, 1988. 48. D. Kozen, C. Yap. Algebraic Cell Decomposition in NC. Proc. 26th IEEE FOCS, 515{521, 1985. 49. M. Koubarakis. Foundations of Temporal Constraint Databases. PhD Thesis. Nat. Tech. Univ. of Athens and Imperial College. 1993. 50. G.M. Kuper. Aggregation in Constraint Databases. in 42]. 51. W. Leler. Constraint Programming Languages. Addison Wesley, 1987. 52. A. Levy, Y. Sagiv. Constraints and Redundancy in Datalog. Proc. 11th ACM PODS, 67{81, 1992. 53. E. McCreight. Priority Search Trees. SIAM J. Computing, 14:257{276, 1985. 54. A.K. Mackworth. Consistency in Networks of Relations. AI, 8:1, 1977. 55. U. Montanari. Networks of Constraints: Fundamental Properties and Application to Picture Processing. Information Science, 7, 1974. 56. I. S. Mumick, S. J. Finkelstein, H. Pirahesh, R. Ramakrishnan. Magic Conditions. Proc. 9th ACM PODS, 314{330, 1990. 57. F.P. Preparata, M.I. Shamos. Computational Geometry: An Introduction. SpringerVerlag, 1985. 58. R. Ramakrishnan. Magic Templates: A Spellbinding Approach to Logic Programs. Proc. 5th International Conference on Logic Programming, 141{159, 1988. 59. S. Ramaswamy, S. Subramanian. Path Caching: A Technique for Optimal External Searching. Manuscript submitted for publication. 60. J. Renegar. On the Computational Complexity and Geometry of the Firstorder Theory of the Reals: Parts I{III. Journal of Symbolic Computation, 13:255{352, 1992. 61. P.Z. Revesz. A Closed Form for Datalog Queries with Integer Order. Proc. 3rd International Conference on Database Theory, 1990, (to appear in TCS). 62. H. Samet. Applications of Spatial Data Structures: Computer Graphics, Image Processing, and GIS. AddisonWesley, Reading MA, 1990.
63. H. Samet. The Design and Analysis of Spatial Data Structures. AddisonWesley, Reading MA, 1990. 64. V.A. Saraswat. Concurrent Constraint Programming Languages. PhD thesis, Carnegie Mellon University, 1989. 65. D. Srivastava, R. Ramakrishnan. Pushing Constraint Selections. Proc. 11th ACM PODS, 301{316, 1992. 66. G.L. Steele. The De nition and Implementation of a Computer Programming Language Based on Constraints. Ph.D. thesis, MIT, AITR 595, 1980. 67. I.E. Sutherland. SKETCHPAD: A ManMachine Graphical Communication System. Spartan Books, 1963. 68. A. Tarski. A Decision Method for Elementary Algebra and Geometry. University of California Press, Berkeley, California, 1951. 69. J.D. Ullman. Principles of Database Systems. Computer Science Press, 2nd Ed., 1982. 70. R. van der Meyden. The Complexity of Querying Inde nite Data about Linearly Ordered Domains. Proc. 11th ACM PODS, 331{346, 1992. 71. A. Van Gelder. Deriving Constraints among Argument Sizes in Logic Programs. Proc. 9th ACM PODS, 47{60, 1990. 72. P. Van Hentenryck. Constraint Satisfaction in Logic Programming. MIT Press, 1989. 73. M.Y. Vardi. The Complexity of Relational Query Languages. Proc. 14th ACM STOC, 137{146, 1982.