In a relational database, a join is an operation that amalgamates rows across tables according to a set of conditions. The result of a join is equivalent to the rows in the Cartesian product of the tables that satisfy the conditions.
Since the Cartesian product of M and N has cardinality |M|⋅|N|, the worst-case time- and space-complexity of a join across M and N is O(|M|⋅|N|). One can see that this quickly becomes intractable since the tables are not bounded in size and there can be an arbitrary number of tables being joined.
A typical implementation of the join operator performs a number of sub-joins, storing the results of the intermediate computations in temporary tables. If the tables are too large and the conditions too permissive, these intermediate tables may exceed the storage capacity of the computer. At this point the traditional algorithms break down.
This patent application describes a method for performing joins in a depth-first manner which places a small, well-defined upper bound on the storage required to carry out the operation. The time complexity is harder to calculate as it depends on the query and the indices, but the best-case scenario is O(|M|⋅|N|). This makes joins tractable for extremely large data sets.
Method and Apparatus for Performing a Depth-First Join in a Database, USPTO application 20080027906