|
Category: ASE: Query Processing and Abstract Plan
Joins: Strategies
Basically ASE provides four main join strategies:
-
NestedLoopJoin
-
MergeJoin
-
HashJoin
-
NaryNestedJoin, as a variant of the NestedLoopJoin
Before ASE version 15.0, the NestedLoopJoin strategy was the primarily used join strategy.
Joins: NestedLoopJoin Strategy
In this simplest join strategy, the left side of the join forms the outer data set while the right
side forms the inner data set.
For each row of the outer data set, the inner data set must be scanned once.
If for the current row of the outer data set, all matching rows of the inner data set have been found,
the next row of the outer data set is selected and the inner data set scanned again for all matching rows,
and so forth, until the outer area reaches the "End of Scan" marker.
The NestedLoopJoin is most effective if the inner data set is covered by a useful
index.
Joins: MergeJoin Strategy
In a MergeJoin the data sets are, in addition to the proceedings during a NestedLoopJoin, sorted according
to the key values (columns) of the join.
The MergeJoin Strategy is most effective when the majority of rows must be processedist when scanning the
data set and when in large data sets the values are already sorted in key order of the join.
Joins: HashJoin Strategy
A HashJoin is also most effective, if large amounts of data must be processed. This data is hashed, i.e.
assigned a very short identification, which unambiguously defines each data value and therefore is easier
identifiable.
In a HashJoin the outer data set forms the input stream.
The inner data set forms the probe stream.
Every row of the input stream is being hashed, i.e. replaced by a very short, unambiguous identifier, and
stored in a memory structure called "hash bucket".
The same is then done with the probe stream and eventually the hash values of both streams are compared
for identical hash values, which then form the result of the HashJoin.
Joins: NaryNestedLoopJoin Strategy
The N-aryNestedLoopJoin is a special case of the NestedLoopJoin. It is never regarded by the optimizer
for joining, but rather an operator constructed from scratch every time when generating the execution code.
Exactly then, when several left-deep NestedLoopJoins are found, or if each NestedLoopJoin has an
"inner join" and the outer data set requires a scan.
In order to avoid an unnecessarily large number of scan operations, the optimizer uses NaryNestedLoop Joins,
which produce much less I/O than a comparable NestedLoopJoin.
|