Monday, January 27, 2014

Hierarchical Queries: A QuickStart Guide

hierarchical-queries
A hierarchical query is one that works on data with a "tree" relationship.
An Example of Hierarchical Data 
The employee-manager relationship in SCOTT’s famous EMP table. Each employee record has a manager’s id associated with it. In effect, there is a "tree" of data relationships
At each level, I can get the employee-manager relationship as below:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
SQL> select empno, ename, mgr mgrno,
  2        (select e2.ename from emp e2
  3         where e1.mgr = e2.empno) mgrname
  from emp e1
  5  /
 
     EMPNO ENAME         MGRNO MGRNAME
---------- -------- ---------- ----------
      7369 SMITH          7902 FORD
      7499 ALLEN          7698 BLAKE
      7521 WARD           7698 BLAKE
      7566 JONES          7839 KING
      7654 MARTIN         7698 BLAKE
      7698 BLAKE          7839 KING
      7782 CLARK          7839 KING
      7788 SCOTT          7566 JONES
      7839 KING
      7844 TURNER         7698 BLAKE
      7876 ADAMS          7788 SCOTT
      7900 JAMES          7698 BLAKE
      7902 FORD           7566 JONES
      7934 MILLER         7782 CLARK
 
14 rows selected.
But what if I want to select –
  1. the entire hierarchy of employees under JONES?
  2. the line of reporting above MILLER, till the topmost manager?
These queries require a special way of traversing through the "tree". The next sections introduces you to the SQL keywords that you will need to write such a query, and finally the query.

Processing Hierarchical Data: START WITH.CONNECT BY

The START WITH.CONNECT BY clause can select hierarchical data, to answer the above two requirements.
The structure of a hierarchical query is:
1
2
3
4
SELECT <data>
FROM <table>
START WITH <root>
CONNECT BY PRIOR <condition>
START WITH specifies the topmost node of the tree or in other words, where to start parsing from.
CONNECT BY specifies the relationship between the parent and the child.
PRIOR achieves the recursive condition to traverse the tree.
Let’s see the query in action.

SQL to get the entire hierarchy below a node

In the EMP table example, the query to get the entire hierarchy of employees under JONES:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
SQL> select empno, ename
  2     , sys_connect_by_path(ename, ' -> ') path
  from emp
  4  start with ename = 'JONES'    -- the root node
  connect by prior empno = mgr  -- the recursive condition
  6  /
 
     EMPNO ENAME    PATH
---------- -------- -----------------------------------------
      7566 JONES     -> JONES
      7788 SCOTT     -> JONES -> SCOTT
      7876 ADAMS     -> JONES -> SCOTT -> ADAMS
      7902 FORD      -> JONES -> FORD
      7369 SMITH     -> JONES -> FORD -> SMITH
SYS_CONNECT_BY_PATH(column, char) returns the path of a column value from root to node. The column values in the path are separated by the "char"  specified in the SQL, such as -> in this case.

SQL to get the entire hierarchy above a node

In the EMP table example, the query to get the line of reporting aboveMILLER, till the topmost manager:
1
2
3
4
5
6
7
8
9
10
11
SQL> select sys_connect_by_path(ename, ' -> ') line_of_reporting
  from emp
  3  start with ename = 'MILLER'    -- the root node
  4* connect by prior mgr = empno   -- the recursive condition
SQL> /
 
LINE_OF_REPORTING
------------------------------------------------------------
 -> MILLER
 -> MILLER -> CLARK
 -> MILLER -> CLARK -> KING
Almost like the previous one with a crucial difference: the connect by prior condition is reversed.

A Memory Aid

To get the reporting hierarchy below an employee, should the condition be written as
connect by prior mgr = empno;
or
connect by prior empno = mgr?
This will not be confusing once you commit this pneumonic to memory:
When the tree to be queried has a heavy bottom (i.e. reporting hierarchy below an employee, starts with one node and branches to many), then theheavy side must be put later.
By this logic, the condition for querying the hierarchy below a node is:connect by prior empno = mgr

Summary

This article covers:
  • The meaning of hierarchical data
  • The type of query to process hierarchical data
  • Keywords START WITH.CONNECT BY and others that feature in hierarchical queries
  • SQL to get all nodes below a specific node in a hierarchy
  • SQL to get all nodes above a specific node in a hierarchy
In the next article, we’ll talk about the LEVEL pseudocolumn in hierarchical queries.

No comments:

Post a Comment