Monday, January 27, 2014

IN = EXISTS, but beware: NOT IN is not the same as NOT EXISTS!

not-in-not-exists
In our last article we said that in SQL queries, IN and EXISTS are interchangeable. Many of us assume therefore, that NOT IN and NOT EXISTS are also interchangeable.
A big mistake.
See how NOT IN and NOT EXISTS behave differently in this small example.
Writing a query to find the number of employees in emp table who are not managers. The logic used is: get count where the employee id is not present in as mgr id in the same table.

The NOT IN version:

1
2
3
4
5
6
7
8
9
10
11
12
SQL> -- Using NOT IN
SQL> -- Count employees who
SQL> -- are not managers
SQL> select count(*)
  from emp e1
  where e1.empno not in
  4    (select e2.mgr
  5     from emp e2);
 
  COUNT(*)
----------
         0
OK, that says there are no such employees. Let’s try it with NOT EXISTS now.

The NOT EXISTS version:

1
2
3
4
5
6
7
8
9
10
11
12
13
SQL> -- Using NOT EXISTS
SQL> -- Count employees who
SQL> -- are not managers
SQL> select count(*)
  from emp e1
  where not exists
  4    (select 'Y'
  5     from emp e2
  6     where e2.mgr = e1.empno)
 
  COUNT(*)
----------
         8
Whoa! 8 such employees! Which answer is correct?
Let’s verify with the help of a flag against each employee – yes_mgr or no_mgr. We calculate this flag with help of the CASE statement.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
SQL> select e.empno
  2       , e.mgr
  3       , case when
  4             (select 'Y' from emp m
  5              where m.mgr = e.empno
  6              and rownum = 1) is null
  7            then 'no_mgr'
  8            else 'yes_mgr'
  9         end is_mgr
 10  from emp e;
 
     EMPNO        MGR IS_MGR
---------- ---------- -------
      7369       7902 no_mgr
      7499       7698 no_mgr
      7521       7698 no_mgr
      7566       7839 yes_mgr
      7654       7698 no_mgr
      7698       7839 yes_mgr
      7782       7839 yes_mgr
      7788       7566 yes_mgr
      7839            yes_mgr
      7844       7698 no_mgr
      7876       7788 no_mgr
      7900       7698 no_mgr
      7902       7566 yes_mgr
      7934       7782 no_mgr
 
14 rows selected.
It turns out that there actually are eight employees who are flagged no_mgr, so the NOT EXISTS version is all right.

Why does NOT IN give a “wrong” result?!

Well, it isn’t a wrong result, that’s how it is supposed to work! The column ‘mgr’ in the emp table is nullable. As per Oracle’s treatment of NULLs:
  • where 3 in (3, null) is TRUE as 3 is in (3).
  • where 3 not in (1, null) is UNKNOWN as it is not known whether 3 is in null or not.
So, the NOT IN condition does not equate to true if the value being matched has nulls.

Conclusion

  • NOT IN and NOT EXISTS are not identical if the subquery can potentially return nulls.
  • Use NOT IN only if you are 100% certain that the subquery will not return NULLs.

No comments:

Post a Comment