NOT IN, NOT EXISTS and NULLs

By | April 11, 2011

Last week I was teaching a junior programmer about the differences between not in and not exists and the issue of nulls came up. This isn’t the most intuitive thing, so I thought I would post about it as well.

For a simple example, I will just create 2 simple data sets of one column each. One with 13 rows (3 being null) and the other with 15 rows all populated.

Now lets look at the difference between NOT IN and NOT EXISTS when comparing the 2 data sets:

NOT IN

with nulls as
(
    select rownum id 
    from dual
    connect by level <= 10
    union all
    select null id
    from dual
    connect by level <= 3
),
no_nulls as
(
    select rownum id 
    from dual
    connect by level <= 15
)
select *
from no_nulls
where id not in (
    select id
    from nulls
);
no rows selected

NOT EXISTS

with nulls as
(
    select rownum id 
    from dual
    connect by level <= 10
    union all
    select null id
    from dual
    connect by level <= 3
),
no_nulls as
(
    select rownum id 
    from dual
    connect by level <= 15
)
select *
from no_nulls nn
where not exists (
    select null
    from nulls n
    where n.id = nn.id
);
        ID
----------
        13
        11
        12
        14
        15

And one last test, the NOT IN when there are no nulls present:

with nulls as
(
    select rownum id 
    from dual
    connect by level <= 10
    union all
    select null id
    from dual
    connect by level <= 3
),
no_nulls as
(
    select rownum id 
    from dual
    connect by level <= 15
)
select *
from no_nulls
where id not in (
    select id
    from nulls
    where id is not null
);
        ID
----------
        11
        12
        13
        14
        15

So why this behavior? Well, it comes down to how NOT IN and NOT EXISTS are evaluated. For NOT EXISTS, the subquery is run and the expression is either true or false depending on if rows are returned or not. For this example, the subquery will return no rows for IDs 11 - 15, so the NOT EXISTS will evaluate to TRUE and those rows are returned.
For NOT IN, the expression can be thought of as the following for each row in the dataset:

[current_value from "no_nulls"] != [first_value from "nulls"]
AND [current_value from "no_nulls"] != [second_value from "nulls"]
AND [current_value from "no_nulls"] != [third_value from "nulls"]
...
AND [current_value from "no_nulls"] != [last_value from "nulls"]

Since the "nulls" query has at least one null value, the expression is trying to compare if a value is not equal to null, which Oracle can't know, so it returns null. This does not evaluate the expression to TRUE, so that row is skipped, as are all of the others by the same logic.

When you exclude the nulls out of the NOT IN query, all of the expressions evaluate to either TRUE or FALSE, so the rows are returned as expected.

Keep in mind that this behavior is intentional and does make sense, since a null means literally that the value is not known, even if that means it is a pain to deal with sometimes. 🙂

Leave a Reply

Your email address will not be published. Required fields are marked *

Turn on pictures to see the captcha *