SELECT: NOT IN operator and tuple constructor

In some cases there is a need to exclude one set of values from another set. The SQL language allows us to exclude values to be excluded returned by a nested query using the NOT IN operator. Where the elements to be excluded are identified by a set of attributes we can use the tuple constructor. With some examples we will understand how to use these constructs appropriately.

Share

Reading time: 3 minutes

In the previous articles, SELECT: simple query structure, SELECT: query with JOIN and GROUP BY and SELECT: IN operator, we addressed various constructs of the SQL language. In particular, in the article SELECT: IN operator, we introduced the concept of membership in a set using the IN construct. This allowed us both to divide complex problems into subproblems and to exploit the result of queries to filter records from other tables. As mentioned, however, there is not only the IN operator, but also its dual the NOT IN operator.

In this article we will discuss this very operator and how to use it correctly to answer some queries that we would otherwise fail to express with the constructs seen so far. Finally, we will also introduce the tuple constructor that can be used with both this operator and IN.

We will use, again, the pattern from the other articles related to movie showings in theaters. We reproduce it below for those who have not read the other articles.

FILM (CodF, Title, Release_Date, Genre, DurationMinutes)
CINEMA (CodC, Name, Address, City, Website*)
ROOM (CodC, RoomNumber, Capacity)
PROJECTION (CodC, RoomNumber, Date, StartTime, EndTime, CodF)

As a reminder, the notation used to describe the schema uses underlining to indicate the fields that make up the primary key and an asterisk to indicate the possibility of assigning NULL values to the field. Finally, for simplicity, foreign keys have the same name as the primary keys to which they refer. We then proceed to look at the various operators.

Exclusion concept

In some contexts we are asked to exclude values from the result of a query because they are not of interest. How do we do that? It depends. In fact, there is no single answer because it depends from case to case. For example, if we wanted to display all cinemas that are not in the city of Turin, the query would be as follows:

				
					SELECT *
FROM CINEMA
WHERE City <> ’Turin';
				
			

In this example we simply indicated to include in the final result all records that have a value other than the one we are not interested in, namely the city of Turin.  But suppose we want to find theaters that have only theaters with a capacity greater than 100. Let us try writing the query as follows:

				
					SELECT C.*
FROM ROOM R, CINEMA C
WHERE R.CodC=C.CodC 
AND Capacity > 100;
				
			

What are we getting back? We are actually getting back the cinemas that have at least one theater with a capacity greater than 100. However, if there is a theater with a capacity less than 100 in that theater, the theater will still be displayed. Why? The reason is that we are only filtering theaters by capacity and not theaters.

How do we then exclude something from the result of a query? We need an operator that acts precisely on a set of values and excludes them. This operator is the NOT IN.

NOT IN operator

The NOT IN operator expresses precisely the concept of exclusion from a set of values. The syntax is as follows:

				
					Attributename NOT IN (NestedQuery)
				
			

where the attribute name must be compatible with the attribute returned by the nested query. The nested query can also be replaced with a list of values, if necessary. As with the IN operator, this last option is used only in cases where these values are derived from external interactions with DB data, such as when a user chooses multiple options in a form. Using a nested query allows the DBMS to optimize query execution and make it faster and more efficient.

The NOT IN operator is used to appropriately identify the set to be excluded so that the query performs the proper exclusions of the affected records.

Attention

There is no equivalent formulation with the join

Thus, let us resume the example seen earlier. If we want to identify cinemas that have only rooms with a capacity greater than 100, we must exclude cinemas and not individual rooms! Let us, therefore, divide the problem into two parts:

  1. find the cinemas that have rooms with a capacity less than or equal to 100
  2. exclude the cinemas found in the previous step

The first query results in

				
					SELECT CodC
FROM ROOM
WHERE Capacity <= 100;
				
			

At this point, having selected the cinema code, we can filter the cinemas by excluding the returned values. Therefore, we use the NOT IN operator by comparing the cinema codes. The query will result:

				
					SELECT *
FROM CINEMA
WHERE CodC NOT IN (SELECT CodC
    FROM ROOM
    WHERE Capacity <= 100);
				
			

When NOT IN is used?

Some questions arise at this point:

  1. How do we know that we need to use the NOT IN operator?
  2. What attribute to use for exclusion?
  3. What conditions to put in the nested query?

Let’s start with the first question. The NOT IN operator is used to exclude a set of values from the final result and is used only in the WHERE clause. Therefore, when you have to filter a table based on conditions that are expressed with the adverbs “only,” “always,” and “never” you have to think that in most cases you will have to use the NOT IN operator.

With this in mind, we need to figure out from the problem what we want to exclude. Remember to use primary keys whenever possible in order to exclude something.

Finally, to answer the third question, you must always think about what is the set to be excluded and what characteristics it has. In the previous example, the theaters that had a capacity less than or equal to 100 determined whether the theater should be included or excluded from the final result. We have, therefore, reversed the condition from the original query. This approach almost always works, but it depends greatly on the conditions that are imposed.

Tuple constructor

Sometimes the primary key on which we want to make a comparison/exclusion is composed of multiple attributes. As seen above, the basic syntax of the In and NOT IN operators act on only one attribute. Can they also be used with multiple attributes? Absolutely! We need to introduce the concept of a tuple constructor, which allows you to define a temporary structure of a tuple. To do this we list the attributes that are part of it in round brackets, as shown below

				
					(AttributeName1, AttributeName2, ...)
				
			

Attention

The order of attributes is important and should reflect the order returned by nested queries

Let us look at an example of the application of the tuple constructor combined with the NOT IN operator. Suppose we want to locate the theaters where movies of the horror genre have never been shown. We surely cannot write the query that follows:

				
					SELECT R.*
FROM ROOM S, PROJECTION P, FILM F
WHERE R.CodC=P.CodC AND R.RoomNumber=P.CodC
AND P.CodF=F.CodF
AND Genre <> ‘Horror’;
				
			

This solution is wrong because it filters screenings by film genre and not rooms. We need to use the NOT IN operator. We divide the problem into two components:

  1. Finding the rooms that have projected movies of the horror genre.
  2. We exclude the rooms found in the previous step from the final result.

The rooms are identified by both the cinema code and the room number. Therefore, the query will need to select both fields and then perform the exclusion correctly. Otherwise, by selecting only the cinema codes or the room number we would have compared different datasets.

				
					SELECT CodC, RoomNumber
FROM PROJECTION P, FILM F
WHERE P.CodF=F.CodF
AND Genre=’Horror’;
				
			

Now we can use the tuple constructor along with the NOT IN operator to correctly exclude rooms. The final query will be:

				
					SELECT *
FROM ROOM
WHERE (CodC, RoomNumber) NOT IN (SELECT CodC, RoomNumber
    FROM PROJECTION P, FILM F
    WHERE P.CodF=F.CodF
    AND Genre=’Horror’);

				
			

More To Explore

Artificial intelligence

Gradio: web applications in python for AI [part2]

Gradio is a python library that allows us to create web applications quickly and intuitively for our machine learning and AI models. Our applications always require user interaction and layout customization. Let us find out, through examples, how to improve our applications.

Artificial intelligence

Gradio: web applications in python for AI [part1]

Writing web applications for our machine learning and/or artificial intelligence models can take a lot of time and skills that we do not possess. To streamline and speed up this task we are helped by Gradio, a Python library designed to create web applications with just a few lines of code. Let’s discover its basic functionality with some examples.

Leave a Reply

Your email address will not be published. Required fields are marked *

Design with MongoDB

Design with MongoDB!!!

Buy the new book that will help you to use MongoDB correctly for your applications. Available now on Amazon!