Thanks to the knowledge gained in the previous articles, SELECT: simple query structure, SELECT: query with JOIN and GROUP BY, SELECT: IN operator and SELECT: NOT IN operator and tuple constructor, we are able to write quite complex queries. In fact, we can define groups, calculate statistics, include and/or exclude records based on the result of other queries. We really have a lot of tools! The SQL language, however, provides us with even more features and constructs to increase our productivity and the possibilities of extracting data of interest with very particular constraints. In this article we will address the use of set operators that allow us to join the results of two queries using set theory.
As in previous articles, we will use the now well-known pattern related to movie showings in theaters. We quote it below for those who have not read the other articles.
FILM (CodF, Title, Release_Date, Genre, DurationMinutes)
CINEMA (CodC, Name, Address, City, Website*)
ROOM (CodC, RoomNumber, Capacity)
PROJECTION (CodC, RoomNumber, Date, StartTime, EndTime, CodF)
As a reminder, the notation used to describe the schema uses underlining to indicate the fields that make up the primary key and an asterisk to indicate the possibility of assigning NULL values to the field. Finally, for simplicity, foreign keys have the same name as the primary keys to which they refer. Let us then proceed to look at the various operators.
The set operators are as follows:
They have the function of putting together the results that result from two queries. However, they have a special feature: they require the result patterns to be compatible with each other. Therefore, unlike the IN and NOT IN operators that put two queries “in communication,” these operate similarly if and only if the selected attributes are the same. Let us analyze each operator in more detail to understand their peculiarities.
This operator performs the union of the two relational expressions A and B, that is, the results of two SELECT statements. The syntax is as follows:
A UNION [ALL] B
where the ALL parameter is optional. In case it is omitted duplicates of selected rows are removed. Otherwise all results are returned to the user. Let us better understand by means of an example how it works.
Suppose we want to find the names of movie theaters in Turin or that have at least one theater with a capacity greater than 200. We again divide the problem into two subproblems. The first will be related to finding all cinemas in Turin and is solved by a very simple query.
SELECT Name FROM CINEMA WHERE City=’Turin’;
The second problem, on the other hand, concerns the identification of cinemas with at least one hall with a capacity greater than 200. We must first filter the rows of the ROOM table by the value of the capacity attribute. After that, since we need the name of the cinema it will be sufficient to perform a join with the CINEMA table. The final query will be as follows:
SELECT Name FROM CINEMA C, ROOM R WHERE C.CodC=R.CodC AND Capacity > 200;
Now all we have to do is put the two results together. Since we have selected only the Name attribute in both queries, the patterns are by definition compatible. We can, therefore, apply the UNION operator so as to obtain the desired result. Obviously if there were a cinema that is located in Turin and has a hall with a capacity greater than 200 it would appear in both results of the two queries just written. To prevent it from being repeated simply omit the ALL parameter. The final query will be:
SELECT Name FROM CINEMA WHERE City=’Turin’ UNION SELECT Name FROM CINEMA C, ROOM R WHERE C.CodC=R.CodC AND Capacity > 200
As the name implies, this operator performs an intersection of the results of two SELECT statements. Given, therefore, two queries A and B, its syntax is
A INTERSECT B
Let us try to apply this to a concrete example. Suppose we want to find the dates on which movies were released and also some movies were shown in the theaters we have in our database. Therefore, we write two queries that select the release dates of the movies and the screening dates. The intersection between these two sets will be the ones we want to find. The first query will return the release dates from the FILM table, while the second will return the screening dates. To get the final result we will use the INTERSECT operator.
SELECT Release_Date FROM FILM INTERSECT SELECT Date FROM PROJECTION;
As you can see, the field names are different. This, in many databases, does not affect schema compatibility since the selected attributes are of the same type (i.e., DATE). Of course, some databases may require the name to be the same, and in that case it will be sufficient to rename using the AS operator the selected fields.
The intersection operation can be replaced by join or the IN operator. In the case of the join, it is sufficient to enter in the FROM clause the tables affected by the intersection, and in the WHERE clause the join conditions between the attributes in the SELECT clauses of the two queries. Using these rules the previous query would result:
SELECT F.Release_Date FROM FILM F, PROJECTION P WHERE F.Release_date=P.Date;
The join condition, then, is not the “classic” one on primary/external keys, which obviously should not be inserted so as not to affect the desired result.
If we want to use the IN operator instead, we will have to use one of the two queries as a nested query, while the attributes in the outer SELECT clause, joined by a tuple constructor, form the left side of the IN operator. In practice, the query from before we can write it like this:
SELECT Release_Date FROM FILM WHERE Release_date IN (SELECT Date FROM PROJECTION);
Finally, the EXCEPT operator performs a difference between the results of two queries. In addition to schema compatibility, this operator does not enjoy the commutative property. This means that the order in which the queries are written, and consequently executed, is relevant. Given two queries A and B, the syntax is:
A EXCEPT B
Let’s take the example about dates again. Suppose we want to find the release dates of movies where no movies were shown. The queries are as seen previously but we will need to use the EXCEPT operator.
SELECT Release_Date FROM FILM EXCEPT SELECT Date FROM PROJECTION;
Being a difference, the concept on which it operates is the same as the exclusion of the NOT IN operator that can be used in its place. In order to use it, the second query will become nested within the NOT IN operator, while the attributes in the SELECT clause of the first query, joined by a tuple constructor, will form the left side of the NOT IN operator. The end result is as follows:
SELECT Release_Date FROM FILM WHERE Release_date NOT IN (SELECT Date FROM PROJECTION);