In previous articles we have seen many constructs of the SQL language. In particular, with the use of derived tables and their evolution, namely Common Table Expression (CTE) we have been able to write very complex queries. These constructs have been present for years now in relational databases, both commercial and open source. However, before their advent, queries were written differently when a query could be answered by a single query. In fact, In some cases it may be necessary to bind the execution of a nested query to the value of one or more attributes in a more external query. This binding, called correlation, is expressed by one or more correlation conditions that are specified in the WHERE clause of the nested query that requires it. The predicate will, therefore. bind some attributes of the tables entered in the FROM clause of the nested query with some attributes of the tables defined in the FROM clause of the outermost queries. Put like that it sounds very complex, but it is actually easier done than said.
In this article we will look at some examples of correlation both required by some SQL language operators and for other scenarios. As in previous articles, we will use the pattern related to movie showings in theaters. We reproduce it below for those who have not read the other articles.
FILM (CodF, Title, Release_Date, Genre, DurationMinutes)
CINEMA (CodC, Name, Address, City, Website*)
ROOM (CodC, RoomNumber, Capacity)
PROJECTION (CodC, RoomNumber, Date, StartTime, EndTime, CodF)
As a reminder, the notation used to describe the schema uses underlining to indicate the fields that make up the primary key and an asterisk to indicate the possibility of assigning NULL values to the field. Finally, for simplicity, foreign keys have the same name as the primary keys to which they refer. We then proceed to look at the various operators.
EXISTS and NOT EXISTS
The operators that use correlation for their operation are:
- NOT EXISTS
These two operators are “half-brothers” of the IN and NOT IN operators seen in SELECT: IN operator and SELECT: NOT IN operator and tuple constructor articles, respectively. However, both their syntax and behavior differ.
In fact, the EXISTS and NOT EXISTS operators admit as a parameter a nested query whose execution will be bound to the outermost query by one or more correlation conditions. The EXISTS operator returns the true value only if the nested query yields a nonempty set (i.e., returns at least one record). Otherwise, it returns the value false if the nested query returns the empty set (i.e., returns no records). The NOT EXISTS operator behaves in a dual way. Finally, the SELECT clause of the nested query can include any attribute but is irrelevant to the result. Let us look at an example to understand the link between the various operators.
Suppose we want to find the titles of movies that have never been shown in cinemas in Turin. As usual, we divide the problem into two steps
- Find the codes of movies that have been shown at least once in a cinema in Turin
- Display the titles of movies that do not belong to the set found previously
The first part can be solved with the following query
SELECT CodF FROM PROJECTION P, CINEMA C WHERE P.CodC=C.CodC AND City=’Turin’;
At this point we should exclude these codes from the final result. The most natural thing to do is to use the NOT IN operator on the CodF attribute. We would then obtain the following query:
SELECT Title FROM FILM WHERE CodF NOT IN ( SELECT CodF FROM PROJECTION P, CINEMA C WHERE P.CodC=C.CodC AND City=’Turin’);
One could reason differently, however. For each film, we look to see if there is a screening in cinemas in the city of Turin. If it exists, that movie should be excluded from the result, otherwise it should be displayed. This is exactly the logic of NOT EXISTS. We then rewrite the query as follows.
SELECT Title FROM FILM F WHERE NOT EXISTS ( SELECT * FROM PROJECTION P, CINEMA C WHERE P.CodC=C.CodC AND City=’Turin’ AND P.CodF=F.CodF);
The last condition in the nested query determines the correlation between its execution and the external query. This means that as the FILM table is scanned, a query is generated whose conditions in the WHERE clause are updated according to the value read in the CodF field. Obviously, the database does not perform this procedure because it would be too resource- and run-time-intensive, but that is the underlying logic.
As can be seen from this example it is possible to express the same query with NOT In than with NOT EXISTS. But then why might the results differ? First we must specify that in most cases writing with an operator or its “half-brother” is irrelevant to the result. However, if the attribute on which correlation is performed (i.e., the comparison attribute for inclusion/exclusion) may have NULL values then the set of filtered records will not be equivalent. The EXISTS and NOT EXISTS operators also handle NULL values, while their “half-brothers” do not.
Correlation among queries
Correlation can also be used without the use of the EXISTS and NOT EXISTS operators. In fact, we may need to bind the result of nested query to what we are analyzing in the outer query. Before we look at a concrete example, some clarifications need to be made:
- Correlation conditions cannot be expressed in queries at the same level of nesting.
- Correlation can only be expressed between a query of a given nesting level and the attributes of the tables of the query of the immediately higher nesting level. Therefore, correlation cannot be used in an external query or “skipped” nesting levels.
Suppose we want to find theaters that have a capacity greater than the average capacity of all movie theaters in the same city. To summarize, if movie theaters in Turin have an average capacity of 50, we want to find the theaters that have a value for the capacity attribute greater than 50. Obviously, as we change cities, the average value will change accordingly.
Let’s start writing the query that calculates the average capacity of the cinema.
SELECT AVG(Capacity) FROM ROOM R, CINEMA C WHERE R.CodC=C.CodC
This query, however, returns us a value calculated on all halls. How do we limit its range to the city to which a hall belongs? We need to use correlation! In the external query we are going to scan the HALL table by requiring that the capacity of the individual hall is greater than the average calculated by the previous query to which, however, we will add the condition that the city is equal to the one we are externally analyzing. The end result will be:
SELECT CodC, RoomNumber FROM ROOM R1, CINEMA C1 WHERE R1.CodC=C1.CodC AND Cacity > (SELECT AVG(Capacity) FROM ROOM R2, CINEMA C2 WHERE R2.CodC=C2.CodC AND C1.City=C2.City);
The use of aliases helps us to distinguish various instances of the same table and to express correlation correctly.
This example highlights to us the difficulty of writing queries of this kind in a straightforward way. Are there simpler alternatives? Of course there are: derived tables and CTEs. We can, in fact, write the same query using what we saw in articles XXX and YYY. We just need to divide the problem differently:
- For each city calculate the average capacity
- Add the average capacity value to each city
- Filter the rooms by capacity
We use CTEs for simplicity in reading the solution as well. The first query will then be:
WITH AVERAGE_CITY AS (SELECT City, AVG(Capacity) AS AvgCapacity FROM ROOM R, CINEMA C WHERE R.CodC=C.CodC GROUP BY City)
At this point, to answer the other two steps of the problem, simply join the CTE just defined with the CINEMA table and then, through another join, retrieve the theaters in each theater and compare the capacity with the average capacity. The final query will be:
WITH AVERAGE_CITY AS (SELECT City, AVG(Capacity) AS AvgCapacity FROM ROOM R, CINEMA C WHERE R.CodC=C.CodC GROUP BY City) SELECT CodC, RoomNumber FROM ROOM R, CINEMA C, AVERAGE_CITY A WHERE R.CodC=C.CodC AND A.City=C.City AND Capacity > AvgCapacity;
As can be seen, the use of CTE and derived tables is much more intuitive than correlation and places no constraints on the levels of nesting we can use.