SELECT: simple query structure

The SELECT statement in the SQL language is perhaps the most widely used as it allows us to retrieve information of interest from the database. Its syntax is simple and well-structured, but we need to know some constraints behind each clause of the statement. By means of some examples we will discover how to write queries that are simple but at the same time useful in a real context.

Share

Reading time: 5 minutes

In previous articles we have seen how to create the tables of a database (SQL: creating a database) and how to insert, update and delete data as well as the concept of transactions (SQL: transactions and data manipulation).  Of course, the usefulness of a relational database is not only to store information in a proper and structured way, but to retrieve what we are interested in efficiently. In this article we will look at the structure of queries, that is, the SELECT statement. We will focus on the main and simple clauses to begin to understand how the statement works. If you are ready we can get started!

Query structure

SQL queries have a well-defined structure:

				
					SELECT [DISTINCT] ListAttributesToShow
FROM ListTablesToUse
[WHERE ConditionsOfTuple ]
[GROUP BY ListAttributesOfGrouping ]
[HAVING ConditionsOnAggregates ]
[ORDER BY ListOfOrderAttributes ]

				
			

where square brackets indicate that the term inside is optional.

Below we will use the following diagram to illustrate the various clauses of the SELECT statement, where primary keys are underlined, fields with asterisks indicate that they can take null values, and foreign keys have the same name as the primary keys they refer to.

MOVIE (CodM, Title, Release_Date, Genre, DurationMinutes)

CINEMA (CodC, Name, Address, City, Website*)

ROOM (CodC, RoomNumber, Capacity)

PROJECTION (CodC, RoomNumber, Date, StartTime, EndTime, CodM)

SELECT Clause

In the FROM clause you enter the tables involved in the query, while in the SELECT clause you enter the information you want to be displayed in the final result. Specifically, in the SELECT you can enter the columns of interest or the wildcard * to indicate that all available columns should be displayed. For example, the query

				
					SELECT * 
FROM MOVIE;

				
			

will select all records in the MOVIE table and display all available columns in the order defined when the table was created. Otherwise, if we want to display only the title and genre the query will have to be rewritten as follows:

				
					SELECT Title, Genre
FROM MOVIE;

				
			

However, in case there are two or more movies with the same title and belonging to the same genre (such as remakes) the result will contain duplicate records. To eliminate these duplications in the result it is necessary to use the keyword DISTINCT in the SELECT clause.

				
					SELECT DISTINCT Title, Genre
FROM MOVIE;

				
			

Fields can be renamed by adding an alias next to the desired field. It is possible to use the keyword AS to explicitly indicate the alias assignment. For example, if we wanted to rename the genre column in the result as Category we would simply write the following query:

				
					SELECT Title, Genre AS Category
FROM MOVIE;


				
			

WHERE clause

The WHERE clause allows selection conditions expressed individually to each record. The conditions are represented by a Boolean expression of simple predicates such as expressions for comparing attributes and constants, text search, and evaluation of NULL values.

For example, if we wanted to display movies in the genre Adventure and with a duration greater than 90 minutes, the query would result as follows:

				
					SELECT *
FROM MOVIE
WHERE Genre=”Adventure” AND DuratioMinutes > 90;


				
			

Otherwise if we were interested in Adventure and Fantasy genre threads we would have to use the OR operator since the individual field for each record can only take one value at a time. The query would look like this:

				
					SELECT *
FROM MOVIE
WHERE Genre=”Adventure” OR Genre=”Fantasy”;



				
			

For text search, it is necessary to use the LIKE operator. Beware that this is not a textual search as commonly understood, but only a search for portions of strings. Suppose we want to find all movies whose title begins with “Star.” To do this we will use the % symbol immediately after the string Star so that an arbitrary number of characters can appear after it.

				
					SELECT *
FROM MOVIE
WHERE Title LIKE ”Star%”;




				
			

Otherwise if we wanted to indicate that there is only one character after Star it is necessary to use the symbol “_” as below.

				
					SELECT *
FROM MOVIE
WHERE Title LIKE ”Star_”;





				
			

Finally, in some cases it is required to check whether a field takes the value NULL or not. From the diagram we can see that the web site of cinemas is not always available. Therefore, if we wanted to display all cinemas that do not have a website we must use the IS operator whose syntax is as follows:

				
					AttributeName IS [NOT] NULL





				
			

The query will then be as follows:

				
					SELECT * 
FROM CINEMA
WHERE Website IS NULL;
				
			

Otherwise, if we wanted to find all cinemas that have a website, it would be sufficient to negate the previous predicate.

				
					SELECT * 
FROM CINEMA
WHERE Website IS NOT NULL;
				
			

ORDER BY clause

When we view the results of previously introduced queries, the sorting of the records seems random to us. In fact, it is not really random but depends on the order in which the database is reading the data on disk. However, this does not help us in some cases. For example, if we wanted to display rooms with a capacity greater than 100 seats it would also be useful to sort them from largest to smallest. To do this we need to use the ORDER BY clause, which has the following syntax:

				
					ORDER BY AttributeName [ASC | DESC]  {, AttributeName [ASC | DESC] }
				
			

where the curly brackets indicate that the enclosed term may not appear or be repeated an arbitrary number of times, while the “|” symbol indicates that one of the terms must be chosen. The default sorting for many DBMSs is ascending (ASC) and therefore can be omitted.

Returning to our example we will simply enter a descending sort based on the Capacity field. The query will result as follows:

				
					SELECT * 
FROM ROOM
WHERE Capacity > 100
ORDER BY Capacity DESC;
				
			

However, we may find some rooms that have the same capacity. In those cases the sorting will be “random”. If we wanted to insert another sorting constraint in case of equal capacity we could use the hall number by obtaining the following query

				
					SELECT * 
FROM ROOM
WHERE Capacity > 100
ORDER BY Capacity DESC, RoomNumber;

				
			

Aggregation functions

Aggregation functions operate on a set of values and produce a single (aggregate) value as the result. These functions are evaluated only after all the predicates in the WHERE clause have been applied. They can be used in the SELECT clause and in the HAVING clause.  Beware that in case the GROUP BY clause is not defined, no other non-aggregate attributes can be added in the SELECT.

The standard aggregation functions are as follows.

COUNT

Counts the number of elements in a set, that is, the rows or values (possibly distinct) of one or more attributes. Its syntax is:

				
					COUNT (<*| [DISTINCT | ALL] ListAttributes >)}

				
			

The symbol * indicates that the count is referred to the rows of the set. Otherwise, if only the list of attributes is entered, the count is done for attribute values that do not take NULL value. Finally, with the keyword DISTINCT the count returns the number of different values for the specified attributes.

SUM

Calculates the sum of the values of the given attributes and also admits an expression as an argument. Attributes must be of numeric or temporal type.

MIN e MAX

They calculate minimum and maximum values and also admit an expression as an argument. Attributes must be able to be sortable, so even on text type fields they can be applied. Beware that sorting on strings depends on character encoding and thus is affected by lower and upper case.

AVG

Computes the arithmetic mean of an attribute or expression. Attributes must be either numeric or temporal.

Suppose we want to calculate the number of theaters, the number of different theaters where the theaters are located, and some statistics about the capacity of the theaters. The query will result as follows:

				
					SELECT COUNT(*) AS TotRooms, COUNT(DISTINCT CodC) AS TotCinema, 
MIN(Capacity), MAX(Capacity), AVG(Capacity)
FROM ROOM;

				
			

More To Explore

Artificial intelligence

AI: engineering prompts to reduce hallucinations [part 2]

Hallucinations, i.e., responses that appear to make sense but are actually incorrect, afflict all large language models (LLMs). There are some techniques that can be used to mitigate this behavior. Let us discover some of them through examples and by analyzing the advantages and disadvantages.

Artificial intelligence

AI: prompt engineering to reduce hallucinations [part 1]

Prompt engineering techniques allow us to improve the reasoning and responses provided by LLMs , such as ChatGPT. However, are we sure that the responses received are correct? In some cases no! When this happens, the model is said to have hallucinated. Let’s find out what this is and what are the techniques to reduce the probability of getting them.

One Response

Leave a Reply

Your email address will not be published. Required fields are marked *

Progetta con MongoDB!!!

Acquista il nuovo libro che ti aiuterà a usare correttamente MongoDB per le tue applicazioni. Disponibile ora su Amazon!

Design with MongoDB

Design with MongoDB!!!

Buy the new book that will help you to use MongoDB correctly for your applications. Available now on Amazon!