Deep learning: introduction

Deep learning has become part of everyday life. Many of the applications we use daily rely on models that accumulate experience by looking at the data available to them. Let's find out what it is and how it shapes the way we use and experience our applications.

Most of the programs we use every day are coded as a rigid set of rules that specify exactly how the system should respond to each user request. Suppose we want to write an application to manage an e-commerce platform. After some time at the drawing board, we might define the rules that will govern a working solution, e.g., (i) users interact with the application through an interface in a browser or mobile application; (ii) our application interacts with a database to keep track of each user’s current and historical transactions; and (iii) at the heart of our application, the business logic defines a set of rules that map each event to the corresponding action to be performed by our platform.

To build the business logic, i.e., the “brain” of our application, we might enumerate all the events that our program must handle. For example, each time a customer clicks to add an item to his or her shopping cart, the program must add an entry to the shopping cart database table, associating the user’s identifier with the requested product identifier. We could then attempt to examine all possible cases, checking the adequacy of our rules and making any necessary changes. It will take several tests and hours of implementation to arrive at a working product. Nevertheless, in most cases we can write such programs and launch them with confidence.
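
The rule for the shopping cart described above can be sketched in a few lines. This is only an illustration: the event format, the `handle_event` function, and the in-memory `cart_table` list are hypothetical stand-ins for a real application and its database.

```python
# Hypothetical rule-based business logic: every known event maps to a fixed action.
cart_table = []  # stands in for the shopping cart database table


def handle_event(event):
    """Apply a hand-written rule for each event type we anticipated."""
    if event["type"] == "add_to_cart":
        # Associate the user's identifier with the requested product identifier.
        cart_table.append((event["user_id"], event["product_id"]))
        return "added"
    if event["type"] == "remove_from_cart":
        entry = (event["user_id"], event["product_id"])
        if entry in cart_table:
            cart_table.remove(entry)
            return "removed"
        return "not_in_cart"
    # An event nobody wrote a rule for cannot be handled.
    return "unknown_event"


result = handle_event({"type": "add_to_cart", "user_id": 42, "product_id": "sku-1"})
print(result, cart_table)
```

Every behavior of this program was written down in advance by a developer, which is exactly what makes it predictable and testable.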

Not every task bends so easily to hand-written rules. Designing automated systems that adapt to new data or new situations by rules alone would require enormous cognitive effort, and the result would be unlikely to work reliably. Think, for example, of a program that predicts tomorrow’s weather from geographic information, satellite images, and a trailing window of past weather, or of a chatbot that answers a question written in natural language. For these problems, it is extremely difficult to find a solution with strict rules as in the e-commerce platform example. The reasons vary with the context. Sometimes the pattern the program must capture changes over time, so there is no fixed right answer. Other times, the relationships in the data are too complicated, requiring thousands or millions of calculations that follow unknown principles. In the case of image recognition, the precise steps required to perform the task are beyond our conscious understanding, even though our subconscious cognitive processes perform the task effortlessly.


Machine learning is the study and development of algorithms that can learn from experience. When a machine learning algorithm accumulates more experience, i.e., “observes” more data/events, its performance improves. This is in contrast to our deterministic e-commerce platform, which follows the same business logic, regardless of accumulated experience, until the developers themselves decide it is time to upgrade the software.

This article introduces a series of articles that will address the main concepts of machine learning and, in more detail, deep learning. Through examples that can be replicated at home, we will learn what is useful to know if you want to become a data scientist or deepen your computing knowledge.

Deep learning in real life

How often do we interact with machine learning or deep learning models every day? It is hard to say, but certainly more often than we imagine. Let’s take an example to see how our habitual actions are influenced by machine learning systems.

We get into the car to head to a business meeting and start driving. Since we don’t know the road, we talk to our smartphone (it’s not so strange nowadays to see people in cars talking to themselves!). Assuming we have an iPhone, we say “Hey Siri” to wake the phone’s voice recognition system. At this point we ask for “directions to the Crazy Cats company.” The phone quickly displays the transcript of the command and, recognizing that we have asked for directions, launches the Maps application to fulfill our request. Once launched, the Maps application identifies the best route and some alternatives, showing us the expected travel time for each. This example demonstrates how a simple interaction with our smartphone can involve several machine learning systems.

Now imagine writing a program that responds to a wake word such as “Alexa,” “OK Google,” or “Hey Siri.” How would you go about writing such a program from scratch? Think about it: the problem is hard, if not impossible. Every second, the microphone collects about 44,000 samples, each one a measurement of the amplitude of the sound wave. What rule could reliably map a fragment of raw audio to a confident prediction of whether the fragment contains the wake word? Writing such a rule by hand would be very difficult, and we would then still have to work out how to recognize all the other words needed to launch the right apps. That is why we use machine learning.
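
To get a feel for what “raw audio” means here, the sketch below simulates one second of microphone input. It assumes the common sampling rate of 44,100 samples per second (the “about 44,000” mentioned above) and uses a pure tone instead of a real recording; `SAMPLE_RATE` and `record_tone` are names invented for this illustration.

```python
import math

SAMPLE_RATE = 44_100  # assumed samples per second (CD-quality audio)


def record_tone(seconds, frequency_hz=440.0):
    """Simulate a microphone by sampling a pure tone; amplitudes lie in [-1, 1]."""
    n_samples = int(SAMPLE_RATE * seconds)
    return [
        math.sin(2 * math.pi * frequency_hz * t / SAMPLE_RATE)
        for t in range(n_samples)
    ]


fragment = record_tone(seconds=1.0)
print(len(fragment))   # how many amplitude values one second of audio contains
print(fragment[:3])    # the first few raw numbers a hand-written rule would see
```

A hand-written rule would have to decide “wake word or not” from tens of thousands of such numbers, which change with the speaker, the distance from the microphone, and the background noise.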

Here is the trick of machine learning: we do not have to tell the computer explicitly how to map from inputs to outputs. Even if you don’t know how to program a computer to recognize the word “Alexa,” you yourself can recognize it. Armed with this ability, we can collect a huge dataset of audio fragments, each paired with a label indicating whether it contains the word “Alexa.” In the currently dominant approach to machine learning, we do not try to design a system explicitly capable of recognizing a word. Instead, we define a flexible program whose behavior is determined by a set of parameters. Then we use the dataset to determine the best possible parameter values, that is, those that improve the performance of our program with respect to a chosen performance measure.


We can think of the parameters as knobs that we can turn to change the behavior of the program. Once the parameters are fixed, we call the program a model. The set of all programs (input-output mappings) that we can produce simply by manipulating the parameters is called a model family. The “meta-program” that uses our dataset to choose the parameters is called the learning algorithm.
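
The vocabulary above can be made concrete with a deliberately tiny sketch. The family below has only two knobs, `weight` and `bias`, and works on a single number rather than on audio; both the family and the `make_model` helper are assumptions chosen for illustration, not a realistic speech model.

```python
def make_model(weight, bias):
    """Fix the two knobs and return one concrete model from the family."""
    def model(x):
        # Answer "yes" when the weighted input clears the threshold, else "no".
        return "yes" if weight * x + bias > 0 else "no"
    return model


# Two members of the same family, obtained only by turning the knobs.
model_a = make_model(weight=1.0, bias=-5.0)   # says "yes" for inputs above 5
model_b = make_model(weight=-1.0, bias=2.0)   # says "yes" for inputs below 2

print(model_a(7.0), model_b(7.0))
```

Here `make_model` describes the model family, each returned function is a model, and a learning algorithm would be the code that picks `weight` and `bias` from data instead of having us choose them by hand.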

Before we can run the learning algorithm, we must define the problem precisely, establishing the exact nature of the inputs and outputs and choosing an appropriate model family. In this case, our model receives a fragment of audio as input and outputs a choice between yes and no. If all goes according to plan, the model’s guesses about whether the fragment contains the word “Alexa” will generally be correct.

If we choose the right model family, there should be a knob setting such that the model outputs “yes” whenever it hears the word “Alexa.” Since the exact choice of the wake word is arbitrary, we will probably want a model family rich enough that, with another knob setting, it outputs “yes” only when it hears the word “Apricot.” We expect the same model family to be suitable for recognizing both “Alexa” and “Apricot,” because intuitively they seem to be similar tasks. However, we might need a completely different model family if we wanted to handle fundamentally different inputs or outputs, for example, if we wanted to map from pictures to captions or from English sentences to Italian sentences.

As you can guess, if we set all the knobs randomly, our model is unlikely to recognize “Alexa,” “Apricot,” or any other word. In machine learning, learning is the process by which we discover the right knob setting to get the desired behavior from our model. In other words, we train our model with data. The training process is usually as follows:

  1. Start with a randomly initialized model that cannot do anything useful.
  2. Take some data (e.g., audio snippets and corresponding labels).
  3. Modify the knobs to improve the performance of the model based on the examples.
  4. Repeat steps 2 and 3 until the model achieves satisfactory performance.
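
The four steps above can be sketched on a toy problem. The example below is an assumption-laden illustration: it replaces audio snippets with a single made-up numeric feature per example, and it uses the classic perceptron update rule as the learning algorithm, which is only one of many possible choices.

```python
import random

# Step 2's data: hypothetical (feature, label) pairs; +1 means "yes", -1 means "no".
examples = [(0.5, -1), (1.0, -1), (1.5, -1), (3.0, 1), (3.5, 1), (4.0, 1)]


def predict(w, b, x):
    return 1 if w * x + b > 0 else -1


def accuracy(w, b, data):
    return sum(predict(w, b, x) == y for x, y in data) / len(data)


def train(data, learning_rate=0.1, max_epochs=1000, seed=0):
    rng = random.Random(seed)
    # Step 1: start with randomly initialized knobs.
    w, b = rng.uniform(-1, 1), rng.uniform(-1, 1)
    for _ in range(max_epochs):
        # Step 4: stop once every example is classified correctly.
        if accuracy(w, b, data) == 1.0:
            break
        for x, y in data:
            # Step 3: nudge the knobs whenever an example is misclassified.
            if predict(w, b, x) != y:
                w += learning_rate * y * x
                b += learning_rate * y
    return w, b


w, b = train(examples)
print(accuracy(w, b, examples))
```

Nobody told the program where the boundary between “yes” and “no” lies; the loop found suitable values of `w` and `b` from the labeled examples alone.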

In summary, instead of writing a word recognizer by hand, we write a program that learns to recognize the words of interest to us. This can be thought of as programming with data. In the same way, we can “program” a cat and dog detector by providing our machine learning system with many examples of cats and dogs. The detector will then learn to output a very large positive number when shown a cat, a very large negative number when shown a dog, and something closer to zero when it is unsure. This is just one example of what machine learning is capable of doing. Deep learning, which we will explain in detail in other articles, is just one of many methods for solving machine learning problems.
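
As a small illustration of the cat and dog detector's output described above, the sketch below turns a real-valued score into a decision. The `interpret_score` function and its `margin` cutoff are hypothetical; a trained detector would supply the score.

```python
def interpret_score(score, margin=1.0):
    """Turn a real-valued detector output into a decision.

    `margin` is an assumed cutoff: scores within it count as "unsure".
    """
    if score > margin:
        return "cat"
    if score < -margin:
        return "dog"
    return "unsure"


for s in (8.3, -6.1, 0.2):
    print(s, interpret_score(s))
```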
