Several methodologies and algorithms that are part of deep learning have been developed in recent years. But are these techniques actually that modern? Let's look at a bit of history, starting from the Middle Ages and arriving at the present day, to understand the roots of deep learning and artificial intelligence.

New deep learning algorithms and models appear every day to tackle problems old and new. In the articles Deep learning: Supervised learning [part 1] and Deep learning: Supervised learning [part 2], among others, we examined a small subset of the problems that deep learning can address. Many techniques are available, each more or less suited to a given problem, and it is up to us to work out which one fits the specific context. But were all these techniques developed only in the last few years? Many of those we have seen were, yes, but their roots go back much further. In this article we look at some of the history of data analysis that underlies the techniques we use today.

From the Middle Ages to the 19th century

The desire to analyze data and predict future outcomes has always been present in humans and underlies much of the natural sciences and mathematics. Two examples are the Bernoulli distribution, named after Jacob Bernoulli (1655-1705), and the Gaussian distribution, discovered by Carl Friedrich Gauss (1777-1855). Gauss is also credited with the method of least squares, which is still used today for a multitude of problems, from insurance calculations to medical diagnostics. Such tools have strengthened the experimental approach in the natural sciences: for example, Ohm’s law, which relates current and voltage in a resistor, is perfectly described by a linear model.
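To make the idea of fitting a linear model concrete, here is a minimal sketch of a least-squares fit applied to Ohm’s law, V = R * I. The function name `fit_line_through_origin` and the measurements are assumptions invented for illustration; they are not real experimental data.

```python
def fit_line_through_origin(currents, voltages):
    """Least-squares estimate of R in the model V = R * I (no intercept).

    Minimizing sum((V - R * I) ** 2) over R gives R = sum(I * V) / sum(I * I).
    """
    numerator = sum(i * v for i, v in zip(currents, voltages))
    denominator = sum(i * i for i in currents)
    return numerator / denominator


# Hypothetical noisy measurements (amperes, volts) for a resistor near 5 ohm.
currents = [0.1, 0.2, 0.3, 0.4, 0.5]
voltages = [0.52, 0.98, 1.51, 2.03, 2.49]

resistance = fit_line_through_origin(currents, voltages)
print(f"Estimated resistance: {resistance:.2f} ohm")
```

With noisy readings no single value of R matches every pair exactly, so the fit picks the one that makes the squared errors as small as possible overall.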

As early as the Middle Ages, mathematicians had a keen intuition for estimates. For example, the geometry book by Jacob Köbel (1460-1533) illustrates averaging the foot lengths of 16 adult men to estimate the typical foot length in the population.

The illustration shows the experiment Köbel described. On leaving a church, a group of 16 adult men was asked to line up and have their feet measured. The sum of these measurements was then divided by 16 to obtain an estimate of what is now called the foot. This “algorithm” was later improved to handle deformed feet: the two men with the shortest and longest feet were sent away, and only the remaining measurements were averaged. This is one of the earliest examples of truncated mean estimation.
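The procedure above can be sketched in a few lines of Python. The function name `truncated_mean`, its `trim` parameter, and the foot lengths are assumptions invented for illustration; the source gives no actual measurements.

```python
def truncated_mean(values, trim=1):
    """Mean after dropping the `trim` smallest and `trim` largest values."""
    if len(values) <= 2 * trim:
        raise ValueError("not enough values left after trimming")
    kept = sorted(values)[trim:len(values) - trim]
    return sum(kept) / len(kept)


# Hypothetical foot lengths in centimetres for 16 men (not historical data).
foot_lengths_cm = [25.0, 26.5, 27.0, 24.5, 28.0, 26.0, 27.5, 25.5,
                   26.0, 27.0, 33.0, 26.5, 25.0, 21.0, 26.5, 27.5]

plain_mean = sum(foot_lengths_cm) / len(foot_lengths_cm)
robust_mean = truncated_mean(foot_lengths_cm, trim=1)

print(f"Plain mean:     {plain_mean:.2f} cm")
print(f"Truncated mean: {robust_mean:.2f} cm")
```

Dropping the extremes makes the estimate less sensitive to a single unusually short or long foot, which is the same motivation given for the improved version of the experiment.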

The 20th century

Statistics took off with the availability and collection of data. One of its pioneers, Ronald Fisher (1890-1962), contributed significantly to its theory and to its applications in genetics. Many of his algorithms (such as linear discriminant analysis) and concepts (such as the Fisher information matrix) still occupy a prominent place in the foundations of modern statistics. His datasets have also had a lasting impact: the Iris dataset that Fisher published in 1936 is still used today to demonstrate machine learning algorithms. Fisher was also an advocate of eugenics, which should remind us that the morally questionable use of data science has as long and enduring a history as its productive use in industry and the natural sciences.

Other influences for machine learning come from the information theory of Claude Shannon (1916-2001) and the theory of computation proposed by Alan Turing (1912-1954). Turing posed the question “can machines think?” in his famous article Computing Machinery and Intelligence (Turing, 1950). Describing what is now known as the Turing test, he proposed that a machine can be considered intelligent if it is difficult for a human evaluator to distinguish machine responses from human responses, based on purely textual interactions.

Further influences came from neuroscience and psychology. After all, humans clearly exhibit intelligent behavior. Many scholars have wondered whether it is possible to explain and possibly decode this ability. One of the first biologically inspired algorithms was formulated by Donald Hebb (1904-1985). In his groundbreaking book The Organization of Behavior (Hebb, 1949), he stated that neurons learn through positive reinforcement. This principle became known as the Hebbian learning rule. These ideas inspired later work, such as Rosenblatt’s perceptron learning algorithm, and laid the foundation for many stochastic gradient descent algorithms that today form the basis of deep learning: reinforcing desirable behavior and decreasing undesirable behavior to achieve good parameter settings in a neural network.
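The idea of reinforcing desirable behavior and decreasing undesirable behavior can be sketched with the perceptron update rule. This is a simplified textbook version, not Rosenblatt’s original formulation; the function names, the learning rate, and the logical AND dataset are assumptions chosen for illustration.

```python
def predict(weights, bias, inputs):
    """Output 1 if the weighted sum is positive, otherwise 0."""
    activation = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if activation > 0 else 0


def train_perceptron(samples, epochs=20, learning_rate=1.0):
    """Adjust weights only when the prediction is wrong."""
    weights = [0.0] * len(samples[0][0])
    bias = 0.0
    for _ in range(epochs):
        for inputs, target in samples:
            # error is +1 (output too low), -1 (too high), or 0 (correct).
            error = target - predict(weights, bias, inputs)
            weights = [w + learning_rate * error * x
                       for w, x in zip(weights, inputs)]
            bias += learning_rate * error
    return weights, bias


# Logical AND: a small, linearly separable toy problem.
and_samples = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]

weights, bias = train_perceptron(and_samples)
for inputs, target in and_samples:
    print(inputs, "->", predict(weights, bias, inputs), "expected", target)
```

When the output is too low, the weights of the active inputs are increased; when it is too high, they are decreased; correct answers leave the weights unchanged.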

Biological inspiration is what gave neural networks their name. For more than a century (beginning with the models of Alexander Bain, 1873, and James Sherrington, 1890), researchers have sought to assemble computational circuits that resemble networks of interacting neurons. Over time, the interpretation of biology has become less literal, but the name has remained. Underlying it are some key principles found in most networks today:

  • The alternation of linear and nonlinear processing units, often referred to as layers.
  • The use of the chain rule (also known as backpropagation) to adjust the parameters of the entire network at once.
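Both principles can be seen in a tiny example. The sketch below assumes a network with a single input, one hidden unit with a tanh nonlinearity, and a squared-error loss; all names and parameter values are hypothetical. It computes the gradients with the chain rule and compares one of them against a numerical estimate.

```python
import math


def forward(x, w1, b1, w2, b2):
    z = w1 * x + b1       # linear unit
    h = math.tanh(z)      # nonlinear unit
    y = w2 * h + b2       # linear unit
    return z, h, y


def loss(x, target, w1, b1, w2, b2):
    _, _, y = forward(x, w1, b1, w2, b2)
    return 0.5 * (y - target) ** 2


def gradients(x, target, w1, b1, w2, b2):
    """Gradients of the loss for all four parameters via the chain rule."""
    _, h, y = forward(x, w1, b1, w2, b2)
    dloss_dy = y - target
    dloss_dw2 = dloss_dy * h
    dloss_db2 = dloss_dy
    dloss_dh = dloss_dy * w2
    dloss_dz = dloss_dh * (1.0 - h * h)   # d tanh(z) / dz = 1 - tanh(z)^2
    dloss_dw1 = dloss_dz * x
    dloss_db1 = dloss_dz
    return dloss_dw1, dloss_db1, dloss_dw2, dloss_db2


# Hypothetical input, target, and parameter values.
x, target = 0.5, 1.0
w1, b1, w2, b2 = 0.8, -0.1, 1.5, 0.2

analytic = gradients(x, target, w1, b1, w2, b2)[0]

# Central finite difference on w1 as an independent check.
eps = 1e-6
numeric = (loss(x, target, w1 + eps, b1, w2, b2)
           - loss(x, target, w1 - eps, b1, w2, b2)) / (2 * eps)

print(f"chain rule: {analytic:.6f}, finite difference: {numeric:.6f}")
```

Because a single backward pass yields the gradient for every parameter, all of them can be updated in the same step, which is what the second principle refers to.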

After rapid initial progress, neural network research came to a standstill from about 1995 to 2005. This was mainly due to two reasons. First, training a network is computationally expensive. While random access memory was abundant at the end of the last century, computational power was scarce. Second, the datasets were relatively small. In fact, Fisher’s Iris dataset of 1936 was still a popular tool for testing the effectiveness of algorithms. The MNIST dataset, with its 60,000 handwritten digits, was considered huge.

Given the scarcity of data and computation, strong statistical tools such as kernel methods, decision trees, and graphical models proved empirically superior in many applications. Moreover, unlike neural networks, they did not require weeks of training and provided predictable results with strong theoretical guarantees.
