Nowadays, data is critical to any organization, from business to research. Nearly 90% of the data in circulation has been created in the last two years alone: content shared on social media, banking data, image and video archives, GPS signals, phone data and so on. These are just a few examples that make clear the importance of a professional figure that has become necessary and is increasingly in demand in the job market: the Data Scientist. This figure is the expert in the analysis of both structured and unstructured data, who helps companies and research organizations achieve precise goals.
Developing analytics pipelines requires specific skills in both problem analysis and programming. There are several programming languages available, but in recent years the one that has captured a good share of the market is Python. Just look at the number of searches for tutorials or code examples related to this language: according to PYPL (Popularity of Programming Language), they have grown by more than 15% in the last 5 years. The main reasons are the ease of learning the language and the wide range of libraries it offers.
Regardless of the preferred language, a very common problem for Data Scientists is finding a simple and functional working environment for analysis. This environment must not be too difficult to set up for each project, and it must also make it possible to visualize intermediate results and to include the related documentation. Among the various tools available, here we focus on Jupyter Notebook.
Jupyter Notebook is an open source Web application that allows you to create and share interactive textual documents containing objects such as equations, graphs, and executable source code. Jupyter provides the ability to create, document and share data analyses that include for example:
- data cleaning and transformation
- numeric simulation
- statistical modeling
- data visualization
- machine learning.
It also supports over 40 programming languages, making it the ideal tool for any Data Scientist.
In this article we’ll see how to install Jupyter Notebook on your PC and how to take your first steps with this tool.
Although Jupyter can run code in several programming languages, Python itself is required to install it (Python 3.3 or higher, or Python 2.7).
The easiest way to get started with Jupyter Notebook is to install Anaconda. We recommend the individual version, which is free. Paid versions are aimed at commercial product development or team use of the software.
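Before installing, it can help to confirm which interpreter you are running. The following is a minimal sketch: the helper name `meets_requirement` is hypothetical, and the thresholds are simply the ones quoted above (recent Jupyter releases may require a newer Python 3).

```python
import sys


def meets_requirement(version_info):
    """Return True for Python 2.7, or for Python 3.3 and newer.

    The thresholds mirror the requirement quoted in the text; they are
    not read from any Jupyter package metadata.
    """
    major, minor = version_info[0], version_info[1]
    if major == 2:
        return minor == 7
    return (major, minor) >= (3, 3)


# Report the interpreter that would be used for the installation.
print("Python", sys.version.split()[0])
print("Meets the stated requirement:", meets_requirement(sys.version_info))
```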
Anaconda lets you start working on your projects without the hassle of managing countless installations or worrying about dependencies and installation problems specific to the operating system.
Once Anaconda is installed, you can launch Jupyter Notebook either from the command line or from the Anaconda Navigator application interface. Below is the interface of Anaconda Navigator available for Mac. In this case, to launch the Notebook just click the Launch button for Jupyter.
If you prefer to install Jupyter and the various packages manually, you can use the pip command.
First, you have to make sure you have the latest version of pip installed to avoid problems with some dependencies.
pip3 install --upgrade pip
Then you can install Jupyter Notebook using the following command.
pip3 install jupyter
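To check that the installation succeeded, one option is a short Python script. This is only a sketch: it assumes that installing `jupyter` makes a `notebook` package importable and puts a `jupyter` command on your PATH, and the function name `jupyter_status` is made up for the example.

```python
import importlib.util
import shutil


def jupyter_status():
    """Report whether the notebook package and the jupyter command are visible."""
    return {
        # find_spec returns None when a top-level package cannot be found
        "notebook_package": importlib.util.find_spec("notebook") is not None,
        # shutil.which returns None when the command is not on PATH
        "jupyter_command": shutil.which("jupyter") is not None,
    }


for name, found in jupyter_status().items():
    print(name, "found" if found else "not found")
```

If either entry is reported as not found, the Python environment you are running may differ from the one pip3 installed into.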
Regardless of how you choose to open Jupyter, a new tab will open in your default browser at http://localhost:8888/tree.
The dashboard that opens contains several tabs including:
- Files: where all files and notebooks in the current directory are shown;
- Running: shows all kernels currently running on your computer;
- Clusters: allows kernels to be started for parallel computing.
Creating a Notebook
After opening Jupyter we are ready to create our first notebook.
To create a notebook, all you have to do is click on the New button (top right) and a list of choices will open.
Since only Python 3 is present in the basic installation, it is only possible to create a Notebook that uses this language. Alternatively, you can create a text file, a folder or a terminal. In this tutorial we choose Python 3. At this point a new browser tab will open.
The Notebook will be named Untitled. We recommend that you give it a name right at the beginning to avoid possible loss of work. To do this, just click on the name and the following window will appear. At this point you can enter the name you prefer.
The notebooks that are created will have the standard ipynb extension. To see the corresponding file, which can then be shared with other people and/or collaborators, simply return to the home tab. In the current folder you will find the corresponding file.
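An ipynb file is a JSON document, so it can be inspected with Python's standard library alone. The sketch below writes a minimal notebook in the version 4 format to a temporary folder and reads it back; the file name `Example.ipynb`, the cell contents and the helper `cell_types` are all made up for the illustration.

```python
import json
import os
import tempfile

# A minimal notebook (format version 4): one Markdown cell and one code cell.
notebook = {
    "nbformat": 4,
    "nbformat_minor": 4,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# My first notebook"]},
        {
            "cell_type": "code",
            "metadata": {},
            "execution_count": None,
            "outputs": [],
            "source": ["print('hello')"],
        },
    ],
}


def cell_types(path):
    """Return the type of each cell stored in the notebook file at `path`."""
    with open(path, encoding="utf-8") as handle:
        data = json.load(handle)
    return [cell["cell_type"] for cell in data["cells"]]


# Write the notebook to a temporary folder, then read it back.
path = os.path.join(tempfile.mkdtemp(), "Example.ipynb")
with open(path, "w", encoding="utf-8") as handle:
    json.dump(notebook, handle, indent=1)

print(cell_types(path))  # prints ['markdown', 'code']
```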
Jupyter Notebook Interface
Each Jupyter Notebook features the same interface consisting of the following components:
- Menu bar: allows you to access different actions related to the notebook or kernel.
- Kernel: to the right of the menu bar, it indicates the process that runs an interactive session. When using IPython, this kernel is a Python process.
- Toolbar: contains icons for common actions.
Below we look at these menus and at the types of cell you can use.
The menu bar, visible along the top of the notebook, contains the following items.
- File: allows you to create a new notebook or open an existing one. One of the most useful menu items is the Save and checkpoint option, which allows you to create checkpoints for possible rollbacks.
- Edit: is used to cut, copy, paste, delete, split, merge or move cells. Only menu items that can be applied to the currently selected cell are enabled. For example, a code cell cannot have an image embedded in it, while a Markdown cell can.
- View: is used to enable or disable the visibility of the header and toolbar. You can also enable or disable line numbers within cells.
- Insert: inserts cells above or below the currently selected cell.
- Cell: allows you to execute a cell, a group of cells or all cells. It can also be used to change the type of a cell and clear the output of a cell.
- Kernel: is used to interact with the kernel running in the background such as restarting it, shutting it down or even changing it.
- Help: provides notebook keyboard shortcuts, a tour of the user interface, and lots of reference material.
The toolbar gives quick access to frequently used actions such as creating, copying or deleting cells.
Starting from the left we find the following icons:
- Save: allows you to save the notebook.
- Add: adds a new cell below the currently selected one.
- Cut: removes the selected cell so that it can be pasted elsewhere.
- Copy: copies the selected cell, whether it contains code or text.
- Paste: pastes the cell that was previously cut or copied.
- Up Arrow: moves a cell from a lower position to a higher position.
- Down Arrow: moves a cell from a higher position to a lower position.
- Run: executes the contents of a code cell. You can press Shift + Enter to do the same without clicking this button.
- Stop: interrupts the kernel.
- Refresh: after a confirmation message it restarts the kernel, and the calculations performed up to that point are lost.
- Forward: the forward arrow restarts the kernel and then re-executes all cells in the notebook.
- Cell Type: indicates the type of cell you are using.
- Keyboard: clicking the keyboard button opens a list of commands for editing text, changing the cell type and performing a variety of other actions.
There are two main types of cells: Markdown cells and code cells.
A Markdown cell contains rich text written in Markdown. In addition to the classic formatting options such as bold or italics, we can add links, images, HTML elements, mathematical equations written in LaTeX, and more.
Usually this cell type is used to insert documentation for the Notebook and/or the code cells that follow.
A code cell, on the other hand, contains code that is executed by the kernel. To execute a cell, simply select it and click the Run button or press Shift + Enter.
After executing a code cell, an output cell is displayed below the code cell, which contains the result from the cell you just executed. Note that output cells are only displayed if the written code produces an output. The results in the output cells cannot be changed.
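As an illustration, this is the kind of content a code cell might hold; the variable names and values are made up for the example. The assignments alone produce no output, while the call to print does.

```python
# Contents of a hypothetical code cell.
temperatures = [18.5, 21.0, 19.5, 23.0]  # an assignment shows no output

average = sum(temperatures) / len(temperatures)

# print writes to the output area below the cell
print("Average temperature:", average)  # prints Average temperature: 20.5
```

In a notebook, ending the cell with a bare expression such as `average` would also display its value in the output cell.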
There are also the Heading and Raw NBConvert types. The first one, however, is deprecated and can be replaced with a Markdown cell. For details on the other type, we refer you to the official documentation.
Export a Jupyter file
To share a notebook, in addition to providing the ipynb file directly, you can export its contents in the following formats:
- Reveal.js slides
- reStructuredText
- Executable script
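The export can be done directly from the File menu of the Notebook under Download. Alternatively, you can use the following shell command to export the content in HTML format (recent versions of nbconvert require the output format to be given explicitly with --to):
jupyter nbconvert --to html Tutorial.ipynb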
If you want to export to PDF instead, run the following command:
jupyter nbconvert Tutorial.ipynb --to PDFviaHTML
Downloading in the various formats, except HTML, requires the installation of some packages.
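The same export can also be scripted from Python, which is handy when several notebooks have to be converted. The sketch below only builds and prints the commands; the helper names `nbconvert_command` and `export_notebook` are hypothetical, and actually running an export assumes that the `jupyter` command and the packages needed for the chosen format are installed.

```python
import subprocess


def nbconvert_command(notebook, output_format):
    """Build the argument list for a `jupyter nbconvert` call."""
    return ["jupyter", "nbconvert", "--to", output_format, notebook]


def export_notebook(notebook, output_format):
    """Run the export; raises CalledProcessError if nbconvert fails."""
    subprocess.run(nbconvert_command(notebook, output_format), check=True)


# Show the commands without running them.
# "slides" produces Reveal.js slides, "rst" reStructuredText,
# and "script" an executable script.
for fmt in ("html", "slides", "rst", "script"):
    print(" ".join(nbconvert_command("Tutorial.ipynb", fmt)))
```

Calling `export_notebook("Tutorial.ipynb", "html")` would then perform the conversion in the current folder.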
In this article we have seen how to install Jupyter Notebook and its main features.
An interesting aspect of notebooks is that you can share them with other users in different formats. In addition to the formats seen, it is possible to embed them in GitHub repositories or share them via Binder.