Tutorial: Running Python in Galileo

Written and developed by Matthew Gasperetti

Getting started with Python in Galileo

To get started with Galileo, log into your account using Firefox or Chrome, and download our Python example folder from GitHub.

The download consists of a .py file, a .csv file, and a Dockerfile. We'll run this folder in Galileo first, and then take a look at what's happening behind the scenes.

Let’s have a look at our files and then drag and drop the python_example folder to Galileo.

Our python_example folder contains three files: python_example.py, mtcars.csv, and Dockerfile. The python_example.py script runs a linear regression, makes two simple plots, and then runs a Monte Carlo simulation that tosses a die 10 million times and calculates the proportion of rolls that come up six.
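The tutorial doesn't reproduce the source of python_example.py, so the following is not its code. It's a minimal sketch, assuming NumPy (which the Dockerfile installs), of how the die-tossing simulation could be written; the function name proportion_of_sixes is hypothetical.

```python
from typing import Optional

import numpy as np


def proportion_of_sixes(n_rolls: int, seed: Optional[int] = None) -> float:
    """Toss a fair six-sided die n_rolls times and return the share of rolls equal to six."""
    rng = np.random.default_rng(seed)
    # integers(1, 7) draws whole numbers from 1 to 6; the upper bound is exclusive
    rolls = rng.integers(1, 7, size=n_rolls)
    # Comparing to 6 gives a boolean array; its mean is the proportion of sixes
    return float(np.mean(rolls == 6))
```

Calling proportion_of_sixes(10_000_000) should return a value very close to 1/6 (about 0.1667). Drawing all the rolls in one vectorized call keeps even 10 million tosses quick.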

When you log into Galileo, the first thing you’ll see is your Dashboard:

View of the Galileo Dashboard

To run the python_example.py file, drag and drop the python_example folder you downloaded from our GitHub to the Galilei station at the top of the Dashboard:

Drag and drop the python_example folder to the Galilei station

After you drag and drop the python_example folder to Galileo, you’ll be able to see the job running in the Your Recent Jobs panel:

The job takes only seconds to complete; try running it locally and compare.

When the example job completes, hit the Download button under “Action” to download the results:

Download button

The results are downloaded as a .zip file that contains an output.log file with the results of the analysis, and a folder called filesys, where any plots or other files created by the analysis are stored.

The downloaded .zip file contains a folder called filesys and a file called output.log
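If you'd rather inspect the results from Python than unzip them by hand, here is a minimal sketch using only the standard library. The function name summarize_results and the archive path you pass in are hypothetical; the only thing taken from this tutorial is the layout (an output.log file plus a filesys folder). Because we're not certain whether the .zip wraps these in a top-level folder, the code matches on names rather than exact paths.

```python
import zipfile
from pathlib import PurePosixPath
from typing import List, Tuple


def summarize_results(zip_path: str) -> Tuple[str, List[str]]:
    """Return the text of output.log and the paths of files stored under filesys/."""
    with zipfile.ZipFile(zip_path) as zf:
        names = zf.namelist()
        # Match on the file name so a top-level wrapper folder doesn't matter
        log_name = next((n for n in names if PurePosixPath(n).name == "output.log"), None)
        if log_name is None:
            raise FileNotFoundError(f"no output.log in {zip_path}")
        log_text = zf.read(log_name).decode("utf-8", errors="replace")
        # Keep regular files (not directory entries) that sit below a filesys folder
        outputs = [
            n for n in names
            if not n.endswith("/") and "filesys" in PurePosixPath(n).parts[:-1]
        ]
    return log_text, outputs
```

For example, log_text, plots = summarize_results("results.zip") followed by print(log_text) shows the regression and Monte Carlo output, and plots lists the saved figures.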

Let’s take a look at the output.log file first, which returns the results of the regression we ran and the results of our Monte Carlo simulations:

The results of our regression analysis and Monte Carlo experiments

Next, if we look in the filesys folder, we can see the plots we made:

Regression Plot: MPG vs. Weight
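The Dockerfile installs statsmodels and patsy, which suggests the script fits its regression with statsmodels, but that code isn't shown here. As a stand-in, here is a minimal sketch of the same single-predictor model (MPG regressed on weight) using numpy.polyfit, which gives the same ordinary least-squares estimates of the intercept and slope. The helper name fit_mpg_on_weight is hypothetical, and we assume the columns are named wt and mpg, as in the standard mtcars data.

```python
from typing import Sequence, Tuple

import numpy as np


def fit_mpg_on_weight(weight: Sequence[float], mpg: Sequence[float]) -> Tuple[float, float]:
    """Fit mpg = intercept + slope * weight by least squares and return (intercept, slope)."""
    x = np.asarray(weight, dtype=float)
    y = np.asarray(mpg, dtype=float)
    # With deg=1, polyfit returns coefficients from the highest power down: [slope, intercept]
    slope, intercept = np.polyfit(x, y, deg=1)
    return float(intercept), float(slope)
```

Inside the script you could call fit_mpg_on_weight(d1["wt"], d1["mpg"]) after loading the data and print the result; we're assuming that whatever the script prints to standard output is what ends up in output.log.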

Running your own Python files in Galileo—A closer look at how it works

A closer study of the files in our python_example folder will help illustrate how to modify them so we can run other jobs. After that, we’ll have a look at the Galileo Docker Wizard, which helps automate the process. 

How to code the Dockerfile

Let’s quickly review the example Dockerfile, which you can open with a text editor like Atom.

The first thing to notice is that the file is named Dockerfile, with no extension. It must be named exactly that; names like Dockerfile2, Dockerfile copy, or Dockerfile.txt won't work.

Looking at the Dockerfile with our text editor, the first Docker command we see is:

FROM python:latest

This tells Docker which base image to build on; here, the official Python image, which sets up a Python 3 environment with pip3 already installed. We want to leave it as is.

Next, we see a series of pip3 install commands, each wrapped in a Docker RUN instruction:

RUN pip3 install pandas
RUN pip3 install numpy
RUN pip3 install matplotlib
RUN pip3 install seaborn
RUN pip3 install statsmodels
RUN pip3 install patsy

These commands install your Python packages into the Docker image we are building. Large packages can take a while to install, but once the packages are installed and the image is built, the job will run quickly the next time.
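If a package is missing from the RUN list, the script fails only when it reaches the corresponding import. As a sketch (the helper name check_packages is hypothetical), a few lines at the top of your script can report which packages are actually available inside the container. It checks import names, which match the pip names for every package in this list but not for all packages (scikit-learn, for example, is imported as sklearn).

```python
import importlib.util
from typing import Dict, Iterable


def check_packages(names: Iterable[str]) -> Dict[str, bool]:
    """Map each import name to True if Python can locate it in the current environment."""
    # find_spec returns None for a top-level module that isn't installed
    return {name: importlib.util.find_spec(name) is not None for name in names}
```

For this project you might print(check_packages(["pandas", "numpy", "matplotlib", "seaborn", "statsmodels", "patsy"])) and look for any False values in output.log.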

To install another package, add an additional RUN command. For example, you could add the package scipy to the bottom of our list like this:

RUN pip3 install patsy
RUN pip3 install scipy

Now that we understand installing packages, let's look at the next line in our Dockerfile:

COPY . .

This copies the contents of your project folder (the first .) into the container's working directory (the second .), so your script and data sit side by side. Leave it as is.

The final command is:

ENTRYPOINT ["python3", "python_example.py"]

This tells Docker that we are running a python3 script called python_example.py. To run a python3 script called my_project.py, we’d use the following code:

ENTRYPOINT ["python3", "my_project.py"]
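ENTRYPOINT written this way (Docker calls it the exec form) is parsed as a JSON array, so the quotes must be plain ASCII double quotes. Curly quotes, which word processors and blog editors often substitute, are not valid JSON; Docker will not parse the line as an exec-form array, and the container won't start your script as intended. If you generate Dockerfiles from Python, a minimal way to get the quoting right is to let json.dumps build the array; the helper name entrypoint_line is hypothetical.

```python
import json
from typing import List


def entrypoint_line(script: str, interpreter: str = "python3") -> str:
    """Build an exec-form ENTRYPOINT instruction that runs script with the given interpreter."""
    args: List[str] = [interpreter, script]
    # json.dumps always emits straight double quotes, which the exec form requires
    return "ENTRYPOINT " + json.dumps(args)
```

For example, entrypoint_line("my_project.py") returns 'ENTRYPOINT ["python3", "my_project.py"]'.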

Here is the Dockerfile from the python_example folder in its entirety with comments:

# The line below determines the base image to use
FROM python:latest
# The next block determines which dependencies to install
RUN pip3 install pandas
RUN pip3 install numpy
RUN pip3 install matplotlib
RUN pip3 install seaborn
RUN pip3 install statsmodels
RUN pip3 install patsy
# This line determines where to copy project files from, and where to copy them to
COPY . .
# The entrypoint is the command used to start your project
ENTRYPOINT ["python3", "python_example.py"]

Now, let's have a look at our Python script

The python_example.py script should look familiar: we import our dependencies and write our code just as we would locally.

However, one important thing to note is that we read in our dataset, mtcars.csv, as if it were in our working directory, using the following command:

d1 = pd.read_csv("mtcars.csv")

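Notice that the path is relative, not absolute. Inside the container, the script runs from the folder where your files were copied, so a path pointing to a location on your own computer doesn't exist there. The code below will NOT work and will cause an error:

d1 = pd.read_csv("/Users/Matthew/mtcars.csv")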
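A relative path like "mtcars.csv" is resolved against the current working directory, which is where COPY . . placed your files; an absolute path such as /Users/Matthew/mtcars.csv refers to your own machine. Here is a minimal sketch of a guard that rejects absolute paths up front. The helper name read_project_csv is hypothetical, and it uses the standard-library csv module so it runs on its own, but pd.read_csv resolves relative paths the same way.

```python
import csv
from pathlib import Path
from typing import Dict, List


def read_project_csv(name: str) -> List[Dict[str, str]]:
    """Read a CSV shipped in the project folder, given a path relative to the working directory."""
    path = Path(name)
    if path.is_absolute():
        # Paths from your own machine (e.g. /Users/...) won't exist inside the container
        raise ValueError(f"expected a path relative to the project folder, got {name!r}")
    # Relative paths are resolved against the current working directory
    with path.open(newline="") as f:
        return list(csv.DictReader(f))
```

Calling read_project_csv("mtcars.csv") works both locally (when you run the script from the project folder) and in Galileo, while read_project_csv("/Users/Matthew/mtcars.csv") fails immediately with a clear message.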

Let’s look at the dataset next

Our dataset is the standard mtcars.csv dataset used in many applied examples and tutorials. There's nothing special about it; it's included only to show how to load data correctly from your Python script.

Using the Docker Wizard to create your own project

If you drag and drop a folder to Galileo that only contains a .py file and a .csv file, but no Dockerfile, you will see a Docker Wizard prompt:

The Docker Wizard helps automate creating a Dockerfile

To create a Dockerfile for a Python script called my_project.py that installs the packages pandas, numpy, and scipy, enter the following settings into the Docker Wizard:

An example showing how to use Galileo’s Docker Wizard
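We don't know exactly how the Wizard builds its output, but to make the idea concrete, here is a minimal sketch of a generator that assembles a Dockerfile in the same layout as the python_example one: a base image, one RUN pip3 install line per package, COPY . ., and an exec-form ENTRYPOINT. The functions make_dockerfile and write_dockerfile and the default base image are our own assumptions, not Galileo's implementation.

```python
import json
from pathlib import Path
from typing import Iterable


def make_dockerfile(script: str, packages: Iterable[str], base_image: str = "python:latest") -> str:
    """Return Dockerfile text laid out like the python_example Dockerfile."""
    lines = [f"FROM {base_image}"]
    # One RUN instruction per package, as in the example file
    lines += [f"RUN pip3 install {pkg}" for pkg in packages]
    # Copy the project folder into the container's working directory
    lines.append("COPY . .")
    # Exec-form ENTRYPOINT: a JSON array with straight double quotes
    lines.append("ENTRYPOINT " + json.dumps(["python3", script]))
    return "\n".join(lines) + "\n"


def write_dockerfile(folder: str, script: str, packages: Iterable[str]) -> Path:
    """Write the generated text to a file named exactly 'Dockerfile' inside folder."""
    target = Path(folder) / "Dockerfile"
    target.write_text(make_dockerfile(script, packages))
    return target
```

For the settings above, write_dockerfile(".", "my_project.py", ["pandas", "numpy", "scipy"]) writes a Dockerfile into the current folder with three RUN lines and ENTRYPOINT ["python3", "my_project.py"].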

Once you complete your custom Dockerfile, make sure to add it to the project folder containing your my_project.py script and your data. Your folder should look like this:

Your project folder should contain your Python script, your data, and your Dockerfile
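Before dragging the folder onto Galilei, you can sanity-check it with a short script. This is a hypothetical helper (missing_project_files is our own name) that compares file names exactly, so a file named Dockerfile.txt or Dockerfile copy is reported as a missing Dockerfile, in line with the naming rule described earlier.

```python
from pathlib import Path
from typing import Iterable, List


def missing_project_files(folder: str, script: str, data_files: Iterable[str]) -> List[str]:
    """Return required file names that are not present at the top level of folder."""
    # Collect exact names so near-misses like "Dockerfile.txt" don't count
    present = {p.name for p in Path(folder).iterdir() if p.is_file()}
    required = ["Dockerfile", script, *data_files]
    return [name for name in required if name not in present]
```

An empty list from missing_project_files("my_project", "my_project.py", ["mtcars.csv"]) means the folder has everything it needs.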

Now that your folder looks right, drag and drop the folder onto Galilei on your Dashboard at https://app.galileoapp.io.

I hope this tutorial was helpful. Please let me know if you have any questions or any problems using Galileo. Your feedback is extremely important to us. Contact me anytime at matthew@hypernetlabs.io.