Tutorial: Docker Anaconda Python -- 4
Anaconda is a free and open source distribution of the Python and R programming languages for data science and machine learning applications (large-scale data processing, predictive analytics, scientific computing) that aims to simplify package management and deployment. Package versions are managed by the conda package management system. The Anaconda distribution is used by over 6 million users and includes more than 250 popular data science packages for Windows, Linux, and MacOS.
Anaconda gives access to over 1,000 data science packages. The package list for the Mac OSX Anaconda distribution contains 598 packages in total.
The main advantage of Anaconda is that all the packages it contains are version-compatible with each other and, if installed in a container, will not interfere with any of the packages installed on your host computer (laptop). However, such a container is huge: the image we create in the exercise below takes up 9.7 GB, a significant amount of disk space.
Exercise 5: Creating a Container with Anaconda
towardsDataScience: Not Reinventing the Wheel
We could write a simple Dockerfile based on the latest Ubuntu image, install Anaconda in it, and then refine the installation. Instead, we'll use one of the public Anaconda Docker images that others have already created, and either download it or rebuild it on our system.
We found the container described in the Towards Data Science article at https://towardsdatascience.com/docker-for-data-science-9c0ce73e8263 to be solid, and we will use it here.
Option 1: Pull the Public towardsdatascience Image
The (relatively) fastest option is to pull their public image from Docker Hub. We first log in to Docker Hub from the command line (anonymous pulls of public images also work, but with lower rate limits), then pull the image:
docker login
docker pull evheniy/docker-data-science

Using default tag: latest
latest: Pulling from evheniy/docker-data-science
cc1a78bfd46b: Pull complete
314b82d3c9fe: Pull complete
adebea299011: Pull complete
f7baff790e81: Pull complete
Digest: sha256:e07b9ca98ac1eeb1179dbf0e0bbcebd87701f8654878d6d8ce164d71746964d1
Status: Downloaded newer image for evheniy/docker-data-science:latest
This took almost 10 minutes on a 2016 MacBook Pro over a Wi-Fi connection.
Option 2: Dockerfile
The other option, which probably takes about the same amount of time, is to recreate their Dockerfile in a directory of your choice, customize it if needed (for example, by adding a new user), and build it. Here is our customized version:
# We will use Ubuntu for our image
# (pinned to 18.04 as an assumption: recent "latest" images already contain
# a user named ubuntu, which makes the adduser step below fail)
FROM ubuntu:18.04

# Keep apt from waiting on interactive prompts during the (non-interactive) build
ARG DEBIAN_FRONTEND=noninteractive

# Updating Ubuntu packages
RUN apt-get update && apt-get upgrade -y
RUN apt-get install -y emacs

# Adding wget and bzip2
RUN apt-get install -y wget bzip2

# Add sudo
RUN apt-get -y install sudo

# Add user ubuntu with no password, add to sudo group
RUN adduser --disabled-password --gecos '' ubuntu
RUN adduser ubuntu sudo
RUN echo '%sudo ALL=(ALL) NOPASSWD:ALL' >> /etc/sudoers
USER ubuntu
WORKDIR /home/ubuntu/
RUN chmod a+rwx /home/ubuntu/
#RUN echo `pwd`

# Anaconda installing
RUN wget https://repo.continuum.io/archive/Anaconda3-5.0.1-Linux-x86_64.sh
RUN bash Anaconda3-5.0.1-Linux-x86_64.sh -b
RUN rm Anaconda3-5.0.1-Linux-x86_64.sh

# Set path to conda
#ENV PATH /root/anaconda3/bin:$PATH
ENV PATH=/home/ubuntu/anaconda3/bin:$PATH

# Updating Anaconda packages (-y skips conda's confirmation prompts)
RUN conda update -y conda
RUN conda update -y anaconda
RUN conda update -y --all

# Configuring access to Jupyter
RUN mkdir /home/ubuntu/notebooks
RUN jupyter notebook --generate-config --allow-root
RUN echo "c.NotebookApp.password = u'sha1:6a3f528eec40:6e896b6e4828f525a6e20e5411cd1c8075d68619'" >> /home/ubuntu/.jupyter/jupyter_notebook_config.py

# Jupyter listens on port 8888
EXPOSE 8888

# Run Jupyter notebook as the Docker main process
# (--ip=0.0.0.0 rather than --ip='*': in exec form no shell strips the quotes)
CMD ["jupyter", "notebook", "--allow-root", "--notebook-dir=/home/ubuntu/notebooks", "--ip=0.0.0.0", "--port=8888", "--no-browser"]
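The c.NotebookApp.password line above stores a salted hash rather than the password itself. The Python sketch below shows how a string of that shape can be produced and checked, assuming the layout sha1:<salt>:<sha1(passphrase + salt)> with a 12-hex-character salt, which is what the classic Notebook's notebook.auth.passwd helper produced in older versions. The function names are hypothetical, and SHA-1 is a weak choice for password storage; as far as we know, newer Jupyter releases default to argon2 instead.

```python
import hashlib
import hmac
import secrets
from typing import Optional


def make_sha1_password(passphrase: str, salt: Optional[str] = None) -> str:
    """Return 'sha1:<salt>:<hex digest>', where the digest is SHA-1 over the
    UTF-8 passphrase followed by the ASCII salt (assumed Jupyter layout)."""
    if salt is None:
        salt = secrets.token_hex(6)  # 6 random bytes -> 12 hex characters
    digest = hashlib.sha1(passphrase.encode("utf-8") + salt.encode("ascii")).hexdigest()
    return f"sha1:{salt}:{digest}"


def check_sha1_password(hashed: str, passphrase: str) -> bool:
    """Return True if passphrase matches a 'sha1:<salt>:<digest>' string."""
    algorithm, salt, expected = hashed.split(":", 2)
    if algorithm != "sha1":
        raise ValueError(f"unsupported algorithm: {algorithm}")
    actual = hashlib.sha1(passphrase.encode("utf-8") + salt.encode("ascii")).hexdigest()
    # Constant-time comparison of the two hex digests
    return hmac.compare_digest(actual, expected)


if __name__ == "__main__":
    # Print a line that could be appended to jupyter_notebook_config.py
    print(f"c.NotebookApp.password = u'{make_sha1_password('root')}'")
```

To set a different password, replace the hash in the echo line with the output of the script. If both the salted-SHA-1 assumption and the tutorial's "root" password are right, check_sha1_password("sha1:6a3f528eec40:6e896b6e4828f525a6e20e5411cd1c8075d68619", "root") returns True.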
When we run it, the container will have its internal /home/ubuntu/notebooks directory mounted on our local $PWD/notebooks directory. We build the image from the directory that contains the Dockerfile:
docker build -t toward-data-science .
This will take quite a long time (around 15 minutes) and generate a huge log.
Running a Jupyter Notebook
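First, we start a container from the image built in Option 2 (with the image pulled in Option 1, the last argument would be evheniy/docker-data-science instead, and its internal paths may differ):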
docker run --name toward-data-science -p 8888:8888 --env="DISPLAY" \
    -v "$PWD/notebooks:/home/ubuntu/notebooks" -d toward-data-science
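Because of -d, docker run returns as soon as the container starts, while Jupyter may need a few more seconds before it answers on port 8888. A small standard-library Python sketch that polls a URL until some HTTP server responds; wait_for_http and its default timeout and interval values are our own hypothetical choices:

```python
import time
import urllib.error
import urllib.request


def wait_for_http(url: str, timeout: float = 60.0, interval: float = 1.0) -> bool:
    """Poll url until an HTTP server answers or timeout seconds elapse.

    Any HTTP response, including an error status such as 403 or 404,
    counts as "up"; connection errors are retried every interval seconds.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=interval):
                return True
        except urllib.error.HTTPError:
            return True  # the server replied, just not with a 2xx/3xx status
        except OSError:
            time.sleep(interval)  # refused, reset or timed out: try again
    return False
```

Run on the host, wait_for_http("http://localhost:8888") returns True once the notebook server inside the container is reachable through the published port, and False if it never answers within the timeout.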
Then we open the following URL in our browser: http://localhost:8888
Enter "root" as the password, then create a simple notebook to test various libraries.
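As a first cell for such a test notebook, something along these lines reports which libraries the kernel can import and their versions. The package list is our own example choice (sklearn is the import name of scikit-learn), and check_libraries is a hypothetical helper, not part of Anaconda:

```python
import importlib
import importlib.util
from typing import Dict, Iterable, Optional


def check_libraries(names: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each top-level module name to its __version__ string, to
    "installed" if it has no __version__, or to None if it cannot be found."""
    report: Dict[str, Optional[str]] = {}
    for name in names:
        if importlib.util.find_spec(name) is None:
            report[name] = None
            continue
        module = importlib.import_module(name)
        report[name] = str(getattr(module, "__version__", "installed"))
    return report


if __name__ == "__main__":
    # Example selection of data science packages that ship with Anaconda
    for name, version in check_libraries(
        ["numpy", "pandas", "scipy", "sklearn", "matplotlib"]
    ).items():
        print(f"{name:<12} {version if version is not None else 'MISSING'}")
```

In a Jupyter kernel __name__ is "__main__", so pasting the whole block into a cell prints the table directly.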
This concludes this exercise.
Click here to go to Part 5 of this tutorial.