What's under your hood?

This tutorial is not about math, methods, or models. I'll focus on the environment for learning ML.

Learning is a long road, and many factors influence it. Your mood often matters a great deal, and in self-study programs motivation and attitude are your main friends.

Let's start.

Now comes a bit of philosophy; please don't scroll straight down to the Technical part, because there are some big ideas here.

Folks who have made it through the tutorials in this session are already strong and know a lot.

I think this notebook (maybe an article) will be more useful for the next session's participants. A lot of tutorials from the previous course played an important role for me.

Why do I think this course is the best of the best? Two things. First, the excellent presentation of the material. Second, the motivation mechanism, which is awesome: COMPETITIONS. The competitive process gives rise to the best solutions and motivates the participants. So it was with me.

But what if new features for your regression no longer come to mind?


And the Medium competition has already killed the Google Colab kernel several times. Has your computer become a brick?

At the same time, a guy (or girl) from the other end of the world successfully beats the baselines and climbs into the top 1%. So come on, piece of iron! Let's try to tune some model parameters! Restarting. Oh sh*t... I'll go to sleep. Tomorrow I'll optimize:

  • [ ] code
  • [ ] calculation
  • [ ] hardware
  • [ ] brain

Forget it

  • Try out a new approach (hello, n-grams in tf-idf).
  • Check whether adding Alice's time spent on YouTube gives a gain in cross-validation.

    Both of these take time, which (almost always) depends directly on the hardware you run your model on.

The same goes if you have also discovered the world of gradient boosting...


Options

I have an elderly laptop with an i5 and 4 GB of RAM. Sometimes my brain is faster than it is.

I was annoyed by its long freezes while training models, and by memory errors while trying to beat the baselines in this course.

What can I do?

  1. Option 1:

    Buy new hardware. Let's see what that would take, oh yes:

  • i7-8700K: about 400 \$
  • GTX 1080 Ti: about 1000 \$
  • 64 GB of RAM: about 500 \$
  • a motherboard
  • an SSD, and so on...

     Stop, stop. I'm just learning...
  2. Option 2:

    Code or computation optimization. A good choice and the right groundwork for the future. But deadlines, work, study, family... where to find time for all of this? Plus, I spend a lot of time in transit (metro, train, transfer) and want to spend that time effectively. Besides, a good idea often comes in the most unexpected place, and I want to test it immediately.

  3. From the above, a third option appears: rent powerful hardware in the cloud and connect to it from any device, always having a powerful beast under the hood. ("A rom dom dom", NFS Underground OST.)

Google Compute Engine.

In fact, many companies offer cloud computing resources. What matters to me is that the service is free of charge, flexible, and complete in the opportunities it provides. That's why I chose Google.

For light computations you can work in Colab or Kaggle Kernels, but memory there is limited.


There may be several reasons for a dead kernel. The most common is a lack of computing resources (memory). The most annoying part is that all cells now need to be rerun. If you saved your intermediate objects to text or pickle files, you are the winner. But only practice makes perfect: at the beginning of the learning path, such kernel restarts are, first of all, enraging, and secondly, they waste a lot of time.
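As a concrete illustration (the file name and the object here are arbitrary stand-ins, not from the course code), checkpointing intermediate objects with pickle means a kernel restart costs you a reload instead of a recomputation:

```python
import pickle

# Hypothetical expensive intermediate result (stands in for, say,
# a fitted vectorizer or a feature matrix that took a long time to build).
features = {"n_rows": 10, "columns": ["a", "b"]}

# Checkpoint it to disk...
with open("features.pkl", "wb") as f:
    pickle.dump(features, f)

# ...so after a kernel restart you reload instead of recomputing.
with open("features.pkl", "rb") as f:
    restored = pickle.load(f)

assert restored == features
```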

For now, we are going to use a nice gift from Google: \$300 of free credit.

Technical part

What you need to start:

  1. Register a Google account (I wonder how many of you, dear readers, don't have one yet?). There is nothing complicated; everything is standard: email and phone confirmation. Specify a real phone number, as it will be used for billing confirmation.

  2. Go to https://console.cloud.google.com/freetrial/signup. You'll see that they give us \$300 and 12 months. Note that my balance decreased by only \$10.53 during this course from Professor Yorko and his colleagues. You can safely confirm your card (the system will block \$1 on it, for about an hour on average).

  3. Jump to https://console.cloud.google.com/compute/

  4. #### Create a project.

  5. #### The most interesting part starts now. Open the hardware shop (or rather, rental).

    Feel the power?!

  6. #### Let's create a new instance! Take a powerful one: 8 CPUs and 52 GB of RAM.

    I don't think there's any point in adding a GPU to my instance, because this course doesn't cover neural networks, and if you want one, there is a free GPU in Google Colab. However, some boosting libraries now support GPUs, so keep a close eye on changes and choose your instance parameters based on your needs.

I felt an urgent need for additional resources when I started using XGBoost. Its documentation says GPUs are supported, but when I ran it on the Colab GPU, it kept killing the kernels.

  • #### Change the boot disk to Ubuntu 16.04 (but that's your choice).

  • #### Configure the firewall and uncheck the box on the "Disks" tab.

  • #### Press the Create button, and voilà: we have powerful hardware to ride with.

But wait a minute. You'll need to set a static IP address:

  • #### Go to VPC Network - External IP Addresses and change the type to Static.

  • #### OK. Next step: tune the firewall settings under Firewall rules, and add a new rule.

  • #### Connect to the instance via SSH.

  • #### Let's install Anaconda:

wget http://repo.continuum.io/archive/Anaconda3-5.0.1-Linux-x86_64.sh


bash Anaconda3-5.0.1-Linux-x86_64.sh

Read the license, review the options, and run the installation.

  • Now we will create the config and open a port for TCP connections.

jupyter notebook --generate-config

nano ~/.jupyter/jupyter_notebook_config.py

  • Add these lines, or find and uncomment the existing ones.

c.NotebookApp.ip = '*'

c.NotebookApp.open_browser = False

c.NotebookApp.port = 8000

  • Launch Jupyter:

jupyter notebook --ip=0.0.0.0

  • To see the usual Jupyter main page, launch a browser and go to:

http://"StaticIP":"Port"/?token=YOUR_TOKEN

"StaticIP" is the external IP of the instance.
"Port" is 8000 (the TCP port we allowed in the firewall).
The token is printed in the terminal when Jupyter starts.
(Since we haven't configured an SSL certificate, Jupyter serves plain http here, not https.)

You are in business.

All the necessary libraries can be installed by running the following command:

!pip install lightgbm

Let's see what's under the hood:

!cat /proc/cpuinfo

There are 8 cores like this.

!cat /proc/meminfo

Not bad. Time to ride on.
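If you'd rather do this check from Python than read /proc by hand, here is a small sketch (the memory part is Linux-only, which is fine for our Ubuntu instance):

```python
import os

def hardware_summary():
    """Return (logical CPU count, total RAM in GiB, or None off Linux)."""
    cpus = os.cpu_count()
    try:
        # On Linux the first line of /proc/meminfo is "MemTotal: <n> kB".
        with open("/proc/meminfo") as f:
            mem_gib = int(f.readline().split()[1]) / 1024 ** 2
    except OSError:
        mem_gib = None  # not a Linux box
    return cpus, mem_gib

print(hardware_summary())
```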

Kaggle API

  • Make a directory for the token:

!mkdir -p ~/.kaggle

  • And get the token (kaggle.json) from your Kaggle account page.

  • Copy it to the .kaggle directory:

!cp kaggle.json ~/.kaggle/

Now you can download files from Kaggle and submit to it, and it's very fast.

  • Download the Medium competition dataset. Make a new directory for it:

!mkdir Medium

  • Change the working directory for easy access to the files:

import os
os.chdir('Medium')

  • Start downloading:

!kaggle competitions download -c how-good-is-your-medium-article

Now about the speed.

The same dataset, the same code. Look at the times:

Google Cloud


My Laptop

Fine.
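To reproduce this comparison on your own machines, you can wrap the training call in a timer. A minimal sketch using only the standard library; `train` here is a toy stand-in for your actual fit call:

```python
import time
from functools import wraps

def timed(fn):
    """Print the wall-clock time of each call, handy for comparing machines."""
    @wraps(fn)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = fn(*args, **kwargs)
        print(f"{fn.__name__} took {time.perf_counter() - start:.2f} s")
        return result
    return wrapper

@timed
def train():
    # Stand-in for your model's fit() call.
    return sum(i * i for i in range(100_000))

train()
```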

Git

  • By the way, you can easily clone your repositories from GitHub:

!git clone https://github.com/mikhailsergeevi4/crm.git

  • Commit your changes.

  • And push them back:

!git push origin master

There are many other useful tricks for Google Cloud and Colab, but I think that's enough for this time.

As a result, you get a powerful machine for studying ML.

Learn, work, or present your notebooks from any device.

The \$300 gift is enough to complete the course. I used instances mostly for resource-heavy operations; for simple things there is Google Colab (with a free video card, by the way).

IMPORTANT: don't forget to stop the instance when you finish. This machine will quickly devour the allocated budget!