This tutorial is not about mathematical methods and models. I'll focus on the environment for learning ML.
Learning is hard, and many factors influence the process. Your mood often matters a lot.
And in self-study programs, motivation and attitude are your main friends.
Now there will be a bit of philosophy, and I ask you not to scroll straight to the technical part. There will be some big ideas.
Folks who came across the tutorials in this session are definitely strong already and know a lot.
I think that this notebook (maybe an article) will be more useful for the participants of the next session. A lot of tutorials from the previous course played an important role for me.
Why do I think that this course is the best of the best? Two things. The first is the excellent presentation of the material. The second is the motivation mechanism. It is awesome. COMPETITIONS. The competitive process gives rise to the best solutions and motivates the participants. So it was with me.
But what if no new feature for the regression comes to mind?
And the Medium competition has already killed the kernel on Google Colab several times. Your computer turns into a brick?
At the same time, a guy (or girl) on the other side of the world successfully beats the baselines and gets into the top 1%. So come on, you piece of iron! Let's try to tune some parameters of the model!
Restarting. Oh sh*t... I will go to sleep. Tomorrow I will optimize:
Forget it
See whether adding Alice's time spent on YouTube gives a gain in cross-validation.
Both of them take time, which depends directly (almost always) on the hardware you run your model on.
If you have also discovered the world of gradient boosting
I have an elderly laptop with an i5 and 4 GB of RAM. Sometimes my brain is faster than it is.
I was annoyed by its long freezes while training a model and by memory errors while trying to beat the baselines in this course.
Option 1:
Buy new hardware. What do we have there, oh yes:
an SSD and so on...
Stop-stop, I'm just learning ...
Option 2:
Code or computation optimization. A good choice, the right groundwork for the future.
But deadlines, work, study, family... where do you find time for all of this?
Plus, I spend a lot of time in transit (metro, train, transfers) and I want to use that time effectively.
In addition, good ideas often come in the most unexpected places, and I want to test them immediately.
In fact, many companies offer cloud computing resources. What matters to me is that the service is free of charge, flexible, and complete in the opportunities it provides. Therefore, I chose Google.
For light computations you can work with Colab or Kaggle Kernels. But there is a memory limit.
A kernel can die for several reasons. The most common is a lack of computing resources (memory). And the most annoying thing is that all cells now need to be rerun. If you save the resulting objects to txt or pickle, you are the winner. But only practice makes perfect. At the beginning of the learning path, such kernel restarts are, first, enraging. Second, they take a lot of time.
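Saving intermediate results could look roughly like the sketch below. The helper name `cached`, the file name `features.pkl`, and the `build_features` function are all hypothetical, made up for illustration. The idea is simply to compute an expensive object once, pickle it, and load it back after a kernel restart.

```python
import pickle
import tempfile
from pathlib import Path


def cached(path, compute):
    """Return the object stored at `path`, computing and saving it first if needed."""
    path = Path(path)
    if path.exists():
        # A previous run already produced the result: just load it.
        with path.open("rb") as f:
            return pickle.load(f)
    result = compute()
    # Write to a temporary file first so that a crash in the middle of
    # writing cannot leave a truncated checkpoint behind.
    tmp = path.with_suffix(path.suffix + ".tmp")
    with tmp.open("wb") as f:
        pickle.dump(result, f, protocol=pickle.HIGHEST_PROTOCOL)
    tmp.replace(path)
    return result


if __name__ == "__main__":
    def build_features():
        # Hypothetical stand-in for an expensive feature-engineering step.
        return {"rows": 3, "columns": ["a", "b"]}

    with tempfile.TemporaryDirectory() as tmp_dir:
        checkpoint = Path(tmp_dir) / "features.pkl"
        first = cached(checkpoint, build_features)   # computed and saved
        second = cached(checkpoint, build_features)  # loaded from disk
        print(first == second)
```

In a real notebook you would point `cached` at a file next to the notebook instead of a temporary directory, so the checkpoint survives the restart. Only unpickle files you created yourself, since pickle can execute code on load.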
For now, we are going to use a nice gift from Google in the amount of 300\$.
What you need to start:
Register an account (I wonder how many of you, dear readers, don't have one yet?)
There is nothing complicated, everything is standard: mail and phone confirmation. Specify a real phone number, because it will be used for billing confirmation.
Go to https://console.cloud.google.com/freetrial/signup
You'll see that they give us 300\$ and 12 months. I note that my balance decreased by only 10.53\$ during this course from Professor Yorko and his colleagues.
You can safely confirm the card (the system will hold 1\$ on it for about an hour on average).
#### Create a project.
#### The most interesting part starts now.
Open the hardware shop (or rental).
Fill the power?!
#### Let's create a new instance! Take a more powerful one: 8 CPUs and 52 GB of RAM.
I think there is no point in adding a GPU to my instance, because neural networks are not covered in this course. And if you want one, there is a free GPU in Google Colab. But some of the boosting libraries now support GPUs. Therefore, keep a close eye on the changes and choose the parameters of the instance based on your needs.
I felt an urgent need for additional resources when I started using XGBoost. Its documentation says that there is GPU support, but when I ran it on the Colab GPUs, it began to kill the kernels.
#### Change the boot disk to Ubuntu 16.04. But the choice is yours.
#### Configure the firewall and clear the checkbox on the "Disks" tab.
#### Press the Create button and voila, we have strong hardware to ride with.
But wait a minute. You'll need to set a static IP address:
#### Go to VPC Network - External IP Addresses. Change the type to Static.
#### OK. Next step: tune the firewall settings under Firewall rules. Add a new one.
#### Connect to the instance by SSH.
#### Let's install Anaconda
wget http://repo.continuum.io/archive/Anaconda3-5.0.1-Linux-x86_64.sh
bash Anaconda3-5.0.1-Linux-x86_64.sh
Read the rules, review and run the installation.
jupyter notebook --generate-config
nano ~/.jupyter/jupyter_notebook_config.py
c.NotebookApp.ip = '*'
c.NotebookApp.open_browser = False
c.NotebookApp.port = 8000
jupyter notebook --ip=0.0.0.0
"StaticIP" - the external IP of the instance
"Port" = 8000 (the TCP port that we allowed in the firewall)
"Token" - you'll see it in the terminal
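To make the three pieces above concrete, here is a small sketch that assembles the address to open in a browser and checks whether the port is reachable from your side. The function names are hypothetical, the IP `203.0.113.10` is a documentation-only placeholder, and the token is made up. Substitute the values from your own instance and terminal.

```python
import socket
from urllib.parse import urlencode


def notebook_url(static_ip, port=8000, token=None):
    """Build the address to open in a browser on any device."""
    url = f"http://{static_ip}:{port}/"
    if token:
        url += "?" + urlencode({"token": token})
    return url


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, timed out, or unreachable: treat all of these as "closed".
        return False


if __name__ == "__main__":
    # Placeholder values, not taken from a real instance.
    print(notebook_url("203.0.113.10", 8000, "abc123"))
```

If `port_is_open` returns `False` for your static IP while Jupyter is running, the firewall rule or the port in the config file is the first thing I would check.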
All the necessary libraries can be installed by running the following command.
!pip install lightgbm
Let's see what's under the hood:
!cat /proc/cpuinfo
There are 8 of these.
!cat /proc/meminfo
Not bad, it's time to ride on.
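The same look under the hood can be done from Python, which is handy if you want the numbers in variables rather than in a long text dump. This is only a sketch: `parse_meminfo` is a hypothetical helper, and it assumes the usual Linux `/proc/meminfo` layout of `Name: value kB` lines.

```python
import os


def parse_meminfo(text):
    """Parse the contents of /proc/meminfo into a dict of integer values (kB for sized entries)."""
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        fields = rest.split()
        if sep and fields:
            # The first field is the number; an optional "kB" unit follows it.
            info[key.strip()] = int(fields[0])
    return info


if __name__ == "__main__":
    print("Logical CPUs:", os.cpu_count())
    # /proc/meminfo exists on Linux only, which is what the instance runs.
    if os.path.exists("/proc/meminfo"):
        with open("/proc/meminfo") as f:
            mem = parse_meminfo(f.read())
        print(f"Total RAM: {mem['MemTotal'] / 1024**2:.1f} GiB")
```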
!mkdir -p ~/.kaggle
- Take the json from here: https://www.kaggle.com/"YourAccountName"/account
And put it here:
Copy it to the ".kaggle" directory:
!cp kaggle.json ~/.kaggle/
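The Kaggle command-line tool warns when the token file is readable by other users and suggests `chmod 600`. A minimal Python sketch of the copy step with that permission fix is shown here. The function name `install_kaggle_token` is hypothetical, and it assumes `kaggle.json` sits in the current directory.

```python
import os
import shutil
import stat
from pathlib import Path


def install_kaggle_token(src="kaggle.json", kaggle_dir=None):
    """Copy the API token into the Kaggle config directory and make it private."""
    kaggle_dir = Path(kaggle_dir) if kaggle_dir else Path.home() / ".kaggle"
    kaggle_dir.mkdir(parents=True, exist_ok=True)
    dest = kaggle_dir / "kaggle.json"
    shutil.copyfile(src, dest)
    # Owner read/write only, the equivalent of `chmod 600`.
    os.chmod(dest, stat.S_IRUSR | stat.S_IWUSR)
    return dest
```

Calling `install_kaggle_token()` with no arguments copies `./kaggle.json` to `~/.kaggle/kaggle.json`.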
Now you can download files from Kaggle and send submissions to it. And it is very fast.
!mkdir Medium
import os
os.chdir('Medium')
!kaggle competitions download -c how-good-is-your-medium-article
The same dataset, the same code. Look at the time:
Fine.
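If you want to reproduce such a comparison yourself, the `%%time` cell magic in Jupyter is the quickest way. Outside a notebook, a small context manager does the same job. The sketch below is an illustration with a toy workload, not the code used for the timings in this tutorial.

```python
import time
from contextlib import contextmanager


@contextmanager
def timer(label):
    """Print how long the wrapped block took, in seconds."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed = time.perf_counter() - start
        print(f"{label}: {elapsed:.2f} s")


if __name__ == "__main__":
    with timer("toy workload"):
        # Stand-in for model training; replace with the real fit call.
        sum(i * i for i in range(1_000_000))
```

Run the same cell on the laptop and on the instance and compare the printed numbers.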
!git clone https://github.com/mikhailsergeevi4/crm.git
Commit
And push them back.
!git push origin master
There are many other useful tricks for Google Cloud and Colab. But I think that's enough for this time.
As a result, you get a powerful machine for studying ML.
Learn, or work, or make presentations of your notebook from any device.
The 300\$ gift is enough to complete the course. I've used instances mostly for resource-heavy operations. For simple things, there is Google Colab (with a free GPU, by the way).
IMPORTANT: don't forget to stop the instance when you have finished. This machine will quickly devour the allocated budget!
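Besides the Stop button in the console, the instance can be stopped from a script with the `gcloud` command-line tool. The sketch below only wraps `gcloud compute instances stop`. The instance name `ml-course` and zone `us-central1-a` are placeholders, not values from this tutorial, and it assumes `gcloud` is installed and authenticated on the machine where you run it.

```python
import subprocess


def stop_instance_command(instance, zone):
    """Build the gcloud command that stops a Compute Engine instance."""
    return ["gcloud", "compute", "instances", "stop", instance, "--zone", zone]


def stop_instance(instance, zone):
    """Run the stop command; requires the gcloud CLI to be installed and authenticated."""
    return subprocess.run(stop_instance_command(instance, zone), check=True)


if __name__ == "__main__":
    # Placeholder values: use your own instance name and zone.
    print(" ".join(stop_instance_command("ml-course", "us-central1-a")))
```

Calling `stop_instance(...)` at the end of a long training script is one way to avoid paying for an idle machine overnight.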