mlcourse.ai – Open Machine Learning Course

Author: Yury Kashnitsky (@yorko). Edited by Roman Volykhin (@GerrBert). This material is subject to the terms and conditions of the Creative Commons CC BY-NC-SA 4.0 license. Free use is permitted for any non-commercial purpose.

Fall 2019. Quiz 1. Decision trees and Random Forests

Prior to working on the assignment, you'd better check out the corresponding course material:

  1. Classification, Decision Trees and k Nearest Neighbors, also available as an interactive web-based Kaggle Kernel
  2. Ensembles: Bagging and Random Forest
  3. There are 5 video lectures on trees, forests and their applications: mlcourse.ai/lectures

We suggest that you first read the articles (the quiz questions are based on them); if something is not clear, watch the corresponding lecture.

Your task is to:

  1. study the materials
  2. write code where needed
  3. choose answers in the webform.

Deadline for Quiz: September 27, 2019, 20:59 CET (London time)

Solutions will be discussed during a live YouTube session on September 28. You can get up to 10 credits (the points you score in the web form, 15 max, will be scaled to a maximum of 10 credits).

Part 1. Decision trees

For discussions, please stick to ODS Slack, channel #mlcourse_ai_news, pinned thread #quiz1_fall2019

Question 1. Which of these problems does not fall into the 3 main types of ML tasks: classification, regression, and clustering?

  1. Identifying a topic of a live-chat with a customer
  2. Grouping news into topics
  3. Predicting LTV (Life-Time Value) - the amount of money spent by a customer in a certain large period of time
  4. Listing top products that a user is likely to buy (based on his/her click history)

Question 2. Maximal possible entropy is achieved when all states are equally probable (prove it yourself for a system with 2 states with probabilities $p$ and $1-p$; a small numeric check is sketched after the options). What's the maximal possible entropy of a system with $N$ states? (here all logarithms are base 2)

  1. $N \log N$
  2. $-\log N$
  3. $\log N$
  4. $-N \log N$
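
If you want to check the 2-state claim numerically before proving it, here is a minimal sketch; the `entropy` helper is our own, assuming Shannon entropy in bits as used in the course articles.

```python
import numpy as np

def entropy(probs):
    """Shannon entropy in bits; zero-probability states contribute nothing."""
    p = np.asarray(probs, dtype=float)
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

# Entropy of a 2-state system as a function of p -- the curve peaks at p = 0.5
for p in [0.1, 0.3, 0.5, 0.7, 0.9]:
    print(f"p = {p:.1f}   H = {entropy([p, 1 - p]):.3f}")
```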

Question 3. In the Topic 3 article, in the toy example with 20 balls, what's the information gain of splitting the 20 balls into 2 groups based on the condition $X \leq 8$? (a helper sketch for computing information gain follows the options)

  1. ~ 0.1
  2. ~ 0.01
  3. ~ 0.001
  4. ~ 0.0001
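
A minimal sketch for computing information gain from class labels; the helpers `entropy` and `information_gain` are our own, and the ball counts below are placeholders, not the actual counts from the article.

```python
import numpy as np

def entropy(labels):
    """Shannon entropy (in bits) of an array of class labels."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def information_gain(parent, left, right):
    """Entropy of the parent group minus the weighted entropy of the two child groups."""
    n = len(parent)
    return entropy(parent) - len(left) / n * entropy(left) - len(right) / n * entropy(right)

# Hypothetical placeholder counts -- replace them with the ones you read off the article's figure
parent = [0] * 10 + [1] * 10   # all 20 balls
left = [0] * 6 + [1] * 2       # the group with X <= 8
right = [0] * 4 + [1] * 8      # the remaining balls
print(information_gain(parent, left, right))
```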

Question 4. In a toy binary classification task, there are $d$ features $x_1 \ldots x_d$, but the target $y$ depends only on $x_1$ and $x_2$: $y = [\frac{x_1^2}{4} + \frac{x_2^2}{9} \leq 16]$, where $[\cdot]$ is an indicator function. All of the features $x_3 \ldots x_d$ are pure noise, i.e. they do not influence the target at all. Obviously, machine learning algorithms should perform almost perfectly in this task, where the target is a simple function of the input features. If we train sklearn's DecisionTreeClassifier for this task, which parameters have a crucial effect on accuracy (crucial meaning that if these parameters are set incorrectly, accuracy can drop significantly)? Select all that apply (to get credit, you need to select exactly the correct set of options; there is no partial credit). A data-generation sketch follows the options.

  1. max_features
  2. criterion
  3. min_samples_leaf
  4. max_depth
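
A minimal sketch for generating this toy dataset and probing one parameter at a time; the sample size, number of features and the feature range are our assumptions, not given in the question.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Generate the toy data described above (assumed sample size and feature range)
rng = np.random.RandomState(17)
n, d = 10000, 20
X = rng.uniform(-20, 20, size=(n, d))
y = (X[:, 0] ** 2 / 4 + X[:, 1] ** 2 / 9 <= 16).astype(int)

X_train, X_valid, y_train, y_valid = train_test_split(X, y, test_size=0.3, random_state=17)

# Vary one parameter at a time (here max_depth) and watch validation accuracy
for max_depth in [2, 4, 8, None]:
    tree = DecisionTreeClassifier(max_depth=max_depth, random_state=17).fit(X_train, y_train)
    print(max_depth, tree.score(X_valid, y_valid))
```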

Question 5. Load the iris data with sklearn.datasets.load_iris. Train a decision tree on this data, specifying the parameters max_depth=4 and random_state=17 (leave all other arguments at their defaults). Use all 150 available instances to train the tree (do not perform a train/validation split). Visualize the fitted decision tree, see Topic 3 for examples. Let's call a leaf in a tree pure if it contains instances of only one class. How many pure leaves are there in this tree? (a starter sketch follows the options)

  1. 6
  2. 7
  3. 8
  4. 9
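
A starter sketch for this question. The Topic 3 article visualizes trees via graphviz export; here we assume a scikit-learn version that ships sklearn.tree.plot_tree (0.21+) as an alternative.

```python
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, plot_tree

# Fit a tree on all 150 iris instances with the parameters from the question
iris = load_iris()
tree = DecisionTreeClassifier(max_depth=4, random_state=17)
tree.fit(iris.data, iris.target)

# Visualize the tree; a leaf is pure if its value vector has only one non-zero class count
plt.figure(figsize=(14, 8))
plot_tree(tree, feature_names=iris.feature_names, class_names=list(iris.target_names), filled=True)
plt.show()
```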

Part 2. Ensembles and Random Forest

For discussions, please stick to ODS Slack, channel #mlcourse_ai_news, pinned thread #quiz1_fall2019

Question 6. There are 7 jurors in the courtroom. Each of them individually can correctly determine whether the defendant is guilty or not with 80% probability. How likely is it that the jury as a whole reaches a correct verdict if the decision is made by majority voting? (a computation sketch follows the options)

  1. 20.97%
  2. 80.00%
  3. 83.70%
  4. 96.66%
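
A minimal sketch of the binomial computation; using scipy here is our choice (the same sum can be written by hand with binomial coefficients), and the helper name is our own.

```python
from scipy.stats import binom

def majority_correct_probability(n_voters, p):
    """Probability that a strict majority of n independent voters is correct."""
    k_min = n_voters // 2 + 1
    return sum(binom.pmf(k, n_voters, p) for k in range(k_min, n_voters + 1))

print(majority_correct_probability(7, 0.8))
```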

Question 7. In Topic 5, part 2, section 2. "Comparison with Decision Trees and Bagging" we show how bagging and Random Forest improve classification accuracy as compared to a single decision tree. Which of the following best explains the visual difference between the decision boundaries built by a single decision tree and those built by ensemble models?

  1. Ensembles ignore some of the features. Thus, picking only the important ones, they build a smoother decision boundary
  2. Some of the classification rules built by a decision tree can be applied only to a small number of training instances
  3. When fitting a decision tree, if two potential splits are equally good in terms of the information criterion, then a random split is chosen. This leads to some randomness in building a decision tree, which is why its decision boundary is so jagged

Question 8. Random Forest learns a coefficient for each input feature, which shows how much this feature influences the target feature. True/False?

  1. True
  2. False

Question 9. Suppose we fit RandomForestRegressor to predict the age of a customer (an actual industry task, useful for ad targeting), and the maximal age seen in the dataset is 98 years. Is it possible that, for some future customer, the model predicts his/her age to be 105 years? (a small experiment sketch follows the options)

  1. Yes
  2. No
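
A small experiment you can run to build intuition; the synthetic data below is our own toy setup, not the customer dataset from the question.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Train on targets that never exceed 98, then ask for a prediction far outside the training range
rng = np.random.RandomState(17)
X_train = rng.uniform(0, 100, size=(1000, 1))
y_train = np.clip(X_train.ravel() + rng.normal(0, 3, size=1000), 18, 98)

forest = RandomForestRegressor(n_estimators=100, random_state=17)
forest.fit(X_train, y_train)
print(forest.predict([[150.0]]))  # a feature value far beyond anything seen during training
```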

Question 10. Select all statements that describe advantages of Random Forest over a single decision tree (some statements might be true but not about Random Forest's advantages; don't select those).

  1. Random Forest is easier to train in terms of computational resources
  2. Random Forest typically requires more RAM than a single decision tree
  3. Random Forest typically achieves better metrics in classification/regression tasks
  4. A single decision tree's predictions can be interpreted much more easily