#!/usr/bin/env python
# coding: utf-8

# ## [mlcourse.ai](https://mlcourse.ai) - Open Machine Learning Course
#
# Author: Vitaly Radchenko. All content is distributed under the [Creative Commons CC BY-NC-SA 4.0](https://creativecommons.org/licenses/by-nc-sa/4.0/) license.

# # Assignment # 5 (demo)
# ## Logistic Regression and Random Forest in the credit scoring problem
#
# **Same assignment as a [Kaggle Kernel](https://www.kaggle.com/kashnitsky/a5-demo-logit-and-rf-for-credit-scoring-sol) + [solution](https://www.kaggle.com/kashnitsky/a5-demo-logit-and-rf-for-credit-scoring-sol).**
#
# In this assignment, you will build models and answer questions using data on credit scoring.
#
# Please write your code in the cells with the "Your code here" placeholder. Then, answer the questions in the [form](https://docs.google.com/forms/d/1gKt0DA4So8ohKAHZNCk58ezvg7K_tik26d9QND7WC6M/edit).
#
# Let's start with a warm-up exercise.

# **Question 1.** There are 5 jurors in a courtroom. Each of them correctly determines whether the defendant is guilty with 70% probability, independently of the others. What is the probability that the jurors will jointly reach the correct verdict if the final decision is made by majority vote?
#
# 1. 70.00%
# 2. 83.20%
# 3. 83.70%
# 4. 87.50%

# Great! Let's move on to machine learning.
#
# ## Credit scoring problem setup
#
# #### Problem
#
# Predict whether the customer will repay their credit within 90 days. This is a binary classification problem; we will assign customers into good or bad categories based on our prediction.
#
# #### Data description
#
# | Feature | Variable Type | Value Type | Description |
# |:--------|:--------------|:-----------|:------------|
# | age | Input Feature | integer | Customer age |
# | DebtRatio | Input Feature | real | Total monthly loan payments (loan, alimony, etc.) / Total monthly income percentage |
# | NumberOfTime30-59DaysPastDueNotWorse | Input Feature | integer | Number of times the customer was 30-59 days past due (not worse) on other loans during the last 2 years |
# | NumberOfTimes90DaysLate | Input Feature | integer | Number of times the customer was 90+ days past due (dpd) on other loans |
# | NumberOfTime60-89DaysPastDueNotWorse | Input Feature | integer | Number of times the customer was 60-89 days past due (not worse) during the last 2 years |
# | NumberOfDependents | Input Feature | integer | Number of the customer's dependents |
# | SeriousDlqin2yrs | Target Variable | binary: