#!/usr/bin/env python
# coding: utf-8

# [Table of Contents](00.00-Learning-ML.ipynb#Table-of-Contents) • [← *Chapter 1 - Introduction*](01.00-Introduction.ipynb) • [*Chapter 2.01 - Dummy Classifiers* →](02.01-Dummy-Classifiers.ipynb)
#
# ---
#
# # Chapter 2 - Classification
#
# ## Introduction to Classification
#
# As a supervised method, classification is the process of predicting a discrete class for an observation. Classes can be binary (having only two values) or multi-class (having more than two values).
#
# Typically, binary classes are represented as 1 and 0, where the digits correspond to True and False, Yes and No, On and Off, Cat and Dog, and so on.
#
# The model constructed for this process is called a *classifier*. As we will explore, a classifier can be constructed in many different ways - with some methods more suitable in certain scenarios than others. We can compare classifiers using a range of metrics, each measuring a different aspect of the model's performance - such as how accurate it is.
#
# *Scoring* is the process of applying a classifier to a set of observations to output predictions. When used for scoring, a classifier typically outputs the probability of the positive class occurring (i.e. the probability of Yes, True or On), which gives us additional information about how we should use that score. A brief code sketch of scoring appears at the end of this notebook.
#
# ### Example
#
# Banks want to make the best credit decisions possible when approving loans for borrowers - to reduce the risk of losses from a borrower failing to repay their loan. To provide additional information to the credit decisioning process, a classifier is built to predict the likelihood of default for the borrower and loan being assessed.
#
# For this example, we aren't concerned so much with the process of building the model (that will come soon!).
#
# Once a suitably performing model has been constructed, the data of new loan applications can be submitted for scoring, and the classifier will output a real-valued probability between 0 and 1, such as:
#
# * 0.99 - almost certain
# * 0.75 - quite likely
# * 0.50 - somewhat likely
# * 0.25 - unlikely
# * 0.01 - very unlikely
#
# With this additional information, the bank can better understand its credit risk when making lending decisions - potentially declining higher-risk loans and offering lower interest rates on lower-risk loans.
#
# ---
#
# [Table of Contents](00.00-Learning-ML.ipynb#Table-of-Contents) • [← *Chapter 1 - Introduction*](01.00-Introduction.ipynb) • [*Chapter 2.01 - Dummy Classifiers* →](02.01-Dummy-Classifiers.ipynb)
#
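
# To make scoring concrete, the cell below is a minimal sketch, assuming scikit-learn is available. The synthetic dataset, the `LogisticRegression` model and the variable names are illustrative choices, not part of the credit example above. It fits a classifier on a toy binary dataset, then outputs both the predicted class and the probability of the positive class for a few observations.

# In[ ]:


from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Generate a synthetic binary classification dataset (classes 0 and 1)
X, y = make_classification(n_samples=1000, n_features=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Fit a classifier on the training data
clf = LogisticRegression()
clf.fit(X_train, y_train)

# Score a few new observations:
# predict() returns the discrete class, while predict_proba() returns the
# probability of each class - column 1 holds the probability of the positive class
print(clf.predict(X_test[:5]))
print(clf.predict_proba(X_test[:5])[:, 1])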