

Chapter 2 - Classification

Introduction to Classification

As a supervised method, classification is the process of predicting the discrete class of an observation. Classes can be binary (two possible values) or multi-class (more than two possible values).

Typically, binary classes are represented as 1 and 0, where the digits correspond to True and False, Yes and No, On and Off, Cat and Dog, and so on.
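As a minimal sketch of this encoding, the snippet below maps text labels to 1 and 0. The label values, the choice of "Yes" as the positive class, and the helper name `encode_binary` are assumptions made for illustration only.

```python
# Hypothetical raw labels for a binary problem; the values are invented for illustration.
raw_labels = ["Yes", "No", "No", "Yes", "Yes"]

# Assumption: "Yes" is treated as the positive class and encoded as 1.
POSITIVE_LABEL = "Yes"


def encode_binary(labels, positive_label):
    """Map each label to 1 if it equals the positive label, otherwise 0."""
    return [1 if label == positive_label else 0 for label in labels]


encoded = encode_binary(raw_labels, POSITIVE_LABEL)
print(encoded)  # [1, 0, 0, 1, 1]
```

Which value is treated as the positive class is a modelling choice; it determines what a predicted probability refers to later on.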

The model constructed for this process is called a classifier. As we will explore, a classifier can be constructed in many different ways, and some methods are more suitable than others in certain scenarios. We can compare classifiers using several metrics, each measuring a different aspect of model performance, such as how accurate the model is.
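To make "how accurate it is" concrete, here is a small sketch of accuracy: the fraction of observations whose predicted class matches the actual class. The function name `accuracy` and the example classes are invented for illustration and do not come from a real model.

```python
def accuracy(actual, predicted):
    """Fraction of observations whose predicted class matches the actual class."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if not actual:
        raise ValueError("at least one observation is required")
    correct = sum(1 for a, p in zip(actual, predicted) if a == p)
    return correct / len(actual)


# Invented classes for six observations (1 = positive, 0 = negative).
actual = [1, 0, 1, 1, 0, 0]
predicted = [1, 0, 0, 1, 0, 1]

# Four of the six predictions match, so accuracy is 4/6.
print(accuracy(actual, predicted))
```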

Scoring is the process of using a classifier on a set of observations to output a prediction. When used for scoring, a classifier typically outputs the probability of the positive class occurring (i.e. the probability of Yes, True or On), which gives us more information than a bare class label about how we should use that score.
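One common way to use a probability score is to compare it against a threshold to obtain a class. The sketch below assumes the probabilities have already been produced by some classifier; the scores, the function name `to_class`, and the default threshold of 0.5 are illustrative assumptions.

```python
def to_class(probabilities, threshold=0.5):
    """Label an observation 1 when its positive-class probability reaches the threshold."""
    return [1 if p >= threshold else 0 for p in probabilities]


# Invented positive-class probabilities for five observations.
scores = [0.99, 0.75, 0.50, 0.25, 0.01]

print(to_class(scores))       # [1, 1, 1, 0, 0]
print(to_class(scores, 0.8))  # [1, 0, 0, 0, 0]
```

Raising the threshold makes the positive label harder to earn. That kind of adjustment is only possible because the classifier reports a probability rather than a class alone.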

Example

Banks want to make the best credit decisions possible when approving loans, to reduce the risk of losses from borrowers failing to repay. To inform the credit decision process, a classifier is built to predict the likelihood that the borrower will default on the loan being assessed.

For this example, we aren't concerned so much about the process of building the model (that will come soon!).

Once a suitably performing model has been constructed, the data of new loan applications can be submitted for scoring, and the classifier will output a real-valued number between 0 and 1 representing the probability of default, such as:

  • 0.99 - almost certain
  • 0.75 - quite likely
  • 0.50 - as likely as not
  • 0.25 - unlikely
  • 0.01 - very unlikely

With this additional information, the bank can better understand its credit risk when making lending decisions, potentially declining higher-risk loans and offering lower interest rates on lower-risk loans.
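A rough sketch of how such a policy might look in code follows. Every number here (the decline cut-off, the base rate, and the risk loading) and the function name `lending_decision` are hypothetical values chosen for illustration, not real lending rules.

```python
def lending_decision(default_probability, decline_above=0.25,
                     base_rate=0.08, risk_loading=0.20):
    """Return (decision, interest_rate) for a scored loan application.

    All parameter defaults are hypothetical: applications scoring above
    `decline_above` are declined, and approved loans pay the base rate
    plus a loading proportional to their probability of default.
    """
    if not 0.0 <= default_probability <= 1.0:
        raise ValueError("default_probability must be between 0 and 1")
    if default_probability > decline_above:
        return ("decline", None)
    rate = base_rate + risk_loading * default_probability
    return ("approve", round(rate, 4))


# Scores from the example list above.
for score in [0.99, 0.75, 0.50, 0.25, 0.01]:
    print(score, lending_decision(score))
```

Under these assumed settings, the three highest scores are declined, while the 0.25 and 0.01 applications are approved, with the lower-risk loan receiving the lower rate.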

