The CorrPlot builder takes a dataframe (Kotlin Map<*, *>) as the input and builds a correlation plot.
If the input has NxN shape and contains only numbers in the range [-1..1], it is plotted as is. Otherwise CorrPlot computes the correlation coefficients using Pearson's method.
CorrPlot lets you combine 'tile', 'point' and 'label' layers in a matrix of "full", "lower" or "upper" type.
A call to the terminal build() method creates the resulting 'plot' object.
This 'plot' object can be further refined using the regular Lets-Plot (ggplot) API, such as + ggsize().
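For readers who want to see what "Pearson's method" amounts to, here is a minimal sketch in plain Kotlin. It is not the Lets-Plot implementation: the function names pearson and correlationMatrix are hypothetical, the input is assumed to be a column-oriented map of equally long numeric lists, and missing values are not handled.

```kotlin
import kotlin.math.sqrt

// Hypothetical helper (not part of Lets-Plot): Pearson correlation coefficient
// of two equally sized numeric series. Returns NaN when either series has zero variance.
fun pearson(x: List<Double>, y: List<Double>): Double {
    require(x.size == y.size && x.isNotEmpty()) { "series must be non-empty and of equal length" }
    val meanX = x.average()
    val meanY = y.average()
    var cov = 0.0
    var varX = 0.0
    var varY = 0.0
    for (i in x.indices) {
        val dx = x[i] - meanX
        val dy = y[i] - meanY
        cov += dx * dy
        varX += dx * dx
        varY += dy * dy
    }
    return cov / sqrt(varX * varY)
}

// Hypothetical helper: pairwise coefficients for every pair of columns,
// keyed by (column, column). The result covers the full NxN matrix.
fun correlationMatrix(data: Map<String, List<Double>>): Map<Pair<String, String>, Double> {
    val names = data.keys.toList()
    val result = LinkedHashMap<Pair<String, String>, Double>()
    for (a in names) for (b in names) {
        result[a to b] = pearson(data.getValue(a), data.getValue(b))
    }
    return result
}
```

Every coefficient produced this way lies in [-1..1], which is why a matrix already in that range can be plotted without recomputation.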
The Ames Housing dataset for this demo was downloaded from House Prices - Advanced Regression Techniques (train.csv), (c) Kaggle.
%useLatestDescriptors
%use lets-plot
LetsPlot.getInfo() // This prevents Krangl from loading an obsolete version of Lets-Plot classes.
Lets-Plot Kotlin API v.4.1.1. Frontend: Notebook with dynamically loaded JS. Lets-Plot JS v.2.5.1.
%use krangl
// Cars MPG dataset
var mpg_df = DataFrame.readCSV("https://raw.githubusercontent.com/JetBrains/lets-plot-kotlin/master/docs/examples/data/mpg.csv")
mpg_df.head(3)
| | manufacturer | model | displ | year | cyl | trans | drv | cty | hwy | fl | class |
---|---|---|---|---|---|---|---|---|---|---|---|
1 | audi | a4 | 1.8 | 1999 | 4 | auto(l5) | f | 18 | 29 | p | compact |
2 | audi | a4 | 1.8 | 1999 | 4 | manual(m5) | f | 21 | 29 | p | compact |
3 | audi | a4 | 2.0 | 2008 | 4 | manual(m6) | f | 20 | 31 | p | compact |
Shape: 3 x 12.
mpg_df = mpg_df.remove("")
mpg_df.head(3)
manufacturer | model | displ | year | cyl | trans | drv | cty | hwy | fl | class |
---|---|---|---|---|---|---|---|---|---|---|
audi | a4 | 1.8 | 1999 | 4 | auto(l5) | f | 18 | 29 | p | compact |
audi | a4 | 1.8 | 1999 | 4 | manual(m5) | f | 21 | 29 | p | compact |
audi | a4 | 2.0 | 2008 | 4 | manual(m6) | f | 20 | 31 | p | compact |
Shape: 3 x 11.
val mpg_dat = mpg_df.toMap()
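The map passed to CorrPlot is column-oriented: each key is a column name and each value holds that column's data. The sketch below shows the layout by converting a list of row records into such a map. The helper rowsToColumns is hypothetical and is not part of Krangl or Lets-Plot.

```kotlin
// Hypothetical helper: turn row records into a column-oriented map
// (column name -> list of values, one entry per row).
// A key missing from a row yields null in that column.
fun rowsToColumns(rows: List<Map<String, Any?>>): Map<String, List<Any?>> {
    val names = rows.flatMap { it.keys }.distinct()
    return names.associateWith { name -> rows.map { row -> row[name] } }
}
```

For example, two rows with keys "displ" and "cyl" become a map with two entries, each holding a two-element list.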
When combining layers, CorrPlot chooses an acceptable plot configuration by default.
gggrid(
    listOf(
        CorrPlot(mpg_dat, "Tiles").tiles().build(),
        CorrPlot(mpg_dat, "Points").points().build(),
        CorrPlot(mpg_dat, "Tiles and labels").tiles().labels().build(),
        CorrPlot(mpg_dat, "Tiles, points and labels").points().labels().tiles().build()
    ), 2, 400, 320)
The default plot configuration adapts to changing options: compare the "Tiles and labels" plots above and below.
You can also override the default plot configuration using the type parameter: compare the "Tiles, points and labels" plots above and below.
gggrid(
    listOf(
        CorrPlot(mpg_dat, "Tiles and labels").tiles().labels(color="white").build(),
        CorrPlot(mpg_dat, "Tiles, points and labels")
            .tiles(type="upper")
            .points(type="lower")
            .labels(type="full").build()
    ), 2, 400, 320)
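To make the "full", "lower" and "upper" matrix types concrete, here is a small sketch of which cells each type keeps. The helpers isCellShown and renderMask are hypothetical. The sketch assumes row 0 is the top row and treats the diagonal separately through a diag flag, so the orientation Lets-Plot actually draws may differ.

```kotlin
// Hypothetical helper: is the cell at (row, col) drawn for a given matrix type?
// Assumption: row 0 is the top row, so "lower" keeps cells below the diagonal.
fun isCellShown(row: Int, col: Int, type: String, diag: Boolean = true): Boolean {
    if (row == col) return diag
    return when (type) {
        "full" -> true
        "lower" -> row > col
        "upper" -> row < col
        else -> throw IllegalArgumentException("unknown matrix type: $type")
    }
}

// Hypothetical helper: text rendering of the mask, 'x' for shown cells and '.' for hidden ones.
fun renderMask(n: Int, type: String, diag: Boolean = true): String =
    (0 until n).joinToString("\n") { r ->
        (0 until n).joinToString(" ") { c -> if (isCellShown(r, c, type, diag)) "x" else "." }
    }
```

Giving each layer its own type, as in the second plot above, amounts to applying a different mask per layer, so tiles, points and labels can occupy different parts of the same matrix.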
Instead of the default blue-grey-red gradient you can define your own lower-middle-upper colors, or choose one of the available 'Brewer' diverging palettes.
Let's create a gradient resembling one of Seaborn gradients.
val corrPlot = CorrPlot(mpg_dat).points().labels().tiles()
// Configure gradient resembling one of Seaborn gradients.
val withGradientColors = (corrPlot
    .paletteGradient(low="#417555", mid="#EDEDED", high="#963CA7")
    .build()) + ggtitle("Custom gradient")
// Configure Brewer 'Spectral' palette.
val withBrewerColors = (corrPlot
    .paletteSpectral()
    .build()) + ggtitle("Brewer 'Spectral'")
// Show both plots
gggrid(listOf(withGradientColors, withBrewerColors), 2, 400, 320)
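A lower-middle-upper gradient maps each correlation value to a color between the three anchors. The sketch below illustrates the idea with per-channel linear interpolation in RGB. This is an assumption made for illustration: Lets-Plot may interpolate in a different color space, and the helpers parseHex and gradientColor are hypothetical.

```kotlin
import kotlin.math.roundToInt

// Hypothetical helper: split "#RRGGBB" into its red, green and blue channels.
fun parseHex(hex: String): Triple<Int, Int, Int> {
    val v = hex.removePrefix("#").toInt(16)
    return Triple((v shr 16) and 0xFF, (v shr 8) and 0xFF, v and 0xFF)
}

// Hypothetical helper: map a correlation in [-1, 1] to a color on a low-mid-high gradient.
// Values outside the range are clamped; -1 gives `low`, 0 gives `mid`, 1 gives `high`.
fun gradientColor(value: Double, low: String, mid: String, high: String): String {
    val v = value.coerceIn(-1.0, 1.0)
    // Pick the half of the gradient the value falls into and its position t in [0, 1].
    val (from, to, t) = if (v < 0) {
        Triple(parseHex(low), parseHex(mid), v + 1.0)
    } else {
        Triple(parseHex(mid), parseHex(high), v)
    }
    val r = (from.first + (to.first - from.first) * t).roundToInt()
    val g = (from.second + (to.second - from.second) * t).roundToInt()
    val b = (from.third + (to.third - from.third) * t).roundToInt()
    return "#%02X%02X%02X".format(r, g, b)
}
```

With the colors used above, gradientColor(0.0, "#417555", "#EDEDED", "#963CA7") returns the middle color "#EDEDED", so uncorrelated pairs fade into the neutral grey.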
The Kaggle House Prices dataset contains 81 variables.
val housing_df = DataFrame.readCSV("../data/Ames_house_prices_train.csv")
housing_df.head(3)
Id | MSSubClass | MSZoning | LotFrontage | LotArea | Street | Alley | LotShape | LandContour | Utilities | LotConfig | LandSlope | Neighborhood | Condition1 | Condition2 | BldgType | HouseStyle | OverallQual | OverallCond | YearBuilt | YearRemodAdd | RoofStyle | RoofMatl | Exterior1st | Exterior2nd | MasVnrType | MasVnrArea | ExterQual | ExterCond | Foundation | BsmtQual | BsmtCond | BsmtExposure | BsmtFinType1 | BsmtFinSF1 | BsmtFinType2 | BsmtFinSF2 | BsmtUnfSF | TotalBsmtSF | Heating | HeatingQC | CentralAir | Electrical | 1stFlrSF | 2ndFlrSF | LowQualFinSF | GrLivArea | BsmtFullBath | BsmtHalfBath | FullBath | HalfBath | BedroomAbvGr | KitchenAbvGr | KitchenQual | TotRmsAbvGrd | Functional | Fireplaces | FireplaceQu | GarageType | GarageYrBlt | GarageFinish | GarageCars | GarageArea | GarageQual | GarageCond | PavedDrive | WoodDeckSF | OpenPorchSF | EnclosedPorch | 3SsnPorch | ScreenPorch | PoolArea | PoolQC | Fence | MiscFeature | MiscVal | MoSold | YrSold | SaleType | SaleCondition | SalePrice |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | 60 | RL | 65 | 8450 | Pave | null | Reg | Lvl | AllPub | Inside | Gtl | CollgCr | Norm | Norm | 1Fam | 2Story | 7 | 5 | 2003 | 2003 | Gable | CompShg | VinylSd | VinylSd | BrkFace | 196 | Gd | TA | PConc | Gd | TA | No | GLQ | 706 | Unf | 0 | 150 | 856 | GasA | Ex | Y | SBrkr | 856 | 854 | 0 | 1710 | 1 | 0 | 2 | 1 | 3 | 1 | Gd | 8 | Typ | 0 | null | Attchd | 2003 | RFn | 2 | 548 | TA | TA | Y | 0 | 61 | 0 | 0 | 0 | 0 | null | null | null | 0 | 2 | 2008 | WD | Normal | 208500 |
2 | 20 | RL | 80 | 9600 | Pave | null | Reg | Lvl | AllPub | FR2 | Gtl | Veenker | Feedr | Norm | 1Fam | 1Story | 6 | 8 | 1976 | 1976 | Gable | CompShg | MetalSd | MetalSd | None | 0 | TA | TA | CBlock | Gd | TA | Gd | ALQ | 978 | Unf | 0 | 284 | 1262 | GasA | Ex | Y | SBrkr | 1262 | 0 | 0 | 1262 | 0 | 1 | 2 | 0 | 3 | 1 | TA | 6 | Typ | 1 | TA | Attchd | 1976 | RFn | 2 | 460 | TA | TA | Y | 298 | 0 | 0 | 0 | 0 | 0 | null | null | null | 0 | 5 | 2007 | WD | Normal | 181500 |
3 | 60 | RL | 68 | 11250 | Pave | null | IR1 | Lvl | AllPub | Inside | Gtl | CollgCr | Norm | Norm | 1Fam | 2Story | 7 | 5 | 2001 | 2002 | Gable | CompShg | VinylSd | VinylSd | BrkFace | 162 | Gd | TA | PConc | Gd | TA | Mn | GLQ | 486 | Unf | 0 | 434 | 920 | GasA | Ex | Y | SBrkr | 920 | 866 | 0 | 1786 | 1 | 0 | 2 | 1 | 3 | 1 | Gd | 6 | Typ | 1 | TA | Attchd | 2001 | RFn | 2 | 608 | TA | TA | Y | 0 | 42 | 0 | 0 | 0 | 0 | null | null | null | 0 | 9 | 2008 | WD | Normal | 223500 |
Shape: 3 x 81.
A correlation plot that shows every correlation in this dataset is too large to be useful.
CorrPlot(housing_df.toMap())
    .tiles(type="lower")
    .paletteBrBG()
    .build()
The threshold parameter
The threshold parameter lets us specify a significance level (a minimum absolute correlation) below which variables are not shown.
CorrPlot(housing_df.toMap(), "Threshold: 0.5", threshold = 0.5, adjustSize = 0.7)
    .tiles(type="full", diag=false)
    .paletteBrBG()
    .build()
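The effect of threshold can be pictured as a filter over the correlation matrix. The sketch below keeps a variable only if at least one of its off-diagonal correlations reaches the threshold in absolute value. It is an illustration under assumptions, not the library's code: the helper variablesAboveThreshold is hypothetical, and whether Lets-Plot compares with >= or > is not confirmed here.

```kotlin
import kotlin.math.abs

// Hypothetical helper: names of variables having at least one off-diagonal
// correlation with |r| >= threshold. `corr` is an NxN matrix aligned with `names`.
fun variablesAboveThreshold(
    names: List<String>,
    corr: List<List<Double>>,
    threshold: Double
): List<String> {
    require(corr.size == names.size && corr.all { it.size == names.size }) { "corr must be NxN" }
    return names.filterIndexed { i, _ ->
        // The diagonal (a variable against itself) is ignored: it is always 1.
        names.indices.any { j -> j != i && abs(corr[i][j]) >= threshold }
    }
}
```

Because the comparison uses the absolute value, strongly negative correlations survive the filter just as strongly positive ones do.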
Let's further increase our threshold in order to see only highly correlated variables.
CorrPlot(housing_df.toMap(), "Threshold: 0.8", threshold = 0.8)
    .tiles(diag=false)
    .labels(color="white", diag=false)
    .paletteBrBG()
    .build()