In a large dataset, a relatively small group of points can be overplotted by the dominant group. In this case, stratified sampling helps keep the small group visible.
%useLatestDescriptors
%use lets-plot
import java.util.Random
val N = 5000
val smallGroup = 3
val largeGroup = N - smallGroup
val rand = Random(123)
val data = mapOf(
    "x" to List(N) { rand.nextGaussian() },
    "y" to List(N) { rand.nextGaussian() },
    "cond" to List(smallGroup) { 'A' } + List(largeGroup) { 'B' }
)
// Data points in group 'A' (small group) are overplotted by the dominant group 'B'.
val p = ggplot(data) { x="x"; y="y"; color="cond" } +
    scaleColorManual(values=listOf("red", "#1C9E77"), breaks=listOf('A', 'B'))
p + geomPoint(size=5, alpha=.2)
// Plain random sampling of 50 points loses group 'A' altogether.
p + geomPoint(size=5, sampling=samplingRandom(50, seed=2))
// Stratified sampling ensures that group 'A' is represented.
p + geomPoint(size=5, sampling=samplingRandomStratified(50, seed=2))
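
To illustrate why the stratified variant keeps group 'A', here is a minimal sketch of the idea in plain Kotlin, with no plotting involved. It is not the Lets-Plot implementation: the helper `stratifiedSampleIndices` is hypothetical, and both the proportional quota and the "at least one point per group" rule are assumptions made for this example.

```kotlin
import kotlin.math.roundToInt
import kotlin.random.Random

// Hypothetical helper (not part of Lets-Plot): picks row indices group by group.
// Each group receives a share of n proportional to its size, but never fewer
// than one point (an assumption of this sketch), so small groups cannot vanish.
fun stratifiedSampleIndices(groups: List<Char>, n: Int, rng: Random): List<Int> {
    val total = groups.size
    return groups.indices
        .groupBy { groups[it] }                 // group label -> row indices
        .flatMap { (_, indices) ->
            val quota = maxOf(1, (n.toDouble() * indices.size / total).roundToInt())
            indices.shuffled(rng).take(quota)   // random pick within the group
        }
        .sorted()
}

// Same group layout as in the example above: 3 points in 'A', 4997 in 'B'.
val cond = List(3) { 'A' } + List(4997) { 'B' }
val picked = stratifiedSampleIndices(cond, 50, Random(2))
val countsByGroup = picked.groupingBy { cond[it] }.eachCount()
println(countsByGroup)  // {A=1, B=50}
```

The proportional share of group 'A' rounds to zero (50 * 3 / 5000 = 0.03), so only the minimum quota keeps it in the sample. As a side effect, this sketch returns 51 points rather than exactly 50.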