Use Capacity Progression to determine:
Does my data have a discernible function, and how much data do I really need?
Capacity progression measures the learnability of a dataset, by plotting the number of decisions needed to memorize the function presented by the training data relative to the number of instances presented to the predictor (for an ideal model).
From the Brainome Glossary
This notebook assumes that brainome is installed as described in the notebook brainome_101_Quick_Start.
The training data sets are test_data9.csv (random), vehicle.csv (deterministic), and titanic_train.csv (real world).
!python3 -m pip install brainome --quiet
!brainome -version
The Capacity Progression of a random data set is an ever-increasing, roughly linear function: every additional instance requires additional decisions to memorize.
!brainome https://download.brainome.ai/data/public/test_data9.csv -y -target SomeEmail -measureonly | grep -A 1 Capacity -
The Capacity Progression of a deterministic data set displays a plateau. In this case, 40% of the data has enough information content to train a strong model.
!brainome https://download.brainome.ai/data/public/vehicle.csv -y -measureonly | grep -A 1 Capacity -
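One way to read a plateau off a progression is sketched below. This is a minimal illustration, not part of brainome: the helper `plateau_fraction`, the 5% tolerance, the sampling percentages, and both example progressions are made-up assumptions chosen only to show the idea.

```python
def plateau_fraction(fractions, capacities, tolerance=0.05):
    """Return the smallest data percentage from which every capacity value
    stays within `tolerance` (relative) of the final value.

    The last percentage always qualifies, so a result equal to
    fractions[-1] means no plateau was observed.
    """
    final = capacities[-1]
    for i, frac in enumerate(fractions):
        tail = capacities[i:]
        if all(abs(final - c) <= tolerance * abs(final) for c in tail):
            return frac

# Hypothetical values for illustration only (not real brainome output).
fractions = [5, 10, 20, 40, 80, 100]    # percent of the data used
plateauing = [6, 9, 12, 14, 14, 14]     # levels off: more data adds nothing
linear = [5, 10, 20, 40, 80, 100]       # keeps growing with the data

print("plateauing progression settles at", plateau_fraction(fractions, plateauing), "%")
print("linear progression settles at", plateau_fraction(fractions, linear), "%")
```

A result below 100 suggests that a subset of the data already carries the information needed; a result of 100 suggests the progression is still climbing.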
The Capacity Progression of a real-world data set falls somewhere between these two extremes.
!brainome https://download.brainome.ai/data/public/titanic_train.csv -y -measureonly | grep -A 1 Capacity -
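The contrast between the random and the deterministic case can be reproduced with a toy experiment. The sketch below is not brainome's algorithm: it assumes a single numeric feature, approximates "decisions needed to memorize" by counting label changes along the sorted feature, and uses synthetic data and sampling fractions chosen for illustration.

```python
import random

def decisions_to_memorize(xs, ys):
    """Count label changes along the sorted feature. Each change is one
    threshold a lookup-style predictor needs in order to memorize the data."""
    ordered = [y for _, y in sorted(zip(xs, ys))]
    return sum(1 for a, b in zip(ordered, ordered[1:]) if a != b)

def capacity_progression(xs, ys, fractions=(0.05, 0.1, 0.2, 0.4, 0.8, 1.0)):
    """Decisions needed for growing prefixes of the data set."""
    progression = []
    for f in fractions:
        n = max(2, int(len(xs) * f))
        progression.append(decisions_to_memorize(xs[:n], ys[:n]))
    return progression

rng = random.Random(0)
xs = [rng.random() for _ in range(2000)]
rule_labels = [int(x > 0.5) for x in xs]         # deterministic: a single threshold
random_labels = [rng.randint(0, 1) for _ in xs]  # no underlying function

print("deterministic:", capacity_progression(xs, rule_labels))
print("random:       ", capacity_progression(xs, random_labels))
```

With the rule-based labels the count stays flat however much data is added, while with random labels it keeps growing in proportion to the number of instances.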
Typical actions to improve the learnability of your data: