I think that it will be important for machine learning experts and software engineers to come together to develop best practices for software development of machine learning systems. Currently we have a software testing regime where you define unit tests... We will need new testing processes that involve running experiments, analyzing the results... This is a great area for software engineers and machine learning people to work together to build something new and better.
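One possible reading of "running experiments, analyzing the results" as a test is sketched below. It is only an illustration: the synthetic data, the toy threshold "model", the majority-class baseline, and the margin are all assumptions made up for this example, not anything the quote prescribes. The point is that the assertion is about an aggregate over repeated, seeded runs rather than about a single return value.

```python
import random
import statistics


def make_data(seed, n=200):
    """Synthetic 1-D data: label is 1 when x > 0.5, with 10% label noise."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.random()
        y = int(x > 0.5)
        if rng.random() < 0.1:
            y = 1 - y  # flip the label to simulate noise
        data.append((x, y))
    return data


def fit_threshold(train):
    """Toy 'model': pick the cut-off on x with the best training accuracy."""
    def accuracy(t):
        return sum(int(x > t) == y for x, y in train) / len(train)
    return max(sorted(x for x, _ in train), key=accuracy)


def run_experiment(seed):
    """One seeded experiment: returns (model accuracy, baseline accuracy)."""
    data = make_data(seed)
    train, test = data[:150], data[150:]
    t = fit_threshold(train)
    model_acc = sum(int(x > t) == y for x, y in test) / len(test)
    majority = round(sum(y for _, y in train) / len(train))
    base_acc = sum(y == majority for _, y in test) / len(test)
    return model_acc, base_acc


def test_model_beats_baseline(n_runs=20, margin=0.1):
    """Experiment-style test: compare mean scores across seeded runs."""
    results = [run_experiment(seed) for seed in range(n_runs)]
    model_mean = statistics.mean(m for m, _ in results)
    base_mean = statistics.mean(b for _, b in results)
    assert model_mean > base_mean + margin, (model_mean, base_mean)
    return model_mean, base_mean
```

Fixing the seeds keeps the test reproducible, while averaging over several runs keeps it from passing or failing on one lucky split.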

smsdata:
  source: psql
  table_name: messages_db
  config:
    hostname: 127.0.0.1
    db_name: foo
    username: bar
    chunksize: 100000
  sampling:
    factor: 0.1
    kind: random
  dtypes:
    Message: &string !!python/name:__builtin__.str
    Number: *string
    person: *string
  postprocessors:
    - !!python/name:jeeves.preprocessing.text.remove_unicode
    - !!python/name:jeeves.preprocessing.text.remove_tabs
    - !!python/name:jeeves.preprocessing.text.remove_digits
    - !!python/name:jeeves.preprocessing.text.remove_stopwords
    - !!python/name:jeeves.preprocessing.text.to_lowercase
    - !!python/name:jeeves.feature_extraction.text.get_regex_features
    - !!python/name:jeeves.feature_extraction.text.get_tfidf_features

>>> from pysemantic import Project
>>> smartsms = Project("smartsms")
>>> X = smartsms.load_dataset("smsdata")
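To make the `sampling` and `postprocessors` entries of the data dictionary more concrete, here is a minimal sketch of what "sample, then run each postprocessor in order" could look like. It does not use pysemantic or jeeves: the three text functions are hypothetical stand-ins for the `jeeves.preprocessing.text` functions named above (whose real implementations are not shown here), they work on plain lists of strings instead of DataFrames, and the order of sampling versus postprocessing is an assumption.

```python
import random
import re


# Hypothetical stand-ins for some of the jeeves functions listed in the
# data dictionary; the real ones are not shown in this post.
def remove_tabs(texts):
    return [t.replace("\t", " ") for t in texts]


def remove_digits(texts):
    return [re.sub(r"\d+", "", t) for t in texts]


def to_lowercase(texts):
    return [t.lower() for t in texts]


def sample_rows(rows, factor, seed=None):
    """Random sample keeping round(factor * len(rows)) rows (kind: random)."""
    rng = random.Random(seed)
    return rng.sample(rows, round(len(rows) * factor))


def apply_postprocessors(rows, postprocessors):
    """Feed the output of each postprocessor into the next, in list order."""
    for func in postprocessors:
        rows = func(rows)
    return rows


messages = ["Call\tME at 5", "WIN 1000 now", "See you soon"]
cleaned = apply_postprocessors(
    messages, [remove_tabs, remove_digits, to_lowercase]
)
```

Because the steps are just an ordered list of callables, reordering or dropping a step is a one-line change to the configuration rather than to the loading code.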

import joblib
import luigi
from sklearn.model_selection import GridSearchCV


class CrossValidationTask(luigi.Task):
    estimator = luigi.Parameter()  # or luigi.Target

    def run(self):
        # Run CV loop
        pass


class GridSearchTask(luigi.Task):
    grid = luigi.Parameter()  # or Target
    estimator = luigi.Parameter()  # or Target

    def run(self):
        X, y = self.input()
        clf = GridSearchCV(self.estimator, param_grid=self.grid, ...)
        clf.fit(X, y)
        joblib.dump(clf.best_estimator_, self.output())
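The task skeletons above lean on the pattern Luigi is built around: every step declares what it requires, where its output lives, and how to produce it. The sketch below imitates that pattern with the standard library only, so it runs without Luigi or scikit-learn. It is not Luigi's API: the `Task` base class, `build`, `load`, the toy data, and the threshold "grid" are all hypothetical names and values invented for illustration.

```python
import os
import pickle
import tempfile


class Task:
    """Tiny stand-in for a pipeline task; NOT luigi.Task."""

    def __init__(self, workdir):
        self.workdir = workdir

    def requires(self):
        return []

    def output(self):
        # Each task persists its result to a file named after its class.
        return os.path.join(self.workdir, type(self).__name__ + ".pkl")

    def complete(self):
        return os.path.exists(self.output())

    def run(self):
        raise NotImplementedError


def build(task):
    """Run dependencies first; skip tasks whose output already exists."""
    for dep in task.requires():
        build(dep)
    if not task.complete():
        task.run()


def load(task):
    with open(task.output(), "rb") as fh:
        return pickle.load(fh)


class LoadData(Task):
    def run(self):
        X, y = [0.1, 0.4, 0.6, 0.9], [0, 0, 1, 1]  # toy dataset
        with open(self.output(), "wb") as fh:
            pickle.dump((X, y), fh)


class GridSearch(Task):
    grid = [0.2, 0.5, 0.8]  # candidate thresholds, in the role of param_grid

    def requires(self):
        return [LoadData(self.workdir)]

    def run(self):
        X, y = load(self.requires()[0])

        def accuracy(t):
            return sum(int(x > t) == label for x, label in zip(X, y)) / len(y)

        with open(self.output(), "wb") as fh:
            pickle.dump(max(self.grid, key=accuracy), fh)


with tempfile.TemporaryDirectory() as workdir:
    task = GridSearch(workdir)
    build(task)
    best_threshold = load(task)  # 0.5 separates the toy data perfectly
```

Since every task writes its result to disk and `build` skips completed tasks, rerunning the pipeline after a failure only repeats the steps that never finished, which is the main appeal of expressing a grid search or cross-validation run this way.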