Using H2O AutoML for Kaggle Porto Seguro Safe Driver Prediction Competition

If you are into competitive machine learning, you probably visit Kaggle routinely. Currently you can compete for cash and recognition in Porto Seguro’s Safe Driver Prediction competition as well.

I tried the given training dataset (as is) with H2O AutoML, which ran for about 5 hours, and I was able to get into the top 280. If you transform the dataset properly before running H2O AutoML, you may be able to get an even higher ranking.
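As a starting point for such a transformation, the competition's data description says that feature names ending in `_cat` are categorical, names ending in `_bin` are binary, and a value of -1 marks a missing observation. The sketch below only groups column names by that suffix convention; the helper name `group_columns` and the short example list are illustrative assumptions, not part of the original script.

```python
def group_columns(columns):
    """Group Porto Seguro column names by their suffix convention."""
    groups = {"categorical": [], "binary": [], "other": []}
    for name in columns:
        if name.endswith("_cat"):
            groups["categorical"].append(name)
        elif name.endswith("_bin"):
            groups["binary"].append(name)
        else:
            # id, target, and the remaining numeric features end up here
            groups["other"].append(name)
    return groups


# A few column names in the dataset's naming style, for illustration only
example_columns = ["id", "target", "ps_ind_01", "ps_ind_02_cat",
                   "ps_ind_06_bin", "ps_car_01_cat"]
groups = group_columns(example_columns)
print(groups["categorical"])
print(groups["binary"])
```

With an H2O frame you could pass `train.columns` to the helper and then convert each categorical column with `train[name] = train[name].asfactor()`, so that AutoML does not treat the category codes as ordinary numbers. Whether this improves the score is something you would need to check yourself.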

The following is the simplest H2O AutoML Python script, which you can try as well. (Note: make sure to change run_automl_for_seconds to the time you want the experiment to run.)

import h2o
import pandas as pd
from h2o.automl import H2OAutoML

h2o.init()
train = h2o.import_file('/data/avkash/PortoSeguro/PortoSeguroTrain.csv')
test = h2o.import_file('/data/avkash/PortoSeguro/PortoSeguroTest.csv')
sub_data = h2o.import_file('/data/avkash/PortoSeguro/PortoSeguroSample_submission.csv')

y = 'target'
x = train.columns
x.remove(y)

## Time to run the experiment, in seconds
run_automl_for_seconds = 18000
## Running AutoML for 5 hours (18000 seconds)
aml = H2OAutoML(max_runtime_secs=run_automl_for_seconds)
## Hold out 10% of the training data for validation
train_final, valid = train.split_frame(ratios=[0.9])
aml.train(x=x, y=y, training_frame=train_final, validation_frame=valid)

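## Use the best model on the AutoML leaderboard to score the test set
leader_model = aml.leader
pred = leader_model.predict(test_data=test)

## Convert the H2O frames to pandas data frames
pred_pd = pred.as_data_frame()
sub = sub_data.as_data_frame()

## 'target' is numeric here, so AutoML runs a regression and
## the score is in the 'predict' column
sub['target'] = pred_pd['predict']
sub.to_csv('/data/avkash/PortoSeguro/PortoSeguroResult.csv', header=True, index=False)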
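Before uploading the result, it can help to score the held-out validation split yourself. The competition ranks submissions by the normalized Gini coefficient, which for a binary target equals 2 * AUC - 1. Below is a minimal pure-Python sketch of that calculation; the function name `normalized_gini` is my own, and the tie handling (average ranks) is an assumption that may differ slightly from Kaggle's scorer.

```python
def normalized_gini(actual, predicted):
    """Normalized Gini for a binary (0/1) target, computed as 2 * AUC - 1."""
    pairs = sorted(zip(predicted, actual))
    n = len(pairs)
    n_pos = sum(1 for _, label in pairs if label == 1)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both classes must be present")

    # Sum the ranks of the positive rows, giving tied scores their average rank
    rank_sum_pos = 0.0
    i = 0
    while i < n:
        j = i
        while j < n and pairs[j][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2.0  # mean of the 1-based ranks i+1 .. j
        positives_in_tie = sum(1 for k in range(i, j) if pairs[k][1] == 1)
        rank_sum_pos += avg_rank * positives_in_tie
        i = j

    # Mann-Whitney U statistic turned into AUC, then rescaled to Gini
    auc = (rank_sum_pos - n_pos * (n_pos + 1) / 2.0) / (n_pos * n_neg)
    return 2.0 * auc - 1.0


# Tiny made-up example: the scores rank both positives above both negatives
print(normalized_gini([0, 0, 1, 1], [0.1, 0.2, 0.8, 0.9]))
```

To apply it to the script above, you could take the `predict` column of `leader_model.predict(valid).as_data_frame()` as `predicted` and the `target` column of `valid.as_data_frame()` as `actual`.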

That’s it, enjoy!!

 
