Is there a Python equivalent for R's h2o.stack?

joceratops

I am working with stacked learners. According to the docs for H2OStackedEnsembleEstimator, H2O's Python implementation allows you to easily build ensemble models. However, this is limited to building base classifiers with the same underlying training data. I have time-based features whose minimum date varies depending on the data source. Each sample of data is a point in time. To take advantage of as much data as I can, I split the features into two groups (depending on relevance and minimum date) and train two separate models. I would like to combine these models, but H2OStackedEnsembleEstimator requires the features to be the same.

According to this post about R's stacked ensemble implementation, there is an option to perform only the metalearning step, which should require only the k-fold cross-validation predictions for each base model and the true target value.

In case it crosses anyone's mind...for my particular problem, I realize I am going to run into an issue with the metalearning step with this mismatch in minimum date, and I have ideas to circumvent this.

Erin LeDell

For the Super Learner algorithm (stacking such that you use the cross-validated predicted values from the base learners as training data for the metalearner), the only requirement is that the base learners must be trained on the same rows -- the columns can be different. There is a variant of stacking, let's call it "Holdout Stacking", where you score the base models on a holdout dataset and use those predictions to train the metalearner instead. In this case, you can use entirely different training frames for the base learners.
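To make the "same rows, different columns" point concrete, here is a minimal pure-Python sketch of the Super Learner idea on toy data. Everything in it is an assumption for illustration: the single-feature least-squares base learners, the no-intercept linear metalearner, the toy features `feat_a` and `feat_b`, and the `i % k` fold assignment. It is not H2O's implementation.

```python
from statistics import mean

def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x on one feature."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - slope * mx, slope

def cv_predictions(xs, ys, k):
    """Out-of-fold predictions; row i is held out in fold i % k."""
    preds = [0.0] * len(xs)
    for fold in range(k):
        train = [i for i in range(len(xs)) if i % k != fold]
        intercept, slope = fit_line([xs[i] for i in train],
                                    [ys[i] for i in train])
        for i in range(fold, len(xs), k):  # the held-out rows of this fold
            preds[i] = intercept + slope * xs[i]
    return preds

def fit_metalearner(z1, z2, ys):
    """Least-squares weights for y ~ w1*z1 + w2*z2 (2x2 normal equations)."""
    s11 = sum(a * a for a in z1)
    s22 = sum(b * b for b in z2)
    s12 = sum(a * b for a, b in zip(z1, z2))
    t1 = sum(a * y for a, y in zip(z1, ys))
    t2 = sum(b * y for b, y in zip(z2, ys))
    det = s11 * s22 - s12 * s12
    return (t1 * s22 - t2 * s12) / det, (t2 * s11 - t1 * s12) / det

def mse(preds, ys):
    return mean((p - y) ** 2 for p, y in zip(preds, ys))

# Toy data: the same 30 rows, but each base learner sees a different column.
feat_a = [float(i) for i in range(30)]
feat_b = [float((7 * i) % 11) for i in range(30)]
target = [2.0 * a + 3.0 * b for a, b in zip(feat_a, feat_b)]

K = 5
z_a = cv_predictions(feat_a, target, K)  # level-one column from model A
z_b = cv_predictions(feat_b, target, K)  # level-one column from model B

# The metalearner only needs the cross-validated predictions and the target.
w_a, w_b = fit_metalearner(z_a, z_b, target)
ensemble = [w_a * a + w_b * b for a, b in zip(z_a, z_b)]

mse_a, mse_b, mse_ens = mse(z_a, target), mse(z_b, target), mse(ensemble, target)
print(mse_a, mse_b, mse_ens)
```

Both base learners use the same fold assignment over the same rows, which is what lets their out-of-fold predictions line up row by row. Note that `mse_ens` here is measured on the same level-one data the metalearner was fit on, so it is an optimistic estimate.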

The current Stacked Ensembles implementation in H2O has a restriction that the whole training frame (rows and columns) must be the same for the base learners, but we will relax that requirement in the future (since it's not really required).

Before we moved Stacked Ensembles into the Java backend of H2O, I coded a simple reference implementation in Python using only the h2o Python module. For the time being, you could probably modify that code fairly easily to get the type of Stacked Ensemble that you're looking for. It's in a gist here.
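For readers without the gist at hand, the following is a rough sketch of what manual stacking with the h2o Python module might look like. It is not the code from the gist. It assumes a regression target, a running cluster (`h2o.init()` already called), and two hypothetical lists of column names, `x_a` and `x_b`, that live in the same training frame; the function name `manual_super_learner` is also made up for this example.

```python
def manual_super_learner(train, y, x_a, x_b, nfolds=5, seed=1):
    """Sketch: stack two GBMs trained on different columns of one H2OFrame.

    train    -- H2OFrame holding the target and both feature groups
    y        -- name of the (numeric) target column
    x_a, x_b -- hypothetical lists of column names, one per base model
    """
    # Imported inside the function so the sketch can be loaded without h2o
    # installed; calling it still requires h2o and a running cluster.
    from h2o.estimators.gbm import H2OGradientBoostingEstimator
    from h2o.estimators.glm import H2OGeneralizedLinearEstimator

    # Identical fold assignment across base models keeps the rows aligned.
    cv_args = dict(nfolds=nfolds, fold_assignment="Modulo",
                   keep_cross_validation_predictions=True, seed=seed)

    base_models = []
    level_one = train[y]   # start the level-one frame with the true target
    meta_features = []
    for idx, cols in enumerate((x_a, x_b)):
        model = H2OGradientBoostingEstimator(**cv_args)
        model.train(x=cols, y=y, training_frame=train)

        # Out-of-fold predictions for every training row (regression case).
        cv_pred = model.cross_validation_holdout_predictions()["predict"]
        name = "base_%d" % idx
        cv_pred.names = [name]

        level_one = level_one.cbind(cv_pred)
        meta_features.append(name)
        base_models.append(model)

    # Metalearner trained only on CV predictions plus the target.
    meta = H2OGeneralizedLinearEstimator(family="gaussian")
    meta.train(x=meta_features, y=y, training_frame=level_one)
    return base_models, meta
```

To score new data with this sketch, you would predict with each base model, bind the predictions into a frame using the same `base_0`, `base_1` column names, and pass that frame to `meta.predict`. For a binary target, one would presumably use the positive-class probability column instead of `predict` and a binomial metalearner.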
