Why does using make_pipeline in scikit-learn raise a "Last step of Pipeline" error?

Ansh

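I am trying to use make_pipeline from scikit-learn to clean my data (replace missing values, then clean up outliers, then apply an encoding function to the categorical variables) and finally add a random forest regressor, RandomForestRegressor, as the last step. The input is a DataFrame. Eventually I want to run this through GridSearchCV to search for the best hyperparameters for the regressor.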

To do this, I wrote some custom classes that inherit from TransformerMixin, as advised here. This is what I have so far:

from sklearn.pipeline import make_pipeline
from sklearn.base import TransformerMixin
import pandas as pd

class Cleaning(TransformerMixin):
    def __init__(self, column_labels):
        self.column_labels = column_labels
    def fit(self, X, y=None):
        return self
    def transform(self, X):
        """Given a dataframe X with predictors, clean it."""
        # median_imputer, get_quantiles and replace_outliers are user-defined helpers (not shown here)
        X_imputed, medians_X = median_imputer(X)  # impute all missing numeric data with median
        
        quantiles_X = get_quantiles(X_imputed, self.column_labels)
        X_nooutliers, _ = replace_outliers(X_imputed, self.column_labels, medians_X, quantiles_X)
        return X_nooutliers

class Encoding(TransformerMixin):
    def __init__(self, encoder_list):
        self.encoder_list = encoder_list
    def fit(self, X, y=None):
        return self
    def transform(self, X):
        """Takes in dataframe X and applies encoding transformation to them"""
        return encode_data(self.encoder_list, X)

However, when I run the following lines of code, I get an error:

import category_encoders as ce
from sklearn.ensemble import RandomForestRegressor

pipeline_cleaning = Cleaning(column_labels = train_labels)

OneHot_binary = ce.OneHotEncoder(cols = ['new_store']) 
OneHot = ce.OneHotEncoder(cols= ['transport_availability']) 
Count = ce.CountEncoder(cols = ['county'])
pipeline_encoding = Encoding([OneHot_binary, OneHot, Count])

baseline = RandomForestRegressor(n_estimators=500, random_state=12)
make_pipeline([pipeline_cleaning, pipeline_encoding,baseline])

The error says Last step of Pipeline should implement fit or be the string 'passthrough'. I don't understand why.

Edit: fixed a small typo in the last line. The third element of the list passed to make_pipeline is the regressor.

Sergey Bushmanov

Change the line:

make_pipeline([pipeline_cleaning, pipeline_encoding,baseline])

to (without the list):

make_pipeline(pipeline_cleaning, pipeline_encoding,baseline)
Pipeline(steps=[('cleaning', <__main__.Cleaning object at 0x7f617260c1d0>),
                ('encoding', <__main__.Encoding object at 0x7f617260c278>),
                ('randomforestregressor',
                 RandomForestRegressor(n_estimators=500, random_state=12))])
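
The reason is that make_pipeline takes the estimators as separate positional arguments. When a list is passed, the list itself becomes the only (and therefore last) step, and a list has no fit method. The sketch below reproduces this under some assumptions: StandardScaler stands in for the custom Cleaning and Encoding steps, and the data is a made-up toy set. Depending on the scikit-learn version, the check may run when the pipeline is built or only when fit is called, so the helper does both.

```python
from sklearn.ensemble import RandomForestRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Assumption: StandardScaler is only a stand-in for the custom transformers.
steps = [StandardScaler(), RandomForestRegressor(n_estimators=10, random_state=12)]

# Toy data, invented for this illustration.
X = [[0.0], [1.0], [2.0], [3.0]]
y = [0.0, 1.0, 2.0, 3.0]


def try_fit(*args):
    """Build a pipeline from args, fit it, and report what happened."""
    try:
        make_pipeline(*args).fit(X, y)
        return "ok"
    except TypeError as exc:
        return f"TypeError: {exc}"


wrong = try_fit(steps)   # the whole list is treated as a single step
right = try_fit(*steps)  # each estimator becomes its own step
print(wrong)
print(right)
```

If the steps are already collected in a list, unpacking it with make_pipeline(*steps) is equivalent to writing them out one by one.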

And you should be good to go.
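
One more point may matter for the GridSearchCV step mentioned in the question. GridSearchCV clones the estimator it is given, and cloning relies on get_params, which TransformerMixin alone does not provide. A common way to get it is to also inherit from BaseEstimator. The following is a minimal sketch, not the asker's actual code: MedianFill is a hypothetical stand-in for the Cleaning class, and the column name "a" is invented.

```python
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin, clone


class MedianFill(TransformerMixin, BaseEstimator):
    """Hypothetical stand-in for Cleaning: fill NaNs with training medians."""

    def __init__(self, column_labels=None):
        # Store constructor arguments unchanged so get_params/clone work.
        self.column_labels = column_labels

    def fit(self, X, y=None):
        cols = self.column_labels if self.column_labels is not None else list(X.columns)
        self.medians_ = X[cols].median()  # learned on the training data
        return self

    def transform(self, X):
        # fillna with a Series fills each column using the matching label.
        return X.fillna(self.medians_)


step = MedianFill(column_labels=["a"])
params = step.get_params()   # provided by BaseEstimator
copy = clone(step)           # what GridSearchCV does internally

df = pd.DataFrame({"a": [1.0, None, 3.0]})
filled = copy.fit_transform(df)
print(params)
print(filled)
```

The same change (adding BaseEstimator to the base classes and keeping __init__ free of logic) should carry over to the Cleaning and Encoding classes in the question.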

Edited 2021-06-13
