I found two ways to replace NaN values in Python: one using sklearn's Imputer class and the other using df.fillna(). The latter seems easier, with less code. But efficiency-wise, which is better? Can anyone explain the use cases of each?
I feel the Imputer class has its own benefits, because you can simply name a strategy such as mean or median, unlike fillna, where you need to supply the fill values yourself. The downside is that with the imputer you need to fit and transform the dataset, which means more lines of code. It may give you better speed than fillna, but unless the dataset is really big, the difference doesn't matter.
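To make the comparison concrete, here is a minimal sketch of both approaches filling NaNs with the column mean. The DataFrame and its column names are made up for illustration. It also assumes a recent scikit-learn, where the imputer lives in sklearn.impute as SimpleImputer (the older sklearn.preprocessing.Imputer was removed in version 0.22).

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Hypothetical toy data with one missing value per column
df = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": [4.0, 5.0, np.nan]})

# pandas: compute the column means yourself and hand them to fillna
filled_pandas = df.fillna(df.mean())

# scikit-learn: name the strategy, then fit and transform
imputer = SimpleImputer(strategy="mean")
filled_sklearn = pd.DataFrame(imputer.fit_transform(df), columns=df.columns)

print(filled_pandas)
print(filled_sklearn)
```

Both give the same result here. One practical difference: the fitted imputer keeps the learned means in imputer.statistics_, so you can call imputer.transform() on new data (a test set, for example) and reuse the values learned from the training data.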
But fillna has something really cool: you can fill the NaNs with any custom value, which you may sometimes need. This makes fillna better IMHO, even if it may be slower.
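A short sketch of the custom-value case, again on made-up data (the column names and fill values are only placeholders). fillna accepts a single scalar, or a dict that maps column names to different fill values:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data mixing a numeric and a text column
df = pd.DataFrame({
    "age": [25.0, np.nan, 40.0],
    "city": ["Paris", None, "Rome"],
})

# One custom value for a single column
filled_scalar = df["age"].fillna(0)

# A different custom value per column, passed as a dict
filled_per_column = df.fillna({"age": -1, "city": "unknown"})

print(filled_scalar)
print(filled_per_column)
```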