I have a dataframe (df) that contains 30,000 rows coming from a web scraping exercise:
Name NameID Age
John www.link.com/www.link.com/https://www.link.com/ct/John 25
Samanta www.link.com/www.link.com/https://www.link.com/ct/Samanta 24
Johnny www.link.com/www.link.com/ 22
Mary www.link.com/www.link.com/https://www.link.com/ct/Mary 35
I want to clean the "NameID" column so that it keeps only the part starting at "https://www.link.com/ct/". My output dataframe should look like this:
Name NameID Age
John https://www.link.com/ct/John 25
Samanta https://www.link.com/ct/Samanta 24
Johnny 22
Mary https://www.link.com/ct/Mary 35
My code so far:
df['NameID'] = df['NameID'].str.split("https://www.link.com/ct/")[1][1]
df['NameID'] = "https://www.link.com/ct/" + df['NameID'].astype(str)
The output looks like this now:
Name NameID Age
John https://www.link.com/ct/John 25
Samanta https://www.link.com/ct/John 24
Johnny https://www.link.com/ct/John 22
Mary https://www.link.com/ct/John 35
Any help?
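You're close; you need .str[1]. The plain [1][1] selects a single value from one row, and that one value is then assigned to every row, whereas .str[1] takes the second element of each row's split list. Try changing your code to this: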
df['NameID'] = df['NameID'].str.split("https://www.link.com/ct/").str[1]
df['NameID'] = "https://www.link.com/ct/" + df['NameID'].astype(str)
df
Name NameID Age
0 John https://www.link.com/ct/John 25
1 Samanta https://www.link.com/ct/Samanta 24
2 Johnny https://www.link.com/ct/nan 22
3 Mary https://www.link.com/ct/Mary 35
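Rows that do not contain the URL (Johnny here) come out as NaN after the split, which astype(str) turns into the text "nan". You can tweak your code a bit to return an empty string ('') for those rows instead, as you specified in your desired outcome.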
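One possible tweak, as a minimal sketch rather than the only way to do it: instead of splitting and re-adding the prefix, capture the URL with .str.extract and replace the missing matches with ''. The sample frame below is rebuilt from the question's data, and the variable name pattern is only illustrative.

```python
import pandas as pd

# Sample data rebuilt from the question (the real df has about 30,000 rows).
df = pd.DataFrame({
    "Name": ["John", "Samanta", "Johnny", "Mary"],
    "NameID": [
        "www.link.com/www.link.com/https://www.link.com/ct/John",
        "www.link.com/www.link.com/https://www.link.com/ct/Samanta",
        "www.link.com/www.link.com/",
        "www.link.com/www.link.com/https://www.link.com/ct/Mary",
    ],
    "Age": [25, 24, 22, 35],
})

# Capture everything from the profile URL prefix to the end of the string.
# Rows without the prefix yield NaN, which fillna('') turns into an empty string.
pattern = r"(https://www\.link\.com/ct/.*)"
df["NameID"] = df["NameID"].str.extract(pattern, expand=False).fillna("")

print(df)
```

With this approach Johnny's NameID ends up empty instead of "https://www.link.com/ct/nan", because the prefix is never concatenated onto a missing value.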