Dataframe split before a specific string for all rows

J1701

I have a dataframe (df) that contains 30 000 rows coming from a web scraping exercise:

Name     NameID                                                            Age

John     www.link.com/www.link.com/https://www.link.com/ct/John             25
Samanta  www.link.com/www.link.com/https://www.link.com/ct/Samanta          24
Johnny   www.link.com/www.link.com/                                         22
Mary     www.link.com/www.link.com/https://www.link.com/ct/Mary             35

I want to clean the "NameID" column so that it keeps only the part starting at "https://www.link.com/ct/". My output dataframe should look like this:

Name     NameID                                  Age

John     https://www.link.com/ct/John             25
Samanta  https://www.link.com/ct/Samanta          24
Johnny                                            22
Mary     https://www.link.com/ct/Mary             35

My code so far:

df['NameID'] = df['NameID'].str.split("https://www.link.com/ct/")[1][1]
df['NameID'] =  "https://www.link.com/ct/" + df['NameID'].astype(str)

The output looks like this now:

Name     NameID                                  Age

John     https://www.link.com/ct/John             25
Samanta  https://www.link.com/ct/John             24
Johnny   https://www.link.com/ct/John             22
Mary     https://www.link.com/ct/John             35

Any help?

sophocles

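You're close; you need .str[1] instead of [1][1]. Plain [1][1] selects a single row's list and then one element from it, so the same scalar gets assigned to every row, whereas .str[1] takes the second element of each row's list. Try changing your code to this: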

df['NameID'] = df['NameID'].str.split("https://www.link.com/ct/").str[1]
df['NameID'] =  "https://www.link.com/ct/" + df['NameID'].astype(str)

df

      Name                           NameID  Age
0     John     https://www.link.com/ct/John   25
1  Samanta  https://www.link.com/ct/Samanta   24
2   Johnny      https://www.link.com/ct/nan   22
3     Mary     https://www.link.com/ct/Mary   35
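
To see why the original [1][1] produced the same value in every row, here is a minimal sketch on a toy Series (the values "a/b", "c/d", "e" are made up for illustration and are not from the question's data). It assumes the default integer index:

```python
import pandas as pd

# Toy data, purely illustrative
s = pd.Series(["a/b", "c/d", "e"])
parts = s.str.split("/")      # each row now holds a Python list

# Plain indexing looks up the row labelled 1, i.e. ONE list
row_one = parts[1]            # the list for row 1
scalar = parts[1][1]          # one string taken from that list

# The .str accessor indexes inside EVERY row's list instead
per_row = parts.str[1]        # missing (NaN) where a list has no second element

print(row_one)
print(scalar)
print(per_row)
```

Assigning `scalar` to a column broadcasts that one string to all rows, which matches the symptom in the question.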

As the Johnny row shows, rows without the URL end up as 'https://www.link.com/ct/nan'. You can tweak the code a bit to return '' instead, as specified in your desired outcome.
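
One possible tweak, sketched under the assumption that rows lacking the "https://www.link.com/ct/" part should become empty strings. The sample frame is rebuilt from the question, and the names `prefix` and `tail` are introduced here only for illustration:

```python
import pandas as pd

# Sample data rebuilt from the question
df = pd.DataFrame({
    "Name": ["John", "Samanta", "Johnny", "Mary"],
    "NameID": [
        "www.link.com/www.link.com/https://www.link.com/ct/John",
        "www.link.com/www.link.com/https://www.link.com/ct/Samanta",
        "www.link.com/www.link.com/",
        "www.link.com/www.link.com/https://www.link.com/ct/Mary",
    ],
    "Age": [25, 24, 22, 35],
})

prefix = "https://www.link.com/ct/"

# Text after the prefix; missing (NaN) where the prefix is absent
tail = df["NameID"].str.split(prefix).str[1]

# Re-attach the prefix, then blank out rows that had no match
df["NameID"] = (prefix + tail.fillna("")).where(tail.notna(), "")

print(df)
```

Skipping astype(str) is what avoids the literal 'nan' text; the mask from tail.notna() decides which rows keep a URL.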
