How to add new columns to dataframe with value taken from another dataframe?

debugcn Published at Dev

Kunitsyn Artsiom

I have two dataframes: df1.head():

    nazwa województwa   gmina nazwa gminy  rodzaj gminy
0  Zachodniopomorskie  320101   Białogard             1
1  Zachodniopomorskie  320101   Białogard             1
2  Zachodniopomorskie  320101   Białogard             1
3  Zachodniopomorskie  320101   Białogard             1
4  Zachodniopomorskie  320101   Białogard             1

and kts_df.head():

               name         type        KTS_code TERYT_code
0            Polska      COUNTRY  10000000000000       None
1           Bochnia  RURAL_GMINA  10011212001022    1201022
2           Drwinia  RURAL_GMINA  10011212001032    1201032
3         Iwanowice  RURAL_GMINA  10011212006032    1206032
4  Lipnica Murowana  RURAL_GMINA  10011212001042    1201042

Currently to add new column to df1 I am using

df['kts'] = df.apply(lambda row: self.get_kts_code(row, kts_df), axis=1) where

    def get_kts_code(self, row, kts_df: DataFrame) -> str:
            """Get the KTS code of each miasto/wieś."""
            gmina_types = {
                '1': AdministrativeUnitType.URBAN_GMINA,
                '2': AdministrativeUnitType.RURAL_GMINA,
                '4': AdministrativeUnitType.MIXED_GMINA,
                '5': AdministrativeUnitType.RURAL_AREA,
                '8': AdministrativeUnitType.DISTRICT,
                '9': AdministrativeUnitType.DELEGATION,
            }
            nazwa_gminy = row['nazwa gminy']
            gmina_type = gmina_types[str(row['rodzaj gminy'])]
            teryt = kts_df['TERYT_code'].str.contains(str(row['gmina']))
            kts_code = kts_df.loc[
                (kts_df['name'] == nazwa_gminy) & (kts_df['type'] == gmina_type) & (teryt)
            ]
            kts_code = kts_code['KTS_code'].values[0]
            return kts_code

This code works well, but to process df1 with about 200k rows it takes about an hour, too slow. Probably there can be another way to quickly find correct kts_code from kts_df for each row of df1?

Into Numbers

I'm not sure if I've got your demand right, but you could try the following:

create a df with the corresponding gmina_type mapping
join kts_df with gmina_types_df to get the gmina IDs
join df1 with the enriched kts_df

Code Example:

gmina_types = {
    'id': [1, 2, 3, 4, 5],
    'type': ['URBAN_GMINA', 'RURAL_GMINA', 'MIXED_GMINA', 'RURAL_AREA', 'DISTRICT']
}

gmina_types_df = pd.DataFrame.from_dict(gmina_types)

kts_df = kts_df.join(gmina_types_df.set_index('type'), on='type')
df1 = df1.join(kts_df.set_index('id'), on='rodzaj gminy')

Collected from the Internet

Please contact [email protected] to delete if infringement.