I have two dataframes: df1.head():
nazwa województwa gmina nazwa gminy rodzaj gminy
0 Zachodniopomorskie 320101 Białogard 1
1 Zachodniopomorskie 320101 Białogard 1
2 Zachodniopomorskie 320101 Białogard 1
3 Zachodniopomorskie 320101 Białogard 1
4 Zachodniopomorskie 320101 Białogard 1
and kts_df.head():
name type KTS_code TERYT_code
0 Polska COUNTRY 10000000000000 None
1 Bochnia RURAL_GMINA 10011212001022 1201022
2 Drwinia RURAL_GMINA 10011212001032 1201032
3 Iwanowice RURAL_GMINA 10011212006032 1206032
4 Lipnica Murowana RURAL_GMINA 10011212001042 1201042
Currently to add new column to df1 I am using
df['kts'] = df.apply(lambda row: self.get_kts_code(row, kts_df), axis=1)
where
def get_kts_code(self, row, kts_df: DataFrame) -> str:
"""Get the KTS code of each miasto/wieś."""
gmina_types = {
'1': AdministrativeUnitType.URBAN_GMINA,
'2': AdministrativeUnitType.RURAL_GMINA,
'4': AdministrativeUnitType.MIXED_GMINA,
'5': AdministrativeUnitType.RURAL_AREA,
'8': AdministrativeUnitType.DISTRICT,
'9': AdministrativeUnitType.DELEGATION,
}
nazwa_gminy = row['nazwa gminy']
gmina_type = gmina_types[str(row['rodzaj gminy'])]
teryt = kts_df['TERYT_code'].str.contains(str(row['gmina']))
kts_code = kts_df.loc[
(kts_df['name'] == nazwa_gminy) & (kts_df['type'] == gmina_type) & (teryt)
]
kts_code = kts_code['KTS_code'].values[0]
return kts_code
This code works well, but to process df1 with about 200k rows it takes about an hour, too slow. Probably there can be another way to quickly find correct kts_code from kts_df for each row of df1?
I'm not sure if I've got your demand right, but you could try the following:
Code Example:
gmina_types = {
'id': [1, 2, 3, 4, 5],
'type': ['URBAN_GMINA', 'RURAL_GMINA', 'MIXED_GMINA', 'RURAL_AREA', 'DISTRICT']
}
gmina_types_df = pd.DataFrame.from_dict(gmina_types)
kts_df = kts_df.join(gmina_types_df.set_index('type'), on='type')
df1 = df1.join(kts_df.set_index('id'), on='rodzaj gminy')
Collected from the Internet
Please contact [email protected] to delete if infringement.
Comments