I need to plot some data on a geographic map. Specifically, I want to highlight the countries and states the data comes from. My dataset is:
     Year Country      State/City
0    2009     BGR           Sofia
1    2018     BHS  New Providence
2    2002     BLZ             NaN
3    2000     CAN      California
4    2002     CAN         Ontario
...   ...     ...             ...
250  2001     USA            Ohio
251  1998     USA        New York
252  1995     USA        Virginia
253  2011     USA             NaN
254  2019     USA        New York
To create the geographic plot, I have been using geopandas as follows:
import geopandas as gpd
shapefile = 'path/ne_110m_admin_0_countries/ne_110m_admin_0_countries.shp'
gdf = gpd.read_file(shapefile)[['ADMIN', 'ADM0_A3', 'geometry']]
gdf.columns = ['country', 'country_code', 'geometry']
Then I merged the two datasets:
merged = gdf.merge(df, left_on = 'country_code', right_on = 'Country')
and converted the data to JSON:
import json
merged_json = json.loads(merged.to_json())
#Convert to String like object.
json_data = json.dumps(merged_json)
Finally, I tried to create the plot as follows:
from bokeh.io import output_notebook, show, output_file
from bokeh.plotting import figure
from bokeh.models import GeoJSONDataSource, LinearColorMapper, ColorBar
from bokeh.palettes import brewer
geosource = GeoJSONDataSource(geojson = json_data)
#Define a sequential multi-hue color palette.
palette = brewer['YlGnBu'][8]
palette = palette[::-1]
color_mapper = LinearColorMapper(palette = palette, low = 0, high = 40)
tick_labels = {'0': '0%', '5': '5%', '10':'10%', '15':'15%', '20':'20%', '25':'25%', '30':'30%','35':'35%', '40': '>40%'}
color_bar = ColorBar(color_mapper=color_mapper, label_standoff=8,width = 500, height = 20,
border_line_color=None,location = (0,0), orientation = 'horizontal', major_label_overrides = tick_labels)
p = figure(title = 'Creation year across countries', plot_height = 600 , plot_width = 950, toolbar_location = None)
p.xgrid.grid_line_color = None
p.ygrid.grid_line_color = None
p.patches('xs','ys', source = geosource,fill_color = {'field' :'per_cent_year', 'transform' : color_mapper},
line_color = 'black', line_width = 0.25, fill_alpha = 1)
p.add_layout(color_bar, 'below')
output_notebook()
#Display figure.
show(p)
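When I run it, it says BokehJS 1.0.2 successfully loaded, but nothing is displayed. My expected output would be one map colored by how often a country occurs (e.g., USA = 5 would be darker), and another colored by state/city (New York would be darker).
Is there anything wrong with the code above?
(Happy to share more data/info if needed)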
From the code you posted, I can't see any problem with the plotting itself, so I think the issue may lie in your data aggregation or merging.
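One quick way to narrow this down is to check whether the column passed to `fill_color` actually made it into the GeoJSON properties. The sketch below assumes the string has the FeatureCollection layout that `GeoDataFrame.to_json()` produces; the helper name `features_missing_field` and the sample features are hypothetical and only stand in for the real merged data.

```python
import json

def features_missing_field(geojson_str, field):
    """Count features whose properties lack `field` or hold null for it."""
    features = json.loads(geojson_str)["features"]
    return sum(
        1 for feature in features
        if (feature.get("properties") or {}).get(field) is None
    )

# Tiny hand-written stand-in for merged.to_json(); geometry omitted for brevity.
sample = json.dumps({
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "geometry": None,
         "properties": {"country_code": "USA", "Year": 2001}},
        {"type": "Feature", "geometry": None,
         "properties": {"country_code": "CAN", "Year": 2002}},
    ],
})

missing_plotted = features_missing_field(sample, "per_cent_year")
missing_year = features_missing_field(sample, "Year")
```

If the count for the plotted field equals the number of features, that column never reached the GeoJSON, which points back at the aggregation or merge step rather than at the plotting code.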
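Here is a solution that first generates data that should resemble yours, and then counts how often each country appears in the data as a percentage of the dataset size, since that is the metric needed. We will focus on just a few countries as an example: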
from random import choices
import pandas as pd
import numpy as np
def generate_data():
    k = 100
    countries_of_interest = ['USA','ARG','BRA','GBR','ESP','RUS']
    countries = choices(countries_of_interest, k=k)
    start_yr = 2010
    end_yr = 2021
    return pd.DataFrame({'Country':countries,
                         'Year':np.random.randint(start_yr, end_yr, k)},
                        index=range(len(countries)))
def aggregate_data(df):
    data = df.groupby('Country').agg('count')*100.0/len(df)
    data = data.reset_index().rename(columns={'Year':'proportion_of_dataset'})
    return data
df = generate_data()
# Country Year
# 0 USA 2017
# 1 GBR 2014
# 2 USA 2013
# 3 BRA 2016
# 4 BRA 2018
# .. ... ...
# 95 ESP 2014
# 96 USA 2015
# 97 RUS 2019
# 98 RUS 2012
# 99 RUS 2011
#
# [100 rows x 2 columns]
data = aggregate_data(df)
# Country proportion_of_dataset
# 0 ARG 20.0
# 1 BRA 17.0
# 2 ESP 14.0
# 3 GBR 14.0
# 4 RUS 19.0
# 5 USA 16.0
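The same percentages can also be obtained with `value_counts(normalize=True)`, which may be easier to adapt to other columns such as State/City. This is only an alternative sketch, not the method used in this answer; the function name `aggregate_with_value_counts` and the toy frame are made up for illustration.

```python
import pandas as pd

def aggregate_with_value_counts(df):
    """Percentage of rows per country, sorted by country code."""
    shares = df['Country'].value_counts(normalize=True).mul(100.0).sort_index()
    return shares.rename_axis('Country').reset_index(name='proportion_of_dataset')

# Toy data: three USA rows and one GBR row.
toy = pd.DataFrame({'Country': ['USA', 'USA', 'GBR', 'USA'],
                    'Year': [2011, 2012, 2013, 2014]})
result = aggregate_with_value_counts(toy)
shares_by_country = dict(zip(result['Country'], result['proportion_of_dataset']))
```

Unlike the `groupby(...).agg('count')` version, this counts rows directly, so it does not depend on which other columns are present in the frame.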
Now load the country-boundary shapefile with geopandas and rename the columns:
import geopandas as gpd
shapefile = 'path_to_shapfile_folder/ne_110m_admin_0_countries/ne_110m_admin_0_countries.shp'
gdf = gpd.read_file(shapefile)[['ADMIN', 'ADM0_A3', 'geometry']]
gdf.columns = ['country', 'country_code', 'geometry']
gdf.head()
# country country_code \
# 0 Fiji FJI
# 1 United Republic of Tanzania TZA
# 2 Western Sahara SAH
# 3 Canada CAN
# 4 United States of America USA
#
# geometry
# 0 MULTIPOLYGON (((180.00000 -16.06713, 180.00000...
# 1 POLYGON ((33.90371 -0.95000, 34.07262 -1.05982...
# 2 POLYGON ((-8.66559 27.65643, -8.66512 27.58948...
# 3 MULTIPOLYGON (((-122.84000 49.00000, -122.9742...
# 4 MULTIPOLYGON (((-122.84000 49.00000, -120.0000...
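Now we want to merge the country polygon dataframe with our aggregated data. Note: we do a left join (keeping every row of the country polygon dataframe) so that all countries are included, even those we have no data for. Also note that we fill the resulting NaN values with zero for those countries: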
merged = gdf.merge(data, left_on = 'country_code', right_on = 'Country', how='left')
merged['proportion_of_dataset'] = merged['proportion_of_dataset'].fillna(0)
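To see what the left join does, and to spot country codes that fail to match, a small pandas-only sketch can help. Plain DataFrames stand in for the GeoDataFrame here (the merge should behave the same way), and the codes, including the deliberately unmatched `'XXX'`, are invented for illustration.

```python
import pandas as pd

# Stand-ins for the country polygons and the aggregated data.
shapes = pd.DataFrame({'country_code': ['USA', 'CAN', 'FJI']})
data = pd.DataFrame({'Country': ['USA', 'XXX'],
                     'proportion_of_dataset': [60.0, 40.0]})

# indicator=True adds a '_merge' column saying where each row came from.
merged = shapes.merge(data, left_on='country_code', right_on='Country',
                      how='left', indicator=True)
merged['proportion_of_dataset'] = merged['proportion_of_dataset'].fillna(0)

# Shapes that received no data, and dataset codes that matched no shape.
without_data = merged.loc[merged['_merge'] == 'left_only', 'country_code'].tolist()
unmatched_codes = sorted(set(data['Country']) - set(shapes['country_code']))
```

Every shape survives the join, while a dataset code with no matching shape is silently dropped, so checking `unmatched_codes` is a cheap way to catch mismatched country codes.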
Create the GeoJSON using your code:
import json
merged_json = json.loads(merged.to_json())
json_data = json.dumps(merged_json)
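Because `GeoDataFrame.to_json()` already returns a string, the `json.loads` / `json.dumps` round trip should leave the content unchanged, so passing `merged.to_json()` straight to the data source would likely work as well. A stdlib-only sketch of that equivalence, with a hand-written GeoJSON string standing in for the real output:

```python
import json

# Hand-written stand-in for the string returned by merged.to_json().
geojson_str = json.dumps({
    "type": "FeatureCollection",
    "features": [{
        "type": "Feature",
        "properties": {"country_code": "USA", "proportion_of_dataset": 16.0},
        "geometry": {"type": "Polygon",
                     "coordinates": [[[0, 0], [1, 0], [1, 1], [0, 0]]]},
    }],
})

# The same two steps as above: parse, then serialize again.
round_tripped = json.dumps(json.loads(geojson_str))
same_content = json.loads(round_tripped) == json.loads(geojson_str)
```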
Finally, we put the plotting code into a function and pass the GeoJSON, the column to plot, and the plot title as arguments:
from bokeh.io import output_notebook, show, output_file
from bokeh.plotting import figure
from bokeh.models import GeoJSONDataSource, LinearColorMapper, ColorBar
from bokeh.palettes import brewer
def plot_map(json_data,plot_col,title):
    geosource = GeoJSONDataSource(geojson = json_data)
    #Define a sequential multi-hue color palette.
    palette = brewer['YlGnBu'][8]
    palette = palette[::-1]
    color_mapper = LinearColorMapper(palette = palette, low = 0, high = 40)
    tick_labels = {'0': '0%', '5': '5%', '10':'10%', '15':'15%', '20':'20%', '25':'25%', '30':'30%','35':'35%', '40': '>40%'}
    color_bar = ColorBar(color_mapper=color_mapper, label_standoff=8,width = 500, height = 20,
                         border_line_color=None,location = (0,0), orientation = 'horizontal', major_label_overrides = tick_labels)
    p = figure(title = title, plot_height = 600 , plot_width = 950, toolbar_location = None)
    p.xgrid.grid_line_color = None
    p.ygrid.grid_line_color = None
    p.patches('xs','ys', source = geosource,fill_color = {'field' :plot_col, 'transform' : color_mapper},
              line_color = 'black', line_width = 0.25, fill_alpha = 1)
    p.add_layout(color_bar, 'below')
    output_notebook()
    #Display figure.
    show(p)
Now all we have to do is call the plotting function with the required arguments:
plot_map(json_data,'proportion_of_dataset','Dataset countries of origin')
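In the sample output above the largest proportion is 20%, while the color mapper is fixed at `high = 40`, so only the lower half of the palette ends up being used. One possible refinement is to derive the range and tick labels from the data. The helper `color_range` below is a hypothetical sketch, not part of Bokeh; its results would be passed as `low`, `high` and `major_label_overrides`.

```python
import math

def color_range(values, step=5):
    """Return (low, high, tick_labels) with high rounded up to a multiple of step."""
    high = max(step, math.ceil(max(values) / step) * step)
    # String keys mirror the tick_labels dict used in plot_map.
    tick_labels = {str(t): f'{t}%' for t in range(0, high + 1, step)}
    return 0, high, tick_labels

# Proportions taken from the aggregated sample output.
low, high, tick_labels = color_range([20.0, 17.0, 14.0, 14.0, 19.0, 16.0])
```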