Breaking a large dataset into an organized index

Username

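I am trying to create a dictionary indexed by shape_id from a dataset I have (see below). I realize I could use loops (and I have tried that), but I have a hunch that pandas has a way to do this that is not computationally expensive.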

Possible solutions: groupby, str.findall, str.extract

The dictionary should be structured like this:

{shape_id: [shape_pt_sequence, [shape_pt_lat,shape_pt_lon]]}

Here is my code so far:

import pandas as pd

# readability assignments for shapes.csv
shapes = pd.read_csv('csv/shapes.csv')
shapes_shape_id = shapes['shape_id']
shapes_shape_id_index = list(set(shapes_shape_id))
shapes_shape_pt_sequence = shapes['shape_pt_sequence']
shapes_shape_pt_lat = shapes['shape_pt_lat']
shapes_shape_pt_lon = shapes['shape_pt_lon']

shapes_tuple = []

# add shape index to final dict
for i in range(len(shapes_shape_id_index)):
    shapes_tuple.append([shapes_shape_id_index[i]])

print(shapes_tuple)

Here is a link to the shapes.csv gist.

This is the empty shape_id index:

[[20992], [20993], [20994], [20995], [20996], [20997], [20998], [20999], [21000], [21001], [21002], [21003], [21004], [21005], [21006], [21007], [21008], [21009], [21010], [21011], [21012], [21013], [21014], [21015], [21016], [21017], [21018], [21019], [21020], [21021], [21022], [21023], [21026], [21027], [21028], [21029], [21030], [21031], [21032], [21033], [21034], [21035], [21036], [21037], [21038], [21039], [21040], [21041], [21042], [21043], [21044], [21045], [21046], [21047], [21048], [21049], [21050], [21051], [21052], [21053], [21054], [21055], [21056], [21057], [21058], [21059], [21060], [21061], [21062], [21063], [21064], [21065], [21066], [21067], [21068], [21069], [21070], [21071], [21072], [21073], [21074], [21075], [21076], [21077], [21078], [21079], [21080], [21081], [21082], [21083], [21084], [21085], [21086], [21087], [21088], [21089], [20958], [20959], [20960], [20961], [20962], [20963], [20964], [20965], [20966], [20967], [20968], [20969], [20970], [20971], [20972], [20973], [20974], [20975], [20976], [20977], [20978], [20979], [20980], [20981], [20982], [20983], [20984], [20985], [20986], [20987], [20988], [20989], [20990], [20991]]

shapes.csv looks like this:

shape_id,shape_pt_lat,shape_pt_lon,shape_pt_sequence,is_stop
20958,44.0577683,-123.0873313,1,0
20958,44.0577163,-123.087073,2,0
20958,44.0576286,-123.0867103,3,0
20958,44.0574258,-123.086641,4,0
20958,44.0571421,-123.0866518,5,0
20958,44.0568706,-123.086653,6,0
20958,44.0566161,-123.0867028,7,0
20958,44.0565641,-123.0869733,8,0
20958,44.0565503,-123.0872603,9,0
20958,44.0565536,-123.087631,10,0
20958,44.0565439,-123.0879283,11,0
20958,44.0564661,-123.087894,12,0
20958,44.0565124,-123.0881793,13,0
20958,44.0565181,-123.0884921,14,0
20958,44.0565331,-123.0888668,15,0
20958,44.0565406,-123.0892323,16,0
20958,44.0565406,-123.0896295,17,0
20958,44.0563515,-123.0897096,18,0
20958,44.056073,-123.0897108,19,0
20958,44.0558501,-123.0897,20,0
20958,44.0558358,-123.0897016,21,0
20958,44.0556489,-123.0896861,22,0
20958,44.0554398,-123.0896781,23,0
20958,44.0552033,-123.0896776,24,0
20958,44.0549253,-123.089692,25,0
20958,44.0546778,-123.0897281,26,0
20958,44.0546578,-123.0897326,27,0
20958,44.0546338,-123.0896965,28,0
20958,44.0543988,-123.0896838,29,0
20958,44.0543536,-123.0899543,30,0
20958,44.0543628,-123.0903496,31,0
20958,44.0543668,-123.0906733,32,0
20958,44.0543718,-123.0910178,33,0

For example, in shapes.csv, shape 20958 has a maximum shape_pt_sequence of 72, shape 20960 has a maximum shape_pt_sequence of 400, and so on.

Y

I don't know why you need a structure like [shape_id:[shape_pt_sequence, [shape_pt_lat,shape_pt_lon]]]; it is not very useful for selecting data. You can use a MultiIndex instead:

shapes = pd.read_csv('shapes.csv')
shapes.set_index(["shape_id", "shape_pt_sequence"], inplace=True)

Then select all the data for 20958:

print(shapes.loc[20958])

Select a single point:

print(shapes.loc[20958, 45])

Select a range of shape_pt_sequence values for 20958:

print(shapes.loc[(20958, slice(45, 48)), :])

Select the data whose shape_pt_sequence is in [45, 48]:

print(shapes.loc[(20958, [45, 48]), :])
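
The selections above can be tried without the CSV file. The following is a minimal sketch, not part of the original answer: the sample rows are made up for illustration rather than taken from shapes.csv, and the call to sort_index() is an added assumption, since slicing a MultiIndex with slice(...) generally needs a sorted index.

```python
import pandas as pd

# Hypothetical sample rows; real data would come from pd.read_csv('shapes.csv').
shapes = pd.DataFrame({
    "shape_id": [20958, 20958, 20958, 20958, 20960, 20960],
    "shape_pt_sequence": [45, 46, 47, 48, 1, 2],
    "shape_pt_lat": [44.01, 44.02, 44.03, 44.04, 44.10, 44.11],
    "shape_pt_lon": [-123.01, -123.02, -123.03, -123.04, -123.10, -123.11],
})

# Index by (shape_id, shape_pt_sequence) and sort so that label slicing works.
indexed = shapes.set_index(["shape_id", "shape_pt_sequence"]).sort_index()

all_points = indexed.loc[20958]                    # every row of shape 20958
one_point = indexed.loc[(20958, 45)]               # a single point, as a Series
in_range = indexed.loc[(20958, slice(45, 47)), :]  # sequences 45 to 47, inclusive
picked = indexed.loc[(20958, [45, 48]), :]         # only sequences 45 and 48

print(in_range)
```

Label slices in .loc include both endpoints, so slice(45, 47) returns three rows here.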

If you really do need that form, use the following code:

shapes = pd.read_csv('shapes.csv')

def f(df):
    return [df.shape_pt_sequence.tolist(), [df.shape_pt_lat.tolist(), df.shape_pt_lon.tolist()]]

res = shapes.groupby("shape_id").apply(f).to_dict()
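
Note that f returns, for each shape_id, one list of sequence numbers followed by separate lists of latitudes and longitudes. If a per-point layout is preferred, one possible variant is sketched below. The sample rows and the helper name points_by_shape are assumptions made for illustration, and sorting by shape_pt_sequence is an added step to keep the points in order.

```python
import pandas as pd

# Hypothetical sample rows; real data would come from pd.read_csv('shapes.csv').
shapes = pd.DataFrame({
    "shape_id": [20958, 20958, 20960],
    "shape_pt_sequence": [2, 1, 1],
    "shape_pt_lat": [44.02, 44.01, 44.10],
    "shape_pt_lon": [-123.02, -123.01, -123.10],
})

def points_by_shape(df):
    """Return {shape_id: [[sequence, [lat, lon]], ...]} ordered by sequence."""
    ordered = df.sort_values(["shape_id", "shape_pt_sequence"])
    result = {}
    for shape_id, group in ordered.groupby("shape_id"):
        # tolist() converts NumPy scalars to plain Python ints and floats.
        result[int(shape_id)] = [
            [seq, [lat, lon]]
            for seq, lat, lon in zip(
                group["shape_pt_sequence"].tolist(),
                group["shape_pt_lat"].tolist(),
                group["shape_pt_lon"].tolist(),
            )
        ]
    return result

res = points_by_shape(shapes)
print(res[20958])
```

This loops over groups rather than rows, so the Python-level loop runs once per shape_id.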
