How to extract records from a dictionary that has no commas and keys without values

Posted by debugcn in Dev

Peter

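This dictionary has a strange format.

When running nested for loops, the code breaks because some of the 'top' and 'rising' keys hold None.
The entries that actually contain usable data also carry some noise, such as the header text `query value` and row numbers like 0 1 2 3 that are not part of the data.
There are also no commas separating the rows.

So the goal is to convert the usable part of the data into a DataFrame.

Data: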

d = 

{1: {'abroad': {'top': None, 'rising': None}},
 2: {'house': {'top': None, 'rising': None}},
 3: {'school': {'top':                            query  value
   0     l    100
   1     x    100
   2     y     44
   3     j     31
   4     k      6, 'rising': None}},
 4: {'in_house': {'top':                            query  value
   0            a    100
   1            b     97
   2            c     32
   3            d     12,  'rising': None}},
 5: {'community': {'top': None, 'rising':      query  value
   0            s    100}},
 }
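The printout above is not valid Python on its own. A plausible reading, assumed here rather than confirmed by the question, is that every non-None leaf is a pandas DataFrame with the columns query and value, and the "missing commas" are simply the DataFrame's printed table layout. A minimal sketch that rebuilds the dictionary under that assumption, so the code below can be tried:

```python
import pandas as pd

# Assumption: each non-None leaf is a DataFrame with columns 'query' and 'value';
# the printout in the question is just the repr of these frames inside the dict.
d = {
    1: {'abroad': {'top': None, 'rising': None}},
    2: {'house': {'top': None, 'rising': None}},
    3: {'school': {'top': pd.DataFrame({'query': list('lxyjk'),
                                        'value': [100, 100, 44, 31, 6]}),
                   'rising': None}},
    4: {'in_house': {'top': pd.DataFrame({'query': list('abcd'),
                                          'value': [100, 97, 32, 12]}),
                     'rising': None}},
    5: {'community': {'top': None,
                      'rising': pd.DataFrame({'query': ['s'], 'value': [100]})}},
}

# Printing a DataFrame shows its table (index, column names, rows) with no
# commas, which matches how the dictionary looks when printed.
print(d[3]['school']['top'])
```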

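My code:

import pandas as pd

list_words = []

# Walk the outer keys (1, 2, ...) and the inner keywords ('abroad', 'house', ...)
for x in d:
    for a in d[x]:
        print(x, a)

        # Collect both the 'top' and the 'rising' entry, including the None ones
        for b in d[x][a].values():
            print(b)
            list_words.append(b)

data = pd.DataFrame(list_words)
data = data.dropna(how='all')
data = data.rename(columns={0: 'search'})
# str() of a DataFrame is its printed table, which contains no commas
data = data.search.astype(str)
data = data.reset_index()

# After reset_index() the text lives in the 'search' column, not in column 0
data = data['search'].str.split(",")

Desired output: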

search   score   status
l        100     top
x        100     top
y        44      top
j        31      top
k        6       top
a        100     top
b        97      top
c        32      top
d        12      top
s        100     rising

Quang Hoang

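IIUC, you can do this with concat:

import pandas as pd

# pd.DataFrame(None) is an empty frame, so the None entries add no rows
pd.concat(pd.DataFrame(v).assign(status=k) for y in d.values()
            for x in y.values() for k, v in x.items()
         )

Output: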

   status query  value
0     top     l  100.0
1     top     x  100.0
2     top     y   44.0
3     top     j   31.0
4     top     k    6.0
0     top     a  100.0
1     top     b   97.0
2     top     c   32.0
3     top     d   12.0
0  rising     s  100.0
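The output above keeps the original column names (status, query, value), and the value column comes out as float, likely because of the empty frames created from the None entries. A variant of the same concat idea that skips the None entries, renumbers the rows, and renames the columns to match the desired output might look like this sketch. It rebuilds d under the assumption that each non-None leaf is a DataFrame with columns query and value.

```python
import pandas as pd

# Assumed reconstruction of the question's data: non-None leaves are DataFrames.
d = {
    1: {'abroad': {'top': None, 'rising': None}},
    2: {'house': {'top': None, 'rising': None}},
    3: {'school': {'top': pd.DataFrame({'query': list('lxyjk'),
                                        'value': [100, 100, 44, 31, 6]}),
                   'rising': None}},
    4: {'in_house': {'top': pd.DataFrame({'query': list('abcd'),
                                          'value': [100, 97, 32, 12]}),
                     'rising': None}},
    5: {'community': {'top': None,
                      'rising': pd.DataFrame({'query': ['s'], 'value': [100]})}},
}

# Keep only real DataFrames and tag each one with its key ('top' or 'rising').
frames = [
    v.assign(status=k)
    for y in d.values()
    for x in y.values()
    for k, v in x.items()
    if v is not None
]

# Stack them with a fresh 0..n-1 index and use the desired column names.
result = (pd.concat(frames, ignore_index=True)
            .rename(columns={'query': 'search', 'value': 'score'}))
print(result)
```

Because no empty frames take part in the concat, the score column should keep the integer dtype of the source frames.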
