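I'm trying to chain rows of a DataFrame together to get every possible path formed by linking each receiver ID to the matching sender ID.
Here is a sample of my DataFrame: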
   transaction_id  sender_id  receiver_id  amount
0          213234        002          125      10
1          223322        017          354      90
2          343443        125          689      70
3          324433        689          233       5
4          328909        354          456      10
Created with:
import pandas as pd

df = pd.DataFrame(
{'transaction_id': {0: '213234', 1: '223322', 2: '343443', 3: '324433', 4: '328909'},
'sender_id': {0: '002', 1: '017', 2: '125', 3: '689', 4: '354'},
'receiver_id': {0: '125', 1: '354', 2: '689', 3: '233', 4: '456'},
'amount': {0: 10, 1: 90, 2: 70, 3: 5, 4: 10}}
)
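My code should produce a list of the chained IDs together with the total amount of the transaction chain. For the first two rows of the example above, something like: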
[('002', '125', '689', '233'), 85]
[('017', '354', '456'), 100]
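For reference, the expected result can be computed without any Node class by building a sender -> (receiver, amount) lookup and walking it from every sender that never appears as a receiver (a "head" of a chain). This is a minimal sketch, not the approach from the answer below; it assumes each sender_id sends at most once (no branching) and uses the example rows as plain tuples. The pandas call mentioned in the comment, `df[[...]].itertuples(index=False, name=None)`, is one assumed way to get those tuples from the real DataFrame.

```python
# Rows copied from the example DataFrame as (sender_id, receiver_id, amount).
# Assumption: with pandas, the same tuples could come from
# df[["sender_id", "receiver_id", "amount"]].itertuples(index=False, name=None)
rows = [
    ("002", "125", 10),
    ("017", "354", 90),
    ("125", "689", 70),
    ("689", "233", 5),
    ("354", "456", 10),
]


def find_chains(rows):
    """Follow receiver -> sender links, starting only from chain heads."""
    # Assumes each sender appears at most once; a later row with the same
    # sender would overwrite the earlier one in this dict.
    next_hop = {sender: (receiver, amount) for sender, receiver, amount in rows}
    receivers = {receiver for _, receiver, _ in rows}
    chains = []
    for sender, _, _ in rows:
        if sender in receivers:
            continue  # someone pays this sender, so it is not a chain head
        chain, total, seen, node = [sender], 0, {sender}, sender
        while node in next_hop:
            receiver, amount = next_hop[node]
            chain.append(receiver)
            total += amount
            if receiver in seen:  # guard against cycles
                break
            seen.add(receiver)
            node = receiver
        chains.append((tuple(chain), total))
    return chains


print(find_chains(rows))
# [(('002', '125', '689', '233'), 85), (('017', '354', '456'), 100)]
```

Starting only from heads is what guarantees that no returned chain is a sub-chain of another; a set of rows that forms a pure cycle has no head and therefore yields no chain.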
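I have tried iterating over the rows, turning each row into an instance of a Node class and then using a method that traverses a linked list, but I don't know what the next step is: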
class Node:
    def __init__(self, transaction_id, sender, receiver, amount):
        self.transac = transaction_id
        self.val = sender
        self.next = receiver  # still the receiver ID string, not a Node yet
        self.amount = amount

    def traverse(self):
        node = self  # start from the head node
        while node is not None:
            print(node.val)  # access the node value
            node = node.next  # move on to the next node

nodes = []
for index, row in customerTransactionSqlDf3.iterrows():
    nodes.append(Node(
        row["transaction_id"],
        row["sender_id"],
        row["receiver_id"],
        row["amount"]
    ))
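One possible next step, sketched under the assumption that each sender_id sends at most once: give every Node a `next` that starts as None, index the nodes by sender in a dict, and then point each node's `next` at the node whose sender equals its receiver. The dict lookup replaces a nested loop over all pairs. The attribute names `sender`/`receiver` and the generator-style `traverse` are illustrative choices, not the asker's code.

```python
class Node:
    def __init__(self, transaction_id, sender, receiver, amount):
        self.transac = transaction_id
        self.sender = sender
        self.receiver = receiver
        self.amount = amount
        self.next = None  # linked to another Node once all nodes exist

    def traverse(self):
        """Yield each node from self to the end of the chain."""
        node, seen = self, set()
        while node is not None and node not in seen:  # stop on cycles
            seen.add(node)
            yield node
            node = node.next


# (transaction_id, sender_id, receiver_id, amount) from the example DataFrame
rows = [
    ("213234", "002", "125", 10),
    ("223322", "017", "354", 90),
    ("343443", "125", "689", 70),
    ("324433", "689", "233", 5),
    ("328909", "354", "456", 10),
]
nodes = [Node(*row) for row in rows]

# Index nodes by sender so each receiver is matched with one dict lookup.
# Assumption: each sender_id appears at most once.
by_sender = {node.sender: node for node in nodes}
for node in nodes:
    node.next = by_sender.get(node.receiver)

head = nodes[0]
path = [head.sender] + [n.receiver for n in head.traverse()]
total = sum(n.amount for n in head.traverse())
print(path, total)  # ['002', '125', '689', '233'] 85
```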
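Additional information:
"I don't know what the next step is"
With your current implementation, you can connect two Node objects by iterating over every node. You can also add a visited attribute to the Node class so that, while traversing, you can identify unique chains, i.e. no chain is a sub-chain of another chain. However, if you want to know the chain for every sender_id, this may not be necessary.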
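As an alternative to a visited flag, unique chains can also be obtained after the fact: compute chains however you like, then drop every chain that appears as a contiguous run inside a longer one. This is a swapped-in post-filtering technique (quadratic in the number of chains), shown as a sketch; `is_subchain` and `unique_chains` are hypothetical helper names, and the input list was written out by hand from the example data.

```python
def is_subchain(short, long):
    """True if `short` occurs as a contiguous run inside `long`."""
    n = len(short)
    return any(long[i:i + n] == short for i in range(len(long) - n + 1))


def unique_chains(chains):
    """Keep only chains that are not a sub-chain of some longer chain."""
    return [
        c for c in chains
        if not any(len(o) > len(c) and is_subchain(c, o) for o in chains)
    ]


# One chain per sender_id for the example data (written out by hand)
all_chains = [
    ("002", "125", "689", "233"),
    ("017", "354", "456"),
    ("125", "689", "233"),
    ("689", "233"),
    ("354", "456"),
]
print(unique_chains(all_chains))
# [('002', '125', '689', '233'), ('017', '354', '456')]
```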
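Edit: I noticed that the expected results in your example are for the first two rows. That means each sender_id should have its own chain. I modified the traverse method so that it can be used once all the nodes are connected.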
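If you do want a chain for every sender_id (sub-chains included), a memoized recursion avoids re-walking shared suffixes: the chain starting at a sender is that sender followed by the chain starting at its receiver. This is a sketch assuming the links are acyclic and each sender sends at most once (a cycle would recurse until Python's recursion limit); `chain_per_sender` is a hypothetical name.

```python
def chain_per_sender(rows):
    """Map every sender_id to (chain, total) for the chain starting at it.

    Assumes the links are acyclic and each sender appears at most once.
    """
    next_hop = {sender: (receiver, amount) for sender, receiver, amount in rows}
    memo = {}

    def chain_from(node):
        if node in memo:
            return memo[node]
        if node not in next_hop:  # end of chain: a receiver that never sends
            return (node,), 0
        receiver, amount = next_hop[node]
        rest, rest_total = chain_from(receiver)
        memo[node] = ((node,) + rest, amount + rest_total)
        return memo[node]

    return {sender: chain_from(sender) for sender in next_hop}


# (sender_id, receiver_id, amount) from the example DataFrame
rows = [
    ("002", "125", 10),
    ("017", "354", 90),
    ("125", "689", 70),
    ("689", "233", 5),
    ("354", "456", 10),
]
for sender, (chain, total) in chain_per_sender(rows).items():
    print(sender, chain, total)
```

Each sender's result is computed once and reused, so sender 002 reuses the already-computed chain for 125, which in turn reuses the one for 689.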
Edit: reimplemented the visited attribute to get unique chains.
import pandas as pd

df = pd.DataFrame(
{'transaction_id': {0: '213234', 1: '223322', 2: '343443', 3: '324433', 4: '328909'},
'sender_id': {0: '002', 1: '017', 2: '125', 3: '689', 4: '354'},
'receiver_id': {0: '125', 1: '354', 2: '689', 3: '233', 4: '456'},
'amount': {0: 10, 1: 90, 2: 70, 3: 5, 4: 10}}
)
class Node:
    def __init__(self, transaction_id, sender_id, receiver_id, amount):
        self.transac = transaction_id
        self.sender = sender_id
        self.receiver = receiver_id
        self.next = None
        self.amount = amount
        self.visited = False

    def traverse(self, chain=None, total=0):
        if self.visited:  # skip nodes already covered by an earlier traversal
            return
        self.visited = True
        if chain is None:  # this is the beginning of the traversal
            chain = [self.sender]
        chain += [self.receiver]
        total += self.amount
        if self.next is not None:
            return self.next.traverse(chain, total)
        return chain, total

transc = [Node(
    row["transaction_id"],
    row["sender_id"],
    row["receiver_id"],
    row["amount"]
) for _, row in df.iterrows()]

# connect the nodes
for v in transc:
    for k in transc:
        # if v's receiver is k's sender, k comes after v in the chain
        if v.receiver == k.sender:
            v.next = k

summary = [i.traverse() for i in transc]
summary = [i for i in summary if i is not None]  # removing None
print(summary)
Output:
[
(['002', '125', '689', '233'], 85),
(['017', '354', '456'], 100)
]
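One caveat with the visited-based version: the result depends on row order. If the rows were in reverse order, tracing the code above by hand gives only the sub-chains [(['354', '456'], 10), (['689', '233'], 5)], because the longer chains run into already-visited nodes and return None. A sketch of an order-independent variant, assuming each sender sends at most once: link nodes through a sender dict, start traversal only from heads (senders that are never receivers), and walk iteratively instead of recursively.

```python
class Node:
    def __init__(self, transaction_id, sender_id, receiver_id, amount):
        self.transac = transaction_id
        self.sender = sender_id
        self.receiver = receiver_id
        self.amount = amount
        self.next = None

    def traverse(self):
        """Iteratively follow .next and return (chain, total)."""
        chain, total, node, seen = [self.sender], 0, self, set()
        while node is not None and node not in seen:  # stop on cycles
            seen.add(node)
            chain.append(node.receiver)
            total += node.amount
            node = node.next
        return chain, total


# The example rows in reversed order, to show the result no longer depends on it
rows = [
    ("328909", "354", "456", 10),
    ("324433", "689", "233", 5),
    ("343443", "125", "689", 70),
    ("223322", "017", "354", 90),
    ("213234", "002", "125", 10),
]
transc = [Node(*row) for row in rows]
by_sender = {node.sender: node for node in transc}
for node in transc:
    node.next = by_sender.get(node.receiver)

# Heads: nodes whose sender never appears as anyone's receiver
receivers = {node.receiver for node in transc}
summary = [node.traverse() for node in transc if node.sender not in receivers]
print(summary)
# [(['017', '354', '456'], 100), (['002', '125', '689', '233'], 85)]
```

The iterative walk also sidesteps Python's default recursion limit, which the recursive traverse could hit on very long chains.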