Calculate the differences between all possible pairs of rows

Posted by Georges in Dev

Georges

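Based on a selection ds of the dataframe d, with:

ds = pd.DataFrame({'x': d.x, 'y': d.y, 'a': d.a, 'b': d.b, 'c': d.c, 'row': d.n})

It has n rows, numbered 0 to n-1. The column n is required because ds is a selection, and the original index has to be kept for later queries.

How can you efficiently compute, for each of the columns a, b, c, the difference between every pair of rows (e.g. a_0 - a_1, etc.) without losing the row information (e.g. a new column holding the indices of the rows used)?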

Sample selection ds:

         x           y     a     b     c    n
554.607085  400.971878  9789  4151  6837  146
512.231450  405.469524  8796  3811  6596  225
570.427284  694.369140  1608  2019  2097  291

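Desired output:

dist: the Euclidean distance, math.hypot(x2 - x1, y2 - y1)

da, db, dc: the absolute differences, e.g. for da: np.abs(a1 - a2)

ns: a string combining the n values of the two rows

The result would look like this: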

              dist    da    db    dc       ns
 42.61365102824963   993   340   241  146-225
293.82347069813255  8181  2132  4740  146-291
                ..    ..    ..    ..  225-291
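To make the specification above concrete, here is a minimal plain-Python sketch that computes exactly those fields for every pair of sample rows: math.hypot for dist, abs() for da, db, dc, and the two n values joined for ns. The list-of-dicts layout and the helper name pair_diffs are hypothetical, chosen only for illustration; on a real DataFrame a vectorized approach will be much faster.

```python
import math
from itertools import combinations

# Sample rows from the question; the list-of-dicts layout is an assumption.
rows = [
    {"x": 554.607085, "y": 400.971878, "a": 9789, "b": 4151, "c": 6837, "n": 146},
    {"x": 512.231450, "y": 405.469524, "a": 8796, "b": 3811, "c": 6596, "n": 225},
    {"x": 570.427284, "y": 694.369140, "a": 1608, "b": 2019, "c": 2097, "n": 291},
]


def pair_diffs(rows):
    """Hypothetical helper: one result dict per unordered pair of rows."""
    out = []
    for r1, r2 in combinations(rows, 2):
        out.append({
            # Euclidean distance between the two (x, y) points
            "dist": math.hypot(r2["x"] - r1["x"], r2["y"] - r1["y"]),
            # Absolute per-column differences
            "da": abs(r1["a"] - r2["a"]),
            "db": abs(r1["b"] - r2["b"]),
            "dc": abs(r1["c"] - r2["c"]),
            # Identifiers of both rows, e.g. "146-225"
            "ns": f'{r1["n"]}-{r2["n"]}',
        })
    return out


for rec in pair_diffs(rows):
    print(rec)
```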

Henry

You can use itertools.combinations() to generate the pairs.
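As a quick illustration of what combinations() produces: it yields each unordered pair of positions exactly once, in lexicographic order, and never pairs a row with itself. The value n = 1000 below is only an example, used to show that the number of pairs, n * (n - 1) / 2, grows quadratically with the row count.

```python
from itertools import combinations

# Every (i, j) with i < j for 3 rows, in lexicographic order
pairs = list(combinations(range(3), 2))
print(pairs)  # [(0, 1), (0, 2), (1, 2)]

# For n rows there are n * (n - 1) / 2 pairs (example n only)
n = 1000
num_pairs = sum(1 for _ in combinations(range(n), 2))
print(num_pairs)  # 499500
```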

First, read the data:

import pandas as pd
from io import StringIO
import numpy as np

text = """             x           y      a     b      c     n
    554.607085  400.971878   9789  4151   6837   146
    512.231450  405.469524   8796  3811   6596   225
    570.427284  694.369140   1608  2019   2097   291"""

# sep=r"\s+" splits on any run of whitespace (delim_whitespace=True is deprecated since pandas 2.2)
df = pd.read_csv(StringIO(text), sep=r"\s+")

Create the index pairs and compute the result:

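from itertools import combinations

# All (i, j) position pairs with i < j, as a 2-column array
index = np.array(list(combinations(range(df.shape[0]), 2)))

# Rows for the first and second element of each pair, re-indexed 0..k-1 so they align
df1, df2 = [df.iloc[idx].reset_index(drop=True) for idx in index.T]

res = pd.concat([
    np.hypot(df1.x - df2.x, df1.y - df2.y),               # Euclidean distance
    (df1[["a", "b", "c"]] - df2[["a", "b", "c"]]).abs(),  # absolute differences
    df1.n.astype(str) + "-" + df2.n.astype(str)           # row identifiers, e.g. "146-225"
], axis=1)

res.columns = ["dist", "da", "db", "dc", "ns"]
print(res)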

Output:

         dist    da    db    dc       ns
0   42.613651   993   340   241  146-225
1  293.823471  8181  2132  4740  146-291
2  294.702805  7188  1792  4499  225-291
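If building the pairs as a Python list becomes a bottleneck for larger selections, one possible alternative (a swapped-in technique, not part of the answer above) is np.triu_indices(n, k=1). It returns the same i < j position pairs as combinations(range(n), 2), in the same order, but directly as NumPy arrays. The sketch below assumes the same sample data and column names, and takes the absolute value of the differences as the question asks.

```python
from io import StringIO

import numpy as np
import pandas as pd

text = """x y a b c n
554.607085 400.971878 9789 4151 6837 146
512.231450 405.469524 8796 3811 6596 225
570.427284 694.369140 1608 2019 2097 291"""
df = pd.read_csv(StringIO(text), sep=r"\s+")

# Positions strictly above the diagonal of an n x n matrix:
# for n = 3 this gives i = [0, 0, 1] and j = [1, 2, 2].
i, j = np.triu_indices(len(df), k=1)

# Pull plain NumPy arrays once, then gather both rows of each pair by position.
x = df["x"].to_numpy()
y = df["y"].to_numpy()
n = df["n"].to_numpy()
abc = df[["a", "b", "c"]].to_numpy()
diffs = np.abs(abc[i] - abc[j])  # shape (num_pairs, 3)

res = pd.DataFrame({
    "dist": np.hypot(x[i] - x[j], y[i] - y[j]),
    "da": diffs[:, 0],
    "db": diffs[:, 1],
    "dc": diffs[:, 2],
    # Keep the original identifiers of both rows, e.g. "146-225".
    "ns": [f"{p}-{q}" for p, q in zip(n[i], n[j])],
})
print(res)
```

Both versions materialize n * (n - 1) / 2 result rows, so memory use grows quadratically with the size of the selection.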
