Grouping a pandas DataFrame by a position index stored in a column


Green

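I have a pandas DataFrame that represents a list of sentences. Each row is one word, and its ID is the word's position within its sentence.
It looks like this: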

       ID        FORM 
  0    1           A   
  1    2        word   
  2    3          in   
  3    4         the   
  4    5       first   
  5    6    sentence   
  6    7           .   
  7    1         The   
  8    2      second   
  9    3    sentence   
  10   4           .   
  11   1         the   
  12   2       third   
  13   3    sentence     
        ...

How can I add an extra column named "Sentence" that holds the number of the sentence each word belongs to, so that my DataFrame looks like this:

        ID        FORM  Sentence  
  0    1           A    1
  1    2        word    1
  2    3          in    1
  3    4         the    1
  4    5       first    1
  5    6    sentence    1
  6    7           .    1
  7    1         The    2
  8    2      second    2
  9    3    sentence    2
  10   4           .    2
  11   1         the    3
  12   2       third    3
  13   3    sentence    3

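I can do this by iterating over the DataFrame and building a Series by hand, but that looks ugly and is not very idiomatic pandas. Is there a clean way to have pandas do this for me?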
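For reference, the manual approach described above might look roughly like the sketch below. It is not the asker's actual code: the small sample frame, the `labels` list and the `current` counter are hypothetical names chosen for illustration, and the loop assumes every sentence starts with ID 1.

```python
import pandas as pd

# Hypothetical miniature version of the data in the question.
df = pd.DataFrame({
    "ID": [1, 2, 3, 1, 2],
    "FORM": ["A", "word", ".", "The", "end"],
})

labels = []   # sentence number for each row, built by hand
current = 0   # number of the sentence currently being read

for word_id in df["ID"]:
    # Assumption: a new sentence begins whenever the position resets to 1.
    if word_id == 1:
        current += 1
    labels.append(current)

# Align the hand-built labels with the frame's own index.
df["Sentence"] = pd.Series(labels, index=df.index)
print(df)
```

This works, but it walks the column row by row in Python, which is the part a vectorised pandas expression can avoid.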

皮特巴格

Try this:

df['Sentence']=(df['ID'].diff()<0).cumsum()
df

which produces

      ID  FORM        Sentence
--  ----  --------  ----------
 0     1  A                  0
 1     2  word               0
 2     3  in                 0
 3     4  the                0
 4     5  first              0
 5     6  sentence           0
 6     7  .                  0
 7     1  The                1
 8     2  second             1
 9     3  sentence           1
10     4  .                  1
11     1  the                2
12     2  third              2
13     3  sentence           2

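Here (df['ID'].diff()<0) is a boolean Series that is True wherever ID decreases from one row to the next, that is, at the first word of each new sentence. .cumsum() then adds 1 each time that happens. The first row's diff is NaN, which compares as False, so the numbering starts at 0; add 1 to the result if you want the sentences numbered from 1 as in the question.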
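The explanation above can be checked end to end with a minimal, self-contained sketch. The frame is rebuilt from the rows shown in the question; the `starts` and `sentences` variables and the final `groupby` step are illustrative additions, not part of the original answer.

```python
import pandas as pd

# Rebuild the example frame from the question.
df = pd.DataFrame({
    "ID": [1, 2, 3, 4, 5, 6, 7, 1, 2, 3, 4, 1, 2, 3],
    "FORM": ["A", "word", "in", "the", "first", "sentence", ".",
             "The", "second", "sentence", ".",
             "the", "third", "sentence"],
})

# True wherever the position drops, i.e. at the first word of a new sentence.
# The first row's diff is NaN, and NaN < 0 evaluates to False.
starts = df["ID"].diff() < 0

# The running count of True values labels the sentences 0, 1, 2, ...
# Adding 1 gives the 1-based numbering the question asks for.
df["Sentence"] = starts.cumsum() + 1

# With the label in place, the words can be grouped per sentence.
sentences = df.groupby("Sentence")["FORM"].apply(" ".join)
print(df)
print(sentences)
```

One caveat: diff() < 0 only detects a boundary when the position actually decreases, so two consecutive one-word sentences (ID 1 followed by ID 1) would be merged. If every sentence is known to start at ID 1, (df["ID"] == 1).cumsum() is an alternative that handles that case and is already 1-based.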
