Large dataset - selecting specific rows after selecting columns

Posted by debugcn in Dev

黑金

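I am working with a fairly large dataset that contains many rows, often several consecutive ones, with similar names.

This is the code I have been using so far: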

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

df = pd.read_csv("dataset_20001_20180801113759.csv")
df = df.set_index(["Small Molecule HMS LINCS ID"])

Chosen_SmallMoleculeName="10104-101-1"
df2 = df.loc[Chosen_SmallMoleculeName, ["Cell count", "% Apoptotic cells"]]
df3 = df2.loc[Chosen_SmallMoleculeName, "Cell count"]

df4 = df.loc[Chosen_SmallMoleculeName, "Cell count"]
print("Cell count")
print(df4.values)

df5 = df.loc[Chosen_SmallMoleculeName, "% Apoptotic cells"]
print("% Apoptotic cells")
print(df5.values)

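With this, it prints the entire "Cell count" and "% Apoptotic cells" columns, which are too large to copy and paste here. From the picture above, I would like to get only the specific data in rows 2-7.

The dataset can be obtained from here: http://lincs.hms.harvard.edu/db/datasets/20001/results

Question 1: How do I select the specific data in rows 2 to 7 of "Cell count" and "% Apoptotic cells"?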

Question 2 (not as important, but I am wondering): Is it possible to do this "dynamically"? That is, instead of manually looking at each row to find the unique or related ones, is it possible to write code that prints rows 2-7 but can also work out on its own that it should pick, say, rows 14 to 19? I feel this would be delving into machine learning territory...

I have looked at the Python API and have not found a similar question.

Luca Cappelletti

To retrieve rows 2 to 7 you can use slicing. Subtract 1 to account for the header row and another 1 because indexing starts at 0, so rows 2-7 become positions 0-5:

result = df[:6][["Cell count", "% Apoptotic cells"]]

With the result being:

          Cell count       % Apoptotic cells
0         576              60.59
1         373              79.09
2         436              56.19
3         654              43.88
4         284              58.10
5         574              41.81
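
The same selection can also be written with explicit positional indexing via iloc, which makes the row arithmetic easier to reuse for other ranges such as rows 14 to 19. This is only a minimal sketch: the helper name spreadsheet_rows is hypothetical, and the small frame below is invented toy data standing in for the real CSV.

```python
import pandas as pd

# Toy stand-in for the real CSV; every value here is invented for illustration.
df = pd.DataFrame({
    "Small Molecule HMS LINCS ID": ["10104-101-1"] * 6 + ["10105-101-1"] * 4,
    "Cell count": [100, 110, 120, 130, 140, 150, 200, 210, 220, 230],
    "% Apoptotic cells": [10.0, 11.0, 12.0, 13.0, 14.0, 15.0, 20.0, 21.0, 22.0, 23.0],
})


def spreadsheet_rows(frame, first, last, columns):
    """Return rows first..last, counted 1-based with the header as row 1."""
    # Row 2 of the file is position 0 of the frame, so subtract 2 from `first`.
    # iloc excludes the stop position, so the stop is `last - 1`.
    return frame.iloc[first - 2 : last - 1][columns]


result = spreadsheet_rows(df, 2, 7, ["Cell count", "% Apoptotic cells"])
print(result)
```

Because iloc always counts positions, this keeps working even after set_index has replaced the default integer index with the molecule IDs.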

Now, if you explain more thoroughly which property of the dataset you are interested in extracting, we can help you with that as well.
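
As a starting point for Question 2, here is one possible reading: if "related rows" means rows that share the same value in the "Small Molecule HMS LINCS ID" column (an assumption, since the question does not define it), then no machine learning is needed and a groupby can find each block of rows. The sketch below uses invented toy data, not the real dataset.

```python
import pandas as pd

# Toy stand-in for the real CSV; every value here is invented for illustration.
df = pd.DataFrame({
    "Small Molecule HMS LINCS ID": ["10104-101-1"] * 6 + ["10105-101-1"] * 4,
    "Cell count": [100, 110, 120, 130, 140, 150, 200, 210, 220, 230],
    "% Apoptotic cells": [10.0, 11.0, 12.0, 13.0, 14.0, 15.0, 20.0, 21.0, 22.0, 23.0],
})

id_col = "Small Molecule HMS LINCS ID"
value_cols = ["Cell count", "% Apoptotic cells"]

# One sub-frame per distinct ID; sort=False keeps the order of first appearance.
blocks = {name: group[value_cols] for name, group in df.groupby(id_col, sort=False)}

for name, block in blocks.items():
    # With the default RangeIndex, the index labels equal the row positions.
    print(name, "spans positions", block.index[0], "to", block.index[-1])
    print(block)
```

Each entry of blocks can then be printed or plotted on its own, so the row ranges never have to be looked up by hand.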


Edited on 2021-07-21
