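I have counts.txt files in 50 folders, each folder corresponding to one sample. Each counts.txt has two columns: the first is a string and the second is a number. I am trying to build a nested dictionary from them. The goal is to use the first column of counts.txt as the outer key, the folder (sample) name as the inner key, and the second column of counts.txt as the value. Unfortunately, looping over the list of folders does not give me the right shape, and I am running into problems.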
import os
from natsort import natsorted
path1 = "/home/ali/Desktop/SAMPLES/"
data_ali = {}
samples_name=natsorted(os.listdir(path1))
for i in samples_name:
    with open(path1+i[0:]+"/counts.txt","rt") as fin:
        for l in fin.readlines():
            l=l.strip().split()
            if l[0][:4]=='ENSG':
                gene=l[0]
                data_ali[gene]={}
                reads=int(l[1])
                data_ali[gene][samples_name]=reads
print(data_ali)
I expect output like this:
'ENSG00000120659': {
'Sample_1-Leish_011_v2': 14,
'Sample_2-leish_011_v3': 7,
'Sample_3-leish_012_v2': 6,
'Sample_4-leish_012_v3': 1,
'Sample_5-leish_015_v2': 9,
'Sample_6-leish_015_v3': 3,
'Sample_7-leish_016_v2': 4,
'Sample_8-leish_016_v3': 8,
'Sample_9-leish_017_v2': 8,
'Sample_10-leish_017_v3': 2,
'Sample_11-leish_018_v2': 4,
'Sample_12-leish_018_v3': 4,
'Sample_13-leish_019_v2': 7,
'Sample_14-leish_019_v3': 4,
'Sample_15-leish_021_v2': 12,
'Sample_16-leish_021_v3': 5,
'Sample_17-leish_022_v2': 4,
'Sample_18-leish_022_v3': 2,
'Sample_19-leish_023_v2': 9,
'Sample_20-leish_023_v3': 6,
'Sample_21-leish_024_v2': 22,
'Sample_22-leish_024_v3': 10,
'Sample_23-leish026_v2': 9,
'Sample_24-leish026_v3': 5,
'Sample_25-leish027_v2': 4,
'Sample_26-leish027_v3': 1,
'Sample_27-leish028_v2': 7,
'Sample_28-leish028_v3': 5,
'Sample_29-leish032_v2': 8,
'Sample_30-leish032_v3': 2
}
Try this:
if l[0][:4] == 'ENSG':
    gene = l[0]
    reads = int(l[1])
    data_ali.setdefault(gene, {})[i] = reads
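Two important changes:
1. data_ali[gene]={} always throws away whatever was stored before and creates a new empty dictionary, so only the last sample survives for each gene. setdefault creates the dictionary only if the key gene does not exist yet.
2. The inner key is i (the current sample name), not the list samples_name. A list is unhashable, so using it as a dictionary key raises a TypeError.

The full code, cleaned up: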
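import os
from natsort import natsorted

root = "/home/ali/Desktop/SAMPLES/"
data_ali = {}
# Each sub-folder of root is one sample and holds its own counts.txt.
for sample_name in natsorted(os.listdir(root)):
    with open(os.path.join(root, sample_name, "counts.txt"), "r") as fin:
        for line in fin:
            fields = line.split()
            # Skip blank lines and rows that are not gene rows.
            if not fields or not fields[0].startswith('ENSG'):
                continue
            gene, reads = fields[0], int(fields[1])
            # Create the inner dictionary on first sight of the gene, then add this sample.
            data_ali.setdefault(gene, {})[sample_name] = reads
print(data_ali)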
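To see the difference between the two patterns without touching the file system, here is a minimal sketch. The rows list is a hypothetical in-memory stand-in for the counts.txt files (the two values are borrowed from the expected output in the question), and the names rows, overwritten and accumulated are made up for illustration.

```python
# Hypothetical stand-in for the counts.txt files: (sample name, gene, reads).
rows = [
    ("Sample_1-Leish_011_v2", "ENSG00000120659", 14),
    ("Sample_2-leish_011_v3", "ENSG00000120659", 7),
]

# Pattern from the question: the inner dictionary is recreated for every row,
# so earlier samples for the same gene are lost.
overwritten = {}
for sample, gene, reads in rows:
    overwritten[gene] = {}
    overwritten[gene][sample] = reads

# Pattern from the answer: setdefault returns the existing inner dictionary
# and only creates a new one when the gene has not been seen yet.
accumulated = {}
for sample, gene, reads in rows:
    accumulated.setdefault(gene, {})[sample] = reads

print(overwritten)   # only the last sample remains for the gene
print(accumulated)   # both samples are kept

# Using the whole list of sample names as a key fails, because lists are unhashable.
sample_names = [sample for sample, _, _ in rows]
try:
    broken = {}
    broken[sample_names] = 1
except TypeError as exc:
    print("TypeError:", exc)
```

If the nesting appears in many places, collections.defaultdict(dict) from the standard library is a common alternative to calling setdefault on each insertion.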