I have a hyphen-delimited file that I'm attempting to convert to CSV with a small Python script (basically to prepare the data in an acceptable format).
I've hit a roadblock: I need to detect the first result in each 'block' of results, for example:
AABBCCDD-1.2-2.4-2.6
AABBCCDD-1.2-2.4-2.6
AABBCCDD-1.2-2.4-2.6
AABBCCDD-1.2-2.4-2.6
EEFFGGHHII-2.4-5.6-7.5
The first part (preceding the first dash) has a variable length and is the only way to identify an individual listing in this particular database. I want to insert a flag in a separate column that marks each cluster of rows sharing the same code.
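Because the code's length varies, it can't be sliced at a fixed width; splitting on the first dash is enough. A minimal sketch, assuming every line has the code before its first dash (listing_code is a hypothetical helper name, not from the original):

```python
def listing_code(line):
    """Return the listing code: everything before the first dash (hypothetical helper)."""
    # str.partition splits on the first '-' only, so variable-length codes work
    code, _sep, _rest = line.strip().partition('-')
    return code

print(listing_code('AABBCCDD-1.2-2.4-2.6'))    # AABBCCDD
print(listing_code('EEFFGGHHII-2.4-5.6-7.5'))  # EEFFGGHHII
```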
There are several hundred thousand listings, so I can't build a list of codes to search through.
Thanks for any help.
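With several hundred thousand listings, no lookup list is needed as long as rows sharing a code are adjacent: a row starts a new block exactly when its code differs from the previous row's. A minimal sketch under that adjacency assumption (flag_first_in_block is a hypothetical name), which keeps only one previous code in memory:

```python
def flag_first_in_block(lines):
    """Yield each row's fields plus a flag: 1 on the first row of a block, else 0."""
    previous = None  # no code seen yet
    for line in lines:
        fields = line.rstrip('\n').split('-')
        code = fields[0]
        # A new block starts whenever the code changes from the previous row
        yield fields + [1 if code != previous else 0]
        previous = code

rows = ['AABBCCDD-1.2-2.4-2.6', 'AABBCCDD-1.2-2.4-2.6', 'EEFFGGHHII-2.4-5.6-7.5']
for row in flag_first_in_block(rows):
    print(','.join(map(str, row)))
# prints:
# AABBCCDD,1.2,2.4,2.6,1
# AABBCCDD,1.2,2.4,2.6,0
# EEFFGGHHII,2.4,5.6,7.5,1
```

With a real file, the same generator can be fed an open file object line by line and its rows passed to a csv.writer.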
If the data is grouped as shown (all rows with the same code are adjacent), itertools.groupby can iterate over ordered data, grouping it by a common key:
import csv
import itertools
import operator

data1 = '''\
AABBCCDD-1.2-2.4-2.6
AABBCCDD-1.2-2.4-2.6
AABBCCDD-1.2-2.4-2.6
AABBCCDD-1.2-2.4-2.6
EEFFGGHHII-2.4-5.6-7.5
'''

data2 = '''\
SHIRT-RED
SHIRT-BLUE
SHIRT-GREEN
SHOE-RED
SHOE-BLUE
'''

def setup():
    '''Generate some sample input files.'''
    with open('sample1.hsv', 'w') as f:
        f.write(data1)
    with open('sample2.hsv', 'w') as f:
        f.write(data2)

def process(infile, outfile):
    with open(infile, 'r', newline='') as ifile, open(outfile, 'w', newline='') as ofile:
        r = csv.reader(ifile, delimiter='-')
        w = csv.writer(ofile, delimiter=',')
        # key is the first column (offset 0)
        # group is an iterator over the lines that have the same key
        for key, group in itertools.groupby(r, operator.itemgetter(0)):
            # Add a final column to the row list: 1 for the first item.
            w.writerow(next(group) + [1])
            # Remaining items in the group get a zero in the new column.
            for other in group:
                w.writerow(other + [0])

if __name__ == '__main__':
    setup()
    process('sample1.hsv', 'sample1.csv')
    process('sample2.hsv', 'sample2.csv')
Contents of the generated files:

sample1.hsv
AABBCCDD-1.2-2.4-2.6
AABBCCDD-1.2-2.4-2.6
AABBCCDD-1.2-2.4-2.6
AABBCCDD-1.2-2.4-2.6
EEFFGGHHII-2.4-5.6-7.5
sample1.csv
AABBCCDD,1.2,2.4,2.6,1
AABBCCDD,1.2,2.4,2.6,0
AABBCCDD,1.2,2.4,2.6,0
AABBCCDD,1.2,2.4,2.6,0
EEFFGGHHII,2.4,5.6,7.5,1
sample2.hsv
SHIRT-RED
SHIRT-BLUE
SHIRT-GREEN
SHOE-RED
SHOE-BLUE
sample2.csv
SHIRT,RED,1
SHIRT,BLUE,0
SHIRT,GREEN,0
SHOE,RED,1
SHOE,BLUE,0
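Note that itertools.groupby only groups adjacent rows, so if the file is not sorted by code, a code that reappears later is flagged 1 again. For unsorted input, one alternative (a different technique from the groupby answer) is to remember the codes already seen in a set; memory grows with the number of distinct codes, which should be manageable at a few hundred thousand. A minimal sketch, with flag_first_occurrence as a hypothetical name and in-memory data standing in for the real files:

```python
import csv
import io

def flag_first_occurrence(rows):
    """Yield each row plus a column: 1 the first time its code appears, else 0."""
    seen = set()  # codes already encountered
    for row in rows:
        code = row[0]
        yield row + [0 if code in seen else 1]
        seen.add(code)

# Demo on unsorted input held in memory; with real files, pass csv.reader(ifile, delimiter='-')
unsorted = io.StringIO('SHOE-RED\nSHIRT-RED\nSHOE-BLUE\nSHIRT-BLUE\n')
out = io.StringIO()
w = csv.writer(out, lineterminator='\n')
w.writerows(flag_first_occurrence(csv.reader(unsorted, delimiter='-')))
print(out.getvalue(), end='')
# prints:
# SHOE,RED,1
# SHIRT,RED,1
# SHOE,BLUE,0
# SHIRT,BLUE,0
```

Unlike sorting the file first, this keeps the original row order.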