I have a .csv file that I've imported into an array. The fields are comma-separated, so I split each line and built an array of hashes from the pieces.
Now I'm trying to find records with matching IDs so I can remove duplicates and keep only the last one encountered, using the ID column as the key.
The import works, but for some reason I can't get a tool like uniq to display the new unique list, even though calling .length on it returns the right number of rows.
Any help would be greatly appreciated.
CODE
lines = []
i = 0
file = File.open("./properties.csv", "r")
elements = Array[]
element2 = Array[]
output = Array[]
while (line = file.gets)
  i += 1
  # use split to break array up using commas
  arr = line.split(',')
  elements.push({ id: arr[0], streetAddress: arr[1], town: arr[2], valuationDate: arr[3], value: arr[4] })
end
file.close
# Loop through array and sort nicely
element2 = elements.group_by { |c| c[:id] }.values.select { |elements| elements.size > 1 }
output = (element2.uniq)
puts output
puts element2.length
SAMPLE .CSV FILE
ID,Street address,Town,Valuation date,Value
1,1 Northburn RD,WANAKA,1/1/2015,280000
2,1 Mount Ida PL,WANAKA,1/1/2015,280000
3,1 Mount Linton AVE,WANAKA,1/1/2015,780000
1,1 Northburn RD,WANAKA,1/1/2015,330000
2,1 Mount Ida PL,WANAKA,1/1/2015,330000
3,1 Mount Linton AVE,WANAKA,1/1/2015,830000
1,1 Northburn RD,WANAKA,1/1/2016,340000
2,1 Mount Ida PL,WANAKA,1/1/2016,340000
3,1 Mount Linton AVE,WANAKA,1/1/2016,840000
4,1 Kamahi ST,WANAKA,1/1/2016,215000
5,1 Kapuka LANE,WANAKA,1/1/2016,209000
6,1 Mohua MEWS,WANAKA,1/1/2016,620000
7,1 Kakapo CT,WANAKA,1/1/2016,490000
8,1 Mt Gold PL,WANAKA,1/1/2016,1320000
9,1 Penrith Park DR,WANAKA,1/1/2016,1310000
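For reference, here is a minimal sketch of how the array-based approach could keep only the last record per ID. It is an illustration under assumptions, not the code from the question: the data is an inline excerpt of the sample file, and the names sample, rows and latest are made up for the example. Note that filtering groups with size > 1 would discard IDs that appear only once, so the sketch takes the last row of every group instead.

```ruby
require 'csv'

# Inline excerpt of the sample file so the sketch is self-contained.
sample = <<~DATA
  ID,Street address,Town,Valuation date,Value
  1,1 Northburn RD,WANAKA,1/1/2015,280000
  2,1 Mount Ida PL,WANAKA,1/1/2015,280000
  1,1 Northburn RD,WANAKA,1/1/2015,330000
  1,1 Northburn RD,WANAKA,1/1/2016,340000
  2,1 Mount Ida PL,WANAKA,1/1/2016,340000
  4,1 Kamahi ST,WANAKA,1/1/2016,215000
DATA

# headers: true keeps the header line out of the data rows.
rows = CSV.parse(sample, headers: true).map(&:to_h)

# Group the rows by ID, then keep only the last row seen in each group.
latest = rows.group_by { |row| row["ID"] }.values.map(&:last)

latest.each { |row| puts row.values.join(",") }
```

With the excerpt above, this keeps the 340000 valuation for IDs 1 and 2 and the single row for ID 4.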
So I've actually switched my approach to using a hash, which seems to automatically remove duplicates and leave the last encountered record intact. Can anyone shed some light on why?
require 'csv'
element = {}
CSV.foreach("properties.csv", :headers => true, :header_converters => :symbol) do |row|
  element[row.fields[0]] = Hash[row.headers[1..-1].zip(row.fields[1..-1])]
end
puts element["1"]
element.each do |key, value|
  puts key
  puts value
end
puts "#{element.length} records returned"
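The behavior comes from how Hash assignment works in Ruby: keys are unique, so assigning to a key that already exists replaces its value, while the key keeps its original insertion position. The sketch below shows this in isolation. The rows array is a simplified stand-in for the parsed CSV rows (IDs and values borrowed from the sample file), not the actual CSV::Row objects.

```ruby
# Simplified [id, value] pairs in file order (taken from the sample data).
rows = [["1", "280000"], ["2", "280000"], ["1", "330000"], ["1", "340000"]]

element = {}
rows.each do |id, value|
  # Assigning to an existing key overwrites the previous value,
  # so the last row seen for each ID is the one that survives.
  element[id] = { value: value }
end

element.each { |id, record| puts "#{id}: #{record[:value]}" }
puts "#{element.length} records returned"
```

Four rows go in, two records come out, and ID 1 holds the value from its final row.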
To keep the first matching element, instead of the last, you can do a key existence check before assigning the value. This can be done like so:
CSV.foreach("properties.csv", :headers => true, :header_converters => :symbol) do |row|
  key = row.fields[0]
  # Only store the row if this ID has not been seen yet.
  unless element.key?(key)
    element[key] = Hash[row.headers[1..-1].zip(row.fields[1..-1])]
  end
end
This can also be written more concisely with ||=:
CSV.foreach("properties.csv", :headers => true, :header_converters => :symbol) do |row|
  element[row.fields[0]] ||= Hash[row.headers[1..-1].zip(row.fields[1..-1])]
end
Note that these approaches, which preserve the first record found for a key, can perform better than the version that preserves the last record found for a key, especially when the file contains many duplicates. This is because of work avoidance, primarily in producing the hash value, which is done with slice and zip in this code: once a key exists, that work is skipped for every later row with the same key.
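To make the work avoidance concrete, here is a small sketch that counts how many times the hash value gets built under each strategy. The build lambda is a hypothetical stand-in for the Hash[headers.zip(fields)] step, and the rows array is simplified sample data, so treat it as an illustration rather than a benchmark.

```ruby
# Simplified [id, value] pairs in file order (taken from the sample data).
rows = [["1", "280000"], ["2", "280000"], ["1", "330000"], ["1", "340000"]]

builds = 0
# Hypothetical helper standing in for the slice-and-zip step; it counts its calls.
build = lambda do |value|
  builds += 1
  { value: value }
end

# First record wins: ||= skips the right-hand side when the key already has a value.
first_wins = {}
rows.each { |id, value| first_wins[id] ||= build.call(value) }
first_builds = builds

# Last record wins: every row builds a new hash, even if it replaces an old one.
builds = 0
last_wins = {}
rows.each { |id, value| last_wins[id] = build.call(value) }
last_builds = builds

puts "first-wins built #{first_builds} hashes, last-wins built #{last_builds}"
```

One caveat with ||=: it also reassigns when the stored value is nil or false. That cannot happen here, because every stored value is a hash.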