Ruby: CSV parser tripping over double quotation in my data

Tatiana Frank

I'm working on a daily scheduled rake task that downloads a CSV that is automatically sent to Dropbox every day, parses it, and saves the rows to the database. I don't have control over how the data is entered into the program that generates these CSV reports, so I can't prevent double quotes from appearing inside some of the fields. Is there a way to strip them or replace them with single quotes within the rake task, or to tell the parser how to handle them so it doesn't throw an error?

Rake task code:

require 'net/http'
require 'csv'
require 'open-uri'

namespace :fp_import do
  desc "download abc_relations from dropbox, save as csv, create or update record in db"
  task :fp => :environment do
    # Download the report and save a local copy
    data = URI.parse("<<file's dropbox link>>").read
    file = Rails.root.join('lib/assets', 'fp_relation.csv')
    File.open(file, 'w') do |f|
      f.write(data)
    end

    # Parse the saved copy row by row
    CSV.foreach(file) do |row|
      div, fg_style, fg_color, factory, part_style, part_color, comp_code, vendor, design_no, comp_type = row
      fg_sku = fg_style + "-" + fg_color
      part_sku = part_style + "-" + part_color

      # Create the record only if no matching relation exists yet
      relation = FgPart.where('part_sku LIKE ? AND fg_sku LIKE ?', "%#{part_sku}%", "%#{fg_sku}%").exists?
      if relation == false
        FgPart.create(fg_style: fg_style, fg_color: fg_color, fg_sku: fg_sku, factory: factory, part_style: part_style, part_color: part_color, part_sku: part_sku, comp_code: comp_code, comp_type: comp_type, design_no: design_no)
      end
    end
  end
end

There are about 35,000 rows in this CSV. Below is a sample. You can see the double quotes in the 4th row of the sample.

Sample data:

"01","502210","018","ZH","5931","001","M","","UPHOLSTERED GLIDER A","RM"
"01","502310","053","ZH","25332","NO","O","","UPHOLSTERED GLIDER","BAG"
"01","502310","065","ZH","25332","NO","O","","UPHOLSTERED GLIDER","BAG"
"01","502312","424","ZH","25332","NO","O","","UPHOLSTERED GLIDER"AUS"","BAG"
"01","503210","277","ZH","25332","NO","O","","UPHOLSTERED GLIDER","BAG"
"01","503310","076","ZH","25332","NO","O","","UPHOLSTERED GLIDER","BAG"
"01","506210","018","ZH","25332","NO","O","","UPHOLSTERED GLIDER","BAG"
"01","506210","467","ZH","25332","NO","O","","UPHOLSTERED GLIDER","BAG"
"01","507610","932","AZ","25332","NO","O","","GLIDER","BAG"
"01","507610","932","AZ","5936","001","M","","GLIDER","RM"

Agush

The source CSV is malformed: double quotes inside a quoted field should have been escaped (written as two double quotes) when the file was generated.
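To make the problem concrete, here is a small sketch (the variable names are only for illustration) that feeds Ruby's CSV library one field from the problem row twice: once with the embedded quotes doubled, as a well-formed file would have them, and once with the quotes bare, as they appear in the report.

```ruby
require 'csv'

# Well-formed: the embedded quotes around AUS are doubled.
good = %q("UPHOLSTERED GLIDER""AUS""","BAG")
good_fields = CSV.parse_line(good)
# good_fields is ["UPHOLSTERED GLIDER\"AUS\"", "BAG"]

# As it appears in the report: the embedded quotes are left bare.
bad = %q("UPHOLSTERED GLIDER"AUS"","BAG")
begin
  CSV.parse_line(bad)
  parse_failed = false
rescue CSV::MalformedCSVError => e
  parse_failed = true
  puts "parser rejected the row: #{e.message}"
end
```

The exact error message depends on the version of the csv library, so the sketch only rescues the exception class.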

I would clean the file before parsing it with CSV: remove the quotes around the separating commas, then replace any remaining double quotes with single quotes. You can write the result to a new file if you don't want to edit the original.

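def fix_csv(file)
  # Write next to the original, e.g. lib/assets/fixed_fp_relation.csv
  fixed = File.join(File.dirname(file), "fixed_" + File.basename(file))
  File.open(fixed, 'w') do |out|
    File.readlines(file).each do |line|
      line = line.chomp            # drop the line ending (\n or \r\n)
      next if line.empty?          # skip blank lines
      line = line[1...-1]          # remove beginning and end quotes
      line.gsub!(/","/, ",")       # remove all quotes between commas
      line.gsub!(/"/, "'")         # replace remaining double quotes with single quotes
      out << line + "\n"           # add the line plus a newline to the output
    end
  end
  fixed                            # path of the cleaned file
end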
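As a quick check, the same three steps can be applied in memory to the problem row from the question's sample data. This is only a sketch: `sanitize_line` is a hypothetical helper name, and it assumes every field in the file is quoted.

```ruby
require 'csv'

# Hypothetical helper: the same clean-up steps, applied to a single line.
def sanitize_line(line)
  line = line.chomp              # drop the line ending, if any
  line = line[1...-1]            # drop the leading and trailing quote
  line = line.gsub('","', ',')   # drop the quotes around each separating comma
  line.gsub('"', "'")            # turn the leftover embedded quotes into single quotes
end

bad_row = %q("01","502312","424","ZH","25332","NO","O","","UPHOLSTERED GLIDER"AUS"","BAG")
clean = sanitize_line(bad_row)
# clean is: 01,502312,424,ZH,25332,NO,O,,UPHOLSTERED GLIDER'AUS',BAG

fields = CSV.parse_line(clean)
# fields has 10 entries; fields[8] is "UPHOLSTERED GLIDER'AUS'"
```

Two things to keep in mind: once the quotes are gone, the empty eighth column is parsed as nil rather than "", and a field that itself contains a comma would be split in two.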

In case you want to modify the same CSV file, you can do it this way:

require 'tempfile'
require 'fileutils'

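def modify_csv(file)
  temp_file = Tempfile.new('temp')
  begin
    File.readlines(file).each do |line|
      line = line.chomp            # drop the line ending (\n or \r\n)
      next if line.empty?          # skip blank lines
      line = line[1...-1]          # remove beginning and end quotes
      line.gsub!(/","/, ",")       # remove all quotes between commas
      line.gsub!(/"/, "'")         # replace remaining double quotes with single quotes
      temp_file << line + "\n"
    end
    temp_file.close
    # Replace the original file with the cleaned copy
    FileUtils.mv(temp_file.path, file)
  ensure
    # Clean up the temp file even if something above raised
    temp_file.close
    temp_file.unlink
  end
end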

Either version sanitizes the CSV so that the parser can read it; the second one overwrites your original file.
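If you would rather keep the embedded double quotes than turn them into single quotes, one alternative is to skip the CSV parser for these rows and split each line on the `","` separator yourself. This is only a sketch: `split_quoted_row` is a hypothetical helper, not part of the csv library, and it assumes every field is quoted and no field contains the sequence `","`.

```ruby
# Hypothetical helper: split a fully quoted row without using the CSV parser.
def split_quoted_row(line)
  line.chomp
      .sub(/\A"/, '')     # drop the leading quote
      .sub(/"\z/, '')     # drop the trailing quote
      .split('","', -1)   # -1 keeps trailing empty fields
end

bad_row = %q("01","502312","424","ZH","25332","NO","O","","UPHOLSTERED GLIDER"AUS"","BAG")
row = split_quoted_row(bad_row)
# row has 10 entries; row[8] is "UPHOLSTERED GLIDER\"AUS\"" and row[7] is ""
```

Each resulting array can be destructured exactly like the `row` yielded by `CSV.foreach` in the rake task.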
