Unable to get accurate record count of a csv file

debugcn Published at Dev

NightOwl Skeptic

I have a .csv file that has some values formatted as paragraphs with line breaks or sometimes bullet points.

"STAT","ID","DESC"
"UPD", "1", "Updated"
"CHG", "2", "Changed"
"UPD", "3", "Updated.
Might have to update again"
"UPD", "4", "Updated.

 - once 
 - twice
 - thrice
"
"DEL", "5", "unknown"
"DEL", "6", "Deleted
Need to restore"

I need to count the number of records. Since I know the 2nd column is a unique ID, I used awk as shown below, but it returns more records than I have. The text above is of course dummy data, as I'm not allowed to share the original, but I tried to reflect it as closely as I can.

 awk  '{print $2}' FS=","  sample.csv | wc -l

I even printed out just the first column using awk '{print $1}' to check its values, but the output also shows the beginnings of the continuation lines inside the multi-line fields.
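A minimal sketch of why the count comes out too high, assuming a bash shell and the dummy data above. The temporary file and the `csv` variable are illustrative stand-ins for sample.csv. By default awk's record separator is a newline, so every physical line is counted as a record, including the continuation lines inside quoted fields:

```shell
#!/usr/bin/env bash
# Write the dummy data from the question to a temporary file (stand-in for sample.csv).
csv=$(mktemp)
trap 'rm -f "$csv"' EXIT
cat > "$csv" <<'EOF'
"STAT","ID","DESC"
"UPD", "1", "Updated"
"CHG", "2", "Changed"
"UPD", "3", "Updated.
Might have to update again"
"UPD", "4", "Updated.

 - once
 - twice
 - thrice
"
"DEL", "5", "unknown"
"DEL", "6", "Deleted
Need to restore"
EOF

# With the default RS (newline), each physical line is a separate record,
# so the line breaks inside quoted fields inflate the count.
awk 'END { print NR }' "$csv"   # 14 physical lines, but only 6 data rows
```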

Please let me know if any additional info is needed and I'll update the question.

G-Man Says 'Reinstate Monica'

A way to do this with awk is

awk -v RS=$'"\n"' 'END {print NR}' sample.csv

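RS=$'"\n"' sets the record separator (which is a newline by default) to the three-character string ", newline, ". The $'...' quoting works in bash (and in ksh93 and zsh) but may not work in other shells. A record separator longer than one character is also an extension: POSIX leaves it unspecified, while GNU awk and mawk treat it as a regular expression. This will cause your file to be decomposed into the following records: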

1: "STAT","ID","DESC

2: UPD", "1", "Updated

3: CHG", "2", "Changed

4: UPD", "3", "Updated.
Might have to update again

5: UPD", "4", "Updated.

- once
- twice
- thrice

6: DEL", "5", "unknown

7: DEL", "6", "Deleted
Need to restore"
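To reproduce this split and see where each record starts and ends, a sketch like the following can be used. It assumes bash (for the $'...' quoting) and an awk that accepts a multi-character RS, such as GNU awk or mawk; the temporary file is an illustrative stand-in for sample.csv. The brackets only mark record boundaries so that multi-line records stay readable.

```shell
#!/usr/bin/env bash
# Write the dummy data from the question to a temporary file (stand-in for sample.csv).
csv=$(mktemp)
trap 'rm -f "$csv"' EXIT
cat > "$csv" <<'EOF'
"STAT","ID","DESC"
"UPD", "1", "Updated"
"CHG", "2", "Changed"
"UPD", "3", "Updated.
Might have to update again"
"UPD", "4", "Updated.

 - once
 - twice
 - thrice
"
"DEL", "5", "unknown"
"DEL", "6", "Deleted
Need to restore"
EOF

# Split on quote-newline-quote and print each record with its number.
# Requires an awk that supports a multi-character RS (e.g. gawk or mawk).
awk -v RS=$'"\n"' '{ printf "%d: [%s]\n", NR, $0 }' "$csv"
```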

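This assumes that there are no trailing blanks in the file, for example a space after the closing quote at the end of a line, which would break up the quote-newline-quote sequence.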
'END {print NR}' reads the file to the end and then prints NR, the number of the last record read, which is the total number of records.
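Because the first record is the header line, the number of data rows is one less than NR. A small sketch under the same assumptions (bash, GNU awk or mawk, and a temporary copy of the dummy data standing in for sample.csv):

```shell
#!/usr/bin/env bash
# Write the dummy data from the question to a temporary file (stand-in for sample.csv).
csv=$(mktemp)
trap 'rm -f "$csv"' EXIT
cat > "$csv" <<'EOF'
"STAT","ID","DESC"
"UPD", "1", "Updated"
"CHG", "2", "Changed"
"UPD", "3", "Updated.
Might have to update again"
"UPD", "4", "Updated.

 - once
 - twice
 - thrice
"
"DEL", "5", "unknown"
"DEL", "6", "Deleted
Need to restore"
EOF

# NR at END is the number of records read; the first one is the header line.
# Requires an awk that supports a multi-character RS (e.g. gawk or mawk).
records=$(awk -v RS=$'"\n"' 'END { print NR }' "$csv")
echo "records including the header: $records"   # 7 for the sample data
echo "data records: $((records - 1))"           # 6 for the sample data
```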

Text files are generally considered to consist of a sequence of lines, delimited by newline characters or character sequences. And, generally, a “record” in a text file is considered to be one line. But awk lets you specify a record separator other than newline. Since the quote-newline-quote string appears between each pair of consecutive records in your file, specifying it as your record separator breaks the file into (very nearly) the records that you want.

But the record separator is like the wall between two rooms: it isn't part of either one. In normal awk processing, each record you see is a line with its newline character removed. Similarly, in my answer, the quote-newline-quote sequences are removed. But, since there isn't a record separator before the first record or after the last one, the very first and very last quote characters are not removed.
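A different technique (not the one used in this answer) avoids both the stray quotes and the need for a multi-character RS: keep awk's default newline separator and count the `"` characters seen so far. A newline ends a CSV record only when that running count is even, meaning no quoted field is still open. This is a sketch that assumes RFC 4180-style quoting, where a literal quote inside a field is written as `""` and so keeps the count even; `count_csv_records` is a hypothetical helper name, and the temporary file stands in for sample.csv.

```shell
#!/usr/bin/env bash
# Write the dummy data from the question to a temporary file (stand-in for sample.csv).
csv=$(mktemp)
trap 'rm -f "$csv"' EXIT
cat > "$csv" <<'EOF'
"STAT","ID","DESC"
"UPD", "1", "Updated"
"CHG", "2", "Changed"
"UPD", "3", "Updated.
Might have to update again"
"UPD", "4", "Updated.

 - once
 - twice
 - thrice
"
"DEL", "5", "unknown"
"DEL", "6", "Deleted
Need to restore"
EOF

# Hypothetical helper: counts CSV records (header included) by quote parity.
# Uses only the default RS, so it should also work with a plain POSIX awk.
count_csv_records() {
    awk '
    {
        # gsub() returns how many quotes are on this line (each is replaced by itself).
        quotes += gsub(/"/, "&")
        # An even running total means every quoted field opened so far is closed,
        # so this newline ends a CSV record instead of sitting inside a field.
        if (quotes % 2 == 0) {
            records++
            quotes = 0
        }
    }
    END { print records + 0 }
    ' "$1"
}

count_csv_records "$csv"   # 7 for the sample data (header + 6 rows)
```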

If you want to process the file one record at a time, this solution might not be good enough, because the first and last records are treated differently. I agree (somewhat) with Glenn's recommendation that, for any serious work, you should use a "proper CSV parser".
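As one example of such a parser, Python's standard `csv` module handles quoted fields that contain newlines. The sketch below calls it from the shell; it assumes `python3` is installed, `count_rows_with_csv_module` is a hypothetical helper name, and the temporary file stands in for sample.csv. `skipinitialspace=True` matters for this sample: every field after the first is preceded by a space (`", "`), and without that option the quote would be read as a literal character instead of opening a quoted field.

```shell
#!/usr/bin/env bash
# Write the dummy data from the question to a temporary file (stand-in for sample.csv).
csv=$(mktemp)
trap 'rm -f "$csv"' EXIT
cat > "$csv" <<'EOF'
"STAT","ID","DESC"
"UPD", "1", "Updated"
"CHG", "2", "Changed"
"UPD", "3", "Updated.
Might have to update again"
"UPD", "4", "Updated.

 - once
 - twice
 - thrice
"
"DEL", "5", "unknown"
"DEL", "6", "Deleted
Need to restore"
EOF

# Hypothetical helper: counts data rows (header excluded) with Python's csv module.
count_rows_with_csv_module() {
    python3 - "$1" <<'PY'
import csv
import sys

# skipinitialspace=True skips the blank after each comma, so a quote that
# follows ", " still opens a quoted (possibly multi-line) field.
with open(sys.argv[1], newline="") as f:
    rows = list(csv.reader(f, skipinitialspace=True))
print(len(rows) - 1)  # exclude the header row
PY
}

count_rows_with_csv_module "$csv"   # 6 for the sample data
```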


edited at 2021-06-24
