I need to remove data lines whose number of columns is not equal to the number of header columns. The following works, except when a field contains data with a comma inside double quotes. Any ideas how to fix it?
cleanColumns=$(awk -F, 'NR==1{ count=NF; } NF==count' testData.txt);
echo "$cleanColumns" > noErrors.tx
Before
timeStamp,elapsed,label,responseCode,responseMessage,dataType,success,bytes,grpThreads,allThreads,Latency,Hostname,IdleTime,Connect
1459774220811,2018,Fizz_Homepage_2,403," transaction : 1,failing samples : 0",,false,12928,2,2,0,HOST1,5008,0
1459774225103,3485,Fizz_Launch_1,200," transaction : 1,failing samples : 0",,true,1138878,2,2,0,HOST1,5022,0
1459774227844,1653,Fizz_Launch_1,200," transaction : 1,failing samples : 0",,true,18792,2,2,0,HOST1,5012,0
1459774227844,1653,Fizz_Launch_1,200," transaction : 1,failing samples : 0",,true,18792,2,2,0,HOST1,
1459774227844,1653,Fizz_Launch_1,200," transaction : 1,failing samples : 0",,true,
After
1459774220811,2018,Fizz_Homepage_2,403," transaction : 1,failing samples : 0",,false,12928,2,2,0,HOST1,5008,0
1459774225103,3485,Fizz_Launch_1,200," transaction : 1,failing samples : 0",,true,1138878,2,2,0,HOST1,5022,0
1459774227844,1653,Fizz_Launch_1,200," transaction : 1,failing samples : 0",,true,18792,2,2,0,HOST1,5012,0
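To see where `-F,` goes wrong, here is a minimal sketch (plain POSIX shell and awk, nothing else assumed) that prints each line's field count for the "before" sample when every comma is treated as a separator, and then runs the question's filter unchanged. `naive_counts` and `naive_filter` are hypothetical helper names used only for this demo.

```shell
#!/bin/sh
# The "before" sample from the question, held in a variable for the demo.
data='timeStamp,elapsed,label,responseCode,responseMessage,dataType,success,bytes,grpThreads,allThreads,Latency,Hostname,IdleTime,Connect
1459774220811,2018,Fizz_Homepage_2,403," transaction : 1,failing samples : 0",,false,12928,2,2,0,HOST1,5008,0
1459774225103,3485,Fizz_Launch_1,200," transaction : 1,failing samples : 0",,true,1138878,2,2,0,HOST1,5022,0
1459774227844,1653,Fizz_Launch_1,200," transaction : 1,failing samples : 0",,true,18792,2,2,0,HOST1,5012,0
1459774227844,1653,Fizz_Launch_1,200," transaction : 1,failing samples : 0",,true,18792,2,2,0,HOST1,
1459774227844,1653,Fizz_Launch_1,200," transaction : 1,failing samples : 0",,true,'

# Field count per line when every comma is a separator (-F,).
naive_counts() {
    printf '%s\n' "$data" | awk -F, '{ print NR ": " NF }'
}

# The question's filter, unchanged.
naive_filter() {
    printf '%s\n' "$data" | awk -F, 'NR==1{ count=NF; } NF==count'
}

naive_counts
naive_filter
```

The counts come out as 14, 15, 15, 15, 14 and 9: the comma inside the quotes splits field 5 in two, so every complete row counts 15. The filter therefore prints only the header and the truncated row ending in `HOST1,`, which happens to count 14.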
Using GNU awk for FPAT:
$ awk -v FPAT='[^,]*|"[^"]+"' 'NR==1{nf=NF} NF==nf' file
timeStamp,elapsed,label,responseCode,responseMessage,dataType,success,bytes,grpThreads,allThreads,Latency,Hostname,IdleTime,Connect
1459774220811,2018,Fizz_Homepage_2,403," transaction : 1,failing samples : 0",,false,12928,2,2,0,HOST1,5008,0
1459774225103,3485,Fizz_Launch_1,200," transaction : 1,failing samples : 0",,true,1138878,2,2,0,HOST1,5022,0
1459774227844,1653,Fizz_Launch_1,200," transaction : 1,failing samples : 0",,true,18792,2,2,0,HOST1,5012,0
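Why the quoted alternative wins: POSIX regular-expression matching returns the longest of the leftmost matches, so at the start of the quoted field both `[^,]*` (which would stop at the inner comma) and `"[^"]+"` match, and the longer one is used. The sketch below checks this with portable `match()`; `first_token` is a hypothetical helper name for this demo only.

```shell
#!/bin/sh
# Print RSTART, RLENGTH and the matched text for the FPAT regex applied to $1.
# first_token is a hypothetical helper for this demo only.
first_token() {
    printf '%s\n' "$1" | awk '{
        if (match($0, /[^,]*|"[^"]+"/))
            print RSTART, RLENGTH, substr($0, RSTART, RLENGTH)
    }'
}

# [^,]* alone would stop at the comma after "1" (17 chars);
# the quoted alternative is longer (38 chars), so it is the one reported.
first_token '" transaction : 1,failing samples : 0",,false'

# An unquoted field: only [^,]* can match.
first_token '2018,Fizz_Homepage_2'
```

The first call prints `1 38 " transaction : 1,failing samples : 0"` and the second prints `1 4 2018`.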
With other awks you'd need a while(match()) loop using the same regexp, e.g.:
$ cat tst.awk
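BEGIN { FS=RS; OFS="," }
{
# rebuild the record with a newline between fields so NF and $i work as with FPAT
head = ""
tail = $0
n = 0
while( match(tail,/[^,]*|"[^"]+"/) ) {
head = head (n++?FS:"") substr(tail,RSTART,RLENGTH)
if ( RSTART+RLENGTH > length(tail) ) break   # no comma left: that was the last field
tail = substr(tail,RSTART+RLENGTH+1)         # skip the field and the comma after it
}
$0 = head
}
NR==1 { nf=NF }
NF==nf { $1=$1; print }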
$
$ awk -f tst.awk file
timeStamp,elapsed,label,responseCode,responseMessage,dataType,success,bytes,grpThreads,allThreads,Latency,Hostname,IdleTime,Connect
1459774220811,2018,Fizz_Homepage_2,403," transaction : 1,failing samples : 0",,false,12928,2,2,0,HOST1,5008,0
1459774225103,3485,Fizz_Launch_1,200," transaction : 1,failing samples : 0",,true,1138878,2,2,0,HOST1,5022,0
1459774227844,1653,Fizz_Launch_1,200," transaction : 1,failing samples : 0",,true,18792,2,2,0,HOST1,5012,0
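The code above does more than you need, since it first creates a record in which each field is separated by a newline and then changes the newlines back to commas before printing. It does that so that afterwards you can get the number of fields, access the fields, etc. just as you could if you'd used FPAT, rather than having to write a loop for each task. Here's a general approach to identifying the fields of a CSV file when you don't have GNU awk: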
$ cat tst.awk
{
csv2flds()
for (i=0;i<=NF;i++) {
print "NR="NR, "NF="NF, "$"i"="$i
}
print "-----"
}
function csv2flds( head, tail, ofs, n) {
ofs=OFS; OFS=","; FS=RS
head = ""
tail = $0
n = 0
while( match(tail,/[^,]*|"[^"]+"/) ) {
head = head (n++?FS:"") substr(tail,RSTART,RLENGTH)
if ( RSTART+RLENGTH > length(tail) ) break   # no comma left: that was the last field
tail = substr(tail,RSTART+RLENGTH+1)         # skip the field and the comma after it
}
$0 = head # calculates NF and splits into fields using FS="\n"
$1 = $1 # converts "xFSy" into "xOFSy" so "x\ny" becomes "x,y"
FS=OFS; OFS=ofs
}
$ awk -f tst.awk file | head -32
NR=1 NF=14 $0=timeStamp,elapsed,label,responseCode,responseMessage,dataType,success,bytes,grpThreads,allThreads,Latency,Hostname,IdleTime,Connect
NR=1 NF=14 $1=timeStamp
NR=1 NF=14 $2=elapsed
NR=1 NF=14 $3=label
NR=1 NF=14 $4=responseCode
NR=1 NF=14 $5=responseMessage
NR=1 NF=14 $6=dataType
NR=1 NF=14 $7=success
NR=1 NF=14 $8=bytes
NR=1 NF=14 $9=grpThreads
NR=1 NF=14 $10=allThreads
NR=1 NF=14 $11=Latency
NR=1 NF=14 $12=Hostname
NR=1 NF=14 $13=IdleTime
NR=1 NF=14 $14=Connect
-----
NR=2 NF=14 $0=1459774220811,2018,Fizz_Homepage_2,403," transaction : 1,failing samples : 0",,false,12928,2,2,0,HOST1,5008,0
NR=2 NF=14 $1=1459774220811
NR=2 NF=14 $2=2018
NR=2 NF=14 $3=Fizz_Homepage_2
NR=2 NF=14 $4=403
NR=2 NF=14 $5=" transaction : 1,failing samples : 0"
NR=2 NF=14 $6=
NR=2 NF=14 $7=false
NR=2 NF=14 $8=12928
NR=2 NF=14 $9=2
NR=2 NF=14 $10=2
NR=2 NF=14 $11=0
NR=2 NF=14 $12=HOST1
NR=2 NF=14 $13=5008
NR=2 NF=14 $14=0
-----
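Since the original problem only needs a field count, a lighter alternative (a different technique from the loop above, offered as a sketch) is to delete the quoted sections from a copy of the record and count the commas that remain. It assumes every double quote is balanced and that no quoted field contains a newline; `filter_by_header_nf` is a hypothetical function name.

```shell
#!/bin/sh
# Print the header plus every line whose field count equals the header's,
# where commas inside "..." do not count as separators.
# Assumes balanced quotes and no newlines inside quoted fields.
# filter_by_header_nf is a hypothetical name; pass file names or use stdin.
filter_by_header_nf() {
    awk '
    {
        rec = $0
        gsub(/"[^"]*"/, "", rec)      # drop quoted sections, including their commas
        n = gsub(/,/, ",", rec) + 1    # separators left + 1 = number of fields
    }
    NR == 1 { nf = n }                 # the header defines the expected count
    n == nf                            # default action: print the original line
    ' "$@"
}
```

For example, `filter_by_header_nf file` on the question's sample keeps the header and the three complete rows. Only the copy `rec` is modified, so the lines are printed exactly as they appeared in the input.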