How to filter a string from all columns of a CSV file using Python

Posted on debugcn, Dev

abhishek gaikwad

I have a CSV file and need to check every column for '?', then remove the rows that contain it from the CSV file.

Below is an example:

Column1   Column 2   Column 3
1         ?          3
2         ?..        1
?         2          ?.
?         4          4

I tried the following, but it does not work.

data = readData("text.csv")
print(data)

def Filter(string, substr):
    return [str for str in string if
            any(sub not in str for sub in substr)]

string = data
substr = ['?', '?.', '? ', '? ']
filter_data = Filter(string, substr)

My code reads the file and returns the output as a list of tuples.

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

def readData(filename) :
    data = pd.read_csv(filename, skipinitialspace=True)
    return [d for d in data.itertuples(index=False, name=None)]

data = readData("problem2.csv")
print(data)

[('18.0', 8, '307.0 ', '130.0 ', '3504.', '12.0', 70, 1, 'chevrolet chevelle malibu'), ('15.0', 8, '350.0 ', '165.0 ', '3693.', '11.5', 70, 1, 'buick skylark 320'), ('18.0', 8, '318.0 ', '150.0 ', '?.', '11.0', 70, 1, 'plymouth satellite'), ('16.0', 8, '304.0 ', '150.0 ', '3433.', '12.0', 70, 1, 'amc rebel sst'), ('17.0', 8, '302.0 ', '140.0 ', '3449.', '10.5', 70, 1, 'ford torino'), ('15.0', 8, '429.0 ', '198.0 ', '4341.', '10.0', 70, 1, 'ford galaxie 500'), ('14.0', 8, '454.0 ', '220.0 ', '4354.', '9.0', 70, 1, 'chevrolet impala'), ('14.0', 8, '440.0 ', '215.0 ', '4312.', '8.5', 70, 1, 'plymouth fury iii'),

Next, I want to filter '?' out of every column and still get the output in the same tuple format.

Joe Ferndz

My input file looks like this:

mpg,cylinder,displace,horsepower,weight,accelerate,year,origin,name
18,8,307,130,3504,12,70,1,chevy malibu
18,8,308,140,?.,14,70,1,plymoth satellite
18,8,309,150,?,15,70,1,ford torino
18,8,310,150,? ,16,70,1,ford galaxy
18,8,310,150, ?,17,70,1,pontiac catalina
18,8,310,150,3505,18,70,1,ford maverick

The code to replace any of the entries ['?','?.',' ?','? '] is as follows:

import csv
qs = ['?','?.',' ?','? ']
with open('abc.txt') as csv_file:
    csv_reader = csv.reader(csv_file, delimiter=',')
    for row in csv_reader:
        row = ['' if r in qs else r for r in row]
        print (row)

This produces the following result:

['mpg', 'cylinder', 'displace', 'horsepower', 'weight', 'accelerate', 'year', 'origin', 'name']
['18', '8', '307', '130', '3504', '12', '70', '1', 'chevy malibu']
['18', '8', '308', '140', '', '14', '70', '1', 'plymoth satellite']
['18', '8', '309', '150', '', '15', '70', '1', 'ford torino']
['18', '8', '310', '150', '', '16', '70', '1', 'ford galaxy']
['18', '8', '310', '150', '', '17', '70', '1', 'pontiac catalina']
['18', '8', '310', '150', '3505', '18', '70', '1', 'ford maverick']

As you can see, the matching values in rows 3 to 6 have been replaced with ''.

I also ran it with one more sample data set:
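Listing every whitespace variant of the marker in qs is easy to get wrong. As a minimal sketch, assuming the only markers are '?' and '?.' once surrounding whitespace is stripped, each cell can be normalized with strip() before the comparison. The helper name blank_markers and the shortened inline sample are hypothetical and used only for illustration.

```python
import csv
import io

# Markers treated as "missing" once surrounding whitespace is stripped
# (assumption: only '?' and '?.' occur, as in the sample data).
MARKERS = {'?', '?.'}

def blank_markers(lines):
    """Yield each CSV row with marker cells replaced by ''."""
    for row in csv.reader(lines):
        yield ['' if cell.strip() in MARKERS else cell for cell in row]

# Shortened, hypothetical sample standing in for abc.txt
sample = io.StringIO(
    "mpg,weight,name\n"
    "18,3504,chevy malibu\n"
    "18,?.,plymoth satellite\n"
    "18, ?,pontiac catalina\n"
)

cleaned = list(blank_markers(sample))
for row in cleaned:
    print(row)
```

Because the comparison happens after strip(), ' ?' and '? ' need no separate entries.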

mpg,cylinder,displace,horsepower,weight,accelerate,year,origin,name
18,8,307,130,3504,12,70,1,chevy malibu
18,8,308,140,?.,14,70,1,plymoth satellite
18,8,309,?,3506,15,70,1,ford torino
18,8,310,160,? ,16,70,1,ford galaxy
18,8,311,170,3508, ?,70,1,pontiac catalina
18,8,312,180,3509,18,70,1,ford maverick

The output is as follows:

['mpg', 'cylinder', 'displace', 'horsepower', 'weight', 'accelerate', 'year', 'origin', 'name']
['18', '8', '307', '130', '3504', '12', '70', '1', 'chevy malibu']
['18', '8', '308', '140', '', '14', '70', '1', 'plymoth satellite']
['18', '8', '309', '', '3506', '15', '70', '1', 'ford torino']
['18', '8', '310', '160', '', '16', '70', '1', 'ford galaxy']
['18', '8', '311', '170', '3508', '', '70', '1', 'pontiac catalina']
['18', '8', '312', '180', '3509', '18', '70', '1', 'ford maverick']

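In this scenario the ? appears in different columns, and the code still handles it.

If you want to process all rows at once, you can read every line into a single variable and process them there.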
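The question asks for the affected rows to be removed rather than blanked. A possible sketch of that variant, under the same assumption that '?' and '?.' (ignoring surrounding whitespace) are the only markers, skips any row in which at least one cell is a marker. The function name drop_marked_rows is hypothetical, and the data is inlined so the example runs without abc.txt.

```python
import csv
import io

# Assumption: these are the only "missing value" markers after stripping.
MARKERS = {'?', '?.'}

def drop_marked_rows(lines):
    """Return the rows (as tuples) in which no cell is a marker."""
    kept = []
    for row in csv.reader(lines):
        if not any(cell.strip() in MARKERS for cell in row):
            kept.append(tuple(row))
    return kept

sample = io.StringIO(
    "mpg,cylinder,displace,horsepower,weight,accelerate,year,origin,name\n"
    "18,8,307,130,3504,12,70,1,chevy malibu\n"
    "18,8,308,140,?.,14,70,1,plymoth satellite\n"
    "18,8,309,?,3506,15,70,1,ford torino\n"
    "18,8,310,160,? ,16,70,1,ford galaxy\n"
    "18,8,311,170,3508, ?,70,1,pontiac catalina\n"
    "18,8,312,180,3509,18,70,1,ford maverick\n"
)

rows = drop_marked_rows(sample)
for r in rows:
    print(r)
```

With this sample only the header, the chevy malibu row, and the ford maverick row remain.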

qs = {'?.':'',' ?':'','? ':'','?':''}
with open('abc.txt') as csv_file:
    lines = csv_file.readlines()
    for i, text in enumerate(lines):
        # Apply each replacement in dict order ('?.' before the bare '?')
        for a, b in qs.items():
            text = text.replace(a, b)
        lines[i] = text
    print(lines)

The output data is as follows:

['mpg,cylinder,displace,horsepower,weight,accelerate,year,origin,name\n', '18,8,307,130,3504,12,70,1,chevy malibu\n', '18,8,308,140,,14,70,1,plymoth satellite\n', '18,8,309,,3506,15,70,1,ford torino\n', '18,8,310,160,,16,70,1,ford galaxy\n', '18,8,311,170,3508,,70,1,pontiac catalina\n', '18,8,312,180,3509,18,70,1,ford maverick\n']
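One caveat with replacing text on the whole line: str.replace works on substrings, so a '?' inside an ordinary field would be removed as well. The sketch below contrasts the two approaches on a single made-up line; the line content is hypothetical and not taken from the sample file.

```python
import csv

# Hypothetical line in which the last field legitimately contains a '?'
line = "18,?,70,is it a ford?\n"

# Text-level replacement removes every '?', including the one in the name.
replaced = line.replace('?', '')

# Cell-level comparison only blanks cells that are exactly a marker.
row = next(csv.reader([line]))
cell_wise = ['' if cell.strip() in {'?', '?.'} else cell for cell in row]

print(repr(replaced))   # '18,,70,is it a ford\n'
print(cell_wise)        # ['18', '', '70', 'is it a ford?']
```

If the data may contain question marks in free-text columns, comparing whole cells is the safer choice.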

Tuple output

It looks like you expect tuples as the output.

The code to do that is as follows:

import csv
qs = {'?.':'',' ?':'','? ':'','?':''}
final_list = []

with open('abc.txt') as csv_file:
    csv_reader = csv.reader(csv_file, delimiter=',')
    for row in csv_reader:
        row = ['' if r in qs else r for r in row]
        final_list.append(tuple(row))

print (final_list)

The output is as follows:

[('mpg', 'cylinder', 'displace', 'horsepower', 'weight', 'accelerate', 'year', 'origin', 'name'), ('18', '8', '307', '130', '3504', '12', '70', '1', 'chevy malibu'), ('18', '8', '308', '140', '', '14', '70', '1', 'plymoth satellite'), ('18', '8', '309', '', '3506', '15', '70', '1', 'ford torino'), ('18', '8', '310', '160', '', '16', '70', '1', 'ford galaxy'), ('18', '8', '311', '170', '3508', '', '70', '1', 'pontiac catalina'), ('18', '8', '312', '180', '3509', '18', '70', '1', 'ford maverick')]
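If the cleaned tuples need to go back into a CSV file, csv.writer can write them out. This is only a sketch: the function name write_rows is hypothetical, the list is a shortened stand-in for final_list, and an in-memory buffer is used in place of a real output file so the example runs on its own.

```python
import csv
import io

def write_rows(rows, out_file):
    """Write an iterable of row tuples to an open text file as CSV."""
    writer = csv.writer(out_file, lineterminator='\n')
    writer.writerows(rows)

# Shortened, hypothetical stand-in for final_list
final_list = [
    ('mpg', 'weight', 'name'),
    ('18', '3504', 'chevy malibu'),
    ('18', '3509', 'ford maverick'),
]

buffer = io.StringIO()
write_rows(final_list, buffer)
print(buffer.getvalue())
```

To write to disk instead, pass a file opened with open('out.csv', 'w', newline='') in place of the buffer; the file name out.csv is an assumption.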


Edited on 2021-04-5
