Reading AWS S3 CSV column names in Lambda

Posted by debugcn in Dev

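I am trying to write a script that collects the schema from an AWS Aurora Serverless MySQL database table, collects the column headers from a CSV file stored in an AWS S3 bucket, and writes the CSV to the table only if its column headers are a subset of the table schema (for example, if the table fields are ['Name', 'DOB', 'Height'] and the CSV fields are ['Name', 'DOB', 'Weight'], the script should raise an exception).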
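The subset rule described above could be sketched roughly as follows. The helper names `unknown_columns` and `check_csv_columns` are hypothetical and do not come from the original post, and the comparison is assumed to be an exact, case-sensitive string match.

```python
def unknown_columns(table_fields, csv_fields):
    """Return the CSV columns that do not appear in the table schema."""
    known = set(table_fields)
    return [col for col in csv_fields if col not in known]


def check_csv_columns(table_fields, csv_fields):
    """Raise ValueError unless every CSV column exists in the table."""
    unknown = unknown_columns(table_fields, csv_fields)
    if unknown:
        raise ValueError(f'CSV columns not in table schema: {unknown}')
```

With the fields from the example, `check_csv_columns(['Name', 'DOB', 'Height'], ['Name', 'DOB', 'Weight'])` would raise because of 'Weight'.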

So far, my AWS Lambda function successfully returns the table schema and reads the CSV file, but I am not sure how to get the column headers from the S3 object.

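import boto3
import pymysql

def return_db_schema(event):
    # rds_host, name, password and db_name are defined elsewhere (not shown)
    conn = pymysql.connect(host=rds_host, user=name, passwd=password, db=db_name, connect_timeout=5)
    with conn.cursor() as cur:
        # SHOW COLUMNS is read-only, so no commit is needed
        cur.execute('SHOW columns FROM SampleTable')
        # The first field of each result row is the column name
        schema = [row[0] for row in cur]
    return schema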

def return_csv_cols(event):
    s3 = boto3.client('s3')
    tester = s3.get_object(Bucket=s3_bucket, Key=test_key)
    contents = tester['Body'].read()

def main(event, context):
    print(return_db_schema(event))
    print()
    print(return_csv_cols(event))

I am not sure how to proceed from here (for example, is there a way to do this without loading the CSV into a pandas DataFrame and reading df.columns or something similar?).

I solved this with the following code:

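def return_csv_cols(event):
    s3 = boto3.client('s3')
    tester = s3.get_object(Bucket=s3_bucket, Key=test_key)
    # Read the whole object and decode the bytes to text
    contents = tester['Body'].read().decode('UTF-8')
    # The header is the first line; drop a trailing '\r' from CRLF files
    cols = contents.split('\n')[0].rstrip('\r').split(',')
    return cols, contents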
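Splitting on commas gives wrong results when a header is quoted and itself contains a comma, and `read()` with no argument loads the entire object into memory. As a possible alternative (a sketch, not part of the original answer), the header can be read in small chunks and parsed with the standard `csv` module. The helper name `read_csv_header` is hypothetical, and the sketch assumes the header sits on the first line and contains no embedded newlines.

```python
import csv
import io


def read_csv_header(body, chunk_size=1024, encoding='utf-8'):
    """Return the column names from the first line of a CSV byte stream.

    `body` is any object with a read(n) method that returns bytes,
    for example the 'Body' of an S3 get_object response.
    """
    buf = b''
    # Read only as much of the stream as is needed to see one newline
    while b'\n' not in buf:
        chunk = body.read(chunk_size)
        if not chunk:  # stream ended before a newline was found
            break
        buf += chunk
    first_line = buf.split(b'\n', 1)[0].decode(encoding).rstrip('\r')
    # csv.reader handles quoted fields that contain commas
    return next(csv.reader(io.StringIO(first_line)), [])
```

In the Lambda function above this might be called as `read_csv_header(tester['Body'])`; the rest of the object would then still need to be read separately if the contents are required.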
