How do I tune a Hive INSERT OVERWRITE PARTITION?

Posted by William R in Dev

William R

I wrote an INSERT OVERWRITE PARTITION statement in Hive to merge all the files in a partition into larger files.

SQL:

SET hive.exec.compress.output=true;
set hive.merge.smallfiles.avgsize=2560000000;
set hive.merge.mapredfiles=true;
set hive.merge.mapfiles =true;
SET mapreduce.max.split.size=256000000;
SET mapreduce.min.split.size=256000000;
SET mapreduce.output.fileoutputformat.compress.type =BLOCK;
SET hive.hadoop.supports.splittable.combineinputformat=true;
SET mapreduce.output.fileoutputformat.compress.codec=${v_compression_codec};

INSERT OVERWRITE TABLE ${source_database}.${table_name} PARTITION (${line})
SELECT ${prepare_sel_columns}
FROM ${source_database}.${table_name}
WHERE ${partition_where_clause};

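With the settings above I get compressed output, but generating the output files takes too long.

Even though it runs only map jobs, it still takes a lot of time.

I am looking for any further settings on the Hive side that would tune the insert to run faster.

Metrics:

15 GB of files ==> takes 10 minutes.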

William R

SET hive.exec.compress.output=true;
SET mapreduce.input.fileinputformat.split.minsize=512000000; 
SET mapreduce.input.fileinputformat.split.maxsize=5120000000;
SET mapreduce.output.fileoutputformat.compress.type =BLOCK;
SET hive.hadoop.supports.splittable.combineinputformat=true;
SET mapreduce.output.fileoutputformat.compress.codec=${v_compression_codec};

The settings above helped a lot: the duration dropped from 10 minutes to 1 minute.
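
For anyone who wants to try this end to end, here is a minimal sketch of the faster settings combined with the insert statement. The database `my_db`, table `events`, columns `event_id` and `payload`, partition value `dt='2020-01-01'`, and the Snappy codec are all hypothetical stand-ins for the `${...}` variables in the post, so substitute your own values.

```sql
-- Sketch only: database, table, columns, partition value and codec are
-- hypothetical placeholders, not values from the original post.
SET hive.exec.compress.output=true;
-- Lower and upper bounds on split size, in bytes (about 512 MB and 5.12 GB).
SET mapreduce.input.fileinputformat.split.minsize=512000000;
SET mapreduce.input.fileinputformat.split.maxsize=5120000000;
SET mapreduce.output.fileoutputformat.compress.type=BLOCK;
SET hive.hadoop.supports.splittable.combineinputformat=true;
-- Assumed codec; the post passes it in through ${v_compression_codec}.
SET mapreduce.output.fileoutputformat.compress.codec=org.apache.hadoop.io.compress.SnappyCodec;

-- Rewrite one partition onto itself. With a static partition spec, the
-- partition column is left out of the SELECT list.
INSERT OVERWRITE TABLE my_db.events PARTITION (dt='2020-01-01')
SELECT event_id, payload
FROM my_db.events
WHERE dt='2020-01-01';
```

Compared with the settings in the question, this version drops the `hive.merge.*` options and raises the split size bounds. As rough arithmetic, 15 GB of input at about 256 MB per split is on the order of 60 splits, while bounds of 512 MB to 5.12 GB would give somewhere between roughly 3 and 30. The real task count depends on file layout and input format, and the post does not say which change accounts for the speedup.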
