I am new to MapReduce and I want to understand what SequenceFile input is. I read about it in the Hadoop book, but it was hard for me to follow.
First we should understand what problems SequenceFile tries to solve, and then how it helps to solve them.
Map tasks usually process one block of input at a time (using the default FileInputFormat).
The more files there are, the more map tasks are needed, because every file gets at least one map task even when it is much smaller than a block. With many small files the job can become much slower.
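To get a rough feel for the numbers, here is a small sketch in plain Java (no Hadoop dependency). The 128 MB block size, the 10,000 files of 100 KB each, and the helper name `splitsForFile` are all my own assumptions for illustration. It also ignores any container overhead, so treat the result as an estimate, not as what Hadoop will report exactly.

```java
class SplitEstimate {
    // Approximate number of input splits (and so map tasks) for one file:
    // at least one per file, plus one for each further block the file spans.
    static long splitsForFile(long fileBytes, long blockBytes) {
        return Math.max(1, (fileBytes + blockBytes - 1) / blockBytes);
    }

    public static void main(String[] args) {
        long blockBytes = 128L * 1024 * 1024; // assumed block size
        long smallFileBytes = 100L * 1024;    // assumed size of each small file
        long fileCount = 10_000;              // assumed number of small files

        // Every small file is read by its own map task.
        long separate = fileCount * splitsForFile(smallFileBytes, blockBytes);
        // The same bytes stored as one large file are split per block.
        long packed = splitsForFile(fileCount * smallFileBytes, blockBytes);

        System.out.println("map tasks, one file each: " + separate);
        System.out.println("map tasks, single packed file: " + packed);
    }
}
```

Under these assumptions it prints 10000 map tasks for the separate files and 8 for the single packed file.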
These two cases require different solutions.
HAR files
SequenceFile
For example, suppose there are 10,000 files of 100 KB each. We can write a program that packs them into a single SequenceFile, using the filename as the key and the file content as the value.
(source: csdn.net)
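The "filename as key, content as value" idea can be sketched without Hadoop at all. The class below is a simplified stand-in that I made up for illustration (`SmallFilePacker`, `pack` and `unpack` are hypothetical names); it is not the real SequenceFile on-disk format, which also has a header and sync markers. In an actual job you would write the records with Hadoop's `SequenceFile.Writer` instead, for example with `Text` keys and `BytesWritable` values.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

class SmallFilePacker {
    // Append every (filename, content) pair to one stream as a length-prefixed record.
    static byte[] pack(Map<String, byte[]> files) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            for (Map.Entry<String, byte[]> file : files.entrySet()) {
                out.writeUTF(file.getKey());          // key: the file name
                out.writeInt(file.getValue().length); // length of the value
                out.write(file.getValue());           // value: the file content
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    // Read the records back in the order they were written.
    static Map<String, byte[]> unpack(byte[] packed) {
        Map<String, byte[]> files = new LinkedHashMap<>();
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(packed))) {
            while (in.available() > 0) {
                String name = in.readUTF();
                byte[] content = new byte[in.readInt()];
                in.readFully(content);
                files.put(name, content);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return files;
    }

    public static void main(String[] args) {
        Map<String, byte[]> files = new LinkedHashMap<>();
        files.put("a.txt", "first small file".getBytes(StandardCharsets.UTF_8));
        files.put("b.txt", "second small file".getBytes(StandardCharsets.UTF_8));

        byte[] packed = pack(files); // one container instead of two files
        for (Map.Entry<String, byte[]> e : unpack(packed).entrySet()) {
            System.out.println(e.getKey() + " -> "
                    + new String(e.getValue(), StandardCharsets.UTF_8));
        }
    }
}
```

The point is only that many small files become records in one large file, so the file system tracks a single file and a map task can read many records from one block.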
Some benefits:
Supported compression types (the file structure depends on the compression type):
Record-Compressed: compresses each record as it is added to the file.
(source: csdn.net)
Block-Compressed: buffers records until a block's worth has accumulated, then compresses them together.
(source: csdn.net)
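To see why the two modes behave differently, here is a rough comparison in plain Java using `java.util.zip.Deflater`. It is only a sketch of the idea: the class and method names are made up, the sample records are invented, and real SequenceFiles use Hadoop codecs and their own block layout. Record style compresses every value on its own, while block style compresses many values in one go and can exploit repetition across records.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.Deflater;

class CompressionModes {
    // Compress one byte array with DEFLATE and return the compressed bytes.
    static byte[] deflate(byte[] input) {
        Deflater deflater = new Deflater();
        deflater.setInput(input);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] chunk = new byte[4096];
        while (!deflater.finished()) {
            int n = deflater.deflate(chunk);
            out.write(chunk, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }

    // Record style: every value is compressed independently.
    static int recordCompressedSize(List<byte[]> values) {
        int total = 0;
        for (byte[] value : values) {
            total += deflate(value).length;
        }
        return total;
    }

    // Block style: values are concatenated first, then compressed together.
    static int blockCompressedSize(List<byte[]> values) {
        ByteArrayOutputStream all = new ByteArrayOutputStream();
        for (byte[] value : values) {
            all.write(value, 0, value.length);
        }
        return deflate(all.toByteArray()).length;
    }

    public static void main(String[] args) {
        // Invented sample data: many short, similar records.
        List<byte[]> values = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            String record = "user=alice;action=login;status=ok;id=" + i;
            values.add(record.getBytes(StandardCharsets.UTF_8));
        }
        System.out.println("record style bytes: " + recordCompressedSize(values));
        System.out.println("block style bytes:  " + blockCompressedSize(values));
    }
}
```

With short, similar records like these, the block style total comes out much smaller, which is the usual reason block compression is preferred for SequenceFiles.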