I am trying to read a very large CSV file (more than 1 million rows) with a FlatFileItemReader, but when I launch my batch job I get an OutOfMemoryException after about 10 minutes.
Here is my code:
@Slf4j
@Configuration
@EnableBatchProcessing
@ComponentScan({
        "f.p.f.batch",
        "f.p.f.batch.tasklet"
})
public class BatchConfig {

    @Autowired
    private StepBuilderFactory steps;

    @Autowired
    private JobBuilderFactory jobBuilderFactory;

    @Autowired
    private DemoTasklet demoTasklet;

    @Bean
    public ResourcelessTransactionManager transactionManager() {
        return new ResourcelessTransactionManager();
    }

    @Bean
    public JobRepository jobRepository(ResourcelessTransactionManager transactionManager) {
        MapJobRepositoryFactoryBean mapJobRepositoryFactoryBean = new MapJobRepositoryFactoryBean(transactionManager);
        mapJobRepositoryFactoryBean.setTransactionManager(transactionManager);
        try {
            return mapJobRepositoryFactoryBean.getObject();
        } catch (Exception ex) {
            log.error("Exception : {}", ex.getMessage(), ex);
            return null;
        }
    }

    @Bean
    //@StepScope
    public FlatFileItemReader<Balance> csvAnimeReader() {
        FlatFileItemReader<Balance> reader = new FlatFileItemReader<>();
        DefaultLineMapper<Balance> lineMapper = new DefaultLineMapper<>();
        FieldSetMapper<Balance> fieldSetMapper = new BalanceFieldSetMapper();
        DelimitedLineTokenizer tokenizer = new DelimitedLineTokenizer();
        tokenizer.setNames(new String[]{
                "EXER", "IDENT", "NDEPT", "LBUDG", "INSEE", "SIREN", "CREGI",
                "NOMEN", "CTYPE", "CSTYP", "CACTI", "FINESS", "SECTEUR", "CBUDG",
                "CODBUD1", "COMPTE ", "BEDEB", "BECRE", "OBNETDEB", "OBNETCRE",
                "ONBDEB", "ONBCRE", "OOBDEB", "OOBCRE", "SD", "SC"
        });
        tokenizer.setDelimiter(";");
        lineMapper.setLineTokenizer(tokenizer);
        lineMapper.setFieldSetMapper(fieldSetMapper);
        reader.setLineMapper(lineMapper);
        reader.setResource(new ClassPathResource("Balance_Exemple_2016.csv"));
        reader.setLinesToSkip(1);
        return reader;
    }

    @Bean
    public ItemProcessor<Balance, Balance> CsvFileProcessor() {
        return new BalanceProcessor();
    }

    @Bean
    public BalanceWriter balanceWriter() {
        return new BalanceWriter();
    }

    @Bean
    public SimpleJobLauncher jobLauncher(JobRepository jobRepository) {
        SimpleJobLauncher simpleJobLauncher = new SimpleJobLauncher();
        simpleJobLauncher.setJobRepository(jobRepository);
        return simpleJobLauncher;
    }

    @Bean
    public Step step1() {
        return steps.get("step1")
                .<Balance, Balance>chunk(1)
                .reader(csvAnimeReader())
                .writer(balanceWriter())
                .build();
    }

    @Bean
    public Step step2() {
        return steps.get("step2")
                .tasklet(demoTasklet)
                .build();
    }

    @Bean
    public Job readCsvJob() {
        return jobBuilderFactory.get("readCsvJob")
                .incrementer(new RunIdIncrementer())
                .flow(step1())
                .next(step2())
                .end()
                .build();
    }
}
I suggest you stream the file, because you never want to read the entire file into memory at once; that is the main problem here.
Here is a good article on how to read a file more efficiently without consuming the whole memory space.
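The streaming idea can be sketched in plain Java, independent of Spring Batch. This is a minimal sketch, not your actual job: the semicolon delimiter and skipped header line mirror the reader configuration above, while the "write" step is only a counter standing in for a real writer such as your BalanceWriter. The point is that only one chunk of rows is ever held in memory at a time.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

public class StreamingCsvDemo {

    // Reads the CSV one line at a time; at most `chunkSize` rows are buffered.
    static int process(BufferedReader reader, int chunkSize) throws IOException {
        reader.readLine();                       // skip the header (like setLinesToSkip(1))
        List<String[]> chunk = new ArrayList<>(chunkSize);
        int written = 0;
        String line;
        while ((line = reader.readLine()) != null) {
            chunk.add(line.split(";"));          // tokenize, like DelimitedLineTokenizer
            if (chunk.size() == chunkSize) {
                written += chunk.size();         // stand-in for writing the chunk out
                chunk.clear();                   // drop the rows we just wrote
            }
        }
        written += chunk.size();                 // flush the final partial chunk
        return written;
    }

    public static void main(String[] args) throws IOException {
        String csv = "EXER;IDENT\n2016;A\n2016;B\n2016;C\n";
        int rows = process(new BufferedReader(new StringReader(csv)), 2);
        System.out.println(rows);                // prints 3
    }
}
```

Note that this is exactly what a chunk-oriented Spring Batch step does when the reader truly streams, so with your configuration it is also worth checking that the writer does not accumulate items, and trying a larger chunk size than 1 to cut per-row transaction overhead.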