What would be the most primitive way of parsing a tab-separated file in Java, so that the tabular data would not lose the structure? I am not looking for a way to do it with Bean or Jsoup, since they are not familiar to me, a beginner. I need advice on what would be the logic behind it and what would be the efficient way to do it, for example if I have a table like
ID reference | Identifier | Type 1 | Type 2 | Type 3 |
1            | red#01     | 15%    | 20%    | 10%    |
2            | yellow#08  | 13%    | 20%    | 10%    |
Correction: In this example I have Types 1 - 3, but my question applies to N number of types.
Can I achieve table parsing by just using arrays, or is there a different data structure in Java that would be better for this task? This is how I think I should do it: split each line on "\t" and create a String array.
Does this plan sound right, or am I overcomplicating things or getting it completely wrong? Is there an easier way to do it? (Bear in mind that I still don't know how to split arrays into subarrays or how to populate the subarrays with the values from the table.)
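The plan described above can be sketched with nothing more than a list of String arrays, one array per row. This is only a minimal illustration: the class name TsvRows, the method parse, and the in-memory sample lines are assumptions made for the example (in practice the lines would come from something like Files.readAllLines). Because each row is its own array, it works for any number of Type columns.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper class; the name and the sample data are illustrative only.
class TsvRows {

    // Splits each line on tab characters and keeps one String[] per row.
    // The limit -1 keeps trailing empty cells, so a row does not lose columns.
    static List<String[]> parse(List<String> lines) {
        List<String[]> rows = new ArrayList<>();
        for (String line : lines) {
            rows.add(line.split("\t", -1));
        }
        return rows;
    }

    public static void main(String[] args) {
        // Stand-in for the contents of a tab-separated file.
        List<String> lines = Arrays.asList(
                "ID reference\tIdentifier\tType 1\tType 2\tType 3",
                "1\tred#01\t15%\t20%\t10%",
                "2\tyellow#08\t13%\t20%\t10%");
        List<String[]> rows = parse(lines);
        // rows.get(0) is the header; rows.get(r)[c] is the cell at row r, column c.
        for (String[] row : rows) {
            System.out.println(Arrays.toString(row));
        }
    }
}
```

The table structure is preserved because the row index and the column index together address every cell, with no need to split arrays into subarrays by hand.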
I would strongly suggest you use a real flat-file parsing library for this, like the excellent OpenCSV.
Failing that, here is a solution in Java 8.
First, create a class to represent your data:
static class Bean {
    private final int id;
    private final String name;
    private final List<Integer> types;

    public Bean(int id, String name, List<Integer> types) {
        this.id = id;
        this.name = name;
        this.types = types;
    }

    // getters
}
Your suggestion to use various lists is a very script-like approach. Java is object-oriented, so you should use that to your advantage.
Now we just need to parse the file:
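import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public static void main(final String[] args) throws Exception {
    final Path path = Paths.get("path", "to", "file.tsv");
    final List<Bean> parsed;
    try (final Stream<String> lines = Files.lines(path)) {
        parsed = lines.skip(1) // skip the header row
                .map(line -> line.split("\\s*\\|\\s*"))
                .map(cells -> {
                    final int id = Integer.parseInt(cells[0]);
                    final String name = cells[1];
                    // remaining columns: strip non-digits such as '%', then parse
                    final List<Integer> types = Arrays.stream(cells)
                            .skip(2)
                            .map(t -> Integer.parseInt(t.replaceAll("\\D", "")))
                            .collect(Collectors.toList());
                    return new Bean(id, name, types);
                })
                .collect(Collectors.toList());
    }
}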
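To see what this parsing produces without needing a file on disk, here is a small self-contained sketch that runs the same steps over the sample rows from the question. The class name PipeTableDemo and the method parseLines are made up for the illustration, the Bean here is a cut-down copy with package-private fields instead of getters, and the added trim() is an extra assumption to tolerate leading whitespace.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Illustrative only: PipeTableDemo and parseLines are hypothetical names.
class PipeTableDemo {

    // Cut-down version of the Bean class, with fields exposed for brevity.
    static class Bean {
        final int id;
        final String name;
        final List<Integer> types;

        Bean(int id, String name, List<Integer> types) {
            this.id = id;
            this.name = name;
            this.types = types;
        }
    }

    // Same steps as the file-based version, applied to lines already in memory.
    static List<Bean> parseLines(List<String> lines) {
        return lines.stream()
                .skip(1) // header row
                .map(line -> line.trim().split("\\s*\\|\\s*"))
                .map(cells -> new Bean(
                        Integer.parseInt(cells[0]),
                        cells[1],
                        Arrays.stream(cells)
                                .skip(2) // id and name were handled above
                                .map(t -> Integer.parseInt(t.replaceAll("\\D", "")))
                                .collect(Collectors.toList())))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "ID reference | Identifier | Type 1| Type 2 | Type 3 |",
                "1 | red#01 | 15% | 20% | 10% |",
                "2 | yellow#08 | 13% | 20% | 10% |");
        for (Bean b : parseLines(lines)) {
            System.out.println(b.id + " " + b.name + " " + b.types);
        }
        // prints:
        // 1 red#01 [15, 20, 10]
        // 2 yellow#08 [13, 20, 10]
    }
}
```

Because the types end up in a List<Integer>, a row with more or fewer Type columns needs no code changes.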
In essence, the code skips the first line, then loops over the remaining lines in the file, and for each line it:
1. Splits the line on the delimiter, which seems to be |. The split method takes a regex, so you need to escape the pipe because it is a special character. The pattern also consumes any spaces before and after the delimiter.
2. Creates a new Bean for each line by parsing the array elements.
3. Parses the first element to an int for the id.
4. Takes a Stream of the array elements, skips the first two, and parses the remaining ones into a List<Integer>.
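The delimiter handling is the part most likely to need changing, so here is a short sketch of it in isolation. It assumes two hypothetical helpers, splitPipes and splitTabs, in a made-up class DelimiterDemo: the first uses the escaped-pipe regex from the answer, the second shows what the split might look like if the file really is tab-separated as the question title says.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

// Hypothetical demo class; the helper names are illustrative only.
class DelimiterDemo {

    // Pipe-delimited: "\\|" is a literal pipe, and "\\s*" on either side
    // swallows the padding spaces. A trailing delimiter leaves no empty cell
    // because split drops trailing empty strings by default.
    static String[] splitPipes(String line) {
        return line.trim().split("\\s*\\|\\s*");
    }

    // Tab-delimited: "\t" has no special regex meaning, so no escaping is needed.
    // The limit -1 keeps trailing empty cells.
    static String[] splitTabs(String line) {
        return line.split("\t", -1);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitPipes("1 | red#01 | 15% | 20% | 10% |")));
        System.out.println(Arrays.toString(splitTabs("1\tred#01\t15%\t20%\t10%")));
        // Pattern.quote is an alternative to escaping the pipe by hand.
        System.out.println(Arrays.toString("a|b".split(Pattern.quote("|"))));
    }
}
```

Swapping splitPipes for splitTabs in the parsing step should be the only change needed for a genuinely tab-separated file, assuming the cells contain no tabs themselves.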