A strategy for parsing a tab-separated file

nanachan

What would be the most primitive way of parsing a tab-separated file in Java, so that the tabular data would not lose the structure? I am not looking for a way to do it with Bean or Jsoup, since they are not familiar to me, a beginner. I need advice on what would be the logic behind it and what would be the efficient way to do it, for example if I have a table like

ID reference | Identifier    | Type 1| Type 2  | Type 3 |
1            | red#01        | 15%   |  20%    | 10%    |
2            | yellow#08     | 13%   |  20%    | 10%    |

Correction: In this example I have Types 1 - 3, but my question applies to N number of types.

Can I achieve table parsing by just using arrays or is there a different data structure in Java that would be better for this task? This is how I think I should do it:

  1. Scan/read the first line splitting at "\t" and create a String array.
  2. Split that array into sub-arrays of 1 table heading per sub-array
  3. Then, start reading the next line of the table, and for each sub-array, add the corresponding values from the columns.

Does this plan sound right or am I overcomplicating things/being completely wrong? Is there an easier way to do it? (provided that I still don't know how to split arrays into subarrays and how to populate the subarrays with the values from the table)

Boris the Spider

I would strongly suggest you use a real flat-file parsing library for this, like the excellent OpenCSV.

Failing that, here is a solution in Java 8.

First, create a class to represent your data:

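// requires java.util.List
static class Bean {

    private final int id;
    private final String name;
    // one entry per "Type N" column, in file order
    private final List<Integer> types;

    public Bean(int id, String name, List<Integer> types) {
        this.id = id;
        this.name = name;
        this.types = types;
    }

    public int getId() { return id; }

    public String getName() { return name; }

    public List<Integer> getTypes() { return types; }

}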

Your suggestion to keep a separate array per column is a very script-like approach. Java is object-oriented, so use that to your advantage and model each row as an object.

Now we just need to parse the file:

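import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public static void main(final String[] args) throws Exception {
    final Path path = Paths.get("path", "to", "file.tsv");
    final List<Bean> parsed;
    // try-with-resources closes the underlying file when the stream is done
    try (final Stream<String> lines = Files.lines(path)) {
        parsed = lines.skip(1) // skip the header row
                .map(line -> line.split("\\s*\\|\\s*")) // split on | and drop surrounding spaces
                .map(cells -> {
                    final int id = Integer.parseInt(cells[0]);
                    final String name = cells[1];
                    // remaining cells are percentages: strip non-digits, then parse
                    final List<Integer> types = Arrays.stream(cells)
                            .skip(2)
                            .map(t -> Integer.parseInt(t.replaceAll("\\D", "")))
                            .collect(Collectors.toList());
                    return new Bean(id, name, types);
                })
                .collect(Collectors.toList());
    }
}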

In essence the code skips the first (header) line, then loops over the remaining lines in the file and for each line:

  1. Splits the line on the delimiter, which seems to be |. String.split takes a regular expression, so the pipe has to be escaped because it is a special character. The pattern also consumes any spaces before and after the delimiter.
  2. Creates a new Bean from the resulting array elements.
  3. To do that, it first parses the id to an int.
  4. Next it takes the name as-is.
  5. Finally it gets a Stream of the array elements, skips the first two, strips the non-digit characters (such as %) from the rest, and parses them into a List<Integer>.
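
The code above splits on |, as in the sample table. If the file is genuinely tab-delimited and the number of Type columns is not known in advance, a minimal sketch along the following lines keeps the header names next to each value. The class name TsvSketch and the method parse are hypothetical, chosen only for illustration, and the sketch assumes that cells never contain tabs or line breaks; for quoted or escaped fields a library such as OpenCSV is the safer choice.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration; not part of the original answer.
class TsvSketch {

    // Turns the lines of a tab-delimited table into one map per data row,
    // keyed by the column names taken from the first (header) line.
    static List<Map<String, String>> parse(List<String> lines) {
        List<Map<String, String>> rows = new ArrayList<>();
        if (lines.isEmpty()) {
            return rows;
        }
        // A limit of -1 keeps trailing empty cells instead of dropping them.
        String[] header = lines.get(0).split("\t", -1);
        for (String line : lines.subList(1, lines.size())) {
            if (line.isEmpty()) {
                continue; // skip blank lines
            }
            String[] cells = line.split("\t", -1);
            // LinkedHashMap preserves the column order of the file.
            Map<String, String> row = new LinkedHashMap<>();
            for (int i = 0; i < header.length; i++) {
                // Rows shorter than the header get an empty string for the missing cells.
                row.put(header[i].trim(), i < cells.length ? cells[i].trim() : "");
            }
            rows.add(row);
        }
        return rows;
    }
}
```

Assuming the file fits in memory, it could be called as TsvSketch.parse(Files.readAllLines(path)). Every value stays a String here, so percentages such as 15% would still need to be converted, for example the way the Bean version does it.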
