Regex for custom parsing

Tara

Regex isn't my strongest point. Let's say I need a custom parser for strings that strips out all letters, plus any extra decimal points and minus signs.

For example, if the input string is "--1-2.3-gf5.47", the parser should return "-12.3547". I could only come up with variations of this:

string.replaceAll("[^(\\-?)(\\.?)(\\d+)]", "")

which removes the letters but keeps everything else. Any pointers?
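Why the attempt keeps so much: inside square brackets, `(`, `)`, `?` and `+` lose their special meaning and become literal members of the class. So `[^(\\-?)(\\.?)(\\d+)]` simply means "any character other than a digit, `-`, `.`, `(`, `)`, `?` or `+`", and no hyphen or period is ever removed. A small check (the class name `CharClassDemo` is only for illustration):

```java
public class CharClassDemo {
    // The asker's pattern: one negated character class, not a sequence of groups.
    static String attempt(String s) {
        return s.replaceAll("[^(\\-?)(\\.?)(\\d+)]", "");
    }

    public static void main(String[] args) {
        // Letters are removed, but every hyphen and period survives.
        System.out.println(attempt("--1-2.3-gf5.47")); // --1-2.3-5.47
        // Parentheses, '?' and '+' are literal members of the class, so they survive too.
        System.out.println(attempt("a(b)?c+1"));       // ()?+1
    }
}
```

Because a character class only ever tests one character at a time, it cannot express "keep the first period but not later ones"; that needs either extra logic or a different kind of pattern.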

More examples:

Input: -34.le.78-90   Output: -34.7890
Input: df56hfp.78     Output: 56.78

Some rules:

  • Consider only the first negative sign before the first number; any other minus signs can be ignored.
  • I'm trying to do this in Java.
  • Assume the minus sign, if there is one, always occurs before the decimal point.
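The rules above can also be applied in a single left-to-right pass over the characters, without any regex. This is a minimal sketch, not either poster's code; the names `RuleBasedParse` and `parse` are hypothetical, and it assumes one reading of the first rule: a `-` counts as the sign only while nothing else has been kept yet.

```java
public class RuleBasedParse {
    // Hypothetical helper: applies the question's rules in one left-to-right pass.
    static String parse(String input) {
        StringBuilder out = new StringBuilder();
        boolean seenDot = false; // only the first '.' is kept
        for (char c : input.toCharArray()) {
            if (c >= '0' && c <= '9') {
                // ASCII digits only, matching what \d means in a Java regex by default
                out.append(c);
            } else if (c == '-' && out.length() == 0) {
                // a '-' is kept only while nothing has been kept yet,
                // i.e. it is the first minus before any digit or decimal point
                out.append(c);
            } else if (c == '.' && !seenDot) {
                out.append(c);
                seenDot = true;
            }
            // anything else (letters, later '-' signs, extra '.') is dropped
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(parse("--1-2.3-gf5.47")); // -12.3547
        System.out.println(parse("-34.le.78-90"));   // -34.7890
        System.out.println(parse("df56hfp.78"));     // 56.78
    }
}
```

Under this reading, a sign that follows some letters still counts, so "df-56" becomes "-56".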
Matthew

I just tested this on Ideone and it seems to work. The comments should explain the code well enough; you can copy and paste it into Ideone.com to test it yourself.

It might be possible to do this with a single regex pattern, but you're probably better off with something simpler and more readable, like the code below.

For the three examples you gave, it prints:

--1-2.3-gf5.47   ->   -12.3547
-34.le.78-90     ->   -34.7890
df56hfp.78       ->    56.78

import java.util.*;
import java.lang.*;
import java.io.*;

/* Name of the class has to be "Main" only if the class is public. */
class Ideone
{
    public static void main (String[] args) throws java.lang.Exception
    {
        System.out.println(strip_and_parse("--1-2.3-gf5.47"));
        System.out.println(strip_and_parse("-34.le.78-90"));
        System.out.println(strip_and_parse("df56hfp.78"));
    }

    public static String strip_and_parse(String input)
    {
        //remove anything that is not a period or a digit (this drops every hyphen too)
        String output = input.replaceAll("[^\\.\\d]", "");

        //add a hyphen to the beginning of 'output' if a hyphen appears before the
        //first digit of the original string (e.g. "--1-2.3" or "df-56")
        if (input.matches("(?s)\\D*-.*"))
        {
            output = "-" + output;
        }

        //if the string contains a decimal point, keep the first one and remove the rest
        //by splitting the output string just after that point and deleting every
        //decimal point from the second half
        int firstDot = output.indexOf('.');
        if (firstDot != -1)
        {
            output = output.substring(0, firstDot + 1)
                   + output.substring(firstDot + 1).replace(".", "");
        }

        return output;
    }
}
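The step that splits the string after the first decimal point can also be written with `Matcher.appendReplacement` and `Matcher.appendTail` from `java.util.regex`, which let you choose a different substitution for each match. A sketch, assuming the string has already been reduced to digits, periods and an optional leading hyphen; `FirstDotOnly` and `keepFirstDot` are illustrative names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FirstDotOnly {
    private static final Pattern DOT = Pattern.compile("\\.");

    // Keeps the first '.' in s and deletes every later one.
    static String keepFirstDot(String s) {
        Matcher m = DOT.matcher(s);
        StringBuffer sb = new StringBuffer(); // StringBuffer overload works on Java 8; Java 9+ also accepts StringBuilder
        boolean first = true;
        while (m.find()) {
            // copy the text before the match, then substitute "." once and "" afterwards
            m.appendReplacement(sb, first ? "." : "");
            first = false;
        }
        m.appendTail(sb); // copy whatever follows the last match
        return sb.toString();
    }

    public static void main(String[] args) {
        // inputs already reduced to digits, periods and an optional leading hyphen
        System.out.println(keepFirstDot("-12.35.47")); // -12.3547
        System.out.println(keepFirstDot("-34..7890")); // -34.7890
        System.out.println(keepFirstDot("56.78"));     // 56.78
    }
}
```

The substring approach is shorter for this one case; the `appendReplacement` loop is mainly useful when the replacement for each match depends on its position or content.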
