XSLT + regular expression replace

tschlein

I have an XML snippet like this:

...
<housenumber>23</housenumber>
...
<housenumber>453a</housenumber>
...
<housenumber>76-79</housenumber>
...
<housenumber>12 foo bar something 43</housenumber>
...

How can I split these house numbers into two parts with XSLT so that I get two variables: the first containing "everything from position 1 to the first occurrence of a non-numeric character" and the second containing "everything else"?

So something like this:

...
<housenumber>23</housenumber>
<!-- v1 = 23, v2 = null -->
...
<housenumber>453a</housenumber>
<!-- v1 = 453, v2 = a -->
...
<housenumber>76-79</housenumber>
<!-- v1 = 76, v2 = -79 -->
...
<housenumber>12 foo bar something 43</housenumber>
<!-- v1 = 12, v2 = foo bar something 43 -->
...

Any hints/ideas?

Thanks.

Martin Honnen

As already pointed out in a comment, analyze-string can help. Here is an example using XSLT 3.0 (as supported by Saxon 9.7 or Exselt) that uses the XPath 3.0 function analyze-string (https://www.w3.org/TR/xpath-functions-30/#func-analyze-string):

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:fn="http://www.w3.org/2005/xpath-functions"
    xmlns:math="http://www.w3.org/2005/xpath-functions/math"
    exclude-result-prefixes="xs math fn"
    version="3.0">

    <xsl:template match="root">
        <xsl:variable name="matches" select="housenumber/analyze-string(., '(^[0-9]+)([^0-9]?.*)')//fn:match"/>
        <xsl:variable name="v1" select="$matches//fn:group[@nr = 1]/xs:integer(.)"/>
        <xsl:variable name="v2" select="$matches//fn:group[@nr = 2]/string()"/>
        <integers>
            <xsl:value-of select="$v1" separator=","/>
        </integers>
        <strings>
            <xsl:value-of select="$v2" separator=","/>
        </strings>
    </xsl:template>
</xsl:stylesheet>
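For orientation: fn:analyze-string returns an fn:analyze-string-result element in which matched text is wrapped in fn:match and each capturing group in an fn:group carrying an nr attribute. Those are the nodes the paths //fn:match and fn:group[@nr = 1] in the stylesheet above select. For a single value such as 453a the result looks roughly like this (indented for readability; the function itself adds no whitespace). Because the regex consumes the whole string, there is no fn:non-match element.

```xml
<!-- Result of analyze-string('453a', '(^[0-9]+)([^0-9]?.*)') -->
<fn:analyze-string-result xmlns:fn="http://www.w3.org/2005/xpath-functions">
    <fn:match>
        <!-- group 1: the leading digits -->
        <fn:group nr="1">453</fn:group>
        <!-- group 2: everything after the digits -->
        <fn:group nr="2">a</fn:group>
    </fn:match>
</fn:analyze-string-result>
```

For 23, group 2 captures an empty string, which is consistent with the leading comma in the strings output below.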

With the sample input

<root>
    <housenumber>23</housenumber>
    ...
    <housenumber>453a</housenumber>
    ...
    <housenumber>76-79</housenumber>
    ...
    <housenumber>12 foo bar something 43</housenumber>
</root>

I get the output

<integers>23,453,76,12</integers><strings>,a,-79, foo bar something 43</strings>
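The last entry in strings keeps the space that follows 12, whereas the question expects "foo bar something 43". A minimal tweak, assuming leading and trailing whitespace is never significant in a house number, is to normalize group 2 when building $v2 in the XSLT 3.0 stylesheet:

```xml
<!-- Replacement for the $v2 variable: normalize-space() strips leading and
     trailing whitespace (and collapses internal runs of whitespace) -->
<xsl:variable name="v2" select="$matches//fn:group[@nr = 2]/normalize-space(.)"/>
```

With that change the second element should read <strings>,a,-79,foo bar something 43</strings>.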

With XSLT 2.0, which does not have the fn:analyze-string function, you can use the xsl:analyze-string instruction instead:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs"
    version="2.0">

    <xsl:template match="root">
        <xsl:variable name="matches" as="element(match)*">
            <xsl:apply-templates select="housenumber"/>
        </xsl:variable>
        <xsl:variable name="v1" select="$matches//group[@nr = 1]/xs:integer(.)"/>
        <xsl:variable name="v2" select="$matches//group[@nr = 2]/string()"/>
        <integers>
            <xsl:value-of select="$v1" separator=","/>
        </integers>
        <strings>
            <xsl:value-of select="$v2" separator=","/>
        </strings>
    </xsl:template>

    <xsl:template match="housenumber">
        <xsl:analyze-string select="." regex="(^[0-9]+)([^0-9]?.*)">
            <xsl:matching-substring>
                <match>
                    <group nr="1">
                        <xsl:value-of select="regex-group(1)"/>
                    </group>
                    <group nr="2">
                        <xsl:value-of select="regex-group(2)"/>
                    </group>
                </match>
            </xsl:matching-substring>
        </xsl:analyze-string>
    </xsl:template>

</xsl:stylesheet>
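Both stylesheets collect all parts into two lists. If you instead want v1 and v2 per housenumber, as sketched in the question, here is a minimal XSLT 2.0 sketch (the comment format and the literal null for an empty second part are assumptions taken from the question's illustration). It copies the input unchanged and adds a comment after each housenumber. Group 2 is passed through normalize-space(), so the last example yields "foo bar something 43" without the leading space. A value that does not start with a digit gets no comment.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    version="2.0">

    <!-- Identity template: copy every node not handled by a more specific template -->
    <xsl:template match="@* | node()">
        <xsl:copy>
            <xsl:apply-templates select="@* | node()"/>
        </xsl:copy>
    </xsl:template>

    <!-- Keep each housenumber and follow it with a comment holding v1 and v2 -->
    <xsl:template match="housenumber">
        <xsl:copy-of select="."/>
        <!-- group 1: leading digits; group 2: everything after them -->
        <xsl:analyze-string select="." regex="^([0-9]+)(.*)$">
            <xsl:matching-substring>
                <xsl:variable name="v1" select="regex-group(1)"/>
                <xsl:variable name="v2" select="normalize-space(regex-group(2))"/>
                <!-- xsl:comment/@select is available from XSLT 2.0 on -->
                <xsl:comment select="concat(' v1 = ', $v1, ', v2 = ',
                                            if ($v2 = '') then 'null' else $v2, ' ')"/>
            </xsl:matching-substring>
        </xsl:analyze-string>
    </xsl:template>

</xsl:stylesheet>
```

For the sample input this should produce comments such as <!-- v1 = 453, v2 = a --> and <!-- v1 = 23, v2 = null --> directly after the corresponding housenumber elements.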
