I have read that it's a bad idea to parse XML/HTML using regular expressions. The alternative suggestion is to use an XML parser. Does one exist in the BigQuery Standard SQL library?
Here is the documentation on how to use JavaScript UDFs in BigQuery, as Elliot mentioned:
https://cloud.google.com/bigquery/docs/reference/standard-sql/user-defined-functions
I imagine the UDF might look something like this:
CREATE TEMPORARY FUNCTION XML(x STRING)
RETURNS STRING
LANGUAGE js AS """
var data = fromXML(x);
return data.title;
"""
OPTIONS(
library="gs://<BUCKET_NAME>/from-xml.min.js"
);
SELECT XML(a) FROM UNNEST(["<title>Title of Page</title>"]) AS a;
Here, from-xml.min.js is the minified build of this library, uploaded to your own GCS bucket (the <BUCKET_NAME> placeholder above).
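If you want to check the UDF body before uploading anything, you can run the same logic locally in Node.js. The following is only a rough sketch: `extractTitle` and `stubParse` are hypothetical names, and the stub stands in for `fromXML` on the assumption that the real parser returns a plain object keyed by tag name (which is what the UDF above relies on).

```javascript
// Hypothetical helper mirroring the UDF body. The parser is passed in as an
// argument so the logic can be exercised without BigQuery or the real library.
function extractTitle(x, parse) {
  // Guard against missing input instead of letting the parser throw.
  if (x === null || x === undefined) {
    return null;
  }
  const data = parse(x);
  // If there is no string-valued title, return null rather than undefined.
  return data && typeof data.title === "string" ? data.title : null;
}

// Stand-in for fromXML, used only for this local check. It is NOT a real
// parser; it just returns the object shape the UDF is assumed to receive.
function stubParse(xml) {
  return xml.includes("<title>") ? { title: "Title of Page" } : {};
}

console.log(extractTitle("<title>Title of Page</title>", stubParse));
console.log(extractTitle("<p>No title here</p>", stubParse));
```

In the actual UDF you would pass `fromXML` itself; the null checks are there so that rows without a title do not make the whole query fail.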