How to get the position of a regex match in a string in PostgreSQL?

Posted by wobmene on Dev

Wobmene

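I have a table of book titles. I want to select the books whose titles match a regexp and order the results by the position of the regexp match within the title.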

A single-word search is easy. For example:

TABLE book
id   title
1    The Sun
2    The Dead Sun
3    Sun Kissed

I insert .* into the client's search term before the query is sent to the database, so the SQL here already contains the prepared regexp.

SELECT book.id, book.title FROM book
    WHERE book.title ~* '.*sun.*'
    ORDER BY COALESCE(NULLIF(position('sun' in lower(book.title)), 0), 999999) ASC;  -- lower(): position() is case-sensitive, ~* is not

RESULT
id   title
3    Sun Kissed
1    The Sun
2    The Dead Sun

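But if the search term has several words, I want to match titles that contain all of the search words, with anything in between, and sort by match position as before. So I need a function that returns the position of a regexp match, and I couldn't find a suitable one in the official PostgreSQL documentation.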

TABLE book
id   title
4    Deep Space Endeavor
5    Star Trek: Deep Space Nine: The Never Ending Sacrifice
6    Deep Black: Space Espionage and National Security

SELECT book.id, book.title FROM book
    WHERE book.title ~* '.*deep.*space.*'
    ORDER BY ???REGEXP_POSITION_FUNCTION???('.*deep.*space.*' in book.title);

DESIRED RESULT
id   title
4    Deep Space Endeavor
6    Deep Black: Space Espionage and National Security
5    Star Trek: Deep Space Nine: The Never Ending Sacrifice

I couldn't find anything like ???REGEXP_POSITION_FUNCTION???. Any ideas?

Erwin Brandstetter

One way (of many): remove the rest of the string starting at the match, and measure the length of the truncated string:

SELECT id, title
FROM   book
WHERE  title ILIKE '%deep%space%'
ORDER  BY length(regexp_replace(title, 'deep.*space.*', '','i'));

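Use ILIKE in the WHERE clause, since that is typically faster (and does the same here).
Also note the fourth parameter of regexp_replace() ('i'), which makes the match case-insensitive.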
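A minimal, self-contained sketch of the same idea, assuming a scratch PostgreSQL session: it loads the three sample titles into a temporary table named book and exposes the sort key as a column, so you can see that the length of the remaining prefix is the 0-based offset of the match (0 for the two titles starting with "Deep", 11 for "Star Trek: Deep ..."). The extra id sort key is an addition for this sketch only; without it, rows 4 and 6 tie and may come back in either order.

```sql
-- Scratch data: a TEMPORARY table named book (it shadows any permanent
-- table called book for this session only).
CREATE TEMP TABLE book (id int PRIMARY KEY, title text);
INSERT INTO book (id, title) VALUES
  (4, 'Deep Space Endeavor'),
  (5, 'Star Trek: Deep Space Nine: The Never Ending Sacrifice'),
  (6, 'Deep Black: Space Espionage and National Security');

-- Without the 'g' flag, regexp_replace() replaces only the first (leftmost)
-- match, so everything from the first "deep...space" to the end is removed.
-- The remaining prefix has a length equal to the 0-based offset of the match.
SELECT id,
       title,
       length(regexp_replace(title, 'deep.*space.*', '', 'i')) AS match_offset
FROM   book
WHERE  title ILIKE '%deep%space%'
ORDER  BY match_offset, id;  -- id only breaks ties (rows 4 and 6 both have 0)
```

This returns the rows in the order 4, 6, 5, which is the desired result from the question.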

Alternatives

As requested in the comments.
Also demonstrates how to sort matches first (and NULLS LAST).

SELECT id, title
      ,substring(title FROM '(?i)(^.*)deep.*space.*') AS sub1
      ,length(substring(title FROM '(?i)(^.*)deep.*space.*')) AS pos1

      ,substring(title FROM '(?i)^.*(?=deep.*space.*)') AS sub2
      ,length(substring(title FROM '(?i)^.*(?=deep.*space.*)')) AS pos2

      ,substring(title FROM '(?i)^.*(deep.*space.*)') AS sub3
      ,position((substring(title FROM '(?i)^.*(deep.*space.*)')) IN title) AS p3

      ,regexp_replace(title, 'deep.*space.*', '','i') AS reg4
      ,length(regexp_replace(title, 'deep.*space.*', '','i')) AS pos4
FROM   book
ORDER  BY title ILIKE '%deep%space%' DESC NULLS LAST
         ,length(regexp_replace(title, 'deep.*space.*', '','i'));
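If you are on PostgreSQL 15 or later, there is a built-in function that does what ???REGEXP_POSITION_FUNCTION??? asks for: regexp_instr(string, pattern [, start [, N [, endoption [, flags [, subexpr]]]]]) returns the 1-based position of the N'th match, or 0 if there is none. A sketch, assuming PostgreSQL 15+ and the same three sample rows in a temporary book table; it will not run on older versions, where the regexp_replace() approach above still applies.

```sql
-- Requires PostgreSQL 15 or later: regexp_instr() does not exist before that.
-- Scratch data: a TEMPORARY table named book (session-local).
CREATE TEMP TABLE book (id int PRIMARY KEY, title text);
INSERT INTO book (id, title) VALUES
  (4, 'Deep Space Endeavor'),
  (5, 'Star Trek: Deep Space Nine: The Never Ending Sacrifice'),
  (6, 'Deep Black: Space Espionage and National Security');

-- regexp_instr(string, pattern, start, N, endoption, flags)
--   start = 1     : search from the first character
--   N = 1         : use the first match
--   endoption = 0 : return where the match begins (1 = position after it)
--   flags = 'i'   : case-insensitive
-- Result is the 1-based position of the match, or 0 if there is none.
SELECT id,
       title,
       regexp_instr(title, 'deep.*space', 1, 1, 0, 'i') AS match_pos
FROM   book
WHERE  title ~* 'deep.*space'
ORDER  BY match_pos, id;  -- id only breaks ties between equal positions
```

Because regexp_instr() is 1-based, "Star Trek: Deep Space Nine ..." gets 12 rather than the 11 from length(regexp_replace(...)); the resulting order (4, 6, 5) is the same.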

You can find documentation for all of the above in the manual here and here.

-> SQLfiddle demonstrating all of it.
