Matching with missing spaces in ElasticSearch

David Pfeffer Published at Dev

David Pfeffer

I have documents that I want to index in ElasticSearch that contain a text field called name. I currently index the name using the snowball analyzer. However, I would like to match names both with and without the spaces they contain. For example, a document with the name "The Home Depot" should match "homedepot", "home", and "home depot". Additionally, documents with a single-word name like "ExxonMobil" should match "exxon mobil" and "exxonmobil".

I can't seem to find the right combination of analyzer/filters to accomplish this.

femtoRgon

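I think the most direct approach to this problem would be to apply a Shingle token filter, which, instead of creating ngrams of characters, creates combinations of incoming tokens. Setting token_separator to an empty string joins adjacent tokens without a space, so "home" followed by "depot" also produces "homedepot". You can add it to your analyzer with something like: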

filter:
    ........
    my_shingle_filter:
        type: shingle
        min_shingle_size: 2
        max_shingle_size: 3
        output_unigrams: true
        token_separator: ""

Be mindful of where this filter is placed in your filter chain. It should probably come late in the chain, after all token separation, removal, and replacement has already occurred (i.e., after any StopFilters, SynonymFilters, stemmers, etc.).
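To make the effect of those settings concrete, here is a minimal Python sketch that approximates what a shingle filter with these parameters emits. It is not Elasticsearch code: the function name `shingles` is hypothetical, the input token lists are assumed to be what the earlier filters (lowercasing, stop-word removal) would produce, and it ignores details such as position gaps left by removed stop words, which the real filter may fill with a placeholder token.

```python
def shingles(tokens, min_size=2, max_size=3, output_unigrams=True, separator=""):
    """Approximate a shingle token filter over already-analyzed tokens.

    Simplification (assumption): position gaps from removed stop words
    are ignored, so no filler tokens are produced.
    """
    out = []
    for start in range(len(tokens)):
        if output_unigrams:
            # Keep the original single token as well.
            out.append(tokens[start])
        for size in range(min_size, max_size + 1):
            if start + size > len(tokens):
                break
            # Join adjacent tokens; an empty separator glues them together.
            out.append(separator.join(tokens[start:start + size]))
    return out


# "The Home Depot", assuming "the" was dropped as a stop word upstream.
indexed_home_depot = shingles(["home", "depot"])
print(indexed_home_depot)

# "ExxonMobil" is assumed to stay a single lowercased token at index time.
indexed_exxon = shingles(["exxonmobil"])

# The query "exxon mobil" goes through the same analysis at search time.
query_exxon = shingles(["exxon", "mobil"])
print(set(indexed_exxon) & set(query_exxon))
```

Under these assumptions, the indexed name yields "home", "depot", and "homedepot", and the query "exxon mobil" produces an "exxonmobil" shingle that overlaps with the indexed single-word name. This is why the same analyzer would need to be applied at both index and search time.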

Edited at 2021-02-04