Matching with missing spaces in ElasticSearch

David Pfeffer

I have documents that I want to index in ElasticSearch, each containing a text field called name. I currently index the name using the snowball analyzer. However, I would like to match names both with and without the spaces between words. For example, a document with the name "The Home Depot" should match "homedepot", "home", and "home depot". Additionally, documents with a single-word name like "ExxonMobil" should match "exxon mobil" and "exxonmobil".

I can't seem to find the right combination of analyzer/filters to accomplish this.

femtoRgon

I think the most direct approach to this problem would be to apply a Shingle token filter, which, instead of creating ngrams of characters, creates combinations of adjacent incoming tokens. With an empty token separator, those combinations are the words joined without spaces. You can add it to your analyzer with something like:

filter:
    ........
    my_shingle_filter:
        type: shingle
        min_shingle_size: 2
        max_shingle_size: 3
        output_unigrams: true
        token_separator: ""
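
To make the effect of these settings concrete, here is a small Python sketch that imitates what a shingle filter with an empty separator emits. It is a simplified illustration, not the Lucene implementation: the function name `shingles` is made up for this example, the input is assumed to be already tokenized and lowercased, and position gaps left by removed stop words are ignored.

```python
def shingles(tokens, min_size=2, max_size=3, separator="", output_unigrams=True):
    """Simplified imitation of a shingle token filter (illustrative only)."""
    out = []
    for i in range(len(tokens)):
        # output_unigrams: true keeps the original single tokens as well
        if output_unigrams:
            out.append(tokens[i])
        # Join each run of min_size..max_size adjacent tokens starting at i
        for size in range(min_size, max_size + 1):
            if i + size <= len(tokens):
                out.append(separator.join(tokens[i:i + size]))
    return out


# "The Home Depot" after lowercasing and stop-word removal (assumed input)
print(shingles(["home", "depot"]))  # ['home', 'homedepot', 'depot']

# A query "exxon mobil" only meets the indexed term "exxonmobil" if the
# query text is shingled too, so the analyzer is assumed to run at search time.
indexed = set(shingles(["exxonmobil"]))
query = set(shingles(["exxon", "mobil"]))
print(indexed & query)  # {'exxonmobil'}
```

The second case shows why the same analyzer would need to apply to the query as well: a single-word name produces no shingles of its own, so the joined form has to come from the query side.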

You should be mindful of where this filter is placed in your filter chain. It should probably come late in the chain, after all token separation, removal, and replacement has already occurred (i.e. after any StopFilters, SynonymFilters, stemmers, etc.).
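
As a rough sketch of that ordering, the index settings could be assembled as a Python dictionary and printed as JSON for use when creating the index. The analyzer name `name_shingles` is hypothetical, and the chain of `lowercase`, `stop`, and `snowball` is only an assumed approximation of the snowball analyzer mentioned in the question; the one point the sketch is meant to show is that the shingle filter comes last.

```python
import json

# Hypothetical index settings; adjust names and filters to your own setup.
settings = {
    "analysis": {
        "filter": {
            "my_shingle_filter": {
                "type": "shingle",
                "min_shingle_size": 2,
                "max_shingle_size": 3,
                "output_unigrams": True,
                "token_separator": "",
            }
        },
        "analyzer": {
            "name_shingles": {  # hypothetical analyzer name
                "type": "custom",
                "tokenizer": "standard",
                # Shingles are built last, after stop-word removal and stemming
                "filter": ["lowercase", "stop", "snowball", "my_shingle_filter"],
            }
        },
    }
}

# Body that could be sent when creating the index
print(json.dumps({"settings": settings}, indent=2))
```

One thing worth checking with a chain like this: when the stop filter drops a word, the shingle filter can insert a filler token (an underscore by default, configurable through `filler_token`) into shingles that span the gap, so it is worth inspecting the analyzer's actual output for a name such as "The Home Depot".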
