How to add an index to a Database using the ELKI Java API for a custom POJO with String-type fields

Pritom

I am using DBSCAN to cluster some categorical data stored in a POJO. My class looks like this:

public class Dimension {
    private String app;
    private String node;
    private String cluster;
    // .............
}

All my fields are Strings rather than integers or floats because they hold discrete/categorical values. The rest of my code is as follows:

    final SimpleTypeInformation<Dimension> dimensionTypeInformation = new SimpleTypeInformation<>(Dimension.class);
    PrimitiveDistanceFunction<Dimension> dimensionPrimitiveDistanceFunction = new PrimitiveDistanceFunction<Dimension>() {
        public double distance(Dimension d1, Dimension d2) {
            return simpleMatchingCoefficient(d1, d2);
        }
        public SimpleTypeInformation<? super Dimension> getInputTypeRestriction() {
            return dimensionTypeInformation;
        }
        public boolean isSymmetric() {
            return true;
        }
        public boolean isMetric() {
            return true;
        }
        public <T extends Dimension> DistanceQuery<T> instantiate(Relation<T> relation) {
            return new PrimitiveDistanceQuery<>(relation, this);
        }
    };
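    DatabaseConnection dbc = new DimensionDatabaseConnection(dimensionList);
    // The second argument is the collection of index factories; null means no index is built.
    Database db = new StaticArrayDatabase(dbc, null);
    db.initialize();
    // epsilon = 0.6 (in units of the distance function above), minPts = 20
    DBSCAN<Dimension> dbscan = new DBSCAN<>(dimensionPrimitiveDistanceFunction, 0.6, 20);
    Result result = dbscan.run(db);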
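The snippet calls simpleMatchingCoefficient without showing it. The simple matching coefficient is usually defined as a similarity (matching attributes / all attributes), while DBSCAN's epsilon needs a distance, typically 1 minus the coefficient (a normalized Hamming distance). That form satisfies the triangle inequality, which is what justifies isMetric() returning true. A minimal sketch, assuming only the three fields shown above (the real class has more); the constructor and the helper class SimpleMatching are hypothetical:

```java
import java.util.Objects;

/** Hypothetical stand-in for the question's POJO; only the three visible fields are assumed. */
final class Dimension {
    final String app;
    final String node;
    final String cluster;

    Dimension(String app, String node, String cluster) {
        this.app = app;
        this.node = node;
        this.cluster = cluster;
    }
}

final class SimpleMatching {
    private SimpleMatching() {}

    /**
     * Simple matching distance: the fraction of attributes whose values differ
     * (0 = identical, 1 = every attribute differs). Objects.equals treats two
     * nulls as a match.
     */
    static double distance(Dimension a, Dimension b) {
        int mismatches = 0;
        if (!Objects.equals(a.app, b.app)) mismatches++;
        if (!Objects.equals(a.node, b.node)) mismatches++;
        if (!Objects.equals(a.cluster, b.cluster)) mismatches++;
        return mismatches / 3.0;
    }
}
```

With exactly three attributes this distance can only take the values 0, 1/3, 2/3 and 1, so epsilon = 0.6 would effectively mean "at most one mismatching attribute".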

As expected, this code works fine for a small dataset but gets very slow as the dataset grows, so I want to add an index to speed it up. However, every index I could think of requires me to implement NumberVector, and my class has only Strings, no numbers. What index can I use in this case? Can I use the distance function, double simpleMatchingCoefficient(Dimension d1, Dimension d2), to create an IndexFactory?

Thanks in advance.

Has QUIT--Anony-Mousse

There are (at least) three broad families of indexes:

  1. Coordinate-based indexes, such as the k-d-tree and R-tree. These work well on dense, continuous variables.
  2. Metric indexes, which require the distance function to satisfy the triangle inequality. These can work on any kind of data, but may still need a fairly smooth distribution of distance values (e.g., they will not help with the discrete metric, which is 0 if x=y and 1 otherwise).
  3. Inverted lookup indexes. They are mostly used for text search and exploit the fact that, for each attribute, only a small subset of the data is relevant. These work well for high-cardinality discrete attributes.
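To make point 2 concrete, here is a minimal sketch of the simplest metric index, a pivot table in the spirit of LAESA: distances from every object to a few pivots are precomputed, and the triangle inequality |d(q,p) - d(x,p)| <= d(q,x) lets a range query discard x without computing d(q,x). This is plain Java, not ELKI's API; the class PivotTable and its distanceCalls counter are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.ToDoubleBiFunction;

/** Pivot-table range search. Results are correct only if dist satisfies the triangle inequality. */
final class PivotTable<T> {
    private final List<T> data;
    private final List<T> pivots;
    private final ToDoubleBiFunction<T, T> dist;
    private final double[][] toPivot; // toPivot[i][j] = dist(data[i], pivots[j])
    int distanceCalls; // number of data objects that had to be verified with a real distance call

    PivotTable(List<T> data, List<T> pivots, ToDoubleBiFunction<T, T> dist) {
        this.data = data;
        this.pivots = pivots;
        this.dist = dist;
        this.toPivot = new double[data.size()][pivots.size()];
        for (int i = 0; i < data.size(); i++) {
            for (int j = 0; j < pivots.size(); j++) {
                toPivot[i][j] = dist.applyAsDouble(data.get(i), pivots.get(j));
            }
        }
    }

    /** Returns the positions of all objects within eps of q. */
    List<Integer> rangeQuery(T q, double eps) {
        double[] qToPivot = new double[pivots.size()];
        for (int j = 0; j < pivots.size(); j++) {
            qToPivot[j] = dist.applyAsDouble(q, pivots.get(j));
        }
        List<Integer> result = new ArrayList<>();
        outer:
        for (int i = 0; i < data.size(); i++) {
            for (int j = 0; j < pivots.size(); j++) {
                // Lower bound from the triangle inequality: d(q,x) >= |d(q,p) - d(x,p)|
                if (Math.abs(qToPivot[j] - toPivot[i][j]) > eps) {
                    continue outer; // pruned without computing d(q, x)
                }
            }
            distanceCalls++;
            if (dist.applyAsDouble(q, data.get(i)) <= eps) {
                result.add(i);
            }
        }
        return result;
    }
}
```

With the discrete metric every lower bound is 0 or 1, so for eps below 1 the only object ever pruned is one identical to the pivot: nearly every object still has to be verified, which is exactly the caveat in point 2.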

In your case, I'd consider an inverted index. A metric index may work if you have a lot of attributes, but I doubt that is the case, since you store your data in POJOs with String fields.
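A minimal sketch of an inverted index for this kind of data, assuming each record is a String[] of m categorical attributes and the distance is the simple matching distance (mismatches / m). One postings list per (attribute, value) pair is used to count how many attributes each record shares with the query, so records that share no value with the query are never touched. This is standalone Java, not ELKI code: CategoricalInvertedIndex is hypothetical, and using it from ELKI would still require wrapping it as an index that answers the range queries DBSCAN issues.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical inverted index answering exact range queries under the simple matching distance. */
final class CategoricalInvertedIndex {
    private final int m; // number of attributes per record
    // postings.get(a).get(v) = ids of records whose attribute a equals v
    private final List<Map<String, List<Integer>>> postings = new ArrayList<>();
    private int size;

    CategoricalInvertedIndex(int m) {
        this.m = m;
        for (int a = 0; a < m; a++) {
            postings.add(new HashMap<>());
        }
    }

    /** Adds a record and returns its id (ids are assigned 0, 1, 2, ...). */
    int add(String[] record) {
        int id = size++;
        for (int a = 0; a < m; a++) {
            postings.get(a).computeIfAbsent(record[a], k -> new ArrayList<>()).add(id);
        }
        return id;
    }

    /** Sorted ids of all records whose distance (mismatches / m) to q is at most eps. */
    List<Integer> rangeQuery(String[] q, double eps) {
        List<Integer> result = new ArrayList<>();
        if (eps >= 1.0) { // the distance never exceeds 1, so every record qualifies
            for (int id = 0; id < size; id++) result.add(id);
            return result;
        }
        // Count matching attributes per record; records absent from the map have distance 1 > eps.
        Map<Integer, Integer> matches = new HashMap<>();
        for (int a = 0; a < m; a++) {
            for (int id : postings.get(a).getOrDefault(q[a], List.of())) {
                matches.merge(id, 1, Integer::sum);
            }
        }
        for (Map.Entry<Integer, Integer> e : matches.entrySet()) {
            double d = (m - e.getValue()) / (double) m;
            if (d <= eps) result.add(e.getKey());
        }
        result.sort(null);
        return result;
    }
}
```

The savings depend on how values are distributed: a value shared by most records (say, one dominant app) has a long postings list, and any query containing it will touch most of the data.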

And of course, profile your code and check whether you can improve the implementation of your distance function! For example, string interning may help: it can reduce string matching to a reference-equality test instead of a character-by-character comparison.
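A sketch of the interning idea, assuming all fields are non-null (calling intern() on a null reference throws a NullPointerException). Values are interned once at construction, so the distance can compare references with != instead of calling equals(). InternedRecord is a hypothetical replacement for the question's Dimension class.

```java
/** Hypothetical record whose categorical values are interned once, at construction time. */
final class InternedRecord {
    final String app;
    final String node;
    final String cluster;

    InternedRecord(String app, String node, String cluster) {
        // String.intern() returns the canonical instance for this content,
        // so equal values end up sharing a single String object.
        this.app = app.intern();
        this.node = node.intern();
        this.cluster = cluster.intern();
    }

    /** Simple matching distance via reference comparison; valid only because every field is interned. */
    static double distance(InternedRecord a, InternedRecord b) {
        int mismatches = 0;
        if (a.app != b.app) mismatches++;
        if (a.node != b.node) mismatches++;
        if (a.cluster != b.cluster) mismatches++;
        return mismatches / 3.0;
    }
}
```

String.equals() already returns immediately when both references point to the same object, so interning mainly helps when equal values arrive as separate String objects (e.g., parsed from a file), which equals() would otherwise compare character by character. Whether that matters for your data is what profiling should tell you.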
