Most efficient RowScan of very large Bigtable table

VS_FF

I am looking for the fastest way to perform a row scan of very large Bigtable tables using the latest Java API. I only need to scan based on partial row key values (no column or column family information is needed). The row keys are well distributed, and Bigtable's lexicographic sorting works well for this use case.

There are a lot of answers on this topic from over the years, but some of them are outdated and apply to older versions, and some seem to be HBase-specific or shell-specific. I need an answer specifically for Cloud Bigtable and the latest version of the Java API.

For now, based on my own testing, I see this as the best approach:

Scan s = new Scan();
s.setStartRow(startRowKey); // this can also be passed to constructor
s.setStopRow(stopRowKey); // this can also be passed to constructor
s.setRowPrefixFilter(key.getBytes());
s.setFilter(new PageFilter(MaxResult));
s.setFilter(new KeyOnlyFilter());

But my questions are:

1: Is there something I'm not aware of that I should be doing to improve the speed?

2: Is there a better way to limit the results than PageFilter()? I.e., how can I say "return at most 25 rows"?

3: What is the difference between scan.setFilter(new PrefixFilter(rowKey)) and scan.setRowPrefixFilter(rowKey)?

4: The advantage of passing the startRow parameter to the scan is very clear, but is there any advantage (or disadvantage) to passing the endRow parameter as well, particularly if you are providing the PageSize() or another limit measure?

Thanks for any feedback!

Igor Bernstein

It seems like your filters are clobbering each other: each call to setFilter replaces the previous filter, so the KeyOnlyFilter overwrites the PageFilter. You should wrap them in a MUST_PASS_ALL FilterList.

  1. Other than the bug I mentioned above, I can't think of any other optimizations.
  2. I don't believe the HBase API provides another way to specify the row limit.
  3. In your case, not much. The main reason to use a PrefixFilter is to be able to chain it together with other filters in a FilterList.
  4. There is definitely no downside to adding an endRow, but at the same time, I don't think there is much gain either.
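
To make points 3 and 4 a bit more concrete: a row-prefix scan can be thought of as a range scan from the prefix (inclusive) up to the first key that sorts after every key with that prefix (exclusive). The sketch below shows one way to derive that stop row using only the JDK. The class and method names are hypothetical and this is not the HBase or Bigtable client code itself; it only illustrates the idea, on the assumption that row keys are compared as unsigned bytes in lexicographic order.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper, for illustration only (not part of any client library).
public class PrefixRange {

    /**
     * Returns the smallest row key that sorts after every key starting with
     * the given prefix. Returns an empty array when no such key exists
     * (the prefix is empty or consists only of 0xFF bytes), which is taken
     * here to mean "no upper bound".
     */
    public static byte[] stopRowForPrefix(byte[] prefix) {
        // Skip trailing 0xFF bytes: they cannot be incremented without overflow.
        int i = prefix.length - 1;
        while (i >= 0 && prefix[i] == (byte) 0xFF) {
            i--;
        }
        if (i < 0) {
            return new byte[0]; // no upper bound
        }
        // Keep the bytes up to position i and increment the last one kept.
        byte[] stop = Arrays.copyOf(prefix, i + 1);
        stop[i]++;
        return stop;
    }

    public static void main(String[] args) {
        byte[] prefix = "user123#".getBytes(StandardCharsets.UTF_8);
        byte[] stop = stopRowForPrefix(prefix);
        // '#' (0x23) becomes '$' (0x24), so this prints "user123$"
        System.out.println(new String(stop, StandardCharsets.UTF_8));
    }
}
```

With a start row of "user123#" and a stop row of "user123$", a plain range scan covers exactly the keys that share the prefix, which is why an explicit endRow adds little once a prefix is already bounding the scan.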
