I am looking for the fastest way to perform a row scan of very large Bigtable tables using the latest Java API. I only need to scan based on partial row-key values (no column/column-family information needed). The row keys are well distributed, and Bigtable's lexicographic sorting works well for this use case.
There are a lot of answers on this topic from over the years, but some are outdated and some seem to be HBase-specific or shell-specific. I need this specifically for Cloud Bigtable and the latest version of the Java API.
For now, based on my own testing, I see this as the best approach:
Scan s = new Scan();
s.setStartRow(startRowKey); // this can also be passed to constructor
s.setStopRow(stopRowKey); // this can also be passed to constructor
s.setRowPrefixFilter(key.getBytes());
s.setFilter(new PageFilter(MaxResult));
s.setFilter(new KeyOnlyFilter());
But my questions are:
1: Is there something I'm not aware of that I should be doing to improve the speed?
2: Is there a better way to limit the results than PageFilter? I.e., how can I say "return max 25 rows"?
3: What is the difference between scan.setFilter(new PrefixFilter(rowKey)) and scan.setRowPrefixFilter(rowKey)?
4: The advantage of setting the startRow parameter for the scan is very clear, but is there any advantage (or disadvantage) to setting the stopRow parameter as well, particularly if you are already providing a PageFilter or another limiting measure?
Thanks for any feedback!
It seems like your filters are clobbering each other: the second setFilter() call replaces the first, so the KeyOnlyFilter will overwrite the PageFilter. You should wrap them in a MUST_PASS_ALL FilterList instead.
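A minimal sketch of that setup, assuming the HBase-compatible Cloud Bigtable client (the variable names mirror the question's snippet, and the page size of 25 is just an example):

```java
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.filter.FilterList;
import org.apache.hadoop.hbase.filter.KeyOnlyFilter;
import org.apache.hadoop.hbase.filter.PageFilter;

Scan s = new Scan();
s.setStartRow(startRowKey);
s.setStopRow(stopRowKey);
// One FilterList holding both filters; a row must pass ALL of them.
FilterList filters = new FilterList(FilterList.Operator.MUST_PASS_ALL,
        new PageFilter(25),       // page limit, applied per region server
        new KeyOnlyFilter());     // strip cell values, return keys only
s.setFilter(filters);             // single setFilter() call -- nothing is clobbered
```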
PrefixFilter exists so that the prefix condition can be chained together with other filters in a FilterList; setRowPrefixFilter() instead translates the prefix into a (startRow, stopRow) range on the Scan itself, so the server can seek straight to the matching rows.
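To make that difference concrete: setRowPrefixFilter() does not install a filter at all; it derives an exclusive stop row from the prefix (conceptually, by incrementing the last byte that can grow). A self-contained sketch of that computation, as my own illustration rather than the library's actual code:

```java
import java.util.Arrays;

public class PrefixRange {
    // Given a row-key prefix, compute the exclusive stop row so that the
    // range [prefix, stopRow) covers exactly the rows starting with the prefix.
    static byte[] stopRowForPrefix(byte[] prefix) {
        byte[] stop = Arrays.copyOf(prefix, prefix.length);
        for (int i = stop.length - 1; i >= 0; i--) {
            if (stop[i] != (byte) 0xFF) {
                stop[i]++;                       // bump the last byte that can grow
                return Arrays.copyOf(stop, i + 1);
            }
        }
        return new byte[0];                      // all 0xFF: scan to end of table
    }

    public static void main(String[] args) {
        // '#' is 0x23, so incrementing it yields '$' (0x24).
        System.out.println(new String(stopRowForPrefix("user123#".getBytes())));
        // prints "user123$"
    }
}
```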
As for stopRow: I don't see a disadvantage to setting it, but at the same time, I don't think there is much gain either, since setRowPrefixFilter() already bounds the scan.
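On the "max 25 rows" question specifically: PageFilter is applied independently on each region server, so the client can still receive more rows than the page size. The reliable way is to cap on the client side while iterating (a sketch, assuming an HBase Table handle named table and the Scan s from the question; newer HBase versions also offer Scan#setLimit(int), though I'm not sure the Bigtable adapter honors it):

```java
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;

try (ResultScanner scanner = table.getScanner(s)) {
    int count = 0;
    for (Result r : scanner) {
        // ... process r ...
        if (++count >= 25) {
            break;  // hard client-side cap; try-with-resources closes the scanner
        }
    }
}
```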