Need advice on efficiently inserting millions of time series data into a Cassandra DB

rabejens

I want to use a Cassandra database to store time series data from a test site. I am using Pattern 2 from the "Getting started with Time Series Data Modeling" tutorial, with two changes. The date that limits the row size is not stored as a date but as an int counting the number of days elapsed since 1970-01-01. The timestamp of each value is the number of nanoseconds since the epoch (some of our measuring devices are that precise, and we need the precision). My table for the values looks like this:

CREATE TABLE values (channel_id INT, day INT, time BIGINT, value DOUBLE, PRIMARY KEY ((channel_id, day), time))
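For illustration, here is a minimal sketch of how the `day` and `time` columns could be derived from a `java.time.Instant`. The object name `TimeKeys` and both method names are hypothetical, and the sketch assumes the day boundary is taken in UTC, which the question does not state.

```scala
import java.time.Instant

// Hypothetical helper, not part of the original post.
object TimeKeys {
  private val NanosPerSecond = 1000000000L
  private val SecondsPerDay = 86400L

  /** Nanoseconds since 1970-01-01T00:00:00Z, for the `time` column.
    * Throws ArithmeticException if the result does not fit in a Long
    * (a signed 64-bit nanosecond count runs out in the year 2262). */
  def epochNanos(ts: Instant): Long =
    Math.addExact(
      Math.multiplyExact(ts.getEpochSecond, NanosPerSecond),
      ts.getNano.toLong)

  /** Whole days since 1970-01-01 (UTC assumed), for the `day` column.
    * floorDiv keeps instants before the epoch in the correct day. */
  def epochDay(ts: Instant): Int =
    Math.toIntExact(Math.floorDiv(ts.getEpochSecond, SecondsPerDay))
}
```

Both values come from the same `Instant`, so a row's partition key and clustering key always agree on which day the sample belongs to.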

I created a simple benchmark that uses asynchronous execution and prepared statements, instead of batches, for bulk loading:

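  import scala.collection.mutable
  import com.datastax.driver.core.ResultSetFuture
  import java.lang.{Integer => JInt, Long => JLong, Double => JDouble}

  // `session` is an already connected com.datastax.driver.core.Session.
  def valueBenchmark(numVals: Int): Unit = {
    // Prepare the statement once and bind it for every row.
    val vs = session.prepare(
      "insert into values (channel_id, day, time, " +
      "value) values (?, ?, ?, ?)")
    val currentFutures = mutable.MutableList[ResultSetFuture]()
    for(i <- 0 until numVals) {
      // Channel -1; a new "day" partition starts every 100,000 values.
      currentFutures += session.executeAsync(vs.bind(-1: JInt,
        i / 100000: JInt, i.toLong: JLong, 0.0: JDouble))
      // After every 10,000 submitted inserts, wait for all of them.
      if(currentFutures.length >= 10000) {
        currentFutures.foreach(_.getUninterruptibly)
        currentFutures.clear()
      }
    }
    // Wait for the inserts that are still pending.
    if(currentFutures.nonEmpty) {
      currentFutures.foreach(_.getUninterruptibly)
    }
  }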

JInt, JLong and JDouble are simply java.lang.Integer, java.lang.Long and java.lang.Double, respectively.
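The benchmark above caps the number of pending requests by waiting for a whole window of 10,000 futures at a time. As a point of comparison only, here is a sketch of another common way to cap pending requests, using a semaphore so that a new request can start as soon as any earlier one finishes. This is not what the benchmark does, and I have not measured whether it is faster. `BoundedAsync`, `runBounded` and the `submit` function are hypothetical names; `submit` stands in for an asynchronous insert and returns a plain Scala `Future` so the sketch runs without the driver.

```scala
import java.util.concurrent.Semaphore
import scala.concurrent.{ExecutionContext, Future}

// Hypothetical helper, not part of the original benchmark.
object BoundedAsync {
  /** Calls `submit(i)` for every i in [0, n) while never allowing more
    * than `maxInFlight` unfinished futures at a time. Failed futures are
    * not rethrown here; a real loader would have to collect or log them. */
  def runBounded(n: Int, maxInFlight: Int)(submit: Int => Future[Unit])
                (implicit ec: ExecutionContext): Unit = {
    val permits = new Semaphore(maxInFlight)
    for (i <- 0 until n) {
      permits.acquire() // blocks while maxInFlight requests are pending
      val pending =
        try submit(i)
        catch { case e: Throwable => permits.release(); throw e }
      // Free the slot when the request completes, whether it succeeded or failed.
      pending.onComplete(_ => permits.release())
    }
    // Holding every permit again means every request has completed.
    permits.acquire(maxInFlight)
    permits.release(maxInFlight)
  }
}
```

To use something like this with the benchmark, the `ResultSetFuture` returned by `executeAsync` would first have to be adapted to a Scala `Future`; that adapter is not shown here.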

When I run this benchmark for 10 million values, it takes about two minutes against a locally installed single-node Cassandra. My computer has 16 GiB of RAM and a quad-core i7 CPU. I find this quite slow. Is this normal insert performance for Cassandra?

I already read these:

Are there any other things I could check?

doanduyhai

Simple maths:

10 million inserts / 2 minutes ≈ 83,333 inserts/sec, which is great for a single machine. Did you expect something faster?
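The arithmetic can be checked with a few lines of Scala. This is only a sketch using the numbers from the question (10 million values, about two minutes); `Throughput` and its method names are made up for the example.

```scala
// Hypothetical helper for checking the numbers quoted in this thread.
object Throughput {
  /** Average inserts per second over the whole run. */
  def insertsPerSecond(inserts: Long, seconds: Double): Double =
    inserts / seconds

  /** Average wall-clock time per insert in microseconds. Because the
    * inserts run concurrently, this is an amortized cost, not the
    * latency of a single request. */
  def microsPerInsert(inserts: Long, seconds: Double): Double =
    seconds * 1e6 / inserts

  def main(args: Array[String]): Unit = {
    val inserts = 10000000L // 10 million values
    val seconds = 120.0     // about two minutes
    println(f"${insertsPerSecond(inserts, seconds)}%.0f inserts/sec")
    println(f"${microsPerInsert(inserts, seconds)}%.1f microseconds per insert")
  }
}
```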

By the way, what are the specs of your hard drives? SSDs or spinning disks?

You should know that massive insert scenarios are more CPU-bound than I/O-bound. Try running the same test on a machine with 8 physical cores (so 16 vcores with Hyper-Threading) and compare the results.
