Cassandra data modeling for a social network

Mohammad Kermani

We are using DataStax Cassandra for our social network and are designing the tables we need. The data modeling is confusing for us: we don't know how to design some of the tables, and we have run into a few problems.

As we understand it, every query needs its own table. For example, suppose user A follows users B and C.

Now, in Cassandra we have a table that is posts_by_user:

user_id  |  post_id  |  text  |  created_on  |  deleted  |  view_count  |  likes_count  |  comments_count  |  user_full_name

We also have a table called user_timeline that is built from each user's followers: when a user posts, we insert the post's info into user_timeline for every follower. When a follower visits the home page, we read the posts from the user_timeline table.

And here is user_timeline table:

follower_id  |  post_id  |  user_id (who posted)  |  likes_count  |  comments_count  |  location_name  |  user_full_name

First, is this data model correct for a follow-based (follower/following) social network?

Now we want to count the likes of a post. As you can see, the like count is stored in both tables (user_timeline and posts_by_user). If a user has 1000 followers, every like means updating all 1000 rows in user_timeline plus 1 row in posts_by_user, which does not seem reasonable.

So my second question is: how should this be modeled? That is, what should the like (favorite) table look like?

peytoncas

Think of posts_by_user as holding a post's metadata. It would house user_id, post_id, message_text, etc., while view_count, likes_count, and comments_count are moved out into a separate counter table. You could then fetch either a post's metadata or its counters as long as you had the post_id, and you would only have to update the counter record once.

DSE Counter Documentation: https://docs.datastax.com/en/cql/3.1/cql/cql_using/use_counter_t.html

However, the article linked below is a really good starting point for data modeling in Cassandra. There are a few things to take into consideration when answering this question, many of which depend on the internals of your system and how your queries are structured. The article's first two rules are:

Rule 1: Spread Data Evenly Around the Cluster

Rule 2: Minimize the Number of Partitions Read

Take a moment to consider the "user_timeline" table. Here are three possible keys:

  1. user_id and created_on as a COMPOUND KEY* - This would be ideal if

    • You wanted to query for posts by a certain user, assuming you have a decent number of users. Records would be distributed evenly, and each query would hit only one partition.

  2. user_id and a hash_prefix as a COMPOUND KEY* - This would be ideal if

    • You had a small number of users with a large number of posts each. The prefix spreads your data evenly across the cluster, but you run the risk of having to query across multiple partitions.

  3. follower_id and created_on as a COMPOUND KEY* - This would be ideal if

    • You wanted to query for the posts shown to a certain follower. Records would be distributed, and you would minimize queries across partitions.
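As a rough sketch, the three options above could be declared as follows. The `sketch` keyspace, its replication settings, the table names, the integer `hash_prefix` bucket, and the extra `post_id` clustering column (a tiebreaker for posts with the same timestamp) are all assumptions made for illustration, not part of the original schema.

```cql
-- Hypothetical keyspace; the replication settings are placeholders.
CREATE KEYSPACE IF NOT EXISTS sketch
  WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1};

-- Option 1: one partition per author, rows sorted newest first.
CREATE TABLE IF NOT EXISTS sketch.posts_by_user (
  user_id uuid,
  created_on timestamp,
  post_id uuid,
  message_text text,
  PRIMARY KEY ((user_id), created_on, post_id)
) WITH CLUSTERING ORDER BY (created_on DESC, post_id ASC);

-- Option 2: a bucket column splits one author's posts over several partitions.
CREATE TABLE IF NOT EXISTS sketch.posts_by_user_bucketed (
  user_id uuid,
  hash_prefix int,
  created_on timestamp,
  post_id uuid,
  message_text text,
  PRIMARY KEY ((user_id, hash_prefix), created_on, post_id)
) WITH CLUSTERING ORDER BY (created_on DESC, post_id ASC);

-- Option 3: one partition per follower, holding that follower's feed.
CREATE TABLE IF NOT EXISTS sketch.user_timeline (
  follower_id uuid,
  created_on timestamp,
  post_id uuid,
  user_id uuid,
  PRIMARY KEY ((follower_id), created_on, post_id)
) WITH CLUSTERING ORDER BY (created_on DESC, post_id ASC);

-- Options 1 and 3 read a single partition:
SELECT post_id, message_text FROM sketch.posts_by_user
  WHERE user_id = 11111111-1111-1111-1111-111111111111 LIMIT 20;
SELECT post_id, user_id FROM sketch.user_timeline
  WHERE follower_id = 22222222-2222-2222-2222-222222222222 LIMIT 20;

-- Option 2 has to name every bucket, so it touches several partitions:
SELECT post_id, message_text FROM sketch.posts_by_user_bucketed
  WHERE user_id = 11111111-1111-1111-1111-111111111111
    AND hash_prefix IN (0, 1, 2, 3);
```

The last query shows the trade-off from option 2: the data is spread more evenly, but a read for one user fans out over as many partitions as there are buckets.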

These were 3 examples for 1 table, and the point I wanted to convey is to design your tables around the queries you want to execute. Also, don't be afraid to duplicate your data across multiple tables that are set up to handle various queries; this is the way Cassandra was meant to be modeled. Take some time to read the article below and watch the DataStax Academy Data Modeling Course to familiarize yourself with the nuances. I also included an example schema below to cover the basic counter schema I was pointing out earlier.

* The reason for the compound key is that your PRIMARY KEY has to be unique; otherwise an INSERT with an existing PRIMARY KEY becomes an UPDATE.
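The footnote's point can be seen with a small sketch. The `sketch` keyspace and the `upsert_demo` table are hypothetical names used only for this demonstration.

```cql
-- Hypothetical keyspace and table, used only to show the upsert behavior.
CREATE KEYSPACE IF NOT EXISTS sketch
  WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1};

CREATE TABLE IF NOT EXISTS sketch.upsert_demo (
  user_id uuid,
  created_on timestamp,
  message_text text,
  PRIMARY KEY ((user_id), created_on)
);

-- Two INSERTs with the same primary key (user_id, created_on)...
INSERT INTO sketch.upsert_demo (user_id, created_on, message_text)
  VALUES (11111111-1111-1111-1111-111111111111, '2016-01-01 00:00:00+0000', 'first');
INSERT INTO sketch.upsert_demo (user_id, created_on, message_text)
  VALUES (11111111-1111-1111-1111-111111111111, '2016-01-01 00:00:00+0000', 'second');

-- ...leave a single row, and the second write has replaced the first.
SELECT message_text FROM sketch.upsert_demo
  WHERE user_id = 11111111-1111-1111-1111-111111111111;

-- A lightweight transaction refuses to overwrite an existing row,
-- at the cost of a slower, coordinated write.
INSERT INTO sketch.upsert_demo (user_id, created_on, message_text)
  VALUES (11111111-1111-1111-1111-111111111111, '2016-01-01 00:00:00+0000', 'third')
  IF NOT EXISTS;
```

If two posts by the same user could share a timestamp, adding a unique column such as post_id to the clustering columns is one way to keep the rows distinct.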

http://www.datastax.com/dev/blog/basic-rules-of-cassandra-data-modeling
https://academy.datastax.com/courses

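-- Post metadata, read by author.
CREATE TABLE IF NOT EXISTS social_media.posts_by_user (
    user_id uuid,
    post_id uuid,
    message_text text,
    created_on timestamp,
    deleted boolean,
    user_full_name text,
    -- user_id is the partition key and created_on the clustering column,
    -- so all posts by one user can be read from a single partition.
    PRIMARY KEY ((user_id), created_on)
);

-- Per-follower feed, read when a follower opens the home page.
CREATE TABLE IF NOT EXISTS social_media.user_timeline (
    follower_id uuid,
    post_id uuid,
    user_id uuid,
    location_name text,
    user_full_name text,
    created_on timestamp,
    -- Partitioned by follower_id (option 3 above), since the timeline
    -- is queried by follower rather than by author.
    PRIMARY KEY ((follower_id), created_on)
);

-- Counters live in their own table and are updated once per like, view, or comment.
CREATE TABLE IF NOT EXISTS social_media.post_counts (
    likes_count counter,
    view_count counter,
    comments_count counter,
    post_id uuid,
    PRIMARY KEY (post_id)
);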
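As a sketch of how the post_counts table might be used, the statements below record a like, an unlike, and a view, and then read the counters back. The UUID values are made up for illustration, and the table is assumed to exist as defined above.

```cql
-- A like touches exactly one row, no matter how many followers the author has.
UPDATE social_media.post_counts
  SET likes_count = likes_count + 1
  WHERE post_id = 33333333-3333-3333-3333-333333333333;

-- An unlike decrements the same counter.
UPDATE social_media.post_counts
  SET likes_count = likes_count - 1
  WHERE post_id = 33333333-3333-3333-3333-333333333333;

-- Counter rows are created by the first UPDATE; INSERT is not allowed on counter tables.
UPDATE social_media.post_counts
  SET view_count = view_count + 1
  WHERE post_id = 44444444-4444-4444-4444-444444444444;

-- After loading a page of the timeline, fetch the counters for the posts on it.
SELECT post_id, likes_count, view_count, comments_count
  FROM social_media.post_counts
  WHERE post_id IN (33333333-3333-3333-3333-333333333333,
                    44444444-4444-4444-4444-444444444444);
```

Two caveats are worth keeping in mind. Counter updates are not idempotent, so a write that is retried after a timeout may be counted twice. The counter alone also does not record who liked a post, so preventing duplicate likes would need a separate table.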
