Understanding Basic Concepts of Cassandra

1. Keyspaces:

  • Definition: In Cassandra, a keyspace is the outermost container for data. It is similar to a schema in a relational database.
  • Purpose: A keyspace holds one or more tables and defines the replication strategy and settings for the data it contains.
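  • Replication Factor: Sets how many nodes store a copy of each row. With a replication factor of 3, as in the example below, every row is kept on three nodes. SimpleStrategy suits a single-datacenter or test cluster; NetworkTopologyStrategy is the usual choice when the cluster spans multiple datacenters.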
CREATE KEYSPACE mykeyspace WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 3};

2. Tables:

  • Definition: Tables in Cassandra are where data is stored. Each table is associated with a specific keyspace.
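  • Flexible Schema: CQL tables have a declared schema, with column names and types fixed by CREATE TABLE. The schema is flexible rather than absent: columns can be added later with ALTER TABLE, and a row need not have a value for every column.
  • Primary Key: Composed of one or more columns, the primary key uniquely identifies each row in the table. Its first part, the partition key, determines which nodes store the row; any remaining columns are clustering columns, which order rows within a partition.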
  • Columns: Besides the primary key, tables can have other columns, including static columns and collections.
CREATE TABLE mykeyspace.mytable ( id UUID PRIMARY KEY, name TEXT, age INT );
  • Inserting Data:
INSERT INTO mykeyspace.mytable (id, name, age) VALUES (uuid(), 'John Doe', 25);
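
Because the primary key uniquely identifies a row, writing again with the same key does not create a duplicate: a CQL INSERT behaves as an upsert and overwrites only the columns supplied. The Python sketch below models that behaviour with a plain dictionary. The Table class and its methods are hypothetical illustrations, not part of Cassandra or any driver.

```python
import uuid


class Table:
    """Toy in-memory model of a table keyed by its primary key (illustrative only)."""

    def __init__(self):
        self.rows = {}  # primary key -> {column name: value}

    def insert(self, key, **columns):
        # Like a CQL INSERT: create the row if the key is new, otherwise
        # overwrite only the columns supplied in this write.
        self.rows.setdefault(key, {}).update(columns)

    def select(self, key):
        # Look up a single row by primary key; None if it does not exist.
        return self.rows.get(key)


table = Table()
row_id = uuid.uuid4()
table.insert(row_id, name="John Doe", age=25)
table.insert(row_id, age=26)  # same key: updates the existing row

print(len(table.rows))       # 1
print(table.select(row_id))  # {'name': 'John Doe', 'age': 26}
```

The second write changes age but leaves name untouched, which is why a UUID generated fresh for each INSERT always produces a new row, while reusing an id modifies the existing one.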

3. Nodes:

  • Definition: Nodes are individual instances of Cassandra running in a cluster. Each node is responsible for storing a portion of the data.
  • Peer-to-Peer Architecture: Cassandra follows a peer-to-peer architecture where all nodes in the cluster are equal and communicate with each other.
  • Data Distribution: A partitioner hashes each row's partition key to a token, and each node owns one or more ranges of tokens. A row is stored on the node whose range contains its token.
  • Replication: Replicas of data are stored on multiple nodes for fault tolerance.
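  • Listing Peers: The system.peers table lists the other nodes known to the node you are connected to; the local node itself is described in system.local.
SELECT * FROM system.peers;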
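
The distribution and replication bullets above can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not Cassandra's implementation: each node has a single token (no virtual nodes), MD5 stands in for the default Murmur3 partitioner, and replicas are placed on the next nodes clockwise around the ring, in the manner of SimpleStrategy. The Ring class and node names are hypothetical.

```python
import bisect
import hashlib


def token(partition_key: str) -> int:
    # Stand-in hash (assumption): Cassandra's default partitioner is Murmur3;
    # MD5 is used here only because it ships with Python.
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


class Ring:
    """Toy token ring with one token per node (no virtual nodes)."""

    def __init__(self, node_tokens: dict):
        # Sort nodes by token so the ring can be walked clockwise.
        self.entries = sorted((t, name) for name, t in node_tokens.items())
        self.tokens = [t for t, _ in self.entries]

    def replicas(self, partition_key: str, replication_factor: int) -> list:
        # The first replica is the node with the first token >= the key's
        # token, wrapping around to the start of the ring if needed.
        start = bisect.bisect_left(self.tokens, token(partition_key))
        count = min(replication_factor, len(self.entries))
        # SimpleStrategy-style placement: take the next nodes clockwise.
        return [self.entries[(start + i) % len(self.entries)][1]
                for i in range(count)]


# Four hypothetical nodes spaced evenly over the 64-bit token space.
ring = Ring({f"node{i}": i * 2**62 for i in range(4)})
print(ring.replicas("John Doe", 3))  # three distinct, adjacent nodes
```

Every node can compute the same answer from the key alone, which is what lets a peer-to-peer cluster route a request without a central coordinator.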

Additional Considerations:

  • Consistency Levels:
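    • Cassandra offers different consistency levels for read and write operations, allowing you to balance between consistency and availability. Examples include ONE, QUORUM, and LOCAL_QUORUM.
    • The consistency level is not part of a CQL statement. In cqlsh, set it with the CONSISTENCY command before running the query; drivers set it per statement or per session. The ? below is a bind marker for prepared statements; in cqlsh, substitute a UUID literal.
CONSISTENCY QUORUM;
SELECT * FROM mykeyspace.mytable WHERE id = ?;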
  • Tuning and Maintenance:
    • Regular maintenance includes compaction and repair. The nodetool utility is used to run these tasks and to monitor the cluster, for example with nodetool status and nodetool repair.
  • CQL (Cassandra Query Language):
    • CQL is the query language used to interact with Cassandra. It is similar to SQL but has its own syntax and features tailored for Cassandra.
CREATE KEYSPACE IF NOT EXISTS mykeyspace WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 3};
USE mykeyspace;
CREATE TABLE IF NOT EXISTS mytable ( id UUID PRIMARY KEY, name TEXT, age INT );
INSERT INTO mytable (id, name, age) VALUES (uuid(), 'Jane Doe', 30);
SELECT * FROM mytable WHERE id = ?;
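
To make the consistency-level trade-off described above more concrete, here is a small Python sketch of the arithmetic behind QUORUM. It assumes a single datacenter, and the function names are hypothetical. A quorum is a majority of the replicas, and a read is guaranteed to reach at least one replica holding the latest acknowledged write when the number of replicas read plus the number written exceeds the replication factor.

```python
def quorum(replication_factor: int) -> int:
    # A quorum is a strict majority of the replicas.
    return replication_factor // 2 + 1


def reads_see_latest_write(read_replicas: int, write_replicas: int,
                           replication_factor: int) -> bool:
    # If the read and write replica sets must overlap, at least one replica
    # contacted by the read holds the most recent acknowledged write.
    return read_replicas + write_replicas > replication_factor


rf = 3
print(quorum(rf))                                          # 2
print(reads_see_latest_write(quorum(rf), quorum(rf), rf))  # True
print(reads_see_latest_write(1, 1, rf))                    # False
```

With a replication factor of 3, QUORUM reads and writes tolerate one unavailable replica. With a replication factor of 2, quorum(2) is also 2, so QUORUM operations need both replicas to be up.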

Practice Exercise:

  1. Create a keyspace named myblog with a replication factor of 2.
  2. Design a table named posts within the myblog keyspace to store blog posts. Include columns for post_id (UUID), title (TEXT), content (TEXT), and author (TEXT), and choose post_id as the primary key.
  3. Insert a few sample blog posts into the posts table.
  4. Query the posts table to retrieve the blog posts.