Cassandra 3.11.9 Practice

The target Cassandra version is 3.11.9

Table of Contents

Step 1. Create a docker-compose.yml for Cassandra

1. Clone the practice repository

$ git clone https://github.com/blueskyarea/cassandra-practice.git

2. Check the contents of the Dockerfile and docker-compose.yml

Dockerfile

$ cat cassandra-practice/cassandra-docker/Dockerfile
# Use the official Cassandra image as the base image
FROM cassandra:3.11.9

# Install vim
RUN apt-get update && apt-get install -y vim

# Set the default command to run when the container starts
CMD ["cassandra", "-f"]

docker-compose.yml

$ cat cassandra-practice/cassandra-docker/docker-compose.yml
cassandra-node1:
    build:
      context: .
      dockerfile: Dockerfile
    container_name: cassandra-node1
    environment:
        - CASSANDRA_CLUSTER_NAME=MyCluster
        - CASSANDRA_SEEDS=cassandra-node1,cassandra-node2
        - CASSANDRA_DC=se1
        - CASSANDRA_ENDPOINT_SNITCH=GossipingPropertyFileSnitch
    ports:
        - "9042:9042"
    volumes:
        - ./cassandra-node1-data:/var/lib/cassandra
---(omit)---
  • The Docker image version, 3.11.9, is specified in the Dockerfile.
  • The container name is set to "cassandra-node1".
  • Several environment variables are set (cluster name, seed nodes, datacenter, and snitch).
  • The CQL port (9042) is published to the host.
  • A volume is mounted so that data persists across container restarts.

Step 2. Start Cassandra in the Docker environment

1. Pull the Cassandra Docker image and start the containers

$ cd cassandra-practice/cassandra-docker/
$ docker-compose up -d
-- Output Example --
Pulling cassandra-node1 (cassandra:3.11.9)…
....
Creating cassandra-node1 … done
Creating cassandra-node2 … done
Creating cassandra-node3 … done
-----------------------

2. Check the running containers

$ docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
b8d95cbdbb15 cassandra:3.11.9 "docker-entrypoint.s…" 12 minutes ago Up 12 minutes 7000-7001/tcp, 7199/tcp, 9160/tcp, 0.0.0.0:9044->9042/tcp, :::9044->9042/tcp cassandra-node3
93fd9d7b458a cassandra:3.11.9 "docker-entrypoint.s…" 12 minutes ago Up 12 minutes 7000-7001/tcp, 7199/tcp, 9160/tcp, 0.0.0.0:9043->9042/tcp, :::9043->9042/tcp cassandra-node2
5ceb8b7b3dd3 cassandra:3.11.9 "docker-entrypoint.s…" 12 minutes ago Up 12 minutes 7000-7001/tcp, 7199/tcp, 9160/tcp, 0.0.0.0:9042->9042/tcp, :::9042->9042/tcp cassandra-node1

Step 3. Use cqlsh to connect to the Cassandra container

1. Start cqlsh from one of the containers (it may take a while before the node is ready to accept connections)

$ docker exec -it cassandra-node1 cqlsh
Connected to MyCluster at 127.0.0.1:9042.
[cqlsh 5.0.1 | Cassandra 3.11.9 | CQL spec 3.4.4 | Native protocol v4]
Use HELP for help.
cqlsh>

2. Check the HELP

cqlsh> HELP

Documented shell commands:
===========================
CAPTURE  CLS          COPY  DESCRIBE  EXPAND  LOGIN   SERIAL  SOURCE   UNICODE
CLEAR    CONSISTENCY  DESC  EXIT      HELP    PAGING  SHOW    TRACING

CQL help topics:
================
AGGREGATES               CREATE_KEYSPACE           DROP_TRIGGER      TEXT
ALTER_KEYSPACE           CREATE_MATERIALIZED_VIEW  DROP_TYPE         TIME
ALTER_MATERIALIZED_VIEW  CREATE_ROLE               DROP_USER         TIMESTAMP
ALTER_TABLE              CREATE_TABLE              FUNCTIONS         TRUNCATE
ALTER_TYPE               CREATE_TRIGGER            GRANT             TYPES
ALTER_USER               CREATE_TYPE               INSERT            UPDATE
APPLY                    CREATE_USER               INSERT_JSON       USE
ASCII                    DATE                      INT               UUID
BATCH                    DELETE                    JSON
BEGIN                    DROP_AGGREGATE            KEYWORDS
BLOB                     DROP_COLUMNFAMILY         LIST_PERMISSIONS
BOOLEAN                  DROP_FUNCTION             LIST_ROLES
COUNTER                  DROP_INDEX                LIST_USERS
CREATE_AGGREGATE         DROP_KEYSPACE             PERMISSIONS
CREATE_COLUMNFAMILY      DROP_MATERIALIZED_VIEW    REVOKE
CREATE_FUNCTION          DROP_ROLE                 SELECT
CREATE_INDEX             DROP_TABLE                SELECT_JSON

3. Exit cqlsh

cqlsh> exit

Step 4. Create a keyspace and table for sample data

1. Create a keyspace

$ docker exec -it cassandra-node1 cqlsh
cqlsh> CREATE KEYSPACE myblog WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 3};

This keyspace uses the SimpleStrategy replication strategy with a replication factor of 3, which means each piece of data is replicated to three nodes in the cluster for fault tolerance.
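As a rough sketch of how SimpleStrategy places those replicas, the coordinator hashes the partition key onto a token ring and walks clockwise until it has collected RF distinct nodes. The sketch below is illustrative only: real Cassandra uses the Murmur3 partitioner and vnodes (the 256 tokens per node seen later in nodetool status), whereas here each node gets a single MD5-derived token.

```python
import bisect
import hashlib

# One token per node (real clusters use vnodes); MD5 stands in for Murmur3.
ring = sorted(
    (int(hashlib.md5(node.encode()).hexdigest(), 16), node)
    for node in ["cassandra-node1", "cassandra-node2", "cassandra-node3"]
)
tokens = [t for t, _ in ring]

def replicas(partition_key: str, rf: int = 3) -> list:
    """Walk the ring clockwise from the key's token, taking rf distinct nodes."""
    token = int(hashlib.md5(partition_key.encode()).hexdigest(), 16)
    start = bisect.bisect_right(tokens, token) % len(ring)
    picked = []
    i = start
    while len(picked) < min(rf, len(ring)):
        node = ring[i % len(ring)][1]
        if node not in picked:
            picked.append(node)
        i += 1
    return picked

# With rf=3 on a 3-node cluster, every node stores a copy of every row.
print(replicas("74de1f36-8749-49f9-a5c1-de69832ea80f"))
```

With rf equal to the node count, the walk always returns all three nodes; with a smaller rf, only the first nodes reached from the key's token hold replicas.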

2. Create a table

cqlsh> CREATE TABLE myblog.posts (id UUID PRIMARY KEY, title TEXT, content TEXT, author TEXT);

3. Insert sample data

cqlsh> INSERT INTO myblog.posts (id, title, content, author) VALUES (uuid(), 'First blog', 'My blog', 'John');

4. Select records from the table

cqlsh> SELECT * FROM myblog.posts;

 id                                   | author | content | title
--------------------------------------+--------+---------+------------
 74de1f36-8749-49f9-a5c1-de69832ea80f |   John | My blog | First blog

(1 rows)

Step 5. Learn about querying data in Cassandra using CQL

1. Try to show information for a single table (this will fail)

cqlsh> DESCRIBE TABLE posts;
No keyspace specified and no current keyspace

2. Connect the client session to a keyspace

cqlsh> USE myblog;

3. Show the executable CQL for a single table (this will succeed)

cqlsh:myblog> DESCRIBE TABLE posts;

CREATE TABLE myblog.posts (
    id uuid PRIMARY KEY,
    author text,
    content text,
    title text
) WITH bloom_filter_fp_chance = 0.01
    AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
    AND comment = ''
    AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32', 'min_threshold': '4'}
    AND compression = {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}
    AND crc_check_chance = 1.0
    AND dclocal_read_repair_chance = 0.1
    AND default_time_to_live = 0
    AND gc_grace_seconds = 864000
    AND max_index_interval = 2048
    AND memtable_flush_period_in_ms = 0
    AND min_index_interval = 128
    AND read_repair_chance = 0.0
    AND speculative_retry = '99PERCENTILE';

4. Insert another sample record

cqlsh:myblog> INSERT INTO posts (id, title, content, author) VALUES (uuid(), 'Second blog', 'Hello', 'Mila');

5. Select records from the table

cqlsh:myblog> SELECT * FROM posts;

 id                                   | author | content | title
--------------------------------------+--------+---------+-------------
 74de1f36-8749-49f9-a5c1-de69832ea80f |   John | My blog |  First blog
 6f2a5531-85a8-4317-8459-2c3e94e38bc5 |   Mila |   Hello | Second blog

(2 rows)

6. Select records from the table by column value (this will fail)

cqlsh:myblog> SELECT * FROM posts WHERE author = 'Mila';

InvalidRequest: Error from server: code=2200 [Invalid query] message="Cannot execute this query as it might involve data filtering and thus may have unpredictable performance. If you want to execute this query despite the performance unpredictability, use ALLOW FILTERING"

7. Select records from the table by id (this will succeed)

# Replace ??? with an actual id from your table
SELECT * FROM posts WHERE id = ???;

Ex)
cqlsh:myblog> SELECT * FROM posts WHERE id = 6f2a5531-85a8-4317-8459-2c3e94e38bc5;

 id                                   | author | content | title
--------------------------------------+--------+---------+-------------
 6f2a5531-85a8-4317-8459-2c3e94e38bc5 |   Mila |   Hello | Second blog

(1 rows)

8. Select records from the table by column value with 'ALLOW FILTERING' (this will succeed)

cqlsh:myblog> SELECT * FROM posts WHERE author = 'Mila' ALLOW FILTERING;

 id                                   | author | content | title
--------------------------------------+--------+---------+-------------
 6f2a5531-85a8-4317-8459-2c3e94e38bc5 |   Mila |   Hello | Second blog

(1 rows)
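The contrast between the id query and the ALLOW FILTERING query can be sketched with a toy model (illustrative only; Cassandra's real storage engine is far more involved): rows are reachable directly by their partition key, while any other column forces a scan of every row.

```python
# Toy model of the two query paths. Rows live in a dict keyed by partition
# key (id): a lookup by id is a single hash access, while a query on the
# unindexed 'author' column must scan every row -- the work that
# ALLOW FILTERING makes you opt into explicitly.
posts = {
    "74de1f36": {"author": "John", "title": "First blog"},
    "6f2a5531": {"author": "Mila", "title": "Second blog"},
}

def by_id(pk):
    # O(1): direct partition lookup, like WHERE id = ...
    return posts.get(pk)

def by_author(author):
    # O(n): full scan, like WHERE author = ... ALLOW FILTERING
    return [pk for pk, row in posts.items() if row["author"] == author]

print(by_author("Mila"))  # ['6f2a5531']
```

On a real cluster the scan touches every partition on every replica set, which is why Cassandra refuses it unless you say ALLOW FILTERING.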

9. Update a record

# Replace ??? with an actual id from your table
cqlsh:myblog> UPDATE posts SET content = 'Updated!' WHERE id = ???;

Ex)
cqlsh:myblog> UPDATE posts SET content = 'Updated!' WHERE id = 74de1f36-8749-49f9-a5c1-de69832ea80f;

# And check the updated record
cqlsh:myblog> SELECT * FROM posts;

Ex)
cqlsh:myblog> SELECT * FROM posts;

 id                                   | author | content  | title
--------------------------------------+--------+----------+-------------
 74de1f36-8749-49f9-a5c1-de69832ea80f |   John | Updated! |  First blog
 6f2a5531-85a8-4317-8459-2c3e94e38bc5 |   Mila |    Hello | Second blog

(2 rows)

10. Delete a record

# Replace ??? with an actual id from your table
cqlsh:myblog> DELETE FROM posts WHERE id = ???;

Ex)
cqlsh:myblog> DELETE FROM posts WHERE id = 74de1f36-8749-49f9-a5c1-de69832ea80f;

# And check the deleted record
cqlsh:myblog> SELECT * FROM posts;

Ex)
cqlsh:myblog> SELECT * FROM posts;

 id                                   | author | content | title
--------------------------------------+--------+---------+-------------
 6f2a5531-85a8-4317-8459-2c3e94e38bc5 |   Mila |   Hello | Second blog

(1 rows)

11. Execute multiple CQL statements together

# Replace ??? with an actual id from your table
cqlsh:myblog> BEGIN BATCH
INSERT INTO posts (id, title, content, author) VALUES (uuid(), 'Third blog', 'I''m back', 'John');
UPDATE posts SET content = 'Good night.' WHERE id = ???;
APPLY BATCH;

Ex)
cqlsh:myblog> BEGIN BATCH
          ... INSERT INTO posts (id, title, content, author) VALUES (uuid(), 'Third blog', 'I''m back', 'John');
          ... UPDATE posts SET content = 'Good night.' WHERE id = 6f2a5531-85a8-4317-8459-2c3e94e38bc5;
          ... APPLY BATCH;

# Check the updates result
cqlsh:myblog> SELECT * FROM posts;

 id                                   | author | content     | title
--------------------------------------+--------+-------------+-------------
 6502fc83-c78d-49d9-8928-79a83cc6eb70 |   John |    I'm back |  Third blog
 6f2a5531-85a8-4317-8459-2c3e94e38bc5 |   Mila | Good night. | Second blog

(2 rows)

Step 6. Explore how indexing works in Cassandra

1. Create an index
Create an index on the column you want to query.

cqlsh:myblog> CREATE INDEX ON posts (title);

2. Query the data
You can now query the data based on the indexed column.

cqlsh:myblog> SELECT * FROM posts WHERE title = 'Third blog';

 id                                   | author | content  | title
--------------------------------------+--------+----------+------------
 6502fc83-c78d-49d9-8928-79a83cc6eb70 |   John | I'm back | Third blog

3. View indexes
You can view the existing indexes in your table.

cqlsh:myblog> DESCRIBE TABLE posts;

CREATE TABLE myblog.posts (
    id uuid PRIMARY KEY,
    author text,
    content text,
    title text
) WITH bloom_filter_fp_chance = 0.01
    AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
    AND comment = ''
    AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32', 'min_threshold': '4'}
    AND compression = {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}
    AND crc_check_chance = 1.0
    AND dclocal_read_repair_chance = 0.1
    AND default_time_to_live = 0
    AND gc_grace_seconds = 864000
    AND max_index_interval = 2048
    AND memtable_flush_period_in_ms = 0
    AND min_index_interval = 128
    AND read_repair_chance = 0.0
    AND speculative_retry = '99PERCENTILE';
CREATE INDEX posts_title_idx ON myblog.posts (title);

-> The index name is "posts_title_idx".

4. Drop the index (optional)
If you no longer need the index, drop it.

cqlsh:myblog> DROP INDEX posts_title_idx;

Summary

  • Indexing allows for efficient querying of data based on non-primary key columns, improving query performance for specific use cases.
  • Creating too many indexes can impact write performance and increase storage requirements.
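The write-cost point above can be illustrated with a toy secondary index: a reverse map from the indexed column value to the primary keys that carry it, which must be kept in sync on every write. The ids here are illustrative.

```python
from collections import defaultdict

# Toy secondary index: a reverse map from title to the ids that carry it.
# Every INSERT now touches both structures, which is the write cost of
# extra indexes.
posts = {}
title_idx = defaultdict(set)  # title -> {ids}

def insert(pk, title, author):
    posts[pk] = {"title": title, "author": author}
    title_idx[title].add(pk)  # extra write for the index

def select_by_title(title):
    # Indexed read: jump straight to the matching ids, no full scan
    return sorted(title_idx.get(title, set()))

insert("6502fc83", "Third blog", "John")
insert("6f2a5531", "Second blog", "Mila")
print(select_by_title("Third blog"))  # ['6502fc83']
```

Deletes and updates would likewise have to remove the old entry from the reverse map, which is why each additional index multiplies write work and storage.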

Step 7. Experiment with different consistency levels

1. Perform a read operation with CONSISTENCY ALL

# Set CONSISTENCY ALL
cqlsh:myblog> CONSISTENCY ALL;

# Then execute the three queries below together
cqlsh:myblog> SELECT toTimestamp(now()) AS start_time FROM system.local; SELECT * FROM myblog.posts; SELECT toTimestamp(now()) AS end_time FROM system.local;

# Subtract the start time from the end time to calculate the elapsed time

2. Perform a read operation with CONSISTENCY ONE

# Set CONSISTENCY ONE
cqlsh:myblog> CONSISTENCY ONE;

# Then execute the three queries below together
cqlsh:myblog> SELECT toTimestamp(now()) AS start_time FROM system.local; SELECT * FROM myblog.posts; SELECT toTimestamp(now()) AS end_time FROM system.local;

# Subtract the start time from the end time to calculate the elapsed time

3. Perform a read operation with CONSISTENCY QUORUM

# Set CONSISTENCY QUORUM
cqlsh:myblog> CONSISTENCY QUORUM;

# Then execute the three queries below together
cqlsh:myblog> SELECT toTimestamp(now()) AS start_time FROM system.local; SELECT * FROM myblog.posts; SELECT toTimestamp(now()) AS end_time FROM system.local;

# Subtract the start time from the end time to calculate the elapsed time
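The start/end subtraction can be done in Python. The two timestamp strings below are hypothetical sample values, written in the format cqlsh typically prints for toTimestamp(now()).

```python
from datetime import datetime

# Hypothetical start_time/end_time values, in the format cqlsh typically
# prints for toTimestamp(now()): 'YYYY-MM-DD HH:MM:SS.ffffff+0000'
FMT = "%Y-%m-%d %H:%M:%S.%f%z"
start = datetime.strptime("2024-02-03 08:13:30.123000+0000", FMT)
end = datetime.strptime("2024-02-03 08:13:30.169000+0000", FMT)

elapsed_ms = (end - start).total_seconds() * 1000
print(f"elapsed: {elapsed_ms:.0f} ms")  # elapsed: 46 ms
```

Note that this measures the whole three-statement round trip from the client, so it is only a rough comparison between consistency levels, not a precise server-side latency.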

Consistency levels dictate how many nodes must respond to a read operation before it is considered successful and determine the freshness of the data returned.

4. Perform a write operation with CONSISTENCY ALL

# Set CONSISTENCY ALL
cqlsh:myblog> CONSISTENCY ALL;

# Then execute the three queries below together
cqlsh:myblog> SELECT toTimestamp(now()) AS start_time FROM system.local; INSERT INTO posts (id, title, content, author) VALUES (uuid(), 'ALL', 'ALL', 'Alice'); SELECT toTimestamp(now()) AS end_time FROM system.local;

# Subtract the start time from the end time to calculate the elapsed time

5. Perform a write operation with CONSISTENCY ONE

# Set CONSISTENCY ONE
cqlsh:myblog> CONSISTENCY ONE;

# Then execute the three queries below together
cqlsh:myblog> SELECT toTimestamp(now()) AS start_time FROM system.local; INSERT INTO posts (id, title, content, author) VALUES (uuid(), 'ONE', 'ONE', 'Bob'); SELECT toTimestamp(now()) AS end_time FROM system.local;

# Subtract the start time from the end time to calculate the elapsed time

6. Perform a write operation with CONSISTENCY QUORUM

# Set CONSISTENCY QUORUM
cqlsh:myblog> CONSISTENCY QUORUM;

# Then execute the three queries below together
cqlsh:myblog> SELECT toTimestamp(now()) AS start_time FROM system.local; INSERT INTO posts (id, title, content, author) VALUES (uuid(), 'QUORUM', 'QUORUM', 'Charlie'); SELECT toTimestamp(now()) AS end_time FROM system.local;

# Subtract the start time from the end time to calculate the elapsed time

Summary

  • Consider the trade-offs between consistency, availability, and partition tolerance when choosing a consistency level for your application.
  • Higher consistency levels provide stronger guarantees but may incur higher latency and lower availability, while lower consistency levels offer better performance but may lead to eventual consistency and potential data conflicts.
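The trade-off above can be made concrete with the standard overlap rule: a read is guaranteed to observe the latest write when the read and write replica counts overlap, i.e. R + W > RF. A minimal sketch of that arithmetic:

```python
def quorum(rf: int) -> int:
    # QUORUM = a strict majority of the replicas
    return rf // 2 + 1

def overlaps(read_cl: int, write_cl: int, rf: int) -> bool:
    # Read and write replica sets share at least one node when R + W > RF,
    # so the read is guaranteed to see the latest write.
    return read_cl + write_cl > rf

rf = 3  # replication factor used by the myblog keyspace
print(quorum(rf))                            # 2
print(overlaps(quorum(rf), quorum(rf), rf))  # True  (QUORUM + QUORUM)
print(overlaps(1, 1, rf))                    # False (ONE + ONE: eventual consistency)
```

This is why QUORUM reads combined with QUORUM writes give strong consistency at moderate latency, while ONE on both sides is fastest but only eventually consistent.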

Step 8. Explore tools and techniques for monitoring Cassandra performance

1. Use the nodetool command to see the status

Displays the status of the nodes in the cluster.

$ docker exec -it cassandra-node1 nodetool status

Datacenter: se1
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address     Load       Tokens       Owns (effective)  Host ID                               Rack
UN  172.18.0.2  364.01 KiB  256          100.0%            2943d0bd-aa7a-476d-ae22-83a3715c1681  rack1
UN  172.18.0.3  350.93 KiB  256          100.0%            254cb6b1-009b-40bb-a166-579fc33fad67  rack1
UN  172.18.0.4  371.56 KiB  256          100.0%            d2454910-64ee-4879-a1e9-56289fd6a7fe  rack1
  • Datacenter: Indicates the name of the datacenter to which the nodes belong.
  • Status: Whether the node is up (U) or down (D).
  • State: The node's state: Normal (N), Leaving (L), Joining (J), or Moving (M).
    • UN: Up and Normal. The node is fully operational and handling requests.
  • Address: IP address or hostname of the node.
  • Load: The amount of data (in bytes or KiB) that the node is responsible for storing. This includes both the data and the replicas.
  • Tokens: The number of tokens assigned to the node. Tokens are assigned based on the partitioner and determine the range of data each node is responsible for.
  • Owns (effective): The percentage of the data (including replicas) that the node is responsible for storing. With a replication factor of 3 on a three-node cluster, each node owns 100%; in larger balanced clusters, ownership should be roughly equal across nodes.
  • Host ID: Unique identifier for the node.
  • Rack: The rack within the datacenter where the node is located.

2. Use the nodetool command to see various metrics

Detailed information about the current node in the Cassandra cluster.

$ docker exec -it cassandra-node1 nodetool info
ID                     : 2943d0bd-aa7a-476d-ae22-83a3715c1681
Gossip active          : true
Thrift active          : false
Native Transport active: true
Load                   : 315.45 KiB
Generation No          : 1706948010
Uptime (seconds)       : 28565
Heap Memory (MB)       : 374.71 / 1864.25
Off Heap Memory (MB)   : 0.00
Data Center            : se1
Rack                   : rack1
Exceptions             : 0
Key Cache              : entries 27, size 2.15 KiB, capacity 93 MiB, 313 hits, 350 requests, 0.894 recent hit rate, 14400 save period in seconds
Row Cache              : entries 0, size 0 bytes, capacity 0 bytes, 0 hits, 0 requests, NaN recent hit rate, 0 save period in seconds
Counter Cache          : entries 0, size 0 bytes, capacity 46 MiB, 0 hits, 0 requests, NaN recent hit rate, 7200 save period in seconds
Chunk Cache            : entries 24, size 1.5 MiB, capacity 434 MiB, 99 misses, 1236 requests, 0.920 recent hit rate, NaN microseconds miss latency
Percent Repaired       : 0.0%
Token                  : (invoke with -T/--tokens to see all 256 tokens)
  • ID: Unique identifier for the node.
  • Gossip active: Indicates whether the gossip protocol, which handles communication between nodes in the cluster, is active on this node (true or false).
  • Thrift active: Indicates whether the Thrift protocol, which is used for client communication in older versions of Cassandra, is active on this node (true or false). In modern versions of Cassandra, Thrift is often disabled by default.
  • Native Transport active: Indicates whether the native transport protocol, which is used for client communication in newer versions of Cassandra, is active on this node (true or false).
  • Load: The amount of data (in bytes or KiB) that the node is responsible for storing.
  • Generation No: A numeric identifier representing the generation of the node.
  • Uptime (seconds): The uptime of the node in seconds.
  • Heap Memory (MB): The current and maximum heap memory usage of the node in megabytes.
  • Off Heap Memory (MB): The off-heap memory usage of the node in megabytes.
  • Data Center: The name of the data center where the node is located.
  • Rack: The rack within the data center where the node is located.
  • Exceptions: The number of exceptions encountered by the node.
  • Key Cache: Information about the key cache, including the number of entries, size, hit rate, and save period.
  • Row Cache: Information about the row cache, including the number of entries, size, hit rate, and save period.
  • Counter Cache: Information about the counter cache, including the number of entries, size, hit rate, and save period.
  • Chunk Cache: Information about the chunk cache, including the number of entries, size, hit rate, and miss latency.
  • Percent Repaired: The percentage of repaired data on the node.
  • Token: The token assigned to the node, which determines its position in the Cassandra ring.

Information about the thread pools and message types processed by the Cassandra node

$ docker exec -it cassandra-node1 nodetool tpstats
Pool Name                         Active   Pending      Completed   Blocked  All time blocked
ReadStage                              0         0            478         0                 0
MiscStage                              0         0              0         0                 0
CompactionExecutor                     0         0          23064         0                 0
MutationStage                          0         0            223         0                 0
MemtableReclaimMemory                  0         0             71         0                 0
PendingRangeCalculator                 0         0              3         0                 0
GossipStage                            0         0         144820         0                 0
SecondaryIndexManagement               0         0              1         0                 0
HintsDispatcher                        0         0              0         0                 0
RequestResponseStage                   0         0             35         0                 0
Native-Transport-Requests              0         0           1721         0                 0
ReadRepairStage                        0         0              1         0                 0
CounterMutationStage                   0         0              0         0                 0
MigrationStage                         0         0              4         0                 0
MemtablePostFlush                      0         0            112         0                 0
PerDiskMemtableFlushWriter_0           0         0             69         0                 0
ValidationExecutor                     0         0              0         0                 0
Sampler                                0         0              0         0                 0
MemtableFlushWriter                    0         0             70         0                 0
InternalResponseStage                  0         0              0         0                 0
ViewMutationStage                      0         0              0         0                 0
AntiEntropyStage                       0         0              0         0                 0
CacheCleanupExecutor                   0         0              0         0                 0

Message type           Dropped
READ                         0
RANGE_SLICE                  0
_TRACE                       0
HINT                         0
MUTATION                     0
COUNTER_MUTATION             0
BATCH_STORE                  0
BATCH_REMOVE                 0
REQUEST_RESPONSE             0
PAGED_RANGE                  0
READ_REPAIR                  0

Thread Pool Stats:

  • Pool Name: The name of the thread pool.
  • Active: The number of active threads in the pool.
  • Pending: The number of pending tasks in the pool (tasks waiting to be executed).
  • Completed: The number of tasks completed by the pool since the last reset.
  • Blocked: The number of times a thread was blocked while trying to execute a task in the pool.
  • All time blocked: The total number of times a thread has been blocked since the pool was created.

Each row in the thread pool stats section represents a different thread pool in the Cassandra node. Here are some commonly seen thread pools:

  • ReadStage: Handles read requests.
  • CompactionExecutor: Executes compaction tasks (merging and cleaning up SSTables).
  • MutationStage: Handles mutation (write) requests.
  • GossipStage: Handles gossip communication between nodes.
  • RequestResponseStage: Handles request-response communication between clients and nodes.
  • Native-Transport-Requests: Handles native transport (CQL) requests.

Message Type Stats:

  • Message type: The type of message processed by the node.
  • Dropped: The number of messages dropped of that type.

Each row in the message type stats section represents a different message type processed by the Cassandra node. Some common message types include reads, mutations, request responses, and hints.
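If you want these numbers in a script, the fixed-column layout of the thread-pool section parses easily. A minimal sketch, using sample lines copied from the tpstats output above:

```python
# Parse the thread-pool section of `nodetool tpstats` output into a dict.
# Sample lines copied from the output above.
sample = """\
ReadStage                              0         0            478         0                 0
MutationStage                          0         0            223         0                 0
Native-Transport-Requests              0         0           1721         0                 0
"""

stats = {}
for line in sample.splitlines():
    name, *nums = line.split()
    active, pending, completed, blocked, all_time_blocked = map(int, nums)
    stats[name] = {
        "active": active,
        "pending": pending,
        "completed": completed,
        "blocked": blocked,
        "all_time_blocked": all_time_blocked,
    }

print(stats["ReadStage"]["completed"])  # 478
```

In monitoring, sustained non-zero Pending or Blocked counts (rather than the raw Completed totals) are the usual warning sign that a pool is saturated.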

Statistics about the tables in a specific keyspace.

$ docker exec -it cassandra-node1 nodetool cfstats myblog
Total number of tables: 37
----------------
Keyspace : myblog
        Read Count: 4
        Read Latency: 5.25075 ms
        Write Count: 9
        Write Latency: 0.3881111111111111 ms
        Pending Flushes: 0
                Table: posts
                SSTable count: 1
                Space used (live): 5278
                Space used (total): 5278
                Space used by snapshots (total): 0
                Off heap memory used (total): 44
                SSTable Compression Ratio: 0.9827586206896551
                Number of partitions (estimate): 6
                Memtable cell count: 3
                Memtable data size: 270
                Memtable off heap memory used: 0
                Memtable switch count: 2
                Local read count: 4
                Local read latency: NaN ms
                Local write count: 9
                Local write latency: NaN ms
                Pending flushes: 0
                Percent repaired: 0.0
                Bloom filter false positives: 0
                Bloom filter false ratio: 0.00000
                Bloom filter space used: 16
                Bloom filter off heap memory used: 8
                Index summary off heap memory used: 28
                Compression metadata off heap memory used: 8
                Compacted partition minimum bytes: 30
                Compacted partition maximum bytes: 86
                Compacted partition mean bytes: 65
                Average live cells per slice (last five minutes): NaN
                Maximum live cells per slice (last five minutes): 0
                Average tombstones per slice (last five minutes): NaN
                Maximum tombstones per slice (last five minutes): 0
                Dropped Mutations: 0

Keyspace Stats:

  • Read Count: The total number of read operations performed on the keyspace.
  • Read Latency: The average latency (in milliseconds) for read operations.
  • Write Count: The total number of write operations performed on the keyspace.
  • Write Latency: The average latency (in milliseconds) for write operations.
  • Pending Flushes: The number of pending flushes for the keyspace.

Table Stats:

  • Table: The name of the table (posts in this case).
  • SSTable count: The number of SSTables (sorted string tables) for the table.
  • Space used (live): The amount of space (in bytes) used by live data.
  • Space used (total): The total amount of space (in bytes) used by the table.
  • Space used by snapshots (total): The total amount of space (in bytes) used by snapshots of the table.
  • Off heap memory used (total): The total amount of off-heap memory used by the table.
  • SSTable Compression Ratio: The compression ratio of the SSTables.
  • Number of partitions (estimate): An estimate of the number of partitions in the table.
  • Memtable cell count: The number of cells in the memtable.
  • Memtable data size: The size (in bytes) of data stored in the memtable.
  • Memtable off heap memory used: The amount of off-heap memory used by the memtable.
  • Memtable switch count: The number of times the memtable has been switched.
  • Local read count: The number of read operations performed locally.
  • Local read latency: The average latency (in milliseconds) for local read operations.
  • Local write count: The number of write operations performed locally.
  • Local write latency: The average latency (in milliseconds) for local write operations.
  • Pending flushes: The number of pending flushes for the table.
  • Percent repaired: The percentage of repaired data in the table.
  • Bloom filter false positives: The number of false positives in the Bloom filter.
  • Bloom filter false ratio: The ratio of false positives in the Bloom filter.
  • Bloom filter space used: The space (in bytes) used by the Bloom filter.
  • Bloom filter off heap memory used: The amount of off-heap memory used by the Bloom filter.
  • Index summary off heap memory used: The amount of off-heap memory used by the index summary.
  • Compression metadata off heap memory used: The amount of off-heap memory used by compression metadata.
  • Compacted partition minimum bytes: The minimum size (in bytes) of compacted partitions.
  • Compacted partition maximum bytes: The maximum size (in bytes) of compacted partitions.
  • Compacted partition mean bytes: The mean size (in bytes) of compacted partitions.
  • Average live cells per slice (last five minutes): The average number of live cells per slice in the last five minutes.
  • Maximum live cells per slice (last five minutes): The maximum number of live cells per slice in the last five minutes.
  • Average tombstones per slice (last five minutes): The average number of tombstones per slice in the last five minutes.
  • Maximum tombstones per slice (last five minutes): The maximum number of tombstones per slice in the last five minutes.
  • Dropped Mutations: The number of dropped mutations for the table.

Information about garbage collection (GC) statistics for the Cassandra node

$ docker exec -it cassandra-node1 nodetool gcstats
Interval (ms)  Max GC Elapsed (ms)  Total GC Elapsed (ms)  Stdev GC Elapsed (ms)  GC Reclaimed (MB)  Collections  Direct Memory Bytes
     39392432                  327                  1264                    86         4367252032           11                   -1
  • Interval (ms): The time interval (in milliseconds) covered by the statistics.
  • Max GC Elapsed (ms): The maximum elapsed time (in milliseconds) for a single garbage collection cycle during the interval.
  • Total GC Elapsed (ms): The total elapsed time (in milliseconds) spent on garbage collection during the interval.
  • Stdev GC Elapsed (ms): The standard deviation of the elapsed time (in milliseconds) for garbage collection cycles during the interval.
  • GC Reclaimed (MB): The total amount of memory (in megabytes) reclaimed by garbage collection during the interval.
  • Collections: The total number of garbage collection cycles that occurred during the interval.
  • Direct Memory Bytes: The amount of direct memory (in bytes) used by the JVM. This value may be -1 if the information is not available.

Compaction tasks that are pending or in progress on the Cassandra node

$ docker exec -it cassandra-node1 nodetool compactionstats
pending tasks: 0
  • Pending Tasks: The number of compaction tasks that are currently pending. A value of 0 indicates that there are no pending compaction tasks.

Compaction is the process by which multiple SSTables (sorted string tables) are merged together to optimize storage and improve read performance. It helps to reduce the number of SSTables on disk and improve the efficiency of data retrieval operations.
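The merge at the heart of compaction can be sketched with two toy SSTables of (key, write-timestamp, value) cells: the tables are merged in sorted order and only the newest cell per key survives. The data below loosely mirrors the earlier UPDATE example but is purely illustrative.

```python
import heapq

# Two toy SSTables, each sorted by key, holding (key, write_timestamp, value)
# cells. Compaction merges them and keeps only the newest cell per key,
# dropping shadowed older versions.
sstable_old = [("id1", 1, "My blog"), ("id2", 1, "Hello")]
sstable_new = [("id1", 2, "Updated!")]

merged = {}
for key, ts, value in heapq.merge(sstable_old, sstable_new):
    if key not in merged or ts > merged[key][0]:
        merged[key] = (ts, value)  # newest write wins

compacted = [(k, ts, v) for k, (ts, v) in sorted(merged.items())]
print(compacted)  # [('id1', 2, 'Updated!'), ('id2', 1, 'Hello')]
```

Real compaction also purges expired tombstones (after gc_grace_seconds) during this merge, which is how deleted data is eventually reclaimed from disk.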

3. Use the nodetool command to monitor query performance

Histograms of latency metrics for different types of operations served by the proxy layer of Cassandra.

$ docker exec -it cassandra-node1 nodetool proxyhistograms
proxy histograms
Percentile       Read Latency      Write Latency      Range Latency   CAS Read Latency  CAS Write Latency View Write Latency
                     (micros)           (micros)           (micros)           (micros)           (micros)           (micros)
50%                      0.00               0.00               0.00               0.00               0.00               0.00
75%                      0.00               0.00               0.00               0.00               0.00               0.00
95%                      0.00               0.00               0.00               0.00               0.00               0.00
98%                      0.00               0.00               0.00               0.00               0.00               0.00
99%                      0.00               0.00               0.00               0.00               0.00               0.00
Min                      0.00               0.00               0.00               0.00               0.00               0.00
Max                      0.00               0.00               0.00               0.00               0.00               0.00
  • Percentile: The percentile value for which the latency is reported.
  • Read Latency (micros): The latency (in microseconds) for read operations.
  • Write Latency (micros): The latency (in microseconds) for write operations.
  • Range Latency (micros): The latency (in microseconds) for range operations.
  • CAS Read Latency (micros): The latency (in microseconds) for Compare-and-Set (CAS) read operations.
  • CAS Write Latency (micros): The latency (in microseconds) for Compare-and-Set (CAS) write operations.
  • View Write Latency (micros): The latency (in microseconds) for view write operations.

For each latency metric, the output provides percentile values ranging from 50% to 99%, as well as minimum (Min) and maximum (Max) values. In the example output above, all latency values are reported as 0.00 microseconds, indicating that no latency measurements were recorded during the sampling period.
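To illustrate what the percentile rows report, here is a minimal sketch in plain Python using the nearest-rank method on a hypothetical list of latency samples (nodetool itself derives percentiles from pre-binned histograms, not raw samples, so this is only a conceptual model):

```python
import math

# Nearest-rank percentile: the smallest sample value such that at least
# p% of all samples are less than or equal to it.
def percentile(samples, p):
    ordered = sorted(samples)
    rank = math.ceil(p / 100 * len(ordered))  # 1-based nearest rank
    return ordered[rank - 1]

# Hypothetical read latencies in microseconds
latencies_us = [120, 95, 310, 88, 150, 4200, 130, 99, 160, 140]
for p in (50, 75, 95, 99):
    print(f"{p}%: {percentile(latencies_us, p)} micros")
```

With real traffic flowing through the node, the 95% row tells you the latency that 95% of requests stayed under; a large gap between 95% and Max usually points at occasional slow outliers.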

Histograms of various metrics related to the specified table.

$ docker exec -it cassandra-node1 nodetool tablehistograms myblog.posts
myblog/posts histograms
Percentile  SSTables     Write Latency      Read Latency    Partition Size        Cell Count
                              (micros)          (micros)           (bytes)
50%             0.00              0.00              0.00                72                 3
75%             0.00              0.00              0.00                86                 3
95%             0.00              0.00              0.00                86                 3
98%             0.00              0.00              0.00                86                 3
99%             0.00              0.00              0.00                86                 3
Min             0.00              0.00              0.00                30                 0
Max             0.00              0.00              0.00                86                 3
  • Percentile: The percentile value for which the metrics are reported.
  • SSTables: The number of SSTables (sorted string tables) touched per read operation.
  • Write Latency (micros): The latency (in microseconds) for write operations on the table.
  • Read Latency (micros): The latency (in microseconds) for read operations on the table.
  • Partition Size (bytes): The size (in bytes) of partitions (rows) in the table.
  • Cell Count: The number of cells (columns) in the partitions.

Individual client sessions that have been traced

$ docker exec -it cassandra-node1 cqlsh -e "SELECT * FROM system_traces.sessions;"
 session_id | client | command | coordinator | coordinator_port | duration | parameters | request | started_at
------------+--------+---------+-------------+------------------+----------+------------+---------+------------
  • session_id: A unique identifier for the traced session.
  • client: The IP address or hostname of the client initiating the traced session.
  • command: The type of command or query executed in the session.
  • coordinator: The IP address or hostname of the node that acted as the coordinator for the session.
  • coordinator_port: The port number of the coordinator node.
  • duration: The duration of the session in microseconds.
  • parameters: Additional parameters or settings associated with the session.
  • request: The text of the query or operation executed in the session.
  • started_at: The timestamp indicating when the session started.

If you’re not seeing any records in the system_traces.sessions table, it means that tracing has not been enabled, or there have been no traced sessions since tracing was enabled. Tracing is typically enabled on a per-session basis using the TRACING ON command in cqlsh or programmatically through the driver used to interact with Cassandra.

Step9. How to specify the heap_size

https://cassandra.apache.org/doc/3.11/cassandra/configuration/cass_yaml_file.html

1.Locate the ‘jvm.options’ file

Navigate to the Cassandra configuration directory (/etc/cassandra in this Docker image).
The jvm.options file should be located in this directory.

$ docker exec -it cassandra-node1 bash

# Open the jvm.options file
root@xxxxxxxxxxxx:/# vim /etc/cassandra/jvm.options

2.Edit the ‘jvm.options’ file and set the Heap Size parameters

Find the HEAP SETTINGS part in the file and set the value.
For example, -Xms (initial heap size) and -Xmx (maximum heap size).

# It is recommended to set min (-Xms) and max (-Xmx) heap sizes to
# the same value to avoid stop-the-world GC pauses during resize, and
# so that we can lock the heap in memory on startup to prevent any
# of it from being swapped out.
-Xms2G
-Xmx2G

By default, these parameters are not set.
In that case, the heap size is calculated automatically by cassandra-env.sh at startup.
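The automatic calculation can be sketched roughly as follows. This is a simplified Python model of the logic in cassandra-env.sh (the real script also derives the new-generation size and handles edge cases), shown only to make the sizing behavior visible:

```python
# Sketch of the automatic max-heap calculation performed by
# cassandra-env.sh when -Xmx is not set (all sizes in MB):
#   max heap = max( min(1/2 RAM, 1024 MB), min(1/4 RAM, 8192 MB) )
def auto_max_heap_mb(system_memory_mb):
    half = min(system_memory_mb // 2, 1024)     # half the RAM, capped at 1 GB
    quarter = min(system_memory_mb // 4, 8192)  # quarter of the RAM, capped at 8 GB
    return max(half, quarter)

for ram in (2048, 8192, 65536):
    print(ram, "MB RAM ->", auto_max_heap_mb(ram), "MB max heap")
```

So a 2 GB machine gets a 1 GB heap, an 8 GB machine gets 2 GB, and the heap never grows beyond 8 GB no matter how much RAM is available; setting -Xms/-Xmx explicitly bypasses this calculation.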

3.Restart Cassandra

$ docker-compose restart cassandra-node1

4.Verify the Heap Size with nodetool

$ docker exec -it cassandra-node1 nodetool info | grep 'Heap'
Heap Memory (MB)       : 321.38 / 2000.25
Off Heap Memory (MB)   : 0.00

Step10. How to specify the Concurrent Reads and Writes Parameters

1.Locate the ‘cassandra.yaml’ file

$ docker exec -it cassandra-node1 bash

# Open the cassandra.yaml file
root@xxxxxxxxxxxx:/# vim /etc/cassandra/cassandra.yaml

In this Docker image, the cassandra.yaml file is located in /etc/cassandra.
In a tarball installation, it is typically located in the conf directory within the Cassandra installation directory.

2.Edit the ‘cassandra.yaml’ file and set the Concurrent Reads and Writes Parameters

  • concurrent_reads: Specifies the number of read requests that Cassandra processes concurrently. Since reads are often IO bound, (16 * number_of_drives) is the suggested rule of thumb.
  • concurrent_writes: Specifies the number of write requests that Cassandra processes concurrently.
# On the other hand, since writes are almost never IO bound, the ideal
# number of "concurrent_writes" is dependent on the number of cores in
# your system; (8 * number_of_cores) is a good rule of thumb.
concurrent_reads: 40
concurrent_writes: 40

By default, these parameters are set to 32. You can adjust these values based on your workload requirements and hardware resources.
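The rules of thumb from the cassandra.yaml comments, (16 * number_of_drives) for reads and (8 * number_of_cores) for writes, can be sketched as a small calculator. The disk and core counts below are hypothetical; they are only inputs to the rule, not values from this cluster:

```python
# Rule-of-thumb sizing from the cassandra.yaml comments:
#   concurrent_reads  ~ 16 * number of data disks (reads are often IO bound)
#   concurrent_writes ~  8 * number of CPU cores  (writes are rarely IO bound)
def suggested_concurrency(data_disks, cpu_cores):
    return {
        "concurrent_reads": 16 * data_disks,
        "concurrent_writes": 8 * cpu_cores,
    }

# Hypothetical node: 2 data disks, 8 cores
print(suggested_concurrency(data_disks=2, cpu_cores=8))
```

Treat the result as a starting point and adjust it based on observed latency and saturation, not as a hard requirement.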

3.Restart Cassandra

$ docker-compose restart cassandra-node1

Step11. How to specify the Compaction Strategy

1.Understand current Compaction Strategy

cqlsh> DESCRIBE TABLE myblog.posts;

CREATE TABLE myblog.posts (
    id uuid PRIMARY KEY,
    author text,
    content text,
    title text
) WITH bloom_filter_fp_chance = 0.01
    AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
    AND comment = ''
    AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32', 'min_threshold': '4'}
    AND compression = {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}
    AND crc_check_chance = 1.0
    AND dclocal_read_repair_chance = 0.1
    AND default_time_to_live = 0
    AND gc_grace_seconds = 864000
    AND max_index_interval = 2048
    AND memtable_flush_period_in_ms = 0
    AND min_index_interval = 128
    AND read_repair_chance = 0.0
    AND speculative_retry = '99PERCENTILE';

-> SizeTieredCompactionStrategy
SizeTieredCompactionStrategy is the default compaction strategy.

2.Change the CompactionStrategy

cqlsh> ALTER TABLE myblog.posts WITH compaction = {'class': 'LeveledCompactionStrategy'};
cqlsh> DESCRIBE TABLE myblog.posts;

CREATE TABLE myblog.posts (
    id uuid PRIMARY KEY,
    author text,
    content text,
    title text
) WITH bloom_filter_fp_chance = 0.01
    AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
    AND comment = ''
    AND compaction = {'class': 'org.apache.cassandra.db.compaction.LeveledCompactionStrategy'}
    AND compression = {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}
    AND crc_check_chance = 1.0
    AND dclocal_read_repair_chance = 0.1
    AND default_time_to_live = 0
    AND gc_grace_seconds = 864000
    AND max_index_interval = 2048
    AND memtable_flush_period_in_ms = 0
    AND min_index_interval = 128
    AND read_repair_chance = 0.0
    AND speculative_retry = '99PERCENTILE';

Cassandra supports multiple compaction strategies, including:

  • SizeTieredCompactionStrategy: Groups SSTables of similar size into buckets and merges the SSTables within a bucket. The default; a good fit for write-heavy workloads.
  • LeveledCompactionStrategy: Organizes SSTables into levels with non-overlapping key ranges within each level, which bounds the number of SSTables a read must touch. A good fit for read-heavy workloads.
  • TimeWindowCompactionStrategy: Groups SSTables by write time and only compacts SSTables within the same time window. Designed for time-series data, especially with TTLs.
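To make the size-tiered idea concrete, here is a simplified Python sketch of how SSTables of similar size end up in the same compaction bucket. The real strategy uses bucket_low/bucket_high ratios around a bucket's average size; this sketch just bands sizes by a fixed factor:

```python
# Simplified illustration of SizeTieredCompactionStrategy bucketing:
# SSTables whose sizes are within a factor of each other land in the
# same bucket, and a bucket with at least min_threshold tables becomes
# a compaction candidate.
def size_tiered_buckets(sstable_sizes_mb, factor=2, min_threshold=4):
    buckets = []
    for size in sorted(sstable_sizes_mb):
        for bucket in buckets:
            # within a factor of the smallest table in the bucket?
            if size <= bucket[0] * factor:
                bucket.append(size)
                break
        else:
            buckets.append([size])
    # only buckets big enough to be worth compacting
    return [b for b in buckets if len(b) >= min_threshold]

# Four ~10 MB tables form a candidate bucket; the larger tables do not.
print(size_tiered_buckets([10, 11, 12, 13, 160, 170, 1500]))
```

This is why STCS tends to produce a few large SSTables over time: each compaction merges a bucket of similar-sized tables into one bigger table, which later joins a bucket of its own size class.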

Step12. How to specify the Memtable Configuration

1.Locate the ‘cassandra.yaml’ file

$ docker exec -it cassandra-node1 bash

# Open the cassandra.yaml file
root@xxxxxxxxxxxx:/# vim /etc/cassandra/cassandra.yaml

2.Edit the ‘cassandra.yaml’ file and set the Memtable Configuration Parameters

memtable_flush_writers: 4
  • memtable_flush_writers: Specifies the number of threads used for flushing memtables to disk. The default is two for a single data directory; increasing this value can improve write performance by allowing multiple memtables to be flushed concurrently.

3.Restart Cassandra

$ docker-compose restart cassandra-node1

Step13. How to specify the Cache Configuration

1.Locate the ‘cassandra.yaml’ file

$ docker exec -it cassandra-node1 bash

# Open the cassandra.yaml file
root@xxxxxxxxxxxx:/# vim /etc/cassandra/cassandra.yaml

2.Edit the ‘cassandra.yaml’ file and set the Cache Configuration Parameters

# Default value is empty to make it "auto" (min(5% of Heap (in MB), 100MB)). Set to 0 to disable key cache.
key_cache_size_in_mb: 256

# Default value is 0, to disable row caching.
row_cache_size_in_mb: 512

# Default value is empty to make it "auto" (min(2.5% of Heap (in MB), 50MB)). Set to 0 to disable counter cache.
# NOTE: if you perform counter deletes and rely on low gcgs, you should disable the counter cache.
counter_cache_size_in_mb: 128

Explanation of each cache parameter:

  • key_cache_size_in_mb: Specifies the size of the key cache in megabytes. The key cache stores the keys of frequently accessed rows to speed up read operations.
  • row_cache_size_in_mb: Specifies the size of the row cache in megabytes. The row cache stores entire rows of frequently accessed tables to speed up read operations.
  • counter_cache_size_in_mb: Specifies the size of the counter cache in megabytes. The counter cache stores frequently accessed counter values to speed up read operations for counter columns.
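The "auto" sizing described in the comments above can be sketched as follows. This is a simplified model of the defaults quoted in cassandra.yaml, shown only to illustrate how the caps interact with the heap size:

```python
# Sketch of the "auto" cache sizing from the cassandra.yaml comments:
#   key cache     = min(5%   of heap, 100 MB)
#   counter cache = min(2.5% of heap,  50 MB)
#   row cache defaults to 0 (disabled)
def auto_cache_sizes_mb(heap_mb):
    return {
        "key_cache": min(heap_mb * 0.05, 100),
        "counter_cache": min(heap_mb * 0.025, 50),
    }

print(auto_cache_sizes_mb(2048))  # 2 GB heap: both caches hit their caps
```

With the 2 GB heap used earlier in this practice, the automatic key cache is already at its 100 MB cap, so the explicit 256 MB setting above more than doubles it; monitor heap pressure after such a change.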

3.Restart Cassandra

$ docker-compose restart cassandra-node1

Step14. How to specify the Native Transport Configuration

1.Locate the ‘cassandra.yaml’ file

$ docker exec -it cassandra-node1 bash

# Open the cassandra.yaml file
root@xxxxxxxxxxxx:/# vim /etc/cassandra/cassandra.yaml

2.Edit the ‘cassandra.yaml’ file and set the Native Transport Configuration

# port for the CQL native transport to listen for clients on
# For security reasons, you should not expose this port to the internet.  Firewall it if needed.
native_transport_port: 9042

# The maximum threads for handling requests when the native transport is used.
# This is similar to rpc_max_threads though the default differs slightly (and
# there is no native_transport_min_threads, idle threads will always be stopped
# after 30 seconds).
native_transport_max_threads: 16

# The maximum size of allowed frame. Frame (requests) larger than this will
# be rejected as invalid. The default is 256MB. If you're changing this parameter,
# you may want to adjust max_value_size_in_mb accordingly. This should be positive and less than 2048.
native_transport_max_frame_size_in_mb: 256

Explanation of each native transport parameter:

  • native_transport_port: Specifies the port number on which the native transport listens for client connections. Clients connect to this port to communicate with the Cassandra cluster using the native protocol.
  • native_transport_max_threads: Specifies the maximum number of threads allowed for handling native transport requests concurrently. Increasing this value can improve the throughput of client requests.
  • native_transport_max_frame_size_in_mb: Specifies the maximum frame size allowed for incoming messages over the native transport. Messages larger than this size will be rejected.

3.Restart Cassandra

$ docker-compose restart cassandra-node1

Step15. Learn about authentication and authorization

1.Edit the cassandra.yaml configuration file to enable authentication and authorization

$ docker exec -it cassandra-node1 bash

# Open the cassandra.yaml file
root@xxxxxxxxxxxx:/# vim /etc/cassandra/cassandra.yaml

authenticator: PasswordAuthenticator
authorizer: CassandraAuthorizer

This configuration sets PasswordAuthenticator as the authentication mechanism and CassandraAuthorizer as the authorizer.

Once authentication is enabled, Cassandra provides a default superuser role named cassandra with the password cassandra, which is used to log in in the next step.

2.Restart the Cassandra container

$ docker-compose restart cassandra-node1

3.Create a Superuser

$ docker exec -it cassandra-node1 cqlsh -u cassandra -p cassandra
cassandra@cqlsh> CREATE ROLE admin WITH SUPERUSER = true AND LOGIN = true AND PASSWORD = 'password';

4.Create a Regular User and Assign Permissions

cassandra@cqlsh> CREATE ROLE user1 WITH LOGIN = true AND PASSWORD = 'password';
cassandra@cqlsh> GRANT SELECT ON KEYSPACE myblog TO user1;
cassandra@cqlsh> exit

5.Try to use a Regular User (can run SELECT queries, but not INSERT)

$ docker exec -it cassandra-node1 cqlsh -u user1 -p password
user1@cqlsh> USE myblog;
user1@cqlsh:myblog> SELECT * FROM posts;

 id                                   | author  | content     | title
--------------------------------------+---------+-------------+-------------
 6502fc83-c78d-49d9-8928-79a83cc6eb70 |    John |    I'm back |  Third blog
 81e302a7-cca0-4745-b002-14b9d6453acb | Charlie |      QUORUM |      QUORUM
 94b50ea4-f07b-4679-b04e-2e6a982187cb |     Bob |         ONE |         ONE
 1f13251d-7dd8-4d3d-8ce4-ccd667e7f458 |   Alice |         ALL |         ALL
 6f2a5531-85a8-4317-8459-2c3e94e38bc5 |    Mila | Good night. | Second blog

(5 rows)

user1@cqlsh:myblog> INSERT INTO posts (id, title, content, author) VALUES (uuid(), 'User blog', 'Hello', 'User');
Unauthorized: Error from server: code=2100 [Unauthorized] message="User user1 has no MODIFY permission on <table myblog.posts> or any of its parents"

The INSERT is rejected because user1 was granted only SELECT. A superuser can allow writes with: GRANT MODIFY ON KEYSPACE myblog TO user1;