Understanding Shards and Replicas in Elasticsearch

Sarcastic Writer, April 2, 2024

Elasticsearch, a distributed search engine, uses shards to distribute an index's documents across the nodes of a cluster. This approach is particularly useful when an index stores more data than the hardware of a single node can handle.

For instance, a single index of a billion documents taking up 1TB of disk space may not fit on the disk of a single node or may be too slow to serve search requests from a single node alone.

To address this issue, Elasticsearch provides the ability to subdivide your index into multiple pieces called shards. When you create an index, you can simply define the number of shards that you want. Documents are stored in shards, and shards are allocated to nodes in your cluster. As your cluster grows or shrinks, Elasticsearch will automatically migrate shards between nodes so that the cluster remains balanced.
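To make the idea of balancing concrete, here is a toy sketch in Python. It is not Elasticsearch's allocator, which weighs more factors such as disk usage; the function name, node names, and round-robin strategy are assumptions made purely for illustration.

```python
def allocate_round_robin(num_shards, nodes):
    """Toy allocator: spread shard ids evenly across the given node names.

    This is an illustrative stand-in, not Elasticsearch's real algorithm.
    """
    allocation = {node: [] for node in nodes}
    for shard_id in range(num_shards):
        node = nodes[shard_id % len(nodes)]
        allocation[node].append(shard_id)
    return allocation


# Six shards on two nodes, then the same six shards after a third node joins.
two_nodes = allocate_round_robin(6, ["node-a", "node-b"])
three_nodes = allocate_round_robin(6, ["node-a", "node-b", "node-c"])

print(two_nodes)    # three shards per node
print(three_nodes)  # two shards per node
```

When the third hypothetical node joins, each node ends up holding fewer shards, which is the effect that automatic shard migration aims for.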

A shard can be either a primary shard or a replica shard. Each document in your index belongs to a single primary shard, so the number of primary shards that you have determines the maximum amount of data that your index can hold. A replica shard is just a copy of a primary shard.
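How a document ends up in exactly one primary shard can be sketched as follows. In simplified terms, Elasticsearch hashes a routing value (the document ID by default) and takes the result modulo the number of primary shards. The sketch below uses CRC32 from the Python standard library as a stand-in hash, which is an assumption for illustration and not the hash Elasticsearch uses; the function name and document IDs are also hypothetical.

```python
import zlib


def pick_shard(routing_key: str, num_primary_shards: int) -> int:
    """Map a routing key to a primary shard number.

    Uses CRC32 as a stand-in hash (an assumption; Elasticsearch uses a
    different hash function internally).
    """
    return zlib.crc32(routing_key.encode("utf-8")) % num_primary_shards


doc_ids = [f"doc-{i}" for i in range(1000)]

# The same key always maps to the same shard for a fixed shard count.
assert pick_shard("doc-1", 5) == pick_shard("doc-1", 5)

# With a different shard count, many documents would map to a different shard.
moved = sum(1 for d in doc_ids if pick_shard(d, 5) != pick_shard(d, 6))
print(f"{moved} of {len(doc_ids)} documents would change shards")
```

Because placement depends on the shard count, changing that count would leave existing documents on shards where lookups no longer expect them. This hints at why the number of primary shards is treated as fixed once an index exists.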

Replicas are designed to prevent data loss in case of hardware failure: if the node holding a primary shard fails, a replica on another node can take its place. Elasticsearch allows you to make one or more copies of your index’s shards into what are called replica shards, or replicas for short. An index can have zero replicas (no copies at all) or more.

The number of primary shards and replicas can be defined per index when the index is created. After the index is created, you can change the number of replicas dynamically at any time, but you cannot change the number of primary shards in place. Doing so requires creating a new index, for example by reindexing or through the split and shrink APIs.
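As a rough sketch, the two requests involved might look like the following. The settings `number_of_shards` and `number_of_replicas` are Elasticsearch index settings; the index name `my-index` and the specific values are hypothetical. The code only builds and prints the request bodies, so it runs without a cluster.

```python
import json

INDEX_NAME = "my-index"  # hypothetical index name

# Sent once, at index creation: fixes the primary shard count.
create_index_request = {
    "method": "PUT",
    "path": f"/{INDEX_NAME}",
    "body": {"settings": {"number_of_shards": 3, "number_of_replicas": 1}},
}

# Sent later, on the live index: only the replica count is changed.
update_replicas_request = {
    "method": "PUT",
    "path": f"/{INDEX_NAME}/_settings",
    "body": {"index": {"number_of_replicas": 2}},
}

for request in (create_index_request, update_replicas_request):
    print(request["method"], request["path"])
    print(json.dumps(request["body"], indent=2))
```

Note that the second request does not touch `number_of_shards`, reflecting that only the replica count is adjustable on an existing index.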

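In Elasticsearch versions before 7.0, each index is allocated 5 primary shards and 1 replica by default; since 7.0, the default is 1 primary shard and 1 replica. With 5 primaries and 1 replica, if you have at least two nodes in your cluster, your index will have 5 primary shards and another 5 replica shards (1 complete replica), for a total of 10 shards per index. At least two nodes are needed because a replica is never allocated on the same node as its primary.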
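The shard arithmetic can be checked with a small Python sketch. The function names are hypothetical, and the node-count rule assumes only that copies of the same shard are never placed on the same node.

```python
def total_shards(primaries: int, replicas: int) -> int:
    """Total shard copies an index asks for: each primary plus its replicas."""
    return primaries * (1 + replicas)


def assignable_shards(primaries: int, replicas: int, nodes: int) -> int:
    """Shard copies that can actually be placed on the given number of nodes.

    Copies of the same shard must live on different nodes, so each shard
    can have at most one copy per node.
    """
    return primaries * min(1 + replicas, nodes)


print(total_shards(5, 1))          # 10
print(assignable_shards(5, 1, 1))  # 5: replicas stay unassigned on one node
print(assignable_shards(5, 1, 2))  # 10: every copy has a home
```

With a single node, only the 5 primaries can be assigned and the 5 replicas remain unassigned until a second node joins.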

In conclusion, the concepts of shards and replicas are fundamental to understanding how Elasticsearch ensures efficient data storage and retrieval across nodes, while also providing a safeguard against data loss.