How to Determine How Many Disks Can Be Lost in a Storage Spaces Direct (S2D / WSSD) Cluster

 

One of my biggest concerns when I move to Storage Spaces Direct is "how many disks can I safely lose?" Of course, the answer is not entirely obvious. I wanted to make myself some rules to go by, since I see many configurations.

Right off the top, I could see there are things to consider as decision points. But before I define those, I need to specify which commands I use to see the fault tolerance of a Storage Spaces Direct cluster. I settled on these:

 

  1. Get-ClusterFaultDomain
  2. Show-PrettyVolume.ps1
  3. Show-PrettyPool.ps1

 

Two of these items are scripts. If you don't use the scripts, you will need to run the underlying cmdlets individually; a minimal sketch of those follows below.
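
If the scripts are not handy, the built-in FailoverClusters and Storage cmdlets return roughly the same information the scripts just format nicely. A minimal sketch, run from any node of the cluster (nothing below is specific to one environment):

    # Fault domains (servers, chassis, racks) as the cluster sees them
    Get-ClusterFaultDomain

    # Volume resiliency (roughly what Show-PrettyVolume.ps1 summarizes)
    Get-VirtualDisk |
        Select-Object FriendlyName, ResiliencySettingName, NumberOfDataCopies, HealthStatus

    # Pool and drive health (roughly what Show-PrettyPool.ps1 summarizes)
    Get-StoragePool -IsPrimordial $false |
        Select-Object FriendlyName, HealthStatus, OperationalStatus
    Get-PhysicalDisk |
        Select-Object FriendlyName, MediaType, Usage, HealthStatus, OperationalStatus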

 

The Decision Points:

 

#1. Single parity is generally not good. It requires three servers (three fault domains) and only covers one fault; if a drive goes bad and you then lose another drive in a different server before the repair completes, you may be in trouble. Use a three-way mirror instead.

#2. If you have two servers, you can only have a two-way mirror.

#3. If you have three servers, you should use a three-way mirror. Check the results of Show-PrettyVolume.ps1 to see what your volumes actually use (a quick check is also sketched after this list).

#4. Unless you have only two servers, three-way mirror or dual parity are recommended, as they are the only resiliency types that can survive two faults at a time; that would be two drives, or two servers, failing.

#5. Dual parity and mixed mirror + parity (mirror-accelerated parity) require four servers. (A three-way mirror is fine there too.)

#6. Three nodes must use a three-way mirror to sustain two failures.

#7. Four nodes can use a three-way mirror, dual parity, or mirror-accelerated parity to sustain two failures.
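
A quick way to check your own volumes against these decision points is the PhysicalDiskRedundancy property on each virtual disk: a value of 1 means the volume survives one fault (two-way mirror or single parity), and 2 means it survives two (three-way mirror or dual parity). A minimal sketch using the standard Storage cmdlets; the per-tier check at the end is for mirror-accelerated parity volumes, which carry their resiliency settings on the tiers:

    # Map each volume to the number of faults it can tolerate
    Get-VirtualDisk | ForEach-Object {
        [pscustomobject]@{
            Volume          = $_.FriendlyName
            Resiliency      = $_.ResiliencySettingName
            FaultsTolerated = $_.PhysicalDiskRedundancy
        }
    } | Format-Table -AutoSize

    # Mirror-accelerated parity volumes carry the settings per tier
    Get-StorageTier |
        Select-Object FriendlyName, ResiliencySettingName, PhysicalDiskRedundancy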

These did not come from my own brain. They are generally all covered in Microsoft's "Fault tolerance and storage efficiency in Storage Spaces Direct" documentation.

 

So those are some guidelines. The bottom line is that if you have fewer than four servers, I would get very nervous over even one failure. I would fix that one disk and let the repair complete before moving on. If you have four servers or more, you are generally OK with two failures, with robust support for resiliency. Nested resiliency for Windows Server 2019 is not the topic of this article; review that topic here.

Just use this link and look at the examples near the bottom. They should give you a good idea of what kind of situation you are in.

If you compare the charts against your pool composition and volumes, you should be able to work it out: you can generally tolerate one or two faults across your fault domains.
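
If you want to see how the pool's drives spread across those fault domains, one rough way (assuming the fault domains are the storage nodes themselves, which is the default) is to count the drives physically connected to each node. Note this also counts any local boot drives, so treat it as a rough picture:

    # Rough count of drives per node (the default fault domain).
    # Sort-Object -Unique is there because the local node can appear twice.
    Get-StorageNode | Sort-Object Name -Unique | ForEach-Object {
        [pscustomobject]@{
            FaultDomain = $_.Name
            Drives      = ($_ | Get-PhysicalDisk -PhysicallyConnected |
                           Measure-Object).Count
        }
    } | Format-Table -AutoSize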

I wish I could be more specific. I know there is a calculator that could be made. Wait a minute! There is a calculator! Use it to get a picture of your situation. In addition, if you have reserve capacity, you can lose drives: they will rebuild into the reserve space. This is a key point. You can review that topic here; look under reserve capacity. Check whether the total size of your virtual disks is smaller than the pool. If there is reserved space, then a rebuild can take place after a drive failure. Read more about reserve capacity here; a quick check is sketched below.
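
A rough way to check both points (how much headroom the pool has left, and whether a rebuild is actually running after a failure), using the standard Storage cmdlets; the TB rounding is only for readability:

    # Reserve capacity: the gap between pool size and what the volumes consume
    Get-StoragePool -IsPrimordial $false |
        Select-Object FriendlyName,
            @{n='PoolTB';      e={[math]::Round($_.Size / 1TB, 2)}},
            @{n='AllocatedTB'; e={[math]::Round($_.AllocatedSize / 1TB, 2)}},
            @{n='FreeTB';      e={[math]::Round(($_.Size - $_.AllocatedSize) / 1TB, 2)}}

    # After a drive failure, a repair job here means the rebuild into the
    # reserve space is under way
    Get-StorageJob | Select-Object Name, JobState, PercentComplete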

 

I hope this helps some. I will be using this post for the calculator and the fault tolerance links. In addition, Show-PrettyVolume.ps1 comes in handy!

 

Thank you,

 

Louis Reeves

 

 
