BIG DISCLAIMER: This is a post from a software engineer who is interested in the cryptocurrency space but is not an expert. I am not a cryptographer, and the following advice may be inaccurate or even wrong in some important way. Having said that, it is written with the best of intentions and to the best of my knowledge. Comments, suggestions and corrections are welcome.
Cryptocurrencies and other security-oriented distributed systems (PGP key management, SSH keys, SQRL, the KeePassXC password manager, etc) rely on some sort of secret that users must keep safe and out of the reach of others as the foundation of their security model. As our world becomes increasingly digital, these keys are becoming more and more valuable and we need to make sure we manage them responsibly.
Over the years, our industry has pushed that responsibility onto users, asking them to use complex secrets and to change them frequently. This has been a huge industry mistake with obvious consequences; here are two references from a quick search: https://audit.wa.gov.au/reports-and-publications/reports/information-systems-audit-report-2018/audit-findings/ , https://digitalguardian.com/blog/uncovering-password-habits-are-users-password-security-habits-improving-infographic. This post does not come close to fixing this fundamental issue, but hopefully it provides a tool to help manage high-value secrets. Eg: one of these can be the unlocking key for your password manager 😉.
What are the fundamental problems to solve?
- The human brain is bad at storing many precise pieces of information.
- Provide a model that can be used to recover the secret in case of emergency (natural disaster, serious injury or even the death of the individual).
- Individuals must be able to decide exactly who can access the secret and under which circumstances.
- Being able to operate without relying on any third party. Third parties may break the interfaces over the years or even disappear. They can also limit access to the secret or even become evil and not trustworthy.
- If possible, it should be based on well-known mechanisms and algorithms which guarantee that the secret can be recovered after years if the specific tooling used to create it is lost or not functioning anymore in the future.
What is this not trying to achieve?
- A daily tool for password management. There are great password managers out there. You can use this approach to store base secrets which encrypt those password stores.
- A secret generator mechanism. This is just about managing those secrets; they must have been generated by another mechanism (a crypto wallet, a trusted random number generator, etc).
- A mechanism to store large size data in a secure manner. We expect the length of information to be stored to be no longer than a paragraph of text.
What properties do we want from the solution?
Beyond solving the fundamental problems we want to find a solution which has the following properties:
- We want to be able to hand-off some shares of the key to somewhat trusted individuals (eg: a close friend or relative) but we don’t want them to have access to the secret without our knowledge.
- In case of death we want those trusted individuals to be able to access the secret somehow.
- Our trust in given individuals may change over time, so we need to prepare for one or even several of them becoming adversaries or even colluding against us.
Proposed solution
Fortunately, we can stand on the shoulders of giants here and leverage mechanisms that are already invented and proven. We are going to use Shamir’s secret sharing scheme to split the secret into multiple pieces.
This scheme allows us to recover the original secret from any subset of pieces whose size reaches a given number configured when creating the shares. For example, we can decide to split the key into 3 pieces such that any two of them can recover the original secret. We will call the number of pieces needed to recover the original secret the “threshold”.
We can recover the secret using any 2 of them:
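To make the idea concrete, here is a minimal Python sketch of Shamir's scheme with 3 shares and a threshold of 2. It is an illustration only: it works over a prime field chosen for this example (`PRIME`), the function names `split` and `combine` are made up here, and the share format will not match what `ssss-rs` prints. Do not use it to protect real secrets.

```python
import secrets
from itertools import combinations

# Field modulus chosen for this sketch: 2**521 - 1 is a Mersenne prime,
# large enough to hold secrets of up to 65 bytes as a single integer.
PRIME = 2**521 - 1


def split(secret: int, threshold: int, shares: int) -> list[tuple[int, int]]:
    """Split `secret` into `shares` points; any `threshold` of them recover it."""
    if not 0 <= secret < PRIME:
        raise ValueError("secret does not fit in the field")
    if not 1 <= threshold <= shares:
        raise ValueError("need 1 <= threshold <= shares")
    # Random polynomial of degree threshold-1 whose constant term is the secret.
    coefficients = [secret] + [secrets.randbelow(PRIME) for _ in range(threshold - 1)]

    def evaluate(x: int) -> int:
        result = 0
        for coefficient in reversed(coefficients):  # Horner's rule
            result = (result * x + coefficient) % PRIME
        return result

    return [(x, evaluate(x)) for x in range(1, shares + 1)]


def combine(points: list[tuple[int, int]]) -> int:
    """Recover the secret by Lagrange interpolation at x = 0."""
    secret = 0
    for i, (xi, yi) in enumerate(points):
        numerator, denominator = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                numerator = (numerator * -xj) % PRIME
                denominator = (denominator * (xi - xj)) % PRIME
        secret = (secret + yi * numerator * pow(denominator, -1, PRIME)) % PRIME
    return secret


secret = secrets.randbits(256)  # stand-in for a real 256-bit key
pieces = split(secret, threshold=2, shares=3)
# Every pair of pieces reconstructs the secret.
assert all(combine(list(pair)) == secret for pair in combinations(pieces, 2))
```

With a threshold of 2 the polynomial is a straight line: one point says nothing about where the line crosses x = 0, while any two points pin it down.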
Selecting the right number of shares and threshold
The number of shares and the threshold we want to use depend greatly on the use case we want to fulfill. Given the properties we want to have, I’ll explain why I suggest splitting the secret into 4 shares and making it recoverable with a threshold of 2. That is, any 2 of the 4 shares will be able to recover the original secret.
Using a large number of shares can become hard to track and eventually lead to mistakes, so we will try to use the lowest number of shares that fulfills our requirements.
Let’s follow the very unscientific method of going through the different options and identifying their pitfalls:
Threshold 1
1 share: This is just what we’ve got.
2 shares or more: You could use this for backup purposes, but as long as you use a threshold of 1, you can’t give the share to a third party without giving them full control of the key. Increasing the number of shares will always suffer from the same issue.
Threshold 2
2 shares: When the number of shares is equal to the threshold we can’t afford to lose any share, so this approach doesn’t suit us.
3 shares: This would be a simple, easy to track solution where we can keep 2 shares in our full control and give the remaining one to 1 or more people we trust (by copying it). If we lose one of our shares, we can ask them for the other, recover the secret and create fresh shares again.
4 shares or more: You can generate more shares for resiliency, so you can afford to lose more of them before having to ask another person for the share under their control. This also allows you to store one of the extra shares in an escrow or a testament (* BIG ASTERISK) so the trusted party would be able to access the secret if something happens to you.
Threshold 3 and more
3 shares: Same situation as using 2 shares and a threshold of 2
4 shares and more: This allows for similar patterns as with a threshold of 2 and 3 shares. You may be able to build more elaborate trust models (eg: require that more than one trusted party is involved to access the secret if something happens to you), but it is also easy to increase complexity and make it hard to track all the possible scenarios. This added complexity can lead to combinations where others may access the key without you knowing, or where you cannot reach all the needed shares in a timely fashion. So I strongly recommend thinking twice before increasing the threshold and the number of shares.
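The options above can be compared on two simple numbers. The following Python sketch is only an illustration of that reasoning; the `Scheme` class and its property names are made up for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scheme:
    threshold: int
    shares: int

    @property
    def spare_shares(self) -> int:
        """Shares that can be lost while the secret stays recoverable."""
        return self.shares - self.threshold

    @property
    def one_share_is_the_secret(self) -> bool:
        """With a threshold of 1, handing over a share hands over the secret."""
        return self.threshold == 1


for threshold, shares in [(1, 1), (1, 2), (2, 2), (2, 3), (2, 4), (3, 3), (3, 4)]:
    scheme = Scheme(threshold, shares)
    print(
        f"threshold={threshold} shares={shares} "
        f"spare={scheme.spare_shares} "
        f"safe_to_hand_out={not scheme.one_share_is_the_secret}"
    )
```

A threshold of 2 with 4 shares is the smallest listed option that leaves two spare shares while still letting us hand a share to someone without handing over the secret.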
Evaluating the proposal
As mentioned before, my recommendation would be to use 4 shares and a threshold of 2.
Let’s go through the problems we want to solve and the properties we expect from the solution and see how well this approach fits.
- The human brain is bad at storing many precise pieces of information
- In this scheme we don’t rely on any information stored in a particular human brain other than the location of the shares and the instructions for recovery.
- Recovery instructions can be openly documented.
- The risk of losing track of where the shares are located can be mitigated by relying on several brains (giving a share to different individuals) or by storing that information in escrow documents or in vaults.
- Recover secret in case of emergency
- Having shares distributed among multiple individuals and geographies mitigates this risk.
- If we lose our house or wherever we store the shares under our control, we can ask the trusted individuals to hand over their share and combine it with the share stored in escrow or in the testament (* BIG ASTERISK). This would be an extreme case, as we should have taken precautions against losing both shares under our full control at the same time.
- In case of our death or permanent disability, the individuals to whom we have given a share can recover the secret by combining it with the share released by the escrow trustee or read from the testament (* BIG ASTERISK).
- Being able to decide exactly who can access the secret and under which circumstances
- We decide to whom we give one share of the secret.
- They won’t have access to the secret unless they get hold of a second share. This is where we decide under which circumstances they get access to it.
- Being able to operate without relying on any third party
- We don’t rely on any third-party to operate
- If we store one share in an escrow or in our testament (* BIG ASTERISK), we are relying on a third party to some extent, but this is optional: we can instead disclose one of the other shares under other conditions that remove any reliance on third parties (eg: store the share in a very personal belonging).
- Based on well-known mechanisms and algorithms which guarantee that the secret can be recovered after years
- Shamir’s secret sharing scheme is a well-known mathematical construction.
- I have also built this ISO image which contains all the tooling needed to create and reconstruct these keys. Even if amd64 architectures are deprecated in the future, chances are that the tooling will be migrated to new paradigms. Even if that is not the case, it is almost certain that there will be emulators available to run this code.
- If we enter an era without technology available, chances are that the secrets would not be very useful anymore.
- We don’t want individuals with shares of the key to have access to the secret without our knowledge
- This is the main reason to always give the same share to those different trusted individuals.
- If we give them different shares, they could eventually collude against us, put their shares together and then they would have access to the secret without our knowledge.
- In case of death we want those trusted individuals to be able to access the secret somehow
- We achieve this by setting the right conditions under which we make the other share available to trusted individuals.
- Our trust in given individuals may change over time, so we need to prepare for one or even several of them becoming adversaries or even colluding against us
- We can create additional copies of the share that we have given to initially trusted individuals and give it to other trusted parties and change the conditions that would give them access to the additional needed piece.
- If we suspect the original trusted individuals may have been able to eventually get access to the other shares, we would need to stop using the secret, generate a new secret and generate new shares. We don’t want to be in this scenario.
Discussing common questions
Should I protect the secret with a password?
Think twice before doing so: you risk losing the secret if you forget the password or suffer some harm that prevents you from recalling it.
In a following section, we elaborate on a related pattern that leverages key derivation mechanisms to provide additional features, such as generating additional keys from a passphrase.
Why not give different shares to different trusted individuals?
The main reason is to avoid distributing enough distinct shares among third parties for them to reconstruct the secret without our knowledge by pooling the shares they hold. We could safely distribute two different shares if we used a threshold of 3 with 4 or more shares, but this has some drawbacks. For one, it introduces complexity that can lead to mistakes. It also makes recovery harder, as each of those individuals would now need two additional shares instead of one, and what if one of them loses their share?
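The collusion argument can be checked mechanically: a group can rebuild the secret only if the distinct shares it holds reach the threshold. Here is a small Python sketch of that check; the holder names and share numbers are hypothetical and only mirror the 4-share, threshold-2 layout discussed in this post.

```python
def can_reconstruct(holdings: dict[str, set[int]], coalition: set[str], threshold: int) -> bool:
    """True if the distinct shares held by `coalition` reach the threshold."""
    pooled = set()
    for name in coalition:
        pooled |= holdings[name]
    return len(pooled) >= threshold


THRESHOLD = 2

# Recommended layout: both friends hold a copy of the same share (number 3).
same_share = {"me": {1, 2}, "alice": {3}, "bob": {3}, "escrow": {4}}
# Alternative layout: each friend holds a different share.
different_shares = {"me": {1, 2}, "alice": {3}, "bob": {4}}

print(can_reconstruct(same_share, {"alice", "bob"}, THRESHOLD))        # False
print(can_reconstruct(different_shares, {"alice", "bob"}, THRESHOLD))  # True
print(can_reconstruct(same_share, {"alice", "escrow"}, THRESHOLD))     # True
```

The last line is the intended recovery path: a friend's share plus the share released from escrow.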
Should I use steganography techniques?
According to Wikipedia, steganography is “the practice of concealing a message within another message or a physical object.”
In plain English, it is any technique to hide a secret within another message or object. For example, hiding the secret within a picture or within the pages of a book.
This can be a powerful technique to hide the shares from non-trusted parties. I think it can provide great value. The main risk of its usage would be losing the ability to recover the share from its hidden form.
Doesn’t this create a single point of failure for my digital life?
It does, and that is why you need to take this very seriously. You need to handle the shares responsibly and make sure they are stored offline in a hard, durable form (steel plates, engraved in jewelry, engraved in stone, etc).
You could create multiple keys and store them separately for different purposes. This would be a good way to reduce the impact of losing the shares or having them compromised. However, humans are not good at managing complexity and this can quickly become hard to manage. That could introduce greater risks than the one we are trying to mitigate. Be very conscious of this issue.
One big theme of this post is to recognize that we may want the secret to survive us in a way that trusted parties can use it if something happens to us or we become disabled. Creating too many other keys may prevent this from happening.
How to reduce the usage of the key and generate additional keys
We may rightly think that using the same key for multiple applications, knowing it is a key that we can’t easily change, is troublesome (ie: we’d need to regenerate the physical shares, distribute them, etc.). To solve this problem, I propose to use the key we are managing as a master key and then combine it with different passphrases and a key derivation algorithm (we will use scrypt) to generate different keys.
This brings great properties:
- The key derivation algorithm is deterministic. This means that using the same key and passphrase, you will get the same derived key
- If a derived key is compromised, you can keep the same master key, change the passphrase and get a new key that you can use without having to recreate and redistribute the master key shares
- You can share different sets of passphrases with different trusted individuals so they get access to different sets of derived keys once they are granted access to all the needed shares
- It also makes losing control of the master key shares less critical, as a holder with all required shares won’t be able to obtain the derived keys without the passphrases. You should still regenerate the master key if this happens
- The key derivation algorithm can generate keys of arbitrary length, so we can generate keys for multiple purposes
Following this model, we add an additional piece of information (the passphrase and the key derivation parameters) that is needed to recover the keys, and we need to manage it carefully. We need to make sure these passphrases and key derivation parameters are well protected against loss and unauthorized access.
The scrypt key derivation algorithm is recommended in this post mainly because the more modern Argon2, winner of the 2015 Password Hashing Competition, has multiple variants, which raises concerns around usability and ease of use.
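As a rough illustration of these properties, the following Python sketch uses the standard library's `hashlib.scrypt`, with the master key as the password and the passphrase as the salt, as done later in this post. The `derive_key` helper and the small cost parameters are assumptions made so the example runs quickly; the output is not expected to match what the `scrypt-rs` tool prints.

```python
import hashlib


def derive_key(master_key_hex: str, passphrase: str, length: int = 32) -> bytes:
    """Derive `length` bytes from the master key, using the passphrase as salt."""
    return hashlib.scrypt(
        master_key_hex.encode("utf-8"),
        salt=passphrase.encode("utf-8"),
        n=2**14,  # deliberately small cost so the sketch runs quickly
        r=8,
        p=1,
        maxmem=64 * 1024 * 1024,
        dklen=length,
    )


# Example entropy value used elsewhere in this post; never reuse it.
master = "2f580110c0304a888f79c1df638e593a8731a5c9ea64edaa332b39252d21e76e"

wallet_key = derive_key(master, "My phone wallet is cool")
# Deterministic: same master key and passphrase give the same derived key.
assert wallet_key == derive_key(master, "My phone wallet is cool")
# Rotation: a new passphrase gives an unrelated key from the same master key.
assert wallet_key != derive_key(master, "My phone wallet is cool v2")
# Arbitrary length: ask for 64 bytes instead of 32.
assert len(derive_key(master, "My phone wallet is cool", length=64)) == 64
```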
Keys hierarchy proposal
Now that we know how to generate secure derived keys, the following setup reduces the use of the master key as much as possible and also allows us to recover if the key we use to derive the day-to-day keys gets compromised.
We can call the ‘Master key’ a ‘Frozen key’. Once it is generated, it should never be needed again in a lifetime unless the ‘Cold key’ is compromised or we pass away and someone else reconstructs it from the different shares.
Having the concept of a ‘Cold key’ allows us to generate a new key if it gets compromised without having to update the hard-to-update ‘Frozen key’. However, we would still need to regenerate all the ‘app keys’, which is a tedious process, so the cold key is still a highly valuable key that we need to protect.
The app keys can be regenerated without having to regenerate the ‘cold key’, which allows for easier management of app keys.
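One possible reading of this hierarchy, sketched in Python: each level is derived from the one above with scrypt and a passphrase. The post does not spell out how the cold key is obtained, so the `derive` helper, the passphrases and the small cost parameters below are assumptions for illustration only.

```python
import hashlib


def derive(parent_hex: str, label: str) -> str:
    """Derive a child key (hex) from a parent key (hex) and a passphrase-like label."""
    child = hashlib.scrypt(
        parent_hex.encode("utf-8"),
        salt=label.encode("utf-8"),
        n=2**14, r=8, p=1,  # small cost so the sketch runs quickly
        maxmem=64 * 1024 * 1024,
        dklen=32,
    )
    return child.hex()


# Frozen key: reconstructed from the shares only in exceptional situations.
frozen_key = "2f580110c0304a888f79c1df638e593a8731a5c9ea64edaa332b39252d21e76e"

# Cold key: derived once from the frozen key, then used to derive app keys.
cold_key = derive(frozen_key, "cold key, generation 1")
app_keys = {app: derive(cold_key, app) for app in ("password manager", "phone wallet")}

# If the cold key leaks, a new passphrase gives a new cold key (and new app
# keys) while the frozen key and its physical shares stay untouched.
new_cold_key = derive(frozen_key, "cold key, generation 2")
new_app_keys = {app: derive(new_cold_key, app) for app in app_keys}
```

Replacing a single app key only requires a new label at the lowest level; the cold key and the frozen key are not involved.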
Storing the crypto wallets seed words
Up to this point, this post has treated secrets in a generic way; I have made no assumptions about what that secret is.
Given that these days crypto wallets are one of the important use-cases that requires us to store high-value secrets, I want to address it here as well.
The de-facto standard to represent the private key is the BIP39 seed phrase, so this is going to be our target piece of information to store. The process to store these keys would be updated as follows:
You might wonder: “Can’t we just store the words list?”
The words list has some great properties, such as a checksum and, because it is made of actual dictionary words, a degree of human-driven error correction. However, let’s see what happens when we apply Shamir’s secret sharing scheme to a 24 words list using the ssss-rs tool. This would be the sequence of commands:
$ ssss-rs split -t2 -s4 -i "consider scale during lesson again dust digital deal term broccoli north insect indoor spray chicken erode unit face skull inch fan movie oval staff"
010ab06f828319abcd4be461a591f2ff2e5bdaad7f0f66ed6ca778007c5115f5a32fd9aaa186b9edee92e0315a1a8c0b7fa02b0bf895c1349d72527d88c86a1c6be6d740ff88660e1904824083ec5b23b93477022aba6012d9ed570fd8b822598a79ec20ef34c388933c1928333bf6da48d98b88e39654c945dbe5c58c8743755f883bc7b0f5e0870583dcaa8fe05320f2f50633c6
02b1ca6c8aa69ee217f64667f28d5085f02939fa4cb7ac7577c065b14ac28958fee51b2ff588fc5da79360cb0fa8a0a29ef7f9b55f5105c7b753c45c9d3a719d67630ee057ba5a808a68a432886c13da09d35ca8e5de56443c5138bd200be10ab4577eef7708329d8cd49d30f9c44c33f003aeae72573d34151965f1b8a72352dea1d5271b466a8fb1b2c3fe9f7812206a6dafcc3d
03d8156d7b4ce32ca89dd1653670c75aba07913e5ddfeaf47e146ede58b3fdca3ca3aca5307b36c46965e99d3cc64dc5c133b7dfcbe4b096584cb643679d78e263e9b080c65d4efafb4c4f1c78e52b8d908e45cea00b447696cc1dd38193a03b574df9aaf61c9467708ce138bf91d39d98bc4445f4e11a9625aeec145d4e034fa14f8f8e8bdee57edd543f3b66f92d20ebecc8999d
04dc3e6a9aec8b70b897196b5cb50f7157cde4542adc235e410e5fc826ffaa19446a843e5d94762635917b24a5d7f8eb475946d20ac2963ae311f31eb7c547847f72a7bb1cde2287b7b0e8d69e77833372060ae760163ae8ed32e6c2cb767cacc80b416a5c70cbb7b21f8e00762123fa9bace4e24bceefd5b5867e99d0e7e31cc7f312fc563b659fc2d0fd56bf5390204146e629d0
Looking at one of the shares (share 1):
010ab06f828319abcd4be461a591f2ff2e5bdaad7f0f66ed6ca778007c5115f5a32fd9aaa186b9edee92e0315a1a8c0b7fa02b0bf895c1349d72527d88c86a1c6be6d740ff88660e1904824083ec5b23b93477022aba6012d9ed570fd8b822598a79ec20ef34c388933c1928333bf6da48d98b88e39654c945dbe5c58c8743755f883bc7b0f5e0870583dcaa8fe05320f2f50633c6
For example, if we wanted to engrave these shares in steel, we would need to engrave around 300 characters per share. That is a lot of work for a manual process, and we want a manual process to reduce the risk of exposing the key to vulnerable technology systems.
How could we solve this problem?
As mentioned, the words list is a representation of an underlying secret, usually referred to as the “entropy”. For a list of 24 words, the entropy is 256 bits long, or 64 characters in its hexadecimal representation. For this example it would be:
2f580110c0304a888f79c1df638e593a8731a5c9ea64edaa332b39252d21e76e
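For readers curious about how words and entropy relate: under BIP39 a 24 words list encodes 24 × 11 = 264 bits, which is the 256-bit entropy followed by an 8-bit checksum taken from its SHA-256 hash. The Python sketch below converts between the entropy and the 11-bit word indices. The function names are made up for this example, and it stops at indices because turning them into words needs the 2048-word BIP39 list, which is not embedded here.

```python
import hashlib


def entropy_to_indices(entropy_hex: str) -> list[int]:
    """Map BIP39 entropy (128 to 256 bits) to 11-bit word indices (0..2047)."""
    entropy = bytes.fromhex(entropy_hex)
    entropy_bits = len(entropy) * 8
    checksum_bits = entropy_bits // 32  # 8 for 256-bit entropy
    checksum = hashlib.sha256(entropy).digest()[0] >> (8 - checksum_bits)
    value = (int.from_bytes(entropy, "big") << checksum_bits) | checksum
    word_count = (entropy_bits + checksum_bits) // 11
    return [(value >> (11 * (word_count - 1 - i))) & 0x7FF for i in range(word_count)]


def indices_to_entropy(indices: list[int]) -> str:
    """Inverse mapping; raises ValueError if the embedded checksum is wrong."""
    total_bits = len(indices) * 11
    checksum_bits = total_bits // 33
    value = 0
    for index in indices:
        value = (value << 11) | index
    entropy = (value >> checksum_bits).to_bytes((total_bits - checksum_bits) // 8, "big")
    expected = hashlib.sha256(entropy).digest()[0] >> (8 - checksum_bits)
    if value & ((1 << checksum_bits) - 1) != expected:
        raise ValueError("checksum mismatch: a word is wrong or out of order")
    return entropy.hex()


entropy_hex = "2f580110c0304a888f79c1df638e593a8731a5c9ea64edaa332b39252d21e76e"
indices = entropy_to_indices(entropy_hex)
assert len(indices) == 24
assert indices_to_entropy(indices) == entropy_hex
```

The checksum is the error detection that is lost once only the entropy is stored.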
We can split it into much shorter shares:
$ ssss-rs split -t2 -s4 -i 2f580110c0304a888f79c1df638e593a8731a5c9ea64edaa332b39252d21e76e
01ebc0feff3808c88b50f41f562a1d58b31d19fe9d435cd65e09c6c80c099f07d46b2fc534c9fb2fa3b10aacfb54f557e4d0175022e81fa5e7802c4a790aab0da5
029b31b8ad2043d85d05a36bfc0899f8357298be6a23eb1b1648c2c3b74d6e5b109e07c43b2ab2fb16d6b719b1075d0d70ee7bf6e29e75078a4df4c2a1bb1440fe
034297736a287a21e63667479a16e598be57e777ce0386a92e773733de71c86fa5cd1f323e827cb78c02dc837e36cc3bf50d5f94a245539058ffbcbae9d4887b3e
047bc8340910d5f8eaaf0d83b34c8aa322ac813e9fe39e9a86cacad5dac597e3836f57c625f720486718d66825a116b94392a3a17972a15850cc5fc90ac271da48
That is 130 characters per share, which is a great improvement and a lot of work saved. In addition, since we humans are prone to mistakes, shorter shares leave fewer opportunities for errors when transcribing and reading them.
It is true that when we use the entropy value we lose the error correction capabilities of the seed words, so it is very important to validate that the values we have stored in the shares are correct by performing the recovery process at share creation time. One of the benefits of having multiple shares and copies of them is that if one share becomes unreadable we can still use the others. An additional measure to protect against errors when recovering the entropy value is to record a shortened hash alongside the shares:
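The share sizes follow from simple arithmetic. Judging from the example output, each share looks like one index byte followed by one byte per byte of input, printed in hexadecimal. That layout is an inference from the examples, not a documented property of `ssss-rs`. Under that assumption, a short Python check:

```python
def share_hex_length(secret_text: str) -> int:
    """Hex characters per share, assuming 1 index byte + 1 byte per input byte."""
    return 2 * (1 + len(secret_text.encode("utf-8")))


words = (
    "consider scale during lesson again dust digital deal term broccoli north "
    "insect indoor spray chicken erode unit face skull inch fan movie oval staff"
)
entropy_hex = "2f580110c0304a888f79c1df638e593a8731a5c9ea64edaa332b39252d21e76e"

print(share_hex_length(words))        # 298
print(share_hex_length(entropy_hex))  # 130
```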
$ echo "2f580110c0304a888f79c1df638e593a8731a5c9ea64edaa332b39252d21e76e" | sha256sum | cut -c-6
7618e2
This hash value (ie: 7618e2 in the example) allows us to validate that the value returned by ssss-rs combine is the original entropy value. Note that if we made a mistake entering the shares, ssss-rs combine will still return a result. Chances are that this value is in a binary form which would give us a clue that something was off, but the hash provides us with much more certainty.
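The same check can be scripted. This Python sketch mirrors the `echo ... | sha256sum | cut -c-6` pipeline, including the newline that `echo` appends; the helper names are made up for this example.

```python
import hashlib


def short_checksum(value: str, length: int = 6) -> str:
    """First `length` hex characters of SHA-256 over the value plus a newline."""
    # `echo` appends a newline, and sha256sum hashes it along with the value.
    return hashlib.sha256((value + "\n").encode("utf-8")).hexdigest()[:length]


def matches_record(recovered_value: str, recorded_checksum: str) -> bool:
    """Compare a recovered value against the checksum written down earlier."""
    return short_checksum(recovered_value) == recorded_checksum.strip().lower()


entropy_hex = "2f580110c0304a888f79c1df638e593a8731a5c9ea64edaa332b39252d21e76e"
recorded = short_checksum(entropy_hex)  # written down when creating the shares
print(matches_record(entropy_hex, recorded))  # True
```

Six hexadecimal characters are only 24 bits, so this guards against transcription mistakes, not against deliberate tampering.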
Commands to create shares from words list
This section summarizes the previous points as a list of commands that you can run to generate the shares from the words list. We will use the hal command to obtain the entropy value from the words list and to obtain the words list from the entropy:
$ hal bip39 get-seed "your BIP words" | jq -r .entropy
<entropy value>
$ echo "<entropy value>" | sha256sum | cut -c-6
<truncated sha256sum>
$ ssss-rs split -t2 -s4 -i <entropy value>
<shares list>
You can recover the original words list with the following commands:
$ ssss-rs combine <share x> <share y>
Recovered key: <recovered entropy>
Recovered key in base64: <base64 encoded entropy value>
BIP39 words list representation: <entropy words list representation>
$ echo "<entropy value from previous command>" | sha256sum | cut -c-6
<truncated sha256sum>
$ hal bip39 generate -w <number of words> --entropy <entropy value>
<words list>
During the process, it is important to validate that the <truncated sha256sum> value is the same when generating the shares and at the time of recovering the words list.
Let’s not use the master key
In the previous sections we have stored and loaded a BIP39 seed assuming it was the master key. The problem with this approach is that we are not protecting our wallet with a passphrase and we are linking our wallet to the master key: if we happen to expose the wallet seed, we have to regenerate the shares, and any other use of the key is affected.
To mitigate this issue, I propose reversing the flow in some way. Good hardware wallet makers put a lot of effort into generating good random keys, so it is a good idea to keep using them to generate the random key. However, after that, I propose not using that key as a wallet key at all, and using a key derived from it and a passphrase instead.
It is very important that you store the passphrases and key derivation parameters in a safe place as without them, you won’t be able to obtain the derived key.
To complete the toolbox, I have wrapped the Rust crypto scrypt library in a simple CLI tool that we can use to generate derived keys. Due to the nature of scrypt, generating the derived key with the default parameters may take a while.
To generate a derived key, we pass the master key as input to the scrypt-rs command and set the passphrase as the salt. We also need to set the key derivation parameters according to the type of key we want to generate. For example, if we want to generate a key for a new crypto wallet that we will keep on our phone, we could use the following command (remember to check that the master key is correct by checking it against the hash we stored when generating the shares):
$ echo 2f580110c0304a888f79c1df638e593a8731a5c9ea64edaa332b39252d21e76e | scrypt-rs -l32 -s "My phone wallet is cool"
Input | Salt: "My phone wallet is cool"
Input | Normalized passphrase: "2f580110c0304a888f79c1df638e593a8731a5c9ea64edaa332b39252d21e76e"
Input | Scrypt parameters: cost factor 19 - blocksize 8 - parallelization 2 - key length in bytes 32
Output| Scrypt derived key in hexadecimal: f3ebf69d0d70315717b746f05b1a2ab7c9686a79a7f5494aedbcb3c834750949
Output| Scrypt derived key in base64: 8+v2nQ1wMVcXt0bwWxoqt8loanmn9UlK7byzyDR1CUk=
Output| Scrypt BIP39 words list representation: view garden point brain adapt process gain trip utility sugar melt hurdle notable cry track wrong enable first hundred guide local dentist cement cargo
I also suggest keeping a record of the hash of the master key, passphrase and key derivation parameters combined. Eg:
$ echo "2f580110c0304a888f79c1df638e593a8731a5c9ea64edaa332b39252d21e76e|19|8|2|32|My phone wallet is cool" | sha256sum | cut -c-6
3dc63d
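A small Python sketch of the same computation may help keep the field order consistent. It hashes the master key together with the parameters and passphrase, in the same order as the command above and with the newline that `echo` appends, and returns a line that leaves the master key out so it can be written down. The `derivation_record` name is made up for this example.

```python
import hashlib


def derivation_record(
    master_key_hex: str,
    cost_factor: int,
    block_size: int,
    parallelism: int,
    key_length: int,
    passphrase: str,
) -> str:
    """Return '"<parameters>|<passphrase>": <short hash>' without the master key."""
    parameters = f"{cost_factor}|{block_size}|{parallelism}|{key_length}|{passphrase}"
    # `echo` appends a newline, so it is included in the hashed text.
    hashed_text = f"{master_key_hex}|{parameters}\n"
    digest = hashlib.sha256(hashed_text.encode("utf-8")).hexdigest()
    return f'"{parameters}": {digest[:6]}'


master = "2f580110c0304a888f79c1df638e593a8731a5c9ea64edaa332b39252d21e76e"
record = derivation_record(master, 19, 8, 2, 32, "My phone wallet is cool")
print(record)
```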
Then we could keep a record of these results without the master key in a confidential notebook, file, steel plate, etc. Treat it as confidential and keep multiple copies in different locations:
"<cost factor>|<block size>|<parallelism>|<key size>|<passphrase>": <short sha256 hash>
"19|8|2|32|My phone wallet is cool": 3dc63d
Wrapping up
I hope this post helps people store high-value, long-lived keys in a way that is resilient and that scales to the many applications, each backed by a long-living secret, that we are expected to use during our lifetime.
As mentioned in the post, all the tooling proposed here is provided in the ISO I release in this project.
I am more than happy to discuss any ideas or concerns around this proposal in the comments (bear with me if I take some time to approve the comments).
* BIG ASTERISK: Beware that in many jurisdictions testaments are public, therefore you would need to assume that share is going to be exposed. In these situations this is not a good idea.
UPDATES:
- 2021/04/22
- Replace usage of ssss-split and ssss-combine commands with ssss-rs to enable further developments
- Replace the term part with share as it is a more commonly used term
- 2021/04/24
- Add note to warn about those jurisdictions where testaments are public records
- 2022/05/09
- Add paragraph describing hierarchical keys scheme