BIG DISCLAIMER: This is a post from a software engineer who is interested in cryptocurrencies but is not an expert. I am not a cryptographer, and the following advice may be inaccurate or even wrong in some important way. That said, it is written with the best of intentions and to the best of my knowledge. Comments, suggestions and corrections are welcome.
I expect to update this post with new learnings and with the results of conversations with the community.
Cryptocurrencies and other security-oriented systems (PGP key management, SSH keys, SQRL, the KeePassXC password manager, etc.) rely on some sort of secret that users must keep safe and out of the reach of others as the foundation of their security model. As our world becomes increasingly digital, these keys are becoming more and more valuable, and we need to make sure we manage them responsibly.
Over the years, our industry has pushed that responsibility onto users, asked them to use complex secrets and then asked them to change those secrets frequently. This has been a huge industry mistake with obvious consequences. Here are two references from a quick search: https://audit.wa.gov.au/reports-and-publications/reports/information-systems-audit-report-2018/audit-findings/ and https://digitalguardian.com/blog/uncovering-password-habits-are-users-password-security-habits-improving-infographic. This post does not come close to fixing this fundamental issue, but hopefully it provides a tool that helps manage high-value secrets. Eg: one of these can be the unlocking key for your password manager 😉.
What are the fundamental problems to solve?
- The human brain is bad at storing many precise pieces of information.
- Provide a model that can be used to recover the secret in case of emergency (natural disaster, serious injury or even the death of the individual).
- Individuals must be able to decide exactly who can access the secret and under which circumstances.
- Being able to operate without relying on any third party. Third parties may break their interfaces over the years or even disappear. They can also limit access to the secret or even become malicious and untrustworthy.
- If possible, it should be based on well-known mechanisms and algorithms, so that the secret can still be recovered years later even if the specific tooling used to create it is lost or no longer works.
What is this not trying to achieve?
- A daily tool for password management. There are great password managers out there. You can use this approach to store base secrets which encrypt those password stores.
- A secret generator mechanism. This is just about managing those secrets; they must have been generated by some other mechanism (a crypto wallet, a trusted random number generator, etc.).
- A mechanism to store large amounts of data in a secure manner. We expect the information to be stored to be no longer than a paragraph of text.
What properties do we want from the solution?
Beyond solving the fundamental problems, we want to find a solution which has the following properties:
- We want to be able to hand off some parts of the key to somewhat trusted individuals (eg: a close friend or relative) but we don’t want them to have access to the secret without our knowledge.
- In case of death we want those trusted individuals to be able to access the secret somehow.
- Our trust in given individuals may change over time, so we need to prepare for one or even several of them becoming adversaries or colluding against us.
Fortunately, we can stand on the shoulders of giants here and leverage mechanisms that are already invented and proven. We are going to use Shamir’s secret sharing scheme to split the secret into multiple pieces.
This scheme allows us to recover the original secret from any subset of pieces whose size reaches a given number, which is configured when creating the parts. For example, we can decide to split the key into 3 pieces so that any 2 of them are able to recover the original secret. We will call the number of pieces needed to recover the original secret the “threshold”.
With 3 pieces and a threshold of 2, we can recover the secret using any 2 of them, while a single piece on its own reveals nothing about the secret.
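To make the threshold idea concrete, here is a minimal Python sketch of a Shamir-style split and recovery. It is an illustration under stated assumptions, not the implementation used by ssss: the prime field, the integer encoding of the secret and the function names split_secret and recover_secret are all choices made for this example, and the shares it produces are not compatible with any real tool.

```python
import itertools
import secrets

# Assumption: a Mersenne prime chosen for illustration; real tools such as
# ssss use a different field and share encoding.
PRIME = 2**127 - 1


def split_secret(secret, parts, threshold):
    """Return `parts` shares (x, y); any `threshold` of them recover `secret`."""
    if not 0 <= secret < PRIME:
        raise ValueError("secret does not fit in the field")
    # Random polynomial of degree threshold - 1 with the secret as constant term.
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(threshold - 1)]
    shares = []
    for x in range(1, parts + 1):
        y = 0
        for coeff in reversed(coeffs):  # Horner evaluation modulo PRIME
            y = (y * x + coeff) % PRIME
        shares.append((x, y))
    return shares


def recover_secret(shares):
    """Lagrange-interpolate the polynomial at x = 0 to get the secret back."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret


secret = int.from_bytes(b"my secret", "big")
shares = split_secret(secret, parts=3, threshold=2)

# Any 2 of the 3 shares give the secret back.
for pair in itertools.combinations(shares, 2):
    assert recover_secret(list(pair)) == secret
```

With a threshold of 2 the polynomial is a straight line: one point leaves every possible constant term equally plausible, while two points pin it down.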
Selecting the right number of parts and threshold
The number of pieces and the threshold we want to use depend greatly on the use-case we want to fulfill. Given the properties we want to have, I’ll explain why I suggest splitting the secret into 4 parts and making it recoverable with a threshold of 2. That is, any 2 of the 4 pieces will be able to recover the original secret.
A large number of parts can become hard to track and eventually lead to mistakes, so we will try to use the lowest number of parts that fulfills our requirements.
Let’s follow the very unscientific method of going through the different options and identifying their pitfalls:
Threshold 1
1 part: This is just what we’ve got today: a single copy of the secret.
2 parts or more: You could use this for backup purposes, but with a threshold of 1 every part is effectively the whole key, so you can’t give a part to a third party without giving them full control of the key. Adding more parts never fixes this.
Threshold 2
2 parts: When the number of parts is equal to the threshold we can’t afford to lose any part, so this approach doesn’t suit us.
3 parts: This would be a simple, easy to track solution where we keep 2 parts under our full control and give the remaining one to 1 or more people we trust (by copying it). If we lose one of our parts, we can ask them for theirs, recover the secret and create fresh parts again.
4 parts or more: You can generate more parts for resiliency purposes, so you can afford to lose more of them before having to ask another person for the part under their control. This also allows us to store one of the extra parts in escrow or in a testament, so the trusted party would be able to access the secret if something happens to you.
Threshold 3 and more
3 parts: Same situation as 2 parts with a threshold of 2: we can’t afford to lose any part.
4 parts or more: This allows for patterns similar to those of a threshold of 2 with 3 parts. You may be able to build more elaborate trust models (eg: make sure that more than one trusted party must cooperate to access the secret if something happens to you), but it is also easy to increase complexity and make it hard to track all the possible scenarios. This added complexity can lead to combinations where others may access the key without you knowing, or where you can’t get hold of all the needed parts in a timely fashion. So I strongly recommend thinking twice before increasing the threshold and the number of parts.
Evaluating the proposal
As mentioned before, my recommendation would be to use 4 parts and a threshold of 2.
Let’s go through the problems we want to solve and the properties we expect from the solution and see how well this approach fits.
- The human brain is bad at storing many precise pieces of information
  - In this scheme we don’t rely on any information stored in a particular human brain other than the location of the parts and the instructions for recovery.
  - Recovery instructions can be openly documented.
  - The risk of losing the information about where the parts are can be mitigated by relying on several brains (giving a part to different individuals) or by storing it in escrow documents or in vaults.
- Recover the secret in case of emergency
  - Having parts distributed among multiple individuals and geographies mitigates this risk.
  - If we happen to lose our house or the storage of the parts under our control, we can ask the trusted individuals to hand over their part to us and combine it with the part stored in escrow or in the testament. This would be the extreme case, as we should have taken precautions to avoid losing the 2 parts under our full control at the same time.
  - In case of our death or permanent disability, the individuals to whom we have given a part can access the secret by combining it with the missing part, unlocked by the escrow trustee or when the testament is read.
- Being able to decide exactly who can access the secret and under which circumstances
  - We decide whom we give a part of the secret to.
  - They won’t have access to the secret unless they get hold of a second part. This is where we decide under which circumstances they get access to it.
- Being able to operate without relying on any third party
  - We don’t rely on any third party to operate.
  - If we store one part in escrow or in our testament we do rely on a third party to some extent, but this is optional: we can instead disclose one of the other parts under conditions that remove any reliance on third parties (eg: store the part in a very personal belonging).
- Based on well-known mechanisms and algorithms which guarantee that the secret can be recovered after years
  - Shamir’s secret sharing scheme is a well-known mathematical construction.
  - I have also built this ISO image which contains all the tooling needed to create and reconstruct these keys. Even if the amd64 architecture is deprecated in the future, chances are that the tooling will be migrated to new platforms. Even if that is not the case, it is almost certain that there will be emulators available to run this code.
  - If we enter an era without technology available, chances are that the secrets will not be very useful anymore.
- We don’t want individuals with parts of the key to have access to the secret without our knowledge
  - This is the main reason to always give the same part to those different trusted individuals.
  - If we gave them different parts, they could eventually collude against us, put their parts together and get access to the secret without our knowledge.
- In case of death we want those trusted individuals to be able to access the secret somehow
  - We achieve this by setting the right conditions under which the other part becomes available to the trusted individuals.
- Our trust in given individuals may change over time, so we need to prepare for one or even several of them becoming adversaries or colluding against us
  - We can create additional copies of the part that we gave to the initially trusted individuals, give them to other trusted parties and change the conditions that would give access to the additional piece needed.
  - If we suspect the original trusted individuals may have been able to get access to the other parts, we need to stop using the secret, generate a new secret and generate new parts. We don’t want to be in this scenario.
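The collusion argument in the list above can be checked mechanically. The following Python sketch models who holds which part and tests whether a group of holders reaches the threshold. The holder names, part labels and the can_recover helper are hypothetical and exist only for this illustration.

```python
THRESHOLD = 2

# Recommended layout: two parts under our control, the same part copied to
# every trusted individual, and one part released by escrow or testament.
recommended = {
    "me": {"part1", "part2"},
    "friend_a": {"part3"},
    "friend_b": {"part3"},
    "escrow": {"part4"},
}

# Alternative layout where each trusted individual receives a different part.
different_parts = {
    "me": {"part1", "part2"},
    "friend_a": {"part3"},
    "friend_b": {"part4"},
}


def can_recover(holders, layout, threshold=THRESHOLD):
    """True if the holders together own at least `threshold` distinct parts."""
    distinct = set()
    for holder in holders:
        distinct |= layout[holder]
    return len(distinct) >= threshold


print(can_recover(["me"], recommended))                        # True
print(can_recover(["friend_a", "friend_b"], recommended))      # False
print(can_recover(["friend_a", "escrow"], recommended))        # True
print(can_recover(["friend_a", "friend_b"], different_parts))  # True
```

The second and fourth lines show the difference: copies of the same part never add up to the threshold, while two different parts do.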
Discussing common questions
Should I protect the secret with a password?
You should think twice, as this puts the secret at risk of being lost if you are not able to remember the password, for example after suffering an injury.
In a following section, we elaborate on a related pattern that leverages key derivation mechanisms to provide additional features, such as generating additional keys from a passphrase.
Why not give different parts to different trusted individuals?
The main reason is to avoid distributing enough parts among third parties that they are eventually able to reconstruct the secret, using the different parts they own, without our knowledge. We could safely distribute two different parts if we used a threshold of 3 with 4 or more pieces, but this has some drawbacks. For one, it introduces complexity that can lead to mistakes. It also makes the recovery scheme harder, as those individuals would now require two additional parts instead of one, so what if one of them loses their piece?
Should I use steganography techniques?
According to Wikipedia, steganography is:
“the practice of concealing a message within another message or a physical object.” (Wikipedia)
In plain English, it is any technique to hide a secret within another message or object. For example, hiding the secret within a picture or within the pages of a book.
This can be a powerful technique to hide the parts from non-trusted parties, and I think it can provide great value. The main risk of using it is losing the ability to recover the part from its hidden form.
Doesn’t this create a single point of failure for my digital life?
It does, and that is why you need to take this very seriously. You need to handle the parts responsibly and make sure they are stored in an offline, hard and resistant form (steel pieces, engraved in jewels, engraved in stone, etc.).
You could create multiple keys and store them separately for different purposes; this would be a good way to reduce the impact of losing the parts or having them compromised. However, humans are not good at managing complexity and this can quickly become hard to manage, which could introduce greater risks than the one we are trying to mitigate. Be very conscious of this issue.
One big theme of this post is to recognize that we may want the secret to survive us, in a way that trusted parties can use it if something happens to us or we become disabled. Creating too many other keys may prevent this from happening.
How to reduce the usage of the key and generate additional keys
We may rightly think that using the same key for multiple applications, knowing that it is a key we can’t easily change, is troublesome (ie: we would need to regenerate the physical parts, distribute them, etc.). To solve this problem, I propose to use the key we are managing as a master key and then combine it with different passphrases and a key derivation algorithm (we will use scrypt) to generate different keys.
This brings great properties:
- The key derivation algorithm is deterministic. This means that using the same key, passphrase and parameters, you will always get the same derived key.
- If a derived key is compromised, you can keep the same master key, change the passphrase and get a new key that you can use without having to recreate and redistribute the master key parts.
- You can share different sets of passphrases with different trusted individuals, so they get access to different sets of derived keys when they are granted access to all the needed parts.
- It also makes losing control over the master key parts less critical: without the passphrases, a holder with all the required parts won’t be able to obtain the derived keys. You should still regenerate the master key if this happens.
- The key derivation algorithm can generate keys of arbitrary length, so we can generate keys for multiple purposes.
Following this model, we are adding an extra piece of information (the passphrase and the key derivation parameters) that is needed to recover the keys, and we need to manage it carefully. We need to make sure these passphrases and key derivation parameters are well protected against loss and unauthorized access.
The scrypt key derivation algorithm is recommended in this post mainly because the more modern Argon2, winner of the 2015 Password Hashing Competition, comes in multiple variants, which raises concerns around usability and ease of use.
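As a rough illustration of these properties, the sketch below derives keys with scrypt using Python’s standard hashlib module. It rests on a few assumptions: the master key is passed as the scrypt password and the passphrase as the salt, the master key shown is a made-up placeholder, the derive_key helper is hypothetical, and the cost parameters are deliberately low so the example runs quickly. Its output is not claimed to match any particular command-line tool.

```python
import hashlib


def derive_key(master_key_hex, passphrase, cost_factor=14, block_size=8,
               parallelism=2, key_len=32):
    """Derive `key_len` bytes from the master key and a passphrase.

    Assumption: master key as scrypt password, passphrase as salt.
    cost_factor is log2 of the scrypt N parameter (illustrative value).
    """
    return hashlib.scrypt(
        master_key_hex.encode("utf-8"),
        salt=passphrase.encode("utf-8"),
        n=2 ** cost_factor,
        r=block_size,
        p=parallelism,
        maxmem=64 * 1024 * 1024,  # generous limit for the chosen parameters
        dklen=key_len,
    )


# Placeholder master key, for illustration only.
master = "00" * 32

phone = derive_key(master, "My phone wallet is cool")
phone_again = derive_key(master, "My phone wallet is cool")
laptop = derive_key(master, "My laptop wallet is cool")

assert phone == phone_again   # deterministic: same inputs, same key
assert phone != laptop        # a new passphrase yields an unrelated key
print(phone.hex())
```

Rotating a compromised derived key then only means choosing a new passphrase; the master key and its parts stay untouched.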
Storing crypto wallet seed words
Up to this point, this post has treated secrets in a generic way; I have made no assumptions about what the secret is.
Given that crypto wallets are these days one of the important use-cases that require us to store high-value secrets, I want to address them here as well.
The de-facto standard to represent the private key is the BIP39 seed phrase, so this is going to be our target piece of information to store. The process to store these keys would be updated as follows:
You might wonder: “Can’t we just store the words list?”
The words list has some great properties: it includes a checksum, and because it is made of actual dictionary words it also provides a degree of human-driven error correction. However, let’s see what happens when we try to apply Shamir’s secret sharing scheme to a 24 words list using ssss.
First, we need to split the words list in two, as ssss-split can split a maximum of 128 ASCII characters at a time.
This would be the sequence of commands:
    $ cat words_list.txt
    consider scale during lesson again dust digital deal term broccoli north insect indoor spray chicken erode unit face skull inch fan movie oval staff
    $ split -n2 words_list.txt
    $ ls
    words_list.txt xaa xab
    $ cat xaa
    consider scale during lesson again dust digital deal term broccoli north i
    $ cat xab
    nsect indoor spray chicken erode unit face skull inch fan movie oval staff
    $ cat xaa | ssss-split -n4 -t2
    Generating shares using a (2,4) scheme with dynamic security level.
    Enter the secret, at most 128 ASCII characters: Using a 592 bit security level.
    1-3d1e7db34a985f9a4774f234545ba0c0406da40716a7f9c9526ecb912ace748a0f4f8012437c64af8d65483a32e51f82781ad8a3d0779bb0df4642d5776e48108c422f0f3f7e1df0df84
    2-942785a66d9a5d40dd5221a8dbffd391cd79c10aa133bbf8886be6675b4b0867fcfcce27e3af0773ad152765c45553b8e73fd242ce7020ce1256aaf096ab8b7e9023e4e32f0c7b423655
    3-f330d255709ba30954b06f235e9c02a1498a1df1cc407a17c19702ca8bc823c3526df43483e1d9c7b2c50250963a97ae92232be23b8d49e456a6f2ec361735a49bfca24720dda6d39118
    4-c654758c239e58f5e91f8691c4b73532d7510b11ce1b3f9b3c61bd8bb841f1bc1b9a524ca209c0cbedf5f9da2935cbcdd975c780f27f563388777abb55200da2a8e0733b0fe8b627c5a0
    $ cat xab | ssss-split -n4 -t2
    Generating shares using a (2,4) scheme with dynamic security level.
    Enter the secret, at most 128 ASCII characters: Using a 592 bit security level.
    1-d23be382d0c769244f5d9949bf667d319104308c88d595b9373d076005600b49fb493b8fe63c48c58bcd47bba02f99f52452eba125add3d0bc9824a4fa6cd358000b21c4dfd0e67df228
    2-725e29b91f0564d6dd2be35f7ddd1f8bf9258912fd4ed31522c222437f1d458d992899d6b5852a56cbe499c5bc13a654e833e87010c9f13f124196e243d8b602f0d2df2118a1badb2b9c
    3-ed8290505a449f8753063552c3b43e1ddec51e672e3811712e68c15da9367fce470807e184edf427f403d3efb7f84ccbac1316c0fc15ef65880907202b4b6acb5f658a825a7171468335
    4-3295bdce80817f33f9c71772f8abdaff2966fa2e16785e4d093c68058be7d8055debdd6412f7ef704bb72539846bd91770f1efd27a01b4e04ff2f26f30b07cb7116122ea96430396b8a3
Looking at one of the parts (part 1), it would comprise two strings: the share numbered 1 from each of the two ssss-split runs above.
For example, if we wanted to engrave this key in steel, we would need to engrave around 300 characters per part. That is a lot of work for a manual process, and we want the process to be manual to reduce the risk of exposing the key to vulnerable technology systems.
How could we solve this problem?
As we have mentioned, the words list is a representation of an underlying secret, usually referred to as the “entropy”. For a list of 24 words, the entropy is 256 bits long, which is 64 characters in hexadecimal representation. For this example it would be:
    2f580110c0304a888f79c1df638e593a8731a5c9ea64edaa332b39252d21e76e
Which we can split into much shorter parts:
    $ echo "2f580110c0304a888f79c1df638e593a8731a5c9ea64edaa332b39252d21e76e" | ssss-split -n4 -t2
    Generating shares using a (2,4) scheme with dynamic security level.
    Enter the secret, at most 128 ASCII characters: Using a 512 bit security level.
    1-f52d46629f1ee483a921de79e7d2fb571d72b4d011176dbbe1d70e608be7e5bf1abec14c315de86682910125c1bc5e7c9dbc55e21bfb0d4462119d6195a060a3
    2-1df2456964e8572bc3cbadd8934874bd41a93401e4a9943f4e4cb3f15e63ff1e887f2b811640cf1ce8775ca422b06ddb11e4daae34bd3b3e74ef335c78d02350
    3-45b8bb903245c64c1a6d8347bf3e0e1b75e04bb1483c3cbcd4c5d881ed1ff681f9c0723a0b4bd23531d5682483b47cb995d35f95d180d6e8794556b723ffe203
    4-cc4c437e9305307b161f4a9a7a7d6b69f81e35a20fd46736117bc8d2f56bca5dadfcfe1b587a81e83dbbe7a7e4a80a940955c4366a3157ca59126f27a230a58d
That is 130 characters per part, which is a great improvement and a lot of work saved. Also, as we humans are prone to mistakes, shorter parts leave fewer opportunities for errors when transcribing and reading them.
It is true that when we use the entropy value we lose all the error correction capabilities of the seed words, so it is very important to validate that the values we have stored in the parts are correct by performing the recovery process when the parts are created. One of the benefits of having multiple parts and copies of them is that if one part becomes unreadable we can still use the other available parts. One additional measure to protect against errors when recovering the entropy value is to record a short version of its hash alongside the parts:
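The character counts quoted above can be reproduced with a little arithmetic. The Python sketch below does so under one assumption that is inferred only from the two transcripts in this post and not from the ssss documentation: each share is the prefix "N-" followed by two hexadecimal characters per byte of the secret (74 characters gave a 592 bit level, 64 characters gave a 512 bit level). The share_chars helper is hypothetical.

```python
WORDS = 24
BITS_PER_WORD = 11  # BIP39 uses a 2048-word list, and 2**11 == 2048

total_bits = WORDS * BITS_PER_WORD          # 264 bits encoded by the words
checksum_bits = total_bits // 33            # 8 checksum bits (entropy / 32)
entropy_bits = total_bits - checksum_bits   # 256 bits of entropy
entropy_hex_chars = entropy_bits // 4       # 64 hexadecimal characters


def share_chars(secret_len):
    """Length of one share for a secret of `secret_len` ASCII characters.

    Assumption inferred from the transcripts: "N-" prefix plus two hex
    characters per byte of the secret.
    """
    return len("1-") + 2 * secret_len


# Each half of the words list in the example is 74 characters long.
print(2 * share_chars(74))             # 300 characters per part (two strings)
print(share_chars(entropy_hex_chars))  # 130 characters per part
```

Splitting the raw 32 bytes instead of their 64-character hexadecimal text would shorten the shares further, but that goes beyond what this post covers.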
    $ echo "2f580110c0304a888f79c1df638e593a8731a5c9ea64edaa332b39252d21e76e" | sha256sum | cut -c-6
    7618e2
This hash value (ie: 7618e2 in the example) allows us to validate that the value returned by ssss-combine is the original entropy value. Note that if we make a mistake entering the parts, ssss-combine will still return a result. Chances are that this result will be binary garbage, which would give us a clue that something is off, but the hash provides us with much more certainty.
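For readers who prefer to script the check, here is a small Python sketch of the same short-hash validation. The helper names short_hash and matches_record are made up for this example. The one detail worth noting is that echo appends a newline, so the newline is part of the hashed bytes and has to be reproduced to obtain the same digest as the shell pipeline.

```python
import hashlib


def short_hash(value, length=6):
    """Mimic: echo "<value>" | sha256sum | cut -c-6 (echo adds a newline)."""
    digest = hashlib.sha256((value + "\n").encode("utf-8")).hexdigest()
    return digest[:length]


def matches_record(recovered_value, recorded_hash):
    """Compare a recovered value against the short hash stored with the parts."""
    return short_hash(recovered_value, len(recorded_hash)) == recorded_hash


entropy = "2f580110c0304a888f79c1df638e593a8731a5c9ea64edaa332b39252d21e76e"

# Written down next to the parts when they are created.
recorded = short_hash(entropy)

# Later, after combining the parts, confirm that we got the same value back.
print(matches_record(entropy, recorded))  # True
```

Six hexadecimal characters give 16,777,216 possible values, so a wrongly recovered value has roughly a 1 in 16.7 million chance of matching by accident. That is plenty for catching transcription errors, although it is not a security measure.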
Commands to create parts from words list
This section summarizes the previous points as a list of commands that you can run to generate the parts from the words list. We will use the hal command both to obtain the entropy value from the words list and to obtain the words list from the entropy:
    $ hal bip39 get-seed "your BIP words" | jq -r .entropy
    <entropy value>
    $ echo "<entropy value>" | sha256sum | cut -c-6
    <truncated sha256sum>
    $ echo "<entropy value>" | ssss-split -t2 -n4
    <parts list>
You can recover the original words list with the following commands:
    $ ssss-combine -t2
    <enter required parts>
    $ echo "<entropy value from previous command>" | sha256sum | cut -c-6
    <truncated sha256sum>
    $ hal bip39 generate -w <number of words> --entropy <entropy value>
    <words list>
During the process, it is important to validate that the truncated sha256sum value is the same when generating the parts and when recovering the words list.
Let’s not use the master key
In the previous sections we have stored and loaded a BIP39 seed assuming it was the master key. The problem with this approach is that we are not protecting our wallet with a passphrase and we are tying our wallet to the master key: if we happen to expose the wallet seed, we have to regenerate the parts, and any other use of the key is affected.
To mitigate this issue, I propose to reverse the flow in some way. Good hardware wallet makers put a lot of effort into generating good random keys, so it is a good idea to keep using them to generate the random key. However, after that, I propose not to use that key as a wallet key at all, and to use instead a key derived from it and a passphrase.
It is very important that you store the passphrases and key derivation parameters in a safe place, as without them you won’t be able to obtain the derived key.
To complete the toolbox, I have wrapped the Rust crypto scrypt library in a simple CLI tool that we can use to generate derived keys. Due to the nature of scrypt, generating the derived key with the default parameters may take a while.
To generate a derived key, we pass the master key as input to the scrypt-rs command and set the passphrase as the salt. We also need to set the key derivation parameters according to the type of key we want to generate. For example, if we want to generate a key for a new crypto wallet that we want to keep on our phone, we could use the following command (remember to check that the master key is correct by checking it against the hash we stored when generating the parts):
    $ echo 2f580110c0304a888f79c1df638e593a8731a5c9ea64edaa332b39252d21e76e | scrypt-rs -l32 -s "My phone wallet is cool"
    Input | Salt: "My phone wallet is cool"
    Input | Normalized passphrase: "2f580110c0304a888f79c1df638e593a8731a5c9ea64edaa332b39252d21e76e"
    Input | Scrypt parameters: cost factor 19 - blocksize 8 - parallelization 2 - key length in bytes 32
    Output| Scrypt derived key in hexadecimal: f3ebf69d0d70315717b746f05b1a2ab7c9686a79a7f5494aedbcb3c834750949
    Output| Scrypt derived key in base64: 8+v2nQ1wMVcXt0bwWxoqt8loanmn9UlK7byzyDR1CUk=
    Output| Scrypt BIP39 words list representation: view garden point brain adapt process gain trip utility sugar melt hurdle notable cry track wrong enable first hundred guide local dentist cement cargo
I also suggest keeping a record of the hash of the master key, the key derivation parameters and the passphrase combined. Eg:
    $ echo "2f580110c0304a888f79c1df638e593a8731a5c9ea64edaa332b39252d21e76e|19|8|2|32|My phone wallet is cool" | sha256sum | cut -c-6
    3dc63d
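When there are several derived keys to track, building that combined string by hand gets error-prone. The Python sketch below assembles it from its components in the same field order as the command above (master key, cost factor, block size, parallelism, key length, passphrase, joined with "|") and returns a line that leaves the master key out. The record_line helper and the placeholder master key are hypothetical.

```python
import hashlib


def record_line(master_key_hex, cost, block_size, parallelism, key_len,
                passphrase):
    """Return '"<params>|<passphrase>": <short hash>' without the master key."""
    params = f"{cost}|{block_size}|{parallelism}|{key_len}|{passphrase}"
    # The trailing newline mirrors what echo feeds into sha256sum.
    hashed = f"{master_key_hex}|{params}\n"
    short = hashlib.sha256(hashed.encode("utf-8")).hexdigest()[:6]
    return f'"{params}": {short}'


# Placeholder master key, for illustration only.
master = "00" * 32
print(record_line(master, 19, 8, 2, 32, "My phone wallet is cool"))
```

The master key only enters the hash, never the printed line, so the line can be kept with the passphrase records while the parts stay elsewhere.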
Then we could keep a record of these results, without the master key, in a notebook, file, steel plate, etc. Treat it as confidential and keep multiple copies in different locations:
    "<cost factor>|<block size>|<parallelism>|<key size>|<passphrase>": <short sha256 hash>
    "19|8|2|32|My phone wallet is cool": 3dc63d
I hope this post helps people store high-value, long-lived keys in a way that is resilient and that scales to the many applications, backed by a long-living secret, that we are expected to use during our lifetime.
As mentioned in the post, all the tooling proposed here is provided in the ISO I release in this project.
I am more than happy to discuss any ideas or concerns around this proposal in the comments (bear with me if I take some time to approve the comments).