December 5th, 2012, 09:15 AM
Backup of encrypted Data in the Cloud
The cloud does offer a great opportunity for storing backups, but from my point of view they should be encrypted locally before uploading.
Because it is recommended to encrypt each file under a different key and to provide a MAC alongside it, I am trying to achieve this with the following scheme:
1) Generate a unique IV based on the SHA value of the plaintext and the Unix timestamp
t_H = H(P)
IV = H(timestamp || H(timestamp || t_H))
2) Calculate a keyed HMAC from IV and a given keyword
HMAC_k = H(keyword || H(keyword || IV))
3) Seed the keyed HMAC into the KSA of a stream cipher
4) Derive the encryption key and the key for the MAC out of the keystream
Enc_Key = next 64 keystream bytes
MAC_Key = next 64 keystream bytes
5) Encrypt the plaintext using the encryption key and concatenate the IV
C = IV || E(Enc_Key, P)
6) Calculate the MAC of the ciphertext using the MAC_Key
t_H = H(C)
MAC = H(MAC_Key || H(MAC_Key || t_H))
7) Upload the ciphertext and a separate file containing the MAC into the cloud
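A minimal Python sketch of steps 1) to 6), under several assumptions that are not in the post: H is taken to be SHA-512, the unspecified stream cipher is replaced by SHAKE-256 used as a keystream generator, and the timestamp is encoded as decimal text. Note that H(key || H(key || msg)) is the post's own nested construction and not the standard HMAC of RFC 2104. The names nested_hash, keystream and encrypt are made up for illustration.

```python
import hashlib
import time
from typing import Optional, Tuple


def H(data: bytes) -> bytes:
    """SHA-512, standing in for the unspecified hash H (assumption)."""
    return hashlib.sha512(data).digest()


def nested_hash(key: bytes, msg: bytes) -> bytes:
    """The post's construction H(key || H(key || msg))."""
    return H(key + H(key + msg))


def keystream(seed: bytes, n: int) -> bytes:
    """Stand-in for the stream cipher: SHAKE-256 output (assumption)."""
    return hashlib.shake_256(seed).digest(n)


def encrypt(keyword: bytes, plaintext: bytes,
            timestamp: Optional[int] = None) -> Tuple[bytes, bytes]:
    ts = int(time.time()) if timestamp is None else timestamp
    iv = nested_hash(str(ts).encode(), H(plaintext))    # step 1
    hmac_k = nested_hash(keyword, iv)                   # step 2
    keys = keystream(hmac_k, 128)                       # steps 3 and 4
    enc_key, mac_key = keys[:64], keys[64:]
    pad = keystream(enc_key, len(plaintext))            # step 5
    body = bytes(p ^ k for p, k in zip(plaintext, pad))
    ciphertext = iv + body                              # C = IV || E(Enc_Key, P)
    mac = nested_hash(mac_key, H(ciphertext))           # step 6
    return ciphertext, mac
```

With a fixed timestamp the output is fully deterministic, which is exactly the property the question about dropping the timestamp is concerned with.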
To restore the data after downloading the ciphertext and the MAC file:
1) Extract the IV from the ciphertext and derive the two keys
HMAC_k = H(keyword || H(keyword || IV))
Enc_Key = next 64 keystream bytes
MAC_Key = next 64 keystream bytes
2) Compare the calculated MAC' against the downloaded MAC
t_H = H(C)
MAC' = H(MAC_Key || H(MAC_Key || t_H))
MAC == MAC' ?
3) If both values are equal, proceed with the decryption after stripping the IV from the ciphertext
P = E(Enc_Key, C)
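The restore steps could be sketched in Python as follows. The same assumptions apply as for any stand-in implementation of this scheme: SHA-512 for H, SHAKE-256 in place of the unspecified stream cipher, and a 64-byte IV at the front of the ciphertext. The function names are hypothetical. The MAC comparison uses hmac.compare_digest so that the check does not leak timing information.

```python
import hashlib
import hmac

IV_LEN = 64  # length of a SHA-512 output, assumed to be the IV size


def H(data: bytes) -> bytes:
    """SHA-512, standing in for the unspecified hash H (assumption)."""
    return hashlib.sha512(data).digest()


def nested_hash(key: bytes, msg: bytes) -> bytes:
    """The post's construction H(key || H(key || msg))."""
    return H(key + H(key + msg))


def keystream(seed: bytes, n: int) -> bytes:
    """Stand-in for the stream cipher: SHAKE-256 output (assumption)."""
    return hashlib.shake_256(seed).digest(n)


def restore(keyword: bytes, ciphertext: bytes, mac: bytes) -> bytes:
    # Step 1: extract the IV and derive both keys from it.
    iv, body = ciphertext[:IV_LEN], ciphertext[IV_LEN:]
    keys = keystream(nested_hash(keyword, iv), 128)
    enc_key, mac_key = keys[:64], keys[64:]
    # Step 2: recompute the MAC over the full ciphertext and compare.
    expected = nested_hash(mac_key, H(ciphertext))
    if not hmac.compare_digest(expected, mac):
        raise ValueError("MAC mismatch: wrong keyword or modified ciphertext")
    # Step 3: decrypt only after the MAC has been verified.
    pad = keystream(enc_key, len(body))
    return bytes(c ^ k for c, k in zip(body, pad))
```

Because the cipher is a plain XOR with the keystream, decryption is the same operation as encryption, which is why the last formula can reuse E.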
The one point I am unsure about is that, because the Unix timestamp is included, the ciphertext and MAC will be different each time I encrypt the exact same plaintext with the exact same keyword. Could that become a problem because it might reveal too much information about the keyword?
Also, you may wonder why I build the IV from the plaintext hash instead of using random values drawn from /dev/random, for example.
The reason is simply that I would like to generate an absolutely unique IV, and I am therefore using the two sources of real permutation available on every computer - the unique binary data of the plaintext and the timestamp.
Perhaps the timestamp can be dropped and the secure hash of the plaintext would be sufficiently unique, even if this means that encrypting the exact same plaintext with the exact same keyword will always produce the exact same ciphertext.
Any thoughts, hints, remarks and opinions are greatly appreciated.
Just to avoid unnecessary comments about using GPG instead, etcetera ...
I am aware of how to handle backup and restore with pgp and a keyword, even on the fly ...
# simple Backup of folders using GnuPG
tar -cjf - ./folder-to-save | gpg -c | lftp -u ftp-username,ftp-passwd ftp://ftp.server.com -e "put /dev/stdin -o saved-folder.tbz.gpg; quit"
# simple Restore of folders using GnuPG
lftp -u ftp-username,ftp-passwd ftp://ftp.server.com -e "cat saved-folder.tbz.gpg > /dev/stdout; quit" | gpg -d | tar -xjf -
..., but I am mostly interested in whether the use of the Unix timestamp in the originally mentioned scheme for generating a file-dependent IV (or nonce) is useful or can securely be omitted.
Maybe someone can point me to something to read about that, or can assure me that including the Unix timestamp doesn't hurt - or perhaps better, that it is perfectly sufficient and secure to use just the secure hash of each different plaintext for generating such an IV (nonce).
Last edited by Karl-Uwe Frank; December 5th, 2012 at 01:52 PM.
December 6th, 2012, 12:59 AM
Use a random value for your IV. The method you're using does not guarantee an absolutely unique IV, and definitely does not guarantee more uniqueness than a completely random value. The number of bits in the IV is the same either way, and you can't push the probability of repeating an IV below what a cryptographically secure random number generator already gives you. Reusing an IV is bad from a security perspective.
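As a rough illustration of the random-IV approach, here is a short Python sketch. It assumes a 64-byte IV to match the SHA-512-sized IV discussed in this thread, and it uses the standard birthday-bound approximation n^2 / 2^(bits+1) for the chance that any two of n random IVs collide. The function names are made up for this example.

```python
import secrets

IV_BYTES = 64  # assumed IV size, same width as a SHA-512 output


def random_iv(n: int = IV_BYTES) -> bytes:
    """Draw an IV from the operating system's CSPRNG."""
    return secrets.token_bytes(n)


def collision_probability(num_ivs: int, iv_bits: int) -> float:
    """Birthday-bound approximation, valid while the result is small."""
    return num_ivs * num_ivs / 2.0 ** (iv_bits + 1)
```

For example, collision_probability(2**32, 128) evaluates to 2**-65, so even four billion random 128-bit IVs are very unlikely to repeat, and no list of used IVs has to be kept.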
December 6th, 2012, 11:24 AM
Perhaps you can tell me how to keep track of used IVs in order to make sure never to re-use them?
Originally Posted by E-Oreo
And perhaps you know how to find good random values without a CSPRNG like /dev/random?
Clearly, the SHA512 hash of an always unique binary file should not cause a problem of IV re-use, and therefore we neither need to keep track of used IVs nor need to worry about new ones. Of course there is a minimal chance of a collision, but by integrating the timestamp at which the encryption starts, the chance of such a collision is reduced even further.
December 9th, 2012, 02:07 PM
How to calculate the IV from the plaintext
Below I describe an extended and changed version of the aforementioned scheme. Now the hash value of the plaintext is used as a source of additional entropy for the underlying stream cipher, which acts as a CSPRNG in order to generate the IV and all further keys that are needed.
I propose the following scheme:
First of all, the secure hash of the plaintext and the current time will still be our main source of moderate uniqueness, but we will now use a matrix lookup table to generate pseudo-randomness for creating the IV.
Also, the underlying stream cipher will be re-initialised three times. All this is to ensure that any correlation between plaintext hash, encryption key, signing key and master key is broken up as much as possible and that they are kept well apart.
Furthermore any attempt of re-calculating the IV will now need knowledge of the secret master key as well.
The whole part for generating the IV may look like some kind of overkill, but it should be considered that the only source of additional entropy is the secure hash of the plaintext. Of course the timestamp is involved as well, but in the unlikely event that the clock does not advance correctly, only the plaintext hash will guarantee a different IV for each file encryption.
Please keep in mind that this scheme derives the IV not from an external source of cryptographically secure randomness but mainly from the plaintext itself. Therefore it needs to do a lot of work to break up the relation between plaintext and IV. The use of the matrix lookup table is of course a very conservative way of adding another layer of security. But I prefer it as a precaution, just for the perhaps unlikely event that an attack on the recommended SHA512 is found which could somehow put the whole scheme at risk.
Of course it might also be sufficient just to XOR the plaintext hash (P_hash) with the CSPRNG keystream output or simply with the (Lookup_key), or to use only the (Lookup_key) directly and then concatenate the timestamp, instead of all the steps at 3), 4) and 5). But I prefer the more extended way of creating the needed pseudo-randomness described below. XORing the plaintext hash with the matrix lookup table result has the effect of masking both, hiding the plaintext hash and the keystream; it might be seen as a kind of encryption of the plaintext hash. It also prevents revealing information about any correlation between the (Lookup_key) and the internal state of the matrix lookup table.
1) no external source of cryptographically secure randomness is available
2) the only source of uncontrolled[*] external information is the (system) time
3) no stored keys should be used, only those which are memorable by the user
4) the encryption algorithm should be implemented with simple-to-understand source code and has to be built without the use of external function libraries
5) the encryption environment has no memory of its previous internal state after a loss of power
[*] uncontrolled and independent of algorithm and user in its behaviour and appearance
The algorithm description:
A) Generation of the IV
1) Calculate a cryptographically secure hash of the plaintext (P_hash)
P_hash = SHA512(plaintext)
2) Build a first HMAC value from the secret memorable master keyword (master_key) and the plaintext hash (P_hash) as the (Lookup_key)
Lookup_key = HMAC(master_key, P_hash)
3) Initialise the stream cipher with the (Lookup_key) and fill a matrix lookup table of 256 rows, each row holding 256 bytes, from the keystream
4) Calculate a 32-byte hex value by using the byte values of the (Lookup_key) as index pointers into the matrix lookup table, where two consecutive byte values indicate the row and column of the lookup table cell
Hex_1 = Matrix_Table_Lookup(Lookup_key)
5) Flush the matrix lookup table, refill it with fresh values and perform a second run with (Lookup_key) by calculating a second 32 byte hex value as described under 4).
Hex_2 = Matrix_Table_Lookup(Lookup_key)
6) Concatenate both byte values from 4) and 5) into one 64 byte hex value and XOR the plaintext hash (P_hash) and the matrix table lookup result, finally concatenate the current timestamp (1 second precision)
IV = XOR(P_hash, (Hex_1 || Hex_2)) || timestamp
7) Drop the stream cipher's initial state and the matrix lookup table
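Part A could be sketched in Python roughly like this. Several details are assumptions made only for illustration: SHAKE-256 stands in for the unspecified stream cipher, the "fresh values" of step 5 are taken to be the next 65536 keystream bytes, the HMAC is HMAC-SHA512, and the timestamp is appended as 8 big-endian bytes. The names keystream, matrix_lookup and derive_iv are hypothetical.

```python
import hashlib
import hmac
import time
from typing import Optional

TABLE_SIZE = 256 * 256  # 256 rows of 256 bytes each


def keystream(seed: bytes, n: int) -> bytes:
    """Stand-in for the stream cipher: SHAKE-256 output (assumption)."""
    return hashlib.shake_256(seed).digest(n)


def matrix_lookup(table: bytes, lookup_key: bytes) -> bytes:
    """Each pair of consecutive key bytes selects (row, column) in the table."""
    return bytes(table[256 * lookup_key[i] + lookup_key[i + 1]]
                 for i in range(0, len(lookup_key), 2))


def derive_iv(master_key: bytes, plaintext: bytes,
              timestamp: Optional[int] = None) -> bytes:
    p_hash = hashlib.sha512(plaintext).digest()                          # step 1
    lookup_key = hmac.new(master_key, p_hash, hashlib.sha512).digest()   # step 2
    stream = keystream(lookup_key, 2 * TABLE_SIZE)                       # step 3
    hex_1 = matrix_lookup(stream[:TABLE_SIZE], lookup_key)               # step 4
    hex_2 = matrix_lookup(stream[TABLE_SIZE:], lookup_key)               # step 5
    masked = bytes(a ^ b for a, b in zip(p_hash, hex_1 + hex_2))         # step 6
    ts = int(time.time()) if timestamp is None else timestamp
    return masked + ts.to_bytes(8, "big")
    # Step 7 is implicit here: the table and keystream go out of scope.
```

With these choices the IV is 72 bytes long: 64 masked hash bytes followed by the timestamp. The 64-byte Lookup_key yields 32 index pairs per pass, which is where the two 32-byte values come from.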
B) Encryption/Decryption of the plaintext
8) Use the secret master key (master_key) and the IV to build (IV_key), initialise the stream cipher with (IV_key) and calculate the new intermediate encryption key and the intermediate MAC signing key
IV_key = HMAC(master_key, IV)
Enc_key = 64 byte keystream
MAC_key = 64 byte keystream
9) Drop the stream cipher's initial state again and re-initialise it using the intermediate encryption key (Enc_key)
10) Encrypt the plaintext
C = Enc(Enc_key, P)
11) Calculate the MAC of ciphertext and IV
MAC = HMAC(MAC_key, (IV || C))
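A hedged Python sketch of part B, together with the matching verification and decryption. As before, SHAKE-256 replaces the unspecified stream cipher and HMAC-SHA512 is assumed for HMAC; the function names seal and unseal are invented for this example and the IV is passed in as an opaque byte string.

```python
import hashlib
import hmac
from typing import Tuple


def keystream(seed: bytes, n: int) -> bytes:
    """Stand-in for the stream cipher: SHAKE-256 output (assumption)."""
    return hashlib.shake_256(seed).digest(n)


def derive_keys(master_key: bytes, iv: bytes) -> Tuple[bytes, bytes]:
    iv_key = hmac.new(master_key, iv, hashlib.sha512).digest()    # step 8
    keys = keystream(iv_key, 128)
    return keys[:64], keys[64:]                                   # Enc_key, MAC_key


def seal(master_key: bytes, iv: bytes, plaintext: bytes) -> Tuple[bytes, bytes]:
    enc_key, mac_key = derive_keys(master_key, iv)
    pad = keystream(enc_key, len(plaintext))                      # step 9
    c = bytes(p ^ k for p, k in zip(plaintext, pad))              # step 10
    mac = hmac.new(mac_key, iv + c, hashlib.sha512).digest()      # step 11
    return c, mac


def unseal(master_key: bytes, iv: bytes, c: bytes, mac: bytes) -> bytes:
    enc_key, mac_key = derive_keys(master_key, iv)
    expected = hmac.new(mac_key, iv + c, hashlib.sha512).digest()
    if not hmac.compare_digest(expected, mac):
        raise ValueError("MAC mismatch")
    pad = keystream(enc_key, len(c))
    return bytes(x ^ k for x, k in zip(c, pad))
```

Computing the MAC over IV || C, as in step 11, means that a modified IV is detected as well as a modified ciphertext.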
C) Characteristics of the encrypted file
1) The file name should not reveal information of the plaintext data type or the original file name
2) The plaintext could optionally be compressed or several files packed into one compressed file before encryption
3) The file should not preserve the position of plaintext in order to prevent attacks on equal structures between plaintext and ciphertext
4) The length might differ (achieved by compressing the plaintext) from that of the plaintext, to prevent brute-force attacks based on the congruence of length
December 9th, 2012, 04:15 PM
Some additional remarks on the scheme of deriving the IV from the plaintext
This new extended algorithm for deriving the IV from the plaintext does not destroy the underlying permutation of the cryptographically secure hash function, but only transposes it into another one, away from the basic plaintext hash value into a permutation that depends on the secret user keyword. If the system clock also works correctly, the involved timestamp will just stretch the permutation range of possible unique IVs.
This effectively means that the same binary plaintext file will produce different IVs under different secret user keywords, with no external source of cryptographic randomness involved - even if the timestamp is not included, it will still lead to a reliable IV permutation.
What are the advantages of this scheme?
1) it is building a self-sustained secure circle
2) it does not rely on any uncontrollable external source of randomness
3) it is based on the infinite permutation of binary files
4) the only preconditions are:
...a) a secure hash function
...b) a stream cipher algorithm
5) its security relies only on the secrecy of the user key
December 10th, 2012, 09:57 AM
This offer might be considered also