Security and Cryptography
 
Dev Shed Forums > System Administration > Security and Cryptography

#1
August 30th, 2011, 09:05 AM
nulik
Searchable encryption

Hi,
We have a management application in our office, and I was asked to implement client-side encryption with the ability to search the data. I am not a cryptography expert, so I am asking for help.
The basic idea is to use symmetric encryption. Before the text (usually small, around 1 KB on average) is sent to the server, I will split it into words (English words, separated by spaces), and each word will be padded with zero bytes to 16 bytes. Words longer than 16 bytes will be split into 16-byte blocks, and the last block will be padded. Then each word will be encrypted individually with 128-bit AES and sent to the server. The server will keep a dictionary of encrypted words and associate each one with the record that was inserted into the database, i.e. it will build a keyword index (like the MySQL full-text search option, or Sphinx).
When the user wants to search the texts for a particular keyword, he/she will enter the keyword, and the client will encrypt it the same way and send it to the server as the query.
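The splitting and padding step described above can be sketched as follows. This is only an illustration in Python, not the JavaScript the application would actually use; the names `word_to_blocks` and `text_to_blocks` are hypothetical, and the AES step itself is left out. Note that identical words always produce identical blocks, so any deterministic cipher applied to them would also produce identical ciphertexts.

```python
from typing import List

BLOCK_SIZE = 16  # AES block size in bytes


def word_to_blocks(word: str) -> List[bytes]:
    """Split one word into 16-byte blocks, zero-padding the last block."""
    raw = word.encode("utf-8")
    blocks = []
    for start in range(0, len(raw), BLOCK_SIZE):
        chunk = raw[start:start + BLOCK_SIZE]
        # ljust pads on the right with zero bytes up to the block size
        blocks.append(chunk.ljust(BLOCK_SIZE, b"\x00"))
    return blocks


def text_to_blocks(text: str) -> List[List[bytes]]:
    """Split text on whitespace and turn every word into padded blocks."""
    return [word_to_blocks(word) for word in text.split()]


if __name__ == "__main__":
    for blocks in text_to_blocks("searchable internationalization"):
        print([block.hex() for block in blocks])
```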
My question is (since I am not an expert on the subject): is this method bad in some way? Is there some way to break this encryption? I have been looking on Google and found a lot of documents with a lot of mathematical formulas. Some of them propose creating the index on the client, but I don't understand why they don't implement the method I am describing. It looks so simple.
The obvious disadvantage of the method is that the storage for the keyword index can grow a lot, but I guess this is the price to pay for security plus searchable data. I don't see a problem with that, given that disk costs are getting lower every day. Another disadvantage is that you can't use wildcards and have to search by exact keyword, but at least you can search.

Do you guys have any comments or ideas?
Any input would be much appreciated.
Regards

#2
August 30th, 2011, 12:29 PM
Scorpions4ever
Yes, the method is bad. For one, it is vulnerable to a known-plaintext attack. Also, since you have a dictionary of keywords and there may not be that many of them, it is pretty easy to guess what's going on. Lemme explain: say that I know your server uses the keyword "SELECT" for a command. When I see the word "LMURTZAXY" going across the network a lot, and always at the beginning of a transaction, it is a pretty safe bet that LMURTZAXY = SELECT.

Worse, if I'm a customer of yours and have a copy of your client program, I can send a bunch of keywords to the server and record what the encrypted versions are and build a table of plaintext keywords and their encrypted versions. I could also potentially decompile the client and get my hands on the AES key. After that, decryption becomes easy and I can decode other people's transactions as well.

Therefore, what you need is something where the encryption key changes on a per-session basis. For example, in my current session the keyword "FOO" encrypts to "BLARGH", but in my next session "FOO" encrypts to "ZRRTXBRE" because the key is different. That makes it impractical to build a dictionary of known plaintexts and their encrypted equivalents.

Luckily, there are already technologies that do this (SSL and TLS), so you don't have to reinvent the wheel. Basically, when the initial connection is established, the protocol has the client and server use asymmetric encryption (which takes more computing power) to exchange a one-time random key for the session. Once the random key is exchanged, both client and server use symmetric encryption (which is less processor-intensive) with this random key for the duration of the session.

All you need to do then is use an SSL or TLS library in your code and voila, your transaction is encrypted and you can proudly tell your customers that you use an industry-standard encryption method!
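As a rough illustration of "use a TLS library", here is a minimal sketch with Python's standard `ssl` module. The helper name `make_client_context` is hypothetical, and the choice of TLS 1.2 as the minimum version is an assumption, not something stated in this thread. The library negotiates the per-session key during the handshake, so the application code never handles it directly.

```python
import ssl


def make_client_context() -> ssl.SSLContext:
    """Build a TLS client context with certificate and hostname checks on."""
    context = ssl.create_default_context()  # loads the system's trusted CAs
    context.minimum_version = ssl.TLSVersion.TLSv1_2  # assumed policy: refuse older protocols
    return context


if __name__ == "__main__":
    ctx = make_client_context()
    print(ctx.verify_mode == ssl.CERT_REQUIRED, ctx.check_hostname)
```

A client would then wrap its socket with `ctx.wrap_socket(sock, server_hostname=...)` before sending any data. This protects the data in transit only; whatever the server receives is plaintext to the server.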

Last edited by Scorpions4ever : August 30th, 2011 at 12:37 PM.

#3
August 31st, 2011, 07:01 AM
nulik
Quote:
Originally Posted by Scorpions4ever
Yes, the method is bad. For one, it is vulnerable to a known-plaintext attack. Also, since you have a dictionary of keywords and there may not be that many of them, it is pretty easy to guess what's going on. Lemme explain: say that I know your server uses the keyword "SELECT" for a command. When I see the word "LMURTZAXY" going across the network a lot, and always at the beginning of a transaction, it is a pretty safe bet that LMURTZAXY = SELECT.

Thanks for your reply, but I'd like to defend my position.
See, I agree that it is vulnerable to a known-plaintext attack, but I will not send the words of the protocol in encrypted form. I will encrypt only the data. For example, if I am storing email on the untrusted server, I will not encrypt the "Subject:" and "Body" field names; I will encrypt only the content of the fields, i.e. the value of Subject and the value of Body. This way you will know I am storing an email, but it will be difficult to guess what the content of the email is. You might guess that the Subject contains a summary of what the Body contains, but I don't think it will be easy to decrypt even knowing that.

Quote:
Worse, if I'm a customer of yours and have a copy of your client program, I can send a bunch of keywords to the server and record what the encrypted versions are and build a table of plaintext keywords and their encrypted versions. I could also potentially decompile the client and get my hands on the AES key. After that, decryption becomes easy and I can decode other people's transactions as well.

Well, the source code is open: it is JavaScript, and I am using the "Movable Type Scripts" code to encrypt with 128-bit AES. The customer knows what the key is because he enters it, but you will never know it unless you somehow hack the customer's machine and get it from the browser while the session is still open, because as soon as the window is closed, the variables holding the key (which live only in memory) are destroyed.
Quote:
Therefore, what you need is something where the encryption key changes on a per-session basis. For example, in my current session the keyword "FOO" encrypts to "BLARGH", but in my next session "FOO" encrypts to "ZRRTXBRE" because the key is different. That makes it impractical to build a dictionary of known plaintexts and their encrypted equivalents.

Luckily, there are already technologies that do this (SSL and TLS), so you don't have to reinvent the wheel. Basically, when the initial connection is established, the protocol has the client and server use asymmetric encryption (which takes more computing power) to exchange a one-time random key for the session. Once the random key is exchanged, both client and server use symmetric encryption (which is less processor-intensive) with this random key for the duration of the session.

All you need to do then is use an SSL or TLS library in your code and voila, your transaction is encrypted and you can proudly tell your customers that you use an industry-standard encryption method!

Yes, I know SSL uses symmetric encryption, but the problem is that as soon as you rely on SSL, this is no longer client-side encryption. The admin of the web server might modify the source code of OpenSSL (or whatever library is in use) to secretly copy the unencrypted data to another location. This is the main reason I discarded SSL from the beginning.


Regards

#4
August 31st, 2011, 04:11 PM
OmegaZero
Quote:
Originally Posted by nulik
See, I agree that it is vulnerable to a known-plaintext attack, but I will not send the words of the protocol in encrypted form. I will encrypt only the data. For example, if I am storing email on the untrusted server, I will not encrypt the "Subject:" and "Body" field names; I will encrypt only the content of the fields, i.e. the value of Subject and the value of Body. This way you will know I am storing an email, but it will be difficult to guess what the content of the email is. You might guess that the Subject contains a summary of what the Body contains, but I don't think it will be easy to decrypt even knowing that.

If someone can send data to the user that the user will then put into your application, they can break it. If they know what data the user is entering, they can break it. If they can send data as the user (say, through a cross-site scripting vulnerability), they can break it. They can do word-frequency analysis and break it (and on top of that, sections like "Body" will often end with a name and a closing like "Thanks" or "Sincerely"). There is no way this can be secure.

Encrypting each word separately makes your system more vulnerable than using plain ECB encryption which is already considered insecure.
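The word-frequency point can be made concrete with a small sketch. It assumes a deterministic per-word transformation; a truncated HMAC-SHA256 stands in for the per-word AES here only because Python's standard library has no AES, and the key, sample text and function names are all hypothetical. The attacker never sees the key and only counts how often each stored token appears.

```python
import hashlib
import hmac
from collections import Counter

SECRET_KEY = b"known only to the client"  # hypothetical key; the attacker never sees it


def token(word: str) -> str:
    """Deterministic stand-in for per-word encryption: same word, same token."""
    return hmac.new(SECRET_KEY, word.encode("utf-8"), hashlib.sha256).hexdigest()[:12]


def guess_by_frequency(tokens, common_words):
    """Pair the most frequent tokens with the most frequent expected words."""
    ranked = [tok for tok, _ in Counter(tokens).most_common(len(common_words))]
    return dict(zip(ranked, common_words))


if __name__ == "__main__":
    stored_text = "the cat saw the dog and the dog saw the bird"
    observed = [token(w) for w in stored_text.split()]  # all the server ever stores
    guesses = guess_by_frequency(observed, ["the"])
    print(guesses[token("the")])  # prints "the": recovered without the key
```

With a realistic amount of text and a longer list of common English words, the same ranking approach would be expected to recover a good share of the vocabulary.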

Quote:
Originally Posted by nulik
Yes, I know SSL uses symmetric encryption, but the problem is that as soon as you rely on SSL, this is no longer client-side encryption. The admin of the web server might modify the source code of OpenSSL (or whatever library is in use) to secretly copy the unencrypted data to another location. This is the main reason I discarded SSL from the beginning.

If you can't trust your server then you have much, MUCH bigger problems--that rogue admin could just replace your HTML & JavaScript files with ones that don't encrypt at all, for instance.


On top of all that, you won't get an effective full-text search anyway. Files containing either "cat" or "cats" should both be returned from a search for "cat", but the encryption will render the word-stem algorithm in the db useless. You'll run into similar problems with capitalization, punctuation, spelling errors, composing characters and so on.
#5
August 31st, 2011, 05:16 PM
E-Oreo
Quote:
If you can't trust your server then you have much, MUCH bigger problems--that rogue admin could just replace your HTML & JavaScript files with ones that don't encrypt at all, for instance.

To add to this: even though you're using client-side encryption, the encryption is done in the browser with JavaScript, so someone with control of the server could still inject JavaScript into your page that steals the password, which negates the benefit of encrypting on the client. For this reason, your approach really has no benefits over SSL + server-side encryption.

As far as I know, what you're trying to do with searching is not mathematically possible to do securely. I could very well be wrong though, and if so I would be extremely interested in learning how to do this.
#6
August 31st, 2011, 08:58 PM
nulik
Quote:
Originally Posted by E-Oreo
As far as I know, what you're trying to do with searching is not mathematically possible to do securely. I could very well be wrong though, and if so I would be extremely interested in learning how to do this.


Well, then check this thread:
google query: searching encrypted data sphinx

Basically, you use HMAC with SHA-256 to create a keyed hash of each word. This way you send the hashes to the "untrusted" server (well, it is not really untrusted; it is like trusting Google as a company while knowing that some employee may occasionally do a 'select' on your Gmail account), and it will search fine without knowing what the data is.
And you have to use SSL of course, to ensure as far as possible that it is YOUR JavaScript that is being executed. I believe a complete solution will eventually be found, but it is very difficult.
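A minimal sketch of the keyed-hash index idea, in Python for illustration. The key value, the `BlindIndex` class and the function names are hypothetical; the key stands in for the one the user enters in the browser. The server-side index stores only tokens, so it can match a query token against stored tokens without learning the words. It still sees which records share a token and how often each token occurs.

```python
import hashlib
import hmac
from collections import defaultdict


def blind_token(key: bytes, word: str) -> str:
    """Keyed hash of one word; without the key the server cannot recompute it."""
    return hmac.new(key, word.encode("utf-8"), hashlib.sha256).hexdigest()


class BlindIndex:
    """Server-side keyword index that only ever sees tokens, never words."""

    def __init__(self):
        self._records_by_token = defaultdict(set)

    def add(self, record_id, tokens):
        for tok in tokens:
            self._records_by_token[tok].add(record_id)

    def search(self, tok):
        # .get() avoids creating empty entries for unknown tokens
        return set(self._records_by_token.get(tok, set()))


if __name__ == "__main__":
    key = b"client-side secret"  # hypothetical; stands in for the key the user enters
    index = BlindIndex()
    index.add(1, {blind_token(key, w) for w in "quarterly budget report".split()})
    index.add(2, {blind_token(key, w) for w in "budget meeting notes".split()})
    print(sorted(index.search(blind_token(key, "budget"))))  # prints [1, 2]
```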

Quote:
Originally Posted by OmegaZero
On top of all that, you won't get an effective full-text search anyway. Files containing either "cat" or "cats" should both be returned from a search for "cat", but the encryption will render the word-stem algorithm in the db useless. You'll run into similar problems with capitalization, punctuation, spelling errors, composing characters and so on

It can be done (check the reference I posted above), but it will require a lot of work on the client side, especially with foreign languages. As for capitalization, it is easy: you just need to convert to CAPS before you generate the hash of the word.
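The normalization step might look like the sketch below, again in Python for illustration with hypothetical function names. It handles capitalization, punctuation and composing characters before the keyed hash is computed. Stemming ("cat" vs. "cats") is not handled here and would need a client-side stemmer.

```python
import hashlib
import hmac
import unicodedata


def normalize(word: str) -> str:
    """Canonicalize a word so that simple variants map to the same token."""
    composed = unicodedata.normalize("NFC", word)  # unify composing characters
    stripped = "".join(ch for ch in composed if ch.isalnum())  # drop punctuation
    return stripped.upper()  # convert to CAPS before hashing


def index_token(key: bytes, word: str) -> str:
    """Keyed hash (HMAC-SHA256) of the normalized word."""
    normalized = normalize(word).encode("utf-8")
    return hmac.new(key, normalized, hashlib.sha256).hexdigest()


if __name__ == "__main__":
    key = b"client-side secret"  # hypothetical key
    print(index_token(key, "Cat,") == index_token(key, "cat"))   # True
    print(index_token(key, "cats") == index_token(key, "cat"))   # False: no stemming
```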

#7
August 31st, 2011, 11:25 PM
E-Oreo
Quote:
Well, then check this thread:
google query: searching encrypted data sphinx

The people in that thread told you pretty much exactly the same thing that we did.
