November 9th, 2011, 09:20 AM
Barcode Encryption of Identifiers: is it possible?
Dear Sirs et Madames,
I am trying to create a health application of a rather sensitive nature, which will require some form of cryptography or obfuscation. In a health study, known individuals with permanent and recognisable identifier numbers (e.g. KIG0005001 as an individual's identifier) walk into the clinic once a year, are identified, and have their blood tested. The next year the same happens again, as this is a longitudinal study. The results of the blood tests should NOT be traceable to an actual individual (HIV status etc. are highly sensitive pieces of information that should not be linkable to actual individuals, due to their right to privacy), but it is IMPERATIVE that we can identify, year on year, which blood samples belong to the same individual (without knowing WHO the individual actually is; the emphasis is on the blood samples being traceable to one source, not on identifying the person).
My idea (and here is where I am asking for your expertise in cryptography and obfuscation) is that when the individual visits the clinic, they bring an identifying card with their regular ID number, KIG0005001. This number is entered into a system which, via an algorithm or encryption, produces a barcode (derived from the original ID KIG0005001, so any future visit should produce the SAME barcode for that individual) that can be printed out as stickers. These barcode stickers are then used to identify the samples (stick them on the samples). The stickers should carry the following information: the unique identifier (via barcode?), the round number in which the sample was taken (samples will be taken once a year, so year 1 = round 1), and the date the sample was taken.
Is this possible? What are the alternatives? How should I transform KIG0005001 into an encrypted barcode which is repeatable year on year (so that a blood sample can always be traced back to the same source)? I am programming in Java.
Thanks in advance,
November 9th, 2011, 02:31 PM
Some thoughts about the problem:
1) To begin at the end, will the adhesive labels go onto glass vials? If so, how large? I'm thinking about barcode options. A 2D barcode holds lots of data in a small area, but in order for it to work, you must have a suitable scanner available (the easy part) and the scanner must be able to work with the label geometry -- maybe not feasible, if the labels will be applied to cylindrical containers.
I'll assume that you have the most difficult case -- labels applied to small vials.
2) Maintaining secrecy isn't very easy, especially if someone has a "guess" about a subject's identity: in other words, they know a person's ID code, and want to test whether the lab sample matches that. You can protect against this using a secret, but it is a secret that must be stored or remembered permanently, and if the secret is discovered, then a subject's identity could be revealed by the guessing attack.
Let's suppose that the secret is a phrase (sequence of characters). The simplest way to handle this is to use the same phrase for every subject; if you do it that way, you could "hard code" the secret into the software that makes the labels, or alternatively require the person labelling the samples to type the phrase when running the software (but if the operator makes a typing mistake, it would become impossible to connect the label with the person, spoiling the study data; there are ways to protect against that).
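One possible way to guard against a mistyped secret is a known-answer check: when the study is set up, compute the label for a dummy ID that no real subject uses and keep the result; at the start of each labelling session, recompute it from the typed phrase and refuse to print if it differs. The sketch below assumes the MD5-plus-Base64 labelling described in the numbered steps of this post and Java 8 or later; the class name, the dummy ID TEST0000000, and the method names are all hypothetical. Note that anyone who obtains the stored check value can test guesses of the secret offline, so it needs the same care as the labels themselves.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Base64;

final class SecretCheck {
    // Hypothetical dummy ID; it must never be issued to a real subject.
    static final String DUMMY_ID = "TEST0000000";

    /** Base64 of MD5(id + secret), following the labelling steps in this thread. */
    static String label(String id, String secret) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest((id + secret).getBytes(StandardCharsets.UTF_8));
            return Base64.getEncoder().encodeToString(digest);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    /** Run once at study setup with the correct secret; store the result. */
    static String makeCheckValue(String secret) {
        return label(DUMMY_ID, secret);
    }

    /** Run at every session: true only if the typed secret reproduces the stored value. */
    static boolean secretMatches(String typedSecret, String storedCheckValue) {
        return makeCheckValue(typedSecret).equals(storedCheckValue);
    }

    public static void main(String[] args) {
        String stored = makeCheckValue("Rmv04kh8a");
        System.out.println(secretMatches("Rmv04kh8a", stored)); // phrase typed correctly
        System.out.println(secretMatches("Rmv04kh8A", stored)); // one-letter typo
    }
}
```

Asking the operator to type the phrase twice is a simpler alternative, but it only catches slips, not a consistently misremembered phrase.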
3) By way of example, suppose the secret is "Rmv04kh8a". You can simply concatenate this with the person's ID, so the result would be "KIG0005001Rmv04kh8a".
4) Compute the hash of the concatenated phrase using a cryptographic hash function. If you use a 1D barcode (see above), then you must also limit the size (number of bits) of the hash -- otherwise, the label will be impractically long. So in the case of 1D barcodes, I would recommend MD5, whose output is only 128 bits. Although MD5 has long been considered insecure (practical collision attacks exist), it is still very difficult to invert (that is, to discover what the input was), provided the input cannot simply be guessed, which is why the secret matters. The output of the hash function may be binary (a sequence of raw bytes), or a string in hexadecimal. For your purposes, binary is preferable -- if you get the hash in hex, you will want to convert it to binary.
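A minimal Java sketch of steps 3 and 4, using only the standard java.security.MessageDigest class. The class and method names are made up for illustration, and the choice of UTF-8 is my assumption; what matters is that the same charset is used in every round.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

final class SampleHash {
    /** Returns the raw 16-byte MD5 digest of subjectId followed by secret. */
    static byte[] md5OfIdAndSecret(String subjectId, String secret) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            // Fix the charset explicitly so every machine produces the same bytes.
            return md.digest((subjectId + secret).getBytes(StandardCharsets.UTF_8));
        } catch (NoSuchAlgorithmException e) {
            // Not expected on a standard JDK; treat it as a configuration error.
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    public static void main(String[] args) {
        byte[] hash = md5OfIdAndSecret("KIG0005001", "Rmv04kh8a");
        System.out.println(hash.length + " bytes");
    }
}
```

Calling the method twice with the same ID and secret gives the same bytes, which is the property that lets samples from different rounds be matched.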
Very important: this procedure is case sensitive! The person's ID, and the secret phrase, must be entered in EXACTLY the same combination of upper & lower case characters, or else the barcodes from different rounds won't match.
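One way to reduce the risk of case mistakes in the ID (not the secret) is to normalise and validate it before hashing. This sketch assumes every ID is three letters followed by seven digits, like KIG0005001; the thread shows only that one example, so the pattern is a guess and would need adjusting to the real ID scheme. The class and method names are hypothetical.

```java
import java.util.Locale;
import java.util.regex.Pattern;

final class IdNormalizer {
    // Assumed ID format: three letters then seven digits, e.g. KIG0005001.
    private static final Pattern ID_FORMAT = Pattern.compile("[A-Z]{3}\\d{7}");

    /** Trims and upper-cases the typed ID, rejecting anything off-pattern. */
    static String normalize(String typedId) {
        String id = typedId.trim().toUpperCase(Locale.ROOT);
        if (!ID_FORMAT.matcher(id).matches()) {
            throw new IllegalArgumentException("Unexpected ID format: " + typedId);
        }
        return id;
    }

    public static void main(String[] args) {
        System.out.println(normalize(" kig0005001 "));
    }
}
```

The secret phrase is deliberately left untouched here: normalising it would throw away some of its variety, so it still has to be entered exactly.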
5) Convert the binary hash to ASCII using the MIME Base64 encoding. Why insist on binary, and then convert to ASCII, when you could simply use the ASCII hex result from the hash function? Because the Base64 encoding of the binary hash will be shorter. For MD5 (16 bytes), you will end up with 24 ASCII characters, including two trailing "=" padding characters, versus 32 characters in hex.
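Step 5 might look like this in Java, assuming Java 8 or later so that java.util.Base64 is available (older versions would need a third-party encoder). The class and method names are placeholders of mine.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Base64;

final class LabelCode {
    /** Base64 text of MD5(subjectId + secret): 24 characters ending in "==". */
    static String labelCode(String subjectId, String secret) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest((subjectId + secret).getBytes(StandardCharsets.UTF_8));
            // Encode the raw bytes, not a hex string, to get the shorter form.
            return Base64.getEncoder().encodeToString(digest);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    /** Same code with the two padding characters removed: 22 characters. */
    static String labelCodeUnpadded(String subjectId, String secret) {
        String padded = labelCode(subjectId, secret);
        return padded.substring(0, padded.length() - 2);
    }

    public static void main(String[] args) {
        System.out.println(labelCode("KIG0005001", "Rmv04kh8a").length());
        System.out.println(labelCodeUnpadded("KIG0005001", "Rmv04kh8a").length());
    }
}
```

Because a 16-byte input always yields exactly two "=" characters, dropping them loses no information and saves two symbols on the label; whether that is worth it depends on how tight the label is.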
6) Print the base64 encoded hash as a barcode symbol. I suggest using the "Code 128" barcode symbology: if you have to use a 1D barcode, this will give you a relatively short (but still quite lengthy) barcode. If the resulting symbol is just too long for your application, you will have to be a little creative; for example, you could break the base64 code into two parts, and print two Code 128 symbols that run parallel to each other.
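If the code does have to be split across two symbols, the splitting itself is simple. The following is only a sketch with hypothetical names; it does not print anything, and a real label would also need some convention (position on the label, or a marker character) telling the reader which symbol is the first half.

```java
final class LabelSplit {
    /** Splits a code into two parts; the first gets the extra character if the length is odd. */
    static String[] splitInHalf(String code) {
        int mid = (code.length() + 1) / 2;
        return new String[] { code.substring(0, mid), code.substring(mid) };
    }

    /** Rebuilds the original code from the two scanned parts, in order. */
    static String join(String first, String second) {
        return first + second;
    }

    public static void main(String[] args) {
        // Placeholder 24-character string standing in for a Base64 label code.
        String[] parts = splitInHalf("ABCDEFGHIJKLMNOPQRSTUV==");
        System.out.println(parts[0]);
        System.out.println(parts[1]);
        System.out.println(join(parts[0], parts[1]));
    }
}
```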
I don't do much with Java, but I think there are likely packages or free sources you can find to compute the hash function (MD5, e.g.), the base64 encoding, and probably printing the barcode symbol.
If you CAN safely use a 2D barcode like DataMatrix, then the length of your data isn't much of an issue -- you could skip step 5, and simply make the symbol from the ASCII hex output of the hash function. Also, you could use a more modern hash function like SHA-256 (SHA-1 is also newer than MD5, but it too is considered weak against collisions).
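For the 2D barcode case, a hedged sketch of the SHA-256 variant with hex output follows. It uses only the standard MessageDigest class; the class and method names are invented for illustration, and UTF-8 is again my assumption.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

final class Sha256Label {
    /** Lower-case hex of SHA-256(subjectId + secret): always 64 characters. */
    static String sha256Hex(String subjectId, String secret) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            byte[] digest = md.digest((subjectId + secret).getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                // Mask to 0..255 so negative bytes still print as two hex digits.
                hex.append(String.format("%02x", b & 0xff));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 not available", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(sha256Hex("KIG0005001", "Rmv04kh8a").length());
    }
}
```

At 64 characters this is too long for a practical 1D symbol on a small vial, which is the reason the 1D steps above settle for MD5 and Base64.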
Whatever solution you choose, please think about the following:
A. Test your solution very thoroughly, using real vials (or whatever physical container will be used for the samples).
B. I hope that the scheme above provides a decent level of security. It is not bulletproof. You can ask the medical people involved what level of security is required.
C. If the secret is lost/forgotten, the longitudinal study is ruined.
D. If you don't use a secret, then it will become possible to identify study participants from the sample label by the "guessing attack."
E. If you use a secret, and the secret is discovered, then the same guessing attack can be used.
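To make points D and E concrete, here is a small Java sketch of the guessing attack. It assumes IDs are a fixed prefix plus a seven-digit number, as KIG0005001 suggests (the real scheme may differ), and that labels are the Base64 of MD5(ID + secret) as in the steps above. All names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Base64;

final class GuessingAttack {
    /** Label as produced by the scheme above: Base64 of MD5(id + secret). */
    static String label(String id, String secret) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest((id + secret).getBytes(StandardCharsets.UTF_8));
            return Base64.getEncoder().encodeToString(digest);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    /**
     * Tries every ID of the assumed form prefix + seven digits, from 0 to maxNumber,
     * using the secret the attacker knows or guesses. Returns the matching ID, or
     * null if none of the candidates produces the target label.
     */
    static String recoverId(String targetLabel, String prefix, String knownSecret, int maxNumber) {
        for (int n = 0; n <= maxNumber; n++) {
            String candidate = String.format("%s%07d", prefix, n);
            if (label(candidate, knownSecret).equals(targetLabel)) {
                return candidate;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // Case D: no secret at all, so the attacker hashes candidate IDs directly.
        String noSecretLabel = label("KIG0005001", "");
        System.out.println(recoverId(noSecretLabel, "KIG", "", 10000));

        // With a secret the attacker does not know, the same search finds nothing.
        String secretLabel = label("KIG0005001", "Rmv04kh8a");
        System.out.println(recoverId(secretLabel, "KIG", "", 10000));
    }
}
```

Even the full range of ten million seven-digit IDs is a short job for an ordinary PC, so the protection rests entirely on the secret staying secret.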
January 29th, 2013, 09:46 PM
To generate barcodes in Java, you can refer to the Java barcode tutorial on tarcode.com