March 21st, 2011, 06:17 AM
Where to find spoken languages statistics?
I'm doing a hobby project, decrypting ciphers. One powerful method is to use frequency analysis of letters, letter-combinations and words.
There is a lot of information on the internet about the English language, but statistics for other languages seem harder to find (other than single-letter frequencies).
I guess there must be research done in this area, but I haven't been able to find any. Writing your own algorithm to extract the information isn't a hard task, but finding the right mix of texts to analyze probably requires a language specialist.
Does anyone know if there are any open publications or other resources on this subject?
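For reference, the kind of extraction meant here could look roughly like the sketch below. It counts letter n-grams (single letters, pairs, triples) inside words, plus whole-word frequencies. The function names `ngram_frequencies` and `word_frequencies` are made up for illustration, and the sketch assumes the text is split on whitespace and that anything `str.isalpha()` rejects can be thrown away, which may not suit every language.

```python
from collections import Counter


def ngram_frequencies(text, n=1):
    """Relative frequencies of letter n-grams, counted inside words only."""
    counts = Counter()
    for word in text.lower().split():
        # Keep letters only; isalpha() also accepts accented letters.
        letters = "".join(ch for ch in word if ch.isalpha())
        for i in range(len(letters) - n + 1):
            counts[letters[i:i + n]] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {gram: count / total for gram, count in counts.items()}


def word_frequencies(text):
    """Relative frequencies of whole words, punctuation stripped."""
    words = ("".join(ch for ch in w if ch.isalpha()) for w in text.lower().split())
    counts = Counter(w for w in words if w)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {word: count / total for word, count in counts.items()}
```

Calling `ngram_frequencies(sample, 2)` on a long sample gives a letter-pair table. How representative the table is depends entirely on the texts fed in, which is the part I'm unsure about.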
September 4th, 2011, 01:44 PM
September 7th, 2011, 09:22 AM
Too bad there aren't more languages listed.
September 8th, 2011, 06:40 PM
You know what, you ought to read the original document that talks about using frequency analysis to crack ciphers. It was written by an ancient Iraqi chap named Al-Kindi. In his very illuminating document, he says (paraphrasing it into modern terms from memory here):
"First, find a book that is in the same language as the cipher and count the frequency of the letters from a few pages. Then count the frequency of the letters in the cipher text...".
So, there's your solution in the very first sentence of his paper. You could probably go quite far by getting your hands on a novel, online newspaper, magazine etc. in another language and building your own frequency tables.
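A rough sketch of that recipe, assuming the cipher is a simple one-letter-for-one-letter substitution (the names `letters_by_frequency`, `guess_substitution` and `apply_guess` are invented for this example): rank the letters of the reference text and of the cipher text by how often they occur, then pair them up rank by rank.

```python
from collections import Counter


def letters_by_frequency(text):
    """Letters of the text, most frequent first (ties broken alphabetically)."""
    counts = Counter(ch for ch in text.lower() if ch.isalpha())
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [letter for letter, _ in ranked]


def guess_substitution(reference_text, ciphertext):
    """Map each cipher letter to the reference letter of the same frequency rank."""
    reference_rank = letters_by_frequency(reference_text)
    cipher_rank = letters_by_frequency(ciphertext)
    # zip stops at the shorter list, so unmatched cipher letters stay unmapped.
    return dict(zip(cipher_rank, reference_rank))


def apply_guess(ciphertext, mapping):
    """Substitute the guessed letters; anything unmapped is left unchanged."""
    return "".join(mapping.get(ch, ch) for ch in ciphertext.lower())
```

Treat the result as a first guess only. With a short cipher text the ranks rarely line up exactly with the reference, so expect to swap a few letters by hand afterwards.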
Up the Irons
What Would Jimi Do? Smash amps. Burn guitar. Take the groupies home.
"Death Before Dishonour, my Friends!!" - Bruce Dickinson, Iron Maiden Aug 20, 2005 @ OzzFest
Down with Sharon Osbourne
"I wouldn't hire a butcher to fix my car. I also wouldn't hire a marketing firm to build my website." - Nilpo
October 21st, 2011, 04:26 PM