Code-Breaker
From dandi08
Paul here.
My goal is to use the language tools we have worked with to "break" the simple ciphers we've seen in Python.
An example: In ciphered texts where each character maps to exactly one other character (with no ambiguity), NLTK can use frequency analysis to identify short, commonly occurring words. If the ciphered word "vb" appears 50 times in a text, it likely deciphers to "in", "on", "to", or "he". A commonly appearing one-letter word indicates that its letter represents "I" or "a" in the original text. WordNet might be involved past this point, just as a dictionary reference. Once the simple steps on short, common words are done, there will be a lot of partially solved words that look like _t__a_, and I'm thinking there should be a way to consult WordNet for a list of words that could fill in the blanks.
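A minimal sketch of those two steps, using only the standard library so it runs without any corpus downloads. The function names `short_word_counts` and `fill_blanks` are hypothetical, and `vocabulary` is assumed to be any iterable of lowercase words; in the actual project it could come from WordNet (for instance via NLTK's `wordnet.all_lemma_names()`), and NLTK's `FreqDist` could stand in for `Counter`.

```python
import re
from collections import Counter

def short_word_counts(ciphertext, max_len=2):
    """Count ciphered words of at most max_len letters, most common first."""
    words = re.findall(r"[a-z]+", ciphertext.lower())
    return Counter(w for w in words if len(w) <= max_len).most_common()

def fill_blanks(pattern, vocabulary):
    """Return the vocabulary words that fit a pattern such as '_t__a_',
    where each '_' stands for exactly one unknown letter."""
    parts = ("[a-z]" if ch == "_" else re.escape(ch) for ch in pattern.lower())
    regex = re.compile("".join(parts))
    # fullmatch forces the word to have the same length as the pattern
    return sorted(w for w in vocabulary if regex.fullmatch(w))

# Hypothetical toy inputs, just to show the shape of the results
counts = short_word_counts("vb xq vb a vb xq hello")
candidates = fill_blanks("_t__a_", ["stream", "streak", "strand", "attach"])
```

Here `counts` ranks "vb" first, so it would be the first candidate for a guess like "in" or "to", and `candidates` keeps only "streak" and "stream". This sketch does not yet use the fact that, in a one-to-one cipher, a blank cannot stand for a letter that has already been decoded; adding that check would narrow the candidates further.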
I'm worried that this project will be very hard or unsuccessful, but I want to try anyway. The opening moves of this method (frequency analysis, then guesses at the short, common words) are a classic way to tackle simple ciphers, but they generally rely on the guess-and-check intuition of a human codebreaker. The logic needed in Python will be highly conditional in order to handle the different possible states of a code-break-in-progress. It might get cumbersome.
Anyway, my goal is to take some samples of natural English text, run them through the various simple ciphers we've seen in Python, and see if my code can crack any of them without being given the decryption key.
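One way such a test could be set up is sketched below. It assumes the cipher is a plain one-to-one letter substitution, which may not match every cipher we've seen, and all of the names (`make_key`, `encipher`, `invert`, `letter_accuracy`) are hypothetical helpers rather than anything from NLTK.

```python
import random
import string

def make_key(seed=None):
    """Build a random one-to-one mapping from plain letters to cipher letters."""
    rng = random.Random(seed)
    shuffled = list(string.ascii_lowercase)
    rng.shuffle(shuffled)
    return dict(zip(string.ascii_lowercase, shuffled))

def encipher(text, key):
    """Lowercase the text and substitute letters; other characters pass through."""
    return "".join(key.get(ch, ch) for ch in text.lower())

def invert(key):
    """Turn an enciphering key into the matching deciphering key."""
    return {cipher: plain for plain, cipher in key.items()}

def letter_accuracy(original, recovered):
    """Fraction of letter positions in the original that were recovered correctly."""
    pairs = [(o, r) for o, r in zip(original.lower(), recovered.lower()) if o.isalpha()]
    return sum(o == r for o, r in pairs) / len(pairs) if pairs else 0.0

# Round trip with the true key; a cracker would have to guess the inverse key instead
key = make_key(seed=0)
secret = encipher("Hello, world", key)
restored = encipher(secret, invert(key))
```

The cracking code would only ever see `secret`. Scoring its output with `letter_accuracy` against the original sample gives a number to compare across texts and ciphers, instead of judging success by eye.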