Language Identifier
From dandi08
Description
Word-frequency distributions show that the most frequently used words in a text are function words. By storing the function words of several languages in a Python library, we could write a Python program that reads a text file, finds its most frequently used words, matches them against the sets in the library, and tells the user which language the file is written in.
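As a rough illustration of the idea, here is a minimal sketch. The `FUNCTION_WORDS` table, the function names, and the scoring rule (count how many of the top words appear in each language's set) are assumptions made for this example. The word lists are small hand-picked samples, not the researched lists the project would need.

```python
import re
from collections import Counter

# Hypothetical, hand-picked samples of function words per language.
# A real library would hold researched lists for each language.
FUNCTION_WORDS = {
    "English": {"the", "of", "and", "to", "in", "a", "is", "that", "it", "for"},
    "Spanish": {"el", "la", "de", "que", "y", "en", "los", "un", "por", "con"},
    "French": {"le", "la", "de", "et", "les", "des", "un", "une", "est", "que"},
    "Italian": {"il", "di", "che", "e", "la", "un", "per", "non", "è", "in"},
}


def most_common_words(text, n=10):
    """Return the n most frequent words in text, lowercased."""
    # [^\W\d_]+ matches runs of letters, including accented ones.
    words = re.findall(r"[^\W\d_]+", text.lower())
    return [word for word, _ in Counter(words).most_common(n)]


def identify_language(text, n=10):
    """Guess the language whose function words overlap most with the
    text's top-n words. Returns None if nothing matches at all."""
    top = set(most_common_words(text, n))
    scores = {lang: len(top & words) for lang, words in FUNCTION_WORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None
```

To use it on a file, the text could be read with something like `open(path, encoding="utf-8").read()` and passed to `identify_language`. Very short texts will give unreliable guesses, since their most frequent words may not be function words at all.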
Project Scale
This would only work with Romance languages, but those are certainly in abundance. The Python program itself would be manageable in size and not an overwhelming undertaking. A good deal of the work would be linguistic research: identifying the function words and other high-frequency words of each language.
This project can be scaled. The size of the final product depends directly on how many languages we decide to research and put into the Python library. The beauty of this is twofold: we can focus on the quality of the final product rather than tackling a massive project and struggling to finish on time, or, if the process is moving efficiently, there will always be more languages to research and add!
Those Interested
Edit this page and add your name here if you're interested. Feedback and suggestions are also welcome.
Sincerely, Anthony