Posted on 17-06-11 22:05
LulzSec has released a large password list, and similar lists were already in circulation before it.
I was wondering whether any scientific research has been conducted on this data. For example, I am thinking of turning the data into an n-gram model. An n-gram is a contiguous sequence of n items (characters or words), and an n-gram model is a statistical model built from how often those sequences occur. A unigram model counts single characters or words. A bigram model counts pairs of adjacent characters or words. And so on, up to sequences of length n. After tallying up the counts, you can "smooth" them so that sequences missing from the sample do not end up with zero probability, which gives better estimates of the real-world distribution (there are different ways to do this).
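To make the counting and smoothing idea concrete, here is a rough sketch in Python. It counts character bigrams and applies add-one (Laplace) smoothing, which is only one of the possible smoothing methods. The function names, the three sample passwords, and the alphabet size of 95 (printable ASCII) are my own assumptions for illustration, not anything taken from a real list.

```python
from collections import Counter

def char_ngrams(text, n):
    """Return every contiguous run of n characters in text."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]

def count_ngrams(passwords, n):
    """Tally character n-grams over a list of passwords."""
    counts = Counter()
    for pw in passwords:
        counts.update(char_ngrams(pw, n))
    return counts

def laplace_prob(gram, counts, alphabet_size):
    """Add-one smoothed probability of gram among all n-grams of its length.

    Every possible n-gram gets one extra pseudo-count, so unseen n-grams
    receive a small non-zero probability instead of zero.
    """
    possible = alphabet_size ** len(gram)
    total = sum(counts.values())
    return (counts[gram] + 1) / (total + possible)

# Made-up sample data, purely for illustration.
sample = ["password", "pass123", "letmein"]
bigrams = count_ngrams(sample, 2)

ALPHABET_SIZE = 95  # assumption: printable ASCII characters
seen = laplace_prob("pa", bigrams, ALPHABET_SIZE)
unseen = laplace_prob("zz", bigrams, ALPHABET_SIZE)
```

Here "pa" occurs twice in the sample, so it ends up with a higher smoothed probability than "zz", which never occurs but still gets a small non-zero share.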
I am not sure exactly how the resulting model would be used, but it could facilitate password guessing.
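One possible use, as a sketch only: score candidate guesses with a smoothed character bigram model and try the likeliest ones first. Everything below is assumed for illustration, including the boundary markers "^" and "$", the function names, and the tiny made-up training list. The smoothing is also simplified, since the alphabet is taken from the training data alone.

```python
import math
from collections import Counter

START, END = "^", "$"  # hypothetical boundary markers, assumed absent from passwords

def train_bigram(passwords):
    """Count character transitions, including start and end of each password."""
    pair_counts = Counter()
    context_counts = Counter()
    alphabet = set()
    for pw in passwords:
        padded = START + pw + END
        alphabet.update(padded)
        for a, b in zip(padded, padded[1:]):
            pair_counts[(a, b)] += 1
            context_counts[a] += 1
    return pair_counts, context_counts, len(alphabet)

def log_score(candidate, model):
    """Sum of add-one smoothed log-probabilities over the candidate's transitions.

    Simplification: the alphabet size comes from the training data only, so
    characters never seen in training are not handled rigorously.
    """
    pair_counts, context_counts, alphabet_size = model
    padded = START + candidate + END
    total = 0.0
    for a, b in zip(padded, padded[1:]):
        p = (pair_counts[(a, b)] + 1) / (context_counts[a] + alphabet_size)
        total += math.log(p)
    return total

# Made-up training data, purely for illustration.
training = ["password", "pass123", "passw0rd", "letmein"]
model = train_bigram(training)

candidates = ["xqzj", "pass", "letme"]
ranked = sorted(candidates, key=lambda c: log_score(c, model), reverse=True)
```

With this training list, "pass" comes out ahead of "xqzj" because its transitions were all seen several times. Note that a raw sum of log-probabilities penalizes longer candidates, so a real guesser would probably want to normalize for length or generate guesses from the model directly.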
Furthermore, there are machine learning models, such as Boltzmann machines, that can be used to extract patterns from raw data. I was wondering whether any statistical ideas of this kind have been applied to speed up password guessing. As I learn more about these models, I will try to apply them to the password data that is out there.
Wisdom spared is wisdom squared.