There are lots of strong, open-source password generators out there, but the passwords they generate are not very user-friendly. If you want passwords that are strong, easy to remember, and somehow meaningful, none of the options really fit.
I'll show you how to make a password generator that gives you just that.
The concept
An overview of how it works:
- Create a dictionary
  - Use word vectors to generate a meaning-aware dictionary
  - Filter the list to keep only words of the desired lengths
  - POS-tag the words and group them by tag
  - Convert adjectives to adverbs and vice versa, for a better balance between word types
- Generate a password
  - Pick a random number-verb-adjective-noun or verb-number-adjective-noun combination from the dictionary
  - Pluralize words where needed
  - Add semi-random upper-casing
All of the above would be a bit much to cover in one blog post, so in this tutorial, we'll create a simplified version of meaningful-passwords.
What we'll build
- Create a dictionary
  - Use word vectors to generate a meaning-aware dictionary
  - Filter the list to keep only words of the desired lengths
- Generate a password
  - Combine three random words from the dictionary
The DictionaryBuilder
context.json
First, we create a file context.json that contains the words on which we'll base our dictionary. It should look something like this:
```json
{
  "similar": ["awesome", "helpful"],
  "negative": ["bad", "urine"],
  "wordlist": [
    "awesome", "dope", "phat", "great", "nice", "pretty", "humble", "friend",
    "sister", "helping", "helpful", "supportive", "good", "interesting",
    "beautiful", "rich", "amazing", "happy", "tasteful", "brave", "bravery",
    "magnificent"
  ]
}
```
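Since everything downstream depends on this file, it can help to check it before loading a large model. The sketch below is a hypothetical helper (load_context is not part of the original code): it verifies that the three lists exist and lower-cases every word, on the assumption that the GloVe models used here have lower-cased vocabularies, so a capitalized entry would otherwise not be found in the model.

```python
import json
from typing import Dict, List

# The three lists that create_dictionary reads from context.json
REQUIRED_KEYS = ("similar", "negative", "wordlist")


def load_context(path: str = "context.json") -> Dict[str, List[str]]:
    """Load context.json and check that it contains the word lists we rely on."""
    with open(path, "r") as file:
        context = json.load(file)
    for key in REQUIRED_KEYS:
        if key not in context:
            raise KeyError(f"context.json is missing the '{key}' list")
        words = context[key]
        if not isinstance(words, list) or not all(isinstance(w, str) for w in words):
            raise TypeError(f"'{key}' must be a list of strings")
    # Assumption: the GloVe vocabularies are lower-cased, so normalize every entry
    return {key: [word.lower() for word in context[key]] for key in REQUIRED_KEYS}
```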
The logic:
Then we create a file called dictionary_builder.py; this is where we will create the DictionaryBuilder class:
```python
import gensim.downloader as api


class DictionaryBuilder:
    def __init__(self, num_similar_words=100, vector_type="glove-twitter-25"):
        # Downloads the model on first use, then loads it from gensim's local cache
        self.model = api.load(vector_type)
        self.num_similar_words = num_similar_words
```
Now we create a method called create_dictionary, which finds a list of relevant words:
```python
    def create_dictionary(self):
        with open('context.json', 'r') as file:
            context = json.load(file)

        dictionary = {}
        for word in context['wordlist']:
            if word in self.model:
                # Shift the word's vector towards "similar" and away from "negative",
                # then merge the nearest neighbours into {word: similarity}
                dictionary.update(self.model.most_similar(
                    positive=[self.model[word]] + context['similar'],
                    negative=context['negative'],
                    topn=self.num_similar_words,
                ))
            else:
                print(word, "is not in the word-vector model, skipping")
        return self.clean(dictionary)
```
What we just did:
- We load context.json, which gives us the similar, negative and wordlist lists.
- For each word in wordlist, we combine its vector with the words in similar (pulling towards them) and negative (pushing away from them). This offsets the meaning of the word, and most_similar returns the num_similar_words words closest to that shifted meaning.
- We merge these results into an intermediate dictionary that maps each word to its similarity score.
- We skip (and report) words that are not found in the model.
- We filter out words we don't want using clean, which we'll implement shortly.
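To make the "offset" idea more concrete, here is a toy, dependency-free approximation of what most_similar does with positive and negative, as I understand gensim's documented behaviour: it averages the unit-length vectors of the positive entries and the negated unit vectors of the negative entries, then ranks every word by cosine similarity to that mean. The 2-dimensional vectors are made up for illustration (the GloVe Twitter model used here has 25 dimensions), and toy_most_similar is a hypothetical name. Entries given as words are excluded from the results while raw vectors are not, which is presumably why create_dictionary passes self.model[word] instead of word.

```python
import math
from typing import Dict, List, Sequence, Tuple, Union

Vector = Sequence[float]
Entry = Union[str, Vector]  # a word from the vocabulary or a raw vector


def unit(v: Vector) -> List[float]:
    """Scale a vector to length 1."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two vectors."""
    return sum(x * y for x, y in zip(unit(a), unit(b)))


def toy_most_similar(vectors: Dict[str, Vector], positive: List[Entry],
                     negative: List[Entry], topn: int = 3) -> List[Tuple[str, float]]:
    def resolve(entry: Entry) -> Vector:
        # Words are looked up in the vocabulary; raw vectors are used as-is
        return vectors[entry] if isinstance(entry, str) else entry

    # Positive entries pull the mean towards them, negative entries push it away
    weighted = [unit(resolve(e)) for e in positive]
    weighted += [[-x for x in unit(resolve(e))] for e in negative]
    dims = len(weighted[0])
    mean = [sum(v[i] for v in weighted) / len(weighted) for i in range(dims)]

    # Only entries given as words are excluded from the results
    excluded = {e for e in positive + negative if isinstance(e, str)}
    scored = [(word, cosine(mean, vec)) for word, vec in vectors.items()
              if word not in excluded]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:topn]


vectors = {
    "dog": [1.0, 0.1], "puppy": [0.9, 0.2], "cat": [0.7, 0.7],
    "kitten": [0.6, 0.8], "engine": [-1.0, 0.0],
}
# "dog" is passed as a vector, like self.model[word] in create_dictionary
result = toy_most_similar(vectors, positive=[vectors["dog"], "cat"],
                          negative=["engine"], topn=2)
print([word for word, _ in result])  # ['puppy', 'dog']
```

Because "dog" went in as a vector it can still show up in the results, while "cat" and "engine", passed as words, cannot.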
create_dictionary uses json, so let's import it. We'll also import re, which clean will need, and add min_length and max_length to the constructor:
```python
import json
import re

import gensim.downloader as api


class DictionaryBuilder:
    def __init__(self, num_similar_words=100, min_length=4, max_length=10,
                 vector_type="glove-twitter-25"):
        # Not used in this simplified version; reserved for grouping words by POS tag
        self.pos_tagged_sets = {"adj": set(), "noun": set(), "verb": set()}
        self.model = api.load(vector_type)
        self.num_similar_words = num_similar_words
        self.min_length = min_length
        self.max_length = max_length
```
Our words may still contain strange characters, and some of them are very short or very long. To filter those out, we'll create a method called clean:
```python
    def clean(self, dictionary):
        # Keep only the words (the keys); the similarity scores are no longer needed
        return [
            word for word in dictionary.keys()
            if self.min_length <= len(word) <= self.max_length
            and re.match(r'^[a-zA-Z]*$', word)
        ]
```
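Here is the same filter pulled out into a standalone function, so you can see which kinds of entries it drops without loading a model. The name clean_words and the sample words are made up for this example.

```python
import re
from typing import Iterable, List


def clean_words(words: Iterable[str], min_length: int = 4, max_length: int = 10) -> List[str]:
    """Keep words within the length window that consist only of ASCII letters."""
    return [
        word for word in words
        if min_length <= len(word) <= max_length
        and re.match(r"^[a-zA-Z]*$", word)
    ]


samples = ["dope", "a", "ok!", "#blessed", "magnificently", "héroe", "friend",
           "supercalifragilistic"]
print(clean_words(samples))  # ['dope', 'friend']
```

Short words ("a", "ok!"), long words ("magnificently"), hashtags and accented letters are all removed, which is roughly what you want from a vocabulary trained on tweets.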
Now let's write the dictionary to a file:
```python
    def write_dictionary(self):
        dictionary = self.create_dictionary()
        with open('dictionary.json', 'w') as outfile:
            json.dump(dictionary, outfile)
        return dictionary
```
That's it for the DictionaryBuilder: it can generate dictionaries of words whose meanings are influenced by context.json. Let's move on to generating passwords.
Generating passwords
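We can generate simple passwords from this dictionary like this. Note that we pick words with secrets.choice rather than random.choice: the random module is not designed for security-sensitive uses such as generating passwords.
```python
from secrets import choice  # cryptographically secure, unlike random.choice

try:
    with open('dictionary.json', 'r') as file:
        dictionary = json.load(file)
except FileNotFoundError:
    # First run: build the dictionary and cache it in dictionary.json
    print("generating dictionary")
    builder = DictionaryBuilder()
    dictionary = builder.write_dictionary()

print('-'.join(choice(dictionary) for _ in range(3)))
```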
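How strong are these passwords? A rough way to reason about it, assuming each word is picked uniformly and independently with a secure random source, and that an attacker knows both the dictionary and the word-dash-word-dash-word format, is to count the combinations: with a dictionary of N words and k words per password there are N^k possibilities, i.e. k × log2(N) bits of entropy. The helper below is a hypothetical illustration, not part of the original code.

```python
import math


def password_entropy_bits(dictionary_size: int, num_words: int = 3) -> float:
    """Bits of entropy for num_words words drawn uniformly and independently."""
    if dictionary_size < 1 or num_words < 0:
        raise ValueError("need a non-empty dictionary and a non-negative word count")
    # N ** k equally likely passwords -> log2(N ** k) = k * log2(N) bits
    return num_words * math.log2(dictionary_size)


print(password_entropy_bits(4096))            # 36.0
print(round(password_entropy_bits(2200), 1))  # 33.3
```

With the first context.json above (22 seed words and num_similar_words=100), the dictionary holds at most 2,200 words before de-duplication and cleaning, so a three-word password has at most about 33 bits under these assumptions. Using more words per password, or a larger dictionary, raises that number.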
Putting it all together in a file called simple_password_generator.py, we get:
```python
import json
import re
from secrets import choice  # cryptographically secure, unlike random.choice

import gensim.downloader as api


class DictionaryBuilder:
    def __init__(self, num_similar_words=100, min_length=4, max_length=10,
                 vector_type="glove-twitter-25"):
        # Not used in this simplified version; reserved for grouping words by POS tag
        self.pos_tagged_sets = {"adj": set(), "noun": set(), "verb": set()}
        self.model = api.load(vector_type)
        self.num_similar_words = num_similar_words
        self.min_length = min_length
        self.max_length = max_length

    def write_dictionary(self):
        dictionary = self.create_dictionary()
        with open('dictionary.json', 'w') as outfile:
            json.dump(dictionary, outfile)
        return dictionary

    def create_dictionary(self):
        with open('context.json', 'r') as file:
            context = json.load(file)

        dictionary = {}
        for word in context['wordlist']:
            if word in self.model:
                # Nearest neighbours of the word, shifted by the context lists
                dictionary.update(self.model.most_similar(
                    positive=[self.model[word]] + context['similar'],
                    negative=context['negative'],
                    topn=self.num_similar_words,
                ))
            else:
                print(word, "is not in the word-vector model, skipping")
        return self.clean(dictionary)

    def clean(self, dictionary):
        # Keep words of acceptable length made only of ASCII letters
        return [
            word for word in dictionary.keys()
            if self.min_length <= len(word) <= self.max_length
            and re.match(r'^[a-zA-Z]*$', word)
        ]


try:
    with open('dictionary.json', 'r') as file:
        dictionary = json.load(file)
except FileNotFoundError:
    print("generating dictionary")
    builder = DictionaryBuilder()
    dictionary = builder.write_dictionary()

print('-'.join(choice(dictionary) for _ in range(3)))
```
Now let's install gensim and run the password generator!
```
$ pip install gensim
$ python simple_password_generator.py
generating dictionary
compassion-sincerity-winning
```
The first run should take about 30 seconds, plus the time gensim needs to download the word-vector model if it isn't cached yet. Once dictionary.json exists, subsequent runs are faster:
```
$ python simple_password_generator.py
everyone-goodnight-heavenly
```
Now let's try it with a totally different context.json:
```json
{
  "similar": ["monkey"],
  "negative": ["engineering"],
  "wordlist": [
    "flower", "dog", "leopard", "elephant", "jungle", "water", "river",
    "mountain", "human", "insect", "butterfly", "termite", "ant", "cat", "lion"
  ]
}
```
Delete dictionary.json so that it gets regenerated, and run the generator again:
```
$ rm dictionary.json
$ python simple_password_generator.py
generating dictionary
otter-oversized-dixie
$ python simple_password_generator.py
cali-shepherd-voices
```
Great! It works, we have meaningful passwords.
Improving the algorithm
The passwords can still be improved in the following ways:
- Create a dictionary
  - POS-tag the words and group them by tag
  - Convert adjectives to adverbs and vice versa, for a better balance between word types
- Generate a password
  - Pick a random number-verb-adjective-noun or verb-number-adjective-noun combination from the dictionary
  - Pluralize words where needed
  - Add semi-random upper-casing
These improvements have been implemented in the open-source project this tutorial is based on; you can refer to its source code.
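To give a feel for the generation step of the full version, here is a minimal sketch of the number-verb-adjective-noun / verb-number-adjective-noun template with pluralization and semi-random upper-casing. It is not the project's implementation: the TAGGED word groups are hypothetical stand-ins for a POS-tagged dictionary, pluralization is a naive "add an s", and upper-casing simply capitalizes each part with 50% probability.

```python
import secrets
from typing import Dict, List

# Hypothetical POS-grouped dictionary; the full project derives its groups by POS tagging
TAGGED: Dict[str, List[str]] = {
    "verb": ["help", "support", "inspire"],
    "adj": ["brave", "humble", "tasteful"],
    "noun": ["friend", "sister", "river"],
}


def semi_random_case(word: str) -> str:
    """Capitalize the first letter about half of the time."""
    return word.capitalize() if secrets.randbelow(2) else word


def template_password(tagged: Dict[str, List[str]], max_number: int = 100) -> str:
    # A number from 2 to max_number - 1, so the noun always needs a plural
    number = str(secrets.randbelow(max_number - 2) + 2)
    verb = secrets.choice(tagged["verb"])
    adj = secrets.choice(tagged["adj"])
    noun = secrets.choice(tagged["noun"]) + "s"  # naive pluralization (assumption)
    # Randomly choose number-verb or verb-number ordering
    parts = [number, verb] if secrets.randbelow(2) else [verb, number]
    parts += [adj, noun]
    return "-".join(semi_random_case(part) for part in parts)


print(template_password(TAGGED))
```

A real version would need a proper pluralization step (irregular nouns, nouns ending in "y" and so on) rather than appending "s".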
Conclusion
You've learned how to:
- use gensim to load a word-vector model trained on Twitter data
- use word vectors to generate a meaning-aware dictionary
- clean the dictionary to remove words with strange characters and words that are too short or too long
- generate passwords based on the dictionary