Coding a Game of Hangman in Python from Scratch

CS @ Illinois SAIL Logo

Over the weekend, I taught two courses at CS @ Illinois SAIL, an entirely student-run event for prospective computer science students. The first course is an introductory course to the Python programming language. It consisted of two parts: a crash course on Python syntax and basics, and a live coding demo of a game of Hangman. In this post, I will be covering the live coding demo portion of the course and showing the process of coding a Hangman game in Python from scratch.

Getting up to Speed

If you haven’t already done so, download Python 3. I used Python 3.5.2, but other versions of Python 3 should still work. If you are new to Python, head over to the Python 3 Basic tutorial at TutorialsPoint.

In addition, you will need to know these following Python-related concepts.

Documentation

It is good practice to document code that you write, and I will be demonstrating that in this post. The standard documentation syntax is three quotation marks enclosing the documentation, as demonstrated in PEP 257, one of the informational Python Enhancement Proposals that describe a design issue relating to Python.

An example Python module could look like this.

"""Here is the documentation for the module."""

def my_add_func(a, b):
    """Add two numbers together."""
    return a + b

def get_random_number():
    """Get a random number chosen by a dice roll. Guaranteed to be random.

    Source: https://xkcd.com/221/
    """
    return 4

Note that according to PEP 8, the maximum line length is 79 characters in many cases. The code that I will be showing will follow that convention.

Sets

The TutorialsPoint tutorial covered lists, which can contain an arbitrary number of elements in an order and can be iterated on. Sets are a similar concept in that it can contain an arbitrary number of elements and can be iterated on. However, each element in a set is guaranteed to be unique and the elements are not ordered. The following code snippet illustrates this.

my_list = []
my_set = set()

my_list.append('a')
my_set.add('a')
print(my_list)  # Output: ['a']
print(my_set)  # Output: {'a'}

my_list.append('b')
my_set.add('b')
print(my_list)  # Output: ['a', 'b']
print(my_set)  # Output: {'b', 'a'}

my_list.append('a')
my_set.add('a')
print(my_list)  # Output: ['a', 'b', 'a']
print(my_set)  # Output: {'b', 'a'}

The biggest reason why to use sets is to check whether an arbitrary element is contained in a specified set. If I run 'a' in my_list, Python would potentially have to iterate through every single element in my_list to determine whether it contains 'a'. But if I run 'a' in my_set, Python can use its internal hashing mechanism to quickly determine my_set contains 'a'. The performance difference becomes much more noticeable as you add many more elements to the list or set.

List Comprehension

Suppose you are given a list of numbers and you want to add 1 to each element in your list.

my_list = [1, 3, 3, 7]
my_list_plus_one = my_list + 1  # raises TypeError!

You would get this error: TypeError: can only concatenate list (not "int") to list. Python thinks that you want to add a list and an integer together, which doesn’t make any sense.

The correct way to approach this problem is to use list comprehension syntax like so.

my_list_plus_one = [num + 1 for num in my_list]
print(my_list_plus_one)  # Output: [2, 4, 4, 8]

Essentially, you can iterate through my_list using a condensed for-loop syntax. You can also add if and if-else statements like so.

filtered_out_low_nums = [num + 1 for num in my_list if num >= 5]
print(filtered_out_low_nums)  # Output: [8]
all_high_nums = [num + 1 if num >= 5 else 5 for num in my_list]
print(all_high_nums)  # Output: [5, 5, 5, 8]

The enumerate Function

This function is one of my most favorite functions in the Python standard library. If you ever have to iterate through a list while keeping track of the index of each element, enumerate will come in handy.

words = ['all', 'your', 'base']
for idx, word in enumerate(words):
    print(idx, word)
print(list(enumerate(words)))

# Output:
# 0 all
# 1 your
# 2 base
# [(0, 'all'), (1, 'your'), (2, 'base')]

The reason why we cast to list on the last line is because enumerate returns an iterator object instead of a list. The iterator serves to save memory, especially when your input list is large.

Considerations

The next step is to brainstorm any considerations for coding up a game of Hangman. Some things to think about include:

  • Given a list of words, how can I select a word uniformly at random? Can I do so without storing the entire list in memory?
  • How can let the player specify the game’s difficulty?
  • How can I verify that the player has inputted a valid letter?
  • How can I display a word with the unguessed letters censored?
  • How can I keep track of already guessed letters?
  • How can I detect whether the player has won or lost the game?

Using these considerations, take a moment and try to construct an outline or pseudocode. When you’re ready, you can open the spoiler box below.

Click me to reveal the pseudocode!

Ask for number of attempts, make sure it is between 1 and 25, inclusive
Ask for minimum word length, make sure it is between 4 and 16, inclusive [I will explain this later.]
Open the word list file & select a random word
Create a set of remaining letters and initialize it to contain the 26 ASCII lowercase character

While there are attempts remaining OR there are unguessed letters in the word remaining
    Print the word with the unguessed letters censored
    Ask for the next letter and make it lowercase
    If the "letter" has multiple characters
        Notify the player that the "letter" has multiple characters
    Else if the letter is not an ASCII lowercase character
        Notify the player that the letter is not an ASCII lowercase character
    Else if the letter is not in the remaining letter set (i.e. has been guessed before)
        Notify the player that the letter has been guessed before
    Else
        If letter is in the word
            Notify the player that the letter is in the word
        Else
            Decrement attempt counter
            Notify the player that the letter is not in the word
        Remove guessed letter from the remaining letter set

Reveal the word
If the word is solved
    Notify the player of victory
Else
    Notify the player of defeat

Give the player the option to try again

Selecting a Word Uniformly at Random

The first step is to download a list of words from here. If they aren’t available at that link, you can download it from my GitHub repository. Then, put the word list in a directory that you will be using for this Hangman game, and call it wordlist.txt.

Go ahead and inspect the list of words. You’ll notice they are all in lowercase and some words, such as “a-ok”, have hyphens. And other words have parentheses, such as “all(a)” and “adrift(p)”. Words that have hyphens are perfectly fine–we will handle displaying these types of words later on. However, it seems like the words with parentheses have part-of-speech metadata attached and are duplicates, anyway. Therefore, we will exclude those words.

Let’s create a file called words.py:

"""Function to fetch words."""

import random

WORDLIST = 'wordlist.txt'

def get_random_word(min_word_length):
    """Get a random word from the wordlist using no extra memory."""
    pass

Next, we will fill in the get_random_word function. It takes in a minimum word length and returns a word selected uniformly at random that fits that criteria, excluding words with parentheses.

So let’s first try a simple approach: gather all of the words into a list, and run random.choice to select one word from that list uniformly at random.

def get_random_word(min_word_length):
    """Get a random word from the wordlist using no extra memory."""
    words = []
    with open(WORDLIST, 'r') as f:
        for word in f:
            if '(' or ')' in word:
                continue  # Skip the word because it contains parentheses.
            word = word.strip().lower()
            if len(word) < min_word_length:
                continue  # Skip the word because it is too short.
            words.append(word)
    return random.choice(words)

Since our list of words contains only about 69000 words (a small number in this context), this function would run perfectly fine, assuming that you have a few gigabytes of RAM. However, suppose the word list contains millions of valid words, or you have extremely limited RAM on your machine. There would be good possibility that you cannot fit the entire list of words in memory, and you would get an out-of-memory error. Therefore, you can only read the word list file one line at a time and only use one variable: the word that you will return.

The solution to this problem is reservoir sampling. Instead of gathering all of the words from the word list in the memory, we will iterate through the word list and keep count of the number of words we processed. For each word encountered, we select it with a probability that is inversely related to the number of words processed. The code looks something like this.

def get_random_word(min_word_length):
    """Get a random word from the wordlist using no extra memory."""
    num_words_processed = 0
    curr_word = None
    with open(WORDLIST, 'r') as f:
        for word in f:
            if '(' in word or ')' in word:
                continue
            word = word.strip().lower()
            if len(word) < min_word_length:
                continue
            num_words_processed += 1
            if random.randint(1, num_words_processed) == 1:
                curr_word = word
    return curr_word

This is much more memory efficient!

Game Difficulty

For the next step, we create a file called hangman.py. This will be our main file.

"""Main hangman game.

Use Python 3.
"""

from string import ascii_lowercase
from words import get_random_word

We will need the two imports later. And note that it is especially important to be using Python 3 rather than Python 2. The main reason is because of the difference in the print syntax. In Python 3, it’s a function; but in Python 2, it’s a statement, like return.

We will let the player specify difficulty in two ways.

  • By specifying the number of incorrect attempts he/she is allowed. The less attempts there are, the harder the difficulty.
  • By specifying the minimum word length. The longer the minimum word length is, the harder the difficulty.

We can reasonably allow the player between 1 and 25 incorrect attempts, inclusive. The lower bound of 1 means that the number of incorrect attempts must be a positive integer, and the upper bound of 25 is selected because there are 26 letters in the alphabet.

However, selecting a lower and upper bound for the minimum word length is harder. To help determine which bounds are reasonable, I constructed a histogram of the word lengths in the word list.

Word length histogram, ranging from 1 to 30.

The word lengths range from 1 to 30, inclusive. It looks like the bulk of the words contain between 4 and 16 characters, inclusive. Therefore, for this demonstration, I will set the minimum word length bounds to 4 and 16.

We need to consider two invalid cases when getting these inputs.

  • The player’s input is not an integer. We can wrap a try-except block around the code to cast the string received from the input to an integer. If the casting operation raises a ValueError, it means that the string cannot be converted to an integer, and we notify the player and ask for input again.
  • The player’s input is an integer, but is not within the bounds. We can simply notify the player and ask for input again.

Here is the code for get_num_attempts and get_min_word_length.

def get_num_attempts():
    """Get user-inputted number of incorrect attempts for the game."""
    while True:
        num_attempts = input(
            'How many incorrect attempts do you want? [1-25] ')
        try:
            num_attempts = int(num_attempts)
            if 1 <= num_attempts <= 25:
                return num_attempts
            else:
                print('{0} is not between 1 and 25'.format(num_attempts))
        except ValueError:
            print('{0} is not an integer between 1 and 25'.format(
                num_attempts))

def get_min_word_length():
    """Get user-inputted minimum word length for the game."""
    while True:
        min_word_length = input(
            'What minimum word length do you want? [4-16] ')
        try:
            min_word_length = int(min_word_length)
            if 4 <= min_word_length <= 16:
                return min_word_length
            else:
                print('{0} is not between 4 and 16'.format(min_word_length))
        except ValueError:
            print('{0} is not an integer between 4 and 16'.format(
                min_word_length))

Normally, having an infinite loop is bad practice, but we can make an exception here because we have a straightforward way of breaking out of that loop by returning an integer from the function.

Displaying the Word

The next problem is to display the word while censoring letters that have not been guessed before. Now we have to make a design decision. Ideally, we shouldn’t modify the word that the player has to guess, nor should we have to needlessly copy the word just to censor certain letters.

I chose to represent censored letters as a list of booleans that has the same length as the word. A True in the list means that the corresponding character in the word should be displayed, else if False the corresponding character should be censored.

Suppose the word that the player must guess is “hangman”. If the player has already guessed ‘a’, ’m’, and ‘n’, the boolean list would be [False, True, True, False, True, True, True] and the displayed word would look like *an*man.

def get_display_word(word, idxs):
    """Get the word suitable for display."""
    if len(word) != len(idxs):
        raise ValueError('Word length and indices length are not the same')
    displayed_word = ''.join(
        [letter if idxs[i] else '*' for i, letter in enumerate(word)])
    return displayed_word.strip()

We employ both offensive programming and defensive programming strategies here. If the word and idxs, the list of booleans, do not have the same length, we immediately throw an error rather than gracefully handle this inconsistency. We want to do that because we should try to detect these type of inconsistencies as soon as possible in order to make debugging the program easier. This is an example of offensive programming.

At the end of the function before we return the word for display, we run the str.strip method on displayed_word in case there exists some leading or training whitespace characters. This type of error is less fatal, so if we happen to encounter it, we can gracefully correct it without much consequence. That is an example of defensive programming.

Also, we see the first and only appearance of the enumerate function. This is because we must keep track of the letter and the index (to find the corresponding boolean value in the boolean list) as we iterate through word. After the list comprehension expression is complete, we use the str.join method to convert that list of characters to a string. The syntax looks strange, but it is the Pythonic way of programming.

Asking the Player to Input the Next Letter

Now we arrive at the core of Hangman: guessing letters. Remember that we have three conditions that this input must satisfy.

  • The input “letter” must be a single character. We could take only the first character of a multi-character input, but according to the principles of offensive programming, we should instead notify the user and ask again.
  • The input letter must be an ASCII lowercase character. Note that we have already run the str.lower method on the input, so if the player inputted a single uppercase character, it would be converted to the corresponding lowercase character. The principles of defensive programming recommend correcting trivial errors like these.
  • The input letter cannot have been guessed before. We will keep track of that using a set of remaining letters.

The get_next_letter function will take a set of remaining letters, remaining_letters, and repeatedly ask for input until that input satisfies the above three conditions. Then, it removes the inputted letter from the remaining_letters set and returns it.

def get_next_letter(remaining_letters):
    """Get the user-inputted next letter."""
    if len(remaining_letters) == 0:
        raise ValueError('There are no remaining letters')
    while True:
        next_letter = input('Choose the next letter: ').lower()
        if len(next_letter) != 1:
            print('{0} is not a single character'.format(next_letter))
        elif next_letter not in ascii_lowercase:
            print('{0} is not a letter'.format(next_letter))
        elif next_letter not in remaining_letters:
            print('{0} has been guessed before'.format(next_letter))
        else:
            remaining_letters.remove(next_letter)
            return next_letter

Stitching It All Together

Finally, we arrive at the most important part: combining the functions we’ve coded to make a functional game of Hangman. We start our play_hangman function by letting the player specify difficulties.

def play_hangman():
    """Play a game of hangman.

    At the end of the game, returns if the player wants to retry.
    """
    # Let player specify difficulty
    print('Starting a game of Hangman...')
    attempts_remaining = get_num_attempts()
    min_word_length = get_min_word_length()

    # Randomly select a word
    print('Selecting a word...')
    word = get_random_word(min_word_length)
    print()

Then, we initialize the game state variables.

# Initialize game state variables
idxs = [letter not in ascii_lowercase for letter in word]
remaining_letters = set(ascii_lowercase)
wrong_letters = []
word_solved = False

Take a look at how idxs is initialized. Recall that the get_display_word function takes in the word and a list of booleans, and that some words contain hyphens (and perhaps other punctuation marks). The list comprehension expression to initialize idxs will set booleans corresponding to ASCII lowercase letters to True and punctuation marks to False. We also initialize the set of remaining letters (remaining_letters), a list of wrong guesses (wrong_letters), and a flag that is updated after every attempt (word_solved).

Next comes the main game loop. Recall that our pseudocode for that looks like this.

While there are attempts remaining OR there are unguessed letters in the word remaining
    Print the word with the unguessed letters censored
    Ask for the next letter and make it lowercase
    If the "letter" has multiple characters
        Notify the player that the "letter" has multiple characters
    Else if the letter is not an ASCII lowercase character
        Notify the player that the letter is not an ASCII lowercase character
    Else if the letter is not in the remaining letter set (i.e. has been guessed before)
        Notify the player that the letter has been guessed before
    Else
        If letter is in the word
            Notify the player that the letter is in the word
        Else
            Decrement attempt counter
            Notify the player that the letter is not in the word
        Remove guessed letter from the remaining letter set

So the main game loop looks like:

# Main game loop
while attempts_remaining > 0 and not word_solved:
    # Print current game state
    print('Word: {0}'.format(get_display_word(word, idxs)))
    print('Attempts Remaining: {0}'.format(attempts_remaining))
    print('Previous Guesses: {0}'.format(' '.join(wrong_letters)))

    # Get player's next letter guess
    next_letter = get_next_letter(remaining_letters)

    # Check if letter guess is in word
    if next_letter in word:
        # Guessed correctly
        print('{0} is in the word!'.format(next_letter))

        # Reveal matching letters
        for i in range(len(word)):
            if word[i] == next_letter:
                idxs[i] = True
    else:
        # Guessed incorrectly
        print('{0} is NOT in the word!'.format(next_letter))

        # Decrement num of attempts left and append guess to wrong guesses
        attempts_remaining -= 1
        wrong_letters.append(next_letter)

    # Check if word is completely solved
    if False not in idxs:
        word_solved = True
    print()

Finally, when the game is over, we have to reveal the word and notify the player of victory or defeat. Then, we have to offer the player a chance to try again. We will do that by asking for input from player. If that input is 'y' or 'Y', then we return True indicating that the game should be restarted, else False.

# The game is over: reveal the word
print('The word is {0}'.format(word))

# Notify player of victory or defeat
if word_solved:
    print('Congratulations! You won!')
else:
    print('Try again next time!')

# Ask player if he/she wants to try again
try_again = input('Would you like to try again? [y/Y] ')
return try_again.lower() == 'y'

The Completed Product

At the end of hangman.py, we write the “main” function.

if __name__ == '__main__':
    while play_hangman():
        print()

Now run the completed game by typing python3 hangman.py in the terminal.

Gameplay of Our Hangman Game
The finished product in action!

If everything was done correctly, you should be able to play a functional game of Hangman in the terminal. You can compare your code with the completed code, which is available on GitHub.

I hope you all enjoyed this tutorial post. I will be writing another one on the other SAIL class that I taught–Building a Website with Python & Flask– so stay tuned!