2.3. Mad Libs Revisited

2.3.1. A Function to Ease the Creation of Mad Libs

The versions so far of the Mad Lib program have been fairly easy to edit to contain a different mad lib:

  1. Come up with a new mad lib story as a format string
  2. Produce the list of cues to prompt the user with.

The first is a creative process. The second is a pretty mechanical process of looking at the story string and copying out the embedded cues. The first is best left to humans. The second can be turned over to a Python function to do automatically, as many times as we like, with any story - if we write the code once.

Writing the Python code also takes a different sort of creativity! We shall illustrate a creative process. This is a bigger problem than any we have taken on so far. It is hard to illustrate a creative process if the overall problem is too simple.

Try and follow along. Read the sample code and pseudo-code.

There is nothing to try in the Shell or editor until further notice.

If we follow the last version of the mad lib program, we had a loop iterating through the keys in the story, and making a dictionary entry for each key. The main idea we follow here is to use the format string to automatically generate the sequence of keys. Let us plan this unified task as a new function:

def getKeys(formatString):
    '''formatString is a format string with embedded dictionary keys.
    Return a list containing all the keys from the format string.'''
    # more to come

The keys we want are embedded like {animal}. There may be any number of them in the format string. This indeterminacy suggests a loop to extract them. At this point we have only considered for loops. There is no obvious useful sequence to iterate through in the loop. (We are trying to create such a sequence!) The only pattern we have discussed that does not actively process each element of a significant list is a repeat-loop, where we just use the loop to repeat the correct number of times. This will work in this case. (There is a more efficient approach after we introduce While Statements, suggested in Mad Lib While Exercise.)

First: how many times do we want to pull out a key - once for each embedded format. So how do we count those?

The count method is obviously a way to count. However we must count a fixed string, and the whole embedded formats vary, with different keys in the middle. A common part is ‘{‘, and this should not appear in the regular text of the story, so it will serve our purpose:

repetitions = formatString.count('{')
for i in range(repetitions):
    ...

This is certainly the most challenging code to date. Before jumping into writing it all precisely, we can give an overall plan in pseudo-code. For a plan we need an idea of what quantities we are keeping track of, and name them, and outline the sequence of operations with them.

Think about data to name:

In this case we are trying to find a list. We will need to extract one element at a time and add it to the list, so we need a list, say keyList.

The central task is to identifying the individual keys. When we find a key we can call it key.

Think about identifying the text of individual keys. This may be too hard to think of in the abstract, so let us use as a concrete example, and let us keep it simple for the moment. Suppose the data in formatString starts off as follows. The lines with numbers are added to help us refer to the indices. Display of possible data:

#           1111111111222222222233333333
# 01234567890123456789012345678901234567
 'blah {animal} blah blah {food} ...'    # start of formatString

The first key is 'animal' at formatString[6:12]. The next key is 'food' at formatString[25:29]. To identify each key as part of formatString we need not only the variable formatString, but also index variables to locate the start and end of the slices. Obvious names for the indices are start and end. We want to keep them current so the next key slice will always be

key = formatString[start : end]

Let us now put this all in an overall plan. We will have to continuously modify the start and end indices, the key, and the list. We have a basic pattern for accumulating a list, involving initializing it and appending to it. We can organize a plan, partly fleshed out, with a couple of approximations still to be worked out. The parts that are not yet in Python are emphasized:

def getKeys(formatString):
keyList = list()
?? other initializations ??
repetitions = formatString.count('{')
for i in range(repetitions):
find the start and end of the next key
key = formatString[start : end]
keyList.append(key)
return keyList

We can see that the main piece left is to find the start and end indices for each key. The important word is find: the method we consider is find. As with the plan for using count above, the beginnings of keys are identified by the specific string '{'. We can look first at

formatString.find('{')

but that is not the full solution. If we look at our concrete example, the value returned is 5, not 6. How in general would we locate the beginning of the slice we want?

We do not want the position of the ‘{‘, but the position just after the ‘{‘. Since the length of ‘{‘ is 1, the correct position is 5+1 = 6. We can generalize this to

start = formatString.find('{') + 1

OK, what about end? Clearly it is at the '}'. In this example,

formatString.find('}')

gives us 12, exactly the right place for the end of the slice (one place past the actual end).

There is a subtle issue here that will be even more important later: We will keep wanting to find the next brace, and not keep finding the first brace. How do we fix that?

Recall there was an alternate syntax for find, specifying the first place to search! That is what we need. Where should we start? Well, the end must come after the start of the key, our variable start:

start = formatString.find('{') + 1
end = formatString.find('}', start)

Figuring out how to find the first key is important, but we are not home free yet. We need to come up with code that works in a loop for the later keys. This code will not work for the next one. Why?

As written, the search for ‘{‘ will again start from the beginning of the format string, and will find the first key again. So what code will work for the second search? We search for the start of the next key going from the end of the last one:

start = formatString.find('{', end) + 1
end = formatString.find('}', start)

This code will also work for later times through the loop: each time uses the end from the previous time through the loop.

So now what do we do for finding the first key? We could separate the treatment of the first key from all the others, but an easier approach would be to see if we can use the same code that already works for the later repetitions, and initialize variables right to make it work. If we are to find the first key with

start = formatString.find('{', end) + 1

then what do we need? Clearly end needs to have a value. (There will not be a previous loop to give it a value.) What value should we initialize it to? The first search starts from the beginning of the string at index 0. The initialization goes before the loop, so the full code for this function is

def getKeys(formatString):
    '''formatString is a format string with embedded dictionary keys.
    Return a list containing all the keys from the format string.'''

    keyList = list()
    end = 0
    repetitions = formatString.count('{')
    for i in range(repetitions):
        start = formatString.find('{', end) + 1
        end = formatString.find('}', start)
        key = formatString[start : end]
        keyList.append(key)

    return keyList

Look the code over and see that it makes sense. See how we continuously modify start, end, key, and keyList. Since we have coded this new part as a function, it is easy to test without running a whole revised mad lib program. We can just run this function on some test data, like the original story, and see what it does. Run the example program testGetKeys.py:

'''Test the function to extract keys from a format string for a dictionary.'''

def getKeys(formatString):
    '''formatString is a format string with embedded dictionary keys.
    Return a list containing all the keys from the format string.'''

    keyList = list()
    end = 0
    repetitions = formatString.count('{')
    for i in range(repetitions):
        start = formatString.find('{', end) + 1
        end = formatString.find('}', start)
        key = formatString[start : end]
        keyList.append(key)
    return keyList

originalStory = """
Once upon a time, deep in an ancient jungle,
there lived a {animal}.  This {animal}
liked to eat {food}, but the jungle had
very little {food} to offer.  One day, an
explorer found the {animal} and discovered
it liked {food}.  The explorer took the
{animal} back to {city}, where it could
eat as much {food} as it wanted.  However,
the {animal} became homesick, so the
explorer brought it back to the jungle,
leaving a large supply of {food}.

The End
"""

print(getKeys(originalStory))

The functions should behave as advertised.

Look back on the process described to come up with the getKeys function. One way of approaching the creative process of coding this function was provided. There are many other results and approaches possible, but the discussion did illustrate a number of useful ideas which you might adapt to other problems, in different orders and proportions, that are summarized in the next section.

2.3.2. Creative Problem Solving Steps

  • Clearly define the problem. Encapsulating the problem in a function is useful, with inputs as parameters and results returned. Include a complete documentation string, and a clear example (or examples) of what it is to do.
  • If the problem is too complicated to just solve easily, straight away, it is often useful to construct a representative concrete case and write down concrete steps appropriate to this problem.
  • Think of the data in the problem, and give names to the pieces you will need to refer to. Clearly identify the ideas that the names correspond to. When using sequences like lists or strings, you generally need names not only for the whole collection, but also parts like items and characters or substrings, and often indices that locate parts of the collection.
  • Plan the overall approach to the problem using a mixture of Python and suggestive phrases (called pseudo-code). The idea is to refine it to a place where you can fairly easily figure how to replace the phrases with Python.
  • Replace your pseudo-code parts with Python. If you had a concrete example to guide you, you may want to test with one of more further concrete examples with different specific data, to make sure you come up with code for a generalization that works in all cases. This is the process of abstraction.
  • Recognize where something is being repeated over and over, and think how to structure appropriate loops. Can you incorporate any patterns you have seen before?
  • If you need to create a successive modification loop, think of how to approach the first repetition and then how to modify the data for the later times through the loop. Usually you can make the first time through the loop fit the more general pattern needed for the repetitions by making appropriate initializations before the loop.
  • Check and test your code, and correct as necessary.

2.3.3. The Revised Mad Lib Program

There is still an issue for use of getKeys in the mad lib program: the returned list has unwanted repetitions in it. We can easily create a collection without repetitions, how?

One approach is to make a set from the list returned. A neater approach would be to just have the getKeys function return a set in the first place. We need to slightly change to getKeys‘ documentation string and the final return line. This will be included in a new version of the mad lib program, which makes it easy to substitute a new story. We will make the story’s format string be a parameter to the central method, tellStory. We will also put the clearly identified step of filling the dictionary with the user’s picks in a separate function. We will test tellStory with the original story. Note the changes included in madlib2.py and run:

"""
madlib2.py
Interactive display of a mad lib, which is provided as a Python format string,
with all the cues being dictionary formats, in the form {cue}.

In this version, the cues are extracted from the story automatically,
and the user is prompted for the replacements.

Original verison adapted from code of Kirby Urner
"""

def getKeys(formatString):
    '''formatString is a format string with embedded dictionary keys.
    Return a set containing all the keys from the format string.'''

    keyList = list()
    end = 0
    repetitions = formatString.count('{')
    for i in range(repetitions):
        start = formatString.find('{', end) + 1 # pass the '{'
        end = formatString.find('}', start)
        key = formatString[start : end]
        keyList.append(key) # may add duplicates

    return set(keyList) # removes duplicates: no duplicates in a set

def addPick(cue, dictionary): # from madlib.py
    '''Prompt for a user response using the cue string,
    and place the cue-response pair in the dictionary.
    '''
    promptFormat = "Enter a specific example for {name}: "
    prompt = promptFormat.format(name=cue)
    response = input(prompt)
    dictionary[cue] = response                                                             


def getUserPicks(cues):
    '''Loop through the collection of cue keys and get user choices.
    Return the resulting dictionary.
    '''
    userPicks = dict()
    for cue in cues:
        addPick(cue, userPicks)
    return userPicks   

def tellStory(storyFormat):
    '''storyFormat is a string with Python dictionary references embedded,
    in the form {cue}.  Prompt the user for the mad lib substitutions
    and then print the resulting story with the substitutions.
    '''
    cues = getKeys(storyFormat)
    userPicks = getUserPicks(cues)
    story = storyFormat.format(**userPicks)
    print(story)

def main():
    originalStoryFormat = '''
Once upon a time, deep in an ancient jungle,
there lived a {animal}.  This {animal}
liked to eat {food}, but the jungle had
very little {food} to offer.  One day, an
explorer found the {animal} and discovered
it liked {food}.  The explorer took the
{animal} back to {city}, where it could
eat as much {food} as it wanted.  However,
the {animal} became homesick, so the
explorer brought it back to the jungle,
leaving a large supply of {food}.

The End
'''
    tellStory(originalStoryFormat)
    input("Press Enter to end the program.")        


main()

Does the use of well-named functions make it easier to follow this code? We have broken the large overall problem into many smaller steps. Make sure you follow the flow of execution and data and see how all the pieces fit together.

2.3.3.1. Substring Locations Exercise

* Rename the example file locationsStub.py to be locations.py, and complete the function printLocations, to print the index of each location in the string s where target is located. For example,

printLocations('This is a dish', 'is')

would go through the string 'This is a dish' looking for the index of places where 'is' appears, and would print:

2
5
11

Similarly

printLocations('This is a dish', 'h')

would print:

1
13

The program stub already uses the string method count. You will need to add code using the more general form of find.