Lab 4 Strings

Task 1, String and Sequence Indexing

This task relates to sections 4.1 - 4.3 in Zelle. You should have read them!

a. Since strings are sequences they can be indexed by integers. Try the following in the shell.

s = "Mississippi"

b. Now, suppose we wanted to know the value of the character at location 7 (i.e. the 8th character). Try the following and record your answers. Raise your hand if there are any you don’t understand. Remember that negative indices count from the right.

print s[7]
print s[-4]
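
As a quick check on negative indices, here is a small sketch using a different word (an arbitrary choice, not part of the lab). It shows that index -k names the same character as index len(word) - k. The parenthesized print is an assumption made so the example runs both in the Python 2 shell used in this lab and in Python 3.

```python
word = "Python"              # arbitrary example word, 6 characters

last = word[-1]              # counting from the right: the last character
same = word[len(word) - 1]   # the same position, counted from the left

print(last)                  # n
print(last == same)          # True
print(word[-6] == word[0])   # True: -6 reaches back to the first character
```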

c. We can also try out string “slicing”. Guess what each of these means and then check:

print s[0:3]
print s[3:]
print s[5:8]
print s[4:-3]
print s[1:]
print s[1:50]
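
If the slices above are puzzling, the following sketch on a different word (again an arbitrary choice) may help. It assumes the usual rule that a slice [i:j] starts at index i and stops just before index j, and it uses parenthesized print so it runs in Python 2 and Python 3.

```python
word = "Python"               # arbitrary example word

front = word[:2]              # characters at indices 0 and 1
back = word[2:]               # everything from index 2 to the end
print(front)                  # Py
print(back)                   # thon
print(front + back == word)   # True: the two pieces rebuild the word

# A slice that runs past the end is simply cut short...
print(word[4:100])            # on

# ...but a single index past the end is an error.
try:
    word[100]
except IndexError:
    print("index out of range")
```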

d. And we can try concatenating two strings (do it):

t = "The State of "
print t + s

e. Also "multiplying": What do you expect from the following? (Then try them.) As in arithmetic, there can be several operations in an expression, and as in arithmetic, '*' operations have higher precedence than '+' operations.

"!" * 17
"no " * 4 + "yes " * 3
("no" * 4 + "yes") * 3
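
To see the precedence rule on a smaller case, here is a sketch with made-up strings (not the ones above, so your predictions are still yours to check). The variable names a and b are only for illustration.

```python
a = "ha" * 2 + "!"       # '*' is done first: "haha", then "!" is appended
b = ("ha" + "!") * 2     # parentheses force '+' first: "ha!" is repeated

print(a)                 # haha!
print(b)                 # ha!ha!
```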

f. Accessing elements by index or slice, and combining sequences by concatenating or multiplying, all work for sequences in general, not just strings. Predict and then test what happens in the shell, one line at a time:

s = [1, 3, 7, 9, 11]
s[3]
s[1:-2]
s[3:4] + s[1:-2]
s[1:3] * 4
s[-2:]

Task 2, The string Module:

The remaining tasks relate to Zelle from section 4.4 onward in Chapter 4.

Let’s look at some useful string module functions.

upper(s)
lower(s)
These produce an uppercase or lowercase version of s. Predict the results and check in the Python shell:

import string
greeting = "Hello!"
string.upper(greeting)
string.lower(greeting)

find(s, sub)
s is a string. sub is another string. The function returns the index of the first occurrence of sub in s. (We assume “first” means “first from left”.) The location of a substring is the (nonnegative) index of its first character. If sub does not occur in s, the result is -1. In this case -1 is called a sentinel value: a value we can recognize as having a special meaning.

The rfind function works just like find but searches s from right to left, so it will find the last occurrence of sub in s. (Note, sub is not reversed.)

What will be the resulting value? Check after you decide.

string.find("semester", "e")
string.find("semester", "me")
string.rfind("semester", "e")
string.find("semester", "se")

count(s, sub)
In the count function s and sub have the same meaning as in find. The difference is that count returns the number of occurrences of sub in s.

What will the resulting value be? Check.

string.count("semester", "me")
string.count("semester", "e")
string.count("mississippi", "si")

lstrip(s)
By default lstrip removes leading white space from s (blanks, newlines, and tabs). '\n' is the code for a new line, '\t' for a tab.

Try the following in the shell. (You should already have imported string.)

s = "\n \n\tThis is a test of the string features"
s
print s
s = string.lstrip(s)
s

strip(s)
rstrip(s)
Same idea as lstrip, except that strip removes white space from both the left and right ends, and rstrip removes it only from the right end.

split(s)
split(s, sep)
s is the string to split. The result is a list of parts of s. In the first version the breaks come at any sequence of whitespace, except that whitespace at the very beginning or end of s is ignored (as if the parameter s were replaced by string.strip(s)). In the second version the string is split at each occurrence of the string sep, which allows for empty strings in the list.

What type of value is acadRegalia? What is the type of the result of its split?

acadRegalia = "medieval origin, pompous, nylon construction"
string.split(acadRegalia)
string.split(acadRegalia, ",")

string.split("in next class")

s = "#AB#CDE##F#"
string.split(s, "#")
string.split(s, "##")

join(wordList)
join(wordList, sep)
join() is the inverse operation of split. What split() takes apart, join puts back together. wordList must be a list of strings. In the first version a blank is inserted between each string in the list when they are joined into one string. The second version inserts sep instead of a blank. Predict what the result is and check:

words = ["in", "next", "class"]
string.join(words)
string.join(words,"!!!")

Task 3: String Objects

At the very end of chapter 4, the book introduces the idea of an object. Objects have data associated with them as well as actions or functions associated directly with them. Strings are objects. All but one of the string module functions above have a string as the first parameter, which is the string being acted on. An alternate, shorter, and more modern notation is illustrated by the change from
string.find(s, "e")
to
s.find("e")
The general pattern for converting a call to a string module function into a call on a string object is to move the first string parameter out of the parameter list and put it in place of "string" at the beginning. The way to think of the syntax s.find("e") is that the object s has an operation (in computer jargon, a method) called find, which uses the added parameter "e".
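
Here is a sketch of that pattern applied to find, rfind, and count, using an arbitrary example word. Only the object form is executed, since it runs in both Python 2 and Python 3; the module form appears in the comments. The variable names are just for illustration. The last few lines show one common way to handle the sentinel value -1.

```python
word = "banana"              # arbitrary example string

# string.find(word, "na") in module notation becomes:
first = word.find("na")      # 2
# string.rfind(word, "na") becomes:
last = word.rfind("na")      # 4
# string.count(word, "na") becomes:
howMany = word.count("na")   # 2

# find returns the sentinel -1 when the substring is absent,
# so test for -1 before using the result as an index.
where = word.find("x")
if where == -1:
    print("no x in " + word)
else:
    print("x found at index " + str(where))
```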

Check that the two versions of these string operations produce the same results.

s = "hello"
string.find(s, "e")
s.find("e")

word = "class"
string.upper(word)
word.upper()

string.join is a special case, since its first parameter is not a string, but the second parameter is. In this case it is the separator that is the string being acted on in the string object formulation. Test:

words = ['a', 'an', 'the']
string.join(words)
string.join(words, " ")
" ".join(words)
"##".join(words)

Translate one of the examples for at least two of the operations in the string module above into object notation and test them.
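
As one more illustration of object notation, this sketch pairs split with join on made-up strings (not the lab's examples). It assumes the method forms s.split(sep) and sep.join(wordList), which follow the pattern described above.

```python
line = "a,b,,c"                  # arbitrary example; note the empty field

parts = line.split(",")          # object form of string.split(line, ",")
print(parts)                     # ['a', 'b', '', 'c']

rebuilt = ",".join(parts)        # the separator is the object doing the joining
print(rebuilt == line)           # True: join undoes this split

# With no separator, split breaks at runs of whitespace, so the exact
# spacing is lost and joining with one blank may not restore the original.
spaced = "  in   next class "
print(" ".join(spaced.split()))  # in next class
```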

There are many more operations on strings than we have discussed, and there are further variations of the ones above with more parameters. If you want to find more, go to Help-> Python Docs, select Library Reference, and Section 2.3 Built-in Types, and then Section 2.3.6.1, String Methods.

Task 4: Use String Operations

Write a program that would input a phrase from the user and print out the phrase with the white space between words replaced by underscores. For instance if the input is "the best one", then it would print "the_best_one". The conversion can be done in one or two statements using string functions discussed so far.

You may use either the form using the string module or the more compact version using string object methods.

The remaining tasks are based on problems in your text (Zelle, pp. 119-120).

Task 5, Creating Acronyms

An acronym is a word formed by taking the first letter of each word in a phrase and making a word from them. For example, SIGCSE is an acronym for Special Interest Group Computer Science Education. Note that the acronym should be composed of all capital letters even if the original words are not. Write a program that has the user input a phrase and then prints the corresponding acronym.

To get you started, here are some things you will need to do. Indicate what Python function or operation will allow you to accomplish each task.

What type of data will the input be?
What type of data will the output be?

get the phrase
divide the phrase into words
initialize an accumulator with the empty string, "", to hold the result
get the first letter of each word
add it to the accumulator
make sure the accumulator is all caps
print the accumulator

Which of these steps is in a loop?
What for statement controls this loop?

Write and test your program

Task 6, Average word length

Write a program that will ask a user to enter a phrase. The program should then compute the average number of letters in each word of the phrase. You may assume there is no punctuation.

For example, if I enter “this is it”, the program should count 4 for the first word and 2 for each of the other two, for a total of 8. The average would be 8/3, or about 2.667.

Remember that the built-in function len() returns the number of characters in a string or the number of items in a list. So len("this") is 4, and len(['this', 'is', 'it']) is 3.
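
One thing to watch for when you compute the average: in Python 2, dividing one integer by another with / drops the fraction. The sketch below uses made-up words and hypothetical variable names, and it does not loop over a phrase for you. It only shows len on strings and a list, plus one way (an assumption, not the only way) to keep the fraction.

```python
words = ["a", "bc", "defg"]      # arbitrary example list of three words

totalLetters = len("a") + len("bc") + len("defg")   # 1 + 2 + 4 = 7
numWords = len(words)                               # 3 items in the list

# In Python 2, 7 / 3 with two integers gives 2 (the fraction is dropped).
# Converting one operand to float keeps the fraction in Python 2 and 3.
average = float(totalLetters) / numWords
print(round(average, 3))         # 2.333
```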

Outline the steps in this program as in Task 5.

Then write and test the program.

Extra Credit:

1. Write a program that asks the user for a phrase and then prints its words in reverse order. Thus if the user entered “this is it”, the program would print “it is this”.
2. Modify Task 6 to allow punctuation,
!; : ' " , . ? ( )
so the phrase "this is 'it'!" would have the same average word length as "this is it" in Task 6. You may assume that all words still have space between them and that punctuation always has a word directly adjacent to it. This should mean that your count of words should work the same way as in Task 6, but your character count is likely to be off, because you counted all the punctuation. You should subtract the count of each punctuation character from your total of the characters in the words. It would be possible to explicitly write out a special term or statement for each individual punctuation symbol, but for the extra credit, put all the common English punctuation symbols listed above in a sequence and have a loop iterate through them, and make a count correction for each one using the same code. To create a string that includes both a literal single quote and double quote, enclose the string in triple quotes: """!;:'",.?()"""
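
To confirm that the triple-quoted string really holds all ten symbols, including both kinds of quote, you could try a sketch like this one. The name punctuation is just a suggestion, and the count correction itself is left for you to write.

```python
punctuation = """!;:'",.?()"""   # the ten symbols listed above

print(len(punctuation))          # 10
print("'" in punctuation and '"' in punctuation)   # True: both quotes are there

for mark in punctuation:         # a string is a sequence, so a for loop
    print(mark)                  # visits it one character at a time
```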

Get checked off on Tasks 4, 5, and 6, and optionally some of the Extra Credit.

If a TA has not checked you off by the end of class in the next period, submit lab4ID_ID.py to Blackboard. Remember to put each task in a function definition if you submit to Blackboard.
