2.1. Strings, Part III

2.1.1. Object Orientation

Python is an object-oriented language. Every piece of data and even functions and types are objects. The term object-oriented is used to distinguish Python from earlier languages, classified as procedural languages, where types of data and the operations on them were not connected in the language. The functions we have used so far follow the older procedural programming syntax. In the newer paradigm of object-oriented programming, all data are in objects, and a core group of operations that can be done on some particular type of object are tightly bound to the object and called the object’s methods.

For example, strings are objects, and strings “know how” to produce an uppercase version of themselves. Try in the Shell:

s = 'Hello!'
s.upper()

Here upper is a method associated with strings. This means upper is a function that is bound to the string before the dot. This function is bound both logically, and as we see in the new notation, also syntactically. One way to think about it is that each type of data knows operations (methods) that can be applied to it. The expression s.upper() calls the method upper that is bound to the string s and returns a new uppercase string result based on s.

Strings are immutable, so no string method can change the original string, it can only return a new string. Confirm this by entering each line individually in the Shell to see the original s is unchanged:

s
s2 = s.upper()
s2
s

We are using the new object syntax:

object.method( )

meaning that the method associated with the object’s type is applied to the object. This is just a special syntax for a function call with an object.

Another string method is lower, analogous to upper, but producing a lowercase result.

Test yourself: How would you write the expression to produce a lowercase version of the string s? Answer: [1]

Try it in the Shell.

Test yourself in the Shell: How would you use this string s and both the lower and upper methods to create the string 'hello!HELLO!'? Hint: [2] Answer: [3]

Many methods also take additional parameters between the parentheses, using the more general syntax:

object.method(parameters)

The first of many such methods we will introduce is count:

Syntax for count:

s.count(sub)

Count and return the number of repetitions of a string sub that appear as substrings inside the string s.

Read and make sure you see the answers are correct:

>>> tale = 'This is the best of times.'
>>> tale.count('i')
3
>>> tale.count('is')
2
>>> tale.count('That')
0
>>> tale.count(' ')
5

There is a blank between the quotes in the line above. Blanks are characters like any other (except you can’t see them)!

Just as the parameter can be replaced by a literal or any expression, the object to which a method is bound with the dot may also be given by a literal, or a variable name, or any expression that evaluates to the right kind of object in its place. This is true for any method call.

Technically the dot between the object and the method name is an operator, and operators have different levels of precedence. It is important to realize that this dot operator has the highest possible precedence. Read and see the difference parentheses make in the expressions:

>>> 'hello ' + 'there'.upper()
'hello THERE'
>>> ('hello ' + 'there').upper()
'HELLO THERE'

To see if you understand this precedence, predict the results of each line and then test in the Shell:

3 * 'X'.count('XXX')
(3 * 'X').count('XXX')

There are 0 ‘XXX’s in ‘X’, but 1 ‘XXX’ in ‘XXX’.

Python lets you see all the methods that are bound to an object (and any object of its type) with the built-in function dir. To see all string methods, supply the dir function with any string. For example, try in the Shell:

dir('')

Many of the names in the list start and end with two underscores, like __add__. These are all associated with methods and pieces of data used internally by the Python interpreter. You can ignore them for now. The remaining entries in the list are all user-level methods for strings. You should see lower and upper among them. Some of the methods are much more commonly used than others.

Object notation

object.method(parameters)

has been illustrated so far with just the object type str, but it applies to all types. Later in the tutorial methods such as the following will be discussed:

If seq is a list, seq.append(element) appends element to the end of the list.

If myData is a file, myData.read() will read and return the entire contents of the file....

2.1.2. String Indices

A string is a sequence of smaller components (individual characters), and it is often useful to deal with parts of strings. Python indexes the characters in a string, starting from 0, so for instance, the characters in the string 'computer' have indices:

character c o m p u t e r
index 0 1 2 3 4 5 6 7

Each index is associated with a character, and you reference the individual characters much like in a dictionary. Try the following. (You can skip the comments that make the indices explicit.) Enter in the Shell:

#    01234567
s = 'computer'
s[0]
s[5]
s[8]

You cannot refer directly to a character that is not there. Indices only go to 7 in the example above.

Recall the len function, which gives the length of a sequence. It works on strings. Guess the following value, and test in the Shell:

len(s)

A common error is to think the last index will be the same as the length of the string, but as you saw above, that leads to an execution error. If the length of some string is 5, what is the index of its last character? What if the length is 35?

Hopefully you did not count by ones all the way from 0. The indices for a string of length n are the elements of the sequence range(n), which goes from 0 through n-1, or the length of the string minus one, which is 5-1=4 or 35-1 = 34 in these examples.

Sometimes you are interested in the last few elements of a string and do not want to do calculations like this. Python makes it easy. You can index from the right end of the string. Since positive integers are used to index from the front, negative integers are used to index from the right end, so the more complete table of indices for 'computer' gives two alternatives for each character:

character c o m p u t e r
index 0 1 2 3 4 5 6 7
index from the right end -8 -7 -6 -5 -4 -3 -2 -1

Predict and test each individual line, continuing in the Shell:

s[-1]
s[-3]
s[-10]

it = 'horse'
len(it)
it[-1]
it[1]

Be careful - remember what the initial index is!

2.1.3. String Slices

It is also useful to extract larger pieces of a string than a single character. That brings us to slices. Try this expression using slice notation, continuing in the Shell:

s[0:4]

Note that s[4] is the first character past the slice. The simplest syntax for a slice of a string s is:

s[ startIndex : pastIndex ]

This refers to the substring of s starting at index startIndex and stopping just before index pastIndex.

Warning

It confuses many people that the index after the colon is not the index of the final character in the slice. The second index is past the end of the slice.

Predict and try each line individually in the Shell:

s[2:5]
s[1:3]

If you omit the first index, the slice starts from the beginning. If you omit the second index, the slice goes all the way to the end. Predict and try each line individually in the Shell:

s[:3]
s[5:]
s[:]

Predict and try each line individually in the Shell:

word = 'program'
word[2:4]
word[1:-3]
word[3:]
word[3:3]
word[:1] + word[4:]

Python evaluates slices in a more forgiving manner than when indexing single characters. In a slice, if you give an index past a limit of where it could be, Python assumes you mean the actual end. Predict and try each line individually in the Shell:

word[:9]
word[8:10]

Enter a slice expression using the variable word from above that produces 'gra'.

A useful string method that uses the ideas of indices and slices is find.

Syntax options for find method with a string s:

s.find(sub)
s.find(sub, start)
s.find(sub, start, end)

Return the integer index in the string s of the beginning of the first complete occurrence of the substring sub. If sub does not appear inside s, return -1. The value -1 would be an impossible result if sub were found, so if -1 is returned, sub must not have been found. If parameters start and end are not included in the parameter list, the search is through the whole string s. If an integer value is given for start, the search starts at index start. If an integer value is given for end, the search ends before index end. In other words if start and end appear, then the search is through the slice s[start : end], but the index returned is still counted from the beginning of s.

For example, check that the following make sense. The comment line is just there to help you count:

>>> #    01234567890
>>> s = 'Mississippi'
>>> s.find('i')
1
>>> s.find('si')
3
>>> s.find('sa')
-1
>>> s.find('si', 4)
6

Predict and try each line in the Shell:

#       0123456789012
line = 'Hello, there!'
line.find('e')
line.find('he')
line.find('e', 10)
line.find('he', 10)

We will consider more string methods later, but we can already do useful things with the ones introduced.

Inside the Shell, you can look up documentation on any of the methods listed with the dir function. Here is a place that you want to refer to the method itself, not invoke the method, so note that you get help for s.find not for s.find(). Assuming you defined the string s in the Shell earlier, try in the Shell

help(s.find)

The Python documentation uses square brackets to indicate optional elements which get a default value if you leave them out. This shortens the syntax descriptions.

If you want method documentation when you do not have a variable of the type created, you can also use the type name. Try in the Shell:

dir(str)
help(str.capitalize)

In the help documentation for a function with one or more parameters, you may see what looks like a final parameter /. Ignore it. It documents a technical restriction on parameters. It is not actually a parameter.

Indexing and slicing works on any kind of Python sequence, so you can index or slice lists also.* Read* this Shell session:

>>> vals = [5, 7, 9, 22, 6, 8]
>>> vals[1]
7
>>> vals[-2]
6
>>> vals[1:4]
[7, 9, 22]

Unlike strings, lists are mutable, as you will see in Appending to a List. Indices and slices can also be used in assignment statements to change lists, but in this tutorial we not need list indexing, and we will not discuss this subject further.

2.1.4. Index Variables

All the concrete examples in the last two sections used literal numbers for the indices. That is fine for learning the idea, but in practice, variables or expressions are almost always used for indices. As usual the variable or expression is evaluated before being used. Try in Idle and see that the example program index1.py makes sense:

s = 'word'
print('The full string is: ', s)
n = len(s)
for i in range(n):
    print()
    print('i =', i)
    print('The letter at index i:', s[i])
    print('The part before index i (if any):', s[:i])
    print('The part before index i+2:', s[:i+2])

We will use index variables in more practical situations as we explain more operations with strings.

2.1.5. split

Syntax options for the split method with a string s:

s.split()
s.split(sep)

The first version splits s at any sequence of whitespace (blanks, newlines, tabs) and returns the remaining parts of s as a list. If a string sep is specified, it is the separator that gets removed from between the parts of the list.

For example, read and follow:

>>> tale = 'This is the best of times.'
>>> tale.split()
['This', 'is', 'the', 'best', 'of', 'times.']
>>> s = 'Mississippi'
>>> s.split('i')
['M', 'ss', 'ss', 'pp', '']
>>> s.split()  # no white space
['Mississippi']

Predict and test each line in the Shell:

line = 'Go:  Tear some strings apart!'
seq = line.split()
seq
line.split(':')
line.split('ar')
lines = 'This includes\\nsome new\\nlines.'
lines.split()

2.1.6. join

Join is roughly the reverse of split. It joins together a sequence of strings. The syntax is rather different. The separator sep comes first, since it has the right type (a string).

Syntax for the join method:

sep.join(sequence)

Return a new string obtained by joining together the sequence of strings into one string, interleaving the string sep between sequence elements.

For example (continuing in the Shell from the previous section, using seq), follow:

>>> ' '.join(seq)
'Go: Tear some strings apart!'
>>> ''.join(seq)
'Go:Tearsomestringsapart!'
>>> '//'.join(seq)
'Go://Tear//some//strings//apart!'

Predict and try each line, continuing in the Shell:

'##'.join(seq)
':'.join(['one', 'two', 'three'])

The methods split and join are often used in sequence:

2.1.6.1. Underscore Exercise

Write a program underscores.py that would input a phrase from the user and print out the phrase with the white space between words replaced by an underscore. For instance if the input is the best one, then it would print the_best_one. The conversion can be done in one or two statements using the recent string methods.

2.1.6.2. Acronym Exercise

* An acronym is a string of capital letters formed by taking the first letters from a phrase. For example, SADD is an acronym for ‘students against drunk driving’. Note that the acronym should be composed of all capital letters even if the original words are not. Write a program acronym.py that has the user input a phrase and then prints the corresponding acronym.

To get you started, here are some things you will need to do. First check that you understand the basic syntax to accomplish the different individual tasks: Indicate the proper syntax using a Python function or operation will allow you to accomplish each task. Invent appropriate variable names for the different parts. These are not complete instructions! The idea is to make sure you know the basic syntax to use in all these situations. See the questions after the list to help you put together the final program.

  1. What type of data will the input be? What type of data will the output be?
  2. Get the phrase from the user.
  3. Convert to upper case.
  4. Divide the phrase into words.
  5. Initialize a new empty list, letters.
  6. Get the first letter of each word.
  7. Append the first letter to the list letters.
  8. Join the letters together, with no space between them.
  9. Print the acronym.

Which of these steps is in a loop? What for statement controls this loop?

Put these ideas together and write and test your program acronym.py. Make sure you use names for the objects that are consistent from one line to the next! (You might not have done that when you first considered the syntax and ideas needed for 1-9 above individually.)

2.1.7. Further Exploration

As the dir('') list showed, there are many more operations on strings than we have discussed, and there are further variations of the ones above with more parameters. Methods startswith, endswith, and replace are discussed later in More String Methods. If you want to reach a systematic reference from inside Idle, go to Help ‣ Python Docs ‣ Library Reference, and find the section Built-in Types, and then the subsection for type str. Many methods use features we have not discussed yet, but currently accessible methods are capitalize, title, strip, rfind, ....

[1]s.lower()
[2]Use a plus sign to concatenate the pieces.
[3]s.lower() + s.upper()