LECTURE 7: The Standard Library

THE STANDARD LIBRARY

Python has a fantastically large standard library. Some modules are more useful than others (e.g. sys and string), and some are relatively obscure. Part of the Standard Library is really just a collection of built-in functions. We'll start by looking at the built-in functionality: what we can use without importing anything.

BUILT-IN FUNCTIONS

Python has about 80 built-in functions that are free to use without importing a module; the complete list is in the official documentation. It would be hard to cover each function and its applications in a lecture period, so I encourage you all to look at that list and see what's available to you. There are also some non-essential built-in functions, but they are deprecated or have been improved upon.

BUILT-IN CONSTANTS

• False
• True
• None: absence of a value.
• NotImplemented

>>> x = [1,2,3]
>>> 1 in x
True
>>> if ((1 in x) == True):
...     print "yay!"
...
yay!
>>> def myfunc():
...     pass
...
>>> i = myfunc()
>>> print i
None

BUILT-IN TYPES AND EXCEPTIONS

We've already covered most of the built-in types and looked at some built-in exceptions. I recommend finding the complete lists in the official documentation.

STRING SERVICES

The string module (as opposed to the string data type) and the re module provide some really useful string manipulation operations. A number of other modules fall under the "string services" header, but these two are the most ubiquitous: almost any larger Python program will use string, and re is fairly common.

THE STRING MODULE

The string module provides a variety of useful string operations and constants. To use it, you only have to import string. The string module can roughly be broken down into the following sections:

• Constants
• Formatting
• Templating
• General Purpose Functions and Deprecated Functions

STRING CONSTANTS

The string constants are a number of built-in constants defined by the string module.
>>> import string
>>> string.ascii_letters
'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'
>>> string.ascii_lowercase
'abcdefghijklmnopqrstuvwxyz'
>>> string.digits
'0123456789'
>>> string.hexdigits
'0123456789abcdefABCDEF'
>>> string.printable
'0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~ \t\n\r\x0b\x0c'
>>> string.whitespace
'\t\n\x0b\x0c\r '

More constants are listed in the docs. These predefined strings can be used for a variety of useful operations. For example, … say I wanted to create a cipher…

STRING FORMATTING

The str.format() method allows you to specify "replacement fields" within a string by using curly braces. Everything outside of the curly braces is literal text.

• Positional arguments

>>> import string
>>> '{0} {1} {2}'.format('Hello', 'Python', 'Class!')
'Hello Python Class!'
>>> '{} {} {}'.format('Hello', 'Python', 'Class!')
'Hello Python Class!'
>>> '{3}, {0} {1} {2}'.format('My', 'name', 'is', 'Yoda')
'Yoda, My name is'
>>> '{0} {2} {3}, {1} {2} {3}'.format('Walk', 'Talk', 'this', 'way')
'Walk this way, Talk this way'
• Keyword arguments

>>> 'Today is {day} and it is {temp} degrees Fahrenheit outside in {city}'.format(day='Jan 23', temp='65', city='Tallahassee')
'Today is Jan 23 and it is 65 degrees Fahrenheit outside in Tallahassee'

You can also access the attributes of an object inside of the format() method.

>>> class Fraction:
...     def __init__(self, num, denom):
...         self.num = num
...         self.denom = denom
...
>>> myfrac = Fraction(3,5)
>>> 'The numerator is {0.num} and the denominator is {0.denom}'.format(myfrac)
'The numerator is 3 and the denominator is 5'

You can also access items in a sequence object.

>>> fibonacci = [0, 1, 1, 2, 3, 5, 8]
>>> '{0[2]} + {0[3]} = {0[4]}'.format(fibonacci)
'1 + 2 = 3'

And align text.

>>> '{:<50}'.format('To the left, to the left')
'To the left, to the left                          '
>>> '{:>50}'.format('All right, All right, All right')
'                   All right, All right, All right'
>>> '{:^50}'.format('Let\'s center ourselves')
"              Let's center ourselves              "
>>> '{:*^50}'.format('Let\'s center ourselves')
"**************Let's center ourselves**************"

You can also change the precision and display of your float values.

>>> '{:.2f}'.format(3.4385)
'3.44'
>>> '{:.4f}'.format(3.4385)
'3.4385'

Some fun things:

• Add commas to your big numbers.
• Define precision for your percentages.
• Use formatting specific to a type.

>>> '{:,}'.format(1234567890)
'1,234,567,890'
>>> points = 19.5
>>> total = 22
>>> 'Correct answers: {:.2%}'.format(points/total)
'Correct answers: 88.64%'
>>> import datetime
>>> d = datetime.datetime(2010, 7, 4, 12, 15, 58)
>>> '{:%Y-%m-%d %H:%M:%S}'.format(d)
'2010-07-04 12:15:58'

STRING TEMPLATING

Templating provides simpler substitution methods. There are also some very advanced uses for templates, but we'll skip those for now.
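As a minimal sketch of what template substitution looks like in practice, the snippet below fills in a template and also shows what happens when a placeholder is left without a value. The template text and the field names ($student, $course) are made up for illustration, and the code is written so that it should run on both Python 2.6+ and Python 3.x.

```python
from string import Template

# Hypothetical template and field names, chosen only for this illustration.
note = Template("$student is enrolled in $course")

# substitute() needs a value for every placeholder.
full = note.substitute(student="Ada", course="CIS 4930")

# safe_substitute() leaves any placeholder it cannot fill untouched.
partial = note.safe_substitute(student="Ada")

# substitute() raises KeyError when a placeholder has no value.
try:
    note.substitute(student="Ada")
    missing = None
except KeyError as exc:
    missing = exc.args[0]

print(full)
print(partial)
print(missing)
```

safe_substitute() is the more forgiving choice when a template is filled in several passes; substitute() is the better default when a missing value should be treated as a bug.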
>>> import string
>>> mytemp = string.Template("$who likes $what")
>>> mytemp.substitute(who='tim', what='kung pao')
'tim likes kung pao'

STRING FUNCTIONS

Probably the most useful and most used part of the string module is its numerous string functions. Note that these functions are deprecated in Python 2.x and were removed in Python 3.x; the equivalent string methods work in both versions. The difference looks like this:

>>> string.capitalize("word")    # Python 2.x only
'Word'
>>> "word".capitalize()          # Python 2.x and 3.x
'Word'

Here's a sample of the many useful string functions predefined for you! Note that strip() and rstrip() treat their second argument as a set of characters to remove, not as a prefix or suffix.

>>> import string
>>> string.atoi("345")
345
>>> string.capitalize("word")
'Word'
>>> string.find("expression", "press")
2
>>> string.rfind("This is a sentence.", " ")
9
>>> string.count("mississippi", "iss")
2
>>> string.lower("WORD")
'word'
>>> l = string.split("Look at all these words I have!")
>>> l
['Look', 'at', 'all', 'these', 'words', 'I', 'have!']
>>> " ".join(l)
'Look at all these words I have!'
>>> string.rstrip("exceptional", "al")
'exception'
>>> string.strip("Temp: 59F", "Temp: ")
'59F'
>>> string.swapcase("HeLlO")
'hElLo'
>>> string.replace("cout >> x >> endl;", ">>", "<<")
'cout << x << endl;'

RE MODULE

So you see now what makes Python support such rapid development: how long would it take you to do each of those things in C++ or C (using the standard library)? Let's move on to the next popular string service: re.

re is the module that defines all of the built-in regular expression operations. It supports Unicode as well as 8-bit strings.

RE SYNTAX

Let's do a short intro to regular expressions for those of you who are not familiar. A regular expression is a sequence of characters, some with special meaning, that defines an entire set of strings. A single character simply matches itself (e.g. 'A' matches the string "A").
Characters like '|', '.', '$', '*', '+' have special meaning and allow us to construct regular expressions concisely.

Symbol  Meaning                                Example
|       Alternation                            A|B = {'A', 'B'}
.       Match any char but newline             . = {'a', 'b', 'c', '%', '#', ' ', '(', ...}
*       Match 0 or more repetitions            (str)* = {'', 'str', 'strstr', 'strstrstr', ...}
+       Match 1 or more repetitions            (str)+ = {'str', 'strstr', 'strstrstr', ...}
?       Preceding RE optional                  ab? = {'a', 'ab'}
{m}     Match exactly m copies of previous RE  A{6} = {'AAAAAA'}
[]      Characters in a set                    [0-9a-z] = {'0', 'a', '1', 'b', '2', ...}

There are a lot of options, so check out the official docs to see what you can do. Most of these examples are based on the tutorialspoint page for RE in Python, which is very helpful!

RAW STRINGS

A note before we do some examples: typically, regular expressions are passed to re methods as raw strings (e.g. r'[0-9]+'). In a string prefixed with an 'r', backslashes are not treated as escape characters. By keeping regular expressions raw, we can avoid this kind of issue:

r"\\"     raw string: regular expression matching a literal backslash.
"\\\\"    ordinary string: the same regular expression, with each backslash escaped again.

REGULAR EXPRESSIONS

re.match(pattern, string, flags = 0)

• Returns a MatchObject if pattern matches at the beginning of string. Otherwise returns None.
• We can access the matched expression using the method group().
• We can pass an index to group (i.e. group(1)) to access matched subgroups. These are denoted with parentheses in the regular expression.

>>> import re
>>> line = "Cats are smarter than dogs"
>>> match_obj = re.match(r'(.*) are (.*?) .*', line, re.I)
>>> if match_obj:
...     print "match_obj.group() : ", match_obj.group()
...     print "match_obj.group(1): ", match_obj.group(1)
...     print "match_obj.group(2): ", match_obj.group(2)
... else:
...     print "No match!"
...
match_obj.group() :  Cats are smarter than dogs
match_obj.group(1):  Cats
match_obj.group(2):  smarter

• The match() function only checks for a match at the beginning of the string.
• Use search() to check for a match anywhere within the string.

>>> import re
>>> line = "Cats are smarter than dogs"
>>> match_obj = re.match(r'dogs', line, re.I)
>>> if match_obj:
...     print "Match is: ", match_obj.group()
... else:
...     print "No Match!"
...
No Match!

>>> import re
>>> line = "Cats are smarter than dogs"
>>> search_obj = re.search(r'dogs', line, re.I)
>>> if search_obj:
...     print "Match is: ", search_obj.group()
... else:
...     print "No Match!"
...
Match is:  dogs

Use the sub() function to perform substitution. \d matches any digit. \D matches any non-digit. $ matches the end of the string (or the end of each line in multiline mode).

>>> import re
>>> phone = "2004-959-559 # This is Phone Number"
>>> num = re.sub(r'#.*$', "", phone)
>>> print "Phone Num : ", num
Phone Num :  2004-959-559 
>>> num = re.sub(r'\D', "", phone)
>>> print "Phone Num : ", num
Phone Num :  2004959559

You can compile your regular expressions for future use if they are used multiple times within a program. Just call compile() with your re string. It creates a regular expression object with the usual methods.

>>> my_re = re.compile(r'([a-zA-Z])([a-zA-Z_0-9]*)')
>>> match_obj = my_re.match("myint_1")
>>> match_obj.group()
'myint_1'
>>> match_obj.group(1)
'm'
>>> match_obj.group(2)
'yint_1'
>>> match_obj = my_re.match("totalBalance")
>>> match_obj.group()
'totalBalance'
>>> match_obj.group(1)
't'
>>> match_obj.group(2)
'otalBalance'

The official regular expressions docs have a ton of cool examples, including implementing C's scanf, making a phonebook, etc.
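To tie the pieces above together, here is a small sketch that wraps compile(), sub(), match() and search() in a few helper functions. The helper names (strip_comment, digits_only, is_identifier) are hypothetical and not part of the re module; the patterns are the ones used in this lecture. The code sticks to single-argument print calls so it should behave the same on Python 2.6+ and Python 3.x.

```python
import re

# Patterns compiled once and reused; the names are illustrative only.
COMMENT_RE = re.compile(r'#.*$')                      # '#' and everything after it
NON_DIGIT_RE = re.compile(r'\D')                      # any single non-digit
IDENT_RE = re.compile(r'([a-zA-Z])([a-zA-Z_0-9]*)')   # letter, then letters/digits/_


def strip_comment(text):
    """Remove a trailing '# ...' comment and any surrounding whitespace."""
    return COMMENT_RE.sub("", text).strip()


def digits_only(text):
    """Delete every non-digit character."""
    return NON_DIGIT_RE.sub("", text)


def is_identifier(name):
    """True only if IDENT_RE matches the whole of name, not just a prefix."""
    m = IDENT_RE.match(name)
    return m is not None and m.group() == name


phone = "2004-959-559 # This is Phone Number"
line = "Cats are smarter than dogs"

print(strip_comment(phone))
print(digits_only(phone))
print(is_identifier("myint_1"))
print(is_identifier("my-int"))
# match() anchors at the start of the string, search() does not.
print(re.match(r'dogs', line) is None)
print(re.search(r'dogs', line).group())
```

Comparing m.group() with the full input in is_identifier is one simple way to demand a whole-string match, since match() by itself is satisfied by a matching prefix such as "my" in "my-int".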
Next time, we’ll be talking about specialized data types in the Python Standard Library.