Python Standard Library (command line arguments, string pattern matching, internet access and data compression)
Module - 4, Weeks - 2 and 3
Shanti Chilukuri
Department of Computer Science and Technology, GIT, Visakhapatnam
GITAM (deemed to be University)

Command Line Arguments
● Can be handled using the argparse module in the Python standard library.
● Can be used to specify:
    ○ Arguments and options for running the script
    ○ --help or -h is an option we get without doing anything

      import argparse
      parser = argparse.ArgumentParser(description="purpose of the script")
      args = parser.parse_args()

Shanti Chilukuri, GITAM

Positional Arguments
● Can be added as parser.add_argument("argname", help=helpstring, type=datatype)
● The value can be accessed as args.argname after args = parser.parse_args().

Optional Arguments
● Can be added as parser.add_argument("--argname", help=helpstring, type=datatype)
● The value can be accessed as args.argname after args = parser.parse_args().
● If the option is not given on the command line, args.argname has the value None.
● Action:
    ○ If an argument takes only True or False, action="store_true" makes it
        ■ True if the option is specified,
        ■ False if the option is not specified.
      No value needs to be given.
● action="count" counts the number of occurrences of an optional argument
    ○ If the option is given one or more times, args.argname is the number of occurrences.
    ○ If the option is given zero times, args.argname is None. To make this 0, default=0 should be given.

Optional Arguments (cont.)
● To restrict the values taken by an argument, use parser.add_argument("--argname", help=helpstring, type=datatype, choices=list_of_choices)
● Short options: can be added as parser.add_argument("-a", "--argname", help=helpstring, type=datatype)
● Conflicting options: mutually exclusive options cannot both be given at the same time.
  E.g., quiet and verbose
    ○ Can be specified by
        group = parser.add_mutually_exclusive_group()
        group.add_argument(.......)

Combining positional and optional arguments
● The two types of arguments can be combined in one parser.
● On the command line, optional arguments may appear before or after the positional ones; that order does not matter.

Pattern Matching (re module)

Regular Expressions
● A regular expression (RE) is a sequence of characters used as a search pattern. It specifies a set of strings that match it.
● The re module has functions that check whether a string matches a regular expression.
● Both the pattern and the string to be searched can be Unicode strings (str) or 8-bit strings (bytes), but the two types cannot be mixed.
● The backslash character ('\') is used for special characters. E.g., \n is the newline character.
● r"..." is a raw string, in which the backslash is not treated specially. E.g., r"\n" is the two normal characters '\' and 'n'.
● The re module has many functions for pattern matching.

Compiling Regular Expressions vs Module level functions
● Compiling creates a pattern object from a string. Methods can then be called on this object.
      pat = re.compile("Tom")
      pat.match("Tom is a cat")
● Alternatively, the pattern can be passed as a string directly to a module level function.
      re.match("Tom", "Tom is a cat")
Some Compilation Flags:
1. DOTALL, S : Makes . (dot) match any character, including the newline
2. IGNORECASE, I : Case-insensitive matching
3. MULTILINE, M : ^ and $ also match at the beginning and end of each line
4. ASCII, A : \w, \W, \b, \B, \d, \D, \s and \S perform ASCII-only matching instead of full Unicode matching

Regular Expressions
● search(pattern, string) returns a Match object if there is a match to pattern anywhere in string, otherwise None.
● match(pattern, string) returns a Match object if there is a match to pattern at the beginning of string, otherwise None.
● findall(pattern, string) returns a list of all non-overlapping matches.
● split(pattern, string) returns a list with the string split at each match.
● sub(pattern, repl, string) returns a new string in which every substring that matches pattern is replaced with the replacement repl.
● subn(pattern, repl, string) does the same as sub(), but returns a tuple (new string, number of replacements).

Regular Expressions (cont.)
● Concatenation of two regular expressions gives another regular expression.
● If string p matches regex A and q matches regex B, then pq matches AB.
● Regular expressions may contain:
    ○ Most ordinary characters like ‘A’, ‘a’, ‘0’ (a simple RE that matches itself)

Match Object Methods
● Match objects can be queried about the match using the following methods:
    ○ group() : Return the string matched by the RE
    ○ start() : Return the starting position of the match
    ○ end() : Return the ending position of the match (one past the last matched character)
    ○ span() : Return a tuple containing the (start, end) positions of the match

Metacharacters
● Also called special characters. They represent classes of ordinary characters or affect how the regex around them is interpreted.
● . (dot) matches any character except newline. To match any character including newline, use the DOTALL flag.
● ^ (caret) matches the start of the string (and, with the MULTILINE flag, the start of each line).
● $ (dollar) matches the end of the string or just before the newline at the end of the string.
● * (asterisk) matches 0 or more repetitions of the RE before it.
● + (plus) matches 1 or more repetitions of the RE before it.
● ? (question mark) matches 0 or 1 repetitions of the RE before it.
● {m} matches exactly m repetitions of the RE preceding it.
● {m,n} matches m to n repetitions of the RE preceding it.
● \ (backslash) either escapes a special character such as * so that it can be matched literally, or introduces a special sequence.
● [] indicates a set of characters. To complement the set, use ^ as its first character.
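Example (a minimal sketch, not part of the original slides): the module level functions, the Match object methods and a few of the metacharacters listed above, applied to a made-up sample string. The sample text and all variable names are assumptions chosen only for illustration.

```python
import re

# Hypothetical sample text, chosen only for this illustration.
text = "Tom is a cat. Tom chases 2 mice and 10 birds."

# Compile once, then call methods on the pattern object.
pat = re.compile(r"Tom")

# match() only succeeds at the beginning of the string.
m = pat.match(text)
first = (m.group(), m.start(), m.end(), m.span())

# search() scans the whole string and returns the first match.
# [0-9] is a character set, + means "one or more repetitions".
first_number = re.search(r"[0-9]+", text).group()

# findall() returns every non-overlapping match as a list of strings.
numbers = re.findall(r"[0-9]+", text)

# split() cuts the string at each match of the pattern.
parts = re.split(r",", "a,b,c")

# sub() replaces every match; subn() also reports how many were replaced.
replaced = re.sub(r"Tom", "Jerry", text)
replaced_with_count = re.subn(r"Tom", "Jerry", text)

# The IGNORECASE flag makes the match case-insensitive.
case_insensitive = re.search(r"tom", text, re.IGNORECASE)

print(first)
print(first_number, numbers, parts)
print(replaced_with_count)
print(case_insensitive.group())
```

Because match() and search() return None when nothing matches, real code would normally test the result before calling group() on it.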
Special Sequences
● Like special characters, they represent classes of ordinary characters or affect how the regex around them is interpreted. They start with ‘\’.
● \A : Matches only at the beginning of the string. E.g., re.match(r"\Aap", "apples")
● \Z : Matches only at the end of the string. E.g., re.search(r"es\Z", "apples") (re.match would fail here, because it only matches at the beginning of the string)
● \b : Matches when the specified characters are at the beginning or at the end of a word. E.g., re.match(r"\bor", "oranges")
● \B : Matches when the specified characters are NOT at the beginning or at the end of a word. E.g., re.search(r"\Bran", "oranges")
● \d : Matches any decimal digit (0 to 9 in ASCII-only matching)
● \D : Matches any character that is NOT a digit
● \s : Matches any whitespace character
● \S : Matches any character that is NOT whitespace
● \w : Matches any word character (in ASCII-only matching: letters a to z and A to Z, digits 0 to 9, and the underscore _)
● \W : Matches any character that is NOT a word character

Internet Access (urllib and smtplib packages)

urllib Package
● Contains several modules for working with Uniform Resource Locators (URLs)
    ○ urllib.request : module for opening and reading URLs
    ○ urllib.error : module with the exceptions raised by urllib.request
    ○ urllib.parse : module for parsing URLs

urllib package (cont.)
● urllib.request Module [3]: Contains functions and classes for opening URLs, including support for authentication, cookies and redirection. Some important names:
    ○ urllib.request.urlopen() : Opens a URL given as a string or as a Request object
    ○ urllib.request.HTTPRedirectHandler : Class that handles redirections
● urllib.parse Module [4]: Contains functions for parsing and quoting URLs.
    ○ urllib.parse.urlparse(urlstring, scheme='', allow_fragments=True): Parses a URL into six parts and returns a named tuple with six items, assuming the general URL structure scheme://netloc/path;parameters?query#fragment.
        ■ scheme - addressing scheme of the URL (the string before “:”. E.g., https)
        ■ netloc - network location part (the domain and subdomain, if any. E.g., “www.gitam.edu”)
        ■ path - hierarchical path
        ■ params - parameters for the last path element (after “;”)
        ■ query - query component (after “?”)
            ● To parse the query string into components, use urllib.parse.parse_qs()
        ■ fragment - fragment identifier (after “#”; refers to a section within a web page. For an HTML page, the matching anchor is searched for.)
    ○ urllib.parse.urlunparse(): Constructs a URL string from a tuple as returned by urlparse()

smtplib Module
● smtplib Module [5]: defines an SMTP client session object that can be used to send email
● class smtplib.SMTP(host='', port=0, local_hostname=None, [timeout, ]source_address=None). Important methods:
    ○ SMTP.connect(host='localhost', port=0)
    ○ SMTP.sendmail(from_addr, to_addrs, msg, mail_options=(), rcpt_options=())
    ○ SMTP.quit()

Zlib, bz2 and gzip Modules
● zlib allows compression and decompression.
    ○ zlib.compress(data, /, level=-1)
    ○ zlib.decompress(data, /, wbits=MAX_WBITS, bufsize=DEF_BUF_SIZE)
● bz2 allows compression into a bz2 file.
    ○ Just open the file using bz2.open(filename, "wb") and write to it
● gzip allows compression into a gzip file.
    ○ Just open the file using gzip.open(filename, "wb") and write to it

Zipfile Module
● Provides methods to create, read, write, append, and list a ZIP file
● class zipfile.ZipFile(file, mode='r', compression=ZIP_STORED, allowZip64=True, compresslevel=None, *, strict_timestamps=True)
● Important methods:
    ○ ZipFile.open(name, mode='r', pwd=None, *, force_zip64=False): Access a member of the archive as a binary file-like object. mode must be 'r' or 'w'.
    ○ ZipFile.write(filename, arcname=None, compress_type=None, compresslevel=None): Writes the file named filename to the archive, storing it inside the archive under the name arcname (by default, the same as filename).
    ○ ZipFile.read(name, pwd=None): Returns the bytes of the file name in the archive.
    ○ ZipFile.extract(member, path=None, pwd=None): Extracts member from the archive to the current working directory, or to path if it is given.
    ○ ZipFile.extractall(path=None, members=None, pwd=None): Extracts all members from the archive to the current working directory, or to path if it is given.
    ○ ZipFile.setpassword(pwd): Sets pwd (a bytes object) as the default password for extracting encrypted files.

References
1) Python Documentation Argparse Tutorial
2) Regular Expressions HOW TO, Python Documentation Regular Expression operations
6) Python Documentation zipfile
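Example (a minimal sketch, not part of the original slides): the zlib functions and the ZipFile methods described in the data compression slides, exercised in a temporary directory. The file names notes.txt and demo.zip, the sample data and the variable names are assumptions made for illustration; namelist() is an additional ZipFile method not covered in the slides.

```python
import os
import tempfile
import zipfile
import zlib

# zlib works on bytes objects held in memory.
data = b"hello hello hello hello " * 20
packed = zlib.compress(data)
restored = zlib.decompress(packed)

with tempfile.TemporaryDirectory() as tmp:
    # Create a small file to archive (hypothetical name).
    src = os.path.join(tmp, "notes.txt")
    with open(src, "wb") as f:
        f.write(data)

    archive = os.path.join(tmp, "demo.zip")

    # Mode "w" creates the archive; arcname is the name stored inside it.
    with zipfile.ZipFile(archive, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        zf.write(src, arcname="notes.txt")

    with zipfile.ZipFile(archive, "r") as zf:
        names = zf.namelist()            # names of the members in the archive
        content = zf.read("notes.txt")   # bytes of the member, not a length
        out_dir = os.path.join(tmp, "out")
        zf.extractall(path=out_dir)      # extract every member into out_dir
        extracted = os.path.exists(os.path.join(out_dir, "notes.txt"))

print(len(data), len(packed))
print(names, content == data, extracted)
```

Using ZipFile as a context manager closes the archive automatically, which is what finalizes a newly written ZIP file.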