Python Standard Library (command line arguments, string pattern matching, internet access and data compression) (EID451 OPEN SOURCE SOFTWARE DEVELOPMENT) - Module4 Weeks 2 and 3

Python Standard Library
(command line arguments, string pattern matching,
internet access and data compression)
Module - 4, Weeks - 2 and 3
Shanti Chilukuri
Department of Computer Science and Technology,
GIT, Visakhapatnam
GITAM (deemed to be University)
Command Line Arguments
● Command line arguments can be parsed using the argparse module in the Python standard library.
● argparse can be used to specify:
○ Arguments and
○ Options for running: --help (or -h) is an option we get without doing anything
import argparse
parser = argparse.ArgumentParser(description="purpose of the script")
args = parser.parse_args()
Shanti Chilukuri, GITAM
Positional Arguments
● Can be added as parser.add_argument("argname", help=helpstring, type=datatype)
● After args = parser.parse_args(), the value can be accessed as args.argname.
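A minimal sketch of a positional argument. The argument name "number" and the squaring task are made up for illustration, and an explicit list is passed to parse_args() in place of the real command line so that the sketch runs anywhere.

```python
import argparse

# Hypothetical script: squares an integer given on the command line.
parser = argparse.ArgumentParser(description="square a number")
parser.add_argument("number", help="the integer to square", type=int)

# The list stands in for sys.argv[1:], i.e. running "script.py 7".
args = parser.parse_args(["7"])

# type=int means the value arrives as an int, not as the string "7".
squared = args.number ** 2
print(squared)
```

When parse_args() is called with no list, argparse reads the arguments from the actual command line.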
Optional Arguments
● Can be added as parser.add_argument("--argname", help=helpstring, type=datatype)
● After args = parser.parse_args(), the value can be accessed as args.argname.
● If the option is not given on the command line, its value is None (unless a default is set)
● Action:
○ If an argument takes only True or False, action="store_true" makes it
■ True if the option is specified,
■ False if the option is not specified.
No value needs to be given.
○ action="count" counts the number of occurrences of an optional argument
■ If the option is given one or more times, args.argname returns the number of occurrences
■ If the option is given zero times, args.argname returns None. To make this 0, default=0 should be given
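The behaviour of the default value and of the two actions can be sketched as follows. The option names --name, --shout and --verbose are illustrative choices, not names argparse requires.

```python
import argparse

parser = argparse.ArgumentParser(description="demo of optional arguments")
parser.add_argument("--name", help="name to greet", type=str)
parser.add_argument("--shout", help="use upper case", action="store_true")
parser.add_argument("--verbose", help="repeat for more detail",
                    action="count", default=0)

# No options given: None for a plain option, False for store_true,
# and 0 for count because default=0 was supplied.
no_options = parser.parse_args([])
print(no_options.name, no_options.shout, no_options.verbose)

# --verbose appears twice, so the count is 2.
some_options = parser.parse_args(
    ["--name", "Tom", "--shout", "--verbose", "--verbose"])
print(some_options.name, some_options.shout, some_options.verbose)
```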
Optional Arguments (cont.)
● To restrict values taken by an argument, use parser.add_argument("--argname", help=helpstring, type=datatype, choices=list of choices)
● Short options: Can be added as parser.add_argument("-a", "--argname", help=helpstring, type=datatype)
● Conflicting options: Mutually exclusive - both cannot be given at the same time. E.g., quiet and verbose
○ Can be specified by group = parser.add_mutually_exclusive_group() followed by group.add_argument(...) for each conflicting option
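A short sketch combining choices, short options and a mutually exclusive group. The option names and the helper is_rejected() are hypothetical; the helper only exists to show that argparse exits (after printing a usage error) when the input breaks a rule.

```python
import argparse

parser = argparse.ArgumentParser(description="demo of choices and exclusive options")
parser.add_argument("-l", "--level", help="detail level", type=int,
                    choices=[0, 1, 2], default=0)

# Options added to this group cannot be used together.
group = parser.add_mutually_exclusive_group()
group.add_argument("-q", "--quiet", action="store_true", help="print less")
group.add_argument("-v", "--verbose", action="store_true", help="print more")

args = parser.parse_args(["-l", "2", "-v"])
print(args.level, args.quiet, args.verbose)


def is_rejected(argv):
    """Return True if argparse refuses the given argument list."""
    try:
        parser.parse_args(argv)
    except SystemExit:  # argparse reports the error and exits
        return True
    return False


print(is_rejected(["-q", "-v"]))   # conflicting options
print(is_rejected(["-l", "5"]))    # 5 is not among the choices
```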
Combining positional and optional arguments
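● The two types of arguments can be combined in one parser
● On the command line, options may appear before or after the positional arguments; positional arguments are assigned in the order in which they were added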
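As an illustration, the sketch below parses the same arguments with the option placed last and then first. The names base, exponent and --verbose are assumptions made for this example.

```python
import argparse

parser = argparse.ArgumentParser(description="raise a base to a power")
parser.add_argument("base", type=int, help="the base")
parser.add_argument("exponent", type=int, help="the exponent")
parser.add_argument("-v", "--verbose", action="store_true",
                    help="explain the result")

# The option may come after or before the positional arguments.
option_last = parser.parse_args(["2", "5", "-v"])
option_first = parser.parse_args(["-v", "2", "5"])

print(option_last == option_first)
print(option_last.base ** option_last.exponent)
```

Swapping the two positional values, however, would change the result, since the first one always becomes base.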
Pattern Matching (re module)
Regular Expressions
● A regular expression (RE) is a sequence of characters to be used as a search pattern. It defines a set of strings that match it.
● The re module has functions that check if a string matches a regular expression.
● Unicode strings (str) or 8-bit strings (bytes) can be the pattern or the string to be searched, but the two types cannot be mixed in one call.
● The backslash character ('\') is used for special characters. E.g., \n is the new line character.
● r"..." is a raw string: backslashes in it are not treated specially, so r"\n" is two normal characters ('\' and 'n').
● The re module has many functions for pattern matching.
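The difference between an ordinary and a raw string literal, and why raw strings are convenient for patterns, can be seen in this small sketch. The sample path is made up.

```python
import re

plain = "\n"    # one character: a newline
raw = r"\n"     # two characters: a backslash followed by 'n'
print(len(plain), len(raw))

# To match a literal backslash, the regex needs an escaped backslash.
# As a raw string that is r"\\"; without the r prefix it would be "\\\\".
path = "C:\\temp"                     # the text C:\temp
found = re.search(r"\\temp", path)
print(found.group())

# bytes patterns work too, as long as the searched data is also bytes.
bytes_match = re.match(b"Tom", b"Tom is a cat")
print(bytes_match.group())
```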
Compiling Regular Expressions vs Module level functions
● Compiling can be done to create pattern objects from strings. Methods can then be called on this object.
pat = re.compile("Tom")
pat.match("Tom is a cat")
● Alternatively, the pattern can be passed as a string directly to a module level function.
re.match("Tom", "Tom is a cat")
Some Compilation Flags:
1. DOTALL, S : Makes DOT match any character including the newline
2. IGNORECASE, I : Case-insensitive matching
3. MULTILINE, M : ^ and $ match at the beginning and end of each line, not only of the whole string
4. ASCII, A : \w, \W, \b, \B, \d, \D, \s and \S perform ASCII-only matching instead of full Unicode matching
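The effect of each flag can be sketched with a two-line sample text. The text and patterns are illustrative only.

```python
import re

text = "Tom is a cat.\ntom chases Jerry."

# IGNORECASE: a compiled pattern that finds both spellings.
pat = re.compile("tom", re.IGNORECASE)
both = pat.findall(text)                            # ['Tom', 'tom']

# MULTILINE: ^ also matches right after each newline.
one_line = re.findall("^tom", text, re.I)            # ['Tom']
each_line = re.findall("^tom", text, re.I | re.M)    # ['Tom', 'tom']

# DOTALL: without it the dot cannot cross the newline.
no_cross = re.search("cat.+tom", text)               # None
cross = re.search("cat.+tom", text, re.DOTALL)       # matches 'cat.\ntom'

# ASCII: \w stops treating accented letters as word characters.
unicode_words = re.findall(r"\w+", "caf\u00e9")          # one word
ascii_words = re.findall(r"\w+", "caf\u00e9", re.ASCII)  # ['caf']

print(both, one_line, each_line, no_cross, cross.group(), ascii_words)
```

Flags are combined with the | operator, as in re.I | re.M above.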
Regular Expressions
● search(pattern, string) returns a Match object if there is a match to pattern anywhere in string, and None otherwise.
● match(pattern, string) returns a Match object if there is a match to pattern at the beginning of string, and None otherwise.
● findall(pattern, string) returns a list of all non-overlapping matches.
● split(pattern, string) returns a list with the string split at each match.
● sub(pattern, repl, string) returns a new string in which every substring that matches pattern is replaced with the replacement repl.
● subn(pattern, repl, string) does the same as sub(), but returns a tuple (new string, number of replacements).
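The six functions above can be compared on one sample string. The string and patterns below are chosen only for illustration.

```python
import re

text = "one 1, two 22, three 333"

first_digits = re.search(r"[0-9]+", text).group()   # '1' (found anywhere)
at_start = re.match(r"[0-9]+", text)                # None (text starts with 'o')
all_digits = re.findall(r"[0-9]+", text)            # ['1', '22', '333']
pieces = re.split(r", ", text)                      # ['one 1', 'two 22', 'three 333']
masked = re.sub(r"[0-9]+", "#", text)               # 'one #, two #, three #'
masked_with_count = re.subn(r"[0-9]+", "#", text)   # ('one #, two #, three #', 3)

print(first_digits, at_start, all_digits)
print(pieces)
print(masked, masked_with_count)
```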
Regular Expressions (cont.)
● Concatenation of two regular expressions gives another regular expression.
● If string p matches regex A and q matches regex B, then pq matches AB
● Regular expressions may contain:
○ Most ordinary characters like 'A', 'a', '0' (simple RE that matches itself)
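The concatenation rule can be checked with a tiny sketch. The patterns A and B and the strings p and q are arbitrary examples; re.fullmatch() is used so that the whole string has to match.

```python
import re

A = "[0-9]+"    # p matches A
B = "[a-z]+"    # q matches B
p, q = "2024", "abc"

p_matches_a = re.fullmatch(A, p) is not None
q_matches_b = re.fullmatch(B, q) is not None

# The concatenated pattern AB matches the concatenated string pq.
concatenated = re.fullmatch(A + B, p + q)
print(p_matches_a, q_matches_b, concatenated.group())
```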
Match Object Methods
● Match objects can be queried for the matched string using the following methods
○ group() : Return the string matched by the RE
○ start() : Return the starting position of the match
○ end() : Return the ending position of the match (one past the last matched character)
○ span() : Return a tuple containing the (start, end) positions of the match
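A small sketch of the four methods, reusing the "Tom is a cat" sample from the earlier slide. Since search() returns None when nothing matches, the result is checked before it is queried.

```python
import re

text = "Tom is a cat"
m = re.search("cat", text)

if m is not None:
    print(m.group())   # the matched text
    print(m.start())   # index of the first matched character
    print(m.end())     # index just past the last matched character
    print(m.span())    # (start, end) as a tuple

# Slicing with the span gives back the matched text.
sliced = text[m.start():m.end()]
```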
Metacharacters
● Also called special characters. They represent classes of ordinary characters or affect how the regex around them is interpreted.
● . (dot) matches any character except newline. To match any character including newline, use the DOTALL flag.
● ^ (caret) matches the start of the string (and, with the MULTILINE flag, the start of each line).
● $ (dollar) matches the end of the string or just before the newline at the end of the string
● * (asterisk) matches 0 or more repetitions of the RE before it
● + (plus) matches 1 or more repetitions of the RE before it
● ? (question mark) matches 0 or 1 repetitions of the RE before it
● {m} matches exactly m repetitions of the RE preceding it.
● {m,n} matches m to n repetitions of the RE preceding it.
● \ (backslash) either escapes special characters so that they can be matched literally (e.g., \* matches '*') or indicates a special sequence
● [] indicates a set of characters. To complement the set, use ^ at the beginning.
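Each metacharacter above can be tried out with findall(). The sample strings are made up; the comments show the list each call returns.

```python
import re

dot = re.findall(r"c.t", "cat cot c\nt")             # ['cat', 'cot']
start = re.findall(r"^[a-z]+", "first second")        # ['first']
end = re.findall(r"[a-z]+$", "first second\n")        # ['second']
star = re.findall(r"ab*", "a ab abbb")                # ['a', 'ab', 'abbb']
plus = re.findall(r"ab+", "a ab abbb")                # ['ab', 'abbb']
optional = re.findall(r"ab?", "a ab abbb")            # ['a', 'ab', 'ab']
exact = re.findall(r"[0-9]{3}", "12 345 6789")        # ['345', '678']
ranged = re.findall(r"[0-9]{2,3}", "1 22 333 4444")   # ['22', '333', '444']
escaped = re.findall(r"\*", "2*3*4")                  # ['*', '*']
vowels = re.findall(r"[aeiou]", "regex")              # ['e', 'e']
others = re.findall(r"[^aeiou]", "regex")             # ['r', 'g', 'x']

print(dot, start, end, star, plus, optional)
print(exact, ranged, escaped, vowels, others)
```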
Special Sequences
● Like special characters, they represent classes of ordinary characters or affect how the regex around them is interpreted. They start with '\'.
● \A : Matches only at the beginning of the string. E.g., re.match(r"\Aap", "apples")
● \Z : Matches only at the end of the string. E.g., re.search(r"es\Z", "apples") (search() is needed here because match() only tries the beginning of the string)
● \b : Matches at the beginning or at the end of a word. E.g., re.match(r"\bor", "oranges")
● \B : Matches when NOT at the beginning or at the end of a word. E.g., re.search(r"\Ban", "oranges")
● \d : Matches any decimal digit (0 to 9)
● \D : Matches any character that is NOT a decimal digit
● \s : Matches any white space character (space, tab, newline, ...)
● \S : Matches any character that is NOT white space
● \w : Matches any word character (letters a to z and A to Z, digits 0 to 9, underscore _; for str patterns also other Unicode letters and digits unless the ASCII flag is used)
● \W : Matches any character that is NOT a word character
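The special sequences can be explored in the same way. The two sample strings are invented for this sketch; raw strings are used so that Python passes the backslashes through to the re module unchanged.

```python
import re

text = "oranges or doors"
at_start = re.search(r"\Aor", text)            # matches the 'or' of 'oranges'
at_end = re.search(r"ors\Z", text)             # matches the end of 'doors'
with_match = re.match(r"ors\Z", text)          # None: match() starts at index 0
word_start = re.findall(r"\bor", text)         # ['or', 'or']
inside_word = re.findall(r"\Bor", text)        # ['or'] (the one inside 'doors')

sample = "Room 42, floor 3"
digits = re.findall(r"\d+", sample)            # ['42', '3']
non_digits = re.findall(r"\D+", sample)        # ['Room ', ', floor ']
words = re.findall(r"\w+", sample)             # ['Room', '42', 'floor', '3']
non_words = re.findall(r"\W+", sample)         # [' ', ', ', ' ']
split_on_space = re.split(r"\s+", sample)      # ['Room', '42,', 'floor', '3']
non_spaces = re.findall(r"\S+", sample)        # ['Room', '42,', 'floor', '3']

print(word_start, inside_word, digits, non_digits)
print(words, non_words, split_on_space, non_spaces)
```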
Internet Access (urllib and smtplib packages)
urllib Package
● Contains several modules for working with Uniform Resource Locators (URLs)
○ urllib.request : module for opening and reading URLs
○ urllib.error : module with exceptions raised by urllib.request
○ urllib.parse : module for parsing URLs
urllib package (cont.)
● urllib.request Module [3]: Contains several functions and classes for opening URLs and for handling authentication, cookies and redirection. Some important ones:
○ urllib.request.urlopen() : To open a URL given as a string or as a Request object
○ urllib.request.HTTPRedirectHandler : Class to handle redirections
● urllib.parse Module [4] : Contains functions for parsing and quoting URLs.
○ urllib.parse.urlparse(urlstring, scheme='', allow_fragments=True): Parses a URL into six parts and returns a named tuple with six items, assuming the URL structure to be scheme://netloc/path;parameters?query#fragment.
■ scheme - addressing scheme of the URL (the string before ":". E.g., https)
■ netloc - network location part (the domain and subdomain, if any. E.g., "www.gitam.edu")
■ path - hierarchical path
■ params - parameters for the last path element (after ";")
■ query - query component (after "?")
● To parse the query string into components, use urllib.parse.parse_qs()
■ fragment - fragment identifier (after "#"; refers to a section within a web page. For an HTML page, the matching anchor is searched for)
○ urllib.parse.urlunparse(): constructs a URL string from a tuple as returned by urlparse()
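The parsing functions can be tried without any network access. The URL below is hypothetical (only the host name comes from the slide), so the path, parameters, query and fragment are assumptions made for illustration.

```python
from urllib.parse import urlparse, urlunparse, parse_qs

# Hypothetical URL with all six parts present.
url = "https://www.gitam.edu/courses/list;type=ug?dept=cse&year=4#syllabus"

parts = urlparse(url)
print(parts.scheme)     # https
print(parts.netloc)     # www.gitam.edu
print(parts.path)       # /courses/list
print(parts.params)     # type=ug
print(parts.query)      # dept=cse&year=4
print(parts.fragment)   # syllabus

# parse_qs() turns the query string into a dict of lists.
query = parse_qs(parts.query)
print(query)

# urlunparse() puts the six parts back together.
rebuilt = urlunparse(parts)
print(rebuilt == url)
```

Each value in the dict returned by parse_qs() is a list because a key may occur more than once in a query string.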
smtplib Module
● smtplib Module [5]: defines an SMTP client session object that can be used to send email
● class smtplib.SMTP(host='', port=0, local_hostname=None, [timeout, ]source_address=None). Important methods:
○ SMTP.connect(host='localhost', port=0)
○ SMTP.sendmail(from_addr, to_addrs, msg, mail_options=(), rcpt_options=())
○ SMTP.quit()
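One possible way to put the three methods together is sketched below. The helper names build_message() and send_message(), the addresses, and the host and port in the commented-out call are all hypothetical. The sketch also assumes a server that accepts plain, unauthenticated connections; many real servers additionally require encryption and a login, which are not covered here.

```python
import smtplib


def build_message(from_addr, to_addr, subject, body):
    """Assemble a minimal message: header lines, a blank line, then the body."""
    headers = [f"From: {from_addr}", f"To: {to_addr}", f"Subject: {subject}"]
    return "\r\n".join(headers) + "\r\n\r\n" + body


def send_message(host, port, from_addr, to_addr, subject, body):
    """Open an SMTP session, send one message, and close the session."""
    server = smtplib.SMTP()        # no host given, so nothing connects yet
    server.connect(host, port)
    try:
        msg = build_message(from_addr, to_addr, subject, body)
        server.sendmail(from_addr, [to_addr], msg)
    finally:
        server.quit()


# Left commented out because it needs a reachable SMTP server:
# send_message("localhost", 25, "alice@example.com", "bob@example.com",
#              "Hello", "Test mail")

print(build_message("alice@example.com", "bob@example.com", "Hello", "Test mail"))
```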
Zlib, bz2 and gzip Modules
● zlib allows compression and decompression.
○ zlib.compress(data, /, level=-1)
○ zlib.decompress(data, /, wbits=MAX_WBITS, bufsize=DEF_BUF_SIZE)
● bz2 allows compression into a bz2 file.
○ Just open the file and write using bz2.open(filename, "wb")
● gzip allows compression into a gzip file.
○ Just open the file and write using gzip.open(filename, "wb")
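A round trip through all three modules might look like the sketch below. The sample data and file names are made up, and a temporary directory is used so that no files are left behind.

```python
import bz2
import gzip
import os
import tempfile
import zlib

data = b"Tom is a cat. " * 100   # repetitive data compresses well

# zlib works on bytes in memory.
packed = zlib.compress(data, level=9)
restored = zlib.decompress(packed)
print(len(data), len(packed), restored == data)

# bz2 and gzip are used here through their file-like open() functions.
with tempfile.TemporaryDirectory() as folder:
    bz2_path = os.path.join(folder, "sample.bz2")
    gz_path = os.path.join(folder, "sample.gz")

    with bz2.open(bz2_path, "wb") as f:
        f.write(data)
    with gzip.open(gz_path, "wb") as f:
        f.write(data)

    # Opening with "rb" decompresses transparently while reading.
    with bz2.open(bz2_path, "rb") as f:
        from_bz2 = f.read()
    with gzip.open(gz_path, "rb") as f:
        from_gzip = f.read()

print(from_bz2 == data, from_gzip == data)
```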
Zipfile Module
● Provides methods to create, read, write, append, and list a ZIP file
● class zipfile.ZipFile(file, mode='r', compression=ZIP_STORED, allowZip64=True, compresslevel=None, *, strict_timestamps=True)
● Important methods :
○ ZipFile.open(name, mode='r', pwd=None, *, force_zip64=False): Access a member of the archive as a file-like object. Mode must be 'r' or 'w'.
○ ZipFile.write(filename, arcname=None, compress_type=None, compresslevel=None): Write the file named filename to the archive, storing it under the name arcname inside the archive.
○ ZipFile.read(name, pwd=None): Returns the contents of the member name as bytes
○ ZipFile.extract(member, path=None, pwd=None) : Extracts member from the archive to the current directory, or to path if it is given
○ ZipFile.extractall(path=None, members=None, pwd=None) : Extracts all files from the archive to the current directory, or to path if it is given
○ ZipFile.setpassword(pwd): Sets the default password used to extract encrypted files.
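A minimal sketch that creates an archive and reads it back. The file names and contents are invented, everything happens inside a temporary directory, and namelist() (not mentioned above) is used only to show what the archive contains.

```python
import os
import tempfile
import zipfile

with tempfile.TemporaryDirectory() as folder:
    # Create a small file to put into the archive.
    source = os.path.join(folder, "notes.txt")
    with open(source, "w") as f:
        f.write("Tom is a cat")

    # Write it into a new archive under a different name (arcname).
    archive = os.path.join(folder, "demo.zip")
    with zipfile.ZipFile(archive, mode="w",
                         compression=zipfile.ZIP_DEFLATED) as zf:
        zf.write(source, arcname="docs/notes.txt")

    with zipfile.ZipFile(archive) as zf:
        names = zf.namelist()                    # members of the archive
        content = zf.read("docs/notes.txt")      # whole member as bytes
        with zf.open("docs/notes.txt") as member:
            first_three = member.read(3)         # file-like access
        out_dir = os.path.join(folder, "out")
        zf.extractall(path=out_dir)
        extracted = os.path.exists(
            os.path.join(out_dir, "docs", "notes.txt"))

print(names, content, first_three, extracted)
```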
References
1) Python Documentation, Argparse Tutorial
2) Python Documentation, Regular Expression HOWTO and Regular expression operations
6) Python Documentation, zipfile