Module IV
Python Programming: Classes, inheritance, generators, standard library (part I), command line arguments, string pattern matching, internet access, data compression.

Classes concept
Python is an object-oriented programming language. Object-oriented programming (OOP) focuses on creating reusable patterns of code, in contrast to procedural programming, which focuses on explicit sequenced instructions. When working on complex programs in particular, OOP lets you reuse code and write code that is more readable, which in turn makes it more maintainable.

One of the most important concepts in OOP is the distinction between classes and objects, which are defined as follows:
Class: A blueprint created by a programmer for an object. It defines the set of attributes that will characterize any object instantiated from this class.
Object: An instance of a class. This is the realized version of the class, where the class is manifested in the program.
Classes are used to create patterns, and objects make use of those patterns.

In this tutorial, we'll go through creating classes, instantiating objects, initializing attributes with the constructor method, and working with more than one object of the same class.

Classes
Classes are like a blueprint or a prototype that you define in order to create objects. We define classes with the class keyword, much as we define functions with the def keyword. Methods are a special kind of function defined within a class. The first parameter of a method is conventionally named self; it refers to the particular object (instance) on which the method is called. self is always the first parameter, but it need not be the only one.

Objects
An object is an instance of a class.
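As a minimal sketch of the ideas above, the following defines one class and creates two objects from it. The class name Student, its attributes and the describe() method are illustrative assumptions, not part of this module's sample programs.

```python
class Student:
    """Blueprint: every Student object gets a name and a score."""

    def __init__(self, name, score):
        # The constructor runs once for each new object and
        # stores the values on that particular instance.
        self.name = name
        self.score = score

    def describe(self):
        # self refers to the object the method was called on.
        return f"{self.name} scored {self.score}"


# Two independent objects built from the same class.
first = Student("Asha", 82)
second = Student("Ravi", 67)

print(first.describe())
print(second.describe())
```

Each object keeps its own copy of name and score, so changing first.score has no effect on second.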
An object is an active entity and a class is a logical entity: an object has state, and that state changes dynamically while the program runs, whereas a class only describes the state its objects will hold. All data members and methods are public by default in Python.

The self Parameter
The self parameter is a reference to the current instance of the class, and is used to access variables that belong to that instance. It does not have to be named self; you can call it whatever you like, but it has to be the first parameter of any method in the class.

SAMPLE PROGRAM 1:
    class Cse:
        def __init__(self):
            self.name = None       # name is a public data member
            self.strength = None   # strength is a public data member

        def set(self, n, s):
            self.name = n
            self.strength = s

        def get(self):
            print(self.name, "\t", self.strength)

    # Creation of an object and calling its methods
    cs = Cse()
    cs.set('4B8', 67)
    cs.get()
    cs.name = '4B4'    # accessing a public data member directly
    cs.get()

Note: Declaration of a data member name as public, protected, private
    self.name      public data member
    self._name     protected data member (a naming convention only)
    self.__name    private data member (Python mangles the name)

Accessing data members
__init__() is the constructor. It is invoked whenever an object is created and initializes the data members of that object.

Sample Program 2:
    class Person:
        def __init__(self, name, age):
            self.name = name
            self.age = age

        def myfunc(self):
            print("Hello my name is " + self.name)

    p1 = Person("John", 36)
    p1.myfunc()

Inheritance
Inheritance is a feature of object-oriented programming. It specifies that one class acquires the properties and behaviors of a parent class. By using inheritance you can define a new class with little or no change to an existing class. The new class is known as the derived class or child class, and the class from which it inherits is called the base class or parent class. Inheritance provides re-usability of code.

Python Inheritance Terminologies
1. Superclass: The class from which attributes and methods will be inherited.
2. Subclass: The class which inherits the members from the superclass.
3. Method Overriding: Redefining in the subclass a method that was already defined in the superclass.

Inheritance example
Full example of Python inheritance:
    class User:
        name = ""

        def __init__(self, name):
            self.name = name

        def printName(self):
            print("Name = " + self.name)

    class Programmer(User):
        def __init__(self, name):
            self.name = name

        def doPython(self):
            print("Programming Python")

    brian = User("brian")
    brian.printName()

    diana = Programmer("Diana")
    diana.printName()
    diana.doPython()

The output:
    Name = brian
    Name = Diana
    Programming Python

brian is an instance of User and can only access the method printName. diana is an instance of Programmer, a class that inherits from User, and can access the methods of both Programmer and User.

Multiple Inheritance
In Python a class can inherit from more than one class. The resulting class has all the methods and attributes of its parent classes. In the example below, class C inherits from both class A and class B. An object created from class C has the methods of classes A, B and C. Keep in mind that an object created from class A or class B only has the methods and attributes of that class.
    class A:
        def A(self):
            print('A')

    class B:
        def B(self):
            print('B')

    class C(A, B):
        def C(self):
            print('C')

    obj = C()
    obj.A()
    obj.B()
    obj.C()

Generators in Python:
There is a lot of overhead in building an iterator in Python: we have to implement a class with __iter__() and __next__() methods, keep track of internal state, raise StopIteration when there are no more values to return, and so on. This is both lengthy and counter-intuitive. Generators come to the rescue in such situations. Python generators are a simple way of creating iterators. All the overhead mentioned above is handled automatically by generators in Python.
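To make the overhead described above concrete, here is a sketch that writes the same counting iterator twice: once as a class with __iter__() and __next__(), and once as a generator function. The names CountUpTo and count_up_to are hypothetical and chosen only for this illustration.

```python
class CountUpTo:
    """Hand-written iterator that produces 1, 2, ..., limit."""

    def __init__(self, limit):
        self.limit = limit
        self.current = 0          # internal state we must track ourselves

    def __iter__(self):
        return self

    def __next__(self):
        if self.current >= self.limit:
            raise StopIteration   # signal that no values are left
        self.current += 1
        return self.current


def count_up_to(limit):
    """Generator version: state and StopIteration are handled by Python."""
    current = 0
    while current < limit:
        current += 1
        yield current


print(list(CountUpTo(3)))
print(list(count_up_to(3)))
```

Both calls produce the same sequence; the generator simply needs far less code to do it.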
Simply speaking, a generator is a function that returns an object (an iterator) which we can iterate over, one value at a time.

Creating Generators in Python:
It is fairly simple to create a generator in Python. It is as easy as defining a normal function with a yield statement instead of a return statement. If a function contains at least one yield statement (it may contain other yield or return statements), it becomes a generator function. Both yield and return hand a value back from a function. The difference is that a return statement terminates the function entirely, whereas a yield statement pauses the function, saving all its state, and continues from there on successive calls.
    # A simple generator function
    def my_gen():
        n = 1
        print('This is printed first')
        # Generator function contains yield statements
        yield n

        n += 1
        print('This is printed second')
        yield n

        n += 1
        print('This is printed at last')
        yield n

Python generators with a loop:
    def rev_str(my_str):
        length = len(my_str)
        for i in range(length - 1, -1, -1):
            yield my_str[i]

    # For loop to reverse the string
    # Output:
    # o
    # l
    # l
    # e
    # h
    for char in rev_str("hello"):
        print(char)

Python Generator Expression
Simple generators can be easily created on the fly using generator expressions. They make building generators easy.
    my_list = [1, 3, 6, 10]

    # square each term using list comprehension
    # Output: [1, 9, 36, 100]
    [x**2 for x in my_list]

    # same thing can be done using generator expression
    # Output: <generator object <genexpr> at 0x0000000002EBDAF8>
    (x**2 for x in my_list)

Why are generators used in Python?
1. Easy to Implement
Generators can be implemented in a clear and concise way compared to their iterator class counterpart.
2. Memory Efficient
A normal function that returns a sequence creates the entire sequence in memory before returning the result. This is overkill if the number of items in the sequence is very large. A generator implementation of such a sequence is memory friendly and is preferred, since it produces only one item at a time.
3. Represent Infinite Stream
Generators are an excellent medium for representing an infinite stream of data. Infinite streams cannot be stored in memory, and since generators produce only one item at a time, they can represent an infinite stream of data.
4. Pipelining Generators
Generators can be used to pipeline a series of operations.

Internet connection in Python:
Many Python modules and packages allow developers to perform tasks related to the internet. Here we look only at urllib and smtplib, the important modules for basic internet usage.

urllib
urllib is a Python module that can be used for opening URLs. It defines functions and classes that help with URL actions. With Python you can access and retrieve data from the internet, such as XML, HTML and JSON, and work with this data directly. Python 2 also provided urllib2, a module similar to urllib; in Python 3 this functionality lives in urllib.request. The urlopen function opens the given link, and the response it returns is an object that works as a context manager. The response object has methods such as getcode(). For example, urlopen can be used to open the YouTube link, and getcode() then gives the standard HTTP response code: 200 if the request was processed successfully, or another code that describes the situation.

SMTP
The smtplib module defines an SMTP client session object that can be used to send mail to any Internet machine with an SMTP or ESMTP listener daemon. SMTP stands for Simple Mail Transfer Protocol. The smtplib module is useful for communicating with mail servers: sending mail is done with Python's smtplib using an SMTP server. The steps below show how the module is used. First, we have to create an SMTP object for the SMTP server that is going to be used.
This is done with the call smtplib.SMTP('server address', port). Next, we log in to the sender's account on the server, which is done with object.login('mailaddress', 'password'). The only thing left now is to send the mail, which is done with object.sendmail('senders mail', 'receivers mail', message).

Data Compression:
The zlib compression format is free to use and is not covered by any patent, so you can safely use it in commercial products as well. It is a lossless compression format (which means you don't lose any data between compression and decompression), and it has the advantage of being portable across different platforms. Another important benefit of this compression mechanism is that it adds only a small, bounded overhead even when the data cannot be compressed.

The main use of the zlib library is in applications that require compression and decompression of arbitrary data, whether it be a string, structured in-memory content, or files. The most important functionalities included in this library are compression and decompression. Both can be done as one-off operations, or by splitting the data into chunks as you would with a stream of data. Both modes of operation are explained in this article.

One of the best things, in my opinion, about the zlib library is that it is compatible with the gzip file format/tool (which is also based on DEFLATE), one of the most widely used compression applications on Unix systems.

Compression
Compressing a String of Data
The zlib library provides us with the compress function, which can be used to compress a string of data. The syntax of this function is very simple, taking only two arguments:
    compress(data, level=-1)
Here the argument data contains the bytes to be compressed, and level is an integer that can take the values -1 or 0 to 9. This parameter determines the level of compression, where level 1 is the fastest and yields the lowest level of compression.
Level 9 is the slowest, yet it yields the highest level of compression. The value -1 represents the default, which is level 6. The default value strikes a balance between speed and compression. Level 0 yields no compression.

An example of using the compress function on a simple string is shown below:
    import zlib
    import binascii

    data = b'Hello world'
    compressed_data = zlib.compress(data, 2)

    print('Original data: ' + data.decode())
    print('Compressed data: ' + binascii.hexlify(compressed_data).decode())

And the result is as follows:
    $ python compress_str.py
    Original data: Hello world
    Compressed data: 785ef348cdc9c95728cf2fca49010018ab043d
Figure 1

If we change the level to 0 (no compression), then line 5 becomes:
    compressed_data = zlib.compress(data, 0)

And the new result is:
    $ python compress_str.py
    Original data: Hello world
    Compressed data: 7801010b00f4ff48656c6c6f20776f726c6418ab043d
Figure 2

You may notice a few differences when comparing the outputs for compression levels 0 and 2. With a level of 2 we get a hex string of length 38, whereas with a level of 0 we get a hex string of length 44. This difference in length is due to the lack of compression at level 0.

If you don't format the level 0 output as hexadecimal, as I've done in this example, and view the raw data instead, you'll probably notice that the input string is still readable even after being "compressed", although it has a few extra formatting bytes around it.

Compressing Large Data Streams
Large data streams can be managed with the compressobj() function, which returns a compression object. The syntax is as follows:
    compressobj(level=-1, method=DEFLATED, wbits=15, memLevel=8, strategy=Z_DEFAULT_STRATEGY[, zdict])
Aside from the missing data parameter, the main difference between the arguments of this function and those of compress() is the wbits argument, which controls the window size and whether or not the header and trailer are included in the output.
The possible values for wbits are:
    Value         Window size logarithm      Output
    +9 to +15     Base 2                     Includes zlib header and trailer
    -9 to -15     Absolute value of wbits    No header and trailer
    +25 to +31    Low 4 bits of the value    Includes gzip header and trailing checksum
Table 1

The method argument represents the compression algorithm used. Currently the only possible value is DEFLATED, which is the only method defined in RFC 1950. The strategy argument relates to compression tuning. Unless you really know what you're doing, I'd recommend leaving it at the default value.

The following code shows how to use the compressobj() function:
    import zlib
    import binascii

    data = b'Hello world'

    # wbits = -15 produces a raw stream with no header or trailer
    compress = zlib.compressobj(zlib.Z_DEFAULT_COMPRESSION, zlib.DEFLATED, -15)
    compressed_data = compress.compress(data)
    compressed_data += compress.flush()

    print('Original: ' + data.decode())
    print('Compressed data: ' + binascii.hexlify(compressed_data).decode())

After running this code, the result is:
    $ python compress_obj.py
    Original: Hello world
    Compressed data: f348cdc9c95728cf2fca490100
Figure 3

As we can see from the figure above, the phrase "Hello world" has been compressed. Typically this method is used for compressing data streams that won't fit into memory at once. Although this example does not have a very large stream of data, it serves the purpose of showing the mechanics of the compressobj() function.

You may also be able to see how it would be useful in a larger application, in which you can configure the compression once and then pass the compression object around to other methods or modules, which use it to compress chunks of data in series. The same applies when you have a data stream to compress: instead of accumulating all of the data in memory, you call compress.compress(data) on each chunk, move on to the next chunk while the previous one is cleaned up by garbage collection, and call compress.flush() once the stream ends.
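The chunk-by-chunk pattern described above can be sketched as follows. This is an illustration under stated assumptions: the helper name compress_chunks and the sample chunks are hypothetical, and a real application would read its chunks from a file or socket instead of a list.

```python
import zlib


def compress_chunks(chunks, level=6):
    """Yield compressed pieces for an iterable of bytes chunks.

    Hypothetical helper: one compression object is shared by all
    chunks, and flush() is called once after the last chunk.
    """
    compressor = zlib.compressobj(level)
    for chunk in chunks:
        piece = compressor.compress(chunk)
        if piece:                 # compress() may buffer and return b''
            yield piece
    yield compressor.flush()      # emit whatever is still buffered


# Stand-in for a data stream that arrives piece by piece.
chunks = [b"Hello ", b"world, ", b"this is a stream. "] * 100

compressed = b"".join(compress_chunks(chunks))
original = b"".join(chunks)
restored = zlib.decompress(compressed)

print(len(original), len(compressed), restored == original)
```

Because the sketch uses the default wbits value, the output carries a zlib header and trailer and can be read back with a plain zlib.decompress() call.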
Compressing a File
We can also use the compress() function to compress the data in a file. The syntax is the same as in the first example. In the example below we compress a PNG image file named "logo.png" (which, I should note, is already a compressed version of the original raw image).

The example code is as follows:
    import zlib

    # Read the whole file as bytes
    with open('logo.png', 'rb') as f:
        original_data = f.read()

    compressed_data = zlib.compress(original_data, zlib.Z_BEST_COMPRESSION)

    # Fraction of the original size that was saved
    compress_ratio = (float(len(original_data)) - float(len(compressed_data))) / float(len(original_data))

    print('Compressed: %d%%' % (100.0 * compress_ratio))

In the above code, the zlib.compress(...) line uses the constant Z_BEST_COMPRESSION, which, as the name suggests, gives us the best compression level this algorithm has to offer. The next statement then calculates how much was saved: the difference between the original and compressed lengths, divided by the original length.

The result is as follows:
    $ python compress_file.py
    Compressed: 13%
Figure 4

As we can see, the file was compressed by 13%. The only difference between this example and our first one is the source of the data. However, I think it is important to show, so you can get an idea of what kind of data can be compressed, whether it be just an ASCII string or binary image data. Simply read in your data from the file as you normally would and call the compress function.

Saving Compressed Data to a File
The compressed data can also be saved to a file for later use. The example below shows how to save some compressed text into a file:
    import zlib

    my_data = b'Hello world'
    compressed_data = zlib.compress(my_data, 2)

    # Compressed data is binary, so open the file in binary mode
    f = open('outfile.txt', 'wb')
    f.write(compressed_data)
    f.close()

The above example compresses our simple "Hello world" string and saves the compressed data into a file named "outfile.txt".
The "outfile.txt" file, when opened with our text editor, looks as follows:
Figure 5

Decompression
Decompressing a String of Data
A compressed string of data can be easily decompressed by using the decompress() function. The syntax is as follows:
    decompress(data, wbits=MAX_WBITS, bufsize=DEF_BUF_SIZE)
This function decompresses the bytes in the data argument. The wbits argument manages the size of the history buffer, and its default value matches the largest window size. It also determines whether a header and trailer are expected in the compressed data. The possible values are:
    Value                           Window size logarithm      Input
    +8 to +15                       Base 2                     Includes zlib header and trailer
    -8 to -15                       Absolute value of wbits    Raw stream with no header and trailer
    +24 to +31 = 16 + (8 to 15)     Low 4 bits of the value    Includes gzip header and trailer
    +40 to +47 = 32 + (8 to 15)     Low 4 bits of the value    zlib or gzip format
Table 2

The bufsize argument gives the initial size of the output buffer. The important aspect of this parameter is that it doesn't need to be exact: if more buffer space is needed, it is increased automatically.

The following example shows how to decompress the string of data compressed in our previous example:
    import zlib

    data = b'Hello world'
    compressed_data = zlib.compress(data, 2)
    decompressed_data = zlib.decompress(compressed_data)

    print('Decompressed data: ' + decompressed_data.decode())

The result is as follows:
    $ python decompress_str.py
    Decompressed data: Hello world
Figure 5

Decompressing Large Data Streams
Decompressing big data streams may require memory management because of the size or source of your data. You may not be able to use all of the available memory for this task (or you may not have enough memory), so the decompressobj() function allows you to divide a stream of data into several chunks which you can decompress separately.
The syntax of the decompressobj() function is as follows:
    decompressobj(wbits=15[, zdict])
This function returns a decompression object, which is what you use to decompress the individual chunks of data. The wbits argument has the same characteristics as in the decompress() function explained previously.

The following code shows how to decompress a big stream of data that is stored in a file. First, the program creates a file named "compressed.dat", which contains the compressed data. Note that the data is compressed using a wbits value of +15, which ensures that a header and a trailer are written. The file is then decompressed in chunks. Again, in this example the file doesn't contain a massive amount of data, but it nevertheless serves the purpose of explaining the buffer concept.

The code is as follows:
    import zlib

    data = b'Hello world'
    compress = zlib.compressobj(zlib.Z_DEFAULT_COMPRESSION, zlib.DEFLATED, +15)
    compressed_data = compress.compress(data)
    compressed_data += compress.flush()

    print('Original: ' + data.decode())
    print('Compressed data:', compressed_data)

    # Compressed data is binary, so write it in binary mode
    f = open('compressed.dat', 'wb')
    f.write(compressed_data)
    f.close()

    CHUNKSIZE = 1024

    data2 = zlib.decompressobj()
    my_file = open('compressed.dat', 'rb')
    buf = my_file.read(CHUNKSIZE)
    decompressed_data = b''

    # Decompress stream chunks, accumulating the output
    while buf:
        decompressed_data += data2.decompress(buf)
        buf = my_file.read(CHUNKSIZE)

    decompressed_data += data2.flush()

    print('Decompressed data: ' + decompressed_data.decode())
    my_file.close()

After running the above code, we obtain the following results:
    $ python decompress_data.py
    Original: Hello world
    Compressed data: b'x\x9c\xf3H\xcd\xc9\xc9W(\xcf/\xcaI\x01\x00\x18\xab\x04='
    Decompressed data: Hello world
Figure 6

Decompressing Data from a File
The compressed data contained in a file can be easily decompressed, as you've seen in previous examples.
This example is very similar to the previous one in that we're decompressing data that originates from a file, except that in this case we go back to the one-off decompress function, which decompresses the data in a single call. This is useful when your data is small enough to fit easily in memory.

This can be seen from the following example:
    import zlib

    with open('compressed.dat', 'rb') as f:
        compressed_data = f.read()

    decompressed_data = zlib.decompress(compressed_data)
    print(decompressed_data.decode())

The above program opens the file "compressed.dat" created in a previous example, which contains the compressed "Hello world" string. Once the compressed data is retrieved and stored in the variable compressed_data, the program decompresses the stream and shows the result on the screen. As the file contains a small amount of data, the example uses the decompress() function. However, as the previous example shows, we could also decompress the data using the decompressobj() function.

After running the program we get the following result:
    $ python decompress_file.py
    Hello world
Figure 7

String Pattern Matching:
Regular expressions are tiny programs that process text. We access regular expressions through the re library and run these little programs by calling functions such as re.match() and re.search(). More advanced methods like groupdict() can process groups. findall() handles multiple matches and returns a list.

Regular Expression Patterns
Except for the control characters (+ ? . * ^ $ ( ) [ ] { } | \), all characters match themselves. You can escape a control character by preceding it with a backslash.
The following table lists the regular expression syntax that is available in Python:
    Sr.No.  Pattern      Description
    1       ^            Matches beginning of line.
    2       $            Matches end of line.
    3       .            Matches any single character except newline. Using the re.DOTALL flag allows it to match newline as well.
    4       [...]        Matches any single character in brackets.
    5       [^...]       Matches any single character not in brackets.
    6       re*          Matches 0 or more occurrences of preceding expression.
    7       re+          Matches 1 or more occurrences of preceding expression.
    8       re?          Matches 0 or 1 occurrence of preceding expression.
    9       re{n}        Matches exactly n occurrences of preceding expression.
    10      re{n,}       Matches n or more occurrences of preceding expression.
    11      re{n,m}      Matches at least n and at most m occurrences of preceding expression.
    12      a|b          Matches either a or b.
    13      (re)         Groups regular expressions and remembers matched text.

Character classes
    Sr.No.  Example        Description
    1       [Pp]ython      Match "Python" or "python"
    2       rub[ye]        Match "ruby" or "rube"
    3       [aeiou]        Match any one lowercase vowel
    4       [0-9]          Match any digit; same as [0123456789]
    5       [a-z]          Match any lowercase ASCII letter
    6       [A-Z]          Match any uppercase ASCII letter
    7       [a-zA-Z0-9]    Match any of the above
    8       [^aeiou]       Match anything other than a lowercase vowel
    9       [^0-9]         Match anything other than a digit

Special Character Classes
    Sr.No.  Example  Description
    1       .        Match any character except newline
    2       \d       Match a digit: [0-9]
    3       \D       Match a nondigit: [^0-9]
    4       \s       Match a whitespace character: [ \t\r\n\f]
    5       \S       Match nonwhitespace: [^ \t\r\n\f]
    6       \w       Match a single word character: [A-Za-z0-9_]
    7       \W       Match a nonword character: [^A-Za-z0-9_]

Repetition Cases
    Sr.No.  Example    Description
    1       ruby?      Match "rub" or "ruby": the y is optional
    2       ruby*      Match "rub" plus 0 or more ys
    3       ruby+      Match "rub" plus 1 or more ys
    4       \d{3}      Match exactly 3 digits
    5       \d{3,}     Match 3 or more digits
    6       \d{3,5}    Match 3, 4, or 5 digits

What is a Regular Expression?
It's a string pattern written in a compact syntax that allows us to quickly check whether a given string matches or contains a given pattern. The power of regular expressions is that they can specify patterns, not just fixed characters.

Basic patterns
    a, X, 9           ordinary characters just match themselves exactly.
    . ^ $ * + ? { [ ] \ | ( )    meta-characters with special meanings (see below)
    . (a period)      matches any single character except newline '\n'
    \w                matches a "word" character: a letter or digit or underbar [a-zA-Z0-9_]. It only matches a single character, not a whole word.
    \W                matches any non-word character.
    \w+               matches one or more word characters
    \b                boundary between word and non-word
    \s                matches a single whitespace character: space, newline, return, tab, form feed
    \S                matches any non-whitespace character.
    \t, \n, \r        tab, newline, return
    \D                matches anything but a digit
    \d                matches a decimal digit [0-9]
    \d{1,5}           matches between 1 and 5 digits in a row
    {n}               \d{5} matches exactly 5 digits in a row
    ^                 matches the start of the string
    $                 matches the end of the string
    *                 matches 0 or more repetitions
    ?                 matches 0 or 1 occurrences of whatever precedes it
Use \. to match a period or \\ to match a backslash. If you are unsure whether a character has special meaning, such as '@', you can put a backslash in front of it, \@, to make sure it is treated just as a character.

re.findall
findall() is probably the single most powerful function in the re module, and we will use it in this script. In the example below we create a string that contains text with several email addresses. We then create a variable (emails) that holds a list of all the email strings found. Lastly, we use a for loop so that we can do something with each email string.
    import re

    text = 'purple alice@google.com, blah monkey bob@abc.com blah dishwasher'

    ## Here re.findall() returns a list of all the found email strings
    emails = re.findall(r'[\w.-]+@[\w.-]+', text)   ## ['alice@google.com', 'bob@abc.com']
    for email in emails:
        # do something with each found email string
        print(email)

We can also apply this to files. Rather than looping over the lines of a file yourself, you can feed the whole file text into findall() and let it return a list of all the matches in a single step. read() returns the whole text of a file in a single string.
    import re

    # Open file
    f = open('test.txt', 'r')
    # Feed the file text into findall(); it returns a list of all the found strings
    strings = re.findall(r'some pattern', f.read())
    f.close()

re.search
The re.search() function takes a regular expression pattern and a string and searches for that pattern within the string. The syntax is re.search(pattern, string), where:
    pattern    regular expression to be matched.
    string     the string which is searched to match the pattern anywhere in the string.
It searches for the first occurrence of the pattern within the string, with optional flags. If the search is successful, search() returns a match object; otherwise it returns None. Therefore, the search is usually immediately followed by an if-statement to test whether it succeeded. It is common to put an 'r' at the start of the pattern string. This designates a Python "raw" string, which passes backslashes through without change and is very handy for regular expressions.

This example searches for the pattern 'word:' followed by a 3 letter word. The code match = re.search(pat, text) stores the search result in a variable named "match". The if-statement then tests the match: if it is true, the search succeeded and match.group() is the matching text (e.g. 'word:cat'). If the match is false (None), the search did not succeed and there is no matching text.
    import re

    text = 'an example word:cat!!'
    match = re.search(r'word:\w\w\w', text)
    # If-statement after search() tests if it succeeded
    if match:
        print('found', match.group())   ## 'found word:cat'
    else:
        print('did not find')

As you can see in the example below, I have used the | operator, which searches for any one of the patterns I specify.
    import re

    programming = ["Python", "Perl", "PHP", "C++"]
    pat = "^B|^P|i$|H$"
    for lang in programming:
        if re.search(pat, lang, re.IGNORECASE):
            print(lang, "FOUND")
        else:
            print(lang, "NOT FOUND")

The output of the above script will be:
    Python FOUND
    Perl FOUND
    PHP FOUND
    C++ NOT FOUND

re.sub
The re.sub() function in the re module can be used to replace substrings. The syntax is re.sub(pattern, repl, string), which replaces the matches of pattern in string with repl and returns the new string. In this example, I replace all occurrences of the pattern ("cool") in the string (text) with repl ("good").
    import re

    text = "Python for beginner is a very cool website"
    text2 = re.sub("cool", "good", text)
    print(text2)

Here is another example (taken from Google's Python class) which searches for all the email addresses and changes them to keep the user (\1) but use yo-yo-dyne.com as the host.
    import re

    text = 'purple alice@google.com, blah monkey bob@abc.com blah dishwasher'
    ## re.sub(pat, replacement, text) -- returns new string with all replacements,
    ## \1 is group(1), \2 group(2) in the replacement
    print(re.sub(r'([\w.-]+)@([\w.-]+)', r'\1@yo-yo-dyne.com', text))
    ## purple alice@yo-yo-dyne.com, blah monkey bob@yo-yo-dyne.com blah dishwasher

re.compile
With the re.compile() function we can compile a pattern into a pattern object, which has methods for various operations such as searching for pattern matches or performing string substitutions. Let's see two examples using re.compile().

The first example checks that the input from the user contains only letters, spaces or periods (no digits). Any other character is not allowed.
    import re

    # Matches any character that is NOT a letter, whitespace or period
    name_check = re.compile(r"[^A-Za-z\s.]")
    name = input("Please, enter your name: ")
    while name_check.search(name):
        print("Please enter your name correctly!")
        name = input("Please, enter your name: ")

The second example checks that the input from the user contains only numbers, parentheses, spaces or hyphens (no letters). Any other character is not allowed.
    import re

    # Matches any character that is NOT a digit, whitespace, hyphen or parenthesis
    phone_check = re.compile(r"[^0-9\s\-()]")
    phone = input("Please, enter your phone: ")
    while phone_check.search(phone):
        print("Please enter your phone correctly!")
        phone = input("Please, enter your phone: ")

The output of the above script will be:
    Please, enter your phone: s
    Please enter your phone correctly!
It will continue to ask until the input contains only digits, spaces, hyphens and parentheses.

Find Email Domain in Address
Let's end this article about regular expressions in Python with a neat script I found on stackoverflow.
    @        scan till you see this character
    [\w.]    a set of characters to potentially match: \w is all alphanumeric characters, and the trailing period . adds to that set of characters.
    +        one or more of the previous set.
Because this regex matches the period character and every alphanumeric character after an @, it will match email domains even in the middle of sentences.
    import re

    s = 'My name is Conrad, and blahblah@gmail.com is my email.'
    domain = re.search(r"@[\w.]+", s)
    print(domain.group())

outputs:
    @gmail.com
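The email patterns used in this section can also be combined with named groups and the groupdict() method mentioned earlier. The sketch below is only an illustration: the group names user and domain and the helper split_emails are assumptions made for this example, not part of the scripts above.

```python
import re

# Named groups let each part of the match be looked up by name.
EMAIL = re.compile(r"(?P<user>[\w.-]+)@(?P<domain>[\w.-]+)")


def split_emails(text):
    """Return one {'user': ..., 'domain': ...} dict per address found."""
    return [match.groupdict() for match in EMAIL.finditer(text)]


text = "purple alice@google.com, blah monkey bob@abc.com blah dishwasher"
for parts in split_emails(text):
    print(parts["user"], parts["domain"])
```

Compiling the pattern once and reusing it follows the re.compile approach shown above, and finditer() yields match objects, so groupdict() is available for every address.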