A Python Book
Author: Dave Kuhlman
Address: dkuhlman@rexx.com http://www.rexx.com/~dkuhlman
Revision: 1.1a
Date: April 22, 2012
Copyright: Copyright (c) 2009 Dave Kuhlman. All Rights Reserved. This document is subject to the provisions of the Open Source MIT License http://www.opensource.org/licenses/mit-license.php.
Abstract: This document is a self-learning document for a course in Python programming. This course contains (1) a part for beginners, (2) a discussion of several advanced topics that are of interest to Python programmers, and (3) a Python workbook with lots of exercises.

Contents 1 Part 1 -- Beginning Python 1.1 Introduction -- Python 101 -- Beginning Python 1.1.1 Important Features of Python 1.1.2 Where to Go For Additional help 1.2 Interactive Python 1.3 Lexical matters 1.3.1 Lines 1.3.2 Names and Tokens 1.3.3 Blocks and Indentation 1.3.4 Doc Strings 1.3.5 Operators 1.3.6 Also See 1.3.7 Code Evaluation 1.4 Built-in Data Types 1.4.1 Strings 1.4.1.1 What strings are 1.4.1.2 When to use strings 1.4.1.3 How to use strings 1.4.2 Sequences -- Lists and Tuples 1.4.2.1 What sequences are 1.4.2.2 When to use sequences 1.4.2.3 How to use sequences 1.4.3 Dictionaries 1.4.3.1 What dictionaries are 1.4.3.2 When to use dictionaries 1.4.3.3 How to use dictionaries 1.4.4 Files 1.4.4.1 What files are 1.4.4.2 When to use files 1.4.4.3 How to use files 1.4.4.4 Reading Text Files 1.5 Simple Statements 1.5.1 print statement 1.5.2 Assignment statement 1.5.3 import statement 1.5.4 assert statement 1.5.5 global statement 1.6 Compound statements -- Control Structures 1.6.1 if: statement 1.6.2 for: statement 1.6.2.1 The for: statement and unpacking 1.6.3 while: statement 1.6.4 try:except: and raise -- Exceptions 1.7 Organization 1.7.1 Functions 1.7.1.1 A basic function 1.7.1.2 A function with default arguments 1.7.1.3 Argument lists and keyword argument lists 1.7.1.4 Calling a function with keyword arguments 1.7.2 Classes and instances 1.7.2.1 A basic class 1.7.2.2 Inheritance 1.7.2.3 Class data 1.7.2.4 Static methods and class methods 1.7.2.5 Properties 1.7.3 Modules 1.7.4 Packages
converted by Web2PDFConvert.com
1.8 Acknowledgements and Thanks 1.9 See Also 2 Part 2 -- Advanced Python 2.1 Introduction -- Python 201 -- (Slightly) Advanced Python Topics 2.2 Regular Expressions 2.2.1 Defining regular expressions 2.2.2 Compiling regular expressions 2.2.3 Using regular expressions 2.2.4 Using match objects to extract a value 2.2.5 Extracting multiple items 2.2.6 Replacing multiple items 2.3 Iterator Objects 2.3.1 Example - A generator function 2.3.2 Example - A class containing a generator method 2.3.3 Example - An iterator class 2.3.4 Example - An iterator class that uses yield 2.3.5 Example - A list comprehension 2.3.6 Example - A generator expression 2.4 Unit Tests 2.4.1 Defining unit tests 2.4.1.1 Create a test class. 2.5 Extending and embedding Python 2.5.1 Introduction and concepts 2.5.2 Extension modules 2.5.3 SWIG 2.5.4 Pyrex 2.5.5 SWIG vs. Pyrex 2.5.6 Cython 2.5.7 Extension types 2.5.8 Extension classes 2.6 Parsing 2.6.1 Special purpose parsers 2.6.2 Writing a recursive descent parser by hand 2.6.3 Creating a lexer/tokenizer with Plex 2.6.4 A survey of existing tools 2.6.5 Creating a parser with PLY 2.6.6 Creating a parser with pyparsing 2.6.6.1 Parsing comma-delimited lines 2.6.6.2 Parsing functors 2.6.6.3 Parsing names, phone numbers, etc. 
2.6.6.4 A more complex example 2.7 GUI Applications 2.7.1 Introduction 2.7.2 PyGtk 2.7.2.1 A simple message dialog box 2.7.2.2 A simple text input dialog box 2.7.2.3 A file selection dialog box 2.7.3 EasyGUI 2.7.3.1 A simple EasyGUI example 2.7.3.2 An EasyGUI file open dialog example 2.8 Guidance on Packages and Modules 2.8.1 Introduction 2.8.2 Implementing Packages 2.8.3 Using Packages 2.8.4 Distributing and Installing Packages 2.9 End Matter 2.9.1 Acknowledgements and Thanks 2.9.2 See Also 3 Part 3 -- Python Workbook 3.1 Introduction 3.2 Lexical Structures 3.2.1 Variables and names 3.2.2 Line structure 3.2.3 Indentation and program structure 3.3 Execution Model 3.4 Built-in Data Types 3.4.1 Numbers 3.4.1.1 Literal representations of numbers 3.4.1.2 Operators for numbers 3.4.1.3 Methods on numbers 3.4.2 Lists 3.4.2.1 Literal representation of lists 3.4.2.2 Operators on lists 3.4.2.3 Methods on lists 3.4.2.4 List comprehensions 3.4.3 Strings 3.4.3.1 Characters 3.4.3.2 Operators on strings 3.4.3.3 Methods on strings 3.4.3.4 Raw strings
3.4.3.5 Unicode strings 3.4.4 Dictionaries 3.4.4.1 Literal representation of dictionaries 3.4.4.2 Operators on dictionaries 3.4.4.3 Methods on dictionaries 3.4.5 Files 3.4.6 A few miscellaneous data types 3.4.6.1 None 3.4.6.2 The booleans True and False 3.5 Statements 3.5.1 Assignment statement 3.5.2 print statement 3.5.3 if: statement exercises 3.5.4 for: statement exercises 3.5.5 while: statement exercises 3.5.6 break and continue statements 3.5.7 Exceptions and the try:except: and raise statements 3.6 Functions 3.6.1 Optional arguments and default values 3.6.2 Passing functions as arguments 3.6.3 Extra args and keyword args 3.6.3.1 Order of arguments (positional, extra, and keyword args) 3.6.4 Functions and duck-typing and polymorphism 3.6.5 Recursive functions 3.6.6 Generators and iterators 3.7 Object-oriented programming and classes 3.7.1 The constructor 3.7.2 Inheritance -- Implementing a subclass 3.7.3 Classes and polymorphism 3.7.4 Recursive calls to methods 3.7.5 Class variables, class methods, and static methods 3.7.5.1 Decorators for classmethod and staticmethod 3.8 Additional and Advanced Topics 3.8.1 Decorators and how to implement them 3.8.1.1 Decorators with arguments 3.8.1.2 Stacked decorators 3.8.1.3 More help with decorators 3.8.2 Iterables 3.8.2.1 A few preliminaries on Iterables 3.8.2.2 More help with iterables 3.9 Applications and Recipes 3.9.1 XML -- SAX, minidom, ElementTree, Lxml 3.9.2 Relational database access 3.9.3 CSV -- comma separated value files 3.9.4 YAML and PyYAML 3.9.5 Json 4 Part 4 -- Generating Python Bindings for XML 4.1 Introduction 4.2 Generating the code 4.3 Using the generated code to parse and export an XML document 4.4 Some command line options you might want to know 4.5 The graphical front-end 4.6 Adding application-specific behavior 4.6.1 Implementing custom subclasses 4.6.2 Using the generated "API" from your application 4.6.3 A combined approach 4.7 Special situations and uses 4.7.1 Generic, type-independent 
processing 4.7.1.1 Step 1 -- generate the bindings 4.7.1.2 Step 2 -- add application-specific code 4.7.1.3 Step 3 -- write a test/driver harness 4.7.1.4 Step 4 -- run the test application 4.8 Some hints 4.8.1 Children defined with maxOccurs greater than 1 4.8.2 Children defined with simple numeric types 4.8.3 The type of an element's character content 4.8.4 Constructors and their default values

Preface

This book is a collection of materials that I've used when conducting Python training and also materials from my Web site that are intended for self-instruction.

You may prefer a machine readable copy of this book. You can find it in various formats here:

HTML -- http://www.rexx.com/~dkuhlman/python_book_01.html
PDF -- http://www.rexx.com/~dkuhlman/python_book_01.pdf
ODF/OpenOffice -- http://www.rexx.com/~dkuhlman/python_book_01.odt

And, let me thank the students in my Python classes. Their questions and suggestions were a great help in the preparation of these materials.
    $ python
    Python 2.6.1 (r261:67515, Jan 11 2009, 15:19:23)
    [GCC 4.3.2] on linux2
    Type "help", "copyright", "credits" or "license" for more information.
    >>> print 'hello'
    hello
    >>>

You may also want to consider using IDLE. IDLE is a graphical integrated development environment for Python; it contains a Python shell. It is likely that IDLE was installed for you when you installed Python. You will find a script to start up IDLE in the Tools/scripts directory of your Python distribution. IDLE requires Tkinter.

In addition, there are tools that will give you a more powerful and fancy Python interactive interpreter. One example is IPython, which is available at http://ipython.scipy.org/.
Sphinx -- Sphinx is a powerful tool for generating Python documentation. See: Sphinx -- Python Documentation Generator -- http://sphinx.pocoo.org/.
1.3.5 Operators
See: http://docs.python.org/ref/operators.html. Python defines the following operators:

    +       -       *       **      /       //      %
    <<      >>      &       |       ^       ~
    <       >       <=      >=      ==      !=      <>

The comparison operators <> and != are alternate spellings of the same operator. != is the preferred spelling; <> is obsolescent.

Logical operators:

    and     or      is      not     in
There are also (1) the dot operator, (2) the subscript operator [], and (3) the function/method call operator ().

For information on the precedences of operators, see Summary of operators -- http://docs.python.org/ref/summary.html, which is reproduced below.

The following table summarizes the operator precedences in Python, from lowest precedence (least binding) to highest precedence (most binding). Operators on the same line have the same precedence. Unless the syntax is explicitly given, operators are binary. Operators on the same line group left to right (except for comparisons, including tests, which all have the same precedence and chain from left to right -- see section 5.9 -- and exponentiation, which groups from right to left):

    Operator                        Description
    ==============================  ===================================
    lambda                          Lambda expression
    or                              Boolean OR
    and                             Boolean AND
    not x                           Boolean NOT
    in, not in                      Membership tests
    is, is not                      Identity tests
    <, <=, >, >=, <>, !=, ==        Comparisons
    |                               Bitwise OR
    ^                               Bitwise XOR
    &                               Bitwise AND
    <<, >>                          Shifts
    +, -                            Addition and subtraction
    *, /, %                         Multiplication, division, remainder
    +x, -x                          Positive, negative
    ~x                              Bitwise not
    **                              Exponentiation
    x.attribute                     Attribute reference
    x[index]                        Subscription
    x[index:index]                  Slicing
    f(arguments...)                 Function call
    (expressions...)                Binding or tuple display
    [expressions...]                List display
    {key:datum...}                  Dictionary display
    `expressions...`                String conversion
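A few rows of the precedence summary can be checked directly at the interpreter. The following is a small illustrative sketch, not part of the original text; the variable names are made up for the example, and the code is written so that it runs under both Python 2.6+ and Python 3.

```python
# Exponentiation binds more tightly than unary minus: parsed as -(2 ** 2).
neg_power = -2 ** 2

# Exponentiation groups from right to left: parsed as 2 ** (3 ** 2).
right_assoc = 2 ** 3 ** 2

# Multiplication binds more tightly than addition: parsed as 2 + (3 * 4).
arith = 2 + 3 * 4

# Comparisons chain: 1 < 2 < 3 means (1 < 2) and (2 < 3).
chained = 1 < 2 < 3

# "not" binds more tightly than "and", which binds more tightly than "or":
# parsed as ((not False) and False) or True.
logic = not False and False or True

print(neg_power)
print(right_assoc)
print(arith)
print(chained)
print(logic)
```

When in doubt about how an expression groups, adding parentheses is usually clearer than relying on the table.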
Note that most operators result in calls to methods with special names, for example __add__, __sub__, __mul__, etc. See Special method names -- http://docs.python.org/ref/specialnames.html.

Later, we will see how these operators can be emulated in classes that you define yourself, through the use of these special names.
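As a preview of how special names connect operators to methods, here is a minimal sketch. The class Money and its attribute cents are hypothetical names invented for this illustration; the code runs under both Python 2.6+ and Python 3.

```python
class Money(object):
    """Hypothetical example class; not taken from the original text."""

    def __init__(self, cents):
        self.cents = cents

    def __add__(self, other):
        # Called when evaluating: self + other
        return Money(self.cents + other.cents)

    def __eq__(self, other):
        # Called when evaluating: self == other
        return self.cents == other.cents

    def __repr__(self):
        return 'Money(%d)' % self.cents


total = Money(150) + Money(275)   # dispatches to Money.__add__
print(repr(total))
print(total == Money(425))        # dispatches to Money.__eq__
```

The expression Money(150) + Money(275) is evaluated roughly as Money(150).__add__(Money(275)).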
definition, (3) class definition, (4) function and method call, (5) importing a module, ...

First class objects -- Almost all objects in Python are first class. Definition: An object is first class if: (1) we can put it in a structured object; (2) we can pass it to a function; and (3) we can return it from a function.

References -- Objects (or references to them) can be shared. What does this mean?

- The object(s) satisfy the identity test operator is, that is, obj1 is obj2 returns True.
- The built-in function id(obj) returns the same value, that is, id(obj1) == id(obj2) is True.
- The consequences for mutable objects are different from those for immutable objects. Changing (updating) a mutable object referenced through one variable or container also changes that object referenced through other variables or containers, because it is the same object.

del -- The del statement removes a reference (a name binding), not (necessarily) the object itself.
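The points about shared references and del can be seen in a few lines. This is an illustrative sketch with made-up variable names, written to run under both Python 2.6+ and Python 3.

```python
a = [1, 2, 3]
b = a                 # b refers to the same list object as a
c = list(a)           # c is a new list with equal contents

same_object = a is b
same_id = id(a) == id(b)
copy_is_distinct = a is not c

b.append(4)           # mutate the shared object through the name b

print(a)              # the change is visible through a as well
print(c)              # the copy is unaffected

del b                 # removes the name b only; the list is still bound to a
print(a)
```

After del b, the list object survives because the name a still refers to it.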
Type "help(str)" or see http://www.python.org/doc/current/lib/string-methods.html for more information on string methods.

You can also use the equivalent functions from the string module. For example:

    >>> import string
    >>> s1 = 'The happy cat ran home.'
    >>> string.find(s1, 'happy')
    4

See string - Common string operations -- http://www.python.org/doc/current/lib/module-string.html for more information on the string module.

There is also a string formatting operator: "%". For example:

    >>> state = 'California'
    >>> 'It never rains in sunny %s.' % state
    'It never rains in sunny California.'
    >>>
    >>> width = 24
    >>> height = 32
    >>> depth = 8
    >>> print 'The box is %d by %d by %d.' % (width, height, depth, )
    The box is 24 by 32 by 8.
Things to know:

- Format specifiers consist of a percent sign followed by flags, length, and a type character.
- The number of format specifiers in the target string (to the left of the "%" operator) must be the same as the number of values on the right.
- When there is more than one value (on the right), they must be provided in a tuple.

You can learn about the various conversion characters and flags used to control string formatting here: String Formatting Operations -- http://docs.python.org/library/stdtypes.html#string-formatting-operations.

You can also write strings to a file and read them from a file. Here are some examples.

Writing -- For example:

    >>> outfile = open('tmp.txt', 'w')
    >>> outfile.write('This is line #1\n')
    >>> outfile.write('This is line #2\n')
    >>> outfile.write('This is line #3\n')
    >>> outfile.close()

Notes:

- Note the end-of-line character at the end of each string.
- The open() built-in function creates a file object. It takes as arguments (1) the file name and (2) a mode. Commonly used modes are "r" (read), "w" (write), and "a" (append).
- See Built-in Functions: open() -- http://docs.python.org/library/functions.html#open for more information on opening files. See Built-in Types: File Objects -- http://docs.python.org/library/stdtypes.html#file-objects for more information on how to use file objects.

Reading an entire file -- example:

    >>> infile = file('tmp.txt', 'r')
    >>> content = infile.read()
    >>> print content
    This is line #1
    This is line #2
    This is line #3

    >>> infile.close()

Notes:

- Also consider using something like content.splitlines(), if you want to divide content into lines (split on newline characters).

Reading a file one line at a time -- example:

    >>> infile = file('tmp.txt', 'r')
    >>> for line in infile:
    ...     print 'Line:', line
    ...
    Line: This is line #1

    Line: This is line #2

    Line: This is line #3

    >>> infile.close()

Notes:

- Learn more about the for: statement in section for: statement.
- "infile.readlines()" returns a list of lines in the file. For large files use the file object itself or "infile.xreadlines()", both of which are iterators for the lines in the file.
- In older versions of Python, a file object is not itself an iterator. In those older versions of Python, you may need to use infile.readlines() or a while loop containing infile.readline(). For example:

      >>> infile = file('tmp.txt', 'r')
      >>> for line in infile.readlines():
      ...     print 'Line:', line
      ...

A few additional comments about strings:

A string is a special kind of sequence. So, you can index into the characters of a string and you can iterate over the characters in a string.
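Because a string is a sequence, indexing, slicing, and iteration all apply to it. The following sketch illustrates this; the variable names are made up for the example and the code runs under both Python 2.6+ and Python 3.

```python
s = 'python'

first = s[0]          # index from the left, starting at 0
last = s[-1]          # negative indexes count from the right
middle = s[1:4]       # slice: characters at positions 1, 2, and 3

# Iterate over the characters, one at a time.
chars = []
for ch in s:
    chars.append(ch.upper())

print(first)
print(last)
print(middle)
print(chars)
```

Note that strings are immutable, so an assignment such as s[0] = 'P' raises a TypeError; build a new string instead.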
If you need to do fast or complex string searches, there is a regular expression module in the standard library. See: re - Regular expression operations -- http://docs.python.org/library/re.html.

An interesting feature of string formatting is the ability to use dictionaries to supply the values that are inserted. Here is an example:

    names = {'tree': 'sycamore', 'flower': 'poppy', 'herb': 'arugula'}

    print 'The tree is %(tree)s' % names
    print 'The flower is %(flower)s' % names
    print 'The herb is %(herb)s' % names
    >>> trees1 = list(['oak', 'pine', 'sycamore'])
    >>> trees1
    ['oak', 'pine', 'sycamore']
    >>> trees2 = list(trees1)
    >>> trees2
    ['oak', 'pine', 'sycamore']
    >>> trees1 is trees2
    False

To create a tuple, use commas, and possibly parentheses as well:

    >>> a = (11, 22, 33, )
    >>> b = 'aa', 'bb'
    >>> c = 123,
    >>> a
    (11, 22, 33)
    >>> b
    ('aa', 'bb')
    >>> c
    (123,)
    >>> type(c)
    <type 'tuple'>

Notes:

- To create a tuple containing a single item, we still need the comma. Example:

      >>> print ('abc',)
      ('abc',)
      >>> type(('abc',))
      <type 'tuple'>

To add an item to the end of a list, use append():

    >>> items.append(444)
    >>> items
    [111, 222, 333, 444]

To insert an item into a list, use insert(). This example inserts an item at the beginning of a list:

    >>> items.insert(0, -1)
    >>> items
    [-1, 111, 222, 333, 444]

To add two lists together, creating a new list, use the + operator. To add the items in one list to an existing list, use the extend() method. Examples:

    >>> a = [11, 22, 33,]
    >>> b = [44, 55]
    >>> c = a + b
    >>> c
    [11, 22, 33, 44, 55]
    >>> a
    [11, 22, 33]
    >>> b
    [44, 55]
    >>> a.extend(b)
    >>> a
    [11, 22, 33, 44, 55]

You can also push items onto the right end of a list and pop items off the right end of a list with append() and pop(). This enables us to use a list as a stack-like data structure. Example:

    >>> items = [111, 222, 333, 444,]
    >>> items
    [111, 222, 333, 444]
    >>> items.append(555)
    >>> items
    [111, 222, 333, 444, 555]
    >>> items.pop()
    555
    >>> items
    [111, 222, 333, 444]
And, you can iterate over the items in a list or tuple (or other collection, for that matter) with the for: statement:

    >>> for item in items:
    ...     print 'item:', item
    ...
    item: -1
    item: 111
    item: 222
    item: 333
    item: 444

For more on the for: statement, see section for: statement.
1.4.3 Dictionaries
1.4.3.1 What dictionaries are
A dictionary is:

- An associative array.
- A mapping from keys to values.
- A container (collection) that holds key-value pairs.

A dictionary has the following capabilities:

- Ability to iterate over keys or values or key-value pairs.
- Ability to add key-value pairs dynamically.
- Ability to look up a value by key.

For help on dictionaries, type:

    >>> help(dict)

at Python's interactive prompt, or:

    $ pydoc dict

at the command line.

It also may be helpful to use the built-in dir() function, then to ask for help on a specific method. Example:

    >>> a = {}
    >>> dir(a)
    ['__class__', '__cmp__', '__contains__', '__delattr__', '__delitem__',
    '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__',
    '__getitem__', '__gt__', '__hash__', '__init__', '__iter__', '__le__',
    '__len__', '__lt__', '__ne__', '__new__', '__reduce__', '__reduce_ex__',
    '__repr__', '__setattr__', '__setitem__', '__sizeof__', '__str__',
    '__subclasshook__', 'clear', 'copy', 'fromkeys', 'get', 'has_key',
    'items', 'iteritems', 'iterkeys', 'itervalues', 'keys', 'pop', 'popitem',
    'setdefault', 'update', 'values']
    >>>
    >>> help(a.keys)
    Help on built-in function keys:

    keys(...)
        D.keys() -> list of D's keys

More information about dictionary objects is available here: Mapping types - dict -- http://docs.python.org/library/stdtypes.html#mappingtypes-dict.
    >>> states = {'az': 'Arizona', 'ca': 'California'}
    >>> states['ca']
    'California'

or:

    >>> def fruitfunc():
    ...     print "I'm a fruit."
    >>> def vegetablefunc():
    ...     print "I'm a vegetable."
    >>>
    >>> lookup = {'fruit': fruitfunc, 'vegetable': vegetablefunc}
    >>> lookup
    {'vegetable': <function vegetablefunc at 0x4028980c>,
    'fruit': <function fruitfunc at 0x4028e614>}
    >>> lookup['fruit']()
    I'm a fruit.
    >>> lookup['vegetable']()
    I'm a vegetable.

or:

    >>> lookup = dict((('aa', 11), ('bb', 22), ('cc', 33)))
    >>> lookup
    {'aa': 11, 'cc': 33, 'bb': 22}

Note that the keys in a dictionary must be immutable (more precisely, hashable). Therefore, you can use any of the following as keys: numbers, strings, tuples (provided the items in the tuple are themselves hashable).

Test for the existence of a key in a dictionary with the in operator:

    >>> if 'fruit' in lookup:
    ...     print 'contains key "fruit"'
    ...
    contains key "fruit"

or, alternatively, use the (slightly out-dated) has_key() method:

    >>> if lookup.has_key('fruit'):
    ...     print 'contains key "fruit"'
    ...
    contains key "fruit"

Access the value associated with a key in a dictionary with the indexing operator (square brackets):

    >>> print lookup['fruit']
    <function fruitfunc at 0x4028e614>

Notice that the above will throw an exception if the key is not in the dictionary:

    >>> print lookup['salad']
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    KeyError: 'salad'

And so, the get() method is an easy way to get a value from a dictionary while avoiding an exception. For example:

    >>> print lookup.get('fruit')
    <function fruitfunc at 0x4028e614>
    >>> print lookup.get('salad')
    None
    >>> print lookup.get('salad', fruitfunc)
    <function fruitfunc at 0x4028e614>
A dictionary is an iterable object that produces its keys. So, we can iterate over the keys in a dictionary as follows:

    >>> for key in lookup:
    ...     print 'key: %s' % key
    ...     lookup[key]()
    ...
    key: vegetable
    I'm a vegetable.
    key: fruit
    I'm a fruit.
And, remember that you can sub-class dictionaries. Here are two versions of the same example. The keyword arguments in the second version require Python 2.3 or later:

    #
    # This example works with Python 2.2.
    class MyDict_for_python_22(dict):
        def __init__(self, **kw):
            for key in kw.keys():
                self[key] = kw[key]
        def show(self):
            print 'Showing example for Python 2.2 ...'
            for key in self.keys():
                print 'key: %s value: %s' % (key, self[key])

    def test_for_python_22():
        d = MyDict_for_python_22(one=11, two=22, three=33)
        d.show()

    test_for_python_22()

A version for newer versions of Python:

    #
    # This example works with Python 2.3 or newer versions of Python.
    # Keyword support, when sub-classing dictionaries, seems to have
    # been enhanced in Python 2.3.
    class MyDict(dict):
        def show(self):
            print 'Showing example for Python 2.3 or newer.'
            for key in self.keys():
                print 'key: %s value: %s' % (key, self[key])

    def test():
        d = MyDict(one=11, two=22, three=33)
        d.show()

    test()

Running this example produces:

    Showing example for Python 2.2 ...
    key: one value: 11
    key: three value: 33
    key: two value: 22
    Showing example for Python 2.3 or newer.
    key: three value: 33
    key: two value: 22
    key: one value: 11

A few comments about this example:

- Learn more about classes and how to implement them in section Classes and instances.
- The class MyDict does not define a constructor (__init__). This enables us to re-use the constructor from super-class dict and any of its forms. Type "help(dict)" at the Python interactive prompt to learn about the various ways to call the dict constructor.
- The show method is the specialization added to our sub-class.
- In our sub-class, we can refer to any methods in the super-class (dict). For example: self.keys().
- In our sub-class, we can refer to the dictionary itself. For example: self[key].
1.4.4 Files
1.4.4.1 What files are
A file is a Python object that gives us access to a file on the disk system. A file object can be created ("opened") for reading ("r" mode), for writing ("w" mode), or for appending ("a" mode) to a file. Opening a file for writing erases an existing file with that path/name. Opening a file for append does not.
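The difference between "w" and "a" mode can be checked with a throw-away file. This is a minimal sketch, not from the original text: the helper read_all and the file name modes_demo.txt are hypothetical, and the temporary directory is used only so that no existing file is touched. It runs under both Python 2.6+ and Python 3.

```python
import os
import tempfile


def read_all(path):
    """Return the whole content of the file at path."""
    infile = open(path, 'r')
    try:
        return infile.read()
    finally:
        infile.close()


# A scratch file in a fresh temporary directory (hypothetical name).
path = os.path.join(tempfile.mkdtemp(), 'modes_demo.txt')

outfile = open(path, 'w')      # create the file
outfile.write('first\n')
outfile.close()

outfile = open(path, 'a')      # append: existing content is kept
outfile.write('second\n')
outfile.close()
after_append = read_all(path)

outfile = open(path, 'w')      # write: existing content is erased
outfile.write('third\n')
outfile.close()
after_write = read_all(path)

print(repr(after_append))
print(repr(after_write))
```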
    def write_file(outfilename):
        outfile = open(outfilename, 'w')
        outfile.write('Line # 1\n')
        outfile.write('Line # 2\n')
        outfile.write('Line # 3\n')
        outfile.close()

    def append_file(outfilename):
        outfile = open(outfilename, 'a')
        outfile.write('Line # 4\n')
        outfile.write('Line # 5\n')
        outfile.close()

    def read_file(infilename):
        infile = open(infilename, 'r')
        for line in infile:
            print line.rstrip()
        infile.close()

    def test():
        filename = 'temp_file.txt'
        write_file(filename)
        read_file(filename)
        append_file(filename)
        print '-' * 50
        read_file(filename)

    test()
    >>> inFile.close()
    >>> print content
    aaa bbb ccc ddd eee fff ggg hhh iii
    >>> words = content.split()
    >>> print words
    ['aaa', 'bbb', 'ccc', 'ddd', 'eee', 'fff', 'ggg', 'hhh', 'iii']
    >>> for word in words:
    ...     print word
    ...
    aaa
    bbb
    ccc
    ddd
    eee
    fff
    ggg
    hhh
    iii
The assignment operator is =. Here are some of the things you can assign a value to:

- A name (variable)

- An item (position) in a list. Example:

      >>> a = [11, 22, 33]
      >>> a
      [11, 22, 33]
      >>> a[1] = 99
      >>> a
      [11, 99, 33]

- A key in a dictionary. Example:

      >>> names = {}
      >>> names['albert'] = 25
      >>> names
      {'albert': 25}

- A slice in a list. Example:

      >>> a = [11, 22, 33, 44, 55, 66, 77, ]
      >>> a
      [11, 22, 33, 44, 55, 66, 77]
      >>> a[1:3] = [999, 888, 777, 666]
      >>> a
      [11, 999, 888, 777, 666, 44, 55, 66, 77]

- A tuple or list. Assignment to a tuple or list performs unpacking. Example:

      >>> values = 111, 222, 333
      >>> values
      (111, 222, 333)
      >>> a, b, c = values
      >>> a
      111
      >>> b
      222
      >>> c
      333

  Unpacking suggests a convenient idiom for returning and capturing multiple values from a function. Example:

      >>> def multiplier(n):
      ...     return n, n * 2, n * 3
      ...
      >>>
      >>> x, y, z = multiplier(4)
      >>> x
      4
      >>> y
      8
      >>> z
      12

  If a function needs to return a variable number of values, then unpacking will not do. But, you can still return multiple values by returning a container of some kind (for example, a tuple, a list, a dictionary, a set, etc.).

- An attribute. Example:

      >>> class A(object):
      ...     pass
      ...
      >>> c = A()
      >>>
      >>> a = A()
      >>> a.size = 33
      >>> print a.size
      33
      >>> a.__dict__
      {'size': 33}
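Returning to the point about functions that return a variable number of values: one option is to return a list and let the caller iterate over it. The function multiples below is a hypothetical example made up for this sketch; it runs under both Python 2.6+ and Python 3.

```python
def multiples(n, count):
    """Return the first `count` multiples of n as a list."""
    return [n * i for i in range(1, count + 1)]


# The caller does not need to know in advance how many values come back.
result = multiples(4, 5)
for value in result:
    print(value)

# Unpacking still works when the caller does know the length.
x, y = multiples(3, 2)
print(x)
print(y)
```

If the number of names on the left does not match the number of items returned, the unpacking assignment raises a ValueError.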
Notice that the trace-back identifies the file and line where the test is made and shows the test itself. If you run python with the optimize options (-O and -OO), the assertion test is not performed. The second argument to the assert statement is optional.
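The notes on assert can be illustrated with a short sketch. The function average is a hypothetical example, not taken from the original text, and the code assumes Python is run without the -O or -OO options (with them, the assert line is skipped). It runs under both Python 2.6+ and Python 3.

```python
def average(values):
    # The second argument (the message after the comma) is optional.
    assert len(values) > 0, 'values must not be empty'
    return sum(values) / float(len(values))


result = average([2, 4, 6])
print(result)

message = None
try:
    average([])
except AssertionError as exc:
    # The optional second argument becomes the exception's message.
    message = str(exc)
print(message)
```

Because assertions disappear under -O, they are best used for catching programming errors during development, not for validating user input.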
        statement-block

    if condition:
        statement-block-1
    else:
        statement-block-2

    if condition-1:
        statement-block-1
    elif condition-2:
        statement-block-2
        o
        o
        o
    else:
        statement-block-n

Here is an example:

    >>> y = 25
    >>>
    >>> if y > 15:
    ...     print 'y is large'
    ... else:
    ...     print 'y is small'
    ...
    y is large

A few notes:

- The condition can be any expression, i.e. something that returns a value. A detailed description of expressions can be found at Python Language Reference: Expressions -- http://docs.python.org/reference/expressions.html.
- Parentheses are not needed around the condition. Use parentheses to group sub-expressions and control the order of evaluation when the natural operator precedence is not what you want. Python's operator precedences are described at Python Language Reference: Expressions: Summary -- http://docs.python.org/reference/expressions.html#summary.
- Python has no switch statement. Use if:elif:.... Or consider using a dictionary. Here is an example that uses both of these techniques:

      def function1():
          print "Hi. I'm function 1."

      def function2():
          print "Hi. I'm function 2."

      def function3():
          print "Hi. I'm function 3."

      def error_function():
          print "Invalid option."

      def test1():
          while 1:
              code = raw_input('Enter "one", "two", "three", or "quit": ')
              if code == 'quit':
                  break
              if code == 'one':
                  function1()
              elif code == 'two':
                  function2()
              elif code == 'three':
                  function3()
              else:
                  error_function()

      def test2():
          mapper = {'one': function1, 'two': function2, 'three': function3}
          while 1:
              code = raw_input('Enter "one", "two", "three", or "quit": ')
              if code == 'quit':
                  break
              func = mapper.get(code, error_function)
              func()

      def test():
          test1()
          print '-' * 50
          test2()

      if __name__ == '__main__':
          test()
            yield '||%s||' % item

    def test():
        collection = [111, 222, 333, ]
        for x in t(collection):
            print x

    test()

Which prints out:

    ||111||
    ||222||
    ||333||
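The beginning of the generator function t is cut off in the example above. As a self-contained illustration of the same idea, here is a sketch using a hypothetical stand-in named wrap_items, which is assumed (not confirmed by the text) to loop over its argument and yield one formatted string per item. It runs under both Python 2.6+ and Python 3.

```python
def wrap_items(collection):
    # Hypothetical stand-in for the truncated generator function above.
    for item in collection:
        yield '||%s||' % item


# Calling a generator function does not run its body yet;
# it returns a generator object (an iterator).
gen = wrap_items([111, 222, 333])

first = next(gen)     # runs the body up to the first yield
rest = list(gen)      # consumes the remaining items

print(first)
print(rest)
```

A for: statement drives the same protocol automatically: it keeps requesting the next item until the generator is exhausted.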
Here is an example:

    >>> reply = 'repeat'
    >>> while reply == 'repeat':
    ...     print 'Hello'
    ...     reply = raw_input('Enter "repeat" to do it again: ')
    ...
    Hello
    Enter "repeat" to do it again: repeat
    Hello
    Enter "repeat" to do it again: bye

Comments:

- Use the break statement to exit immediately from a loop. This works in both for: and while:. Here is an example that uses break in a for: statement:

      # for_break.py
      """Count lines until a line that begins with a double #.
      """

      import sys

      def countLines(infilename):
          infile = file(infilename, 'r')
          count = 0
          for line in infile.readlines():
              line = line.strip()
              if line[:2] == '##':
                  break
              count += 1
          return count

      def usage():
          print 'Usage: python python_101_for_break.py <infilename>'
          sys.exit(-1)

      def main():
          args = sys.argv[1:]
          if len(args) != 1:
              usage()
          count = countLines(args[0])
          print 'count:', count

      if __name__ == '__main__':
          main()

- Use the continue statement to skip the remainder of the code block in a for: or while: statement. A continue is a short-circuit which, in effect, branches immediately back to the top of the for: or while: statement (or if you prefer, to the end of the block).
- The test if __name__ == '__main__': is used to enable a script to both be (1) imported and (2) run from the command line. That condition is true only when the script is run, but not imported. This is a common Python idiom, which you should consider including at the end of your scripts, whether (1) to give your users a demonstration of what your script does and how to use it or (2) to provide a test of the script.
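The continue statement described above can be sketched as follows. The function count_code_lines and the sample data are hypothetical, made up for this illustration; the code runs under both Python 2.6+ and Python 3.

```python
def count_code_lines(lines):
    """Count lines that are neither blank nor comments."""
    count = 0
    for line in lines:
        line = line.strip()
        if not line or line.startswith('#'):
            # Skip the rest of the block and go on to the next line.
            continue
        count += 1
    return count


sample = [
    '# a comment',
    '',
    'x = 1',
    '   # an indented comment',
    'print(x)',
]
total = count_code_lines(sample)
print(total)
```

Compare this with break in the previous example: break leaves the loop entirely, while continue only abandons the current iteration.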
Which produces:

    amissingfile.py is missing

Exception types are described here: Python Standard Library: Built-in Exceptions -- http://docs.python.org/library/exceptions.html.

Catch any one of several exception types by using a tuple containing the exceptions to be caught. Example:

    try:
        f = open('abcdexyz.txt', 'r')
        d = {}
        x = d['name']
    except (IOError, KeyError), e:
        print 'The error is --', e

Note that multiple types of exceptions to be caught by a single except: clause are in parentheses; they are a tuple.

You can customize your error handling still further (1) by passing an object when you raise the exception and (2) by catching that object in the except: clause of your try: statement. By doing so, you can pass information up from the raise statement to an exception handler. One way of doing this is to pass an object. A reasonable strategy is to define a sub-class of a standard exception. For example:

    class E(Exception):
        def __init__(self, msg):
            self.msg = msg
        def getMsg(self):
            return self.msg

    def test():
        try:
            raise E('my test error')
        except E, obj:
            print 'Msg:', obj.getMsg()

    test()

Which produces:

    Msg: my test error

If you catch an exception using try:except:, but then find that you do not want to handle the exception at that location, you can "reraise" the same exception (with the same arguments) by using raise with no arguments. An example:

    class GeneralException(Exception):
        pass

    class SimpleException(GeneralException):
        pass

    class ComplexException(GeneralException):
        pass

    def some_func_that_throws_exceptions():
        #raise SimpleException('this is a simple error')
        raise ComplexException('this is a complex error')

    def test():
        try:
            some_func_that_throws_exceptions()
        except GeneralException, e:
            if isinstance(e, SimpleException):
                print e
            else:
                raise

    test()
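The examples above use the Python 2 form "except E, obj:". Starting with Python 2.6 the same thing can be written "except E as obj:", which is also the only form accepted by Python 3. The sketch below restates the idea of passing information in an exception object using that spelling; the names ConfigError, load, and settings.cfg are hypothetical and are not from the original text.

```python
class ConfigError(Exception):
    """Hypothetical exception that carries extra information."""

    def __init__(self, msg, filename):
        Exception.__init__(self, msg)
        self.msg = msg
        self.filename = filename


def load(filename):
    # Stand-in for real work; always fails in this sketch.
    raise ConfigError('cannot parse file', filename)


def test():
    try:
        load('settings.cfg')
    except ConfigError as exc:
        # The handler reads the information attached at the raise site.
        return 'Msg: %s (%s)' % (exc.msg, exc.filename)


report = test()
print(report)
```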
1.7 Organization
This section describes Python features that you can use to organize and structure your code.
1.7.1 Functions
1.7.1.1 A basic function
Use def to define a function. Here is a simple example:

    def test(msg, count):
        for idx in range(count):
            print '%s %d' % (msg, idx)

    test('Test #', 4)

Comments:

- After evaluation def creates a function object.
- Call the function using the parentheses function call notation, in this case "test('Test #', 4)".
- As with other Python objects, you can stuff a function object into other structures such as tuples, lists, and dictionaries. Here is an example:

      # Create a tuple:
      val = (test, 'A label:', 5)

      # Call the function:
      val[0](val[1], val[2])
The value of the keyword parameter (**kwargs) is a dictionary, so you can do anything with it that you do with a normal dictionary.
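As an illustration of treating the keyword parameter as a dictionary, here is a small sketch. The function names describe and pick_color, and the keyword names used in the calls, are assumptions made for this example only.

```python
def describe(**kwargs):
    # kwargs is an ordinary dictionary: we can ask for its keys,
    # index it, and so on.
    parts = []
    for key in sorted(kwargs.keys()):
        parts.append('%s=%s' % (key, kwargs[key]))
    return ', '.join(parts)

def pick_color(**kwargs):
    # Dictionary methods such as get() work as usual.
    return kwargs.get('color', 'red')

print(describe(b=2, a=1))
print(pick_color())
print(pick_color(color='blue'))
```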
In many object-oriented programming languages, the instance is hidden in the method definitions. These languages typically explain this by saying something like "The instance is passed as an implicit first argument to the method." In Python, the instance is visible and explicit in method definitions. You must explicitly declare the instance as the first parameter of each (instance) method. This first parameter is (almost) always spelled "self".
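One way to see that the instance really is the first argument is to call a method through the class and pass the instance yourself. This is only a sketch; the class Greeter and its method greet are hypothetical names chosen for the illustration.

```python
class Greeter(object):
    def __init__(self, name):
        self.name = name

    def greet(self, greeting):
        # "self" is the instance on which the method was called.
        return '%s, %s' % (greeting, self.name)

g = Greeter('Wilma')

# The usual call: Python supplies the instance as the first argument.
via_instance = g.greet('Hello')

# The same call, with the instance passed explicitly.
via_class = Greeter.greet(g, 'Hello')

print(via_instance)
print(via_class)
```

Both calls return the same string, which is why the first parameter must appear in the method definition.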
1.7.2.2 Inheritance
Define a class Special that inherits from a super-class Basic as follows:

    class Basic:
        def __init__(self, name):
            self.name = name
        def show(self):
            print 'Basic -- name: %s' % self.name

    class Special(Basic):
        def __init__(self, name, edible):
            Basic.__init__(self, name)
            self.upper = name.upper()
            self.edible = edible
        def show(self):
            Basic.show(self)
            print 'Special -- upper name: %s.' % self.upper
            if self.edible:
                print "It's edible."
            else:
                print "It's not edible."
        def is_edible(self):
            return self.edible

    def test():
        obj1 = Basic('Apricot')
        obj1.show()
        print '=' * 30
        obj2 = Special('Peach', 1)
        obj2.show()

    test()

Running this example produces the following:

    Basic -- name: Apricot
    ==============================
    Basic -- name: Peach
    Special -- upper name: PEACH.
    It's edible.

Comments:

The super-class is listed after the class name in parentheses. For multiple inheritance, separate the super-classes with commas.

Call a method in the super-class, by-passing the method with the same name in the sub-class, from the sub-class by using the super-class name. For example: Basic.__init__(self, name) and Basic.show(self).

In our example (above), the sub-class (Special) specializes the super-class (Basic) by adding additional member variables (self.upper and self.edible) and by adding an additional method (is_edible). The method is named is_edible rather than edible because an instance variable and a method cannot share a name: the assignment self.edible = edible in the constructor would hide a method named edible on every instance.
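The comment about multiple inheritance can be sketched as follows. The classes Named, Priced, and Product are hypothetical and exist only for this illustration; unlike Basic above, they inherit from object, and the code uses the parenthesized print so that it runs under both Python 2 and Python 3.

```python
class Named(object):
    def __init__(self, name):
        self.name = name

    def describe(self):
        return 'name: %s' % self.name

class Priced(object):
    def set_price(self, price):
        self.price = price

    def cost(self):
        return 'price: %.2f' % self.price

# Two super-classes, separated by a comma.
class Product(Named, Priced):
    def __init__(self, name, price):
        # Call the super-class constructor by using the class name.
        Named.__init__(self, name)
        self.set_price(price)

    def describe(self):
        # Extend, rather than replace, the inherited method.
        return '%s, %s' % (Named.describe(self), self.cost())

item = Product('Peach', 1.5)
print(item.describe())
```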
1.7.2.5 Properties
A new-style class can have properties. A property is an attribute of a class that is associated with a getter and a setter function. Declare the property and its getter and setter functions with property(). Here is an example:

    class A(object):
        count = 0
        def __init__(self, name):
            self.name = name
        def set_name(self, name):
            print 'setting name: %s' % name
            self.name = name
        def get_name(self):
            print 'getting name: %s' % self.name
            return self.name
        objname = property(get_name, set_name)

    def test():
        a = A('apple')
        print 'name: %s' % a.objname
        a.objname = 'banana'
        print 'name: %s' % a.objname

    test()

Running the above produces the following output:

    getting name: apple
    name: apple
    setting name: banana
    getting name: banana
    name: banana

Notes:

The class inherits from class object, which makes it a new-style class.

When a value is assigned to a property, the setter method is called.

When the value of a property is accessed, the getter method is called.

You can also define a delete method and a documentation attribute for a property. For more information, visit 2.1 Built-in Functions and look for property.
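The delete method and documentation attribute mentioned in the notes are the third and fourth arguments to property(). Here is a minimal sketch; the class B, the attribute _name, and the choice to reset the value to None on deletion are assumptions made for this illustration.

```python
class B(object):
    def __init__(self, name):
        self._name = name

    def get_name(self):
        return self._name

    def set_name(self, name):
        self._name = name

    def del_name(self):
        # What "delete" means is up to us; here we reset the value.
        self._name = None

    # property(getter, setter, deleter, doc string)
    objname = property(get_name, set_name, del_name,
                       'The name of this object.')

b = B('apple')
b.objname = 'banana'
after_set = b.objname       # calls get_name
del b.objname               # calls del_name
after_del = b.objname
doc = B.objname.__doc__

print(after_set)
print(after_del)
print(doc)
```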
1.7.3 Modules
You can use a module to organize a number of Python definitions in a single file. A definition can be a function, a class, or a variable containing any Python object. Here is an example:

    # python_101_module_simple.py

    """
    This simple module contains definitions of a class and several
    functions.
    """

    LABEL = '===== Testing a simple module ====='

    class Person:
        """Sample of a simple class definition.
        """
        def __init__(self, name, description):
            self.name = name
            self.description = description
        def show(self):
            print 'Person -- name: %s description: %s' % (self.name, self.description)

    def test(msg, count):
        """A sample of a simple function.
        """
        for idx in range(count):
            print '%s %d' % (msg, idx)

    def testDefaultArgs(arg1='default1', arg2='default2'):
        """A function with default arguments.
        """
        print 'arg1:', arg1
        print 'arg2:', arg2

    def testArgLists(*args, **kwargs):
        """
        A function which references the argument list and keyword arguments.
        """
        print 'args:', args
        print 'kwargs:', kwargs

    def main():
        """
        A test harness for this module.
        """
        print LABEL
        person = Person('Herman', 'A cute guy')
        person.show()
        print '=' * 30
        test('Test #', 4)
        print '=' * 30
        testDefaultArgs('Explicit value')
        print '=' * 30
        testArgLists('aaa', 'bbb', arg1='ccc', arg2='ddd')

    if __name__ == '__main__':
        main()

Running the above produces the following output:

    ===== Testing a simple module =====
    Person -- name: Herman description: A cute guy
    ==============================
    Test # 0
    Test # 1
    Test # 2
    Test # 3
    ==============================
    arg1: Explicit value
    arg2: default2
    ==============================
    args: ('aaa', 'bbb')
    kwargs: {'arg1': 'ccc', 'arg2': 'ddd'}

Comments:

The string definitions at the beginning of the module, of each class definition, and of each function definition serve as documentation for these items. You can show this documentation with the following from the command-line:

    $ pydoc python_101_module_simple

Or this, from the Python interactive prompt:

    >>> import python_101_module_simple
    >>> help(python_101_module_simple)

It is common and it is a good practice to include a test harness for the module at the end of the source file. Note that the test:

    if __name__ == '__main__':

will be true only when the file is run (e.g. from the command-line with something like "$ python python_101_module_simple.py") but not when the module is imported.

Remember that the code in a module is only evaluated the first time it is imported in a program. So, for example, changing the value of a global variable in a module might cause behavior that users of the module might not expect.

Constants, on the other hand, are safe. A constant, in Python, is a variable whose value is initialized but not changed. An example is LABEL, above.
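The point about a module being evaluated only once can be checked with a small experiment. The sketch below writes a throw-away module into a temporary directory; the module name once_demo_mod and its variable setting are invented for this illustration.

```python
import os
import sys
import tempfile

# Create a tiny module in a temporary directory.
tmpdir = tempfile.mkdtemp()
with open(os.path.join(tmpdir, 'once_demo_mod.py'), 'w') as f:
    f.write("setting = 'initial'\n")
sys.path.insert(0, tmpdir)

import once_demo_mod as first
first.setting = 'changed'

# A second import does not evaluate the module's code again;
# it returns the module object that is already loaded.
import once_demo_mod as second

print(second is first)
print(second.setting)
```

Because both names refer to the same module object, the change made through one name is visible through the other, which is the behavior that can surprise other users of the module.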
1.7.4 Packages
A package is a way to organize a number of modules together as a unit. Python packages can also contain other packages.
To give us an example to talk about, consider the following package structure:

    package_example/
    package_example/__init__.py
    package_example/module1.py
    package_example/module2.py
    package_example/A.py
    package_example/B.py

And, here are the contents:
__init__.py:
    # __init__.py

    # Expose definitions from modules in this package.
    from module1 import class1
    from module2 import class2
module1.py:
    # module1.py

    class class1:
        def __init__(self):
            self.description = 'class #1'
        def show(self):
            print self.description
module2.py:
    # module2.py

    class class2:
        def __init__(self):
            self.description = 'class #2'
        def show(self):
            print self.description
A.py:
    # A.py

    import B
B.py:
    # B.py

    def function_b():
        print 'Hello from function_b'

In order to be used as a Python package (e.g. so that modules can be imported from it) a directory must contain a file whose name is __init__.py. The code in this module is evaluated the first time a module is imported from the package.

In order to import modules from a package, you may either add the package directory to sys.path or, if the parent directory is on sys.path, use dot-notation to explicitly specify the path. In our example, you might use: "import package_example.module1".

A module in a package can import another module from the same package directly without using the path to the package. For example, the module A in our sample package package_example can import module B in the same package with "import B". Module A does not need to use "import package_example.B".

You can find additional information on packages at http://www.python.org/doc/essays/packages.html.

Suggested techniques:

In the __init__.py file, import and make available objects defined in modules in the package. Our sample package package_example does this. Then, you can use from package_example import * to import the package and its contents. For example:

    >>> from package_example import *
    >>> dir()
    ['__builtins__', '__doc__', '__file__', '__name__',
    'atexit', 'class1', 'class2', 'module1', 'module2',
    'readline', 'rlcompleter', 'sl', 'sys']
    >>>
    >>> c1 = class1()
    >>> c2 = class2()
    >>> c1.show()
    class #1
    >>> c2.show()
    class #2

A few additional notes:

With Python 2.3, you can collect the modules in a package into a Zip file by using PyZipFile from the Python standard library. See http://www.python.org/doc/current/lib/pyzipfile-objects.html.

    >>> import zipfile
    >>> a = zipfile.PyZipFile('mypackage.zip', 'w', zipfile.ZIP_DEFLATED)
    >>> a.writepy('Examples')
    >>> a.close()
Then you can import and use this archive by inserting its path in sys.path. In the following example, class_basic_1 is a module within package mypackage:

    >>> import sys
    >>> sys.path.insert(0, '/w2/Txt/Training/mypackage.zip')
    >>> import class_basic_1
    Basic -- name: Apricot
    >>> obj = class_basic_1.Basic('Wilma')
    >>> obj.show()
    Basic -- name: Wilma
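If you would like to try importing from a Zip file without setting up a package directory first, the following self-contained sketch may help. It is a variation on the steps above, not the same procedure: it uses a plain zipfile.ZipFile and stores Python source directly with writestr(), rather than compiling a directory with PyZipFile.writepy(). The archive name zip_demo_archive.zip and the module name zip_demo_mod are invented for this illustration.

```python
import os
import sys
import tempfile
import zipfile

tmpdir = tempfile.mkdtemp()
zip_path = os.path.join(tmpdir, 'zip_demo_archive.zip')

# Put one small module (as source code) into the archive.
archive = zipfile.ZipFile(zip_path, 'w', zipfile.ZIP_DEFLATED)
archive.writestr(
    'zip_demo_mod.py',
    "def greet(name):\n    return 'Hello, %s' % name\n")
archive.close()

# Insert the path to the archive itself in sys.path, then import.
sys.path.insert(0, zip_path)
import zip_demo_mod

message = zip_demo_mod.greet('Wilma')
print(message)
```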
of occurrences of "cd" followed by "ef", for example, "abef", "abcdef", "abcdcdef", etc.

There are special names for some sets of characters, for example "\d" (any digit), "\w" (any alphanumeric character), "\W" (any non-alphanumeric character), etc. For more information, see Python Library Reference: Regular Expression Syntax -- http://docs.python.org/library/re.html#regular-expression-syntax

Because of the use of backslashes in patterns, you are usually better off defining regular expressions with raw strings, e.g. r"abc".
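To see why raw strings matter, compare the same pattern written with and without the r prefix. This is a small sketch; the pattern and the target text are made up for the illustration. In an ordinary string literal, "\b" is the backspace character, whereas in a raw string the backslash is kept and the re module interprets "\b" as a word boundary.

```python
import re

plain = '\bcat\b'      # backspace, "cat", backspace
raw = r'\bcat\b'       # word boundary, "cat", word boundary

text = 'the cat sat'

plain_match = re.search(plain, text)   # looks for backspace characters
raw_match = re.search(raw, text)       # looks for the word "cat"

print(len(plain))
print(len(raw))
print(plain_match)
print(raw_match.group(0))
```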
    import sys, re

    Targets = [
        'There are <<25>> sparrows.',
        'I see <<15>> finches.',
        'There is nothing here.',
        ]

    def test():
        pat = re.compile('<<([0-9]*)>>')
        for line in Targets:
            mo = pat.search(line)
            if mo:
                value = mo.group(1)
                print 'value: %s' % value
            else:
                print 'no match'

    test()

When we run the above, it prints out the following:

    value: 25
    value: 15
    no match

Explanation:

In the regular expression, put parentheses around the portion of the regular expression that will match what you want to extract. Each pair of parentheses marks off a group.

After the search, check to determine if there was a successful match by checking for a matching object. "pat.search(line)" returns None if the search fails.

If you specify more than one group in your regular expression (more than one pair of parentheses), then you can use "value = mo.group(N)" to extract the value matched by the Nth group from the matching object. "value = mo.group(1)" returns the first extracted value; "value = mo.group(2)" returns the second; etc. An argument of 0 returns the string matched by the entire regular expression.

In addition, you can:

Use "values = mo.groups()" to get a tuple containing the strings matched by all groups.

Use "mo.expand()" to interpolate the group values into a string. For example, "mo.expand(r'value1: \1 value2: \2')" inserts the values of the first and second group into a string. If the first group matched "aaa" and the second matched "bbb", then this example would produce "value1: aaa value2: bbb". For example:

    In [76]: mo = re.search(r'h: (\d*) w: (\d*)', 'h: 123 w: 456')
    In [77]: mo.expand(r'Height: \1 Width: \2')
    Out[77]: 'Height: 123 Width: 456'
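The different ways of getting at groups described above can be compared side by side. The following is a short sketch; the target string is invented for the illustration, and the pattern is a slight variation on the one used with expand() above.

```python
import re

pat = re.compile(r'h: (\d+) w: (\d+)')
mo = pat.search('size h: 123 w: 456 px')

whole = mo.group(0)      # text matched by the entire pattern
first = mo.group(1)      # text matched by the first group
second = mo.group(2)     # text matched by the second group
both = mo.groups()       # tuple of all groups
expanded = mo.expand(r'Height: \1 Width: \2')

print(whole)
print(first)
print(second)
print(both)
print(expanded)
```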
You can also use the sub function or method to do substitutions. Here is an example:

    import sys, re

    pat = re.compile('[0-9]+')

    print 'Replacing decimal digits.'
    while 1:
        target = raw_input('Enter a target line ("q" to quit): ')
        if target == 'q':
            break
        repl = raw_input('Enter a replacement: ')
        result = pat.sub(repl, target)
        print 'result: %s' % result

Here is another example of the use of a function to insert calculated replacements.

    import sys, re, string

    pat = re.compile('[a-m]+')

    def replacer(mo):
        return string.upper(mo.group(0))

    print 'Upper-casing a-m.'
    while 1:
        target = raw_input('Enter a target line ("q" to quit): ')
        if target == 'q':
            break
        result = pat.sub(replacer, target)
        print 'result: %s' % result

Notes:

If the replacement argument to sub is a function, that function must take one argument, a match object, and must return the modified (or replacement) value. The matched sub-string will be replaced by the value returned by this function.

In our case, the function replacer converts the matched value to upper case.

This is also a convenient use for a lambda instead of a named function, for example:

    import sys, re, string

    pat = re.compile('[a-m]+')

    print 'Upper-casing a-m.'
    while 1:
        target = raw_input('Enter a target line ("q" to quit): ')
        if target == 'q':
            break
        result = pat.sub(
            lambda mo: string.upper(mo.group(0)),
            target)
        print 'result: %s' % result
Types -- http://docs.python.org/library/stdtypes.html#iterator-types.

Iterator class - A class that implements (satisfies) the iterator protocol. In particular, the class implements next() and __iter__() methods as described above and in Iterator Types -- http://docs.python.org/library/stdtypes.html#iterator-types.

(Iterator) generator function - A function (or method) which, when called, returns an iterator object, that is, an object that satisfies the iterator protocol. A function containing a yield statement automatically becomes a generator.

Generator expression - An expression which produces an iterator object. Generator expressions have a form similar to a list comprehension, but are enclosed in parentheses rather than square brackets. See example below.

A few additional basic points:

A function that contains a yield statement is a generator function. When called, it returns an iterator, that is, an object that provides next() and __iter__() methods.

The iterator protocol is described here: Python Standard Library: Iterator Types -- http://docs.python.org/library/stdtypes.html#iterator-types.

A class that defines both a next() method and a __iter__() method satisfies the iterator protocol. So, instances of such a class will be iterators.

Python provides a variety of ways to produce (implement) iterators. This section describes a few of those ways. You should also look at the iter() built-in function, which is described in The Python Standard Library: Built-in Functions: iter() -- http://docs.python.org/library/functions.html#iter.

An iterator can be used in an iterator context, for example in a for statement, in a list comprehension, and in a generator expression. When an iterator is used in an iterator context, the iterator produces its values.
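As a quick illustration of the iterator contexts just listed, here is a minimal sketch. The generator function countdown is a hypothetical name used only here. The code avoids calling the next() method directly so that it runs under both Python 2 and Python 3.

```python
def countdown(start):
    # The yield statement makes this a generator function.
    while start > 0:
        yield start
        start -= 1

# Calling the generator function returns an iterator; an iterator's
# __iter__() method returns the iterator itself.
it = countdown(3)
is_own_iterator = iter(it) is it

# Iterator context 1: a for statement.
in_for = []
for n in countdown(3):
    in_for.append(n)

# Iterator context 2: a list comprehension.
in_listcomp = [n * 2 for n in countdown(3)]

# Iterator context 3: a generator expression.
in_genexpr = list(n * 10 for n in countdown(3))

# The iter() built-in produces an iterator from a container.
from_builtin = list(iter(['a', 'b']))

print(in_for)
print(in_listcomp)
print(in_genexpr)
```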
This section attempts to provide examples that illustrate the generator/iterator pattern.

Why is this important?

Once mastered, it is a simple, convenient, and powerful programming pattern.

It has many and pervasive uses.

It helps to lexically separate the producer code from the consumer code. Doing so makes it easier to locate problems and to modify or fix code in a way that is localized and does not have unwanted side-effects.

Implementing your own iterators (and generators) enables you to define your own abstract sequences, that is, sequences whose composition is defined by your computations rather than by their presence in a container. In fact, your iterator can calculate or retrieve values as each one is requested.

Examples - The remainder of this section provides a set of examples which implement and use iterators.
method enables us to pass the iterator object around and get values at different locations in our code.

Once we have obtained all the values from an iterator, it is, in effect, "empty" or "exhausted". The iterator protocol, in fact, specifies that once an iterator raises the StopIteration exception, it should continue to do so. Another way to say this is that there is no "rewind" operation. But, you can call the generator function again to get a "fresh" iterator.

An alternative and perhaps simpler way to create an iterator is to use a generator expression. This can be useful when you already have a collection or iterator to work with.

The following example implements a function that returns a generator object. The effect is to generate the objects in a collection while excluding the items in a separate collection:

    DATA = [
        'lemon',
        'lime',
        'grape',
        'apple',
        'pear',
        'watermelon',
        'canteloupe',
        'honeydew',
        'orange',
        'grapefruit',
        ]

    def make_producer(collection, excludes):
        gen = (item for item in collection if item not in excludes)
        return gen

    def test():
        iter1 = make_producer(DATA, ('apple', 'orange', 'honeydew', ))
        print '%s' % iter1
        for fruit in iter1:
            print fruit

    test()

When run, this example produces the following:

    $ python workbook063.py
    <generator object <genexpr> at 0x7fb3d0f1bc80>
    lemon
    lime
    grape
    pear
    watermelon
    canteloupe
    grapefruit

Notes:

A generator expression looks almost like a list comprehension, but is surrounded by parentheses rather than square brackets. For more on list comprehensions see section Example - A list comprehension.

The make_producer function returns the object produced by the generator expression.
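The "exhausted" behavior described above is easy to observe. The following is a minimal sketch; the generator function squares and the helper next_or_none are hypothetical names used only for this illustration. It uses the next() built-in function (available in Python 2.6 and later) rather than the next() method.

```python
def squares(limit):
    for n in range(limit):
        yield n * n

gen = squares(4)
first_pass = list(gen)       # consumes every value
second_pass = list(gen)      # the iterator is now exhausted

def next_or_none(iterator):
    # An exhausted iterator keeps raising StopIteration.
    try:
        return next(iterator)
    except StopIteration:
        return None

still_done = next_or_none(gen)

# There is no rewind, but calling the generator function again
# gives a fresh iterator.
fresh_pass = list(squares(4))

print(first_pass)
print(second_pass)
print(fresh_pass)
```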
                yield child
        #
        # Print information on this node and walk over all children and
        # grandchildren ...
        def walk(self, level=0):
            print '%sname: %s value: %s' % (
                get_filler(level), self.get_name(), self.get_value(), )
            for child in self.iterchildren():
                child.walk(level + 1)

    #
    # A function that is the equivalent of the walk() method in
    # class Node.
    #
    def walk(node, level=0):
        print '%sname: %s value: %s' % (
            get_filler(level), node.get_name(), node.get_value(), )
        for child in node.iterchildren():
            walk(child, level + 1)

    def get_filler(level):
        return ' ' * level

    def test():
        a7 = Node('gilbert', '777')
        a6 = Node('fred', '666')
        a5 = Node('ellie', '555')
        a4 = Node('daniel', '444')
        a3 = Node('carl', '333', [a4, a5])
        a2 = Node('bill', '222', [a6, a7])
        a1 = Node('alice', '111', [a2, a3])
        # Use the walk method to walk the entire tree.
        print 'Using the method:'
        a1.walk()
        print '=' * 30
        # Use the walk function to walk the entire tree.
        print 'Using the function:'
        walk(a1)

    test()

Running this example produces the following output:

    Using the method:
    name: alice value: 111
     name: bill value: 222
      name: fred value: 666
      name: gilbert value: 777
     name: carl value: 333
      name: daniel value: 444
      name: ellie value: 555
    ==============================
    Using the function:
    name: alice value: 111
     name: bill value: 222
      name: fred value: 666
      name: gilbert value: 777
     name: carl value: 333
      name: daniel value: 444
      name: ellie value: 555

Notes and explanation:

This class contains a method iterchildren which, when called, returns an iterator.

The yield statement in the method iterchildren makes it into a generator.

The yield statement returns one item each time it is reached. The next time the iterator object is "called" it resumes immediately after the yield statement.

A function may have any number of yield statements.

A for statement will iterate over all the items produced by an iterator object.
This example shows two ways to use the generator, specifically: (1) the walk method in the class Node and (2) the walk function. Both call the generator iterchildren and both do pretty much the same thing.
refresh method which enables us to "rewind" and reuse the iterator instance:

    #
    # An iterator class that does *not* use ``yield``.
    # This iterator produces every other item in a sequence.
    #
    class IteratorExample:
        def __init__(self, seq):
            self.seq = seq
            self.idx = 0
        def next(self):
            self.idx += 1
            if self.idx >= len(self.seq):
                raise StopIteration
            value = self.seq[self.idx]
            self.idx += 1
            return value
        def __iter__(self):
            return self
        def refresh(self):
            self.idx = 0

    def test_iteratorexample():
        a = IteratorExample('edcba')
        for x in a:
            print x
        print '----------'
        a.refresh()
        for x in a:
            print x
        print '=' * 30
        a = IteratorExample('abcde')
        try:
            print a.next()
            print a.next()
            print a.next()
            print a.next()
            print a.next()
            print a.next()
        except StopIteration, e:
            print 'stopping', e

Running this example produces the following output:

    d
    b
    ----------
    d
    b
    ==============================
    b
    d
    stopping

Notes and explanation:

The next method must keep track of where it is and what item it should produce next.

Alert: The iterator protocol has changed slightly in Python 3.0. In particular, the next() method has been renamed to __next__(). See: Python Standard Library: Iterator Types -- http://docs.python.org/3.0/library/stdtypes.html#iterator-types.
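Given the renaming mentioned in the alert, one common way to write an iterator class that works under both Python 2 and Python 3 is to define __next__() and make next an alias for it. The following is a sketch of that idea, using the same every-other-item logic as IteratorExample above; the class name EveryOther is invented for this illustration.

```python
class EveryOther(object):
    """Produce every other item in a sequence (the 2nd, 4th, ...)."""
    def __init__(self, seq):
        self.seq = seq
        self.idx = 0

    def __iter__(self):
        return self

    def __next__(self):          # the name used by Python 3
        self.idx += 1
        if self.idx >= len(self.seq):
            raise StopIteration
        value = self.seq[self.idx]
        self.idx += 1
        return value

    next = __next__              # the name used by Python 2

first_result = list(EveryOther('edcba'))
second_result = list(EveryOther('abcde'))

print(first_result)
print(second_result)
```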
                if flag:
                    flag = 0
                    yield x
                else:
                    flag = 1
        def __iter__(self):
            return self.iterator
        def refresh(self):
            self.iterator = self._next()
            self.next = self.iterator.next

    def test_yielditeratorexample():
        a = YieldIteratorExample('edcba')
        for x in a:
            print x
        print '----------'
        a.refresh()
        for x in a:
            print x
        print '=' * 30
        a = YieldIteratorExample('abcde')
        try:
            print a.next()
            print a.next()
            print a.next()
            print a.next()
            print a.next()
            print a.next()
        except StopIteration, e:
            print 'stopping', e

    test_yielditeratorexample()

Running this example produces the following output:

    d
    b
    ----------
    d
    b
    ==============================
    b
    d
    stopping

Notes and explanation:

Because the _next method uses yield, calling it (actually, calling the iterator object it produces) in an iterator context causes it to be "resumed" immediately after the yield statement. This reduces bookkeeping a bit.

However, with this style, we must explicitly produce an iterator. We do this by calling the _next method, which contains a yield statement, and is therefore a generator. The following code in our constructor (__init__) completes the set-up of our class as an iterator class:

    self.iterator = self._next()
    self.next = self.iterator.next

Remember that we need both __iter__() and next() methods in order to satisfy the iterator protocol. The __iter__() method is already there and the above code in the constructor creates the next() method.
    def mycmpfunc(arg1, arg2):
        return cmp(string.lower(arg1), string.lower(arg2))

    class XmlTest(unittest.TestCase):
        def test_import_export1(self):
            inFile = file('test1_in.xml', 'r')
            inContent = inFile.read()
            inFile.close()
            doc = webserv_example_heavy_sub.parseString(inContent)
            outFile = StringIO.StringIO()
            outFile.write('<?xml version="1.0" ?>\n')
            doc.export(outFile, 0)
            outContent = outFile.getvalue()
            outFile.close()
            self.failUnless(inContent == outContent)

    # make the test suite.
    def suite():
        loader = unittest.TestLoader()
        # Change the test method prefix: test --> trial.
        #loader.testMethodPrefix = 'trial'
        # Change the comparison function that determines the order of tests.
        #loader.sortTestMethodsUsing = mycmpfunc
        testsuite = loader.loadTestsFromTestCase(XmlTest)
        return testsuite

    # Make the test suite; run the tests.
    def test_main():
        testsuite = suite()
        runner = unittest.TextTestRunner(sys.stdout, verbosity=2)
        result = runner.run(testsuite)

    if __name__ == "__main__":
        test_main()

Running the above script produces the following output:

    test_import_export (__main__.XmlTest) ... ok

    ----------------------------------------------------------------------
    Ran 1 test in 0.035s

    OK

A few notes on this example:

This example tests the ability to parse an XML document test1_in.xml and export that document back to XML. The test succeeds if the input XML document and the exported XML document are the same.

The code which is being tested parses an XML document returned by a request to Amazon Web services. You can learn more about Amazon Web services at: http://www.amazon.com/webservices. This code was generated from an XML Schema document by generateDS.py. So we are, in effect, testing generateDS.py. You can find generateDS.py at: http://www.rexx.com/~dkuhlman/#generateDS.
Testing for success/failure and reporting failures -- Use the methods listed at http://www.python.org/doc/current/lib/testcaseobjects.html to test for and report success and failure. In our example, we used "self.failUnless(inContent == outContent)" to ensure that the content we parsed and the content that we exported were the same.

Add additional tests by adding methods whose names have the prefix "test". If you prefer a different prefix for test names, add something like the following to the above script:

    loader.testMethodPrefix = 'trial'

By default, the tests are run in the order of their names sorted by the cmp function. So, if needed, you can control the order of execution of tests by selecting their names, for example, using names like test_1_checkderef, test_2_checkcalc, etc. Or, you can change the comparison function by adding something like the following to the above script:

    loader.sortTestMethodsUsing = mycmpfunc

As a bit of motivation for creating and using unit tests, while developing this example, I discovered several errors (or maybe "special features") in generateDS.py.
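The example above depends on generated code and an input file that you may not have at hand. The following self-contained sketch shows the same structure (a TestCase sub-class, a loader, and a text runner) on a trivial function. The function normalize and the class NormalizeTest are hypothetical and are not part of the example above. The sketch uses assertEqual instead of failUnless, since the fail* names were deprecated in later versions of Python.

```python
import unittest

def normalize(text):
    """Collapse runs of whitespace into single spaces."""
    return ' '.join(text.split())

class NormalizeTest(unittest.TestCase):
    # Numbered names control the order in which the tests run.
    def test_1_collapses_spaces(self):
        self.assertEqual(normalize('a   b'), 'a b')

    def test_2_strips_ends(self):
        self.assertEqual(normalize('  a b  '), 'a b')

def run_suite():
    loader = unittest.TestLoader()
    testsuite = loader.loadTestsFromTestCase(NormalizeTest)
    runner = unittest.TextTestRunner(verbosity=0)
    return runner.run(testsuite)

result = run_suite()
print(result.testsRun)
print(result.wasSuccessful())
```

The runner returns a result object, so a script can check testsRun and wasSuccessful() instead of reading the printed report.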
3. Handle errors and exceptions -- You will need to understand how to (1) clear errors and exceptions and (2) raise errors (exceptions).

Many functions in the Python C API raise exceptions. You will need to check for and clear these exceptions. Here is an example:

    char * message;
    int messageNo;

    message = NULL;
    messageNo = -1;
    /* Is the argument a string?
     */
    if (! PyArg_ParseTuple(args, "s", &message))
    {
        /* It's not a string. Clear the error.
         * Then try to get a message number (an integer).
         */
        PyErr_Clear();
        if (! PyArg_ParseTuple(args, "i", &messageNo))
        {
            o
            o
            o

You can also raise exceptions in your C code that can be caught (in a "try:except:" block) back in the calling Python code. Here is an example:

    if (n == 0)
    {
        PyErr_SetString(PyExc_ValueError, "Value must not be zero");
        return NULL;
    }

See Include/pyerrors.h in the Python source distribution for more exception/error types.

And, you can test whether a function in the Python C API that you have called has raised an exception. For example:

    if (PyErr_Occurred())
    {
        /* An exception was raised.
         * Do something about it.
         */
        o
        o
        o

For more documentation on errors and exceptions, see: http://www.python.org/doc/current/api/exceptionHandling.html.

4. Create and return a value: For each built-in Python type there is a set of API functions to create and manipulate it. See the "Python/C API Reference Manual" for a description of these functions.
For example, see:

    http://www.python.org/doc/current/api/intObjects.html
    http://www.python.org/doc/current/api/stringObjects.html
    http://www.python.org/doc/current/api/tupleObjects.html
    http://www.python.org/doc/current/api/listObjects.html
    http://www.python.org/doc/current/api/dictObjects.html
    Etc.

The reference count -- You will need to follow the rules for reference counting that Python uses to garbage collect objects. You can learn about these rules at http://www.python.org/doc/current/ext/refcounts.html. You will not want Python to garbage collect objects that you create too early or too late. With respect to Python objects created with the above functions, your code owns a reference to each new object, and these objects may be passed back to Python code. However, there are situations where your C/C++ code will not automatically own a reference, for example when you extract an object from a container (a list, tuple, dictionary, etc). In these cases you should increment the reference count with Py_INCREF.
2.5.3 SWIG
Note: Our discussion and examples are for SWIG version 1.3.

SWIG will often enable you to generate wrappers for functions in an existing C function library. SWIG does not understand everything in C header files. But it does a fairly impressive job. You should try it first before resorting to the hard work of writing wrappers by hand.

More information on SWIG is at http://www.swig.org.

Here are some steps that you can follow:

1. Create an interface file -- Even when you are wrapping functions defined in an existing header file, creating an interface file is a good idea.
Include your existing header file into it, then add whatever else you need. Here is an extremely simple example of a SWIG interface file:

    %module MyLibrary

    %{
    #include "MyLibrary.h"
    %}

    %include "MyLibrary.h"

Comments:

The "%{" and "%}" brackets are directives to SWIG. They say: "Add the code between these brackets to the generated wrapper file without processing it."

The "%include" statement says: "Copy the file into the interface file here." In effect, you are asking SWIG to generate wrappers for all the functions in this header file. If you want wrappers for only some of the functions in a header file, then copy or reproduce function declarations for the desired functions here. An example:

    %module MyLibrary

    %{
    #include "MyLibrary.h"
    %}

    float calcArea(float width, float height);
    float calcVolume(float radius);

This example will generate wrappers for only two functions. The declarations must agree with those in the header file (MyLibrary.h, shown below).

You can find more information about the directives that are used in SWIG interface files in the SWIG User Manual, in particular at:

    http://www.swig.org/Doc1.3/Preprocessor.html
    http://www.swig.org/Doc1.3/Python.html

2. Generate the wrappers:

    swig -python MyLibrary.i

3. Compile and link the library. On Linux, you can use something like the following:

    gcc -c MyLibrary.c
    gcc -c -I/usr/local/include/python2.3 MyLibrary_wrap.c
    gcc -shared MyLibrary.o MyLibrary_wrap.o -o _MyLibrary.so

Note that we produce a shared library whose name is the module name prefixed with an underscore. SWIG also generates a .py file, without the leading underscore, which we will import from our Python code and which, in turn, imports the shared library.

4. Use the extension module in your python code:

    Python 2.3b1 (#1, Apr 25 2003, 20:36:09)
    [GCC 2.95.4 20011002 (Debian prerelease)] on linux2
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import MyLibrary
    >>> MyLibrary.calcArea(4.0, 5.0)
    20.0

Here is a makefile that will execute swig to generate wrappers, then compile and link the extension.

    CFLAGS = -I/usr/local/include/python2.3

    all: _MyLibrary.so

    _MyLibrary.so: MyLibrary.o MyLibrary_wrap.o
            gcc -shared MyLibrary.o MyLibrary_wrap.o -o _MyLibrary.so

    MyLibrary.o: MyLibrary.c
            gcc -c MyLibrary.c -o MyLibrary.o

    MyLibrary_wrap.o: MyLibrary_wrap.c
            gcc -c ${CFLAGS} MyLibrary_wrap.c -o MyLibrary_wrap.o

    MyLibrary_wrap.c: MyLibrary.i
            swig -python MyLibrary.i

    clean:
            rm -f MyLibrary.py MyLibrary.o MyLibrary_wrap.c MyLibrary_wrap.o _MyLibrary.so

Here is an example of running this makefile:

    $ make -f MyLibrary_makefile clean
    rm -f MyLibrary.py MyLibrary.o MyLibrary_wrap.c \
            MyLibrary_wrap.o _MyLibrary.so
    $ make -f MyLibrary_makefile
    gcc -c MyLibrary.c -o MyLibrary.o
    swig -python MyLibrary.i
    gcc -c -I/usr/local/include/python2.3 MyLibrary_wrap.c -o MyLibrary_wrap.o
    gcc -shared MyLibrary.o MyLibrary_wrap.o -o _MyLibrary.so

And, here are C source files that can be used in our example.

MyLibrary.h:

    /* MyLibrary.h
     */

    float calcArea(float width, float height);
    float calcVolume(float radius);

    int getVersion();
    int getMode();

MyLibrary.c:

    /* MyLibrary.c
     */

    float calcArea(float width, float height)
    {
        return (width * height);
    }

    float calcVolume(float radius)
    {
        return (3.14 * radius * radius);
    }

    int getVersion()
    {
        return 123;
    }

    int getMode()
    {
        return 1;
    }
2.5.4 Pyrex
Pyrex is a useful tool for writing Python extensions. Because the Pyrex language is similar to Python, writing extensions in Pyrex is easier than doing so in C. Cython appears to be the a newer version of Pyrex. More information is on Pyrex and Cython is at: Pyrex -- http://www.cosc.canterbury.ac.nz/~greg/python/Pyrex/ Cython - C Extensions for Python -- http://www.cython.org/ Here is a simple function definition in Pyrex: # python_201_pyrex_string.pyx import string def formatString(object s1, object s2): s1 = string.strip(s1) s2 = string.strip(s2) s3 = '<<%s||%s>>' % (s1, s2) s4 = s3 * 4
        return s4

And, here is a make file:

    CFLAGS = -DNDEBUG -O3 -Wall -Wstrict-prototypes -fPIC \
        -I/usr/local/include/python2.3

    all: python_201_pyrex_string.so

    python_201_pyrex_string.so: python_201_pyrex_string.o
        gcc -shared python_201_pyrex_string.o -o python_201_pyrex_string.so

    python_201_pyrex_string.o: python_201_pyrex_string.c
        gcc -c ${CFLAGS} python_201_pyrex_string.c -o python_201_pyrex_string.o

    python_201_pyrex_string.c: python_201_pyrex_string.pyx
        pyrexc python_201_pyrex_string.pyx

    clean:
        rm -f python_201_pyrex_string.so python_201_pyrex_string.o \
            python_201_pyrex_string.c

Here is another example. In this one, one function in the .pyx file calls another. Here is the implementation file:

    # python_201_pyrex_primes.pyx

    def showPrimes(int kmax):
        plist = primes(kmax)
        for p in plist:
            print 'prime: %d' % p

    cdef primes(int kmax):
        cdef int n, k, i
        cdef int p[1000]
        result = []
        if kmax > 1000:
            kmax = 1000
        k = 0
        n = 2
        while k < kmax:
            i = 0
            while i < k and n % p[i] <> 0:
                i = i + 1
            if i == k:
                p[k] = n
                k = k + 1
                result.append(n)
            n = n + 1
        return result

And, here is a make file:

    #CFLAGS = -DNDEBUG -g -O3 -Wall -Wstrict-prototypes -fPIC
    #   -I/usr/local/include/python2.3
    CFLAGS = -DNDEBUG -I/usr/local/include/python2.3

    all: python_201_pyrex_primes.so

    python_201_pyrex_primes.so: python_201_pyrex_primes.o
        gcc -shared python_201_pyrex_primes.o -o python_201_pyrex_primes.so

    python_201_pyrex_primes.o: python_201_pyrex_primes.c
        gcc -c ${CFLAGS} python_201_pyrex_primes.c -o python_201_pyrex_primes.o

    python_201_pyrex_primes.c: python_201_pyrex_primes.pyx
        pyrexc python_201_pyrex_primes.pyx

    clean:
        rm -f python_201_pyrex_primes.so python_201_pyrex_primes.o python_201_pyrex_primes.c

Here is the output from running the makefile:

    $ make -f python_201_pyrex_makeprimes clean
    rm -f python_201_pyrex_primes.so python_201_pyrex_primes.o \
        python_201_pyrex_primes.c
    $ make -f python_201_pyrex_makeprimes
pyrexc python_201_pyrex_primes.pyx gcc -c -DNDEBUG -I/usr/local/include/python2.3 python_201_pyrex_primes.c -o python_201_pyrex_primes.o gcc -shared python_201_pyrex_primes.o -o python_201_pyrex_primes.so Here is an interactive example of its use: $ python Python 2.3b1 (#1, Apr 25 2003, 20:36:09) [GCC 2.95.4 20011002 (Debian prerelease)] on linux2 Type "help", "copyright", "credits" or "license" for more information. >>> import python_201_pyrex_primes >>> dir(python_201_pyrex_primes) ['__builtins__', '__doc__', '__file__', '__name__', 'showPrimes'] >>> python_201_pyrex_primes.showPrimes(5) prime: 2 prime: 3 prime: 5 prime: 7 prime: 11 This next example shows how to use Pyrex to implement a new extension type, that is a new Python built-in type. Notice that the class is declared with the cdef keyword, which tells Pyrex to generate the C implementation of a type instead of a class. Here is the implementation file: # python_201_pyrex_clsprimes.pyx """An implementation of primes handling class for a demonstration of Pyrex. """ cdef class Primes: """A class containing functions for handling primes. """ def showPrimes(self, int kmax): """Show a range of primes. Use the method primes() to generate the primes. """ plist = self.primes(kmax) for p in plist: print 'prime: %d' % p def primes(self, int kmax): """Generate the primes in the range 0 - kmax. """ cdef int n, k, i cdef int p[1000] result = [] if kmax > 1000: kmax = 1000 k = 0 n = 2 while k < kmax: i = 0 while i < k and n % p[i] <> 0: i = i + 1 if i == k: p[k] = n k = k + 1 result.append(n) n = n + 1 return result And, here is a make file: CFLAGS = -DNDEBUG -I/usr/local/include/python2.3 all: python_201_pyrex_clsprimes.so python_201_pyrex_clsprimes.so: python_201_pyrex_clsprimes.o gcc -shared python_201_pyrex_clsprimes.o -o python_201_pyrex_clsprimes.so python_201_pyrex_clsprimes.o: python_201_pyrex_clsprimes.c gcc -c ${CFLAGS} python_201_pyrex_clsprimes.c -o python_201_pyrex_clsprimes.o
python_201_pyrex_clsprimes.c: python_201_pyrex_clsprimes.pyx pyrexc python_201_pyrex_clsprimes.pyx clean: rm -f python_201_pyrex_clsprimes.so python_201_pyrex_clsprimes.o \ python_201_pyrex_clsprimes.c Here is output from running the makefile: $ make -f python_201_pyrex_makeclsprimes clean rm -f python_201_pyrex_clsprimes.so python_201_pyrex_clsprimes.o \ python_201_pyrex_clsprimes.c $ make -f python_201_pyrex_makeclsprimes pyrexc python_201_pyrex_clsprimes.pyx gcc -c -DNDEBUG -I/usr/local/include/python2.3 python_201_pyrex_clsprimes.c -o python_201_pyrex_clsprimes.o gcc -shared python_201_pyrex_clsprimes.o -o python_201_pyrex_clsprimes.so And here is an interactive example of its use: $ python Python 2.3b1 (#1, Apr 25 2003, 20:36:09) [GCC 2.95.4 20011002 (Debian prerelease)] on linux2 Type "help", "copyright", "credits" or "license" for more information. >>> import python_201_pyrex_clsprimes >>> dir(python_201_pyrex_clsprimes) ['Primes', '__builtins__', '__doc__', '__file__', '__name__'] >>> primes = python_201_pyrex_clsprimes.Primes() >>> dir(primes) ['__class__', '__delattr__', '__doc__', '__getattribute__', '__hash__', '__init__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__str__', 'primes', 'showPrimes'] >>> primes.showPrimes(4) prime: 2 prime: 3 prime: 5 prime: 7 Documentation -- Also notice that Pyrex preserves the documentation for the module, the class, and the methods in the class. You can show this documentation with pydoc, as follows: $ pydoc python_201_pyrex_clsprimes Or, in Python interactive mode, use: $ python Python 2.3b1 (#1, Apr 25 2003, 20:36:09) [GCC 2.95.4 20011002 (Debian prerelease)] on linux2 Type "help", "copyright", "credits" or "license" for more information. >>> import python_201_pyrex_clsprimes >>> help(python_201_pyrex_clsprimes)
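For comparison, the trial-division loop used in the two Pyrex primes examples above can be sketched in plain Python. This is only an illustration written for Python 3, not part of the Pyrex examples. It returns the first kmax primes, which is what the showPrimes(5) transcript above displays, and it omits the 1000-entry limit that the Pyrex version needs for its C array:

```python
def primes(kmax):
    """Return the first kmax primes, using trial division by earlier primes."""
    found = []
    n = 2
    while len(found) < kmax:
        # n is prime if none of the primes found so far divides it.
        if all(n % p != 0 for p in found):
            found.append(n)
        n += 1
    return found


def show_primes(kmax):
    """Print the first kmax primes, one per line."""
    for p in primes(kmax):
        print('prime: %d' % p)


if __name__ == '__main__':
    show_primes(5)
```

Timing a function like this against the compiled extension is one way to judge whether the C-typed variables (cdef int) are worth the extra build step.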
2.5.6 Cython
Here is a simple example that uses Cython to wrap a function implemented in C. First the C header file:
/* test_c_lib.h */ int calculate(int width, int height); And, the C implementation file: /* test_c_lib.c */ #include "test_c_lib.h" int calculate(int width, int height) { int result; result = width * height * 3; return result; } Here is a Cython file that calls our C function: # test_c.pyx # Declare the external C function. cdef extern from "test_c_lib.h": int calculate(int width, int height) def test(w, h): # Call the external C function. result = calculate(w, h) print 'result from calculate: %d' % result We can compile our code using this script (on Linux): #!/bin/bash -x cython test_c.pyx gcc -c -fPIC -I/usr/local/include/python2.6 -o test_c.o test_c.c gcc -c -fPIC -I/usr/local/include/python2.6 -o test_c_lib.o test_c_lib.c gcc -shared -fPIC -I/usr/local/include/python2.6 -o test_c.so test_c.o test_c_lib.o Here is a small Python file that uses the wrapper that we wrote in Cython: # run_test_c.py import test_c def test(): test_c.test(4, 5) test_c.test(12, 15) if __name__ == '__main__': test() And, when we run it, we see the following: $ python run_test_c.py result from calculate: 60 result from calculate: 540
Pyrex also goes some way toward giving you access to (existing) C structs and functions from Python.
2.6 Parsing
Python is an excellent language for text analysis. In some cases, simply splitting lines of text into words will be enough. In these cases use string.split(). In other cases, regular expressions may be able to do the parsing you need. If so, see the section on regular expressions in this document. However, in some cases, more complex analysis of input text is required. This section describes some of the ways that Python can help you with this complex parsing and analysis.
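The two simple cases mentioned above can be sketched as follows. This is a minimal illustration written for Python 3, where split is a string method rather than a function in the string module. The sample line and the identifier pattern are assumptions chosen for the example:

```python
import re

line = "mass = height * width"

# Simple case: split the line into whitespace-separated words.
words = line.split()

# Slightly harder case: a regular expression that picks out only
# the identifiers and skips the operators.
identifiers = re.findall(r"[A-Za-z_][A-Za-z0-9_]*", line)

print(words)
print(identifiers)
```

Once the input has nested structure, such as parentheses that must balance, neither approach is enough on its own, and that is where the parsing techniques in this section come in.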
    #!/usr/bin/env python

    """
    A recursive descent parser example.

    Usage:
        python rparser.py [options] <inputfile>

    Options:
        -h, --help      Display this help message.

    Example:
        python rparser.py myfile.txt

    The grammar:
        Prog ::= Command | Command Prog
        Command ::= Func_call
        Func_call ::= Term '(' Func_call_list ')'
        Func_call_list ::= Func_call | Func_call ',' Func_call_list
        Term = <word>
    """

    import sys
    import string
    import types
    import getopt
# # To use the IPython interactive shell to inspect your running # application, uncomment the following lines: # ## from IPython.Shell import IPShellEmbed ## ipshell = IPShellEmbed((), ## banner = '>>>>>>>> Into IPython >>>>>>>>', ## exit_msg = '<<<<<<<< Out of IPython <<<<<<<<') # # Then add the following line at the point in your code where # you want to inspect run-time values: # # ipshell('some message to identify where we are') # # For more information see: http://ipython.scipy.org/moin/ # # # Constants # # AST node types NoneNodeType = 0 ProgNodeType = 1 CommandNodeType = 2 FuncCallNodeType = 3 FuncCallListNodeType = 4 TermNodeType = 5 # Token types NoneTokType = 0 LParTokType = 1 RParTokType = 2 WordTokType = 3 CommaTokType = 4 EOFTokType = 5 # Dictionary to map node type values to node type names NodeTypeDict = { NoneNodeType: 'NoneNodeType', ProgNodeType: 'ProgNodeType', CommandNodeType: 'CommandNodeType', FuncCallNodeType: 'FuncCallNodeType', FuncCallListNodeType: 'FuncCallListNodeType', TermNodeType: 'TermNodeType', } # # Representation of a node in the AST (abstract syntax tree). # class ASTNode: def __init__(self, nodeType, *args): self.nodeType = nodeType self.children = [] for item in args:
                self.children.append(item)
        def show(self, level):
            self.showLevel(level)
            print 'Node -- Type %s' % NodeTypeDict[self.nodeType]
            level += 1
            for child in self.children:
                if isinstance(child, ASTNode):
                    child.show(level)
                elif type(child) == types.ListType:
                    for item in child:
                        item.show(level)
                else:
                    self.showLevel(level)
                    print 'Child:', child
        def showLevel(self, level):
            for idx in range(level):
                print ' ',

    #
    # The recursive descent parser class.
    #   Contains the "recognizer" methods, which implement the grammar
    #   rules (above), one recognizer method for each production rule.
    #
    class ProgParser:
        def __init__(self):
            pass

        def parseFile(self, infileName):
            self.infileName = infileName
            self.tokens = None
            self.tokenType = NoneTokType
            self.token = ''
            self.lineNo = -1
            self.infile = file(self.infileName, 'r')
            self.tokens = genTokens(self.infile)
            try:
                self.tokenType, self.token, self.lineNo = self.tokens.next()
            except StopIteration:
                raise RuntimeError, 'Empty file'
            result = self.prog_reco()
            self.infile.close()
            self.infile = None
            return result

        def parseStream(self, instream):
            # genTokens (below) takes the input stream as its only argument.
            self.tokens = genTokens(instream)
            try:
                self.tokenType, self.token, self.lineNo = self.tokens.next()
            except StopIteration:
                raise RuntimeError, 'Empty file'
            result = self.prog_reco()
            return result

        def prog_reco(self):
            commandList = []
            while 1:
                result = self.command_reco()
                if not result:
                    break
                commandList.append(result)
            return ASTNode(ProgNodeType, commandList)

        def command_reco(self):
            if self.tokenType == EOFTokType:
                return None
            result = self.func_call_reco()
            return ASTNode(CommandNodeType, result)

        def func_call_reco(self):
            if self.tokenType == WordTokType:
                term = ASTNode(TermNodeType, self.token)
                self.tokenType, self.token, self.lineNo = self.tokens.next()
                if self.tokenType == LParTokType:
                    self.tokenType, self.token, self.lineNo = self.tokens.next()
                    result = self.func_call_list_reco()
                    if result:
                        if self.tokenType == RParTokType:
                            self.tokenType, self.token, self.lineNo = \
                                self.tokens.next()
                            return ASTNode(FuncCallNodeType, term, result)
                        else:
                            raise ParseError(self.lineNo, 'missing right paren')
                    else:
                        raise ParseError(self.lineNo, 'bad func call list')
                else:
                    raise ParseError(self.lineNo, 'missing left paren')
            else:
                return None

        def func_call_list_reco(self):
            terms = []
            while 1:
                result = self.func_call_reco()
                if not result:
                    break
                terms.append(result)
                if self.tokenType != CommaTokType:
                    break
                self.tokenType, self.token, self.lineNo = self.tokens.next()
            return ASTNode(FuncCallListNodeType, terms)

    #
    # The parse error exception class.
    #
    class ParseError(Exception):
        def __init__(self, lineNo, msg):
            # Initialize the actual base class (Exception, not RuntimeError).
            Exception.__init__(self, msg)
            self.lineNo = lineNo
            self.msg = msg
        def getLineNo(self):
            return self.lineNo
        def getMsg(self):
            return self.msg

    def is_word(token):
        for letter in token:
            if letter not in string.ascii_letters:
                return None
        return 1

    #
    # Generate the tokens.
    # Usage:
    #    gen = genTokens(infile)
    #    tokType, tok, lineNo = gen.next()
    #    ...
    def genTokens(infile):
        lineNo = 0
        while 1:
            lineNo += 1
            try:
                line = infile.next()
            except StopIteration:
                # End of input: report EOF, then stop generating.
                yield (EOFTokType, None, lineNo)
                return
            toks = line.split()
            for tok in toks:
                if is_word(tok):
                    tokType = WordTokType
                elif tok == '(':
                    tokType = LParTokType
                elif tok == ')':
                    tokType = RParTokType
                elif tok == ',':
                    tokType = CommaTokType
                else:
                    # Unrecognized token.
                    tokType = NoneTokType
                yield (tokType, tok, lineNo)

    def test(infileName):
        parser = ProgParser()
        #ipshell('(test) #1\nCtrl-D to exit')
        result = None
        try:
            result = parser.parseFile(infileName)
        except ParseError, exp:
            sys.stderr.write('ParseError: (%d) %s\n' % \
                (exp.getLineNo(), exp.getMsg()))
        if result:
            result.show(0)

    def usage():
        print __doc__
        sys.exit(1)

    def main():
        args = sys.argv[1:]
        try:
            opts, args = getopt.getopt(args, 'h', ['help'])
        except:
            usage()
        relink = 1
        for opt, val in opts:
            if opt in ('-h', '--help'):
                usage()
        if len(args) != 1:
            usage()
        inputfile = args[0]
        test(inputfile)

    if __name__ == '__main__':
        #import pdb; pdb.set_trace()
        main()

Comments and explanation:

The tokenizer is a Python generator function. Calling it returns a generator that produces "(tokType, tok, lineNo)" tuples. Our tokenizer is so simple-minded that we have to separate all of our tokens with whitespace. (A little later, we'll see how to use Plex to overcome this limitation.)

The parser class (ProgParser) contains the recognizer methods that implement the production rules. Each of these methods recognizes a syntactic construct defined by a rule. In our example, these methods have names that end with "_reco".

We could have, alternatively, implemented our recognizers as global functions, instead of as methods in a class. However, using a class gives us a place to "hang" the variables that are needed across methods and saves us from having to use ("evil") global variables.

A recognizer method recognizes terminals (syntactic elements on the right-hand side of the grammar rule for which there is no grammar rule) by (1) checking the token type and the token value, and then (2) calling the tokenizer to get the next token (because it has consumed a token).

A recognizer method checks for and processes a non-terminal (syntactic elements on the right-hand side for which there is a grammar rule) by calling the recognizer method that implements that non-terminal.

If a recognizer method finds a syntax error, it raises an exception of class ParseError.

Since our example recursive descent parser creates an AST (an abstract syntax tree), whenever a recognizer method successfully recognizes a syntactic construct, it creates an instance of class ASTNode to represent it and returns that instance to its caller.
The instance of ASTNode has a node type and contains child nodes which were constructed by recognizer methods called by this one (i.e. that represent non-terminals on the right-hand side of a grammar rule).

Each time a recognizer method "consumes a token", it calls the tokenizer to get the next token (and token type and line number).

The tokenizer returns a token type in addition to the token value. It also returns a line number for error reporting.

The syntax tree is constructed from instances of class ASTNode.

The ASTNode class has a show method, which walks the AST and produces output. You can imagine that a similar method could do code generation. And, you should consider the possibility of writing analogous tree walk methods that perform tasks such as optimization, annotation of the AST, etc.

And, here is a sample of the data we can apply this parser to:

    aaa ( )
    bbb ( ccc ( ) )
    ddd ( eee ( ) , fff ( ggg ( ) , hhh ( ) , iii ( ) ) )

And, if we run the parser on this input data, we see:

    $ python workbook045.py workbook045.data
    Node -- Type ProgNodeType
      Node -- Type CommandNodeType
        Node -- Type FuncCallNodeType
          Node -- Type TermNodeType
            Child: aaa
          Node -- Type FuncCallListNodeType
      Node -- Type CommandNodeType
        Node -- Type FuncCallNodeType
          Node -- Type TermNodeType
            Child: bbb
          Node -- Type FuncCallListNodeType
            Node -- Type FuncCallNodeType
              Node -- Type TermNodeType
                Child: ccc
              Node -- Type FuncCallListNodeType
      Node -- Type CommandNodeType
        Node -- Type FuncCallNodeType
          Node -- Type TermNodeType
            Child: ddd
          Node -- Type FuncCallListNodeType
            Node -- Type FuncCallNodeType
              Node -- Type TermNodeType
                Child: eee
              Node -- Type FuncCallListNodeType
            Node -- Type FuncCallNodeType
              Node -- Type TermNodeType
                Child: fff
              Node -- Type FuncCallListNodeType
                Node -- Type FuncCallNodeType
                  Node -- Type TermNodeType
                    Child: ggg
                  Node -- Type FuncCallListNodeType
                Node -- Type FuncCallNodeType
                  Node -- Type TermNodeType
                    Child: hhh
                  Node -- Type FuncCallListNodeType
                Node -- Type FuncCallNodeType
                  Node -- Type TermNodeType
                    Child: iii
                  Node -- Type FuncCallListNodeType
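The recognizer pattern described above (check the current token, consume it, and call another recognizer for each non-terminal) can be condensed into a short sketch. This is an illustration written for Python 3, not the example program itself. The names MiniParser, peek, advance and expect are hypothetical, nested tuples stand in for ASTNode instances, and tokens must still be separated by whitespace:

```python
class ParseError(Exception):
    """Raised on the first syntax error."""


class MiniParser:
    """Recognizers for: func_call ::= word '(' [func_call {',' func_call}] ')'"""

    def __init__(self, text):
        self.tokens = text.split()      # whitespace-separated tokens
        self.pos = 0

    def peek(self):
        """Return the current token, or None at end of input."""
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def advance(self):
        """Consume and return the current token."""
        tok = self.peek()
        self.pos += 1
        return tok

    def expect(self, tok):
        """Recognize a terminal: check it, then consume it."""
        if self.peek() != tok:
            raise ParseError('expected %r, found %r' % (tok, self.peek()))
        self.advance()

    def prog(self):
        commands = []
        while self.peek() is not None:
            commands.append(self.func_call())
        return ('prog', commands)

    def func_call(self):
        name = self.advance()
        if name is None or not name.isalpha():
            raise ParseError('expected a word, found %r' % (name,))
        self.expect('(')
        args = []
        if self.peek() != ')':
            # Recognize a non-terminal by calling its recognizer.
            args.append(self.func_call())
            while self.peek() == ',':
                self.advance()
                args.append(self.func_call())
        self.expect(')')
        return ('call', name, args)


if __name__ == '__main__':
    print(MiniParser('aaa ( ) bbb ( ccc ( ) )').prog())
```

Returning plain tuples keeps the sketch short. A class such as ASTNode becomes worthwhile as soon as the tree needs methods of its own, for example the show method or a code generator.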
            token = scanner.read()
            if token[0] is None:
                break
            position = scanner.position()
            posstr = ('(%d, %d)' % (position[1], position[2], )).ljust(10)
            tokstr = '"%s"' % token[1]
            tokstr = tokstr.ljust(20)
            print '%s tok: %s tokType: %s' % (posstr, tokstr, token[0],)
        print 'line_count: %d' % scanner.line_count

    def usage():
        print __doc__
        sys.exit(1)

    def main():
        args = sys.argv[1:]
        if len(args) != 1:
            usage()
        infileName = args[0]
        test(infileName)

    if __name__ == '__main__':
        #import pdb; pdb.set_trace()
        main()

Here is a bit of data on which we can use the above lexer:

    mass = (height * (* some comment *) width * depth) / density
    totalmass = totalmass + mass

And, when we apply the above test program to this data, here is what we see:

    $ python plex_example.py plex_example.data
    (1, 0)     tok: "mass"               tokType: ident
    (1, 5)     tok: "="                  tokType: operator
    (1, 7)     tok: "("                  tokType: lpar
    (1, 8)     tok: "height"             tokType: ident
    (1, 15)    tok: "*"                  tokType: operator
    (1, 36)    tok: "width"              tokType: ident
    (1, 42)    tok: "*"                  tokType: operator
    (1, 44)    tok: "depth"              tokType: ident
    (1, 49)    tok: ")"                  tokType: rpar
    (1, 51)    tok: "/"                  tokType: operator
    (1, 53)    tok: "density"            tokType: ident
    -----------------------------------------------------------
    (2, 0)     tok: "totalmass"          tokType: ident
    (2, 10)    tok: "="                  tokType: operator
    (2, 12)    tok: "totalmass"          tokType: ident
    (2, 22)    tok: "+"                  tokType: operator
    (2, 24)    tok: "mass"               tokType: ident
    -----------------------------------------------------------
    line_count: 2

Comments and explanation:

Create a lexicon from scanning patterns. See the Plex tutorial and reference (and below) for more information on how to construct the patterns that match various tokens.

Create a scanner with a lexicon, an input file, and an input file name.

The call "scanner.read()" gets the next token. It returns a tuple containing (1) the token type, that is, the value associated with the matching pattern in the lexicon, and (2) the text of the token. In the code above these are token[0] and token[1].

The call "scanner.position()" gets the position of the current token.
It returns a tuple containing (1) the input file name, (2) the line number, and (3) the column number. We can execute a method when a given token is found by specifying the function as the token action. In our example, the function is count_lines. Maintaining a line count is actually unneeded, since the position gives us this information. However, notice how we are able to maintain a value (in our case line_count) as an attribute of the scanner. And, here are some comments on constructing the patterns used in a lexicon:
Plex.Range constructs a pattern that matches any character in the range.

Plex.Rep constructs a pattern that matches a sequence of zero or more items.

Plex.Rep1 constructs a pattern that matches a sequence of one or more items.

pat1 + pat2 constructs a pattern that matches a sequence containing pat1 followed by pat2.

pat1 | pat2 constructs a pattern that matches either pat1 or pat2.

Plex.Any constructs a pattern that matches any one character in its argument.
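For readers who know regular expressions better than Plex, the pattern constructors above have rough counterparts in the standard re module. The sketch below is written for Python 3 and uses only re. The correspondence noted in the comments is an informal analogy, not a statement about how Plex is implemented, and the helper matches is a hypothetical name:

```python
import re

letter = r"[A-Za-z]"     # roughly Plex.Range("AZaz")
digit = r"[0-9]"         # roughly Plex.Range("09")
space = r"[ \t\n]"       # roughly Plex.Any(" \t\n")

# letter + Plex.Rep(letter | digit): a letter, then zero or more
# letters or digits.
name = letter + r"(?:" + letter + r"|" + digit + r")*"

# Plex.Rep1(digit): one or more digits.
number = digit + r"+"


def matches(pattern, text):
    """Return True if the whole of text matches pattern."""
    return re.fullmatch(pattern, text) is not None


if __name__ == '__main__':
    for sample in ("abc123", "1abc", "42", ""):
        print(repr(sample), matches(name, sample), matches(number, sample))
```

The difference in style is that Plex builds patterns from Python objects combined with + and |, while re builds them by concatenating strings.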
Now let's revisit our recursive descent parser, this time with a tokenizer built with Plex. The tokenizer is trivial, but will serve as an example of how to hook it into a parser:
    #!/usr/bin/env python

    """
    A recursive descent parser example using Plex.
    This example uses Plex to implement a tokenizer.

    Usage:
        python python_201_rparser_plex.py [options] <inputfile>

    Options:
        -h, --help      Display this help message.

    Example:
        python python_201_rparser_plex.py myfile.txt

    The grammar:
        Prog ::= Command | Command Prog
        Command ::= Func_call
        Func_call ::= Term '(' Func_call_list ')'
        Func_call_list ::= Func_call | Func_call ',' Func_call_list
        Term = <word>
    """

    import sys, string, types
    import getopt
    import Plex

    ## from IPython.Shell import IPShellEmbed
    ## ipshell = IPShellEmbed((),
    ##     banner = '>>>>>>>> Into IPython >>>>>>>>',
    ##     exit_msg = '<<<<<<<< Out of IPython <<<<<<<<')

    #
    # Constants
    #

    # AST node types
    NoneNodeType = 0
    ProgNodeType = 1
    CommandNodeType = 2
    FuncCallNodeType = 3
    FuncCallListNodeType = 4
    TermNodeType = 5

    # Token types
    NoneTokType = 0
    LParTokType = 1
    RParTokType = 2
    WordTokType = 3
    CommaTokType = 4
    EOFTokType = 5
# Dictionary to map node type values to node type names NodeTypeDict = { NoneNodeType: 'NoneNodeType', ProgNodeType: 'ProgNodeType', CommandNodeType: 'CommandNodeType', FuncCallNodeType: 'FuncCallNodeType', FuncCallListNodeType: 'FuncCallListNodeType', TermNodeType: 'TermNodeType', } # # Representation of a node in the AST (abstract syntax tree). # class ASTNode: def __init__(self, nodeType, *args): self.nodeType = nodeType self.children = [] for item in args: self.children.append(item) def show(self, level): self.showLevel(level) print 'Node -- Type %s' % NodeTypeDict[self.nodeType] level += 1 for child in self.children: if isinstance(child, ASTNode): child.show(level) elif type(child) == types.ListType: for item in child:
                        item.show(level)
                else:
                    self.showLevel(level)
                    print 'Child:', child
        def showLevel(self, level):
            for idx in range(level):
                print ' ',

    #
    # The recursive descent parser class.
    #   Contains the "recognizer" methods, which implement the grammar
    #   rules (above), one recognizer method for each production rule.
    #
    class ProgParser:
        def __init__(self):
            self.tokens = None
            self.tokenType = NoneTokType
            self.token = ''
            self.lineNo = -1
            self.infile = None

        def parseFile(self, infileName):
            self.tokens = None
            self.tokenType = NoneTokType
            self.token = ''
            self.lineNo = -1
            self.infile = file(infileName, 'r')
            self.tokens = genTokens(self.infile, infileName)
            try:
                self.tokenType, self.token, self.lineNo = self.tokens.next()
            except StopIteration:
                raise RuntimeError, 'Empty file'
            result = self.prog_reco()
            self.infile.close()
            self.infile = None
            return result

        def parseStream(self, instream):
            self.tokens = None
            self.tokenType = NoneTokType
            self.token = ''
            self.lineNo = -1
            # Use the stream passed in by the caller. The caller owns the
            # stream, so we do not close it here.
            self.tokens = genTokens(instream, '<stream>')
            try:
                self.tokenType, self.token, self.lineNo = self.tokens.next()
            except StopIteration:
                raise RuntimeError, 'Empty stream'
            result = self.prog_reco()
            return result

        def prog_reco(self):
            commandList = []
            while 1:
                result = self.command_reco()
                if not result:
                    break
                commandList.append(result)
            return ASTNode(ProgNodeType, commandList)

        def command_reco(self):
            if self.tokenType == EOFTokType:
                return None
            result = self.func_call_reco()
            return ASTNode(CommandNodeType, result)

        def func_call_reco(self):
            if self.tokenType == WordTokType:
                term = ASTNode(TermNodeType, self.token)
                self.tokenType, self.token, self.lineNo = self.tokens.next()
                if self.tokenType == LParTokType:
                    self.tokenType, self.token, self.lineNo = self.tokens.next()
                    result = self.func_call_list_reco()
                    if result:
                        if self.tokenType == RParTokType:
                            self.tokenType, self.token, self.lineNo = \
                                self.tokens.next()
                            return ASTNode(FuncCallNodeType, term, result)
                        else:
                            raise ParseError(self.lineNo, 'missing right paren')
                    else:
                        raise ParseError(self.lineNo, 'bad func call list')
                else:
                    raise ParseError(self.lineNo, 'missing left paren')
            else:
                return None

        def func_call_list_reco(self):
            terms = []
            while 1:
                result = self.func_call_reco()
                if not result:
                    break
                terms.append(result)
                if self.tokenType != CommaTokType:
                    break
                self.tokenType, self.token, self.lineNo = self.tokens.next()
            return ASTNode(FuncCallListNodeType, terms)

    #
    # The parse error exception class.
    #
    class ParseError(Exception):
        def __init__(self, lineNo, msg):
            # Initialize the actual base class (Exception, not RuntimeError).
            Exception.__init__(self, msg)
            self.lineNo = lineNo
            self.msg = msg
        def getLineNo(self):
            return self.lineNo
        def getMsg(self):
            return self.msg

    #
    # Generate the tokens.
    # Usage - example
    #    gen = genTokens(infile, infileName)
    #    tokType, tok, lineNo = gen.next()
    #    ...
    def genTokens(infile, infileName):
        letter = Plex.Range("AZaz")
        digit = Plex.Range("09")
        name = letter + Plex.Rep(letter | digit)
        lpar = Plex.Str('(')
        rpar = Plex.Str(')')
        comma = Plex.Str(',')
        comment = Plex.Str("#") + Plex.Rep(Plex.AnyBut("\n"))
        space = Plex.Any(" \t\n")
        lexicon = Plex.Lexicon([
            (name,      'word'),
            (lpar,      'lpar'),
            (rpar,      'rpar'),
            (comma,     'comma'),
            (comment,   Plex.IGNORE),
            (space,     Plex.IGNORE),
        ])
        scanner = Plex.Scanner(lexicon, infile, infileName)
        while 1:
            tokenType, token = scanner.read()
            name, lineNo, columnNo = scanner.position()
            if tokenType == None:
                tokType = EOFTokType
                token = None
            elif tokenType == 'word':
                tokType = WordTokType
            elif tokenType == 'lpar':
                tokType = LParTokType
            elif tokenType == 'rpar':
                tokType = RParTokType
            elif tokenType == 'comma':
                tokType = CommaTokType
            else:
                tokType = NoneTokType
            tok = token
            yield (tokType, tok, lineNo)

    def test(infileName):
        parser = ProgParser()
        #ipshell('(test) #1\nCtrl-D to exit')
result = None try: result = parser.parseFile(infileName) except ParseError, exp: sys.stderr.write('ParseError: (%d) %s\n' % \ (exp.getLineNo(), exp.getMsg())) if result: result.show(0) def usage(): print __doc__ sys.exit(-1) def main(): args = sys.argv[1:] try: opts, args = getopt.getopt(args, 'h', ['help']) except: usage() for opt, val in opts: if opt in ('-h', '--help'): usage() if len(args) != 1: usage() infileName = args[0] test(infileName) if __name__ == '__main__': #import pdb; pdb.set_trace() main() And, here is a sample of the data we can apply this parser to: # Test for recursive descent parser and Plex. # Command #1 aaa() # Command #2 bbb (ccc()) # An end of line comment. # Command #3 ddd(eee(), fff(ggg(), hhh(), iii())) # End of test And, when we run our parser, it produces the following: $ python plex_recusive.py plex_recusive.data Node -- Type ProgNodeType Node -- Type CommandNodeType Node -- Type FuncCallNodeType Node -- Type TermNodeType Child: aaa Node -- Type FuncCallListNodeType Node -- Type CommandNodeType Node -- Type FuncCallNodeType Node -- Type TermNodeType Child: bbb Node -- Type FuncCallListNodeType Node -- Type FuncCallNodeType Node -- Type TermNodeType Child: ccc Node -- Type FuncCallListNodeType Node -- Type CommandNodeType Node -- Type FuncCallNodeType Node -- Type TermNodeType Child: ddd Node -- Type FuncCallListNodeType Node -- Type FuncCallNodeType Node -- Type TermNodeType Child: eee Node -- Type FuncCallListNodeType Node -- Type FuncCallNodeType Node -- Type TermNodeType Child: fff Node -- Type FuncCallListNodeType Node -- Type FuncCallNodeType Node -- Type TermNodeType Child: ggg Node -- Type FuncCallListNodeType Node -- Type FuncCallNodeType
                  Node -- Type TermNodeType
                    Child: hhh
                  Node -- Type FuncCallListNodeType
                Node -- Type FuncCallNodeType
                  Node -- Type TermNodeType
                    Child: iii
                  Node -- Type FuncCallListNodeType

Comments:

We can now put comments in our input, and they will be ignored. Comments begin with a "#" and continue to the end of line. See the definition of comment in function genTokens.

This tokenizer does not require us to separate tokens with whitespace as did the simple tokenizer in the earlier version of our recursive descent parser.

The changes we made over the earlier version were to:

1. Import Plex.
2. Replace the definition of the tokenizer function genTokens.
3. Change the call to genTokens so that the call passes in the file name, which is needed to create the scanner.

Our new version of genTokens does the following:

1. Create patterns for scanning.
2. Create a lexicon (an instance of Plex.Lexicon), which uses the patterns.
3. Create a scanner (an instance of Plex.Scanner), which uses the lexicon.
4. Execute a loop that reads tokens (from the scanner) and "yields" each one.
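The four steps above (patterns, a table that combines them, a scanner, and a loop that yields tokens) can also be sketched without Plex. The version below is an assumption-laden illustration for Python 3 that substitutes the standard re module for Plex. The names TOKEN_RE and gen_tokens are hypothetical, and string token types replace the integer constants used in the example program:

```python
import re

# Steps 1 and 2: the patterns, combined into one table of named groups.
TOKEN_RE = re.compile(r"""
    (?P<word>[A-Za-z][A-Za-z0-9]*)
  | (?P<lpar>\()
  | (?P<rpar>\))
  | (?P<comma>,)
  | (?P<comment>\#[^\n]*)
  | (?P<space>[ \t]+)
""", re.VERBOSE)


def gen_tokens(lines):
    """Yield (token_type, text, line_number) tuples, then one 'eof' tuple."""
    line_no = 0
    for line_no, line in enumerate(lines, 1):
        line = line.rstrip("\n")
        pos = 0
        # Steps 3 and 4: scan along the line, yielding one token at a time.
        while pos < len(line):
            m = TOKEN_RE.match(line, pos)
            if m is None:
                raise ValueError("line %d: unexpected character %r"
                                 % (line_no, line[pos]))
            pos = m.end()
            # Comments and whitespace are matched but not reported.
            if m.lastgroup not in ("comment", "space"):
                yield (m.lastgroup, m.group(), line_no)
    yield ("eof", None, line_no)


if __name__ == '__main__':
    for token in gen_tokens(["bbb (ccc())    # An end of line comment.\n"]):
        print(token)
```

As with the Plex tokenizer, tokens no longer need to be separated by whitespace, and anything from "#" to the end of the line is skipped.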
    # End of test

    """

    import sys
    import types
    import getopt
    import ply.lex as lex
    import ply.yacc as yacc
    #
    # Globals
    #

    startlinepos = 0

    #
    # Constants
    #

    # AST node types
    NoneNodeType = 0
    ProgNodeType = 1
    CommandNodeType = 2
    CommandListNodeType = 3
    FuncCallNodeType = 4
    FuncCallListNodeType = 5
    TermNodeType = 6
# Dictionary to map node type values to node type names NodeTypeDict = { NoneNodeType: 'NoneNodeType', ProgNodeType: 'ProgNodeType', CommandNodeType: 'CommandNodeType', CommandListNodeType: 'CommandListNodeType', FuncCallNodeType: 'FuncCallNodeType', FuncCallListNodeType: 'FuncCallListNodeType', TermNodeType: 'TermNodeType', } # # Representation of a node in the AST (abstract syntax tree). # class ASTNode: def __init__(self, nodeType, *args): self.nodeType = nodeType self.children = [] for item in args: self.children.append(item) def append(self, item): self.children.append(item) def show(self, level): self.showLevel(level) print 'Node -- Type: %s' % NodeTypeDict[self.nodeType] level += 1 for child in self.children: if isinstance(child, ASTNode): child.show(level) elif type(child) == types.ListType: for item in child: item.show(level) else: self.showLevel(level) print 'Value:', child def showLevel(self, level): for idx in range(level): print ' ', # # Exception classes # class LexerError(Exception): def __init__(self, msg, lineno, columnno): self.msg = msg self.lineno = lineno self.columnno = columnno def show(self): sys.stderr.write('Lexer error (%d, %d) %s\n' % \ (self.lineno, self.columnno, self.msg)) class ParserError(Exception):
        def __init__(self, msg, lineno, columnno):
            self.msg = msg
            self.lineno = lineno
            self.columnno = columnno
        def show(self):
            sys.stderr.write('Parser error (%d, %d) %s\n' % \
                (self.lineno, self.columnno, self.msg))

    #
    # Lexer specification
    #
    tokens = (
        'NAME',
        'LPAR','RPAR',
        'COMMA',
        )

    # Tokens
    t_LPAR = r'\('
    t_RPAR = r'\)'
    t_COMMA = r'\,'
    t_NAME = r'[a-zA-Z_][a-zA-Z0-9_]*'
# Ignore whitespace t_ignore = ' \t' # Ignore comments ('#' to end of line) def t_COMMENT(t): r'\#[^\n]*' pass def t_newline(t): r'\n+' global startlinepos startlinepos = t.lexer.lexpos - 1 t.lineno += t.value.count("\n") def t_error(t): global startlinepos msg = "Illegal character '%s'" % (t.value[0]) columnno = t.lexer.lexpos - startlinepos raise LexerError(msg, t.lineno, columnno) # # Parser specification # def p_prog(t): 'prog : command_list' t[0] = ASTNode(ProgNodeType, t[1]) def p_command_list_1(t): 'command_list : command' t[0] = ASTNode(CommandListNodeType, t[1]) def p_command_list_2(t): 'command_list : command_list command' t[1].append(t[2]) t[0] = t[1] def p_command(t): 'command : func_call' t[0] = ASTNode(CommandNodeType, t[1]) def p_func_call_1(t): 'func_call : term LPAR RPAR' t[0] = ASTNode(FuncCallNodeType, t[1]) def p_func_call_2(t): 'func_call : term LPAR func_call_list RPAR' t[0] = ASTNode(FuncCallNodeType, t[1], t[3]) def p_func_call_list_1(t): 'func_call_list : func_call' t[0] = ASTNode(FuncCallListNodeType, t[1]) def p_func_call_list_2(t): 'func_call_list : func_call_list COMMA func_call' t[1].append(t[3]) t[0] = t[1]
def p_term(t): 'term : NAME' t[0] = ASTNode(TermNodeType, t[1]) def p_error(t): global startlinepos msg = "Syntax error at '%s'" % t.value columnno = t.lexer.lexpos - startlinepos raise ParserError(msg, t.lineno, columnno) # # Parse the input and display the AST (abstract syntax tree) # def parse(infileName): startlinepos = 0 # Build the lexer lex.lex(debug=1) # Build the parser yacc.yacc() # Read the input infile = file(infileName, 'r') content = infile.read() infile.close() try: # Do the parse result = yacc.parse(content) # Display the AST result.show(0) except LexerError, exp: exp.show() except ParserError, exp: exp.show() USAGE_TEXT = __doc__ def usage(): print USAGE_TEXT sys.exit(-1) def main(): args = sys.argv[1:] try: opts, args = getopt.getopt(args, 'h', ['help']) except: usage() relink = 1 for opt, val in opts: if opt in ('-h', '--help'): usage() if len(args) != 1: usage() infileName = args[0] parse(infileName) if __name__ == '__main__': #import pdb; pdb.set_trace() main() Applying this parser to the following input: # Test for recursive descent parser and Plex. # Command #1 aaa() # Command #2 bbb (ccc()) # An end of line comment. # Command #3 ddd(eee(), fff(ggg(), hhh(), iii())) # End of test produces the following output: Node -- Type: ProgNodeType Node -- Type: CommandListNodeType Node -- Type: CommandNodeType Node -- Type: FuncCallNodeType Node -- Type: TermNodeType Value: aaa
Node -- Type: CommandNodeType Node -- Type: FuncCallNodeType Node -- Type: TermNodeType Value: bbb Node -- Type: FuncCallListNodeType Node -- Type: FuncCallNodeType Node -- Type: TermNodeType Value: ccc Node -- Type: CommandNodeType Node -- Type: FuncCallNodeType Node -- Type: TermNodeType Value: ddd Node -- Type: FuncCallListNodeType Node -- Type: FuncCallNodeType Node -- Type: TermNodeType Value: eee Node -- Type: FuncCallNodeType Node -- Type: TermNodeType Value: fff Node -- Type: FuncCallListNodeType Node -- Type: FuncCallNodeType Node -- Type: TermNodeType Value: ggg Node -- Type: FuncCallNodeType Node -- Type: TermNodeType Value: hhh Node -- Type: FuncCallNodeType Node -- Type: TermNodeType Value: iii

Comments and explanation:

- Creating the syntax tree -- Basically, each rule (1) recognizes a non-terminal, (2) creates a node (possibly using the values from the right-hand side of the rule), and (3) returns the node by setting the value of t[0]. A deviation from this is the processing of sequences, discussed below.

- Sequences -- p_command_list_1 and p_command_list_2 show how to handle sequences of items. In this case:

  - p_command_list_1 recognizes a command and creates an instance of ASTNode with type CommandListNodeType and adds the command to it as a child, and
  - p_command_list_2 recognizes an additional command and adds it (as a child) to the instance of ASTNode that represents the list.

- Distinguishing between different forms of the same rule -- In order to process alternatives to the same production rule differently, we use different functions with different implementations. For example, we use:

  - p_func_call_1 to recognize and process "func_call : term LPAR RPAR" (a function call without arguments), and
  - p_func_call_2 to recognize and process "func_call : term LPAR func_call_list RPAR" (a function call with arguments).

- Reporting errors -- Our parser reports the first error and quits. We've done this by raising an exception when we find an error.
We implement two exception classes: LexerError and ParserError. Implementing more than one exception class enables us to distinguish between different classes of errors (note the multiple except: clauses on the try: statement in function parse). And, we use an instance of the exception class as a container in order to "bubble up" information about the error (e.g. a message, a line number, and a column number).
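The error-reporting scheme described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the text shows that LexerError and ParserError are raised with a message, a line number, and a column number and that each has a show method, but the class bodies, the shared base class ParseProblem, and the helper report are hypothetical names introduced here.

```python
class ParseProblem(Exception):
    """Hypothetical base class: a container for the details of one error."""

    def __init__(self, msg, lineno, columnno):
        Exception.__init__(self, msg)
        self.msg = msg
        self.lineno = lineno
        self.columnno = columnno

    def describe(self):
        return '%s: %s (line %d, column %d)' % (
            self.__class__.__name__, self.msg, self.lineno, self.columnno)

    def show(self):
        print(self.describe())


class LexerError(ParseProblem):
    pass


class ParserError(ParseProblem):
    pass


def report(func):
    """Call func; return a description of the first error, or 'ok'."""
    try:
        func()
    except LexerError as exp:       # errors found while scanning tokens
        return 'lexing failed -- ' + exp.describe()
    except ParserError as exp:      # errors found while applying the grammar
        return 'parsing failed -- ' + exp.describe()
    return 'ok'


def bad_token():
    # Stand-in for a lexer that hits an illegal character.
    raise LexerError("Illegal character '$'", 3, 7)


def bad_syntax():
    # Stand-in for a parser that hits an unexpected token.
    raise ParserError("Syntax error at ')'", 5, 2)
```

Because each kind of error has its own class, the caller can react differently to each one, and the exception instance carries the details up to whichever except: clause catches it.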
    fieldDef = Word(alphanums)
    lineDef = fieldDef + ZeroOrMore("," + fieldDef)

    def test():
        args = sys.argv[1:]
        if len(args) != 1:
            print 'usage: python pyparsing_test1.py <datafile.txt>'
            sys.exit(-1)
        infilename = sys.argv[1]
        infile = file(infilename, 'r')
        for line in infile:
            fields = lineDef.parseString(line)
            print fields

    test()

Here is some sample data:

    abcd,defg
    11111,22222,33333

And, when we run our parser on this data file, here is what we see:

    $ python comma_parser.py sample1.data
    ['abcd', ',', 'defg']
    ['11111', ',', '22222', ',', '33333']

Notes and explanation:

- Note how the grammar is constructed from normal Python calls to function and object/class constructors. I've constructed the parser in-line because my example is simple, but constructing the parser in a function or even a module might make sense for more complex grammars. pyparsing makes it easy to use these different styles.

- Use "+" to specify a sequence. In our example, a lineDef is a fieldDef followed by ....

- Use ZeroOrMore to specify repetition. In our example, a lineDef is a fieldDef followed by zero or more occurrences of comma and fieldDef. There is also OneOrMore when you want to require at least one occurrence.

- Parsing comma delimited text happens so frequently that pyparsing provides a shortcut. Replace:

      lineDef = fieldDef + ZeroOrMore("," + fieldDef)

  with:

      lineDef = delimitedList(fieldDef)

  And note that delimitedList takes an optional argument delim used to specify the delimiter. The default is a comma.
in the second string. So, our definition of identifier matches a word whose first character is an alpha and whose remaining characters are alpha-numerics or underscore. As another example, you can think of Word("0123456789") as analogous to a regexp containing the pattern "[0-9]+". Use a vertical bar for alternation. In our example, an arg can be either an identifier or an integer.
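The regexp analogy can be tried out with the standard re module. The sketch below does not use pyparsing at all; the patterns are only assumed regular-expression counterparts of the identifier and integer definitions described in the text, and is_arg is a hypothetical helper name.

```python
import re

# Assumed regular-expression counterparts of the definitions in the text.
IDENTIFIER = r'[A-Za-z][A-Za-z0-9_]*'   # an alpha, then alphanumerics or "_"
INTEGER = r'[0-9]+'                     # analogous to Word("0123456789")

# The vertical bar is alternation here too: an arg is either form.
ARG = re.compile(r'(?:%s|%s)\Z' % (IDENTIFIER, INTEGER))


def is_arg(text):
    """Return True if text is, in its entirety, an identifier or an integer."""
    return ARG.match(text) is not None
```

With these patterns, is_arg('abc_1') and is_arg('42') are true, while is_arg('_abc') and is_arg('4abc') are false, since neither alternative matches the whole string.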
Here is output from parsing the above input:

    [['Jabberer', 'Jerry'], '111-222-3333', [['Bakersfield'], 'CA', '95111']]
    [['Kackler', 'Kerry'], '111-222-3334', [['Fresno'], 'CA', '95112']]
    [['Louderdale', 'Larry'], '111-222-3335', [['Los', 'Angeles'], 'CA', '94001']]

Comments:

- We use the len=n argument to the Word constructor to restrict the parser to accepting a specific number of characters, for example in the zip code and phone number. Word also accepts min=n and max=n to enable you to restrict the length of a word to within a range.

- We use Group to group the parsed results into sub-lists, for example in the definition of city and name. Group enables us to organize the parse results into simple parse trees.

- We use Combine to join parsed results back into a single string. For example, in the phone number, we can require dashes and yet join the results back into a single string.

- We use Suppress to remove unneeded sub-elements from parsed results. For example, we do not need the comma between last and first name.
delimitedList import pprint testData = """ +-------+------+------+------+------+------+------+------+------+ | | A1 | B1 | C1 | D1 | A2 | B2 | C2 | D2 | +=======+======+======+======+======+======+======+======+======+ | min | 7 | 43 | 7 | 15 | 82 | 98 | 1 | 37 | | max | 11 | 52 | 10 | 17 | 85 | 112 | 4 | 39 | | ave | 9 | 47 | 8 | 16 | 84 | 106 | 3 | 38 | | sdev | 1 | 3 | 1 | 1 | 1 | 3 | 1 | 1 | +-------+------+------+------+------+------+------+------+------+ """ # Define grammar for datatable heading = (Literal( "+-------+------+------+------+------+------+------+------+------+") + "| | A1 | B1 | C1 | D1 | A2 | B2 | C2 | D2 |" + "+=======+======+======+======+======+======+======+======+======+").suppress() vert = Literal("|").suppress() number = Word(nums) rowData = Group( vert + Word(alphas) + vert + delimitedList(number,"|") + vert ) trailing = Literal( "+-------+------+------+------+------+------+------+------+------+").suppress() datatable = heading + Dict( ZeroOrMore(rowData) ) + trailing def main(): # Now parse data and print results data = datatable.parseString(testData) print "data:", data print "data.asList():", pprint.pprint(data.asList()) print "data keys:", data.keys() print "data['min']:", data['min'] print "data.max:", data.max if __name__ == '__main__': main() When we run this, it produces the following: data: [['min', '7', '43', '7', '15', '82', '98', '1', '37'], ['max', '11', '52', '10', '17', '85', '112', '4', '39'], ['ave', '9', '47', '8', '16', '84', '106', '3', '38'], ['sdev', '1', '3', '1', '1', '1', '3', '1', '1']] data.asList():[['min', '7', '43', '7', '15', '82', '98', '1', '37'], ['max', '11', '52', '10', '17', '85', '112', '4', '39'], ['ave', '9', '47', '8', '16', '84', '106', '3', '38'], ['sdev', '1', '3', '1', '1', '1', '3', '1', '1']] data keys: ['ave', 'min', 'sdev', 'max'] data['min']: ['7', '43', '7', '15', '82', '98', '1', '37'] data.max: ['11', '52', '10', '17', '85', '112', '4', '39'] Notes: Note the use of Dict to create a 
dictionary. The print statements show how to get at the items in the dictionary. Note how we can also get the parse results as a list by using method asList. Again, we use suppress to remove unneeded items from the parse results.
2.7.2 PyGtk
Information about PyGTK is here: The PyGTK home page -- http://www.pygtk.org//.
Here is a sample that displays a message box: #!/usr/bin/env python import sys import getopt import gtk class MessageBox(gtk.Dialog): def __init__(self, message="", buttons=(), pixmap=None, modal= True): gtk.Dialog.__init__(self) self.connect("destroy", self.quit) self.connect("delete_event", self.quit) if modal: self.set_modal(True) hbox = gtk.HBox(spacing=5) hbox.set_border_width(5) self.vbox.pack_start(hbox) hbox.show() if pixmap: self.realize() pixmap = Pixmap(self, pixmap) hbox.pack_start(pixmap, expand=False) pixmap.show() label = gtk.Label(message) hbox.pack_start(label) label.show() for text in buttons: b = gtk.Button(text) b.set_flags(gtk.CAN_DEFAULT) b.set_data("user_data", text) b.connect("clicked", self.click) self.action_area.pack_start(b) b.show() self.ret = None def quit(self, *args): self.hide() self.destroy() gtk.main_quit() def click(self, button): self.ret = button.get_data("user_data") self.quit() # create a message box, and return which button was pressed def message_box(title="Message Box", message="", buttons=(), pixmap=None, modal= True): win = MessageBox(message, buttons, pixmap=pixmap, modal=modal) win.set_title(title) win.show() gtk.main() return win.ret def test(): result = message_box(title='Test #1', message='Here is your message', buttons=('Ok', 'Cancel')) print 'result:', result USAGE_TEXT = """ Usage: python simple_dialog.py [options] Options: -h, --help Display this help message.
    Example:
        python simple_dialog.py
    """

    def usage():
        print USAGE_TEXT
        sys.exit(-1)

    def main():
        args = sys.argv[1:]
        try:
            opts, args = getopt.getopt(args, 'h', ['help'])
        except:
            usage()
        relink = 1
        for opt, val in opts:
            if opt in ('-h', '--help'):
                usage()
        if len(args) != 0:
            usage()
        test()

    if __name__ == '__main__':
        #import pdb; pdb.set_trace()
        main()

Some explanation:

- First, we import gtk.

- Next we define a class MessageBox that implements a message box. Here are a few important things to know about that class:

  - It is a subclass of gtk.Dialog.
  - It creates a label and packs it into the dialog's client area. Note that a Dialog is a Window that contains a vbox at the top of and an action_area at the bottom of its client area. The intention is for us to pack miscellaneous widgets into the vbox and to put buttons such as "Ok", "Cancel", etc. into the action_area.
  - It creates one button for each button label passed to its constructor. The buttons are all connected to the click method.
  - The click method saves the value of the user_data for the button that was clicked. In our example, this value will be either "Ok" or "Cancel".

- And, we define a function (message_box) that (1) creates an instance of the MessageBox class, (2) sets its title, (3) shows it, (4) starts its event loop so that it can get and process events from the user, and (5) returns the result to the caller (in this case "Ok" or "Cancel").

- Our testing function (test) calls function message_box and prints the result.

- This looks like quite a bit of code, until you notice that the class MessageBox and the function message_box could be put in a utility module and reused.
            button = gtk.Button("Cancel")
            button.connect("clicked", self.quit)
            button.set_flags(gtk.CAN_DEFAULT)
            self.action_area.pack_start(button)
            button.show()
            self.ret = None

        def quit(self, w=None, event=None):
            self.hide()
            self.destroy()
            gtk.main_quit()

        def click(self, button):
            self.ret = self.entry.get_text()
            self.quit()

    def input_box(title="Input Box", message="", default_text='',
            modal=True):
        win = EntryDialog(message, default_text, modal=modal)
        win.set_title(title)
        win.show()
        gtk.main()
        return win.ret

    def test():
        result = input_box(title='Test #2',
            message='Enter a valuexxx:',
            default_text='a default value')
        if result is None:
            print 'Canceled'
        else:
            print 'result: "%s"' % result

    USAGE_TEXT = """
    Usage:
        python simple_dialog.py [options]
    Options:
        -h, --help      Display this help message.
    Example:
        python simple_dialog.py
    """

    def usage():
        print USAGE_TEXT
        sys.exit(-1)

    def main():
        args = sys.argv[1:]
        try:
            opts, args = getopt.getopt(args, 'h', ['help'])
        except:
            usage()
        relink = 1
        for opt, val in opts:
            if opt in ('-h', '--help'):
                usage()
        if len(args) != 0:
            usage()
        test()

    if __name__ == '__main__':
        #import pdb; pdb.set_trace()
        main()

Most of the explanation for the message box example is relevant to this example, too. Here are some differences:

- Our EntryDialog class constructor creates an instance of gtk.Entry, sets its default value, and packs it into the client area.

- The constructor also automatically creates two buttons: "OK" and "Cancel". The "OK" button is connected to the click method, which saves the value of the entry field. The "Cancel" button is connected to the quit method, which does not save the value.

- And, if class EntryDialog and function input_box look usable and useful, add them to your utility gui module.
class FileChooser(gtk.FileSelection): def __init__(self, modal=True, multiple=True): gtk.FileSelection.__init__(self) self.multiple = multiple self.connect("destroy", self.quit) self.connect("delete_event", self.quit) if modal: self.set_modal(True) self.cancel_button.connect('clicked', self.quit) self.ok_button.connect('clicked', self.ok_cb) if multiple: self.set_select_multiple(True) self.ret = None def quit(self, *args): self.hide() self.destroy() gtk.main_quit() def ok_cb(self, b): if self.multiple: self.ret = self.get_selections() else: self.ret = self.get_filename() self.quit() def file_sel_box(title="Browse", modal=False, multiple=True): win = FileChooser(modal=modal, multiple=multiple) win.set_title(title) win.show() gtk.main() return win.ret def file_open_box(modal=True): return file_sel_box("Open", modal=modal, multiple=True) def file_save_box(modal=True): return file_sel_box("Save As", modal=modal, multiple=False) def test(): result = file_open_box() print 'open result:', result result = file_save_box() print 'save result:', result USAGE_TEXT = """ Usage: python simple_dialog.py [options] Options: -h, --help Display this help message. Example: python simple_dialog.py """ def usage(): print USAGE_TEXT sys.exit(-1) def main(): args = sys.argv[1:] try: opts, args = getopt.getopt(args, 'h', ['help']) except: usage() relink = 1 for opt, val in opts: if opt in ('-h', '--help'): usage() if len(args) != 0: usage() test() if __name__ == '__main__': main() #import pdb #pdb.run('main()') A little guidance: There is a pre-defined file selection dialog. We sub-class it. This example displays the file selection dialog twice: once with a title "Open" and once with a title "Save As". Note how we can control whether the dialog allows multiple file selections. And, if we select the multiple selection mode, then we use get_selections instead of get_filename in order to get the selected file names.
The dialog contains buttons that enable the user to (1) create a new folder, (2) delete a file, and (3) rename a file. If you do not want the user to perform these operations, then call hide_fileop_buttons. This call is commented out in our sample code.

Note that there are also predefined dialogs for font selection (FontSelectionDialog) and color selection (ColorSelectionDialog).
2.7.3 EasyGUI
If your GUI needs are minimalist (maybe a pop-up dialog or two) and your application is imperative rather than event driven, then you may want to consider EasyGUI. As the name suggests, it is extremely easy to use.

How to know when you might be able to use EasyGUI:

- Your application does not need to run in a window containing menus and a menu bar.
- Your GUI needs amount to little more than displaying a dialog now and then to get responses from the user.
- You do not want to write an event driven application, that is, one in which your code sits and waits for the user to initiate operation, for example, with menu items.

EasyGUI plus documentation and examples are available at EasyGUI home page at SourceForge -- http://easygui.sourceforge.net/

EasyGUI provides functions for a variety of commonly needed dialog boxes, including:

- A message box displays a message.
- A yes/no message box displays "Yes" and "No" buttons.
- A continue/cancel message box displays "Continue" and "Cancel" buttons.
- A choice box displays a selection list.
- An enter box allows entry of a line of text.
- An integer box allows entry of an integer.
- A multiple entry box allows entry into multiple fields.
- Code and text boxes support the display of text in monospaced or proportional fonts.
- File and directory boxes enable the user to select a file or a directory.

See the documentation at the EasyGUI Web site for more features.

For a demonstration of EasyGUI's capabilities, run the easygui.py as a Python script:

    $ python easygui.py
A Python package is a collection of Python modules in a disk directory. In order to be able to import individual modules from a directory, the directory must contain a file named __init__.py. (Note that requirement does not apply to directories that are listed in PYTHONPATH.) The __init__.py serves several purposes: The presence of the file __init__.py in a directory marks the directory as a Python package, which enables importing modules from the directory. The first time an application imports any module from the directory/package, the code in the module __init__ is evaluated. If the package itself is imported (as opposed to an individual module within the directory/package), then it is the __init__ that is imported (and evaluated).
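The role of __init__.py described above can be observed with a small experiment. The following is a minimal sketch, written for Python 3 (the text's own examples are Python 2); the package name demo_pkg_for_init, the module greet, and the variable initialized are hypothetical names made up for the demonstration, and the package is built in a temporary directory that is removed afterwards.

```python
import importlib
import os
import shutil
import sys
import tempfile

PACKAGE = 'demo_pkg_for_init'      # hypothetical package name

root = tempfile.mkdtemp()
pkg_dir = os.path.join(root, PACKAGE)
os.mkdir(pkg_dir)

# The presence of __init__.py marks the directory as a package.
with open(os.path.join(pkg_dir, '__init__.py'), 'w') as f:
    f.write("initialized = True\n")

# An ordinary module inside the package.
with open(os.path.join(pkg_dir, 'greet.py'), 'w') as f:
    f.write("def hello():\n    return 'hello from greet'\n")

sys.path.insert(0, root)           # make the parent directory importable
importlib.invalidate_caches()
try:
    # Importing a module from the package evaluates __init__ first.
    greet = importlib.import_module(PACKAGE + '.greet')
    package = sys.modules[PACKAGE]
    message = greet.hello()
    init_ran = package.initialized
finally:
    sys.path.remove(root)
    shutil.rmtree(root)
```

After the import, the package object found in sys.modules carries the variable that was set in __init__.py, which shows that the code in __init__ was evaluated even though only the module greet was requested.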
    long_description = 'Tests for installing and distributing Python packages'

    setup(name = 'testpackages',                            # [2]
        version = '1.0a',
        description = 'Tests for Python packages',
        maintainer = 'Dave Kuhlman',
        maintainer_email = 'dkuhlman@rexx.com',
        url = 'http://www.rexx.com/~dkuhlman',
        long_description = long_description,
        packages = ['testpackages']                         # [3]
        )

Explanation:

1. We import the necessary component from Distutils.
2. We describe the package and its developer/maintainer.
3. We specify the directory that is to be installed as a package. When the user installs our distribution, this directory and all the modules in it will be installed as a package.

Now, to create a distribution file, we run the following:

    python setup.py sdist --formats=gztar

which will create a file testpackages-1.0a.tar.gz under the directory dist.

Then, you can give this distribution file to a potential user, who can install it by doing the following:

    $ tar xvzf testpackages-1.0a.tar.gz
    $ cd testpackages-1.0a
    $ python setup.py build
    $ python setup.py install    # as root
Case is significant.

Exercises:

1. Which of the following are valid names?

   1. total
   2. total_of_all_vegetables
   3. big-title-1
   4. _inner_func
   5. 1bigtitle
   6. bigtitle1

2. Which of the following pairs are the same name:

   1. the_last_item and the_last_item
   2. the_last_item and The_Last_Item
   3. itemi and itemj
   4. item1 and iteml

Solutions:

1. Items 1, 2, 4, and 6 are valid. Item 3 is not a single name, but is three items separated by the minus operator. Item 5 is not valid because it begins with a digit.

2. Python names are case-sensitive, which means:

   1. the_last_item and the_last_item are the same.
   2. the_last_item and The_Last_Item are different -- The second name has upper-case characters.
   3. itemi and itemj are different.
   4. item1 and iteml are different -- This one may be difficult to see, depending on the font you are viewing. One name ends with the digit one; the other ends with the alpha character "el". And this example provides a good reason to use "1" and "l" judiciously in names.

The following are keywords in Python and should not be used as variable names:

    and       del       from      not       while
    as        elif      global    or        with
    assert    else      if        pass      yield
    break     except    import    print
    class     exec      in        raise
    continue  finally   is        return
    def       for       lambda    try

Exercises:

1. Which of the following are valid names in Python?

   1. _global
   2. global
   3. file

Solutions:

1. Do not use keywords for variable names:

   1. Valid
   2. Not a valid name. "global" is a keyword.
   3. Valid, however, "file" is the name of a built-in type, as you will learn later, so you are advised not to redefine it. Here are a few of the names of built-in types: "file", "int", "str", "float", "list", "dict", etc. See Built-in Types -- http://docs.python.org/lib/types.html for more built-in types.

The following are operators in Python and will separate names:

    +       <<      <       and     Also:
    -       >>      >       or      ()
    *       &       <=      is      []
    **      |       >=      not     . (dot)
    /       ^       ==      in
    //      ~       !=
    %               <>
But, note that the Python style guide suggests that you place blanks around binary operators. One exception to this rule is function arguments and parameters for functions: it is suggested that you not put blanks around the equal sign (=) used to specify keyword arguments and default parameters. Exercises: 1. Which of the following are single names and which are names separated by operators? 1. fruit_collection 2. fruit-collection Solutions:
1. Do not use a dash, or other operator, in the middle of a name: 1. fruit_collection is a single name 2. fruit-collection is two names separated by a dash.
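The rules for names covered in these exercises can be checked mechanically. Here is a minimal sketch that uses the standard keyword and re modules; classify is a hypothetical helper, and the pattern is an assumption that covers ASCII names only (a letter or underscore followed by letters, digits, or underscores).

```python
import keyword
import re

# A letter or underscore, then any number of letters, digits, or underscores.
NAME_PATTERN = re.compile(r'[A-Za-z_][A-Za-z0-9_]*\Z')


def classify(name):
    """Return 'keyword', 'valid', or 'invalid' for a candidate name."""
    if keyword.iskeyword(name):
        return 'keyword'
    if NAME_PATTERN.match(name):
        return 'valid'
    return 'invalid'


candidates = ['total', 'total_of_all_vegetables', 'big-title-1',
              '_inner_func', '1bigtitle', 'bigtitle1', 'global',
              'fruit_collection', 'fruit-collection']
results = dict((name, classify(name)) for name in candidates)
```

The dictionary results maps each candidate to its classification: the names containing a dash and the name starting with a digit come out as 'invalid', and global comes out as 'keyword'.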
def show_sum(x, y):

Solutions:

1. Indentation indicates that one statement is nested inside another statement:

       if x > 0:
           print x

2. Indentation indicates that a block of statements is nested inside another statement:

       def show_sum(x, y):
           z = x + y
           print z
3.4.1 Numbers
The numbers you will use most commonly are likely to be integers and floats. Python also has long integers and complex numbers.

A few facts about numbers (in Python):

- Python will convert to using a long integer automatically when needed. You do not need to worry about exceeding the size of a (standard) integer.

- The size of the largest integer in your version of Python is in sys.maxint. To learn what it is, do:

      >>> import sys
      >>> print sys.maxint
      9223372036854775807

  The above shows the maximum size of an integer on a 64-bit version of Python.

- You can convert from integer to float by using the float constructor. Example:

      >>> x = 25
      >>> y = float(x)
      >>> print y
      25.0

- Python does "mixed arithmetic". You can add, multiply, and divide integers and floats. When you do, Python "promotes" the result to a float.
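These facts about numbers are easy to confirm. The short sketch below uses made-up variable names and avoids sys.maxint so that it does not depend on a particular Python version.

```python
big = 2 ** 100              # far larger than a machine-sized integer
y = float(25)               # convert an integer to a float
mixed_sum = 3 + 4.5         # integer + float
mixed_product = 2 * 1.5     # integer * float

sum_type = type(mixed_sum).__name__
product_type = type(mixed_product).__name__
```

The value of big is computed exactly without any special effort, and both mixed results are floats.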
2. 0.0, 0., or .0
3. 101
4. 1000.0
5. 1e3 or 1.0e3
6. Assigning integer values to variables:

       In [7]: value1 = 23
       In [8]: value2 = -14
       In [9]: value3 = 0
       In [10]: value1
       Out[10]: 23
       In [11]: value2
       Out[11]: -14
       In [12]: value3
       Out[12]: 0

7. Assigning expression values to variables:

       value1 = 4 * (3 + 5)
       value2 = (value1 / 3.0) - 2

8. Assigning floats to variables:

       value1 = 0.01
       value2 = -3.0
       value3 = 3e-4

9. Assigning expressions containing variables:

       value4 = value1 * (value2 - value3)
       value4 = value1 + value2 + value3 - value4

10. Mixed arithmetic:

        x = 5
        y = 8
        z = float(x) / y

You can also construct integers and floats using the class. Calling a class (using parentheses after a class name, for example) produces an instance of the class.

Exercises:

1. Construct an integer from the string "123".
2. Construct a float from the integer 123.
3. Construct an integer from the float 12.345.

Solutions:

1. Use the int data type to construct an integer instance from a string:

       int("123")

2. Use the float data type to construct a float instance from an integer:

       float(123)

3. Use the int data type to construct an integer instance from a float:

       int(12.345)    # --> 12
    <<    >>    &     |     ^     ~
    <     >     <=    >=    ==    !=    <>
Look here for an explanation of these operators when applied to numbers: Numeric Types -- int, float, long, complex -- http://docs.python.org/lib/typesnumeric.html.

Some operators take precedence over others. The table in the Web page just referenced above also shows that order of priority. Here is a bit of that table:

    All numeric types (except complex) support the following
    operations, sorted by ascending priority (operations in the same
    box have the same priority; all numeric operations have a higher
    priority than comparison operations):

    Operation         Result
    ---------         ------
    x + y             sum of x and y
    x - y             difference of x and y
    x * y             product of x and y
    x / y             quotient of x and y
    x // y            (floored) quotient of x and y
    x % y             remainder of x / y
    -x                x negated
    +x                x unchanged
    abs(x)            absolute value or magnitude of x
    int(x)            x converted to integer
    long(x)           x converted to long integer
    float(x)          x converted to floating point
    complex(re,im)    a complex number with real part re, imaginary
                      part im. im defaults to zero.
    c.conjugate()     conjugate of the complex number c
    divmod(x, y)      the pair (x // y, x % y)
    pow(x, y)         x to the power y
    x ** y            x to the power y

Notice also that the same operator may perform a different function depending on the data type of the value to which it is applied.

Exercises:

1. Add the numbers 3, 4, and 5.
2. Add 2 to the result of multiplying 3 by 4.
3. Add 2 plus 3 and multiply the result by 4.

Solutions:

1. Arithmetic expressions follow standard infix algebraic syntax:

       3 + 4 + 5

2. Use another infix expression:

       2 + 3 * 4

   Or:

       2 + (3 * 4)

   But, in this case the parentheses are not necessary because the * operator binds more tightly than the + operator.

3. Use parentheses to control order of evaluation:

       (2 + 3) * 4

   Note that the * operator has precedence over (binds tighter than) the + operator, so the parentheses are needed.

Python does mixed arithmetic. When you apply an operation to an integer and a float, it promotes the result to the "higher" data type, a float.

If you need to perform an operation on several integers, but want to use a floating point operation, first convert one of the integers to a float using float(x), which effectively creates an instance of class float.

Try the following at your Python interactive prompt:

1. 1.0 + 2
2. 2 / 3 -- Notice that the result is truncated.
3. float(2) / 3 -- Notice that the result is not truncated.

Exercises:

1. Given the following assignments:

       x = 20
       y = 50

   Divide x by y giving a float result.

Solutions:

1. Promote one of the integers to float before performing the division:

       z = float(x) / y
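The difference between a truncated quotient and a float quotient can be made explicit. This sketch deliberately avoids a plain / between two integers, because that expression truncates in the Python 2 used throughout this text but not in Python 3; the // operator and the float() conversion behave the same way in both. The variable names are made up for the illustration.

```python
x = 20
y = 50

floored = x // y                # floored quotient of two integers
true_quotient = float(x) / y    # promote one operand to float first
pair = divmod(7, 2)             # the pair (7 // 2, 7 % 2)
```

Here floored is 0, true_quotient is 0.4, and pair is (3, 1).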
3.4.2 Lists
Lists are a container data type that acts as a dynamic array. That is to say, a list is a sequence that can be indexed into and that can grow and shrink.

A tuple is an index-able container, like a list, except that a tuple is immutable.

A few characteristics of lists and tuples:

- A list has a (current) length -- Get the length of a list with len(mylist).
- A list has an order -- The items in a list are ordered, and you can think of that order as going from left to right.
- A list is heterogeneous -- You can insert different types of objects into the same list.
- Lists are mutable, but tuples are not. Thus, the following are true of lists, but not of tuples:

  - You can extend or add to a list.
  - You can shrink a list by deleting items from it.
  - You can insert items into the middle of a list or at the beginning of a list. You can add items to the end of a list.
  - You can change which item is at a given position in a list.
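The characteristics listed above can be demonstrated in a few lines. This is a small sketch with made-up values; it also shows that an attempt to change a tuple raises a TypeError.

```python
items = [11, 'abc', 3.5]        # a heterogeneous list
items.append(44)                # add an item to the end
items.insert(0, 'first')        # insert an item at the beginning
del items[2]                    # shrink the list: removes 'abc'
items[1] = 'replaced'           # change the item at a given position

fixed = (11, 'abc', 3.5)        # a tuple with the same starting content
try:
    fixed[0] = 99               # tuples do not support item assignment
    tuple_changed = True
except TypeError:
    tuple_changed = False
```

After these operations, items is ['first', 'replaced', 3.5, 44], while fixed is unchanged.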
2. A list can contain lists. In fact a list can contain any kind of object:

       >>> [1, [2, 3], 4, [5, 6, 7, ], 8]

3. Lists are heterogeneous, that is, different kinds of objects can be in the same list. Here is a list that contains a number, a string, and another list:

       >>> [123, 'abc', [456, 789]]

Exercises:
1. Create (define) the following tuples and lists using a literal: 1. A tuple of integers 2. A tuple of strings 3. A list of integers 4. A list of strings 5. A list of tuples or tuple of lists 6. A list of integers and strings and tuples 7. A tuple containing exactly one item 8. An empty tuple 2. Do each of the following: 1. Print the length of a list. 2. Print each item in the list -- Iterate over the items in one of your lists. Print each item. 3. Append an item to a list. 4. Insert an item at the beginning of a list. Insert an item in the middle of a list. 5. Add two lists together. Do so by using both the extend method and the plus (+) operator. What is the difference between extending a list and adding two lists? 6. Retrieve the 2nd item from one of your tuples or lists. 7. Retrieve the 2nd, 3rd, and 4th items (a slice) from one of your tuples or lists. 8. Retrieve the last (right-most) item in one of your lists. 9. Replace an item in a list with a new item. 10. Pop one item off the end of your list. 11. Delete an item from a list. 12. Do the following list manipulations: 1. Write a function that takes two arguments, a list and an item, and that appends the item to the list. 2. Create an empty list, 3. Call your function several times to append items to the list. 4. Then, print out each item in the list. Solutions: 1. We can define list literals at the Python or IPython interactive prompt: 1. Create a tuple using commas, optionally with parentheses: In [1]: a1 = (11, 22, 33, ) In [2]: a1 Out[2]: (11, 22, 33) 2. Quoted characters separated by commas create a tuple of strings: In [3]: a2 = ('aaa', 'bbb', 'ccc') In [4]: a2 Out[4]: ('aaa', 'bbb', 'ccc') 3. Items separated by commas inside square brackets create a list: In [26]: a3 = [100, 200, 300, ] In [27]: a3 Out[27]: [100, 200, 300] 4. 
Strings separated by commas inside square brackets create a list of strings: In [5]: a3 = ['basil', 'parsley', 'coriander'] In [6]: a3 Out[6]: ['basil', 'parsley', 'coriander'] In [7]: 5. A tuple or a list can contain tuples and lists: In [8]: a5 = [(11, 22), (33, 44), (55,)] In [9]: a5 Out[9]: [(11, 22), (33, 44), (55,)] 6. A list or tuple can contain items of different types: In [10]: a6 = [101, 102, 'abc', "def", (201, 202), ('ghi', 'jkl')] In [11]: a6 Out[11]: [101, 102, 'abc', 'def', (201, 202), ('ghi', 'jkl')] 7. In order to create a tuple containing exactly one item, we must use a comma: In [13]: a7 = (6,) In [14]: a7 Out[14]: (6,)
8. In order to create an empty tuple, use the tuple class/type to create an instance of an empty tuple:

       In [21]: a = tuple()
       In [22]: a
       Out[22]: ()
       In [23]: type(a)
       Out[23]: <type 'tuple'>
Solutions:

1. The extend method adds elements from another list, or other iterable:

       >>> a = [11, 22, 33, 44, ]
       >>> b = [55, 66]
       >>> a.extend(b)
       >>> a
       [11, 22, 33, 44, 55, 66]

2. Use the append method on a list to add/append an item to the end of a list:

       >>> a = ['aa', 11]
       >>> a.append('bb')
       >>> a.append(22)
       >>> a
       ['aa', 11, 'bb', 22]

3. The insert method on a list enables us to insert items at a given position in a list:

       >>> a = [11, 22, 33, 44, ]
       >>> a.insert(0, 'aa')
       >>> a
       ['aa', 11, 22, 33, 44]
       >>> a.insert(2, 'bb')
       >>> a
       ['aa', 11, 'bb', 22, 33, 44]

   But, note that we use append to add items at the end of a list.

4. The pop method on a list returns the "right-most" item from a list and removes that item from the list:

       >>> a = [11, 22, 33, 44, ]
       >>>
       >>> b = a.pop()
       >>> a
       [11, 22, 33]
       >>> b
       44
       >>> b = a.pop()
       >>> a
       [11, 22]
       >>> b
       33

   Note that the append and pop methods taken together can be used to implement a stack, that is a LIFO (last in first out) data structure.
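The remark about stacks can be illustrated with a thin wrapper around a list. This is a minimal sketch, not part of the exercises; the class Stack and its method names are hypothetical.

```python
class Stack(object):
    """A hypothetical LIFO container built on the list methods append and pop."""

    def __init__(self):
        self._items = []

    def push(self, item):
        self._items.append(item)    # new items go on the right-hand end

    def pop(self):
        return self._items.pop()    # the right-most item comes off first

    def is_empty(self):
        return len(self._items) == 0


stack = Stack()
for value in [11, 22, 33]:
    stack.push(value)

popped = []
while not stack.is_empty():
    popped.append(stack.pop())
```

The items come back in the reverse of the order in which they were pushed: popped is [33, 22, 11].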
Here is an example: >>> a = [11, 22, 33, 44] >>> b = [x * 2 for x in a] >>> b [22, 44, 66, 88] Exercises: 1. Given the following list of strings: names = ['alice', 'bertrand', 'charlene'] produce the following lists: (1) a list of all upper case names; (2) a list of capitalized (first letter upper case); 2. Given the following function which calculates the factorial of a number: def t(n): if n <= 1: return n else: return n * t(n - 1) and the following list of numbers: numbers = [2, 3, 4, 5] create a list of the factorials of each of the numbers in the list. Solutions: 1. For our expression in a list comprehension, use the upper and capitalize methods: >>> names = ['alice', 'bertrand', 'charlene'] >>> [name.upper() for name in names]
converted by Web2PDFConvert.com
['ALICE', 'BERTRAND', 'CHARLENE'] >>> [name.capitalize() for name in names] ['Alice', 'Bertrand', 'Charlene'] 2. The expression in our list comprehension calls the factorial function: def t(n): if n <= 1: return n else: return n * t(n - 1) def test(): numbers = [2, 3, 4, 5] factorials = [t(n) for n in numbers] print 'factorials:', factorials if __name__ == '__main__': test() A list comprehension can also contain an if clause. Here is a template: [expr(x) for x in iterable if pred(x)] where:
pred(x) is an expression that evaluates to a true/false value. Values that count as false are numeric zero, False, None, and any empty collection. All other values count as true.

Only values for which the if clause evaluates to true are included in creating the resulting list. Examples:

    >>> a = [11, 22, 33, 44]
    >>> b = [x * 3 for x in a if x % 2 == 0]
    >>> b
    [66, 132]

Exercises:

1. Given two lists, generate a list of all the strings in the first list that are not in the second list. Here are two sample lists:

    names1 = ['alice', 'bertrand', 'charlene', 'daniel']
    names2 = ['bertrand', 'charlene']

Solutions:

1. The if clause of our list comprehension checks for containment in the list names2:

    def test():
        names1 = ['alice', 'bertrand', 'charlene', 'daniel']
        names2 = ['bertrand', 'charlene']
        names3 = [name for name in names1 if name not in names2]
        print 'names3:', names3

    if __name__ == '__main__':
        test()

   When run, this script prints out the following:

    names3: ['alice', 'daniel']
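The rule about which values count as false can be checked directly by using each value as its own predicate in the if clause. The sample list below is made up for this illustration; the code is a small sketch that runs under Python 2 and Python 3.

```python
# A mix of values; some count as false (zero, empty string, None,
# empty collections, False) and the rest count as true.
values = [0, 1, '', 'abc', None, [], [0], {}, False, True]

# The if clause keeps only the values that count as true.
true_values = [x for x in values if x]
print(true_values)
```

Note that [0] is kept: it is a non-empty list, even though the item it contains counts as false.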
3.4.3 Strings
A string is an ordered sequence of characters. Here are a few characteristics of strings: A string has a length. Get the length with the len() built-in function. A string is indexable. Get a single character at a position in a string with the square bracket operator, for example mystring[5]. You can retrieve a slice (sub-string) of a string with a slice operation, for example mystring[5:8]. Create strings with single quotes or double quotes. You can put single quotes inside double quotes and you can put double quotes inside single quotes. You can also escape characters with a backslash.
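The length, indexing, and slicing operations described above can be tried out with a short sketch. The sample value of mystring is an assumption chosen for this illustration; the code runs under Python 2 and Python 3.

```python
mystring = 'abcdefghij'

length = len(mystring)     # the number of characters in the string
one_char = mystring[5]     # indexing starts at 0, so this is the 6th character
piece = mystring[5:8]      # characters at positions 5, 6, and 7 (8 is excluded)

print(length)
print(one_char)
print(piece)
```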
Exercises:

1. Create a string containing a single quote.
2. Create a string containing a double quote.
3. Create a string containing both a single quote and a double quote.

Solutions:

1. Create a string with double quotes to include single quotes inside the string:

    >>> str1 = "that is jerry's ball"

2. Create a string enclosed with single quotes in order to include double quotes inside the string:

    >>> str1 = 'say "goodbye", bullwinkle'

3. Take your choice. Escape either the single quotes or the double quotes with a backslash:

    >>> str1 = 'say "hello" to jerry\'s mom'
    >>> str2 = "say \"hello\" to jerry's mom"
    >>> str1
    'say "hello" to jerry\'s mom'
    >>> str2
    'say "hello" to jerry\'s mom'
Triple quotes enable you to create a string that spans multiple lines. Use three single quotes or three double quotes to create a triple quoted string.

Exercises:

1. Create a triple quoted string that contains single and double quotes.

Solutions:

1. Use triple single quotes or triple double quotes to create multi-line strings:

    String1 = '''This string extends
    across several lines.  And, so it has
    end-of-line characters in it.
    '''

    String2 = """
    This string begins and ends with an end-of-line
    character.  It can have both 'single'
    quotes and "double" quotes in it.
    """

    def test():
        print String1
        print String2

    if __name__ == '__main__':
        test()
3.4.3.1 Characters
Python does not have a distinct character type. In Python, a character is a string of length 1. You can use the ord() and chr() built-in functions to convert from character to integer and back.

Exercises:

1. Create a character "a".
2. Create a character, then obtain its integer representation.

Solutions:

1. The character "a" is a plain string of length 1:

    >>> x = 'a'

2. The integer equivalent of the letter "A":

    >>> x = "A"
    >>> ord(x)
    65
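The conversion in the other direction, from integer back to character, uses chr(). Here is a small round-trip sketch; the variable names are chosen only for this illustration, and the code runs under Python 2 and Python 3.

```python
code = ord('A')              # character to integer
letter = chr(code)           # integer back to character
next_letter = chr(code + 1)  # arithmetic on the integer gives the next letter

print(code)
print(letter)
print(next_letter)
```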
'home/myusername/Workdir/notes.txt' Notes: Note that importing the os module and then using os.sep from that module gives us a platform independent solution. If you do decide to code the path separator character explicitly and if you are on MS Windows where the path separator is the backslash, then you will need to use a double backslash, because that character is the escape character.
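The notes above can be illustrated with a short sketch. It builds the same path in two platform independent ways, with os.sep and with os.path.join() from the standard library, and also shows the doubled backslash needed when a Windows separator is written explicitly. The variable names are assumptions made for this example.

```python
import os

parts = ['home', 'myusername', 'Workdir', 'notes.txt']

# Platform independent: let the os module supply the separator.
path1 = os.sep.join(parts)
path2 = os.path.join(*parts)

# Explicit MS Windows separator: each backslash must be doubled in the
# string literal, because a single backslash is the escape character.
windows_style = 'home\\myusername\\Workdir\\notes.txt'

print(path1)
print(path2)
print(windows_style)
```

Each doubled backslash in the source code produces one backslash character in the resulting string.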
Solutions:

1. The rstrip() method strips whitespace off the right side of a string:

    >>> s1 = 'some text \n'
    >>> s1
    'some text \n'
    >>> s2 = s1.rstrip()
    >>> s2
    'some text'

2. The center(n) method centers a string within a padded string of width n:

    >>> s1 = 'Dave'
    >>> s2 = s1.center(20)
    >>> s2
    '        Dave        '

3. The upper() method produces a new string that converts all alpha characters in the original to upper case:

    >>> s1 = 'Banana'
    >>> s1
    'Banana'
    >>> s2 = s1.upper()
    >>> s2
    'BANANA'

4. The split(sep) method produces a list of strings that are separated by sep in the original string. If sep is omitted, whitespace is treated as the separator:

    >>> s1 = """how does it feel
    ... to be on your own
    ... no directions known
    ... like a rolling stone
    ... """
    >>> words = s1.split()
    >>> words
    ['how', 'does', 'it', 'feel', 'to', 'be', 'on', 'your', 'own', 'no', 'directions', 'known', 'like', 'a', 'rolling', 'stone']

   Note that the split() function in the re (regular expression) module is useful when the separator is more complex than whitespace or a single character.

5. The join() method concatenates strings from a list of strings to form a single string:

    >>> lines = []
    >>> lines.append('how does it feel')
    >>> lines.append('to be on your own')
    >>> lines.append('no directions known')
    >>> lines.append('like a rolling stone')
    >>> lines
    ['how does it feel', 'to be on your own', 'no directions known', 'like a rolling stone']
    >>> s1 = ''.join(lines)
    >>> s2 = ' '.join(lines)
    >>> s3 = '\n'.join(lines)
    >>> s1
    'how does it feelto be on your ownno directions knownlike a rolling stone'
    >>> s2
    'how does it feel to be on your own no directions known like a rolling stone'
    >>> s3
    'how does it feel\nto be on your own\nno directions known\nlike a rolling stone'
    >>> print s3
    how does it feel
    to be on your own
    no directions known
    like a rolling stone
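The note in solution 4 mentions the split() function in the re module for separators that are more complex than a single character. Here is a minimal sketch of that idea. The sample line and the pattern are assumptions made up for this illustration; the code runs under Python 2 and Python 3.

```python
import re

line = 'alice, bertrand;charlene  daniel'

# Split on a comma or semicolon (with optional whitespace around it),
# or else on any run of whitespace.
names = re.split(r'\s*[,;]\s*|\s+', line)
print(names)
```

A plain line.split() could not handle this input, because it only accepts one fixed separator string (or whitespace) at a time.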
Solutions:

1. We can represent a unicode string with either the "u" prefix or with a call to the unicode type:

    def exercise1():
        a = u'abcd'
        print a
        b = unicode('efgh')
        print b

2. We convert a string from another character encoding into unicode with the decode() string method:

    import sys

    def exercise2():
        a = 'abcd'.decode('utf-8')
        print a
        b = 'abcd'.decode(sys.getdefaultencoding())
        print b

3. We can convert a unicode string to another character encoding with the encode() string method:

    import sys

    def exercise3():
        a = u'abcd'
        print a.encode('utf-8')
        print a.encode(sys.getdefaultencoding())

4. Here are two ways to check the type of a string:

    import types

    def exercise4():
        a = u'abcd'
        print type(a) is types.UnicodeType
        print type(a) is type(u'')

5. We can encode unicode characters in a string in several ways, for example, (1) by defining a utf-8 string and converting it to unicode or (2) defining a string with an embedded unicode character or (3) concatenating a unicode character into a string:

    def exercise5():
        utf8_string = 'Ivan Krsti\xc4\x87'
        unicode_string = utf8_string.decode('utf-8')
        print unicode_string.encode('utf-8')
        print len(utf8_string)
        print len(unicode_string)
        unicode_string = u'aa\u0107bb'
        print unicode_string.encode('utf-8')
        unicode_string = 'aa' + unichr(263) + 'bb'
        print unicode_string.encode('utf-8')

Guidance for use of encodings and unicode:

1. Convert/decode from an external encoding to unicode early:

    my_source_string.decode(encoding)

2. Do your work (Python processing) in unicode.

3. Convert/encode to an external encoding late (for example, just before saving to an external file):

    my_unicode_string.encode(encoding)

For more information, see:

- Unicode In Python, Completely Demystified -- http://farmdev.com/talks/unicode/
- Unicode How-to -- http://www.amk.ca/python/howto/unicode
- PEP 100: Python Unicode Integration -- http://www.python.org/dev/peps/pep-0100/
- 4.8 codecs -- Codec registry and base classes -- http://docs.python.org/lib/module-codecs.html
- 4.8.2 Encodings and Unicode -- http://docs.python.org/lib/encodings-overview.html
- 4.8.3 Standard Encodings -- http://docs.python.org/lib/standard-encodings.html
- Converting Unicode Strings to 8-bit Strings -- http://effbot.org/zone/unicode-convert.htm
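The three-step guidance above (decode early, work in unicode, encode late) can be sketched in a few lines. To keep the sketch runnable under both Python 2.6+ and Python 3, it uses a b'...' bytes literal for the externally encoded data; that spelling, the variable names, and the choice of upper() as the "work" step are assumptions of this illustration, not part of the exercises.

```python
# Bytes as they might arrive from a file or network, encoded as UTF-8.
raw_bytes = b'Ivan Krsti\xc4\x87'

# 1. Decode early: turn the external encoding into unicode.
text = raw_bytes.decode('utf-8')

# 2. Do the processing on unicode text.
shouted = text.upper()

# 3. Encode late: convert back just before writing the data out.
out_bytes = shouted.encode('utf-8')

print(len(raw_bytes))   # length in bytes
print(len(text))        # length in characters
```

The two lengths differ because the final character occupies two bytes in UTF-8 but is a single character once decoded.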
3.4.4 Dictionaries
A dictionary is an un-ordered collection of key-value pairs.
A dictionary has a length, specifically the number of key-value pairs. The keys must be immutable object types.
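The requirement that keys be immutable can be seen by trying a tuple and a list as keys. This is a small sketch with made-up names; it runs under Python 2 and Python 3.

```python
points = {}

# A tuple is immutable, so it can be used as a key.
points[(1, 2)] = 'tuple keys are fine'

# A list is mutable, so using it as a key raises TypeError.
try:
    points[[1, 2]] = 'list keys are not'
    list_key_accepted = True
except TypeError:
    list_key_accepted = False

print(len(points))
print(list_key_accepted)
```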
Pepper Green, Red, Yellow

2. Define a dictionary to represent the "enum" days of the week: Sunday, Monday, Tuesday, ...

Solutions:

1. A dictionary whose keys and values are strings can be used to represent this table:

    vegetables = {
        'Eggplant': 'Purple',
        'Tomato': 'Red',
        'Parsley': 'Green',
        'Lemon': 'Yellow',
        'Pepper': 'Green',
        }

   Note that the open curly bracket enables us to continue this statement across multiple lines without using a backslash.

2. We might use strings for the names of the days of the week as keys:

    DAYS = {
        'Sunday': 1,
        'Monday': 2,
        'Tuesday': 3,
        'Wednesday': 4,
        'Thursday': 5,
        'Friday': 6,
        'Saturday': 7,
        }
Dictionaries support the following "operators":

Length -- len(d) returns the number of pairs in a dictionary.

Indexing -- You can both set and get the value associated with a key by using the indexing operator [ ]. Examples:

    In [12]: d3[2]
    Out[12]: 'GREEN'
    In [13]: d3[0] = 'WHITE'
    In [14]: d3[0]
    Out[14]: 'WHITE'
Test for key -- The in operator tests for the existence of a key in a dictionary. Example:

    In [6]: trees = {'poplar': 'deciduous', 'cedar': 'evergreen'}
    In [7]: if 'cedar' in trees:
       ...:     print 'The cedar is %s' % (trees['cedar'], )
       ...:
    The cedar is evergreen

Exercises:

1. Create an empty dictionary, then use the indexing operator [ ] to insert the following name-value pairs:

    "red" -- "255:0:0"
    "green" -- "0:255:0"
    "blue" -- "0:0:255"

2. Print out the number of items in your dictionary.

Solutions:

1. We can use "[ ]" to set the value of a key in a dictionary:

    def test():
        colors = {}
        colors["red"] = "255:0:0"
        colors["green"] = "0:255:0"
        colors["blue"] = "0:0:255"
        print 'The value of red is "%s"' % (colors['red'], )
        print 'The colors dictionary contains %d items.' % (len(colors), )

    test()

   When we run this, we see:

    The value of red is "255:0:0"
    The colors dictionary contains 3 items.

2. The len() built-in function gives us the number of items in a dictionary. See the previous solution for an example of this.
    a.values()              a copy of a's list of values
    a.get(k[, x])           a[k] if k in a, else x
    a.setdefault(k[, x])    a[k] if k in a, else x (also setting it)
    a.pop(k[, x])           a[k] if k in a, else x (and remove k)
    a.popitem()             remove and return an arbitrary (key, value) pair
    a.iteritems()           return an iterator over (key, value) pairs
    a.iterkeys()            return an iterator over the mapping's keys
    a.itervalues()          return an iterator over the mapping's values
You can also find this table at the standard documentation Web site in the "Python Library Reference": Mapping Types -- dict http://docs.python.org/lib/typesmapping.html

Exercises:

1. Print the keys and values in the above "vegetable" dictionary.
2. Print the keys and values in the above "vegetable" dictionary with the keys in alphabetical order.
3. Test for the occurrence of a key in a dictionary.

Solutions:

1. We can use the d.items() method to retrieve a list of tuples containing key-value pairs, then use unpacking to capture the key and value:

    Vegetables = {
        'Eggplant': 'Purple',
        'Tomato': 'Red',
        'Parsley': 'Green',
        'Lemon': 'Yellow',
        'Pepper': 'Green',
        }

    def test():
        for key, value in Vegetables.items():
            print 'key:', key, ' value:', value

    test()

2. We retrieve a list of keys with the keys() method, then sort it with the list sort() method:

    Vegetables = {
        'Eggplant': 'Purple',
        'Tomato': 'Red',
        'Parsley': 'Green',
        'Lemon': 'Yellow',
        'Pepper': 'Green',
        }

    def test():
        keys = Vegetables.keys()
        keys.sort()
        for key in keys:
            print 'key:', key, ' value:', Vegetables[key]

    test()

3. To test for the existence of a key in a dictionary, we can use either the in operator (preferred) or the d.has_key() method (old style):

    Vegetables = {
        'Eggplant': 'Purple',
        'Tomato': 'Red',
        'Parsley': 'Green',
        'Lemon': 'Yellow',
        'Pepper': 'Green',
        }

    def test():
        if 'Eggplant' in Vegetables:
            print 'we have %s eggplants' % Vegetables['Eggplant']
        if 'Banana' not in Vegetables:
            print 'yes we have no bananas'
        if Vegetables.has_key('Parsley'):
            print 'we have leafy, %s parsley' % Vegetables['Parsley']

    test()

   Which will print out:

    we have Purple eggplants
    yes we have no bananas
    we have leafy, Green parsley
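The get(), setdefault(), and pop() methods listed in the table of dictionary operations each take an optional default. The sketch below shows one common use of each: counting, grouping, and removing. The data and variable names are invented for this illustration; the code runs under Python 2 and Python 3.

```python
words = ['pine', 'oak', 'pine', 'elm', 'oak', 'pine']

# get(k, x): read a count, using 0 when the key is not yet present.
counts = {}
for word in words:
    counts[word] = counts.get(word, 0) + 1

# setdefault(k, x): fetch the list for a key, creating it on first use.
by_length = {}
for word in sorted(counts):
    by_length.setdefault(len(word), []).append(word)

# pop(k, x): remove a key and return its value, or x if it is missing.
removed = counts.pop('elm', None)
missing = counts.pop('maple', None)

print(sorted(counts.items()))
print(sorted(by_length.items()))
```

Without the second argument, counts.pop('maple') would raise KeyError, and counts.get('maple') would return None.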
3.4.5 Files
A Python file object represents a file on a file system.

A file object open for reading a text file is iterable. When we iterate over it, it produces the lines in the file.

A file may be opened in these modes:

- 'r' -- read mode. The file must exist.
- 'w' -- write mode. The file is created; an existing file is overwritten.
- 'a' -- append mode. An existing file is opened for writing (at the end of the file). A file is created if it does not exist.

The open() built-in function is used to create a file object. For example, the following code (1) opens a file for writing, then (2) for reading, then (3) for appending, and finally (4) for reading again:

    def test(infilename):
        # 1. Open the file in write mode, which creates the file.
        outfile = open(infilename, 'w')
        outfile.write('line 1\n')
        outfile.write('line 2\n')
        outfile.write('line 3\n')
        outfile.close()
        # 2. Open the file for reading.
        infile = open(infilename, 'r')
        for line in infile:
            print 'Line:', line.rstrip()
        infile.close()
        # 3. Open the file in append mode, and add a line to the end of
        #    the file.
        outfile = open(infilename, 'a')
        outfile.write('line 4\n')
        outfile.close()
        print '-' * 40
        # 4. Open the file in read mode once more.
        infile = open(infilename, 'r')
        for line in infile:
            print 'Line:', line.rstrip()
        infile.close()

    test('tmp.txt')

Exercises:

1. Open a text file for reading, then read the entire file as a single string, and then split the content on newline characters.
2. Open a text file for reading, then read the entire file as a list of strings, where each string is one line in the file.
3. Open a text file for reading, then iterate over each line in the file and print it out.

Solutions:

1. Use the open() built-in function to open the file and create a file object. Use the read() method on the file object to read the entire file.
Use the split() or splitlines() methods to split the file into lines. Note that splitlines() does not produce a trailing empty string when the content ends with a newline, whereas split('\n') does:

    >>> infile = open('tmp.txt', 'r')
    >>> content = infile.read()
    >>> infile.close()
    >>> lines = content.splitlines()
    >>> print lines
    ['line 1', 'line 2', 'line 3']

2. The f.readlines() method returns a list of lines in a file:

    >>> infile = open('tmp.txt', 'r')
    >>> lines = infile.readlines()
    >>> infile.close()
    >>> print lines
    ['line 1\n', 'line 2\n', 'line 3\n']

3. Since a file object (open for reading) is itself an iterator, we can iterate over it in a for statement:

    """
    Test iteration over a text file.
    Usage:
        python test.py in_file_name
    """

    import sys

    def test(infilename):
        infile = open(infilename, 'r')
        for line in infile:
            # Strip off the new-line character and any whitespace on
            # the right.
            line = line.rstrip()
            # Print only non-blank lines.
            if line:
                print line
        infile.close()

    def main():
        args = sys.argv[1:]
        if len(args) != 1:
            print __doc__
            sys.exit(1)
        infilename = args[0]
        test(infilename)

    if __name__ == '__main__':
        main()

   Notes:

   - The last two lines of this solution check the __name__ attribute of the module itself so that the module will run as a script but will not run when the module is imported by another module.
   - The __doc__ attribute of the module gives us the module's doc-string, which is the string defined at the top of the module.
   - sys.argv gives us the command line. And, sys.argv[1:] chops off the program name, leaving us with the command line arguments.
Test for None with the identity operator is.

Exercises:

1. Create a list, some of whose elements are None. Then write a for loop that counts the number of occurrences of None in the list.

Solutions:

1. The identity operators is and is not can be used to test for None:

    >>> a = [11, None, 'abc', None, {}]
    >>> a
    [11, None, 'abc', None, {}]
    >>> count = 0
    >>> for item in a:
    ...     if item is None:
    ...         count += 1
    ...
    >>>
    >>> print count
    2
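It may help to see why the identity test is used instead of a plain truth test: None is only one of several values that count as false. The sketch below counts both, using a sample list invented for this illustration. It runs under Python 2 and Python 3.

```python
a = [11, None, 'abc', None, {}, 0, '']

none_count = 0
false_count = 0
for item in a:
    if item is None:      # true only for None itself
        none_count += 1
    if not item:          # true for None, {}, 0, and '' as well
        false_count += 1

print(none_count)
print(false_count)
```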
x = 3 y = 4 z = 5 What does the following print out: print y > x and z > y Answer -- Prints out "True"
3.5 Statements
3.5.1 Assignment statement
The assignment statement uses the assignment operator =.

The assignment statement is a binding statement: it binds a value to a name within a namespace.

Exercises:

1. Bind the value "eggplant" to the variable vegetable.

Solutions:

1. The = operator is an assignment statement that binds a value to a variable:

    >>> vegetable = "eggplant"

There is also augmented assignment using the operators +=, -=, *=, /=, etc.

Exercises:

1. Use augmented assignment to increment the value of an integer.
2. Use augmented assignment to append characters to the end of a string.
3. Use augmented assignment to append the items in one list to another.
4. Use augmented assignment to decrement a variable containing an integer by 1.

Solutions:

1. The += operator increments the value of an integer:

    >>> count = 0
    >>> count += 1
    >>> count
    1
    >>> count += 1
    >>> count
    2

2. The += operator appends characters to the end of a string:

    >>> buffer = 'abcde'
    >>> buffer += 'fgh'
    >>> buffer
    'abcdefgh'

3. The += operator appends items in one list to another:

    In [20]: a = [11, 22, 33]
    In [21]: b = [44, 55]
    In [22]: a += b
    In [23]: a
    Out[23]: [11, 22, 33, 44, 55]

4. The -= operator decrements the value of an integer:

    >>> count = 5
    >>> count
    5
    >>> count -= 1
    >>> count
    4
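One detail of augmented assignment is worth a short sketch: with a list, += modifies the existing list object in place, while with a string (which is immutable) it binds the name to a new object. The variable names below are chosen only for this illustration; the code runs under Python 2 and Python 3.

```python
# Lists are mutable: += extends the same list object.
a = [11, 22, 33]
alias = a              # a second name bound to the same list
a += [44]
same_list = alias is a

# Strings are immutable: += builds a new string and rebinds the name.
s = 'abc'
s_alias = s
s += 'def'

print(alias)
print(s_alias)
print(s)
```

So any other name bound to the list sees the added item, whereas other names bound to the original string are unaffected.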
You can also assign a value to (1) an element of a list, (2) an item in a dictionary, (3) an attribute of an object, etc.

Exercises:

1. Create a list of three items, then assign a new value to the 2nd element in the list.
2. Create a dictionary, then assign values to the keys "vegetable" and "fruit" in that dictionary.
3. Use the following code to create an instance of a class:

    class A(object):
        pass
    a = A()

   Then assign a value to an attribute named category in that instance.

Solutions:

1. Assignment with the indexing operator [] assigns a value to an element in a list:

    >>> trees = ['pine', 'oak', 'elm']
    >>> trees
    ['pine', 'oak', 'elm']
    >>> trees[1] = 'cedar'
    >>> trees
    ['pine', 'cedar', 'elm']

2. Assignment with the indexing operator [] assigns a value to an item (a key-value pair) in a dictionary:

    >>> foods = {}
    >>> foods
    {}
    >>> foods['vegetable'] = 'green beans'
    >>> foods['fruit'] = 'nectarine'
    >>> foods
    {'vegetable': 'green beans', 'fruit': 'nectarine'}

3. Assignment along with the dereferencing operator . (dot) enables us to assign a value to an attribute of an object:

    >>> class A(object):
    ...     pass
    ...
    >>> a = A()
    >>> a.category = 25
    >>> a.__dict__
    {'category': 25}
    >>> a.category
    25
2. We can print literals and the value of variables: >>> description = 'cute' >>> print 'I am a', description, 'kid.' I am a cute kid. 3. The string formatting operator gives more control over formatting output: >>> name = 'Alice' >>> print 'My name is "%s".' % (name, ) My name is "Alice".
- Numeric zero
- An empty collection, for example an empty list or dictionary
- An empty string (a string of length zero)

All other values count as true.

Exercises:

1. Given the following list:

    >>> bananas = ['banana1', 'banana2', 'banana3',]

   Print one message if it is an empty list and another message if it is not.

2. Here is one way of defining a Python equivalent of an "enum":

    NO_COLOR, RED, GREEN, BLUE = range(4)

   Write an if: statement which implements the effect of a "switch" statement in Python. Print out a unique message for each color.

Solutions:

1. We can test for an empty or non-empty list:

    >>> bananas = ['banana1', 'banana2', 'banana3',]
    >>> if not bananas:
    ...     print 'yes, we have no bananas'
    ... else:
    ...     print 'yes, we have bananas'
    ...
    yes, we have bananas

2. We can simulate a "switch" statement using if:elif: ...:

    NO_COLOR, RED, GREEN, BLUE = range(4)

    def test(color):
        if color == RED:
            print "It's red."
        elif color == GREEN:
            print "It's green."
        elif color == BLUE:
            print "It's blue."

    def main():
        color = BLUE
        test(color)

    if __name__ == '__main__':
        main()

   Which, when run prints out the following:

    It's blue.
Exercises:

1. Create a list of integers. Use a for: statement to print out each integer in the list.
2. Create a string. Print out each character in the string.

Solutions:

1. The for: statement can iterate over the items in a list:

    In [13]: a = [11, 22, 33, ]
    In [14]: for value in a:
       ....:     print 'value: %d' % value
       ....:
       ....:
    value: 11
    value: 22
    value: 33

2. The for: statement can iterate over the characters in a string:

    In [16]: b = 'chocolate'
    In [17]: for chr1 in b:
       ....:     print 'character: %s' % chr1
       ....:
       ....:
    character: c
    character: h
    character: o
    character: c
    character: o
    character: l
    character: a
    character: t
    character: e

   Notes: In the solution, I used the variable name chr1 rather than chr so as not to over-write the name of the built-in function chr().

When we need a sequential index, we can use the range() built-in function to create a list of integers. And, the xrange() built-in function produces an iterator that produces a sequence of integers without creating the entire list. To iterate over a large sequence of integers, use xrange() instead of range().

Exercises:

1. Print out the integers from 0 to 5 in sequence.
2. Compute the sum of all the integers from 0 to 99999.
3. Given the following generator function:
    import urllib

    Urls = [
        'http://yahoo.com',
        'http://python.org',
        'http://gimp.org',    # The GNU image manipulation program
        ]

    def walk(url_list):
        for url in url_list:
            f = urllib.urlopen(url)
            stuff = f.read()
            f.close()
            yield stuff

   Write a for: statement that uses this generator to print the lengths of the content at each of the Web pages in that list.

Solutions:

1. The range() built-in function gives us a sequence to iterate over:

    In [5]: for idx in range(6):
       ...:     print 'idx: %d' % idx
       ...:
       ...:
    idx: 0
    idx: 1
    idx: 2
    idx: 3
    idx: 4
    idx: 5

2. Since that sequence is a bit large, we'll use xrange() instead of range():

    In [8]: count = 0
    In [9]: for n in xrange(100000):
       ...:     count += n
       ...:
       ...:
    In [10]: count
    Out[10]: 4999950000

3. The for: statement enables us to iterate over iterables as well as collections:

    import urllib

    Urls = [
        'http://yahoo.com',
        'http://python.org',
        'http://gimp.org',    # The GNU image manipulation program
        ]

    def walk(url_list):
        for url in url_list:
            f = urllib.urlopen(url)
            stuff = f.read()
            f.close()
            yield stuff

    def test():
        for url in walk(Urls):
            print 'length: %d' % (len(url), )

    if __name__ == '__main__':
        test()

   When I ran this script, it printed the following:

    length: 9562
    length: 16341
    length: 12343

If you need an index while iterating over a sequence, consider using the enumerate() built-in function.

Exercises:
1. Given the following two lists of integers of the same length:

    a = [1, 2, 3, 4, 5]
    b = [100, 200, 300, 400, 500]

   Add the values in the first list to the corresponding values in the second list.

Solutions:

1. The enumerate() built-in function gives us an index and values from a sequence. Since enumerate() gives us an iterator that produces a sequence of two-tuples, we can unpack those tuples into index and value variables in the header line of the for statement:

    In [13]: a = [1, 2, 3, 4, 5]
    In [14]: b = [100, 200, 300, 400, 500]
    In [15]:
    In [16]: for idx, value in enumerate(a):
       ....:     b[idx] += value
       ....:
       ....:
    In [17]: b
    Out[17]: [101, 202, 303, 404, 505]
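To make the two-tuples produced by enumerate() visible, the sketch below collects them into a list. It also shows a different technique for the same exercise, the zip() built-in, which pairs up corresponding items and builds a new list rather than modifying b in place. The use of zip() is an alternative added here for comparison, not the workbook's solution. The code runs under Python 2 and Python 3.

```python
a = [1, 2, 3, 4, 5]
b = [100, 200, 300, 400, 500]

# enumerate() yields (index, value) two-tuples.
pairs = list(enumerate(a))

# Alternative: zip() pairs corresponding items from both lists.
totals = [x + y for x, y in zip(a, b)]

print(pairs)
print(totals)
```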
Exercises: 1. Write a while: loop that doubles all the values in a list of integers. Solutions: 1. A while: loop with an index variable can be used to modify each element of a list: def test_while(): numbers = [11, 22, 33, 44, ] print 'before: %s' % (numbers, ) idx = 0 while idx < len(numbers): numbers[idx] *= 2 idx += 1 print 'after: %s' % (numbers, ) But, notice that this task is easier using the for: statement and the built-in enumerate() function: def test_for(): numbers = [11, 22, 33, 44, ] print 'before: %s' % (numbers, ) for idx, item in enumerate(numbers): numbers[idx] *= 2 print 'after: %s' % (numbers, )
Exercises:
1. Write a for: loop that takes a list of integers and triples each integer that is even. Use the continue statement. 2. Write a loop that takes a list of integers and computes the sum of all the integers up until a zero is found in the list. Use the break statement. Solutions: 1. The continue statement enables us to "skip" items that satisfy a condition or test: def test(): numbers = [11, 22, 33, 44, 55, 66, ] print 'before: %s' % (numbers, ) for idx, item in enumerate(numbers): if item % 2 != 0: continue numbers[idx] *= 3 print 'after: %s' % (numbers, ) test() 2. The break statement enables us to exit from a loop when we find a zero: def test(): numbers = [11, 22, 33, 0, 44, 55, 66, ] print 'numbers: %s' % (numbers, ) sum = 0 for item in numbers: if item == 0: break sum += item print 'sum: %d' % (sum, ) test()
    def test():
        infilename = 'nothing_noplace.txt'
        try:
            infile = open(infilename, 'r')
            for line in infile:
                print line
        except IOError, exp:
            print 'cannot open file "%s"' % infilename

    test()

2. We define an exception class as a sub-class of class Exception, then throw it (with the raise statement) and catch it (with a try:except: statement):

    class SizeError(Exception):
        pass


    def test_exception(size):
        try:
            if size <= 0:
                raise SizeError, 'size must be greater than zero'
            # Produce a different error to show that it will not be caught.
            x = y
        except SizeError, exp:
            print '%s' % (exp, )
        print 'goodbye'

    def test():
        test_exception(-1)
        print '-' * 40
        test_exception(1)

    test()

   When we run this script, it produces the following output:

    $ python workbook027.py
    size must be greater than zero
    goodbye
    ----------------------------------------
    Traceback (most recent call last):
      File "workbook027.py", line 20, in <module>
        test()
      File "workbook027.py", line 18, in test
        test_exception(1)
      File "workbook027.py", line 10, in test_exception
        x = y
    NameError: global name 'y' is not defined

   Notes: Our except: clause caught the SizeError, but allowed the NameError to be uncaught.

3. We define a sub-class of class Exception, then raise it in an inner loop and catch it outside of an outer loop:

    class BreakException1(Exception):
        pass

    def test():
        a = [11, 22, 33, 44, 55, 66, ]
        b = [111, 222, 333, 444, 555, 666, ]
        try:
            for x in a:
                print 'outer -- x: %d' % x
                for y in b:
                    if x > 22 and y > 444:
                        raise BreakException1('leaving inner loop')
                    print 'inner -- y: %d' % y
                print 'outer -- after'
                print '-' * 40
        except BreakException1, exp:
            print 'out of loop -- exp: %s' % exp

    test()

   Here is what this prints out when run:
    outer -- x: 11
    inner -- y: 111
    inner -- y: 222
    inner -- y: 333
    inner -- y: 444
    inner -- y: 555
    inner -- y: 666
    outer -- after
    ----------------------------------------
    outer -- x: 22
    inner -- y: 111
    inner -- y: 222
    inner -- y: 333
    inner -- y: 444
    inner -- y: 555
    inner -- y: 666
    outer -- after
    ----------------------------------------
    outer -- x: 33
    inner -- y: 111
    inner -- y: 222
    inner -- y: 333
    inner -- y: 444
    out of loop -- exp: leaving inner loop
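The solutions above use the Python 2 spellings "raise SizeError, 'message'" and "except SizeError, exp:". Python 2.6 and later also accept the call form of raise and the "as" form of except, and these are the only forms Python 3 accepts. Here is a minimal sketch of the same raise-and-catch pattern in that spelling; the function name check_size is hypothetical.

```python
class SizeError(Exception):
    pass

def check_size(size):
    try:
        if size <= 0:
            # Call form of raise: create the exception instance explicitly.
            raise SizeError('size must be greater than zero')
        return 'ok'
    except SizeError as exp:     # "as" binds the exception instance
        return 'caught: %s' % (exp, )

print(check_size(-1))
print(check_size(1))
```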
3.6 Functions
A function has these characteristics:

- It groups a block of code together so that we can call it by name.
- It enables us to pass values into the function when we call it.
- It can return a value (even if None).
- When a function is called, it has its own namespace. Variables in the function are local to the function (and disappear when the function exits).

A function is defined with the def: statement. Here is a simple example/template:

    def function_name(arg1, arg2):
        local_var1 = arg1 + 1
        local_var2 = arg2 * 2
        return local_var1 + local_var2
And, here is an example of calling this function: result = function_name(1, 2) Here are a few notes of explanation: The above defines a function whose name is function_name. The function function_name has two arguments. That means that we can and must pass in exactly two values when we call it. This function has two local variables, local_var1 and local_var2. These variables are local in the sense that after we call this function, these two variables are not available in the location of the caller. When we call this function, it returns one value, specifically the sum of local_var1 and local_var2. Exercises: 1. Write a function that takes a list of integers as an argument, and returns the sum of the integers in that list. Solutions: 1. The return statement enables us to return a value from a function: def list_sum(values): sum = 0 for value in values: sum += value return sum def test(): a = [11, 22, 33, 44, ] print list_sum(a) if __name__ == '__main__': test()
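The notes of explanation about function_name can be checked with a short runnable sketch. It repeats the template function, calls it, and then confirms two of the claims: the local variables are not visible to the caller, and exactly two arguments must be passed. The flag variables local_visible and one_arg_accepted are names invented for this illustration. The code runs under Python 2 and Python 3.

```python
def function_name(arg1, arg2):
    local_var1 = arg1 + 1
    local_var2 = arg2 * 2
    return local_var1 + local_var2

result = function_name(1, 2)     # (1 + 1) + (2 * 2)

# The local variables do not exist in the caller's namespace.
try:
    local_var1
    local_visible = True
except NameError:
    local_visible = False

# Passing the wrong number of arguments raises TypeError.
try:
    function_name(1)
    one_arg_accepted = True
except TypeError:
    one_arg_accepted = False

print(result)
```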
2. In this solution we are careful not to use a mutable object as a default value: def add_to_dict(name, value, dic=None): if dic is None: dic = {} dic[name] = value return dic def test(): dic1 = {'albert': 'cute', } print add_to_dict('barry', 'funny', dic1) print add_to_dict('charlene', 'smart', dic1) print add_to_dict('darryl', 'outrageous') print add_to_dict('eddie', 'friendly') test() If we run this script, we see: {'barry': 'funny', 'albert': 'cute'} {'barry': 'funny', 'albert': 'cute', 'charlene': 'smart'} {'darryl': 'outrageous'} {'eddie': 'friendly'} Notes: It's important that the default value for the dictionary is None rather than an empty dictionary, for example ({}). Remember that the def: statement is evaluated only once, which results in a single dictionary, which would be shared by all callers that do not provide a dictionary as an argument.
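For contrast, here is a sketch of what goes wrong when a mutable object is used as the default value. The function add_to_dict_shared is a deliberately faulty variant written for this illustration; it is not part of the solution. The code runs under Python 2 and Python 3.

```python
def add_to_dict_shared(name, value, dic={}):   # pitfall: mutable default
    dic[name] = value
    return dic

# Neither call passes a dictionary, so both use the single default
# dictionary that was created when the def: statement was evaluated.
first = add_to_dict_shared('darryl', 'outrageous')
second = add_to_dict_shared('eddie', 'friendly')

print(first is second)
print(sorted(second.keys()))
```

The second call sees the entry added by the first call, which is why the solution above uses None as the default and creates a fresh dictionary inside the function.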
Running this might produce something like the following (note for MS Windows users: use type instead of cat): $ cat tmp.txt line 1 line 2 line 3 $ cat tmp.txt | python workbook005.py ## line 1 ## line 2 ## line 3
def func1(*args, **kwargs): print 'args: %s' % (args, ) print 'kwargs: %s' % (kwargs, ) def func2(*args, **kwargs): print 'before' func1(*args, **kwargs) print 'after' def test(): func2('aaa', 'bbb', 'ccc', arg1='ddd', arg2='eee') test() When we run this, it prints the following: before args: ('aaa', 'bbb', 'ccc') kwargs: {'arg1': 'ddd', 'arg2': 'eee'} after Notes: In a function call, the * operator unrolls a list into individual positional arguments, and the ** operator unrolls a dictionary into individual keyword arguments.
In a function call, arguments must appear in the following order, from left to right: 1. Positional (plain) arguments 2. Extra arguments, if present 3. Keyword arguments, if present
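The ordering rule above can be tried out with a small sketch. The function show and the sample values are hypothetical. The last part uses the compile() built-in to demonstrate, without stopping the script, that a positional argument placed after a keyword argument is rejected as a syntax error. The code runs under Python 2.6+ and Python 3.

```python
def show(a, b, *args, **kwargs):
    return (a, b, args, kwargs)

extra = ['ccc', 'ddd']
options = {'arg2': 'fff'}

# Positional arguments first, then the unrolled list, then keyword arguments.
result = show('aaa', 'bbb', *extra, arg1='eee', **options)
print(result)

# A positional argument after a keyword argument does not even compile.
try:
    compile("show(arg1='eee', 'aaa')", '<example>', 'eval')
    wrong_order_accepted = True
except SyntaxError:
    wrong_order_accepted = False
print(wrong_order_accepted)
```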
2. We can also put functions (function objects) in a data structure (for example, a list), and then pass that data structure to a function: def fancy(obj): print 'fancy fancy -- %s -- fancy fancy' % (obj, ) def plain(obj): print 'plain -- %s -- plain' % (obj, ) Func_list = [fancy, plain, ] def show(funcs, obj): for func in funcs: func(obj) def main(): a = {'aa': 11, 'bb': 22, } show(Func_list, a) if __name__ == '__main__': main() Notice that Python supports polymorphism (with or) without inheritance. This type of polymorphism is enabled by what is called duck-typing. For more on this see: Duck typing -- http://en.wikipedia.org/wiki/Duck_typing at Wikipedia.
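Duck-typing can be sketched with two unrelated classes that happen to provide a method with the same name. The classes Duck and Robot and the function chorus are invented for this illustration; neither class inherits from the other. The code runs under Python 2 and Python 3.

```python
class Duck(object):
    def speak(self):
        return 'Quack'

class Robot(object):
    def speak(self):
        return 'Beep'

def chorus(things):
    # No type checks: any object with a speak() method will do.
    return [thing.speak() for thing in things]

sounds = chorus([Duck(), Robot()])
print(sounds)
```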
        'name': 'animals',
        'left_branch': {
            'name': 'birds',
            'left_branch': {
                'name': 'seed eaters',
                'left_branch': {
                    'name': 'house finch',
                    'left_branch': None,
                    'right_branch': None,
                },
                'right_branch': {
                    'name': 'white crowned sparrow',
                    'left_branch': None,
                    'right_branch': None,
                },
            },
            'right_branch': {
                'name': 'insect eaters',
                'left_branch': {
                    'name': 'hermit thrush',
                    'left_branch': None,
                    'right_branch': None,
                },
                'right_branch': {
                    'name': 'black headed phoebe',
                    'left_branch': None,
                    'right_branch': None,
                },
            },
        },
        'right_branch': None,
    }

    Indents = [' ' * idx for idx in range(10)]

    def walk_and_show(node, level=0):
        if node is None:
            return
        print '%sname: %s' % (Indents[level], node['name'], )
        level += 1
        walk_and_show(node['left_branch'], level)
        walk_and_show(node['right_branch'], level)

    def test():
        walk_and_show(Tree)

    if __name__ == '__main__':
        test()

Notes:

Later, you will learn how to create equivalent data structures using classes and OOP (object-oriented programming). For more on that see Recursive calls to methods in this document.
        result = transforms([11, 22], p, [f, g])

   then the resulting generator might return:

        g(f(11))

2. Implement a generator function that takes a list of URLs as its argument and generates the contents of each Web page, one by one (that is, it produces a sequence of strings, the HTML page contents).

Solutions:

1. Here is the implementation of a function which contains yield, and, therefore, produces a generator:

    #!/usr/bin/env python
    """
    filter_and_transform

    filter_and_transform(content, test_func, transforms=None)

    Return a generator that returns items from content after applying
    the functions in transforms if the item satisfies test_func .

    Arguments:

    1. ``content`` -- A list of values

    2. ``test_func`` -- A function that takes a single argument,
       performs a test on that value, and returns True or False.

    3. ``transforms`` -- (optional) A list of functions.  Applies each
       function in this list and returns the resulting value.  So,
       for example, if the function is called like this::

           result = filter_and_transform([11, 22], p, [f, g])

       then the resulting generator might return::

           g(f(11))
    """

    def filter_and_transform(content, test_func, transforms=None):
        for x in content:
            if test_func(x):
                if transforms is None:
                    yield x
                elif isiterable(transforms):
                    for func in transforms:
                        x = func(x)
                    yield x
                else:
                    yield transforms(x)

    def isiterable(x):
        flag = True
        try:
            x = iter(x)
        except TypeError, exp:
            flag = False
        return flag

    def iseven(n):
        return n % 2 == 0

    def f(n):
        return n * 2

    def g(n):
        return n ** 2

    def test():
        data1 = [11, 22, 33, 44, 55, 66, 77, ]
        for val in filter_and_transform(data1, iseven, f):
            print 'val: %d' % (val, )
        print '-' * 40
        for val in filter_and_transform(data1, iseven, [f, g]):
            print 'val: %d' % (val, )
        print '-' * 40
        for val in filter_and_transform(data1, iseven):
            print 'val: %d' % (val, )

    if __name__ == '__main__':
        test()

Notes:

Because function filter_and_transform contains yield, when called, it returns an iterator object (a generator), which we can use in a for statement.

The second parameter of function filter_and_transform takes any function which takes a single argument and returns True or False. This is an example of polymorphism and "duck typing" (see Duck Typing -- http://en.wikipedia.org/wiki/Duck_typing). An analogous claim can be made about the third parameter.

2. The following function uses the urllib module and the yield statement to generate the contents of a sequence of Web pages:

    import urllib

    Urls = [
        'http://yahoo.com',
        'http://python.org',
        'http://gimp.org',    # The GNU image manipulation program
    ]

    def walk(url_list):
        for url in url_list:
            f = urllib.urlopen(url)
            stuff = f.read()
            f.close()
            yield stuff

    def test():
        for x in walk(Urls):
            print 'length: %d' % (len(x), )

    if __name__ == '__main__':
        test()

When I run this, I see:

    $ python generator_example.py
    length: 9554
    length: 16748
    length: 11487
    test()

Notes:

Notice that we use object as a superclass, because we want to define a "new-style" class and because there is no other class that we want as a superclass. See the following for more information on new-style classes: New-style Classes -- http://www.python.org/doc/newstyle/.

In Python, we create an instance of a class by calling the class, that is, we apply the function call operator (parentheses) to the class.
    class Node(object):

        def __init__(self, name=None, children=None):
            self.name = name
            if children is None:
                self.children = []
            else:
                self.children = children

        def show_name(self, indent):
            print '%sname: "%s"' % (Indents[indent], self.name, )

        def show(self, indent=0):
            self.show_name(indent)
            indent += 1
            for child in self.children:
                child.show(indent)

    def test():
        n1 = Node('N1')
        n2 = Node('N2')
        n3 = Node('N3')
        n4 = Node('N4')
        n5 = Node('N5', [n1, n2,])
        n6 = Node('N6', [n3, n4,])
        n7 = Node('N7', [n5, n6,])
        n7.show()

    if __name__ == '__main__':
        test()

Notes:

Notice that we do not use the constructor for a list ([]) as a default value for the children parameter of the constructor. A list is mutable, and the default would be created only once (when the def statement in the class body is executed) and would then be shared by every instance created without an explicit children argument.
superclass. Why the difference? Because, if (in the Plant class, for example) it used self.__init__(...) it would be calling the __init__ in the Plant class, itself. So, it bypasses itself by referencing the constructor in the superclass directly.

This exercise also demonstrates "polymorphism" -- The show method is called a number of times, but which implementation executes depends on which instance it is called on. Calling the show method on an instance of class Plant results in a call to Plant.show. Calling the show method on an instance of class Animal results in a call to Animal.show. And so on. It is important that each show method takes the correct number of arguments.
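The two points above (calling the superclass constructor directly, and polymorphic dispatch on show) can be sketched as follows. This is not the exercise's own solution: the base class name Thing and the height and color attributes are hypothetical, and the sketch assumes Python 3 rather than the Python 2 used in the listings of this document.

```python
class Thing(object):
    def __init__(self, name):
        self.name = name

    def show(self):
        return 'thing: %s' % (self.name, )


class Plant(Thing):
    def __init__(self, name, height):
        # Call the superclass constructor directly.  Writing
        # self.__init__(...) here would call Plant.__init__ again.
        Thing.__init__(self, name)
        self.height = height

    def show(self):
        return 'plant: %s height: %s' % (self.name, self.height, )


class Animal(Thing):
    def __init__(self, name, color):
        Thing.__init__(self, name)
        self.color = color

    def show(self):
        return 'animal: %s color: %s' % (self.name, self.color, )


items = [Plant('tomato', 6), Animal('cat', 'gray')]
# The same call, item.show(), runs a different implementation
# depending on the class of each instance.
lines = [item.show() for item in items]
for line in lines:
    print(line)
```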
    class AnimalNode(object):

        def __init__(self, name, left_branch=None, right_branch=None):
            self.name = name
            self.left_branch = left_branch
            self.right_branch = right_branch
        def show(self, level=0):
            print '%sname: %s' % (Indents[level], self.name, )
            level += 1
            if self.left_branch is not None:
                self.left_branch.show(level)
            if self.right_branch is not None:
                self.right_branch.show(level)

    Tree = AnimalNode('animals',
        AnimalNode('birds',
            AnimalNode('seed eaters',
                AnimalNode('house finch'),
                AnimalNode('white crowned sparrow'),
            ),
            AnimalNode('insect eaters',
                AnimalNode('hermit thrush'),
                AnimalNode('black headed phoebe'),
            ),
        ),
        None,
    )

    def test():
        Tree.show()

    if __name__ == '__main__':
        test()

2. Instead of using a left branch and a right branch, in this solution we use a list to represent the children of a node:

    class AnimalNode(object):

        def __init__(self, data, children=None):
            self.data = data
            if children is None:
                self.children = []
            else:
                self.children = children

        def show(self, level=''):
            print '%sdata: %s' % (level, self.data, )
            level += ' '
            for child in self.children:
                child.show(level)

    Tree = AnimalNode('animals', [
        AnimalNode('birds', [
            AnimalNode('seed eaters', [
                AnimalNode('house finch'),
                AnimalNode('white crowned sparrow'),
                AnimalNode('lesser gold finch'),
            ]),
            AnimalNode('insect eaters', [
                AnimalNode('hermit thrush'),
                AnimalNode('black headed phoebe'),
            ]),
        ])
    ])

    def test():
        Tree.show()

    if __name__ == '__main__':
        test()

Notes:

We represent the children of a node as a list. Each node "has-a" list of children.

Notice that because a list is mutable, we do not use a list constructor ([]) as the default value in the method header. Instead, we use None, then construct an empty list in the body of the method if necessary. See section Optional arguments and default values for more on this.

We (recursively) call the show method for each node in the children list. Since a node which has no children (a leaf node) will have an empty children list, this provides a limit condition for our recursion.

"Normal" methods are instance methods. An instance method receives the instance as its first argument. An instance method is defined by using the def statement in the body of a class statement.

A class method receives the class as its first argument. A class method is defined by defining a normal/instance method, then using the classmethod built-in function. For example:

    class ASimpleClass(object):
        description = 'a simple class'
        def show_class(cls, msg):
            print '%s: %s' % (cls.description, msg, )
        show_class = classmethod(show_class)

A static method does not receive anything special as its first argument. A static method is defined by defining a normal/instance method, then using the staticmethod built-in function. For example:

    class ASimpleClass(object):
        description = 'a simple class'
        def show_class(msg):
            print '%s: %s' % (ASimpleClass.description, msg, )
        show_class = staticmethod(show_class)

In effect, both class methods and static methods are defined by creating a normal (instance) method, then creating a wrapper object (a class method or static method) using the classmethod or staticmethod built-in function.

Exercises:

1. Implement a class that keeps a running total of the number of instances created.
2. Implement another solution to the same problem (a class that keeps a running total of the number of instances), but this time use a static method instead of a class method.

Solutions:

1. We use a class variable named instance_count, rather than an instance variable, to keep a running total of instances. Then, we increment that variable each time an instance is created:

    class CountInstances(object):

        instance_count = 0

        def __init__(self, name='-no name-'):
            self.name = name
            CountInstances.instance_count += 1

        def show(self):
            print 'name: "%s"' % (self.name, )

        def show_instance_count(cls):
            print 'instance count: %d' % (cls.instance_count, )
        show_instance_count = classmethod(show_instance_count)

    def test():
        instances = []
        instances.append(CountInstances('apple'))
        instances.append(CountInstances('banana'))
        instances.append(CountInstances('cherry'))
        instances.append(CountInstances())
        for instance in instances:
            instance.show()
        CountInstances.show_instance_count()

    if __name__ == '__main__':
        test()

Notes:

When we run this script, it prints out the following:

    name: "apple"
    name: "banana"
    name: "cherry"
    name: "-no name-"
    instance count: 4
The call to the classmethod built-in function effectively wraps the show_instance_count method in a class method, that is, in a method that takes a class object as its first argument rather than an instance object. To read more about classmethod, go to Built-in Functions -- http://docs.python.org/lib/built-in-funcs.html and search for "classmethod".

2. A static method takes neither an instance (self) nor a class as its first parameter. A static method is created with the staticmethod() built-in function (rather than with the classmethod() built-in):

    class CountInstances(object):

        instance_count = 0

        def __init__(self, name='-no name-'):
            self.name = name
            CountInstances.instance_count += 1

        def show(self):
            print 'name: "%s"' % (self.name, )

        def show_instance_count():
            print 'instance count: %d' % (
                CountInstances.instance_count, )
        show_instance_count = staticmethod(show_instance_count)

    def test():
        instances = []
        instances.append(CountInstances('apple'))
        instances.append(CountInstances('banana'))
        instances.append(CountInstances('cherry'))
        instances.append(CountInstances())
        for instance in instances:
            instance.show()
        CountInstances.show_instance_count()

    if __name__ == '__main__':
        test()
Solutions:

1. A decorator is an easier and cleaner way to define a class method (or a static method):

    class CountInstances(object):

        instance_count = 0

        def __init__(self, name='-no name-'):
            self.name = name
            CountInstances.instance_count += 1

        def show(self):
            print 'name: "%s"' % (self.name, )

        @classmethod
        def show_instance_count(cls):
            print 'instance count: %d' % (cls.instance_count, )
        # Note that the following line has been replaced by
        # the classmethod decorator, above.
        # show_instance_count = classmethod(show_instance_count)

    def test():
        instances = []
        instances.append(CountInstances('apple'))
        instances.append(CountInstances('banana'))
        instances.append(CountInstances('cherry'))
        instances.append(CountInstances())
        for instance in instances:
            instance.show()
        CountInstances.show_instance_count()

    if __name__ == '__main__':
        test()
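The decorator form works for static methods as well. The following sketch puts the two side by side. The method names get_instance_count and describe and the subclass SpecialCount are hypothetical additions for illustration, and the sketch assumes Python 3 rather than the Python 2 used in the listings of this document.

```python
class CountInstances(object):

    instance_count = 0

    def __init__(self, name='-no name-'):
        self.name = name
        CountInstances.instance_count += 1

    @staticmethod
    def get_instance_count():
        # No self or cls parameter: the class is named explicitly.
        return CountInstances.instance_count

    @classmethod
    def describe(cls):
        # cls is whichever class the method was called through.
        return '%s: %d' % (cls.__name__, cls.instance_count, )


class SpecialCount(CountInstances):
    pass


CountInstances('apple')
SpecialCount('banana')
print(CountInstances.get_instance_count())
print(SpecialCount.describe())
```

One difference this illustrates: a class method called through a subclass receives the subclass as cls, while a static method receives nothing and has to name the class it refers to.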
        print 'x:', x, 'y:', y
        func2((x, y))

    @trace
    def func2(content):
        print 'content:', content

    def test():
        func1('aa', 'bb')

    test()

Notes:

Your inner function can use *args and **kwargs to enable it to call functions with any number of arguments.
    @dec2
    @dec1
    def func(arg1, arg2, ...):
        pass

are equivalent to:

    def func(arg1, arg2, ...):
        pass
    func = dec2(dec1(func))

Exercises:

1. Implement a decorator (as above) that traces calls to a decorated function. Then "stack" that with another decorator that prints a horizontal line of dashes before and after calling the function.
2. Modify your solution to the above exercise so that the decorator that prints the horizontal line takes one argument: a character (or characters) that can be repeated to produce a horizontal line/separator.

Solutions:

1. Reuse your tracing function from the previous exercise, then write a simple decorator that prints a row of dashes:

    def trace(msg):
        def inner1(func):
            def inner2(*args, **kwargs):
                print '>> [%s]' % (msg, )
                retval = func(*args, **kwargs)
                print '<< [%s]' % (msg, )
                return retval
            return inner2
        return inner1

    def horizontal_line(func):
        def inner(*args, **kwargs):
            print '-' * 50
            retval = func(*args, **kwargs)
            print '-' * 50
            return retval
        return inner

    @trace('tracing func1')
    def func1(x, y):
        print 'x:', x, 'y:', y
        result = func2((x, y))
        return result

    @horizontal_line
    @trace('tracing func2')
    def func2(content):
        print 'content:', content
        return content * 3

    def test():
        result = func1('aa', 'bb')
        print 'result:', result

    test()

2. Once again, a decorator with arguments can be implemented with a function nested inside a function which is nested inside a function. This remains the same whether the decorator is used as a stacked decorator or not. Here is a solution:

    def trace(msg):
        def inner1(func):
            def inner2(*args, **kwargs):
                print '>> [%s]' % (msg, )
                retval = func(*args, **kwargs)
                print '<< [%s]' % (msg, )
                return retval
            return inner2
        return inner1

    def horizontal_line(line_chr):
        def inner1(func):
            def inner2(*args, **kwargs):
                print line_chr * 15
                retval = func(*args, **kwargs)
                print line_chr * 15
                return retval
            return inner2
        return inner1

    @trace('tracing func1')
    def func1(x, y):
        print 'x:', x, 'y:', y
        result = func2((x, y))
        return result

    @horizontal_line('<**>')
    @trace('tracing func2')
    def func2(content):
        print 'content:', content
        return content * 3

    def test():
        result = func1('aa', 'bb')
        print 'result:', result

    test()
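One side effect of the nested-function decorators above is that the decorated name now refers to the inner wrapper, so its __name__ and doc string are those of the wrapper. The solutions above do not address this. A possible refinement, sketched here as an assumption rather than as part of the exercise, is to apply functools.wraps from the standard library to the innermost function. The sketch assumes Python 3 rather than the Python 2 used in the listings of this document.

```python
import functools


def horizontal_line(line_chr):
    def inner1(func):
        @functools.wraps(func)    # copy func's __name__ and __doc__
        def inner2(*args, **kwargs):
            print(line_chr * 15)
            retval = func(*args, **kwargs)
            print(line_chr * 15)
            return retval
        return inner2
    return inner1


@horizontal_line('=')
def func2(content):
    """Return content repeated three times."""
    print('content:', content)
    return content * 3


result = func2(('aa', 'bb'))
print(func2.__name__)
print(func2.__doc__)
```

Without the functools.wraps line, func2.__name__ would be 'inner2' and the doc string would be lost.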
3.8.2 Iterables
3.8.2.1 A few preliminaries on Iterables
Definition: iterable (adjective) -- that which can be iterated over.

A good test of whether something is iterable is whether it can be used in a for: statement. For example, if we can write for item in X: , then X is iterable. Here is another simple test:

    def isiterable(x):
        try:
            y = iter(x)
        except TypeError, exp:
            return False
        return True

Some kinds of iterables:

Containers -- We can iterate over lists, tuples, dictionaries, sets, strings, and other containers.

Some built-in (non-container) types -- Examples:

  A text file open in read mode is iterable: it iterates over the lines in the file.

  The xrange type -- See XRange Type -- http://docs.python.org/lib/typesseq-xrange.html. It's useful when you want a large sequence of integers to iterate over.

Instances of classes that obey the iterator protocol. For a description of the iterator protocol, see Iterator Types -- http://docs.python.org/lib/typeiter.html. Hint: Type dir(obj) and look for "__iter__" and "next".

Generators -- An object returned by any function or method that contains yield.

Exercises:

1. Implement a class whose instances are iterable. The constructor takes a list of URLs as its argument. An instance of this class, when iterated over, generates the content of the Web page at each address in the list.

Solutions:

1. We implement a class that has __iter__() and next() methods:

    import urllib

    class WebPages(object):

        def __init__(self, urls):
            self.urls = urls
            self.current_index = 0

        def __iter__(self):
            self.current_index = 0
            return self

        def next(self):
            if self.current_index >= len(self.urls):
                raise StopIteration
            url = self.urls[self.current_index]
            self.current_index += 1
            f = urllib.urlopen(url)
            content = f.read()
            f.close()
            return content

    def test():
        urls = [
            'http://www.python.org',
            'http://en.wikipedia.org/',
            'http://en.wikipedia.org/wiki/Python_(programming_language)',
        ]
        pages = WebPages(urls)
        for page in pages:
            print 'length: %d' % (len(page), )
        pages = WebPages(urls)
        print '-' * 50
        page = pages.next()
        print 'length: %d' % (len(page), )
        page = pages.next()
        print 'length: %d' % (len(page), )
        page = pages.next()
        print 'length: %d' % (len(page), )
        # There are only three URLs, so this fourth call
        # raises StopIteration.
        page = pages.next()
        print 'length: %d' % (len(page), )

    test()
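An alternative to writing a next() method by hand is to make __iter__ itself a generator. The following is a minimal sketch of that variation, not the solution above. It assumes Python 3 (where the iterator method is named __next__ rather than next), and the fetch parameter and the fake_fetch function are hypothetical stand-ins for urllib so that the sketch runs without network access.

```python
class WebPages(object):

    def __init__(self, urls, fetch):
        self.urls = urls
        # fetch is a hypothetical callable that maps a URL to its
        # content; passing it in avoids any network access here.
        self.fetch = fetch

    def __iter__(self):
        # Because this method contains yield, each call returns a new
        # generator, so an instance can be iterated over repeatedly.
        for url in self.urls:
            yield self.fetch(url)


def fake_fetch(url):
    # Stand-in for urllib: returns a string derived from the URL.
    return '<html>%s</html>' % (url, )


pages = WebPages(['http://example.com/a', 'http://example.com/b'], fake_fetch)
lengths = [len(page) for page in pages]
again = [len(page) for page in pages]
print(lengths)
print(lengths == again)
```

With this design the iteration state lives in the generator, not in the instance, so there is no current_index attribute to reset.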
                <client>
                    <fullname>Arnold Applebee</fullname>
                    <refid>10001</refid>
                </client>
            </promoter>
            <promoter>
                <firstname>Edward</firstname>
                <lastname>Eddleberry</lastname>
                <client>
                    <fullname>Arnold Applebee</fullname>
                    <refid>10001</refid>
                </client>
            </promoter>
        </person>
    </people>

3. ElementTree -- Parse an XML document with ElementTree, then walk the DOM tree and show some information (tag, attributes, character data) for each element.
4. lxml -- Parse an XML document with lxml, then walk the DOM tree and show some information (tag, attributes, character data) for each element.
5. Modify document with ElementTree -- Use ElementTree to read a document, then modify the tree. Show the contents of the tree, and then write out the modified document.
6. XPath -- lxml supports XPath. Use the XPath support in lxml to address each of the following in the above XML instance document:

   The text in all the name elements

   The values of all the id attributes

Solutions:

1. We can use the SAX support in the Python standard library:

    #!/usr/bin/env python

    """
    Parse an XML document with SAX.
    Display info about each element.

    Usage:
        python test_sax.py infilename
    Examples:
        python test_sax.py people.xml
    """

    import sys
    from xml.sax import make_parser, handler

    class TestHandler(handler.ContentHandler):

        def __init__(self):
            self.level = 0

        def show_with_level(self, value):
            print '%s%s' % (' ' * self.level, value, )

        def startDocument(self):
            self.show_with_level('Document start')
            self.level += 1

        def endDocument(self):
            self.level -= 1
            self.show_with_level('Document end')

        def startElement(self, name, attrs):
            self.show_with_level('start element -- name: "%s"' % (name, ))
            self.level += 1

        def endElement(self, name):
            self.level -= 1
            self.show_with_level('end element -- name: "%s"' % (name, ))

        def characters(self, content):
            content = content.strip()
            if content:
                self.show_with_level('characters: "%s"' % (content, ))

    def test(infilename):
        parser = make_parser()
        handler = TestHandler()
        parser.setContentHandler(handler)
        parser.parse(infilename)

    def usage():
        print __doc__
        sys.exit(1)

    def main():
        args = sys.argv[1:]
        if len(args) != 1:
            usage()
        infilename = args[0]
        test(infilename)

    if __name__ == '__main__':
        main()

2. The minidom module contains a parse() function that enables us to read an XML document and create a DOM tree:

    #!/usr/bin/env python

    """Process an XML document with minidom.

    Show the document tree.

    Usage:
        python minidom_walk.py [options] infilename
    """

    import sys
    from xml.dom import minidom

    def show_tree(doc):
        root = doc.documentElement
        show_node(root, 0)

    def show_node(node, level):
        count = 0
        if node.nodeType == minidom.Node.ELEMENT_NODE:
            show_level(level)
            print 'tag: %s' % (node.nodeName, )
            for key in node.attributes.keys():
                attr = node.attributes.get(key)
                show_level(level + 1)
                print '- attribute name: %s value: "%s"' % (
                    attr.name, attr.value, )
            if (len(node.childNodes) == 1 and
                    node.childNodes[0].nodeType == minidom.Node.TEXT_NODE):
                show_level(level + 1)
                print '- data: "%s"' % (node.childNodes[0].data, )
            for child in node.childNodes:
                count += 1
                show_node(child, level + 1)
        return count

    def show_level(level):
        for x in range(level):
            print ' ',

    def test():
        args = sys.argv[1:]
        if len(args) != 1:
            print __doc__
            sys.exit(1)
        docname = args[0]
        doc = minidom.parse(docname)
        show_tree(doc)

    if __name__ == '__main__':
        #import pdb; pdb.set_trace()
        test()

3. ElementTree enables us to parse an XML document and create a DOM tree:

    #!/usr/bin/env python

    """Process an XML document with elementtree.
    Show the document tree.

    Usage:
        python elementtree_walk.py [options] infilename
    """

    import sys
    from xml.etree import ElementTree as etree

    def show_tree(doc):
        root = doc.getroot()
        show_node(root, 0)

    def show_node(node, level):
        show_level(level)
        print 'tag: %s' % (node.tag, )
        for key, value in node.attrib.iteritems():
            show_level(level + 1)
            print '- attribute -- name: %s value: "%s"' % (key, value, )
        if node.text:
            text = node.text.strip()
            show_level(level + 1)
            # Show the stripped text, as we do for the tail below.
            print '- text: "%s"' % (text, )
        if node.tail:
            tail = node.tail.strip()
            show_level(level + 1)
            print '- tail: "%s"' % (tail, )
        for child in node.getchildren():
            show_node(child, level + 1)

    def show_level(level):
        for x in range(level):
            print ' ',

    def test():
        args = sys.argv[1:]
        if len(args) != 1:
            print __doc__
            sys.exit(1)
        docname = args[0]
        doc = etree.parse(docname)
        show_tree(doc)

    if __name__ == '__main__':
        #import pdb; pdb.set_trace()
        test()

4. lxml enables us to parse an XML document and create a DOM tree. In fact, since lxml attempts to mimic the ElementTree API, our code is very similar to that in the solution to the ElementTree exercise:

    #!/usr/bin/env python

    """Process an XML document with lxml.

    Show the document tree.

    Usage:
        python lxml_walk.py [options] infilename
    """

    #
    # Imports:
    import sys
    from lxml import etree

    def show_tree(doc):
        root = doc.getroot()
        show_node(root, 0)

    def show_node(node, level):
        show_level(level)
        print 'tag: %s' % (node.tag, )
        for key, value in node.attrib.iteritems():
            show_level(level + 1)
            print '- attribute -- name: %s value: "%s"' % (key, value, )
        if node.text:
            text = node.text.strip()
            show_level(level + 1)
            # Show the stripped text, as we do for the tail below.
            print '- text: "%s"' % (text, )
        if node.tail:
            tail = node.tail.strip()
            show_level(level + 1)
            print '- tail: "%s"' % (tail, )
        for child in node.getchildren():
            show_node(child, level + 1)

    def show_level(level):
        for x in range(level):
            print ' ',

    def test():
        args = sys.argv[1:]
        if len(args) != 1:
            print __doc__
            sys.exit(1)
        docname = args[0]
        doc = etree.parse(docname)
        show_tree(doc)

    if __name__ == '__main__':
        #import pdb; pdb.set_trace()
        test()

5. We can modify the DOM tree and write it out to a new file:

    #!/usr/bin/env python

    """Process an XML document with elementtree.

    Show the document tree.
    Modify the document tree and then show it again.
    Write the modified XML tree to a new file.

    Usage:
        python elementtree_walk.py [options] infilename outfilename
    Options:
        -h, --help      Display this help message.
    Example:
        python elementtree_walk.py myxmldoc.xml myotherxmldoc.xml
    """

    import sys
    import os
    import getopt
    import time
    # Use ElementTree.
    from xml.etree import ElementTree as etree
    # Or uncomment to use Lxml.
    #from lxml import etree

    def show_tree(doc):
        root = doc.getroot()
        show_node(root, 0)

    def show_node(node, level):
        show_level(level)
        print 'tag: %s' % (node.tag, )
        for key, value in node.attrib.iteritems():
            show_level(level + 1)
            print '- attribute -- name: %s value: "%s"' % (key, value, )
        if node.text:
            text = node.text.strip()
            show_level(level + 1)
            # Show the stripped text, as we do for the tail below.
            print '- text: "%s"' % (text, )
        if node.tail:
            tail = node.tail.strip()
            show_level(level + 1)
            print '- tail: "%s"' % (tail, )
        for child in node.getchildren():
            show_node(child, level + 1)

    def show_level(level):
        for x in range(level):
            print ' ',

    def modify_tree(doc, tag, attrname, attrvalue):
        root = doc.getroot()
        modify_node(root, tag, attrname, attrvalue)

    def modify_node(node, tag, attrname, attrvalue):
        if node.tag == tag:
            node.attrib[attrname] = attrvalue
        for child in node.getchildren():
            modify_node(child, tag, attrname, attrvalue)

    def test(indocname, outdocname):
        doc = etree.parse(indocname)
        show_tree(doc)
        print '-' * 50
        date = time.ctime()
        modify_tree(doc, 'person', 'date', date)
        show_tree(doc)
        write_output = False
        if os.path.exists(outdocname):
            response = raw_input(
                'Output file (%s) exists. Over-write? (y/n): ' % outdocname)
            if response == 'y':
                write_output = True
        else:
            write_output = True
        if write_output:
            doc.write(outdocname)
            print 'Wrote modified XML tree to %s' % outdocname
        else:
            print 'Did not write output file.'

    def usage():
        print __doc__
        sys.exit(1)

    def main():
        args = sys.argv[1:]
        try:
            opts, args = getopt.getopt(args, 'h', ['help', ])
        except getopt.GetoptError:
            # Catch only option-parsing errors, not every exception.
            usage()
        for opt, val in opts:
            if opt in ('-h', '--help'):
                usage()
        if len(args) != 2:
            usage()
        indocname = args[0]
        outdocname = args[1]
        test(indocname, outdocname)

    if __name__ == '__main__':
        #import pdb; pdb.set_trace()
        main()

Notes:

The above solution contains an import statement for ElementTree and another for lxml. The one for lxml is commented out, but you could change that if you wish to use lxml instead of ElementTree. This solution will work the same way with either ElementTree or lxml.

6. When we parse an XML document with lxml, each element (node) has an xpath() method.

    # test_xpath.py
    from lxml import etree

    def test():
        doc = etree.parse('people.xml')
        root = doc.getroot()
        print root.xpath("//name/text()")
        print root.xpath("//@id")

    test()

And, when we run the above code, here is what we see:

    $ python test_xpath.py
    ['Alberta', 'Bernardo', 'Charlie']
    ['1', '2', '3']
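If lxml is not installed, the ElementTree module in the standard library supports a limited subset of XPath through findall(). The sketch below shows one way to collect the same two lists with it. The sample document is a small stand-in whose structure is an assumption (it is not copied from people.xml), and the sketch assumes Python 3 rather than the Python 2 used in the listings of this document.

```python
from xml.etree import ElementTree as etree

# A small stand-in document.  Its structure is assumed, not copied
# from people.xml.
SAMPLE = """<people>
  <person id="1"><name>Alberta</name></person>
  <person id="2"><name>Bernardo</name></person>
  <person id="3"><name>Charlie</name></person>
</people>"""

root = etree.fromstring(SAMPLE)

# './/name' matches name elements at any depth below root.
names = [node.text for node in root.findall('.//name')]

# ElementTree paths select elements, not attributes, so we select the
# elements that have an id attribute and then read it from each one.
ids = [node.get('id') for node in root.findall('.//*[@id]')]

print(names)
print(ids)
```

Unlike the xpath() method in lxml, findall() returns elements only, so expressions such as //name/text() and //@id have to be split into an element path plus an attribute or text lookup in Python.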
Our examples use the gadfly database, which is written in Python. If you want to use gadfly, you can find it here: http://gadfly.sourceforge.net/. gadfly is a reasonable choice if you want an easy to use database on your local machine.

Another reasonable choice for a local database is sqlite3, which is in the Python standard library. Here is a descriptive quote from the SQLite Web site: "SQLite is a software library that implements a self-contained, serverless, zero-configuration, transactional SQL database engine. SQLite is the most widely deployed SQL database engine in the world. The source code for SQLite is in the public domain."

You can learn about it here:

sqlite3 - DB-API 2.0 interface for SQLite databases -- http://docs.python.org/library/sqlite3.html
SQLite home page -- http://www.sqlite.org/
The pysqlite web page -- http://oss.itsystementwicklung.de/trac/pysqlite/

If you want or need to use another, enterprise class database, for example PostgreSQL, MySQL, Oracle, etc., you will need an interface module for your specific database. You can find information about database interface modules here: Database interfaces -- http://wiki.python.org/moin/DatabaseInterfaces

Exercises:

1. Write a script that retrieves all the rows in a table and prints each row.
2. Write a script that retrieves all the rows in a table, then uses the cursor as an iterator to print each row.
3. Write a script that uses the cursor's description attribute to print out the name and value of each field in each row.
4. Write a script that performs several of the above tasks, but uses sqlite3 instead of gadfly.
Solutions:

1. We can execute a SQL query and then retrieve all the rows with fetchall():

    import gadfly

    def test():
        connection = gadfly.connect("dbtest1", "plantsdbdir")
        cur = connection.cursor()
        cur.execute('select * from plantsdb order by p_name')
        rows = cur.fetchall()
        for row in rows:
            print '2. row:', row
        connection.close()

    test()

2. The cursor itself is an iterator. It iterates over the rows returned by a query. So, we execute a SQL query and then we use the cursor in a for: statement:

    import gadfly

    def test():
        connection = gadfly.connect("dbtest1", "plantsdbdir")
        cur = connection.cursor()
        cur.execute('select * from plantsdb order by p_name')
        for row in cur:
            print row
        connection.close()
    test()

3. The description attribute in the cursor is a container that has an item describing each field:

    import gadfly

    def test():
        # Open the database and create a cursor, as in the
        # previous solutions.
        connection = gadfly.connect("dbtest1", "plantsdbdir")
        cur = connection.cursor()
        cur.execute('select * from plantsdb order by p_name')
        for field in cur.description:
            print 'field:', field
        rows = cur.fetchall()
        for row in rows:
            for idx, field in enumerate(row):
                content = '%s: "%s"' % (cur.description[idx][0], field, )
                print content,
            print
        connection.close()

    test()

Notes:

The comma at the end of the print statement tells Python not to print a new-line.

The cur.description is a sequence containing an item for each field. After the query, we can extract a description of each field.

4. The solutions using sqlite3 are very similar to those using gadfly. For information on sqlite3, see: sqlite3 DB-API 2.0 interface for SQLite databases -- http://docs.python.org/library/sqlite3.html#module-sqlite3.

    #!/usr/bin/env python

    """
    Perform operations on sqlite3 (plants) database.
    Usage:
        python py_db_api.py command [arg1, ... ]
    Commands:
        create -- create new database.
        show -- show contents of database.
        add -- add row to database. Requires 3 args (name, descrip, rating).
        delete -- remove row from database. Requires 1 arg (name).
    Examples:
        python test1.py create
        python test1.py show
        python test1.py add crenshaw "The most succulent melon" 10
        python test1.py delete lemon
    """

    import sys
    import sqlite3

    Values = [
        ('lemon', 'bright and yellow', '7'),
        ('peach', 'succulent', '9'),
        ('banana', 'smooth and creamy', '8'),
        ('nectarine', 'tangy and tasty', '9'),
        ('orange', 'sweet and tangy', '8'),
    ]

    Field_defs = [
        'p_name varchar',
        'p_descrip varchar',
        #'p_rating integer',
        'p_rating varchar',
    ]

    def createdb():
        connection = sqlite3.connect('sqlite3plantsdb')
        cursor = connection.cursor()
        q1 = "create table plantsdb (%s)" % (', '.join(Field_defs))
        print 'create q1: %s' % q1
        cursor.execute(q1)
        q1 = "create index index1 on plantsdb(p_name)"
        cursor.execute(q1)
        q1 = "insert into plantsdb (p_name, p_descrip, p_rating) values ('%s', '%s', %s)"
        for spec in Values:
            q2 = q1 % spec
            print 'q2: "%s"' % q2
            cursor.execute(q2)
        connection.commit()
        showdb1(cursor)
        connection.close()

    def showdb():
        connection, cursor = opendb()
        showdb1(cursor)
        connection.close()

    def showdb1(cursor):
        cursor.execute("select * from plantsdb order by p_name")
        hr()
        description = cursor.description
        print description
        print 'description:'
        for rowdescription in description:
            print ' %s' % (rowdescription, )
        hr()
        rows = cursor.fetchall()
        print rows
        print 'rows:'
        for row in rows:
            print ' %s' % (row, )
        hr()
        print 'content:'
        for row in rows:
            descrip = row[1]
            name = row[0]
            rating = '%s' % row[2]
            print ' %s%s%s' % (
                name.ljust(12), descrip.ljust(30), rating.rjust(4), )

    def addtodb(name, descrip, rating):
        try:
            rating = int(rating)
        except ValueError, exp:
            print 'Error: rating must be integer.'
            return
        connection, cursor = opendb()
        cursor.execute("select * from plantsdb where p_name = '%s'" % name)
        rows = cursor.fetchall()
        if len(rows) > 0:
            ql = "update plantsdb set p_descrip='%s', p_rating='%s' where p_name='%s'" % (
                descrip, rating, name, )
            print 'ql:', ql
            cursor.execute(ql)
            connection.commit()
            print 'Updated'
        else:
            cursor.execute("insert into plantsdb values ('%s', '%s', '%s')" % (
                name, descrip, rating))
            connection.commit()
            print 'Added'
        showdb1(cursor)
        connection.close()

    def deletefromdb(name):
        connection, cursor = opendb()
        cursor.execute("select * from plantsdb where p_name = '%s'" % name)
        rows = cursor.fetchall()
        if len(rows) > 0:
            cursor.execute("delete from plantsdb where p_name='%s'" % name)
            connection.commit()
            print 'Plant (%s) deleted.' % name
        else:
            print 'Plant (%s) does not exist.' % name
        showdb1(cursor)
        connection.close()

    def opendb():
        connection = sqlite3.connect("sqlite3plantsdb")
        cursor = connection.cursor()
        return connection, cursor

    def hr():
        print '-' * 60

    def usage():
        print __doc__
        sys.exit(1)

    def main():
        args = sys.argv[1:]
        if len(args) < 1:
            usage()
        cmd = args[0]
        if cmd == 'create':
            if len(args) != 1:
                usage()
            createdb()
        elif cmd == 'show':
            if len(args) != 1:
                usage()
            showdb()
        elif cmd == 'add':
            if len(args) < 4:
                usage()
            name = args[1]
            descrip = args[2]
            rating = args[3]
            addtodb(name, descrip, rating)
        elif cmd == 'delete':
            if len(args) < 2:
                usage()
            name = args[1]
            deletefromdb(name)
        else:
            usage()

    if __name__ == '__main__':
        main()
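The script above builds each SQL statement with Python string interpolation, which breaks as soon as a value contains a quote character. As a hedged alternative, the following sketch uses the "?" parameter substitution that the sqlite3 module supports. It reuses the plantsdb table layout from the script, but the in-memory database, the function name demo, and the sample plant name are assumptions made only for this illustration. It is written so that it should also run under Python 3.

```python
import sqlite3

def demo():
    # Assumption: an in-memory database keeps the sketch self-contained.
    connection = sqlite3.connect(':memory:')
    cursor = connection.cursor()
    cursor.execute(
        "create table plantsdb "
        "(p_name varchar, p_descrip varchar, p_rating varchar)")
    rows = [
        ('lemon', 'bright and yellow', '7'),
        # A name containing a quote would break the '%s' style queries.
        ("farmer's peach", 'succulent', '9'),
    ]
    # The '?' placeholders are filled in by the sqlite3 module itself.
    cursor.executemany(
        "insert into plantsdb (p_name, p_descrip, p_rating) "
        "values (?, ?, ?)", rows)
    connection.commit()
    cursor.execute(
        "select p_name, p_rating from plantsdb where p_name = ?",
        ("farmer's peach", ))
    result = cursor.fetchall()
    connection.close()
    return result

found = demo()
print(found)
```

The same substitution style could be applied to the update and delete statements in the script, passing the values as a tuple in the second argument to execute().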
        infile.close()

    def main():
        infilename = 'csv_report.csv'
        test(infilename)

    if __name__ == '__main__':
        main()

And, when run, here is what it displays:

    ====
    Name
    ====
    Lemon
    Eggplant
    Tangerine
    ===========
    Description
    ===========
    Bright yellow and tart
    Purple and shiny
    Succulent
    ======
    Rating
    ======
    5
    6
    8
       test()

2. The YAML dump() function enables us to dump data to a file:

       import yaml
       import pprint

       def test():
           infile = open('test1.yaml', 'r')
           data = yaml.load(infile)
           infile.close()
           data['national'].append('San Francisco Giants')
           outfile = open('test1_new.yaml', 'w')
           yaml.dump(data, outfile)
           outfile.close()

       test()

   Notes:

   - If we want to produce the standard YAML "block" style rather than the "flow" format, then we could use:

         yaml.dump(data, outfile, default_flow_style=False)
3.9.5 Json
Here is a quote from the Wikipedia entry for Json:

    "JSON (pronounced 'Jason'), short for JavaScript Object Notation, is a lightweight computer data interchange format. It is a text-based, human-readable format for representing simple data structures and associative arrays (called objects)."

The Json text representation looks very similar to the Python literal representation of Python builtin data types (for example, lists, dictionaries, numbers, and strings).

Learn more about Json and Python support for Json here:

- Introducing JSON -- http://json.org/
- Json at Wikipedia -- http://en.wikipedia.org/wiki/Json
- python-json -- http://pypi.python.org/pypi/python-json
- simplejson -- http://pypi.python.org/pypi/simplejson

Exercises:

1. Write a Python script, using your favorite Python Json implementation (for example python-json or simplejson), that dumps the following data structure to a file:

       Data = {
           'rock and roll': ['Elis', 'The Beatles', 'The Rolling Stones',],
           'country': ['Willie Nelson', 'Hank Williams', ]
           }

2. Write a Python script that reads Json data from a file and loads it into Python data structures.

Solutions:

1. This solution uses simplejson to store a Python data structure encoded as Json in a file:

       import simplejson as json

       Data = {
           'rock and roll': ['Elis', 'The Beatles', 'The Rolling Stones',],
           'country': ['Willie Nelson', 'Hank Williams', ]
           }

       def test():
           fout = open('tmpdata.json', 'w')
           content = json.dumps(Data)
           fout.write(content)
           fout.write('\n')
           fout.close()

       test()

2. We can read the file into a string, then decode it from Json:

       import simplejson as json

       def test():
           fin = open('tmpdata.json', 'r')
           content = fin.read()
           fin.close()
           data = json.loads(content)
           print data

       test()

Note that you may want some control over indentation, character encoding, etc. For simplejson, you can learn about that here: simplejson JSON encoder and decoder -- http://simplejson.googlecode.com/svn/tags/simplejson-2.0.1/docs/index.html.
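As one illustration of that kind of control, here is a small sketch that writes indented, key-sorted Json and reads it back. It makes two assumptions that are not part of the solutions above: it uses the json module from the Python standard library (available since Python 2.6) in place of simplejson, and it writes to a temporary directory rather than to the current directory. The function name roundtrip is hypothetical.

```python
import json
import os
import tempfile

Data = {
    'rock and roll': ['Elis', 'The Beatles', 'The Rolling Stones'],
    'country': ['Willie Nelson', 'Hank Williams'],
}

def roundtrip(data):
    # Assumption: a scratch directory, so that no existing file is touched.
    path = os.path.join(tempfile.mkdtemp(), 'tmpdata.json')
    fout = open(path, 'w')
    # indent and sort_keys make the file easier to read and to compare.
    json.dump(data, fout, indent=4, sort_keys=True)
    fout.write('\n')
    fout.close()
    fin = open(path, 'r')
    loaded = json.load(fin)
    fin.close()
    return loaded

result = roundtrip(Data)
print(json.dumps(result, indent=4, sort_keys=True))
```

Here json.dump() and json.load() work directly on file objects, so the intermediate string used in the solutions is not needed.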
4.1 Introduction
Additional information:

- If you plan to work through this tutorial, you may find it helpful to look at the sample code that accompanies this tutorial. You can find it in the distribution under:

      tutorial/
      tutorial/Code/

- You can find additional information about generateDS.py here: http://www.rexx.com/~dkuhlman/generateDS.html. That documentation is also included in the distribution.
generateDS.py generates Python data structures (for example, class definitions) from an XML schema document. These data structures represent the elements in an XML document described by the XML schema. generateDS.py also generates parsers that load an XML document into those data structures. In addition, a separate file containing subclasses (stubs) is optionally generated. The user can add methods to the subclasses in order to process the contents of an XML document.

The generated Python code contains:

- A class definition for each element defined in the XML schema document.
- A main and driver function that can be used to test the generated code.
- A parser that will read an XML document which satisfies the XML schema from which the parser was generated. The parser creates and populates a tree structure of instances of the generated Python classes.
- Methods in each class to export the instance back out to XML (method export) and to export the instance to a literal representing the Python data structure (method exportLiteral).

Each generated class contains the following:

- A constructor method (__init__), with member variable initializers.
- Methods with names get_xyz and set_xyz for each member variable "xyz" or, if the member variable is defined with maxOccurs="unbounded", methods with names get_xyz, set_xyz, add_xyz, and insert_xyz. (Note: If you use the --use-old-getter-setter option, then you will get methods with names like getXyz and setXyz.)
- A build method that can be used to populate an instance of the class from a node in an ElementTree or Lxml tree.
- An export method that will write the instance (and any nested sub-instances) to a file object as XML text.
- An exportLiteral method that will write the instance (and any nested sub-instances) to a file object as Python literals (text).

The generated subclass file contains one (sub-)class definition for each data representation class. If the subclass file is used, then the parser creates instances of the subclasses (instead of creating instances of the superclasses). This enables the user to extend the subclasses with "tree walk" methods, for example, that process the contents of the XML file.
The user can also generate and extend multiple subclass files which use a single, common superclass file, thus implementing a number of different processes on the same XML document type.
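To make the shape of a generated class more concrete, here is a hand-written sketch of the accessor and export methods listed above. It is a simplified stand-in and not actual generateDS.py output: the element name person, its members name and interest, and the behavior of insert_interest are all assumptions chosen for illustration, and the build and exportLiteral methods are left out.

```python
import sys

class personType(object):
    """Hand-written stand-in, NOT actual generateDS.py output."""

    def __init__(self, name=None, interest=None):
        self.name = name
        # A member declared with maxOccurs="unbounded" is kept in a list.
        if interest is None:
            self.interest = []
        else:
            self.interest = interest

    # Getter and setter for the simple member "name".
    def get_name(self):
        return self.name
    def set_name(self, name):
        self.name = name

    # A list member gets add_ and insert_ methods as well.
    def get_interest(self):
        return self.interest
    def set_interest(self, interest):
        self.interest = interest
    def add_interest(self, value):
        self.interest.append(value)
    def insert_interest(self, index, value):
        # Assumption: the value is inserted before position index.
        self.interest.insert(index, value)

    def export(self, outfile, level):
        # Write this instance to a file object as XML text.
        indent = '    ' * level
        outfile.write('%s<person>\n' % indent)
        outfile.write('%s    <name>%s</name>\n' % (indent, self.name))
        for value in self.interest:
            outfile.write(
                '%s    <interest>%s</interest>\n' % (indent, value))
        outfile.write('%s</person>\n' % indent)

person = personType(name='albert')
person.add_interest('gardening')
person.add_interest('sailing')
person.insert_interest(0, 'chess')
person.export(sys.stdout, 0)
```

The real generated classes are larger (they also handle attributes, namespaces, and building from a parsed tree), so inspect the generated module itself for the authoritative method signatures.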
This document introduces the user to generateDS.py and walks the user through several examples that show how to generate Python code and how to use that generated code.
4.3 Using the generated code to parse and export an XML document
Now that you have generated code for your data model, you can test it by running it as an application. Suppose that you have an XML instance document people1.xml that satisfies your schema. Then you can parse that instance document and export it (print it out) with something like the following:

    $ python people_api.py people1.xml

And, if you have used the --super command line option, as I have above, to connect your subclass file with the superclass (API) file, then you could use the following to do the same thing:

    $ python people_appl1.py people1.xml
default is not what you want.

member-specs=list|dict
    Suppose you want to write some code that can be generically applied to elements of different kinds (element types implemented by several different generated classes). If so, it might be helpful to have a list or dictionary specifying information about each member data item in each class. This option does that by generating a list or a dictionary (with the member data item name as key) in each generated class. Take a look at the generated code to learn about it. In particular, look at the generated list or dictionary in a class for any element type and also at the definition of the class _MemberSpec generated near the top of the API module.

version
    Ask generateDS.py to tell you what version it is. This is helpful when you want to ask about a problem, for example at the generateds-users email list (https://lists.sourceforge.net/lists/listinfo/generateds-users), and want to specify which version you are using.
However, you are likely to want to go beyond that. In many situations you will want to construct a custom application that processes your XML documents using the generated code.
- Create instances of those classes.
- Link those instances, for example put "children" inside of a parent, or add one or more instances to a parent that can contain a list of objects (think "maxOccurs" greater than 1 in your schema).

Get to know the generated export API by inspecting the generated code in the superclass file. That's the file generated with the "-o" command line flag.

What to look for:

- Look at the arguments to the constructor (__init__) to learn how to initialize an instance.
- Look at the "getters" and "setters" (methods named getxxx and setxxx) to learn how to modify member variables.
- Look for a method named addxxx for members that are lists. These correspond to members defined with maxOccurs="n", where n > 1.
- Look at the build methods: build, buildChildren, and buildAttributes. These will give you information about how to construct each of the members of a given element/class.

Now, you can import your generated API module, and use it to construct and manipulate objects. Here is an example using code generated with the "people" schema:

    import sys
    import people_api as api

    def test(names):
        people = api.peopleType()
        for count, name in enumerate(names):
            id = '%d' % (count + 1, )
            person = api.personType(name=name, id=id)
            people.add_person(person)
        people.export(sys.stdout, 0)

    test(['albert', 'betsy', 'charlie'])
Run this and you might see something like the following:

    $ python tmp.py
    <people >
        <person id="1">
            <name>albert</name>
        </person>
        <person id="2">
            <name>betsy</name>
        </person>
        <person id="3">
            <name>charlie</name>
        </person>
    </people>
    class peopleTypeSub(supermod.peopleType):
        def __init__(self, comments=None, person=None, specialperson=None, programmer=None, python_programmer=None, java
            super(peopleTypeSub, self).__init__(comments, person, specialperson, programmer, python_programmer, java_pro
        def upcase_names(self):
            for person in self.get_person():
                person.upcase_names()
    supermod.peopleType.subclass = peopleTypeSub
    # end class peopleTypeSub

    class personTypeSub(supermod.personType):
        def __init__(self, vegetable=None, fruit=None, ratio=None, id=None, value=None, name=None, interest=None, catego
            super(personTypeSub, self).__init__(vegetable, fruit, ratio, id, value, name, interest, category, agent, pro
        def upcase_names(self):
            self.set_name(self.get_name().upper())
    supermod.personType.subclass = personTypeSub
    # end class personTypeSub

Notes:

- These classes were generated with the "-s" command line option. They are subclasses of classes in the module people_api, which was generated with the "-o" command line option.
- The only modification to the skeleton subclasses is the addition of the two methods named upcase_names().
- In the subclass peopleTypeSub, the method upcase_names() merely walks over its immediate children.
- In the subclass personTypeSub, the method upcase_names() just converts the value of its "name" member to upper case.

Here is the application itself (upcase_names.py):

    import sys
    import upcase_names_appl as appl

    def create_people(names):
        people = appl.peopleTypeSub()
        for count, name in enumerate(names):
            id = '%d' % (count + 1, )
            person = appl.personTypeSub(name=name, id=id)
            people.add_person(person)
        return people

    def main():
        names = ['albert', 'betsy', 'charlie']
        people = create_people(names)
        print 'Before:'
        people.export(sys.stdout, 1)
        people.upcase_names()
        print '-' * 50
        print 'After:'
        people.export(sys.stdout, 1)

    main()
Notes:

- The create_people() function creates a peopleTypeSub instance with several personTypeSub instances inside it.

And, when you run this mini-application, here is what you might see:

    $ python upcase_names.py
    Before:
        <people >
            <person id="1">
                <name>albert</name>
            </person>
            <person id="2">
                <name>betsy</name>
            </person>
            <person id="3">
                <name>charlie</name>
            </person>
        </people>
    --------------------------------------------------
    After:
        <people >
            <person id="1">
                <name>ALBERT</name>
            </person>
            <person id="2">
                <name>BETSY</name>
            </person>
            <person id="3">
                <name>CHARLIE</name>
            </person>
        </people>
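The subclass files in this example end each class with an assignment such as supermod.peopleType.subclass = peopleTypeSub. The following sketch suggests how such a hook can let a factory produce subclass instances. The two superclasses here are simplified, hand-written stand-ins (an assumption, not the generated people_api module), reduced to the few members needed to show the mechanism.

```python
class peopleType(object):
    # Hook that a subclass module can fill in (stand-in superclass).
    subclass = None

    def __init__(self, person=None):
        if person is None:
            self.person = []
        else:
            self.person = person

    @staticmethod
    def factory(*args, **kwargs):
        # Create the registered subclass if there is one.
        if peopleType.subclass:
            return peopleType.subclass(*args, **kwargs)
        return peopleType(*args, **kwargs)

    def get_person(self):
        return self.person
    def add_person(self, value):
        self.person.append(value)

class personType(object):
    subclass = None

    def __init__(self, name=None):
        self.name = name

    @staticmethod
    def factory(*args, **kwargs):
        if personType.subclass:
            return personType.subclass(*args, **kwargs)
        return personType(*args, **kwargs)

    def get_name(self):
        return self.name
    def set_name(self, name):
        self.name = name

class peopleTypeSub(peopleType):
    def upcase_names(self):
        for person in self.get_person():
            person.upcase_names()
peopleType.subclass = peopleTypeSub

class personTypeSub(personType):
    def upcase_names(self):
        self.set_name(self.get_name().upper())
personType.subclass = personTypeSub

# Code that only knows the superclasses still gets subclass instances.
people = peopleType.factory()
for name in ['albert', 'betsy']:
    people.add_person(personType.factory(name=name))
people.upcase_names()
names = [person.get_name() for person in people.get_person()]
print(names)
```

Because code that calls factory() never names the subclass directly, several different subclass modules could be paired with one superclass module, one per processing task.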
        if (XMLParser_import_library == XMLParser_import_lxml and
            'parser' not in kwargs):
            # Use the lxml ElementTree compatible parser so that, e.g.,
            # we ignore comments.
            kwargs['parser'] = etree_.ETCompatXMLParser()
        doc = etree_.parse(*args, **kwargs)
        return doc

    #
    # Globals
    #

    ExternalEncoding = 'ascii'

    #
    # Utility functions needed in each generated class.
    #

    def upper_elements(obj):
        for item in obj.member_data_items_:
            if item.get_data_type() == 'xs:string':
                name = remap(item.get_name())
                val1 = getattr(obj, name)
                if isinstance(val1, list):
                    for idx, val2 in enumerate(val1):
                        val1[idx] = val2.upper()
                else:
                    setattr(obj, name, val1.upper())

    def remap(name):
        newname = name.replace('-', '_')
        return newname

    #
    # Data representation classes
    #

    class contactlistTypeSub(supermod.contactlistType):
        def __init__(self, locator=None, description=None, contact=None):
            super(contactlistTypeSub, self).__init__(locator, description, contact, )
        def upper(self):
            upper_elements(self)
            for child in self.get_contact():
                child.upper()
    supermod.contactlistType.subclass = contactlistTypeSub
    # end class contactlistTypeSub

    class contactTypeSub(supermod.contactType):
        def __init__(self, priority=None, color_code=None, id=None, first_name=None, last_name=None, interest=None, cate
            super(contactTypeSub, self).__init__(priority, color_code, id, first_name, last_name, interest, category, )
        def upper(self):
            upper_elements(self)
    supermod.contactType.subclass = contactTypeSub
    # end class contactTypeSub

    def get_root_tag(node):
        tag = supermod.Tag_pattern_.match(node.tag).groups()[-1]
        rootClass = None
        if hasattr(supermod, tag):
            rootClass = getattr(supermod, tag)
        return tag, rootClass

    def parse(inFilename):
        doc = parsexml_(inFilename)
        rootNode = doc.getroot()
        rootTag, rootClass = get_root_tag(rootNode)
        if rootClass is None:
            rootTag = 'contact-list'
            rootClass = supermod.contactlistType
        rootObj = rootClass.factory()
        rootObj.build(rootNode)
        # Enable Python to collect the space used by the DOM.
        doc = None
        sys.stdout.write('<?xml version="1.0" ?>\n')
        rootObj.export(sys.stdout, 0, name_=rootTag,
            namespacedef_='')
        doc = None
        return rootObj

    def parseString(inString):
        from StringIO import StringIO
        doc = parsexml_(StringIO(inString))
        rootNode = doc.getroot()
        rootTag, rootClass = get_root_tag(rootNode)
        if rootClass is None:
            rootTag = 'contact-list'
            rootClass = supermod.contactlistType
        rootObj = rootClass.factory()
        rootObj.build(rootNode)
        # Enable Python to collect the space used by the DOM.
        doc = None
        sys.stdout.write('<?xml version="1.0" ?>\n')
        rootObj.export(sys.stdout, 0, name_=rootTag,
            namespacedef_='')
        return rootObj

    def parseLiteral(inFilename):
        doc = parsexml_(inFilename)
        rootNode = doc.getroot()
        rootTag, rootClass = get_root_tag(rootNode)
        if rootClass is None:
            rootTag = 'contact-list'
            rootClass = supermod.contactlistType
        rootObj = rootClass.factory()
        rootObj.build(rootNode)
        # Enable Python to collect the space used by the DOM.
        doc = None
        sys.stdout.write('#from member_specs_api import *\n\n')
        sys.stdout.write('import member_specs_api as model_\n\n')
        sys.stdout.write('rootObj = model_.contact_list(\n')
        rootObj.exportLiteral(sys.stdout, 0, name_="contact_list")
        sys.stdout.write(')\n')
        return rootObj

    USAGE_TEXT = """
    Usage: python ???.py <infilename>
    """

    def usage():
        print USAGE_TEXT
        sys.exit(1)

    def main():
        args = sys.argv[1:]
        if len(args) != 1:
            usage()
        infilename = args[0]
        root = parse(infilename)

    if __name__ == '__main__':
        #import pdb; pdb.set_trace()
        main()

Notes:

- We add the functions upper_elements and remap that we use in each generated class.
- Notice how the function upper_elements calls the function remap only on those members whose type is xs:string.
- In each generated (sub-)class, we add the methods that walk the DOM tree and apply the method (upper) that transforms each xs:string value.
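To try out the member-spec driven traversal without generating any code, the following sketch runs upper_elements and remap from the listing above against a small hand-written class. The MemberSpec class here is a hypothetical stand-in for the generated _MemberSpec class, and contactType with its three members is an assumption made only for this illustration.

```python
class MemberSpec(object):
    """Hypothetical stand-in for the generated _MemberSpec class."""
    def __init__(self, name, data_type):
        self.name = name
        self.data_type = data_type
    def get_name(self):
        return self.name
    def get_data_type(self):
        return self.data_type

class contactType(object):
    # One spec per member data item: (XML name, XML schema type).
    member_data_items_ = [
        MemberSpec('first-name', 'xs:string'),
        MemberSpec('interest', 'xs:string'),
        MemberSpec('priority', 'xs:integer'),
    ]
    def __init__(self, first_name, interest, priority):
        self.first_name = first_name
        self.interest = interest
        self.priority = priority

def remap(name):
    # XML names may contain '-', Python attribute names may not.
    return name.replace('-', '_')

def upper_elements(obj):
    for item in obj.member_data_items_:
        if item.get_data_type() == 'xs:string':
            name = remap(item.get_name())
            val1 = getattr(obj, name)
            if isinstance(val1, list):
                for idx, val2 in enumerate(val1):
                    val1[idx] = val2.upper()
            else:
                setattr(obj, name, val1.upper())

contact = contactType('arlene', ['traveling', 'gardening'], 2)
upper_elements(contact)
print(contact.first_name)
print(contact.interest)
print(contact.priority)
```

The integer member is skipped because its data type is not xs:string, which is the point of driving the transformation from the member specs rather than from hard-coded member names.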
    import sys
    import member_specs_api as supermod
    import member_specs_upper

    def process(inFilename):
        doc = supermod.parsexml_(inFilename)
        rootNode = doc.getroot()
        rootClass = member_specs_upper.contactlistTypeSub
        rootObj = rootClass.factory()
        rootObj.build(rootNode)
        # Enable Python to collect the space used by the DOM.
        doc = None
        sys.stdout.write('<?xml version="1.0" ?>\n')
        rootObj.export(sys.stdout, 0, name_="contact-list",
            namespacedef_='')
        rootObj.upper()
        sys.stdout.write('-' * 60)
        sys.stdout.write('\n')
        rootObj.export(sys.stdout, 0, name_="contact-list",
            namespacedef_='')
        return rootObj

    USAGE_MSG = """\
    Synopsis:
        Sample application using classes and subclasses generated by generateDS.py
    Usage:
        python member_specs_test.py infilename
    """

    def usage():
        print USAGE_MSG
        sys.exit(1)

    def main():
        args = sys.argv[1:]
        if len(args) != 1:
            usage()
        infilename = args[0]
        process(infilename)

    if __name__ == '__main__':
        main()

Notes:

- We copy the function parse() from our generated code to serve as a model for our function process().
- After parsing and displaying the XML instance document, we call method upper() in the generated class contactlistTypeSub in order to walk the DOM tree and transform each xs:string to uppercase.
            <category>2</category>
        </contact>
    </contact-list>

Notes:

- The output above shows both the before and after versions of exporting the parsed XML instance document.