Python XML processing with lxml

Abstract

Describes the lxml package for reading and writing XML files with the Python programming language.

This publication is available in Web form and also as a PDF document. Please forward any comments to tcc-doc@nmt.edu.

Table of Contents

1. Introduction: Python and XML
2. How ElementTree represents XML
3. Reading an XML document
4. Creating a new XML document
5. Modifying an existing XML document
6. Features of the etree module
6.1. The Comment() constructor
6.2. The Element() constructor
6.3. The ElementTree() constructor
6.4. The fromstring() function: Create an element from a string
6.5. The parse() function: build an ElementTree from a file
6.6. The ProcessingInstruction() constructor
6.7. The QName() constructor
6.8. The SubElement() constructor
6.9. The tostring() function: Serialize as XML
6.10. The XMLID() function: Convert text to XML with a dictionary of id values
7. class ElementTree: A complete XML document
7.1. ElementTree.find()
7.2. ElementTree.findall(): Find matching elements
7.3. ElementTree.findtext(): Retrieve the text content from an element
7.4. ElementTree.getiterator(): Make an iterator
7.5. ElementTree.getroot(): Find the root element
7.6. ElementTree.xpath(): Evaluate an XPath expression
7.7. ElementTree.write(): Translate back to XML
8. class Element: One element in the tree
8.1. Attributes of an Element instance
8.2. Accessing the list of child elements
8.3. Element.append(): Add a new element child
8.4. Element.clear(): Make an element empty
8.5. Element.find(): Find a matching sub-element
8.6. Element.findall(): Find all matching sub-elements
8.7. Element.findtext(): Extract text content
8.8. Element.get(): Retrieve an attribute value with defaulting
8.9. Element.getchildren(): Get element children
8.10. Element.getiterator(): Make an iterator to walk a subtree
8.11. Element.getroottree(): Find the ElementTree containing this element
8.12. Element.insert(): Insert a new child element
8.13. Element.items(): Produce attribute names and values
8.14. Element.iterancestors(): Find an element’s ancestors
8.15. Element.iterchildren(): Find all children
8.16. Element.iterdescendants(): Find all descendants
8.17. Element.itersiblings(): Find other children of the same parent
8.18. Element.keys(): Find all attribute names
8.19. Element.remove(): Remove a child element
8.20. Element.set(): Set an attribute value
8.21. Element.xpath(): Evaluate an XPath expression
9. XPath processing
9.1. An XPath example
10. Automated validation of input files
10.1. Validation with a Relax NG schema
10.2. Validation with an XSchema (XSD) schema
11. etbuilder: A simplified XML builder module
11.1. Using the etbuilder module
11.2. CLASS(): Adding class attributes
11.3. subElement(): Adding a child element
11.4. addText(): Adding text content to an element
12. Implementation of etbuilder
12.1. Features differing from Lundh’s original
12.2. Prologue
12.3. CLASS(): Helper function for adding CSS class attributes
12.4. subElement(): Add a child element
12.5. addText(): Add text content to an element
12.6. class ElementMaker: The factory class
12.7. ElementMaker.__init__(): Constructor
12.8. ElementMaker.__call__(): Handle calls to the factory instance
12.9. ElementMaker.__handleArg(): Process one positional argument
12.10. ElementMaker.__getattr__(): Handle arbitrary method calls
12.11. Epilogue
12.12. testetbuilder: A test driver for etbuilder

1. Introduction: Python and XML

With the continued growth of both Python and XML, many packages are now available to help you read, generate, and modify XML files from Python scripts. Compared to most of them, the lxml package has two big advantages:

  • Performance. Reading and writing even fairly large XML files takes an almost imperceptible amount of time.

  • Ease of programming. The lxml package is based on ElementTree, which Fredrik Lundh invented to simplify and streamline XML processing.

lxml is similar in many ways to two other, earlier packages:

  • Fredrik Lundh continues to maintain his original ElementTree.

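  • xml.etree.ElementTree is now an official part of the Python library. There is a C-language version called cElementTree which may be even faster than lxml for some applications. (In Python 3.3 and later, xml.etree.ElementTree uses the C implementation automatically; the separate cElementTree module was deprecated and then removed in Python 3.9.)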

However, the author prefers lxml because it provides a number of additional features that make life easier. In particular, support for XPath makes it considerably easier to manage more complex XML structures.
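
As a rough illustration of the comparison above, the following sketch parses a small document and queries it twice: once with the limited path syntax that lxml and xml.etree.ElementTree share, and once with a full XPath expression, which only lxml evaluates. The sample catalog, its element names, and the variable names are invented for this example and do not come from any real schema. The script falls back to the standard library when lxml is not installed, and in that case filters by hand.

```python
# Prefer lxml; fall back to the standard library's ElementTree, which
# shares the core API used here (fromstring, findall, findtext).
try:
    from lxml import etree
    HAVE_LXML = True
except ImportError:
    import xml.etree.ElementTree as etree
    HAVE_LXML = False

# Hypothetical sample document, invented for illustration only.
SAMPLE = """<catalog>
  <book lang="en"><title>Dune</title><price>9.99</price></book>
  <book lang="fr"><title>Candide</title><price>4.50</price></book>
  <book lang="en"><title>Emma</title><price>6.25</price></book>
</catalog>"""

root = etree.fromstring(SAMPLE)

# The restricted path syntax works in both packages.
english_titles = [t.text for t in root.findall("book[@lang='en']/title")]

if HAVE_LXML:
    # Full XPath 1.0 (functions, numeric comparison) needs lxml's xpath().
    found = root.xpath("//book[number(price) < 7]/title/text()")
    cheap_titles = [str(t) for t in found]
else:
    # Without XPath, the same selection has to be written out by hand.
    cheap_titles = [
        b.findtext("title")
        for b in root.findall("book")
        if float(b.findtext("price")) < 7
    ]

print(english_titles)
print(cheap_titles)
```

The single XPath expression replaces the explicit loop and type conversion in the fallback branch, which is the kind of convenience the author has in mind.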