Python XML processing with lxml

Abstract

Describes the lxml package for reading and writing XML files with the Python programming language.

This publication is available in Web form and also as a PDF document. Please forward any comments to tcc-doc@nmt.edu.

Table of Contents

1. Introduction: Python and XML

2. How ElementTree represents XML

3. Reading an XML document

4. Creating a new XML document

5. Modifying an existing XML document

6. Features of the etree module

6.1. The Comment() constructor
6.2. The Element() constructor
6.3. The ElementTree() constructor
6.4. The fromstring() function: Create an element from a string
6.5. The parse() function: build an ElementTree from a file
6.6. The ProcessingInstruction() constructor
6.7. The QName() constructor
6.8. The SubElement() constructor
6.9. The tostring() function: Serialize as XML
6.10. The XMLID() function: Convert text to XML with a dictionary of id values

7. class ElementTree: A complete XML document

7.1. ElementTree.find()
7.2. ElementTree.findall(): Find matching elements
7.3. ElementTree.findtext(): Retrieve the text content from an element
7.4. ElementTree.getiterator(): Make an iterator
7.5. ElementTree.getroot(): Find the root element
7.6. ElementTree.xpath(): Evaluate an XPath expression
7.7. ElementTree.write(): Translate back to XML

8. class Element: One element in the tree

8.1. Attributes of an Element instance
8.2. Accessing the list of child elements
8.3. Element.append(): Add a new element child
8.4. Element.clear(): Make an element empty
8.5. Element.find(): Find a matching sub-element
8.6. Element.findall(): Find all matching sub-elements
8.7. Element.findtext(): Extract text content
8.8. Element.get(): Retrieve an attribute value with defaulting
8.9. Element.getchildren(): Get element children
8.10. Element.getiterator(): Make an iterator to walk a subtree
8.11. Element.getroottree(): Find the ElementTree containing this element
8.12. Element.insert(): Insert a new child element
8.13. Element.items(): Produce attribute names and values
8.14. Element.iterancestors(): Find an element’s ancestors
8.15. Element.iterchildren(): Find all children
8.16. Element.iterdescendants(): Find all descendants
8.17. Element.itersiblings(): Find other children of the same parent
8.18. Element.keys(): Find all attribute names
8.19. Element.remove(): Remove a child element
8.20. Element.set(): Set an attribute value
8.21. Element.xpath(): Evaluate an XPath expression

9. XPath processing

9.1. An XPath example

10. Automated validation of input files

10.1. Validation with a Relax NG schema
10.2. Validation with an XSchema (XSD) schema

11. etbuilder: A simplified XML builder module

11.1. Using the etbuilder module
11.2. CLASS(): Adding class attributes
11.3. subElement(): Adding a child element
11.4. addText(): Adding text content to an element

12. Implementation of etbuilder

12.1. Features differing from Lundh’s original
12.2. Prologue
12.3. CLASS(): Helper function for adding CSS class attributes
12.4. subElement(): Add a child element
12.5. addText(): Add text content to an element
12.6. class ElementMaker: The factory class
12.7. ElementMaker.__init__(): Constructor
12.8. ElementMaker.__call__(): Handle calls to the factory instance
12.9. ElementMaker.__handleArg(): Process one positional argument
12.10. ElementMaker.__getattr__(): Handle arbitrary method calls
12.11. Epilogue
12.12. testetbuilder: A test driver for etbuilder

1. Introduction: Python and XML

As both Python and XML have grown in popularity, many packages have appeared that help you read, generate, and modify XML files from Python scripts. Compared to most of them, the lxml package has two big advantages:

Performance. Reading and writing even fairly large XML files takes an almost imperceptible amount of time.
Ease of programming. The lxml package is based on ElementTree, which Fredrik Lundh invented to simplify and streamline XML processing.

lxml is similar in many ways to two other, earlier packages:

Fredrik Lundh continues to maintain his original ElementTree.
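xml.etree.ElementTree is now an official part of the Python standard library. Its C-language implementation, cElementTree, may be even faster than lxml for some applications. (Since Python 3.3, xml.etree.ElementTree uses the C implementation automatically, and the separate cElementTree module was removed in Python 3.9.)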

However, the author prefers lxml because it provides a number of additional features that make life easier. In particular, its support for XPath makes it considerably easier to navigate more complex XML structures.
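To make the family resemblance among these packages concrete, here is a minimal sketch that runs against either lxml or the standard library's xml.etree.ElementTree. The sample document and the helper name titles_for_year() are made up for this illustration and are not part of either package. The sketch uses only the small path subset that findall() accepts in both packages.

```python
# Prefer lxml when it is installed; otherwise fall back to the standard
# library's ElementTree, which shares the core API used below.
try:
    from lxml import etree
except ImportError:
    import xml.etree.ElementTree as etree

# Hypothetical sample document, invented for this illustration.
SAMPLE = (
    "<library>"
    "<book year='2008'><title>Alpha</title></book>"
    "<book year='2010'><title>Beta</title></book>"
    "<book year='2010'><title>Gamma</title></book>"
    "</library>"
)

def titles_for_year(xml_text, year):
    """Return the titles of all <book> elements with the given year."""
    root = etree.fromstring(xml_text)
    # Select every <book> descendant whose 'year' attribute matches.
    # This path syntax is the limited subset supported by findall().
    books = root.findall(".//book[@year='%s']" % year)
    return [book.findtext("title") for book in books]

print(titles_for_year(SAMPLE, "2010"))
```

With lxml installed, the same selection could also be written with the Element.xpath() method described in Section 8.21, which accepts full XPath expressions rather than this subset.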

John W. Shipman

Comments welcome: tcc-doc@nmt.edu

Last updated: 2010-12-01 13:27

URL: http://www.nmt.edu/tcc/help/pubs/pylxml/index.html