XML Tutorial
Student Guide
Revision 4.0
Information in this document is subject to change without notice. Companies, names and data
used in examples herein are fictitious unless otherwise noted. No part of this document may be
reproduced or transmitted in any form or by any means, electronic or mechanical, for any
purpose, without the express written permission of Object Innovations.
Product and company names mentioned herein are the trademarks or registered trademarks of
their respective owners.
Object Innovations
877-558-7246
www.objectinnovations.com
Labs
The course relies on hands-on experience in various
topics and techniques.
Application code for this course is all in XmlCs under
the top-level directory, which by default is C:\OIC.
Where possible, starter code is provided to take work
off your hands that would be largely irrelevant to the
topic of the lab; thus you can be as productive as
possible in the time allotted and focus on the topic at
hand.
The labs are installed by running the simple self-extractor:
Install_XmlCs_40.exe
XmlCs
Chapter 1
XML
The eXtensible Markup Language, or XML, has
become a very popular choice for a wide array of
software applications:
Traditional web applications enhanced with XML as an
HTML transformation source
XML as a portable format for data exchange and archiving
Business-to-business messaging and Web Services
Many more
Parsing XML
At some point, however, the information in XML
documents must be available to application code.
At its most basic, the process by which an application
reads the information in an XML document is known
as parsing the document.
Clearly, the literal meaning of the term refers to the gritty
work of reading the document as a stream of characters, and
interpreting that stream according to XML grammar.
Stated another way, the parsing task might be seen as that of
abstracting the document content (often called its
information set, or infoset) from its lexical representation in
XML proper.
This information set can then be read by application code,
using any number of possible models.
Document validation can also be performed as part of
parsing.
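The distinction between lexical representation and content can be sketched in a few lines. The example below is only an illustration: it uses XNode.DeepEquals from LINQ to XML as a stand-in for "same information", which is not a formal infoset comparison, and the class and method names are hypothetical.

```csharp
using System;
using System.Xml.Linq;

public static class InfosetDemo
{
    // Returns true when two XML fragments parse to the same element tree,
    // regardless of how that tree was spelled out in the text.
    public static bool SameContent(string xmlA, string xmlB)
    {
        XElement a = XElement.Parse(xmlA);
        XElement b = XElement.Parse(xmlB);
        return XNode.DeepEquals(a, b);
    }

    public static void Main()
    {
        // Same element and attributes, but different quoting, different
        // spacing, and a character reference (&#65;) in place of the letter A.
        string first = "<Car make='AMC' model='Pacer'/>";
        string second = "<Car   make=\"&#65;MC\"  model=\"Pacer\" />";

        Console.WriteLine(SameContent(first, second));
    }
}
```

The two strings differ character by character, yet the parser reduces both to one element with the same two attribute values, which is what application code actually reads.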
XML schemas
XSLT
SOAP (object serialization)
Parsing Techniques
There are two traditional methods for parsing XML
streams, and each has advantages and disadvantages.
The Document Object Model (DOM).
The Simple API for XML (SAX). This API is not supported
by .NET and will not be discussed further.
XmlWriter Example
In this example we use XmlWriter to create the XML
file that we read in earlier code. This code is in the
WriteCars() method.
XmlWriterSettings settings = new XmlWriterSettings();
settings.Indent = true;
XmlWriter tw = XmlWriter.Create(xmlPath, settings);
// Open the document
tw.WriteStartDocument();
// Write a comment
tw.WriteComment("A lot of cars!");
// Write the first element and its attribute
tw.WriteStartElement("Dealership");
tw.WriteAttributeString("name", "Cars R Us");
tw.WriteStartElement("Car");
// Write the Make of the Car element
tw.WriteStartElement("Make");
tw.WriteString("AMC");
tw.WriteEndElement();
// Write one more element
tw.WriteStartElement("Model");
tw.WriteString("Pacer");
tw.WriteEndElement();
//... Shortened for brevity
tw.WriteStartElement("Price");
tw.WriteString("3998.99");
tw.WriteEndElement();
tw.WriteEndElement();    // end of car
tw.WriteEndElement();    // end of dealership
tw.WriteEndDocument();   // end of document
tw.Close();              // close writer
XmlDocument Example
The following code, found in the method
ParseWithTheDOM(), reads the XML file created in
the previous example.
This code uses the DOM parser with the XmlNode class.
XmlDocument doc = new XmlDocument();
doc.Load(xmlPath);
XmlNode root = doc.DocumentElement;
// "//*" selects every element in the document
XmlNodeList list = root.SelectNodes("//*");
foreach (XmlNode elem in list)
{
    Console.WriteLine(elem.Name);
}
LINQ to XML
Language-Integrated Query (LINQ) provides an
intuitive syntax for querying a variety of data sources
using C# and Visual Basic.
The query syntax is part of the programming
language, giving the advantages of strong typing and
tool support such as IntelliSense in Visual Studio.
LINQ provides a consistent API that can be used with
many different kinds of data, including .NET
collections, SQL Server databases and XML
documents.
LINQ to XML is a programming model for
manipulating XML documents using .NET languages.
It is similar in goals to the Document Object Model
(DOM) but lighter weight and easier to work with.
With respect to query capability, the programming model is
consistent with the model for other LINQ data sources.
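As a rough sketch of what this looks like in practice, the following query runs against a small dealership document shaped like the one written in the XmlWriter example. Only the AMC Pacer entry comes from that example; the other cars, the class name and the method name are invented for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class LinqToXmlDemo
{
    // Returns the models of all cars priced below the limit, cheapest first.
    public static List<string> ModelsUnder(XElement dealership, decimal limit)
    {
        IEnumerable<string> query =
            from car in dealership.Elements("Car")
            let price = (decimal)car.Element("Price")   // explicit XElement conversion
            where price < limit
            orderby price
            select (string)car.Element("Model");
        return query.ToList();
    }

    public static void Main()
    {
        // Hypothetical sample data; only the Pacer appears in the course example.
        XElement dealership = XElement.Parse(
            @"<Dealership name='Cars R Us'>
                <Car><Make>AMC</Make><Model>Pacer</Model><Price>3998.99</Price></Car>
                <Car><Make>Ford</Make><Model>Pinto</Model><Price>2500.00</Price></Car>
                <Car><Make>Audi</Make><Model>A4</Model><Price>21000.00</Price></Car>
              </Dealership>");

        foreach (string model in ModelsUnder(dealership, 5000m))
            Console.WriteLine(model);
    }
}
```

The query has the same from/where/orderby/select shape it would have over an in-memory collection; only the source (child elements of an XElement) is specific to XML.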
Summary
XML parsing is the cornerstone of .NET Framework
application development. Many higher-level
application capabilities make use of XML-enabled
features:
Cached/Non-cached, push model, syntax/validating type
parsers available
XML object serialization
XML messaging, for instance using SOAP
Chapter 6
Modifying Documents
In the previous chapter we focused exclusively on
using the XmlDocument as a parser.
The XmlDocument is actually a read/write class.
Nodes of all types have both accessors and mutators, and can
be modified, added and removed as children of other nodes.
In some cases content can be modified; some node types
have certain immutable properties that can only be changed
by replacing the original node with a partially modified
copy.
To create a new XML document, simply create an instance of
XmlDocument and start adding element, attribute, or other
nodes. The DOM tree is maintained in memory and
can be written to a file using the Save() method.
Modifying Documents
The XmlDocument class is key to this capability, since
it has the factory methods for the various node types,
as well as the Save() method. The most important ones are:
XmlElement CreateElement(string name);
XmlAttribute CreateAttribute(string name);
XmlText CreateTextNode(string text);
XmlComment CreateComment(string data);
void Save(destination);   // destination: string (file name), Stream, TextWriter, or XmlWriter
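A minimal sketch of these factory methods at work follows. It builds a fragment of the dealership document from Chapter 1 in memory and saves it to a string; the class and method names are hypothetical, and a real program would more likely pass a file name to Save().

```csharp
using System;
using System.IO;
using System.Xml;

public static class CreateDocumentDemo
{
    // Builds <Dealership name="Cars R Us"><Car><Make>AMC</Make></Car></Dealership>
    // preceded by a comment, using only the XmlDocument factory methods.
    public static XmlDocument BuildDealership()
    {
        XmlDocument doc = new XmlDocument();
        doc.AppendChild(doc.CreateComment("A lot of cars!"));

        XmlElement dealership = doc.CreateElement("Dealership");
        XmlAttribute name = doc.CreateAttribute("name");
        name.Value = "Cars R Us";
        dealership.SetAttributeNode(name);
        doc.AppendChild(dealership);

        XmlElement car = doc.CreateElement("Car");
        XmlElement make = doc.CreateElement("Make");
        make.AppendChild(doc.CreateTextNode("AMC"));
        car.AppendChild(make);
        dealership.AppendChild(car);

        return doc;
    }

    public static void Main()
    {
        XmlDocument doc = BuildDealership();
        using (StringWriter writer = new StringWriter())
        {
            doc.Save(writer);   // a file name, Stream, or XmlWriter works as well
            Console.WriteLine(writer.ToString());
        }
    }
}
```

Note that a created node is not part of the tree until it is attached with a method such as AppendChild() or SetAttributeNode().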
Managing Children
Use of the XmlNode class to add or remove child
elements is simple enough.
Additions can be managed using either
AppendChild(), InsertAfter(), or InsertBefore().
The choice between them is really a question of convenience
in a particular algorithm.
Each ensures uniqueness in the child list by first removing
the node if it is already in the list somewhere.
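The behavior described above can be seen in a short sketch. The element names and the class are made up for the illustration; the method returns the child names in order so that the effect of each call is easy to follow.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

public static class ChildrenDemo
{
    // Starts from <List><A/><B/><C/></List> and rearranges the children.
    public static string Rearrange()
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml("<List><A/><B/><C/></List>");
        XmlNode list = doc.DocumentElement;
        XmlNode a = list.FirstChild;
        XmlNode b = a.NextSibling;
        XmlNode c = b.NextSibling;

        // <A/> is already a child, so it is moved to the end, not duplicated.
        list.AppendChild(a);                              // B, C, A

        // New nodes come from the owning document's factory methods.
        list.InsertBefore(doc.CreateElement("X"), b);     // X, B, C, A
        list.InsertAfter(doc.CreateElement("Y"), b);      // X, B, Y, C, A

        list.RemoveChild(c);                              // X, B, Y, A

        List<string> names = new List<string>();
        foreach (XmlNode child in list.ChildNodes)
            names.Add(child.Name);
        return string.Join(",", names);
    }

    public static void Main()
    {
        Console.WriteLine(Rearrange());
    }
}
```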
Cloning
The XmlNode class also provides the CloneNode()
method.
The DOM recommendation calls it a generic copy
constructor, imprecisely echoing C++ terminology.
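CloneNode() takes a single Boolean argument that selects a deep or a shallow copy. The sketch below, with a hypothetical Car element and class name, shows the difference: a shallow clone of an element keeps its attributes but not its children, and either kind of clone starts out detached from the tree.

```csharp
using System;
using System.Xml;

public static class CloneDemo
{
    // Hypothetical sample element used by the demonstration.
    public static XmlElement LoadCar()
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml("<Car id='1'><Make>AMC</Make><Model>Pacer</Model></Car>");
        return doc.DocumentElement;
    }

    public static void Main()
    {
        XmlElement car = LoadCar();

        XmlNode shallow = car.CloneNode(false);   // element and attributes only
        XmlNode deep = car.CloneNode(true);       // whole subtree

        Console.WriteLine(shallow.ChildNodes.Count);
        Console.WriteLine(deep.ChildNodes.Count);

        // A clone has no parent until it is added to a tree.
        Console.WriteLine(deep.ParentNode == null);

        // Changing the clone leaves the original untouched.
        deep.FirstChild.InnerText = "Ford";
        Console.WriteLine(car.FirstChild.InnerText);
    }
}
```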
Modifying Elements
There are several possible changes to an element.
The tag name is immutable; to change it you must replace the
element with a new one.
The character content of an element is captured in a separate
text node as a child of the element. Thus, changing this
means setting a new value on the child text node.
The XmlCharacterData class includes a number of mutators
that allow the text to be modified:
string Data {get; set;}
void AppendData(string strData);
void InsertData(int offset, string strData);
void DeleteData(int offset, int count);
void ReplaceData(int offset, int count, string strData);
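Both kinds of change can be sketched together. In the example below, EditText works on the child text node through the XmlCharacterData mutators, and Rename shows one possible way to "change" a tag name by building a replacement element. The names are hypothetical, and Rename is a simplification that ignores namespaces.

```csharp
using System;
using System.Xml;

public static class ModifyElementDemo
{
    // Edits the element's text through its child text node.
    // Assumes the first child is a text node holding "Pacer".
    public static void EditText(XmlElement element)
    {
        XmlCharacterData text = (XmlCharacterData)element.FirstChild;
        text.AppendData(" X");                // "Pacer X"
        text.InsertData(0, "AMC ");           // "AMC Pacer X"
        text.ReplaceData(4, 5, "Gremlin");    // "AMC Gremlin X"
        text.DeleteData(11, 2);               // "AMC Gremlin"
    }

    // The tag name is immutable, so build a new element, carry over the
    // attributes and children, and swap it in for the old one.
    public static XmlElement Rename(XmlElement old, string newName)
    {
        XmlElement renamed = old.OwnerDocument.CreateElement(newName);
        foreach (XmlAttribute attr in old.Attributes)
            renamed.SetAttribute(attr.Name, attr.Value);
        while (old.HasChildNodes)
            renamed.AppendChild(old.FirstChild);   // moves the child
        old.ParentNode.ReplaceChild(renamed, old);
        return renamed;
    }

    public static void Main()
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml("<Car><Model>Pacer</Model></Car>");
        XmlElement model = (XmlElement)doc.DocumentElement.FirstChild;

        EditText(model);
        Rename(model, "Name");
        Console.WriteLine(doc.OuterXml);
    }
}
```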
Modifying Attributes
Many of the XmlElement class mutators concern
management of the element's attributes.
void SetAttribute(string name, string value);
void RemoveAttribute(string name);
XmlAttribute SetAttributeNode(XmlAttribute newAttr);
XmlAttribute RemoveAttributeNode(XmlAttribute oldAttr);
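The four mutators can be exercised in a few lines. This is only a sketch: the city, open and phone attributes and their values are invented for the example, as are the class and method names.

```csharp
using System;
using System.Xml;

public static class AttributeDemo
{
    // Applies each of the four attribute mutators to the given element.
    public static void Edit(XmlElement element)
    {
        XmlDocument doc = element.OwnerDocument;

        element.SetAttribute("name", "Cars 4 U");   // overwrites an existing value
        element.SetAttribute("open", "true");       // adds a new attribute
        element.RemoveAttribute("city");            // removes by name

        // The node-based versions work with XmlAttribute objects.
        XmlAttribute phone = doc.CreateAttribute("phone");
        phone.Value = "555-0100";
        element.SetAttributeNode(phone);

        XmlAttribute open = element.GetAttributeNode("open");
        element.RemoveAttributeNode(open);
    }

    public static void Main()
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml("<Dealership name='Cars R Us' city='Boston'/>");
        Edit(doc.DocumentElement);
        Console.WriteLine(doc.OuterXml);
    }
}
```

The string-based methods are usually the more convenient pair; the node-based pair is useful when an XmlAttribute object is already in hand, for example when moving an attribute from one element to another.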
Lab 6
Shipping Information for Zenith Courseware
In this lab you will continue the PrintShipDOM program from the
previous chapter to create an XML file that specifies shipping and
handling charges for each destination. Very simple algorithms are
used for determining shipping and handling costs. You are
supplied a file Ship.cs that encapsulates these algorithms.
Detailed instructions are contained in the Lab 6 write-up in the Lab
Manual.
Suggested time: 60 minutes
Summary
We've seen the DOM from both sides, now.
The DOM offers quite a lot as a parsing technology.
Now we've learned how to use it to modify existing
documents, and even to create new documents from scratch.
Chapter 10
Introduction to XSLT
Objectives
After completing this unit you will be able to:
Describe the origins of XSLT.
Distinguish XSLT as a rule-based language from
procedural languages used in application
programming.
Apply an XSLT transform to an XML source
document to produce a transformed document.
Use classes in the System.Xml.Xsl namespace to
perform XSLT transforms programmatically.
When one turns to XSLT for the first time, one often
has a specific problem in mind, which usually
involves extracting some source document content
and reshaping it: filtering, sorting, reformatting, etc.
To do whatever you want with XSLT requires a
thorough knowledge of the subject, which is beyond
the scope of this chapter.
We'll introduce the subject in this chapter by
examining the transformation process, the rules by
which templates are matched to source content, and
the basic means of generating output.
Most of our output will be static; that is, written into the
transform, rather than extracted from the source document.
This will enable us to develop a good sense of control over
the process and the look of the output, with few distractions.
Rule-Based Transformation
XSLT is a rule-based language.
If XSLT were a procedural language, then a transformation
would be described as a process, with steps in a certain order:
get this element, read this value, write that attribute, etc.
As a rule-based language, XSLT instead defines a
transformation as a series of rules, each of which dictates
what output to produce based on certain types of input, if
found.
The primary means of expressing a rule in XSLT is the
template.
To apply a transformation is to look for elements in the
source document that match the templates, and then to apply
the output directives in that template based on the matching
element and its content.
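One consequence of the rule-based approach is that the order in which templates appear in a transform does not determine the order of the output; the source document does. The sketch below tries to show this with two hypothetical templates applied in both orders through XslCompiledTransform from System.Xml.Xsl. The source document, the bracketed output text and all class and method names are invented for the illustration.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Xsl;

public static class RuleOrderDemo
{
    public const string MakeRule = "<xsl:template match='Make'>[make]</xsl:template>";
    public const string ModelRule = "<xsl:template match='Model'>[model]</xsl:template>";

    // Wraps the given templates in a stylesheet that produces plain text.
    public static string Sheet(params string[] templates)
    {
        return "<xsl:stylesheet version='1.0' "
             + "xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
             + "<xsl:output method='text'/>"
             + string.Concat(templates)
             + "</xsl:stylesheet>";
    }

    // Applies a stylesheet (given as text) to a source document (given as text).
    public static string Apply(string stylesheet, string source)
    {
        XslCompiledTransform transform = new XslCompiledTransform();
        using (XmlReader sheetReader = XmlReader.Create(new StringReader(stylesheet)))
            transform.Load(sheetReader);

        using (XmlReader sourceReader = XmlReader.Create(new StringReader(source)))
        using (StringWriter output = new StringWriter())
        {
            transform.Transform(sourceReader, null, output);
            return output.ToString();
        }
    }

    public static void Main()
    {
        string source = "<Car><Make>AMC</Make><Model>Pacer</Model></Car>";

        // Same rules, listed in opposite orders.
        Console.WriteLine(Apply(Sheet(MakeRule, ModelRule), source));
        Console.WriteLine(Apply(Sheet(ModelRule, MakeRule), source));
    }
}
```

Both calls produce the same text, because Make precedes Model in the source document. A procedural program that listed its steps in the opposite order would behave differently.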
[Figure: transformation diagram; the surviving labels are Transform.xsl and Result.xml]
Referencing a Stylesheet
It is possible for an XML document to directly
reference an external .xsl file, as a stylesheet.
This is most useful when XSLT is being used to generate
HTML for presentation.
The browser would not otherwise know that a transformation
(stylesheet) is to be applied.
[Figure: diagram with the labels Source.xml, Stylesheet.xsl, Browser, HTML, and Presentation]
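The reference itself is an xml-stylesheet processing instruction placed before the document element. As a sketch, it can be added programmatically with the DOM classes from Chapter 6; the Courses document, the class name and the method name below are hypothetical, and the href value is not escaped, which is acceptable only for a simple file name.

```csharp
using System;
using System.Xml;

public static class StylesheetReferenceDemo
{
    // Inserts <?xml-stylesheet type="text/xsl" href="..."?> ahead of the
    // document element, which is where a browser looks for it.
    public static void AddStylesheetReference(XmlDocument doc, string href)
    {
        XmlProcessingInstruction pi = doc.CreateProcessingInstruction(
            "xml-stylesheet",
            "type=\"text/xsl\" href=\"" + href + "\"");
        doc.InsertBefore(pi, doc.DocumentElement);
    }

    public static void Main()
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml("<Courses><Course>XML</Course></Courses>");
        AddStylesheetReference(doc, "Stylesheet.xsl");
        Console.WriteLine(doc.OuterXml);
    }
}
```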
Templates
A template is defined using the top-level element
xsl:template.
To function as a rule, a template must define two
things:
What elements to use as source material: this is defined in
the match attribute
What to produce based on these elements: this is the
template content itself, some combination of XSLT elements,
other child elements, and text
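A single template containing all three kinds of content might look like the one embedded in the sketch below: a literal child element (Vehicle), some text, and an XSLT element (xsl:value-of). The stylesheet, the source document and the C# names are assumptions made for this illustration, not part of the course files.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Xsl;

public static class TemplateDemo
{
    // match='Car' says which source elements the rule applies to;
    // the template content says what to produce for each of them.
    public const string Stylesheet =
        @"<xsl:stylesheet version='1.0'
              xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>
            <xsl:output method='xml' omit-xml-declaration='yes'/>
            <xsl:template match='Car'>
              <Vehicle>Make: <xsl:value-of select='Make'/></Vehicle>
            </xsl:template>
          </xsl:stylesheet>";

    public static string Apply(string stylesheet, string source)
    {
        XslCompiledTransform transform = new XslCompiledTransform();
        using (XmlReader sheetReader = XmlReader.Create(new StringReader(stylesheet)))
            transform.Load(sheetReader);

        using (XmlReader sourceReader = XmlReader.Create(new StringReader(source)))
        using (StringWriter output = new StringWriter())
        {
            transform.Transform(sourceReader, null, output);
            return output.ToString();
        }
    }

    public static void Main()
    {
        string source = "<Car><Make>AMC</Make><Model>Pacer</Model></Car>";
        Console.WriteLine(Apply(Stylesheet, source));
    }
}
```

The Model element never appears in the result: once the Car template matches, only what that template asks for is written.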
Transform Tab
The Transform tab allows you to manage an XSLT
transform or stylesheet.
Result Tab
The Result tab shows the results of the transform.
Clicking the Result tab triggers the transformation,
so it is performed just in time to view it.
Transform Examples
In the Chemistry folder in the Data directory, there
are a number of different transforms that operate on
the document PeriodicTable.xml.
This document expresses a great deal of information about
the periodic table of chemical elements:
Load this document as the source. It will take a while.
HTML Transform
First we'll look at a transform that produces a
bulleted list of elements.
In the Transform tab, load HTML.xsl:
You can see the raw text by clicking the TXT radio button:
XML Transform
The XML.xsl transform filters the source document to
the first 54 elements, and sorts them as the HTML
one does.
However, this one creates a deep copy of each element,
producing a smaller, but similar, XML document:
<xsl:transform version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes" />
  <xsl:template match="/">
    <PERIODIC_TABLE>
      <xsl:apply-templates
          select="//ATOM[ATOMIC_NUMBER &lt; 55]">
        <xsl:sort select="ATOMIC_NUMBER"
            order="ascending" data-type="number" />
      </xsl:apply-templates>
    </PERIODIC_TABLE>
  </xsl:template>
  <xsl:template match="ATOM">
    <xsl:copy-of select="." />
  </xsl:template>
</xsl:transform>
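To see the filtering and sorting without loading the full periodic table, the same transform can be run against a tiny stand-in document. In the sketch below the four ATOM elements are hypothetical sample data, the NAME child element is an assumption about the structure of PeriodicTable.xml, and the class and method names are invented; only ATOM and ATOMIC_NUMBER come from the transform itself.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Xml;
using System.Xml.Linq;
using System.Xml.Xsl;

public static class XmlTransformDemo
{
    // The XML.xsl transform shown above, embedded as a string.
    const string Sheet =
        @"<xsl:transform version='1.0'
              xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>
            <xsl:output method='xml' indent='yes' />
            <xsl:template match='/'>
              <PERIODIC_TABLE>
                <xsl:apply-templates select='//ATOM[ATOMIC_NUMBER &lt; 55]'>
                  <xsl:sort select='ATOMIC_NUMBER'
                      order='ascending' data-type='number' />
                </xsl:apply-templates>
              </PERIODIC_TABLE>
            </xsl:template>
            <xsl:template match='ATOM'>
              <xsl:copy-of select='.' />
            </xsl:template>
          </xsl:transform>";

    // Runs the transform and returns the result as a LINQ to XML document.
    public static XDocument Filter(string sourceXml)
    {
        XslCompiledTransform transform = new XslCompiledTransform();
        using (XmlReader sheet = XmlReader.Create(new StringReader(Sheet)))
            transform.Load(sheet);

        StringWriter result = new StringWriter();
        using (XmlReader source = XmlReader.Create(new StringReader(sourceXml)))
            transform.Transform(source, null, result);
        return XDocument.Parse(result.ToString());
    }

    public static void Main()
    {
        // Hypothetical, deliberately unsorted sample data.
        string source =
            @"<PERIODIC_TABLE>
                <ATOM><NAME>Mercury</NAME><ATOMIC_NUMBER>80</ATOMIC_NUMBER></ATOM>
                <ATOM><NAME>Helium</NAME><ATOMIC_NUMBER>2</ATOMIC_NUMBER></ATOM>
                <ATOM><NAME>Xenon</NAME><ATOMIC_NUMBER>54</ATOMIC_NUMBER></ATOM>
                <ATOM><NAME>Hydrogen</NAME><ATOMIC_NUMBER>1</ATOMIC_NUMBER></ATOM>
              </PERIODIC_TABLE>";

        foreach (XElement atom in Filter(source).Root.Elements("ATOM"))
            Console.WriteLine((string)atom.Element("NAME"));
    }
}
```

Mercury (atomic number 80) is filtered out, and the remaining three elements come back in ascending numeric order. Without data-type='number' the sort would compare the atomic numbers as text.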
MeltBoil.xsl
Browser Display
To see the result of applying the stylesheet in the
browser, double-click the XML file courses-w.xml.
Sample Program
The program XTran uses a stylesheet to transform an
XML document.
File names are entered at the command line.
string source, sheet;
if (args.Length != 2)
{
    Console.WriteLine("Requires two arguments:");
    Console.WriteLine("  XSL stylesheet");
    Console.WriteLine("  XML document");
    return;
}
sheet = args[0];     // first argument: the stylesheet
source = args[1];    // second argument: the XML document
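The remainder of XTran is not reproduced here. One plausible way to finish the job, offered only as a sketch and not as the program's actual code, is to hand the two file names to XslCompiledTransform from the System.Xml.Xsl namespace and write the result to the console. The class name XTranSketch and the Run method are hypothetical.

```csharp
using System;
using System.IO;
using System.Xml.Xsl;

public static class XTranSketch
{
    // Loads the stylesheet file, applies it to the source file, and
    // writes the result to the given writer.
    public static void Run(string sheet, string source, TextWriter output)
    {
        XslCompiledTransform transform = new XslCompiledTransform();
        transform.Load(sheet);                      // compile the stylesheet
        transform.Transform(source, null, output);  // null: no XSLT parameters
    }

    public static void Main(string[] args)
    {
        if (args.Length != 2)
        {
            Console.WriteLine("Requires two arguments:");
            Console.WriteLine("  XSL stylesheet");
            Console.WriteLine("  XML document");
            return;
        }
        Run(args[0], args[1], Console.Out);
    }
}
```

Taking a TextWriter rather than writing to the console directly keeps the transform step usable from other code, such as the tabbed tool built in the lab.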
Lab 10
A Simplified XSLT Console
In this lab you will use .NET Framework XML classes to
implement a simplified version of the XSLT console tool. Like the
full-blown tool we've used in this chapter, your program will have
a tabbed user interface. It will simply allow you to load XML and
XSL files from the current directory and perform the transform.
You are provided with a starting UI.
Summary
XSLT was originally designed to support XSL.
It was never intended to be a general-purpose
transformation language, but it has become the de facto standard nonetheless.
And it is an excellent solution!
It is, however, an unusual language, and particularly
tricky for programmers of structured and object-oriented languages to learn.
It is first and foremost based on matching rules.
Although it has procedural aspects, it is not a programming
language, and it is a mistake to approach it as such.