IN A NUTSHELL
A Desktop Quick Reference
O'REILLY® Elliotte Rusty Harold & W. Scott Means
001 ServiceNow, Inc.'s Exhibit 1010
XML in a Nutshell
by Elliotte Rusty Harold and W. Scott Means
Copyright © 2001 O'Reilly & Associates, Inc. All rights reserved.
Printed in the United States of America.
Published by O'Reilly & Associates, Inc., 101 Morris Street, Sebastopol, CA 95472.
Contributor: Stephen Spainhour
Editors: Laurie Petrycki and John Posner
Production Editor: Ann Schirmer
Cover Designer: Ellie Volckhausen
Printing History:
January 2001: First Edition.
Nutshell Handbook, the Nutshell Handbook logo, and the O'Reilly logo are registered
trademarks of O'Reilly & Associates, Inc. The association of the image of a peafowl
and the topic of XML is a trademark of O'Reilly & Associates, Inc.
Many of the designations used by manufacturers and sellers to distinguish their
products are claimed as trademarks. Where those designations appear in this book,
and O'Reilly & Associates, Inc. was aware of a trademark claim, the designations
have been printed in caps or initial caps.
While every precaution has been taken in the preparation of this book, the publisher
assumes no responsibility for errors or omissions, or for damages resulting from the
use of the information contained herein.
Library of Congress Cataloging-in-Publication Data can be found at:
http://www.oreilly.com/catalog/xmlnut.
ISBN: 0-596-00058-8
[M] [8/01]
Table of Contents
Preface
Part I: XML Concepts
Chapter 1: Introducing XML
What XML Offers
Portable Data
How XML Works
The Evolution of XML
Chapter 2: XML Fundamentals
XML Documents and XML Files
Elements, Tags, and Character Data
Attributes
XML Names
Entity References
CDATA Sections
Comments
Processing Instructions
The XML Declaration
Checking Documents for Well-Formedness
Chapter 3: Document Type Definitions
Validation
Element Declarations
Attribute Declarations
General Entity Declarations
External Parsed General Entities
External Unparsed Entities and Notations
Parameter Entities
Conditional Inclusion
Two DTD Examples
Locating Standard DTDs
Chapter 4: Namespaces
The Need for Namespaces
Namespace Syntax
How Parsers Handle Namespaces
Namespaces and DTDs
Chapter 5: Internationalization
The Encoding Declaration
Text Declarations
XML-Defined Character Sets
Unicode
ISO Character Sets
Platform-Dependent Character Sets
Converting Between Character Sets
The Default Character Set for XML Documents
Character References
xml:lang
Part II: Narrative-Centric Documents
Chapter 6: XML as a Document Format
SGML's Legacy
Narrative Document Structures
TEI
DocBook
Document Permanence
Transformation and Presentation
Chapter 7: XML on the Web
XHTML
Direct Display of XML in Browsers
Authoring Compound Documents with Modular XHTML
Prospects for Improved Web Search Methods
Chapter 8: XSL Transformations
An Example Input Document
xsl:stylesheet and xsl:transform
Stylesheet Processors
Templates
Calculating the Value of an Element with xsl:value-of
Applying Templates with xsl:apply-templates
The Built-in Template Rules
Modes
Attribute Value Templates
XSLT and Namespaces
Other XSLT Elements
Chapter 9: XPath
The Tree Structure of an XML Document
Location Paths
Compound Location Paths
Predicates
Unabbreviated Location Paths
General XPath Expressions
XPath Functions
Chapter 10: XLinks
Simple Links
Link Behavior
Link Semantics
Extended Links
Linkbases
DTDs for XLinks
Chapter 11: XPointers
XPointers on URLs
XPointers in Links
Bare Names
Child Sequences
Points
Ranges
Chapter 12: Cascading Stylesheets (CSS)
The Three Levels of CSS
CSS Syntax
Associating Stylesheets with XML Documents
Selectors
The Display Property
Pixels, Points, Picas, and Other Units of Length
Font Properties
Text Properties
Colors
Chapter 13: XSL Formatting Objects (XSL-FO)
XSL Formatting Objects
The Structure of an XSL-FO Document
Master Pages
XSL-FO Properties
Choosing Between CSS and XSL-FO
Part III: Data-Centric Documents
Chapter 14: XML as a Data Format
Programming Applications of XML
Describing Data
Support for Programmers
Chapter 15: Programming Models
Event- Versus Object-Driven Models
Programming Language Support
Non-Standard Extensions
Transformations
Processing Instructions
Links and References
Notations
What You Get Is Not What You Saw
Chapter 16: Document Object Model (DOM)
DOM Core
DOM Strengths and Weaknesses
Parsing a Document with DOM
The Node Interface
Specific Node Types
The DOMImplementation Interface
A Simple DOM Application
Chapter 17: SAX
The ContentHandler Interface
SAX Features and Properties
Part IV: Reference
Chapter 18: XML 1.0 Reference
How to Use This Reference
Annotated Sample Documents
Key to XML Syntax
Well-Formedness
Validity
Global Syntax Structures
DTD (Document Type Definition)
Document Body
XML Document Grammar
Chapter 19: XPath Reference
The XPath Data Model
Datatype
Location Paths
Predicates
XPath Functions
Chapter 20: XSLT Reference
The XSLT Namespace
XSLT Elements
XSLT Functions
Chapter 21: DOM Reference
Object Hierarchy
Object Reference
Chapter 22: SAX Reference
The [Link] Package
The [Link] Package
SAX Features and Properties
The [Link] Package
Chapter 23: Character Sets
Character Tables
HTML4 Entity Sets
Other Unicode Blocks
Index
Preface
XML is one of the most important developments in document syntax in the history
of computing. In the last few years it has been adopted in fields as diverse as law,
aeronautics, finance, insurance, robotics, multimedia, hospitality, travel, art,
construction, telecommunications, software design, agriculture, physics, journalism,
theology, retail, and medieval literature. XML has become the syntax of choice for
newly designed document formats across almost all computer applications. It's used
on Linux, Windows, Macintosh, and many other computer platforms. Mainframes
on Wall Street trade stocks with one another by exchanging XML documents. Children
playing games on their home PCs save their documents in XML. Sports fans
receive real-time game scores on their cell phones in XML. XML is simply the most
robust, reliable, and flexible document syntax ever invented.
XML in a Nutshell is a comprehensive guide to the rapidly growing world of XML.
It covers all aspects of XML, from the most basic syntax rules, to the details of
DTD creation, to the APIs you can use to read and write XML documents in a
variety of programming languages.
What This Book Covers
There are hundreds of formally established XML applications from the W3C and
other standards bodies, such as OASIS and the Object Management Group. There
are even more informal, unstandardized applications from individuals and corporations,
such as Microsoft's Channel Definition Format and John Guajardo's Mind
Reading Markup Language. This book cannot cover them all, any more than a
book on Java could discuss every program that has ever been or might ever be
written in Java. This book focuses primarily on XML itself. It covers the funda-
mental rules that all XML documents and authors must adhere to, whether a web
designer uses SMIL to add animations to web pages or a C++ programmer uses
SOAP to serialize objects into a remote database.
This book also covers generic supporting technologies that have been layered on
top of XML and are used across a wide range of XML applications. These technol-
ogies include:
XLinks
An attribute-based syntax for hyperlinks between XML and non-XML docu-
ments that provide the simple, one-directional links familiar from HTML,
multidirectional links between many documents, and links between docu-
ments you don't have write access to.
XSLT
An XML application that describes transformations from one document to
another, in either the same or different XML vocabularies.
XPointers
A syntax for identifying particular parts of an XML document referred to by a
URI; often used in conjunction with an XLink.
XPath
A non-XML syntax used by both XPointers and XSLT for identifying particular
pieces of XML documents. For example, an XPath can locate the third
address element in the document, or all elements with an email attribute
whose value is elharo@metalab.unc.edu.
Namespaces
A means of distinguishing between elements and attributes from different XML
vocabularies that have the same name; for instance, the title of a book and
the title of a web page in a web page about books.
SAX
The Simple API for XML, an event-based Java application programming inter-
face implemented by many XML parsers.
DOM
The Document Object Model, a tree-oriented API that treats an XML docu-
ment as a set of nested objects with various properties.
All these technologies, whether defined in XML (XLinks, XSLT, and Namespaces)
or in another syntax (XPointers, XPath, SAX, and DOM), are used in many
different XML applications.
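
As a rough illustration of the XPath entry above, the following sketch uses Python's standard xml.etree.ElementTree module, which implements only a limited subset of XPath. The sample document and its element names are invented for this example, and the expression address[3] selects the third address child of the root element, which is narrower than "the third address element in the document."

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; the element names and the second
# email address are illustrative only.
SAMPLE = """<contacts>
  <address email="elharo@metalab.unc.edu">Brooklyn</address>
  <address>Austin</address>
  <address email="someone@example.com">Sebastopol</address>
</contacts>"""

root = ET.fromstring(SAMPLE)

# The third address child of the root element (XPath positions start at 1).
third = root.find("address[3]")

# All elements, at any depth, whose email attribute has the given value.
matches = root.findall(".//*[@email='elharo@metalab.unc.edu']")

print(third.text)
print([m.text for m in matches])
```

A full XPath 1.0 engine accepts many more expressions than this module does, so treat the sketch as a demonstration of the idea and not of the complete language.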
This book does not specifically cover XML applications that are relevant to only
some users of XML. These include:
SVG
Scalable Vector Graphics is a W3C-endorsed standard used for encoding line
drawings in XML.
MathML
The Mathematical Markup Language is a W3C-endorsed standard XML appli-
cation used for embedding equations in web pages and other documents.
CML
The Chemical Markup Language was one of the first XML applications. It
describes chemistry, solid-state physics, molecular biology, and the other
molecular sciences.
RDF
The Resource Description Framework is a W3C-standard XML application
used for describing resources, with a particular focus on the sort of metadata
one might find in a library card catalog.
CDF
The Channel Definition Format is a nonstandard, Microsoft-defined XML
application used to publish web sites to Internet Explorer for offline browsing.
Occasionally we use one or more of these applications in an example, but we do
not cover all aspects of the relevant vocabulary in depth. While interesting and
important, these applications (and hundreds more like them) are intended prima-
rily for use with special software that knows their format intimately. For instance,
graphic designers do not work directly with SVG. Instead, they use their customary
tools, such as Adobe Illustrator, to create SVG documents. They may not even
know they're using XML.
This book focuses on standards that are relevant to almost all developers working
with XML. We investigate XML technologies that span a wide range of XML applications,
not those that are relevant only within a few restricted domains.
Organization of the Book
Part I, XML Concepts, introduces you to the fundamental standards that form the
essential core that all XML applications and software must adhere to. It teaches
you about well-formed XML, DTDs, namespaces, and Unicode as quickly as
possible.
Part II, Narrative-Centric Documents, explores technologies that are used mostly
for narrative XML documents, such as web pages, books, articles, diaries, and
plays. You'll learn about XSLT, CSS, XSL-FO, XLinks, XPointers, and XPath.
One of the most unexpected developments in XML was its enthusiastic adoption
for data-heavy structured documents such as spreadsheets, financial statistics,
mathematical tables, and software file formats. Part III, Data-Centric XML, explores the
use of XML for such data-intensive documents. This part focuses on the tools and
APIs needed to write software that processes XML, including SAX, the Simple API for
XML, and the W3C's Document Object Model.
Finally, Part IV, Reference, is a series of quick-reference chapters that form the
core of any Nutshell handbook. These chapters give you detailed syntax rules for
the core XML technologies, including XML, DTDs, XPath, XSLT, SAX, and DOM.
Turn to this section when you need to quickly find out the precise syntax for
something you know you can do but don't remember exactly how to do.
Conventions Used in This Book
Body text, like the text you're reading now, is written in Garamond.
Constant width is used for:
• Code examples and fragments.
• Anything that might appear in an XML document, including element names,
tags, attribute values, entity references, and processing instructions.
• Anything that might appear in a program, including keywords, operators,
method names, class names, and literals.
Constant-width bold is used for:
• User input.
• Signifies emphasis should be deleted.
Constant-width italic is used for:
• Replaceable elements in code statements.
Italic is used for:
• New terms where they are defined.
• Pathnames, filenames, and program names. (However, if the program name is
also the name of a Java class, it is written in constant-width font, like other
class names.)
• Host and domain names ([Link]).
• URLs ([Link]).
Significant code fragments, complete programs, and documents are generally
placed into a separate paragraph like this:
<?xml version="1.0"?>
<?xml-stylesheet href="[Link]" type="text/css"?>
<person>
Alan Turing
</person>
XML is case sensitive. The PERSON element is not the same thing as the person or
Person element. Case-sensitive languages do not always allow authors to adhere
to standard English grammar. It is usually possible to rewrite the sentence so the
two do not conflict, and when possible we have endeavored to do so. However,
on rare occasions when there is simply no way around the problem, we let stan-
dard English come up the loser.
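
To make the point about case sensitivity concrete, here is a minimal sketch using Python's standard xml.etree.ElementTree parser. The people wrapper element and the is_well_formed helper are assumptions made for this illustration; they do not come from the book's examples.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if the parser accepts the text, False if it reports an error."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# person and PERSON are two different element names.
doc = ET.fromstring(
    "<people><person>Alan Turing</person><PERSON>Alan Turing</PERSON></people>"
)
lowercase_matches = doc.findall("person")
print(len(lowercase_matches))

# A start-tag and end-tag that differ only in case do not match each other.
print(is_well_formed("<person>Alan Turing</person>"))
print(is_well_formed("<person>Alan Turing</Person>"))
```

Searching for person finds only the lowercase element, and the parser rejects the document whose end-tag is spelled Person.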
Finally, although most of the examples used here are toy examples unlikely to be
reused, a few have real value. Please feel free to reuse them or any parts of them
in your own code. No special permission is required. As far as we are concerned,
they are in the public domain (though the same is definitely not true of the
explanatory text).
Request for Comments
We enjoy hearing from readers with general comments about how this book could
be better, specific corrections, or topics you would like to see covered. You can
reach the authors by sending email to elharo@[Link] and
smeans@[Link]. Please realize, however, that we each
receive several hundred pieces of email a day and cannot respond to every one
personally. For the best chances of getting a personal response, please identify
yourself as a reader of this book. And please send the message from the account
you want us to reply to and make sure that your Reply-to address is properly set.
There's nothing quite so frustrating as spending an hour or more carefully
researching the answer to an interesting question and composing a detailed
response, only to have it bounce because the correspondent sent the message
from a public terminal and neglected to set the browser preferences to include
their actual email address.
The information in this book has been tested and verified, but you may find that
features have changed (or you may even find mistakes). We believe the old
saying, "If you like this book, tell your friends. If you don't like it, tell us." We're
especially interested in hearing about mistakes. As hard as the authors and editors
worked on this book, inevitably there are a few mistakes and typographical errors
that slipped by us. If you find a mistake or a typo, please let us know so we can
correct it. You can send any errors you find, as well as suggestions for future
editions, to:
O'Reilly & Associates, Inc.
101 Morris Street
Sebastopol, CA 95472
1-800-998-9938 (in the United States or Canada)
1-707-829-0515 (international/local)
1-707-829-0104 (fax)
We have a web site for the book, where we list errata, examples, and any addi-
tional information. You can access this site at:
http://www.oreilly.com/catalog/xmlnut
Before reporting errors, please check this web site to see if we already posted a
fix. To ask technical questions or comment on the book, send email to:
bookquestions@oreilly.com
For more information about our books, conferences, software, Resource Centers,
and the O'Reilly Network, see our web site at:
[Link]
Acknowledgments
Many people were involved in the production of this book. The original editor,
John Posner, got this book rolling and provided many helpful comments that
substantially improved the book. When John moved on, Laurie Petrycki shep-
herded this book to its completion. Stephen Spainhour deserves special thanks for
his work on the reference section. His efforts in organizing and reviewing material
helped create a better book. We'd like to thank Matt Sergeant and Didier P. H.
Martin for their thorough technical review of the manuscript and thoughtful
suggestions.
We'd also like to thank everyone who has worked so hard to make XML such a
success over the last few years and thereby give us something to write about. There
are so many of these people that we can only list a few. In alphabetical order we'd
like to thank Tim Berners-Lee, Jon Bosak, Tim Bray, James Clark, Charles Gold-
farb, Jason Hunter, Michael Kay, Brett McLaughlin, David Megginson, David
Orchard, Walter E. Perry, Simon St. Laurent, C. M. Sperberg-McQueen, James
Tauber, B. Tommie Usdin, and Mark Wutka. Our apologies to everyone we unin-
tentionally omitted.
Elliotte would like to thank his agent, David Rogelberg, who convinced him that it
was possible to make a living writing books like this rather than working in an
office. The entire Sunsite crew (now [Link]) has also helped him to communi-
cate better with his readers in a variety of ways over the last several years. All
these people deserve much thanks and credit. Finally, as always, he offers his
largest thanks to his wife, Beth, without whose love and support this book would
never have happened.
Scott would most like to thank his lovely wife, Celia, who has already spent way
too much time as a "computer widow." He would also like to thank his daughter
Selene for understanding why Daddy can't play with her when he's "working,"
and Skyler for just being himself. Also, he'd like to thank the team at Enterprise
Web Machines for helping him make time to write. Finally, he would like to thank
John Posner for getting him into this and Laurie Petrycki for working with him
when things got tough.
Elliotte Rusty Harold
elharo@[Link]
W. Scott Means
smeans@[Link]
CHAPTER 1
Introducing XML
XML, the Extensible Markup Language, is a W3C-endorsed standard for document
markup. It defines a generic syntax used to mark up data with simple, human-
readable tags. It provides a standard format for computer documents. This format
is flexible enough to be customized for domains as diverse as web sites, electronic
data interchange, vector graphics, genealogy, real estate listings, object
serialization, remote procedure calls, and voice mail systems.
You can write your own programs that interact with, massage, and manipulate
data in XML documents. If you do, you'll have access to a wide range of free
libraries in a variety of languages that can read and write XML so that you can
focus on the unique needs of your program. Or you can use off-the-shelf software
like web browsers and text editors to work with XML documents. Some tools are
able to work with any XML document. Others are customized to support a partic-
ular XML application in a particular domain like vector graphics and may not be of
much use outside that domain. But in all cases, the same underlying syntax is
used, even if it's deliberately hidden by more user-friendly tools or restricted to a
single application.
What XML Offers
XML is a meta-markup language for text documents. Data is included in XML
documents as strings of text, and the data is surrounded by text markup that
describes the data. A particular unit of data and markup is called an element. The
XML specification defines the exact syntax this markup must follow: how elements
are delimited by tags, what a tag looks like, what names are acceptable for
elements, where attributes are placed, and so forth. Superficially, the markup in an
XML document looks much like that in an HTML document, but some crucial
differences exist.
Most importantly, XML is a meta-markup language. That means it doesn't have a
fixed set of tags and elements that are always supposed to work for everyone in
all areas of interest. Attempts to create a finite set of such tags are doomed to
failure. Instead, XML allows developers and writers to define the elements they
need as they need them. Chemists can use tags that describe elements, atoms,
bonds, reactions, and other items encountered in chemistry. Real estate agents can
use elements that describe apartments, rents, commissions, locations, and other
items needed in real estate. Musicians can use elements that describe quarter
notes, half notes, G clefs, lyrics, and other objects common in music. The X in
XML stands for Extensible. Extensible means that the language can be extended
and adapted to meet many different needs.
Although XML is flexible in the elements it allows to be defined, it is strict in
many other respects. It provides a grammar for XML documents that regulates
where tags may be placed, what they must look like, which element names are legal, how
attributes are attached to elements, and so forth. This grammar is specific enough
to allow development of XML parsers that can read and understand any XML
document. Documents that satisfy this grammar are said to be well-formed. Docu-
ments that are not well-formed are not allowed any more than a C program
containing a syntax error would be. XML processors reject documents that contain
well-formedness errors.
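As a minimal sketch of what this rejection looks like in practice, the following Python fragment hands two strings to the parser in Python's standard xml.etree.ElementTree module. The helper name is_well_formed and both sample strings are invented for this illustration; other parsers report the error differently.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return True if the parser accepts text as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        # The parser refuses the whole document; it does not try to repair it.
        return False

# Hypothetical sample documents.
good = "<greeting><to>reader</to></greeting>"
bad = "<greeting><to>reader</from></greeting>"  # <to> is closed by </from>

print(is_well_formed(good))
print(is_well_formed(bad))
```

The second string differs from the first by a single mismatched end tag, which is enough for the parser to reject it.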
To enhance interoperability, individuals or organizations may agree to use only
certain tags. These tag sets are called XML applications. An XML application is not
a software application that uses XML, like Mozilla or Microsoft Word. Rather, it's
an application of XML to a particular domain, such as vector graphics or cooking.
The markup in an XML document describes the document's structure. It lets you
see which elements are associated with which other elements. In a well-designed
XML document, the markup also describes the document's semantics. For instance,
the markup can indicate that an element is a date, a person, or a bar code. In
well-designed XML applications, the markup says nothing about how the docu-
ment should be displayed. That is, it does not say that an element is bold,
italicized, or a list item. XML is a structural and semantic markup language, not a
presentation language.*
The markup permitted in a particular XML application can be documented in a
document type definition (DTD). The DTD lists all legal markup and specifies
where and how the markup may be included in a document. Particular document
instances can be compared to the DTD. Documents that match the DTD are said
to be valid. Documents that do not match are invalid. Validity depends on the
DTD; whether a document is valid or invalid depends on which DTD you
compare it to.
Not all documents need to be valid. For many purposes, a well-formed document
is enough. DTDs are optional in XML. On the other hand, DTDs may not always
be sufficient. The DTD syntax is limited and does not allow you to make many
useful statements such as, "This element contains a number" or "This string of text
is a date between 1974 and 2032." If you're writing programs to read XML docu-
ments, you may want to add code to verify statements like these, just as you
would if you were writing code to read a tab-delimited text file. The difference is
that XML parsers present you with the data in a much more convenient format to
work with and do more of the work for you before you have to resort to your
own custom code.
* A few XML applications, like XSL Formatting Objects, are designed to describe text presenta-
tion. However, these are exceptions that prove the rule. Although XSL-FO describes presen-
tation, you'd never write an XSL-FO document directly. Instead, you'd write a more
semantically marked-up XML document, then use an XSL Transformations stylesheet to
change the semantic-oriented XML into presentation-oriented XML.
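The kind of check a DTD cannot express might be sketched as follows. This is only an illustration using Python's standard library: the order, quantity, and shipped element names and the check_order function are hypothetical, and the date range is the one quoted above.

```python
import xml.etree.ElementTree as ET
from datetime import date

# Hypothetical document; the element names are invented for this sketch.
doc = """<order>
  <quantity>10</quantity>
  <shipped>1999-12-31</shipped>
</order>"""

def check_order(text):
    """Return a list of problems that a DTD could not have caught."""
    problems = []
    root = ET.fromstring(text)  # raises ParseError if not well-formed

    # "This element contains a number"
    try:
        int(root.findtext("quantity", default=""))
    except ValueError:
        problems.append("quantity is not a number")

    # "This string of text is a date between 1974 and 2032"
    try:
        year = date.fromisoformat(root.findtext("shipped", default="").strip()).year
        if not 1974 <= year <= 2032:
            problems.append("shipped is outside 1974-2032")
    except ValueError:
        problems.append("shipped is not a date")
    return problems

print(check_order(doc))
```

The parser has already split the document into elements by the time the custom code runs, so the checks only have to look at the text of two named elements.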
What XML Is Not
XML is a markup language, and only a markup language. It's important to
remember this fact. The XML hype has become so extreme that some people
expect XML to do everything up to, and including, washing the family dog.
First of all, XML is not a programming language. There's no such thing as an XML
compiler that reads XML files and produces executable code. You might define a
scripting language that uses a native XML format and is interpreted by a binary
program, but even this application would be unusual.* XML can be used as an
instruction format for programs that make things happen. A traditional program,
for example, may read a text config file and take different actions, depending on
what it sees in the file. There's no reason why a config file can't be written in XML
instead of unstructured text. Indeed, some recent programs are beginning to use
XML config files. But in all cases the program, not the XML document, takes
action. An XML document simply is. It does not do anything.
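A short sketch may make the division of labor concrete. The configuration format below (config, logging, and cache elements) and the load_settings function are hypothetical; the point is only that the document holds the settings while the program decides what to do with them.

```python
import xml.etree.ElementTree as ET

# Hypothetical configuration format, invented for this sketch.
config_text = """<config>
  <logging level="debug"/>
  <cache enabled="no"/>
</config>"""

def load_settings(text):
    """Parse the config document and return plain Python values."""
    root = ET.fromstring(text)
    logging = root.find("logging")
    cache = root.find("cache")
    return {
        # Fall back to defaults when an element or attribute is absent.
        "log_level": logging.get("level", "info") if logging is not None else "info",
        "cache_enabled": cache is not None and cache.get("enabled") == "yes",
    }

settings = load_settings(config_text)

# The program, not the document, takes the action.
if settings["cache_enabled"]:
    print("starting cache")
else:
    print("running without cache")
```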
Furthermore, XML is not a network-transport protocol. XML, like HTML, won't
send data across the network. However, data sent across the network using HTTP,
FTP, NFS, or some other protocol might be in an XML format. XML can be the
format for data sent across the network, but again, software outside the XML docu-
ment must actually do the sending.
Finally, to mention the example in which the hype most often obscures reality,
XML is not a database. You won't replace an Oracle or MySQL server with XML. A
database can contain XML data as a VARCHAR, a BLOB, or a custom XML
datatype; but the database itself is not an XML document. You can store XML data
in a database or on a server or retrieve data from a database in an XML format, but
to do so you need to run software written in a real programming language like C
or Java. To store XML in a database, software on the client side sends the XML
data to the server using an established network protocol like TCP/IP. Software on
the server side receives the XML data, parses it, and stores it in the database. To
retrieve an XML document from a database, you generally pass through a middle-
ware product like Enhydra that makes SQL queries against the database and
formats the result set as XML before returning it to the client. Indeed, some data-
bases may integrate this software code into their core server or provide plug-ins,
such as the Oracle XSQL servlet, to do it. XML serves very well as a ubiquitous,
platform-independent transport format in these scenarios. However, XML is not the
database and shouldn't be used as one.
* At least one XML application, XSL Transformations, has been proven to be Turing complete.
Portable Data
XML offers the tantalizing possibility of truly cross-platform, long-term data
formats. It's long been the case that a document written by one piece of software
on one platform is not necessarily readable on a different platform, by a different
program on the same platform, or even by a future or past version of the same
software on the same platform. When the document can be read, all the informa-
tion may not necessarily come across. Much of the data from the original moon
landings in the late 1960s and early 1970s is now effectively lost. Even if you can
find a tape drive that reads the now obsolete tapes, nobody knows what format
the data on the tapes is stored in!
XML is an incredibly simple, well-documented, straightforward data format. XML
documents are text, and any tool that can read a text file can read an XML docu-
ment. Both XML data and markup are text, and the markup is present in the XML
file as tags. You don't have to wonder whether every eighth byte is random
padding, guess whether a four-byte quantity is a two's complement integer or an
IEEE 754 floating point number, or try to decipher which integer codes map to
which formatting properties. You can read the tag names directly to see exactly
what's in the document. Similarly, since tags define element boundaries, you aren't
likely to get tripped up by unexpected line ending conventions or the number of
spaces mapped to a tab. All the important details about the document's structure
are explicit. You don't have to reverse engineer the format or rely on question-
able, and often unavailable, documentation.
A few software vendors may want to lock in their users with undocumented,
proprietary binary file formats. However, in the long run we are all better off if we
can use the cleanly documented, well-understood, easy to parse, text-based
formats that XML provides. XML allows documents and data to move from one
system to another with a reasonable hope that the receiving system can make
sense out of it. Furthermore, validation lets the receiving side ensure that it gets
what it expects. Java promised portable code. XML delivers portable data. In many
ways, XML is the most portable and flexible document format designed since the
ASCII text file.
How XML Works
Example 1-1 shows a simple XML document. This particular XML document might
appear in an inventory control system or a stock database. It marks up the data
with tags and attributes describing the color, size, bar code number, manufac-
turer, and product name.
Example 1-1: An XML Document
<?xml version="1.0"?>
<product barcode="2394287410">
  <manufacturer>Verbatim</manufacturer>
  <name>DataLife MF 2HD</name>
  <quantity>10</quantity>
  <size>3.5"</size>
  <color>black</color>
  <description>floppy disks</description>
</product>
This document is text and might well be stored in a text file. You can edit this file
with any standard text editor, such as BBEdit, UltraEdit, Emacs, or vi. You do not
need a special XML editor; in fact, we find that most general-purpose XML editors
are far more trouble than they're worth and much harder to use than a simple text
editor.
Then again, this document might not be a file at all. It might be a record in a
database. It might be assembled on the fly by a CGI query to a web server and
exist only in a computer's memory. It might even be stored in multiple files and
assembled at runtime. Even if it isn't in a file, however, the document is a text
document that can be read and transmitted by any software capable of reading
and transmitting text.
Programs that actually try to understand the contents of the XML document, that is,
do not merely treat it as any other text file, use an XML parser to read the docu-
ment. The parser is responsible for dividing the document into individual
elements, attributes, and other pieces. It passes the contents of the XML document
to the application piece by piece. If at any point the parser detects a violation of
XML rules, it reports the error to the application and stops parsing. In some cases
the parser may read past the original error in the document so it can detect and
report other errors that occur later in the document. However, once it has detected
the first error, it no longer passes along the contents of the elements and attributes
it encounters to the application.
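The piece-by-piece handoff described above can be observed with an event-based parser. The following sketch uses Python's standard xml.sax module; the EventRecorder class and record_events function are names invented here, and the broken sample document is hypothetical.

```python
import xml.sax

class EventRecorder(xml.sax.ContentHandler):
    """Record each piece of the document as the parser hands it over."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        self.events.append(("start", name, dict(attrs.items())))

    def characters(self, content):
        if content.strip():  # skip whitespace-only text between elements
            self.events.append(("text", content.strip()))

    def endElement(self, name):
        self.events.append(("end", name))

def record_events(text):
    """Return (events, error); error is None for a well-formed document."""
    handler = EventRecorder()
    try:
        xml.sax.parseString(text.encode("utf-8"), handler)
        return handler.events, None
    except xml.sax.SAXParseException as err:
        # Whatever was delivered before the error stays in handler.events.
        return handler.events, err

events, error = record_events(
    '<product barcode="2394287410"><name>DataLife MF 2HD</name></product>')
# Hypothetical broken document: <name> is never closed.
broken_events, broken_error = record_events("<product><name>DataLife</product>")

for event in events:
    print(event)
print("error in broken document:", broken_error is not None)
```

For the broken document, the application receives the start of product and name but never an end event for product, because the parser stops delivering content once it reaches the mismatched tag.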
Individual XML applications normally dictate precise rules about which elements
and attributes are allowed where. You wouldn't expect to find a G_Clef element
when reading a biology document, for instance. Some of these rules can be speci-
fied precisely using a DTD. A document may contain either the DTD itself or a
pointer to a URI where the DTD is found. Some XML parsers notice these details
and compare the document to its DTD as they read it to see if the document satisfies
the specified constraints. Such a parser is called a validating parser. A
violation of those constraints is a validity error, and the whole process of checking
a document against a DTD is called validation. If a validating parser finds a
validity error, it reports it to the application on whose behalf it parses the docu-
ment. This application can then decide whether it wishes to continue parsing the
document. However, validity errors, unlike well-formedness errors, are not neces-
sarily fatal; an application may choose to ignore them. Not all parsers are
validating parsers. Some merely check for well-formedness.
The application that receives data from the parser may be:
• A web browser, such as Netscape or Internet Explorer, that displays the docu-
ment to a reader
• A word processor, such as StarOffice Writer, that loads the XML document for
editing
• A database server, such as Oracle, that stores XML data in a database
• A drawing program, such as Corel Draw, that interprets XML as two-dimen-
sional coordinates for the contents of a picture
• A spreadsheet, such as Gnumeric, that parses XML to find numbers and func-
tions used in a calculation
• A personal finance program, such as Microsoft Money, that sees XML as a
bank statement
• A syndication program that reads the XML document and extracts the head-
lines for today's news
• A program that you wrote in Java, C, Python, or some other language that
does exactly what you want it to do
• Almost anything else
XML is an extremely flexible format for data. It can be used in all of these
scenarios and many more. These examples are real. In theory, any data that can
be stored in a computer can be stored in XML format. In practice, XML is suitable
for storing and exchanging any data that can be plausibly encoded as text. Its use
is unsuitable only for multimedia data, such as photographs, recorded sound,
video, and other very large bit sequences.
The Evolution of XML
XML is a descendant of the Standard Generalized Markup Language (SGML). The
language that would eventually become SGML was invented by Charles Goldfarb,
Ed Mosher, and Ray Lorie at IBM in the 1970s and developed by several hundred
people around the world until its eventual adoption as ISO standard 8879 in 1986.
SGML was intended to solve many of the same problems XML solves. It was and is
a semantic and structural markup language for text documents. SGML is extremely
powerful and achieved some success in the U.S. military and government, the
aerospace sector, and other domains that needed ways of efficiently managing
technical documents that were tens of thousands of pages long.
SGML's biggest success was HTML, an SGML application. However, HTML is just
one SGML application. It does not have or offer the full power of SGML. Since
HTML restricts authors to a finite set of tags designed to describe web pages in a
fairly presentationally oriented way, it's really little more than a traditional markup
language that has been adopted by web browsers. It simply doesn't lend itself to
use beyond the single application of web page design. You would not use HTML
to exchange data between incompatible databases or send updated product cata-
logs to retailer sites, for example. HTML is useful for creating web pages, but it
isn't capable of much more than that.
The obvious choice for other applications that took advantage of the Internet, but
were not simple web pages, was SGML. SGML's main problem is its complexity.
The official SGML specification is more than 150 very technical pages. It covers
many special cases and unlikely scenarios. It is so complex that almost no soft-
ware has ever implemented it fully. Programs that implemented or relied on
different subsets of SGML were often incompatible with one another. The special
feature one program considered essential would be considered extraneous fluff
and omitted by the next program.
In 1996 Jon Bosak, Tim Bray, C. M. Sperberg-McQueen, James Clark, and several
others began work on a "lite" version of SGML. This version retained most of
SGML's power, but trimmed many features that were redundant, too complicated
to implement, confusing to end users, or that had simply not been proven useful
over the previous 20 years of experience with SGML. The result, in February 1998,
was XML 1.0, and it was an immediate success. Many developers who knew they
needed a structural markup language but couldn't bring themselves to accept
SGML's complexity adopted XML wholeheartedly. It was ultimately used in
domains ranging from legal court filings to hog farming.
However, XML 1.0 was just the beginning. The next standard out of the gate was
Namespaces in XML, an effort to allow conflict-free use of markup from different
XML applications in the same document. A web page about books, for example,
could have a title element that referred to the page's title and title elements
that referred to the book's title, and the two would not conflict.
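The two kinds of title element can be told apart as in the sketch below, which uses Python's standard xml.etree.ElementTree module. Both namespace URIs and the pg and bk prefixes are hypothetical, chosen only for illustration.

```python
import xml.etree.ElementTree as ET

# Both namespace URIs are hypothetical.
page_text = """<page xmlns="http://example.com/ns/page"
      xmlns:bk="http://example.com/ns/book">
  <title>Books I Have Read</title>
  <bk:book><bk:title>XML in a Nutshell</bk:title></bk:book>
</page>"""

# Prefixes used for lookup; they need not match those in the document.
ns = {"pg": "http://example.com/ns/page", "bk": "http://example.com/ns/book"}
page = ET.fromstring(page_text)

# The page's own title lives in the page namespace...
page_title = page.findtext("pg:title", namespaces=ns)
# ...while book titles live in the book namespace.
book_titles = [t.text for t in page.iterfind(".//bk:title", namespaces=ns)]

print(page_title)
print(book_titles)
```

Because each title is qualified by a different namespace URI, a search for one never picks up the other, even though both share the local name title.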
The Extensible Stylesheet Language, an XML application that transforms other XML
documents into a form that is viewable in web browsers, was the next development.
This language soon split into XSL Transformations (XSLT) and XSL
Formatting Objects (XSL-FO). XSLT has become a general-purpose language for
transforming one XML document into another for web page display and other
purposes. XSL-FO is an XML application that describes the layout of both printed
and web pages. This application rivals PostScript for its power and expressiveness.
However, XSL is not the only option for styling XML documents. The Cascading
Stylesheet Language (CSS) was already in use for HTML documents when XML
was invented, and it was a reasonable fit to XML, as well. With the advent of CSS
Level 2, the W3C made styling XML documents an explicit goal for CSS and gave it
equal importance to HTML. The preexisting Document Style Semantics and Specification
Language (DSSSL) was also adopted from its roots in the SGML world to style XML
documents for print and use on the Web.
The Extensible Linking Language (XLL) defined more powerful linking constructs
that could connect XML documents in a hypertext network, vastly overpowering
HTML's A tag. It also divided into two separate standards: XLink, which described
connections between documents, and XPointer, which addressed the individual
parts of an XML document. At this point, it was noticed that both XPointer and
XSLT were developing fairly sophisticated, yet incompatible, syntaxes to do exactly
the same thing: identify particular elements of an XML document. Consequently,
the addressing parts of both specifications were split off and combined into a third
specification, XPath.
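To give a flavor of what identifying particular elements means, the sketch below uses the limited subset of XPath syntax supported by Python's standard xml.etree.ElementTree module; it is not a full XPath implementation. The catalog wrapper and the second product are hypothetical additions to the data from Example 1-1.

```python
import xml.etree.ElementTree as ET

# The catalog element and the second product are invented for this sketch.
catalog = ET.fromstring("""<catalog>
  <product barcode="2394287410">
    <name>DataLife MF 2HD</name><color>black</color>
  </product>
  <product barcode="0000000001">
    <name>Sample Widget</name><color>red</color>
  </product>
</catalog>""")

# Every name element that is a child of a product.
all_names = [n.text for n in catalog.findall("./product/name")]

# The product whose color child has the text "black".
black = catalog.find("./product[color='black']")

# The name of the product with a particular barcode attribute.
by_code = catalog.find("./product[@barcode='0000000001']/name")

print(all_names)
print(black.get("barcode"))
print(by_code.text)
```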
A similar phenomenon occurred when it was noticed that XML 1.0, XSLT, XML
Schemas, and the Document Object Model (DOM) all had similar, but subtly
different, conceptual models of the structure of an XML document. For instance,
XML 1.0 considers a document's root element as its root, while XSLT uses a more
abstract root that includes the root element and several other pieces. Thus the
W3C XML Core Working Group began work on an XML Information Set that all
these standards could rely on and refer to.
Another piece of the puzzle was a uniform interface for accessing the contents of
the XML document from inside a Java, JavaScript, or C++ program. The simplest
API was to merely treat the document as an object that contained other objects.
Indeed, work was already underway inside and outside the W3C to define such a
Document Object Model for HTML. Expanding this effort to cover XML was not
difficult.
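The idea of a document as an object containing other objects can be sketched with xml.dom.minidom, a small DOM implementation in Python's standard library. The sample document is a shortened form of Example 1-1; the variable names are our own.

```python
from xml.dom import minidom

dom = minidom.parseString(
    '<product barcode="2394287410">'
    '<name>DataLife MF 2HD</name><color>black</color>'
    '</product>')

root = dom.documentElement  # the product element, itself an object

# Each child element is another object hanging off the root.
child_names = [node.tagName for node in root.childNodes
               if node.nodeType == node.ELEMENT_NODE]

# Text is held in a text node beneath the element that contains it.
name_text = root.getElementsByTagName("name")[0].firstChild.data

# Attributes are reached through the element object as well.
barcode = root.getAttribute("barcode")

print(child_names, name_text, barcode)
```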
Outside the W3C, Peter Murray-Rust, David Megginson, Tim Bray, and other
members of the xml-dev mailing list recognized that XML parsers, while all
compatible in the documents they could parse, were incompatible in their APIs.
This observation led to the development of the Simple API for XML, SAX. SAX2
was released in 2000 to add greater configurability, namespace support, and
several optional features.
Development of extensions to the core XML specification continues. Future direc-
tions include:
XFragment
An effort to make sense out of XML document pieces that may not be consid-
ered well-formed documents in isolation.
XML Schemas
An XML application that can describe the allowed content of documents
conforming to a particular XML vocabulary.
XHTML
A reformulation of HTML as a well-formed, modular, potentially valid XML
application.
XML Query Language
A language for finding the elements in a document that meet specified
criteria.
Canonical XML
A standard algorithm used for determining whether two XML documents are
the same after throwing away insignificant details, such as whether single or
double quotes are used around attribute values.
XML Signatures
A standard means of digitally signing XML documents, embedding signatures
in XML documents, and authenticating the resulting documents.
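The Canonical XML entry in the list above can be illustrated with the canonicalize function in Python's standard xml.etree.ElementTree module (Python 3.8 and later), which implements the C14N 2.0 version of Canonical XML. This is only a sketch; the two sample strings are invented and differ in quote style, attribute order, spacing, and empty-element syntax.

```python
from xml.etree.ElementTree import canonicalize

# Hypothetical documents that differ only in insignificant details.
single = "<product barcode='2394287410' color='black'/>"
double = '<product color="black"   barcode="2394287410"></product>'

# A document with genuinely different content.
other = '<product color="red" barcode="2394287410"/>'

same = canonicalize(single) == canonicalize(double)
different = canonicalize(single) != canonicalize(other)

print(same, different)
```

Comparing the canonical forms rather than the raw text is what lets the first two documents count as the same.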
Many new extensions of XML remain to be invented. XML has proven itself a solid
foundation for many other technologies.