XML进阶:Level 1 - XML简介

By Rob Sheldon, 2014/03/26 (首次发表于: 2012/09/20)

关于系列

本文属于进阶系列:XML进阶

自2003年以来,XML一直是SQL标准的一部分,许多动态管理视图也返回XML数据,所以,对于每一个数据库管理员而言,同样是至关重要的。现在,这个行业更多地用于由文档标记定义的数据,数据库开发人员和数据库管理员比以往任何时候都更了解XML技术和使用它的方式。 在这一系列文章中,罗伯特·谢尔顿(Robert Sheldon)将尽力把复杂的东西简单化。 注意:这一系列的文章已经可作为电子书获取。


可扩展标记语言(XML)首先在SQL Server的SQL Server 2000发行版中提供支持。然而,与XML相关的功能仅限于专注于映射关系数据和XML数据的数据管理功能。例如,SQL Server 2000中添加了FOR XML子句,可以将关系查询结果作为XML返回。

但是到添加了XML数据类型的SQL Server 2005发布时,支持XML就变得有意义起来。 XML数据类型允许将XML文档本身存储在使用该类型配置的列和变量中。数据类型还支持一组可用于检索和修改XML文档中特定组件的方法。

为了充分利用SQL Server支持的与XML相关的功能,你可能会发现对XML本身有一个基本的了解是很有用的。为此,XML进阶系列的第一部分将介绍XML是什么,并对构成XML文档的各种组件进行介绍。

XML概述

和超文本标记语言(HTML)相似,XML也是一种标记语言,它使用标签来描述和这些标签相关联的数据的性质。使XML可扩展的是它的自描述性质,也就是创建特定于XML文档中包含的数据值的标签。在HTML中,这些标签是预定义的。 (通过XML组件,XML的可扩展性将变得更加清晰。)

尽管XML具有可扩展性,但它仍然是一种标准化语言,必须符合由万维网联盟(W3C)定义的特定格式规则集。由于这种标准化与用于显示数据的HTML不同,这种语言被广泛地用于传输和存储数据。

XML可以轻松地在异构系统之间共享数据,而不管硬件,操作系统或应用程序类型如何,XML的普遍应用意味着数据可以更少的进行人为干预处理。同时可以控制数据的描述、排列和显示方式。

XML 组件

构成XML文档的主要组件和支配这些组件使用的规则通常非常简单,但为了使XML文档能够被SQL Server XML解析器正确处理,你必须严格遵守一些规则。

XML文档中主要包含两种类型的信息:要存储的数据和描述数据的标签。 标签由一组尖括号(<>)组成,它们包含描述与标签相关联的数据的描述性词或复合词(无空格)。 正是由于这些标签的自描述性质,XML通常被认为是元语言。

每个离散的存储数据片段都被封装在开始标签和结束标签中,如下例所示:

<Person>John Doe</Person>

在这种情况下,开始标签是<Person>,结束标签是</ Person>。 请注意,正斜杠位于结束标记中的标记描述之前。 正斜杠必须在所有结束标签之前,但标签的语言必须与开头标签相同,上面的示例是Person。 我可以选择一个除Person之外的名称,包括与人无关的名称,但是一个好的做法是始终提供最能描述打开和关闭标签中包含的数据的标签名称。 在这种情况下,标签描述了一个人的名字John Doe,因此标签名称为<Person>

同样的,标签和封闭的数据代表一个单一的元素。 但是,元素并不总是包含数据。 一个空的元素可以以两种方式之一呈现。 第一个是通过指定打开和关闭标签,但不包括数据,如下例所示:

<Person></Person> 

表示一个空白元素的另一种方法是只使用一组括号,但仍包括正斜杠:

<Person />

再者,只有当元素不包含值时,才能使用这个方法。 以后你在XML进阶系列中可能看到模式需要一个没有值的元素。 在这种情况下,可以使用缩写格式来表示元素的两个标签。

无论元素是否包含值,只要使用两个标签,开始和关闭标签必须完全匹配,精确到大小写(除了结束标签中的正斜杠)。 例如,以下元素在SQL Server XML解析器中生成错误,因为两个标签不能匹配:

<person> John Doe </ Person>

开始标签中的描述性文字都是小写的; 然而,结束标签中的描述性词应以大写字母开头。 开始和结束标签必须匹配,以被认为是适当的或者格式良好的XML。

但是你可以将元素嵌入到彼此中。 在以下示例中,我将<Person>元素的两个实例嵌入到<People>元素中:

<People>
<Person>John Doe</Person>
<Person>Jane Doe</Person>
</People>

请注意,每个<Person>元素本身都是完整的。 它包括打开和关闭标签及其附带的数据。 嵌入其他元素中的元素称为子元素( child),或在某些情况下称为subelements。 外部元素(在这种情况下为<People>)是父元素。 XML文档最高级别的父元素被认为是根元素。 所有XML文档必须有且只有一个根元素。 因此,上面示例中的<People>元素是两个<Person>元素的父元素,它是XML文档的根元素。

SQL Server还允许将XML片段存储在XML列或变量中。 片段是没有根元素的XML代码块,例如以下示例中显示的两个元素:

<Person>John Doe</Person>
<Place>Seattle, WA</Place>

元素必须仍然是格式良好的XML,也就是具有包含数据的匹配标签,但它们不一定是XML文档。 像以后你将在XML进阶系列中看到的那样,你可以指定只允许在XML列或变量中使用XML文档,但现在只要知道SQL Server可以区分和存储XML文档和片段即可。

在其他元素中嵌入元素时,必须确保子元素在完成父元素之前已经完成。 例如,在以下示例中,我在<Person>元素之前结束了<People>元素,这会导致SQL Server XML解析器生成错误:

<People><Person>John Doe</People></Person>

你必须确保无论多少级别包含嵌入元素,你的子元素都是完整的。 在下面的示例中,<FirstName><LastName>元素嵌入在每个<Person>元素中,<Person>元素嵌入在<People>元素中:

<People>
<Person>
<FirstName>John</FirstName>
<LastName>Doe</LastName>
</Person>
<Person>
<FirstName>Jane</FirstName>
<LastName>Doe</LastName>
</Person>
</People>

在这种情况下,<Person>元素充当子元素和父元素。 但是请注意,每个嵌入的元素,无论级别如何,必须落在父元素的开始和结束标记之内。 例如,<FirstName><LastName>元素的第一个实例完全落在<Person>元素的第一个实例中,并且<Person>元素的两个实例完全落在<People>元素中,即 文档的根元素。

元素也可以具有与它们相关联的属性。 属性是可以赋值的属性。 该属性被定义为元素的开始标签的一部分。 在以下示例中,我已经将id属性添加到<Person>元素的每个实例中:

<People>
<Person id="1234">
<FirstName>John</FirstName>
<LastName>Doe</LastName>
</Person>
<Person id="5678">
<FirstName>Jane</FirstName>
<LastName>Doe</LastName>
</Person>
</People>

如示例所示,属性由属性名称(在这种情况下为id),后跟等号,属性值由双引号括起来。因此,<Person>元素的第一个实例的id属性的值为1234,而<Person>元素的第二个实例的id属性的值为5678。

许多XML文档中包含的另一个组件是声明,它至少指定了文档符合的XML标准的版本。到目前为止,只有两个版本:1.0和1.1。如果使用XML 1.0,声明是不必要的;但是XML 1.1需要声明。因此,你应该知道如何在XML文档中包含声明。

如果你包含声明,则必须将其放在文档的开头,用<?打开标签,并使用?>关闭标签结束。另外,你必须包含xml关键字(小写)和版本属性(也是小写)。通常包括的另一个属性(尽管是可选的)是编码,它指定用于XML文档的字符编码。在下面的示例中,我包括一个声明,它指定版本1.0和UTF-8的编码,这意味着数据被存储为8位Unicode字符序列:

<?xml version="1.0" encoding="UTF-8"?>
<People>
<Person id="1234">
<FirstName>John</FirstName>
<LastName>Doe</LastName>
</Person>
<Person id="5678">
<FirstName>Jane</FirstName>
<LastName>Doe</LastName>
</Person>
</People>

您还可以向XML文档添加注释。 要做到这一点,只需在<! - 标签之前注释,并将其结束于 - >标签,就像我在下面的例子中所做的那样:

<?xml version="1.0" encoding="UTF-8"?>
<!-- A list of current clients -->
<People>
<Person id="1234">
<FirstName>John</FirstName>
<LastName>Doe</LastName>
</Person>
<Person id="5678">
<FirstName>Jane</FirstName>
<LastName>Doe</LastName>
</Person>
</People>

正如你所看到的那样,我已经添加了评论当前客户端的列表,附加在注释标签中。 SQL Server XML解析器将忽略标签内的任何内容,因此你不仅可以使用注释功能提供有关XML文档及其数据的信息,还可以挂起保留不想作为文档进行处理的XML代码部分。

另一个使用XML时要考虑的是,当元素值出现时,某些字符将无法解析。 例如,您不能在元素的值中包含&符(&),就像我在以下示例中的<FavoriteBook>子元素中所做的那样:

<?xml version="1.0" encoding="UTF-8"?>
<!-- A list of current clients -->
<People>
<Person id="1234">
<FirstName>John</FirstName>
<LastName>Doe</LastName>
</Person>
<Person id="5678">
<FirstName>Jane</FirstName>
<LastName>Doe</LastName>
<FavoriteBook>Crime & Punishment</FavoriteBook>
</Person>
</People>

如果我尝试将此XML文档分配给XML列或变量,则<FavoriteBook>子元素将导致解析器生成错误,因为“Crime & Punishment”值包含&符号。 你必须使用实体引用替换此类型的字符,该实体引用会告知解析器保留原始打算的字符。 实体引用以&符号开头,以分号结尾,其间包含代表原始值的多字符代码。 对于&符号,实体参考应该是&amp;,我将在下面的例子中使用:

<?xml version="1.0" encoding="UTF-8"?>
<!-- A list of current clients -->
<People>
<Person id="1234">
<FirstName>John</FirstName>
<LastName>Doe</LastName>
</Person>
<Person id="5678">
<FirstName>Jane</FirstName>
<LastName>Doe</LastName>
<FavoriteBook>Crime &amp; Punishment</FavoriteBook>
</Person>
</People>

请注意,我已经用了&amp;实体参考。现在XML解析器处理<FavoriteBook>元素将没有任何问题。但是请注意,&符号不是唯一会产生错误的字符。 XML标准标识了应该用实体引用替换的五个字符,就像我在上面的例子中所做的那样:

  • 小于(<):替换为&lt;
  • 大于(>):替换为&gt;
  • 和号(&):替换&amp;
  • 撇号('):替换为'
  • 引号(“):替换为”

该示例提出的另一个问题是子元素不一定要从一个父实例到另一个父实例相同。你可以看到,<Person>元素的第一个实例只包含<FirstName><LastName>子元素,但<Person>元素的第二个实例包含<FirstName><LastName>子元素以及<FavoriteBook>元素。只要你的子元素形成良好,就可以包括描述和定义数据所需的任何元素。

总结

在这个级别,我们已经看到组成一个XML文档需要的主要组件。 元素作为所有XML文档的基本构建块,每个元素由开始标签和结束标签划定,数据值本身被包含在这两个标签之间。 元素可以彼此嵌入,但是一个元素 - 根 - 必须作为XML文档中所有其他元素的父元素。 元素还可以包括被定义为元素的开始标签的一部分的属性。

我们已经尽可能方便地了解了如何组织一个XML文档,但本课程的目的并不是培养如何创建这些类型的文档,而是提供XML的介绍以便大家能更有效地在SQL Server中使用XML工作 。 在下一节,我们将介绍如何在SQL Server中实现XML数据类型,以及如何将其分配给列和变量以便存储XML文档和XML片段。


原文

Stairway to XML: Level 1 - Introduction to XML

By Rob Sheldon, 2014/03/26 (first published: 2012/09/20)

The Series

This article is part of the Stairway Series: Stairway to >XML

XML has been part of the SQL Standard since 2003, and it is also essential for any DBA because so many of the dynamic management views return XML data. Now that the industry is more used to data defined by document markup, it is becoming more important than ever for Database Developers and DBAs to understand the technology and to know where it makes sense to use XML. In this series of articles, Robert Sheldon flexes his talent to make the complicated seem simple. Note: This series of articles is now available as an eBook.


Support for the eXtensible Markup Language (XML) was first introduced in SQL Server with the release of SQL Server 2000. However, XML-related features were limited to data management capabilities that focused on mapping relational and XML data. For example, SQL Server 2000 added the FOR XML clause, which lets you return relational query results as XML.

However, it wasn’t until the release of SQL Server 2005—when the XML data type was added—that support for XML got interesting. The XML data type lets you natively store XML documents in columns and variables configured with that type. The data type also supports a set of methods you can use to retrieve and modify specific components within the XML document.

To take full advantage of the XML-related features supported in SQL Server, you might find it useful to have a fundamental understanding of XML itself. To that end, this first Level of the Stairway to XML series explains what XML is and describes the various components that make up an XML document.

An Overview of XML

Similar to the HyperText Markup Language (HTML), XML is a markup language that uses tags to delineate and describe the nature of the data associated with those tags. What makes XML extensible is its self-describing nature, that is, you create tags that are specific to the data values contained in the XML document. In HTML, those tags are pre-defined. (XML’s extensible nature will become clearer as we work through the XML components.)

Despite its extensibility, XML is still a standardized language that must conform to a specific set of formatting rules, as defined by the World Wide Web Consortium (W3C). Because of this standardization, the language has been widely adopted in order to transport and store data, unlike HTML, which is used to display data. XML makes it possible to easily share data among heterogeneous systems, regardless of hardware, operating system, or application type, and XML’s universal adoption means that data can be processed with little human intervention. At the same time, you can control how the data is described, while also controlling how the data is ordered and displayed.

XML Components

The primary components that make up an XML document—and the rules that govern the use of >those components—are generally very straightforward, but you must adhere strictly to these rules in order for an XML document to be properly processed by the SQL Server XML parser.

There are primarily two types of information included in an XML document: the data to be stored and the tags that describe the data. A tag is made up of a set of angle brackets (< >) that enclose a descriptive word or compound word (no spaces) that describes the data associated with the tag. It’s because of the self-describing nature of these tags that XML is often considered a meta-language.

Each discrete piece of stored data is enclosed in an opening tag and a closing tag, as shown in the following example:

<Person>John Doe</Person>

In this case, the opening tag is <Person>, and the closing tag is </Person>. Notice that a forward slash precedes the tag description in the end tag. A forward slash must precede all end tags, but the language of the tag must be the same as the opening tag, which in the example above is Person. I could have chosen a name other than Person, including a name that has nothing to do with people, but a good practice is to always provide tag names that best describe the data enclosed in the opening and closing tags. In this case, the tags are describing the name of a person, John Doe, thus the tag name <Person>.

Together, the tags and enclosed data represent a single element. However, an element does not always have to contain data. An empty element can be rendered in one of two ways. The first is by specifying the opening and closing tags, but including no data, as I do in the following example:

<Person></Person> 

Another way to represent an empty element is use only one set of brackets, but still include the forward slash:

<Person />

Again, this method can be used only when an element contains no value. As you’ll see later in the Stairway to XML series, a schema might require an element for which there is no value. In that case, you can use the shortened format to represent the both tags of the element.

Whether or not an element contains a value, whenever both tags are used, the opening and closing tags must match exactly, down to the capitalization (except for the forward slash in the closing tag). For instance, the following element generates an error in the SQL Server XML parser because the case is different between the two tags:

<person>John Doe</Person> 

The descriptive word in the opening tag is all lowercase; however, the descriptive word in the closing tag begins with a capital letter. The opening and closing tags must match to be considered proper, or well formed, XML.

You can, however, embed elements within each other. In the following example, I embed two instances of the <Person> element within the <People> element:

<People>
<Person>John Doe</Person>
<Person>Jane Doe</Person>
</People> 

Notice that each <Person> element is complete in itself. It includes the opening and closing tags and the data they enclose. Elements embedded in other elements are referred to as child elements or, in some cases, subelements. The outer element, in this case, <People>, is the parent element. The parent element at the highest level of an XML document is considered the root element. All XML documents must have one, and only one, root element. So the <People> element in the example above is the parent element to the two <Person> elements, and it is the root element for the XML document.

SQL Server also permits you to store XML fragments in an XML column or variable. A fragment is a chunk of XML code without a root element, such as the two elements shown in the following example:

<Person>John Doe</Person>
<Place>Seattle, WA</Place> 

The elements must still be well formed XML, that is, have matching tags that enclose the data, but they don’t have to be an XML document. As you’ll see later in the Stairway to XML series, you can specify that only XML documents be permitted in an XML column or variable, but for now just know that SQL Server distinguishes between XML documents and fragments and can store both.

When you embed elements within other elements, you must ensure that the child elements are complete before you end the parent element. For instance, in the following example, I end the <People> element before the <Person> element, which causes the SQL Server XML parser to generate an error:

<People><Person>John Doe</People></Person> 

You must ensure that your child elements are complete no matter how many levels contain embedded elements. In this following example, the <FirstName> and <LastName> elements are embedded in each <Person> element, and the <Person> elements are embedded in the <People> element:

<People>
<Person>
<FirstName>John</FirstName>
<LastName>Doe</LastName>
</Person>
<Person>
<FirstName>Jane</FirstName>
<LastName>Doe</LastName>
</Person>
</People> 

In this case, the <Person> elements act as both child and parent elements. Notice, however, that each embedded element, regardless of the level, falls completely within the opening and closing tags of the parent element. For example, the first instances of the <FirstName> and <LastName> elements fall completely within the first instance of the <Person> element, and the two instances of the <Person> elements fall completely within the <People> element, which is the document’s root element.

Elements can also have attributes associated with them. An attribute is a property to which you can assign a value. The attribute is defined as part of the element’s opening tag. In the following example, I’ve added the id attribute to each instance of the <Person> element:

<People>
<Person id="1234">
<FirstName>John</FirstName>
<LastName>Doe</LastName>
</Person>
<Person id="5678">
<FirstName>Jane</FirstName>
<LastName>Doe</LastName>
</Person>
</People> 

As the example demonstrates, an attribute consists of the attribute name (in this case, id), followed by an equal sign and the attribute value, enclosed in double quotes. So the id attribute for the first instance of the <Person> element has a value of 1234, and the id attribute for the second instance of the <Person> element has a value of 5678.

Another component contained in many XML documents is the declaration, which at a minimum specifies the version of the XML standard that the document conforms to. To date, there are only two versions: 1.0 and 1.1. If using XML 1.0, the declaration is not necessary; however, XML 1.1 requires one. For that reason, you should be aware of how to include a declaration in your XML document.

If you include a declaration, you must place it at the beginning of the document, start the declaration with the <? opening tag, and end it with the ?> closing tag. In addition, you must include the xml keyword (lowercase) and the version attribute (also lowercase). Another attribute commonly included, although optional, is encoding, which specifies the character encoding used for the XML document. In the following example, I include a declaration that specifies version
1.0 and an encoding of UTF-8, which means the data is stored as a sequence of 8-bit Unicode characters:

<?xml version="1.0" encoding="UTF-8"?>
<People>
<Person id="1234">
<FirstName>John</FirstName>
<LastName>Doe</LastName>
</Person>
<Person id="5678">
<FirstName>Jane</FirstName>
<LastName>Doe</LastName>
</Person>
</People> 

You can also add comments to your XML documents. To do so, simply precede the comment with the <!-- tag and end it with the --> tag, as I’ve done in the following example:

<?xml version="1.0" encoding="UTF-8"?>
<!-- A list of current clients -->
<People>
<Person id="1234">
<FirstName>John</FirstName>
<LastName>Doe</LastName>
</Person>
<Person id="5678">
<FirstName>Jane</FirstName>
<LastName>Doe</LastName>
</Person>
</People> 

As you can see, I’ve added the comment A list of current clients, which I’ve enclosed in the comment tags. The SQL Server XML parser will ignore anything within the tags, so you can use the commenting feature not only to provide information about the XML document and its data, but also to preserve parts of the XML code that you want to hang on to but you don’t want to have processed as part of the document.

Another consideration when working with XML is that certain characters cannot be parsed when they appear in element values. For example, you cannot include an ampersand (&) in an element’s value, as I’ve done in the <FavoriteBook> child element in the following example:

<?xml version="1.0" encoding="UTF-8"?>
<!-- A list of current clients -->
<People>
<Person id="1234">
<FirstName>John</FirstName>
<LastName>Doe</LastName>
</Person>
<Person id="5678">
<FirstName>Jane</FirstName>
<LastName>Doe</LastName>
<FavoriteBook>Crime & Punishment</FavoriteBook>
</Person>
</People> 

If I try to assign this XML document to an XML column or variable, the <FavoriteBook> child element will cause the parser to generate an error because the value Crime & Punishment includes the ampersand. You must replace this type of character with an entity reference that tells the parser to preserve the character as it is originally intended. An entity reference begins with an ampersand and ends with a semi-colon and in between includes a multi-character code that represents the original value. For an ampersand, the entity reference should be &amp;, which I use in the following example:

<?xml version="1.0" encoding="UTF-8"?>
<!-- A list of current clients -->
<People>
<Person id="1234">
<FirstName>John</FirstName>
<LastName>Doe</LastName>
</Person>
<Person id="5678">
<FirstName>Jane</FirstName>
<LastName>Doe</LastName>
<FavoriteBook>Crime &amp; Punishment</FavoriteBook>
</Person>
</People>

Notice that I’ve replaced the ampersand with the &amp; entity reference. Now the XML parser will handle the <FavoriteBook> element with no problem. But note that the ampersand is not the only character that will generate an error. The XML standard identifies five characters that should be replaced with entity references, as I’ve done in the above example:

  • Less than (<): replace with <
  • Greater than (>): replace with >
  • Ampersand (&): replace with &
  • Apostrophe ('): replace with '
  • Quotation mark ("): replace with "

Another issue that the example raises is the fact that the child elements do not have to be the same from one parent instance to the next. As you can see, the first instance of the <Person> element contains only the <FirstName> and <LastName> child elements, but the second instance of the <Person> element contains the <FirstName> and <LastName> child elements, as well as the <FavoriteBook> element. As long as your child elements are well formed, you can include whatever elements necessary to delineate and define your data.

Summary

In this Level, we’ve looked at the primary components that make up an XML document. Elements serve as the basic building blocks for all XML documents, with each element being delineated by an opening tag and a closing tag and the data value itself being enclosed between those two tags. Elements can be embedded within each other, but one element—the root—must act as the parent to all other elements in an XML document. An element can also include attributes, which are defined as part of an element’s opening tag.

As handy as it might be to know how to put together an XML document, the purpose of this Level has not been to train you in how to create these types of documents, but rather to provide an introduction to XML so you can more effectively work with XML in SQL Server. In the next Level, we’ll look at how the XML data type is implemented in SQL Server and how it can be assigned to columns and variables in order to store both XML documents and XML fragments.


作者:Judenpech原文地址:https://segmentfault.com/a/1190000011294734

%s 个评论

要回复文章请先登录注册