By now, you have done a lot of experimenting with the nonvalidating
parser. It's time to have a look at the validating parser to find out
what happens when you use it to parse the sample presentation.
You need to understand two things about the validating parser at the outset:
- A schema or document type definition (DTD) is required.
- Because the schema or DTD is present, the
ignorableWhitespace method is invoked whenever possible.
Configuring the Factory
The first step is to modify the Echo program so that it uses the validating parser instead of the nonvalidating parser.
Note: The code in this section is contained in Echo10.java.
To use the validating parser, make the following highlighted changes:
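A minimal sketch of that change, assuming the factory is obtained with SAXParserFactory.newInstance() as in the earlier Echo examples. The class and method names below are illustrative and are not taken from Echo10.java:

```java
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;

class ValidatingFactorySketch {
    // Builds a parser from a factory that has validation switched on.
    static SAXParser newValidatingParser() throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setValidating(true);  // the change this section describes
        return factory.newSAXParser();
    }

    public static void main(String[] args) throws Exception {
        SAXParser parser = newValidatingParser();
        System.out.println("validating: " + parser.isValidating());
    }
}
```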
Here, you configure the factory so that it will produce a validating parser when newSAXParser is invoked. To configure it to return a namespace-aware parser, you can also use setNamespaceAware(true).
Sun's implementation supports any combination of configuration options.
(If a combination is not supported by a particular implementation, it
is required to generate a factory configuration error.)
Validating with XML Schema
Although a full treatment of
XML Schema is beyond the scope of this tutorial, this section shows you
the steps you take to validate an XML document using an existing schema
written in the XML Schema language. (To learn more about XML Schema,
you can review the online tutorial, XML Schema Part 0: Primer, at http://www.w3.org/TR/xmlschema-0/.
You can also examine the sample programs that are part of the JAXP
download. They use a simple XML Schema definition to validate personnel
data stored in an XML file.)
Note: There are multiple
schema-definition languages, including RELAX NG, Schematron, and the
W3C "XML Schema" standard. (Even a DTD qualifies as a "schema,"
although it is the only one that does not use XML syntax to describe
schema constraints.) However, "XML Schema" presents us with a
terminology challenge. Although the phrase "XML Schema schema" would be
precise, we'll use the phrase "XML Schema definition" to avoid the
appearance of redundancy.
To be notified of validation errors in an XML document, the parser
factory must be configured to create a validating parser, as shown in
the preceding section. In addition, the following must be true:
- The appropriate properties must be set on the SAX parser.
- The appropriate error handler must be set.
- The document must be associated with a schema.
Setting the SAX Parser Properties
It's helpful to start by defining the constants you'll use when setting the properties:
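One way to write those constants is sketched below. The two URI values are the standard JAXP 1.2 property name and the W3C XML Schema identifier; the enclosing class name is only a placeholder for this example:

```java
class SchemaConstants {
    // JAXP 1.2 property that selects the schema language.
    static final String JAXP_SCHEMA_LANGUAGE =
        "http://java.sun.com/xml/jaxp/properties/schemaLanguage";

    // Identifier for the W3C XML Schema language.
    static final String W3C_XML_SCHEMA =
        "http://www.w3.org/2001/XMLSchema";
}
```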
Next, you configure the parser factory to generate a parser that is namespace-aware as well as validating:
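A short sketch of that configuration, using the same assumed factory setup as before (the class and method names are illustrative):

```java
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;

class NamespaceAwareFactorySketch {
    static SAXParser newParser() throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);  // needed for schema validation
        factory.setValidating(true);
        return factory.newSAXParser();
    }
}
```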
You'll learn more about namespaces in Validating with XML Schema.
For now, understand that schema validation is a namespace-oriented
process. Because JAXP-compliant parsers are not namespace-aware by
default, it is necessary to set the property for schema validation to
work.
The last step is to configure
the parser to tell it which schema language to use. Here, you use the
constants you defined earlier to specify the W3C's XML Schema language:
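As a sketch, the call could look like the following. The variable name saxParser and the wrapper class are assumptions made for this example; the constants are repeated so the block stands alone:

```java
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;

class SchemaLanguageSketch {
    static final String JAXP_SCHEMA_LANGUAGE =
        "http://java.sun.com/xml/jaxp/properties/schemaLanguage";
    static final String W3C_XML_SCHEMA =
        "http://www.w3.org/2001/XMLSchema";

    static SAXParser newSchemaParser() throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        factory.setValidating(true);
        SAXParser saxParser = factory.newSAXParser();
        // Tell the parser to validate with the W3C XML Schema language.
        saxParser.setProperty(JAXP_SCHEMA_LANGUAGE, W3C_XML_SCHEMA);
        return saxParser;
    }
}
```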
In the process, however, there is an extra error to handle. You'll take a look at that error next.
Setting Up the Appropriate Error Handling
In addition to the error
handling you've already learned about, there is one error that can
occur when you are configuring the parser for schema-based validation.
If the parser is not JAXP 1.2-compliant and therefore does not support XML
Schema, it can throw a SAXNotRecognizedException.
To handle that case, you wrap the setProperty() statement in a try/catch block, as shown in the code highlighted here:
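A hedged sketch of that try/catch block follows. The helper method, its boolean return value, and the message text are illustrative choices rather than the tutorial's own listing:

```java
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXNotRecognizedException;
import org.xml.sax.SAXNotSupportedException;

class SchemaPropertyGuardSketch {
    static final String JAXP_SCHEMA_LANGUAGE =
        "http://java.sun.com/xml/jaxp/properties/schemaLanguage";
    static final String W3C_XML_SCHEMA =
        "http://www.w3.org/2001/XMLSchema";

    // Returns true if the parser accepted the schema-language property.
    static boolean enableSchemaValidation(SAXParser saxParser)
            throws SAXNotSupportedException {
        try {
            saxParser.setProperty(JAXP_SCHEMA_LANGUAGE, W3C_XML_SCHEMA);
            return true;
        } catch (SAXNotRecognizedException x) {
            // Raised when the underlying parser does not know the property.
            System.err.println("Parser does not recognize " + JAXP_SCHEMA_LANGUAGE);
            return false;
        }
    }

    // Namespace-aware, validating parser, as configured in the previous step.
    static SAXParser newParser() throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        factory.setValidating(true);
        return factory.newSAXParser();
    }
}
```

Returning a boolean lets the caller decide whether to continue without schema validation or to stop.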
Associating a Document with a Schema
Now that the program is ready to validate the data using an XML Schema
definition, it is only necessary to ensure that the XML document is
associated with one. There are two ways to do that:
- By including a schema declaration in the XML document
- By specifying the schema to use in the application
Note: When the application specifies the schema to use, it overrides any schema declaration in the document.
To specify the schema definition in the document, you create XML such as this:
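The sketch below holds such a document in a Java string and reads the schema hint back with a namespace-aware parser. The element name documentRoot and the file name YourSchemaDefinition.xsd are placeholders, and the helper class is an assumption made for this example:

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class SchemaDeclarationSketch {
    static final String XSI_NS = "http://www.w3.org/2001/XMLSchema-instance";

    // Root element carrying the two attributes: the xsi prefix declaration
    // and the schema location for elements without a namespace prefix.
    static final String DOCUMENT =
        "<documentRoot\n"
      + "    xmlns:xsi=\"http://www.w3.org/2001/XMLSchema-instance\"\n"
      + "    xsi:noNamespaceSchemaLocation=\"YourSchemaDefinition.xsd\">\n"
      + "</documentRoot>\n";

    // Parses DOCUMENT without validating and returns the declared schema location.
    static String declaredSchema() throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        final String[] found = new String[1];
        factory.newSAXParser().parse(
            new InputSource(new StringReader(DOCUMENT)),
            new DefaultHandler() {
                @Override
                public void startElement(String uri, String localName,
                                         String qName, Attributes attributes) {
                    if (found[0] == null) {
                        found[0] = attributes.getValue(XSI_NS, "noNamespaceSchemaLocation");
                    }
                }
            });
        return found[0];
    }
}
```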
The first attribute defines the XML namespace (xmlns) prefix, xsi, which stands for XML Schema instance. The second attribute specifies the schema to use for elements in the document that do not have a namespace prefix, that is, for the elements you typically define in any simple, uncomplicated XML document.
Note: You'll learn about namespaces in Validating with XML Schema.
For now, think of these attributes as the "magic incantation" you use
to validate a simple XML file that doesn't use them. After you've
learned more about namespaces, you'll see how to use XML Schema to
validate complex documents that use them. Those ideas are discussed in Validating with Multiple Namespaces.
You can also specify the schema file in the application:
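A sketch of that approach, using the JAXP 1.2 schemaSource property. The tiny note schema, the temporary file, and the error-counting helper are hypothetical and exist only to make the example runnable:

```java
import java.io.File;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

class SchemaSourceSketch {
    static final String JAXP_SCHEMA_LANGUAGE =
        "http://java.sun.com/xml/jaxp/properties/schemaLanguage";
    static final String W3C_XML_SCHEMA =
        "http://www.w3.org/2001/XMLSchema";
    static final String JAXP_SCHEMA_SOURCE =
        "http://java.sun.com/xml/jaxp/properties/schemaSource";

    // Validates xml against schemaFile and returns the number of
    // validation errors reported to the handler.
    static int countErrors(String xml, File schemaFile) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        factory.setValidating(true);
        SAXParser saxParser = factory.newSAXParser();
        saxParser.setProperty(JAXP_SCHEMA_LANGUAGE, W3C_XML_SCHEMA);
        // The application, not the document, chooses the schema here.
        saxParser.setProperty(JAXP_SCHEMA_SOURCE, schemaFile);
        final int[] errors = {0};
        saxParser.parse(new InputSource(new StringReader(xml)), new DefaultHandler() {
            @Override
            public void error(SAXParseException e) {
                errors[0]++;
            }
        });
        return errors[0];
    }

    // Writes a small hypothetical schema to a temporary file for the demo.
    static File demoSchema() throws Exception {
        String xsd =
            "<xs:schema xmlns:xs=\"http://www.w3.org/2001/XMLSchema\">"
          + "<xs:element name=\"note\" type=\"xs:string\"/>"
          + "</xs:schema>";
        File file = File.createTempFile("demo", ".xsd");
        file.deleteOnExit();
        Files.write(file.toPath(), xsd.getBytes(StandardCharsets.UTF_8));
        return file;
    }
}
```

Note that the schema language is set before the schema source in this sketch; that is the order the steps are introduced in this section.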
Now that you know how to use an XML Schema definition, we'll turn to
the kinds of errors you can see when the application is validating its
incoming data. To do that, you'll use a document type definition (DTD)
as you experiment with validation.
Experimenting with Validation Errors
To see what happens when the XML document does not specify a DTD, remove the DOCTYPE statement from the XML file and run the Echo program on it.
Note: The output shown here is contained in Echo10-01.txt. (The browsable version is Echo10-01.html.)
The result you see looks like this:
Note: This message was
generated by the JAXP 1.2 libraries. If you are using a different
parser, the error message is likely to be somewhat different.
This message says that the root element of the document must match the element specified in the DOCTYPE
declaration. That declaration specifies the document's DTD. Because you
don't yet have one, its value is null. In other words, the message is
saying that you are trying to validate the document, but no DTD has
been declared, because no DOCTYPE declaration is present.
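To reproduce this situation outside the Echo program, the following sketch runs a validating parser over an in-memory document that has no DOCTYPE declaration and collects the reported errors. The helper class and the one-line sample document are assumptions, and the exact message text depends on the parser in use:

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

class MissingDoctypeSketch {
    // Returns the validation error messages reported while parsing xml.
    static List<String> validationErrors(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setValidating(true);
        final List<String> messages = new ArrayList<>();
        factory.newSAXParser().parse(
            new InputSource(new StringReader(xml)),
            new DefaultHandler() {
                @Override
                public void error(SAXParseException e) {
                    messages.add("line " + e.getLineNumber() + ": " + e.getMessage());
                }
            });
        return messages;
    }

    public static void main(String[] args) throws Exception {
        // No DOCTYPE declaration, so there is no DTD to validate against.
        for (String message : validationErrors("<slideshow><slide/></slideshow>")) {
            System.out.println(message);
        }
    }
}
```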
So now you know that a DTD is a requirement for a valid document. That
makes sense. What happens when you run the parser on your current
version of the slide presentation, with the DTD specified?
Note: The output shown here is produced using slideSample07.xml, as described in Referencing Binary Entities. The output is contained in Echo10-07.txt. (The browsable version is Echo10-07.html.)
This time, the parser gives a different error message:
This message says that the element found at line 29 (<item>) does not match the definition of the <slide> element in the DTD. The error occurs because the definition says that the slide element requires a title. That element is not optional, and the copyright slide does not have one. To fix the problem, add a question mark to make title an optional element:
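The effect of the question mark can be checked with a small sketch. It validates a one-slide document against an internal DTD subset whose slide content model is passed in as a string. The simplified DTD and the helper class are assumptions made for this example and are not the tutorial's slideshow DTD:

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

class OptionalTitleSketch {
    // Counts validation errors for a slide that has an item but no title.
    static int countErrors(String slideModel) throws Exception {
        String xml =
            "<?xml version=\"1.0\"?>\n"
          + "<!DOCTYPE slideshow [\n"
          + "  <!ELEMENT slideshow (slide+)>\n"
          + "  <!ELEMENT slide " + slideModel + ">\n"
          + "  <!ELEMENT title (#PCDATA)>\n"
          + "  <!ELEMENT item (#PCDATA)>\n"
          + "]>\n"
          + "<slideshow>\n"
          + "  <slide><item>Copyright notice</item></slide>\n"
          + "</slideshow>\n";
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setValidating(true);
        final int[] errors = {0};
        factory.newSAXParser().parse(
            new InputSource(new StringReader(xml)),
            new DefaultHandler() {
                @Override
                public void error(SAXParseException e) {
                    errors[0]++;
                }
            });
        return errors[0];
    }

    public static void main(String[] args) throws Exception {
        System.out.println("title required: " + countErrors("(title, item*)") + " error(s)");
        System.out.println("title optional: " + countErrors("(title?, item*)") + " error(s)");
    }
}
```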
Now what happens when you run the program?
Note: You could also remove the copyright slide, producing the same result shown next, as reflected in Echo10-06.txt. (The browsable version is Echo10-06.html.)
The answer is that everything runs fine until the parser runs into the <em>
tag contained in the overview slide. Because that tag is not defined in
the DTD, the attempt to validate the document fails. The output looks
like this:
The error message identifies the part of the DTD that caused validation to fail. In this case it is the line that defines an item element as (#PCDATA | item).
As an exercise, make a copy of the file and remove all occurrences of <em>
from it. Can the file be validated now? (In the next section, you'll
learn how to define parameter entities so that you can use XHTML in the
elements you are defining as part of the slide presentation.)
Error Handling in the Validating Parser
It is important to recognize
that an exception is thrown when the file fails validation only
because of the error-handling code you entered in the
early stages of this tutorial. That code is reproduced here:
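A sketch of such a handler is shown below, together with a small helper that contrasts it with the default behavior. The class names, the helper, and the one-element sample document are assumptions made for this example:

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

class ThrowingErrorHandlerSketch {
    // Treats validation errors as fatal by rethrowing them.
    static class StrictHandler extends DefaultHandler {
        @Override
        public void error(SAXParseException e) throws SAXParseException {
            throw e;
        }
    }

    // Parses an invalid document (it has no DOCTYPE) with a validating
    // parser and reports whether the handler caused an exception.
    static boolean failsValidation(DefaultHandler handler) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setValidating(true);
        try {
            factory.newSAXParser().parse(
                new InputSource(new StringReader("<slideshow/>")), handler);
            return false;  // validation errors were ignored
        } catch (SAXException e) {
            return true;   // the handler turned the error into an exception
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("strict handler throws:  " + failsValidation(new StrictHandler()));
        System.out.println("default handler throws: " + failsValidation(new DefaultHandler()));
    }
}
```

The inherited DefaultHandler.error method does nothing, which is why the second call completes without an exception.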
If that exception is not
thrown, the validation errors are simply ignored. Try commenting out
the line that throws the exception. What happens when you run the
parser now?
In general, a SAX parsing error
is a validation error, although you have seen that it can also be
generated if the file specifies a version of XML that the parser is not
prepared to handle. Remember that your application will not generate a
validation exception unless you supply an error handler such as the one
here.