Sunday, March 22, 2009

Extensible Markup Language

1. What is XML?

XML is the Extensible Markup Language. It improves the functionality of the Web by letting you identify your information in a more accurate, flexible, and adaptable way. It is extensible because it is not a fixed format like HTML (which is a single, predefined markup language). Instead, XML is actually a meta language (a language for describing other languages), which lets you design your own markup languages for limitless different types of documents. XML can do this because it’s written in SGML, the international standard meta language for text document markup (ISO 8879).

2. What is a markup language?

A markup language is a set of words and symbols for describing the identity of pieces of a document (for example ‘this is a paragraph’, ‘this is a heading’, ‘this is a list’, ‘this is the caption of this figure’, etc). Programs can use this with a style sheet to create output for screen, print, audio, video, Braille, etc.

Some markup languages (eg those used in word processors) only describe appearances (‘this is italics’, ‘this is bold’), but this method can only be used for display, and is not normally re-usable for anything else.

3. Where should I use XML?

Its goal is to enable generic SGML to be served, received, and processed on the Web in the way that is now possible with HTML. XML has been designed for ease of implementation and for interoperability with both SGML and HTML. Despite early attempts, browsers never allowed other SGML, only HTML (although there were plugins), and they allowed it (even encouraged it) to be corrupted or broken, which held development back for over a decade by making it impossible to program for it reliably. XML fixes that by making it compulsory to stick to the rules, and by making the rules much simpler than SGML.

But XML is not just for Web pages: in fact it’s very rarely used for Web pages on its own because browsers still don’t provide reliable support for formatting and transforming it. Common uses for XML include:
* Information identification: because you can define your own markup, you can define meaningful names for all your information items.
* Information storage: because XML is portable and non-proprietary, it can be used to store textual information across any platform. Because it is backed by an international standard, it will remain accessible and processable as a data format.
* Information structure: XML can therefore be used to store and identify any kind of (hierarchical) information structure, especially for long, deep, or complex document sets or data sources, making it ideal for an information-management back-end to serving the Web. This is its most common Web application, with a transformation system to serve it as HTML until such time as browsers are able to handle XML consistently.
* Publishing: the original goal of XML as defined in the quotation at the start of this section. Combining the three previous topics (identity, storage, structure) means it is possible to get all the benefits of robust document management and control (with XML) and publish to the Web (as HTML) as well as to paper (as PDF) and to other formats (eg Braille, Audio, etc) from a single source document by using the appropriate stylesheets.
* Messaging and data transfer: XML is also very heavily used for enclosing or encapsulating information in order to pass it between different computing systems which would otherwise be unable to communicate. By providing a lingua franca for data identity and structure, it provides a common envelope for inter-process communication (messaging).
* Web services: building on all of these, as well as its use in browsers, machine-processable data can be exchanged between consenting systems, where before it was only comprehensible by humans (HTML). Weather services, e-commerce sites, blog newsfeeds, AJaX sites, and thousands of other data-exchange services use XML for data management and transmission, and the Web browser for display and interaction.


4. Why is XML such an important development?

It removes two constraints which were holding back Web developments:
1. dependence on a single, inflexible document type (HTML) which was being much abused for tasks it was never designed for;
2. the complexity of full SGML, whose syntax allows many powerful but hard-to-program options.

XML allows the flexible development of user-defined document types. It provides a robust, non-proprietary, persistent, and verifiable file format for the storage and transmission of text and data both on and off the Web; and it removes the more complex options of SGML, making it easier to program for.

5. Describe the differences between XML and HTML.

It’s amazing how many developers claim to be proficient programming with XML, yet do not understand the basic differences between XML and HTML. Anyone with a fundamental grasp of XML should be able to describe some of the main differences outlined in the table below.

XML                                           HTML
User-definable tags                           Defined set of tags designed for web display
Content driven                                Format driven
End tags required for well-formed documents   End tags not required
Quotes required around attribute values       Quotes not required
Slash required in empty tags                  Slash not required

6. Describe the role that XSL can play when dynamically
generating HTML pages from a relational database.

Even if candidates have never participated in a project involving
this type of architecture, they should recognize it as one of the
common uses of XML. Querying a database and then formatting the
result set so that it can be validated as an XML document allows
developers to translate the data into an HTML table using XSLT rules.
Consequently, the format of the resulting HTML table can be modified
without changing the database query or application code since the
document rendering logic is isolated to the XSLT rules.
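
As a rough illustration of the first half of this architecture, the sketch below runs a query and wraps the result set in an XML document. Everything named here is an assumption made for the example: the in-memory SQLite database, the `products` table, and the `resultset`/`row` element names. The XSLT step that would turn the XML into an HTML table is not shown, because Python's standard library does not include an XSLT processor.

```python
import sqlite3
import xml.etree.ElementTree as ET


def rows_to_xml(conn, query, root_tag="resultset", row_tag="row"):
    """Run a query and wrap the result set in a simple XML document."""
    cursor = conn.execute(query)
    columns = [desc[0] for desc in cursor.description]
    root = ET.Element(root_tag)
    for record in cursor:
        row = ET.SubElement(root, row_tag)
        for column, value in zip(columns, record):
            # One child element per column, named after the column.
            # This assumes every column name is also a valid XML name.
            ET.SubElement(row, column).text = "" if value is None else str(value)
    return root


def demo():
    # Hypothetical sample data, held in memory only.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE products (name TEXT, price REAL)")
    conn.executemany("INSERT INTO products VALUES (?, ?)",
                     [("Widget", 9.5), ("Gadget", 12.0)])
    root = rows_to_xml(conn, "SELECT name, price FROM products ORDER BY name")
    conn.close()
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(demo())
```

Because the presentation rules would live in a separate stylesheet, the query and this code would stay the same if the layout of the final HTML table changed.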

7. What is SGML?

SGML is the Standard Generalized Markup Language (ISO 8879:1986),
the international standard for defining descriptions of the structure
of different types of electronic document.

SGML is very large, powerful, and complex. It has been in heavy
industrial and commercial use for nearly two decades, and there
is a significant body of expertise and software to go with it.
XML is a lightweight cut-down version of SGML which keeps enough
of its functionality to make it useful but removes all the optional
features which made SGML too complex to program for in a Web environment.

8. Aren’t XML, SGML, and HTML all the same thing?

Not quite; SGML is the mother tongue, and has been used for describing
thousands of different document types in many fields of human activity,
from transcriptions of ancient Irish manuscripts to the technical
documentation for stealth bombers, and from patients’ clinical records
to musical notation. SGML is very large and complex, however, and
probably overkill for most common office desktop applications.

XML is an abbreviated version of SGML, to make it easier to use
over the Web, easier for you to define your own document types,
and easier for programmers to write programs to handle them. It
omits all the complex and less-used options of SGML in return for
the benefits of being easier to write applications for, easier to
understand, and more suited to delivery and interoperability over
the Web. But it is still SGML, and XML files may still be processed
in the same way as any other SGML file (see the question on XML
software).
HTML is just one of many SGML or XML applications—the one
most frequently used on the Web.
Technical readers may find it more useful to think of XML as being
SGML-- rather than HTML++.

9. Who is responsible for XML?

XML is a project of the World Wide Web Consortium (W3C), and the
development of the specification is supervised by an XML Working
Group. A Special Interest Group of co-opted contributors and experts
from various fields contributed comments and reviews by email.
XML is a public format: it is not a proprietary development of any
company, although the membership of the WG and the SIG represented
companies as well as research and academic institutions. The v1.0
specification was accepted by the W3C as a Recommendation on Feb
10, 1998.

10. Why is XML such an important development?

It removes two constraints which were holding back Web developments:

1. dependence on a single, inflexible document type (HTML) which
was being much abused for tasks it was never designed for;
2. the complexity of full SGML, whose syntax allows
many powerful but hard-to-program options.
XML allows the flexible development of user-defined document types.
It provides a robust, non-proprietary, persistent, and verifiable
file format for the storage and transmission of text and data both
on and off the Web; and it removes the more complex options of SGML,
making it easier to program for.

11. Give a few examples of types of applications that can
benefit from using XML.

There are literally thousands of applications that can benefit
from XML technologies. The point of this question is not to have
the candidate rattle off a laundry list of projects that they have
worked on, but, rather, to allow the candidate to explain the rationale
for choosing XML by citing a few real world examples. For instance,
one appropriate answer is that XML allows content management systems
to store documents independently of their format, which thereby
reduces data redundancy. Another answer relates to B2B exchanges
or supply chain management systems. In these instances, XML provides
a mechanism for multiple companies to exchange data according to
an agreed upon set of rules. A third common response involves wireless
applications that require WML to render data on hand held devices.

12. What is DOM and how does it relate to XML?

The Document Object Model (DOM) is an interface specification maintained
by the W3C DOM Workgroup that defines an application independent
mechanism to access, parse, or update XML data. In simple terms
it is a hierarchical model that allows developers to manipulate
XML documents easily. Any developer that has worked extensively with
XML should be able to discuss the concept and use of DOM objects
freely. Additionally, it is not unreasonable to expect advanced
candidates to thoroughly understand its internal workings and be
able to explain how DOM differs from an event-based interface like
SAX.
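
The difference between the two interfaces can be sketched with the parsers in Python's standard library. The sample document and the `item` element name are made up for the illustration. The DOM version builds the whole tree and then navigates it; the SAX version collects the same data from events while the parser streams through the input.

```python
import xml.sax
from xml.dom import minidom

DOCUMENT = "<list><item>Chocolate</item><item>Music</item></list>"


def items_with_dom(text):
    """DOM: build the whole tree in memory, then navigate it."""
    dom = minidom.parseString(text)
    return [node.firstChild.data for node in dom.getElementsByTagName("item")]


class ItemHandler(xml.sax.ContentHandler):
    """SAX: react to events as the parser streams through the input."""

    def __init__(self):
        super().__init__()
        self.items = []
        self._inside_item = False
        self._buffer = []

    def startElement(self, name, attrs):
        if name == "item":
            self._inside_item = True
            self._buffer = []

    def characters(self, content):
        # Text may arrive in several chunks, so it is buffered.
        if self._inside_item:
            self._buffer.append(content)

    def endElement(self, name):
        if name == "item":
            self.items.append("".join(self._buffer))
            self._inside_item = False


def items_with_sax(text):
    handler = ItemHandler()
    xml.sax.parseString(text.encode("utf-8"), handler)
    return handler.items


if __name__ == "__main__":
    print(items_with_dom(DOCUMENT))
    print(items_with_sax(DOCUMENT))
```

The DOM route is usually more convenient when the document fits in memory and needs to be modified; the event-based route keeps memory use low for large inputs.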

13. What is SOAP and how does it relate to XML?

The Simple Object Access Protocol (SOAP) uses XML to define a protocol
for the exchange of information in distributed computing environments.
SOAP consists of three components: an envelope, a set of encoding
rules, and a convention for representing remote procedure calls.
Unless experience with SOAP is a direct requirement for the open
position, knowing the specifics of the protocol, or how it can be
used in conjunction with HTTP, is not as important as identifying
it as a natural application of XML.
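
To show how directly SOAP builds on XML, here is a minimal sketch that assembles a request envelope with Python's standard library. The envelope namespace is the SOAP 1.1 one; the application namespace, the `GetPrice` operation, and the `Symbol` parameter are hypothetical names invented for the example, and no message is actually sent.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
APP_NS = "http://example.org/stock"  # hypothetical application namespace


def build_request(symbol):
    """Build an envelope whose body carries one RPC-style call."""
    ET.register_namespace("soap", SOAP_NS)
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    # The call and its parameter are ordinary namespaced XML elements.
    call = ET.SubElement(body, f"{{{APP_NS}}}GetPrice")
    ET.SubElement(call, f"{{{APP_NS}}}Symbol").text = symbol
    return ET.tostring(envelope, encoding="unicode")


if __name__ == "__main__":
    print(build_request("XYZ"))
```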

14. Why not just carry on extending HTML?

HTML was already overburdened with dozens of interesting but incompatible
inventions from different manufacturers, because it provides only
one way of describing your information.
XML allows groups of people or organizations to create
their own customized markup applications for exchanging information
in their domain (music, chemistry, electronics, hill-walking, finance,
surfing, petroleum geology, linguistics, cooking, knitting, stellar
cartography, history, engineering, rabbit-keeping,
mathematics, genealogy, etc).
HTML is now well beyond the limit of its usefulness as a way of
describing information, and while it will continue to play an important
role for the content it currently represents, many new applications
require a more robust and flexible infrastructure.

15. Why should I use XML?

Here are a few reasons for using XML (in no particular order).
Not all of these will apply to your own requirements, and you may
have additional reasons not mentioned here (if so, please let the
editor of the FAQ know!).
* XML can be used to describe and identify information accurately
and unambiguously, in a way that computers can be programmed to
‘understand’ (well, at least manipulate as if they could
understand).

* XML allows documents which are all the same type to be created
consistently and without structural errors, because it provides
a standardized way of describing, controlling, or allowing/disallowing
particular types of document structure. [Note that this has absolutely
nothing whatever to do with formatting, appearance, or the actual
text content of your documents, only the structure of them.]
* XML provides a robust and durable format for information storage
and transmission. Robust because it is based on a proven standard,
and can thus be tested and verified; durable because it uses plain-text
file formats which will outlast proprietary binary ones.
* XML provides a common syntax for messaging systems for the exchange
of information between applications. Previously, each messaging
system had its own format and all were different, which made inter-system
messaging unnecessarily messy, complex, and expensive. If everyone
uses the same syntax it makes writing these systems much faster
and more reliable.
* XML is free. Not just free of charge (free as in beer) but free
of legal encumbrances (free as in speech). It doesn’t belong to
anyone, so it can’t be hijacked. And you don’t have to
pay a fee to use it (you can of course choose to use commercial
software to deal with it, for lots of good reasons, but you don’t
pay for XML itself).
* XML information can be manipulated programmatically (under machine
control), so XML documents can be pieced together from disparate
sources, or taken apart and re-used in different ways. They can
be converted into almost any other format with no loss of information.
* XML lets you separate form from content. Your XML file contains
your document information (text, data) and identifies its structure:
your formatting and other processing needs are identified separately
in a style sheet or processing system. The two are combined at output
time to apply the required formatting to the text or data identified
by its structure (location, position, rank, order, or whatever).

16. Can you walk us through the steps necessary to parse
XML documents?

Superficially, this is a fairly basic question. However, the point
is not to determine whether candidates understand the concept of
a parser but rather have them walk through the process of parsing
XML documents step-by-step. Determining whether a non-validating
or validating parser is needed, choosing the appropriate parser,
and handling errors are all important aspects to this process that
should be included in the candidate’s response.
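
One possible shape for such an answer, sketched in Python: the parser chosen here is the non-validating one in the standard library (`xml.etree.ElementTree`), and the function name and error message format are assumptions made for the illustration. A validating parser would need a third-party library.

```python
import xml.etree.ElementTree as ET


def parse_document(text):
    """Parse with a non-validating parser and report well-formedness errors."""
    try:
        root = ET.fromstring(text)
    except ET.ParseError as error:
        # The parser reports where the first error was detected.
        line, column = error.position
        return None, f"not well-formed at line {line}, column {column}"
    return root, None


if __name__ == "__main__":
    for sample in ("<list><item>Sugar</item></list>", "<list><item></list>"):
        root, problem = parse_document(sample)
        print(problem if root is None else f"root element: {root.tag}")
```

The three decisions named above are all visible here: which kind of parser is needed, which implementation to use, and what happens when the input cannot be parsed.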

17. Give some examples of XML DTDs or schemas that you
have worked with.

Although XML does not require data to be validated against a DTD,
many of the benefits of using the technology are derived from being
able to validate XML documents against business or technical architecture
rules. Polling for the list of DTDs that developers have worked
with provides insight to their general exposure to the technology.
The ideal candidate will have knowledge of several of the commonly
used DTDs such as FpML, DocBook, HRML, and RDF, as well as experience
designing a custom DTD for a particular project where no standard
existed.

18. Using XSLT, how would you extract a specific attribute
from an element in an XML document?


Successful candidates should recognize this as one of the most
basic applications of XSLT. If they are not able to construct a
reply similar to the example below, they should at least be able
to identify the components necessary for this operation: xsl:template
to match the appropriate XML element, xsl:value-of to select the
attribute value, and the optional xsl:apply-templates to continue
processing the document.

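Example 1. Extract Attributes from XML Data

    <xsl:template match="element-name">
      Attribute Value:
      <xsl:value-of select="@attribute"/>
      <xsl:apply-templates/>
    </xsl:template>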





19. When constructing an XML DTD, how do you create an
external entity reference in an attribute value?

Every interview session should have at least one trick question.
Although possible when using SGML, XML DTDs don’t support defining
external entity references in attribute values. It’s more important
for the candidate to respond to this question in a logical way than
for the candidate to know the somewhat obscure answer.

20. How would you build a search engine for large volumes
of XML data?

The way candidates answer this question may provide insight into
their view of XML data. For those who view XML primarily as a way
to denote structure for text files, a common answer is to build
a full-text search and handle the data similarly to the way Internet
portals handle HTML pages. Others consider XML as a standard way
of transferring structured data between disparate systems. These
candidates often describe some scheme of importing XML into a relational
or object database and relying on the database’s engine for searching.
Lastly, candidates that have worked with vendors specializing in
this area often say that the best way to handle this situation
is to use a third party software package optimized for XML data.

21. What is the difference between XML and C or C++ or
Java?

C and C++ (and other languages like FORTRAN, or Pascal, or Visual
Basic, or Java or hundreds more) are programming languages with
which you specify calculations, actions, and decisions to be carried
out in order:
    mod curconfig[if left(date,6) = "01-Apr",
        t.put "April Fool!",
        f.put days('31102005','DDMMYYYY') -
              days(sdate,'DDMMYYYY')
              " more shopping days to Samhain"];

XML is a markup specification language with which you can design
ways of describing information (text or data), usually for storage,
transmission, or processing by a program. It says nothing about
what you should do with the data (although your choice of element
names may hint at what they are for):

    <part update="2001-11-22">
      <name>Camshaft end bearing retention circlip</name>
      <image y="226"/> <maker>Ringtown Fasteners Ltd</maker>
      <notes>Angle-nosed insertion tool <tool id="GH25"/> is required for the removal
      and replacement of this part.</notes>
    </part>


On its own, an SGML or XML file (including HTML) doesn’t do anything.
It’s a data format which just sits there until you run a program
which does something with it.

22. Does XML replace HTML?

No. XML itself does not replace HTML. Instead, it provides an alternative
which allows you to define your own set of markup elements. HTML
is expected to remain in common use for some time to come, and the
current version of HTML is in XML syntax. XML is designed to make
the writing of DTDs much simpler than with full SGML. (See the question
on DTDs for what one is and why you might want one.)

23. Do I have to know HTML or SGML before I learn XML?


No, although it’s useful because a lot of XML terminology and practice
derives from two decades’ experience of SGML.
Be aware that ‘knowing HTML’ is not the same as ‘understanding
SGML’. Although HTML was written as an SGML application, browsers
ignore most of it (which is why so many useful things don’t work),
so just because something is done a certain way in HTML browsers
does not mean it’s correct, least of all in XML.

24. What does an XML document actually look like (inside)?


The basic structure of XML is similar to other applications of
SGML, including HTML. The basic components can be seen in the following
examples. An XML document starts with a Prolog:
1. The XML Declaration which specifies that this is an XML document;
2. Optionally a Document Type Declaration which identifies the type
of document and says where the Document Type Definition (DTD) is
stored.

The Prolog is followed by the document instance:
1. A root element, which is the outermost (top level) element (start-tag
plus end-tag) which encloses everything else: in the examples below
the root elements are conversation and titlepage;
2. A structured mix of descriptive or prescriptive elements enclosing
the character data content (text), and optionally any attributes
(‘name=value’ pairs) inside some start-tags.
XML documents can be very simple, with straightforward nested markup
of your own design:




    <?xml version="1.0"?>
    <conversation>
      <greeting>Hello, world!</greeting>
      <response>Stop the planet, I want to get off!</response>
    </conversation>


Or they can be more complicated, with a Schema or
Document Type Definition (DTD) or internal subset (local DTD changes
in [square brackets]), and an arbitrarily complex nested structure:


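    <?xml version="1.0"?>
    <!DOCTYPE titlepage
      SYSTEM "http://www.google.bar/dtds/typo.dtd"
      []>
    <titlepage>
      <title size="24/30">Hello, world!</title>
      <!-- decoration is hand-colored, presumably
           by the author -->
      <image type="URI" alignment="centered"/>
      <author style="italic">Vitam capias</author>
    </titlepage>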




Or they can be anywhere between: a lot will depend on how you want
to define your document type (or whose you use) and what it will
be used for. Database-generated or program-generated XML documents
used in e-commerce are usually unformatted (not for human reading)
and may use very long names or values, with multiple redundancy
and sometimes no character data content at all, just values in attributes:

    <ORDER-UPDATE
      ORDER-UPDATE-ISSUE="193E22C2-EAF3-11D9-9736-CAFC705A30B3"
      ORDER-UPDATE-DATE="2005-07-01T15:34:22.46"
      ORDER-UPDATE-DESTINATION="6B197E02-EAF3-11D9-85D5-997710D9978F"
      ORDER-UPDATE-ORDERNO="8316ADEA-EAF3-11D9-9955-D289ECBC99F3">
      <ORDER-UPDATE-ITEM ORDER-UPDATE-QUANTITY="2000"/>
    </ORDER-UPDATE>




25. How does XML handle white-space in my documents?

All white-space, including linebreaks, TAB characters, and normal
spaces, even between ‘structural’ elements where no
text can ever appear, is passed by the parser unchanged to the application
(browser, formatter, viewer, converter, etc), identifying the context
in which the white-space was found (element content, data content,
or mixed content, if this information is available to the parser,
eg from a DTD or Schema). This means it is the application’s responsibility
to decide what to do with such space, not the parser’s:
* insignificant white-space between structural elements (space which
occurs where only element content is allowed, ie between other elements,
where text data never occurs) will get passed to the application
(in SGML this white-space gets suppressed, which is why you can
put all that extra space in HTML documents and not worry about it)
* significant white-space (space which occurs within elements which
can contain text and markup mixed together, usually mixed content
or PCDATA) will still get passed to the application exactly as under
SGML. It is the application’s responsibility to handle it correctly.

The parser must inform the application that white-space has occurred
in element content, if it can detect it. (Users of SGML will recognize
that this information is not in the ESIS, but it is in the Grove.)

    <chapter>
      <title>
        My title for
        Chapter 1.
      </title>
      <para>
        text
      </para>
    </chapter>



In the example above, the application will receive all the pretty-printing
linebreaks, TABs, and spaces between the elements as well as those
embedded in the chapter title. It is the function of the application,
not the parser, to decide which type of white-space to discard and
which to retain. Many XML applications have configurable options
to allow programmers or users to control how such white-space is
handled.
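
A small sketch of this division of labour, using Python's standard-library parser on a made-up chapter document of the kind discussed above: the parser hands over every linebreak and space unchanged, and the application then decides to normalise the title. The normalisation rule (collapse runs of white-space to one space) is just one possible application policy.

```python
import xml.etree.ElementTree as ET

DOCUMENT = """<chapter>
  <title>
    My title for
    Chapter 1.
  </title>
</chapter>"""

root = ET.fromstring(DOCUMENT)
title = root.find("title")

# What the parser passes through, untouched:
raw_between_elements = root.text  # white-space between <chapter> and <title>
raw_title = title.text            # includes the pretty-printing linebreaks

# What this particular application chooses to do with it:
normalised_title = " ".join(raw_title.split())

if __name__ == "__main__":
    print(repr(raw_between_elements))
    print(repr(raw_title))
    print(repr(normalised_title))
```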

26. Which parts of an XML document are case-sensitive?


All of it, both markup and text. This is significantly different
from HTML and most other SGML applications. It was done to allow
markup in non-Latin-alphabet languages, and to obviate problems
with case-folding in writing systems which are caseless.
* Element type names are case-sensitive: you must follow whatever
combination of upper- or lower-case you use to define them (either
by first usage or in a DTD or Schema). So you can’t say <BODY>…</body>:
upper- and lower-case must match; thus <Img/>, <IMG/>,
and <img/> are three different element types;

* For well-formed XML documents with no DTD, the first occurrence
of an element type name defines the casing;
* Attribute names are also case-sensitive, for example the two width
attributes in <PIC width="7in"/> and <PIC WIDTH="6in"/>
(if they occurred in the same file) are separate attributes, because
of the different case of width and WIDTH;
* Attribute values are also case-sensitive. CDATA values (eg Url="MyFile.SGML")
always have been, but NAME types (ID and IDREF attributes, and token
list attributes) are now case-sensitive as well;
* All general and parameter entity names (eg A), and your
data content (text), are case-sensitive as always.
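
These rules are easy to check with any XML parser. The sketch below uses Python's standard library; the element and attribute names are made up for the test. A start-tag and end-tag that differ only in case do not match, and two attributes that differ only in case are kept as separate attributes.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# width and WIDTH are two different attribute names on the same element.
element = ET.fromstring('<pic width="7in" WIDTH="6in"/>')

if __name__ == "__main__":
    print(is_well_formed("<body>text</body>"))  # matching case
    print(is_well_formed("<BODY>text</body>"))  # mismatched case
    print(element.attrib)
```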

27. How can I make my existing HTML files work in XML?


Either convert them to conform to some new document type (with
or without a DTD or Schema) and write a stylesheet to go with them;
or edit them to conform to XHTML. It is necessary to convert existing
HTML files because XML does not permit end-tag minimisation (missing
</p>, etc), unquoted attribute values, and a number of other SGML shortcuts
which have been normal in most HTML DTDs. However, many HTML authoring
tools already produce almost (but not quite) well-formed XML.
You may be able to convert HTML to XHTML using Dave Raggett’s
HTML Tidy program, which can clean up some of the formatting mess
left behind by inadequate HTML editors, and even separate out some
of the formatting to a stylesheet, but there is usually still some
hand-editing to do.

28. Is there an XML version of HTML?

Yes, the W3C recommends using XHTML which is ‘a reformulation
of HTML 4 in XML 1.0’. This specification defines HTML as
an XML application, and provides three DTDs corresponding to the
ones defined by HTML 4.* (Strict, Transitional, and Frameset). The
semantics of the elements and their attributes are as defined in
the W3C Recommendation for HTML 4. These semantics provide the foundation
for future extensibility of XHTML. Compatibility with existing HTML
browsers is possible by following a small set of guidelines (see
the W3C site).

29. If XML is just a subset of SGML, can I use XML files
directly with existing SGML tools?

Yes, provided you use up-to-date SGML software which knows about
the WebSGML Adaptations TC to ISO 8879 (the features needed to support
XML, such as the variant form for EMPTY elements; some aspects of
the SGML Declaration such as NAMECASE GENERAL NO; multiple attribute
token list declarations, etc).
An alternative is to use an SGML DTD to let you create a fully-normalised
SGML file, but one which does not use empty elements; and then remove
the DocType Declaration so it becomes a well-formed DTDless XML
file. Most SGML tools now handle XML files well, and provide an
option switch between the two standards.

30. Can XML use non-Latin characters?

Yes, the XML Specification explicitly says XML uses ISO 10646,
the international standard character repertoire which covers most
known languages. Unicode is an identical repertoire, and the two
standards track each other. The spec says (2.2): ‘All XML
processors must accept the UTF-8 and UTF-16 encodings of ISO 10646…’.
There is a Unicode FAQ at http://www.unicode.org/faq/FAQ.
UTF-8 is an encoding of Unicode into 8-bit units: the first
128 characters are the same as ASCII, and everything else in Unicode
is encoded as a sequence of between 2 and 4 bytes (the original
definition allowed up to 6). UTF-8 in its single-octet form is therefore the same
as ISO 646 IRV (ASCII), so you can continue to use ASCII for English
or other languages using the Latin alphabet without diacritics.
Note that UTF-8 is incompatible with ISO 8859-1 (ISO Latin-1) after
code point 127 decimal (the end of ASCII).
UTF-16 is an encoding of Unicode into 16-bit units: characters up to
U+FFFF take one unit, and the 16 supplementary planes above U+FFFF
are reached with pairs of units (surrogates). UTF-16 is incompatible with ASCII because
it uses two 8-bit bytes per character (four bytes above U+FFFF).
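
The byte counts can be checked directly. The following sketch encodes four sample characters (chosen for the illustration: a plain ASCII letter, an accented Latin letter, the euro sign, and a musical symbol from beyond U+FFFF) and compares the results, including the mismatch with ISO 8859-1.

```python
# Sample characters: U+0041, U+00E9, U+20AC, U+1D11E
SAMPLES = ["A", "\u00e9", "\u20ac", "\U0001d11e"]


def encoded_lengths(character):
    """Return the number of bytes the character needs in each encoding."""
    return {
        "utf-8": len(character.encode("utf-8")),
        # Big-endian variant, so no byte-order mark is added to the count.
        "utf-16": len(character.encode("utf-16-be")),
    }


if __name__ == "__main__":
    for character in SAMPLES:
        print(f"U+{ord(character):04X}", encoded_lengths(character))
    # The same accented letter is one byte in ISO 8859-1 but two in UTF-8.
    print("\u00e9".encode("latin-1"), "\u00e9".encode("utf-8"))
```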

31. What’s a Document Type Definition (DTD) and where do
I get one?

A DTD is a description in XML Declaration Syntax of a particular
type or class of document. It sets out what names are to be used
for the different types of element, where they may occur, and how
they all fit together. (A Schema does the same thing
in XML Document Syntax, and allows more extensive data-checking.)

For example, if you want a document type to be able to describe
Lists which contain Items, the relevant part of your DTD might contain
something like this:

    <!ELEMENT List (Item)+>
    <!ELEMENT Item (#PCDATA)>


This defines a list as an element type containing one or more items
(that’s the plus sign); and it defines items as element types containing
just plain text (Parsed Character Data or PCDATA). Validators read
the DTD before they read your document so that they can identify
where every element type ought to come and how each relates to the
other, so that applications which need to know this in advance (most
editors, search engines, navigators, and databases) can set themselves
up correctly. The example above lets you create lists like:


    <List>
      <Item>Chocolate</Item>
      <Item>Music</Item>
      <Item>Surfing</Item>
    </List>



(The indentation in the example is just for legibility while editing:
it is not required by XML.)
A DTD provides applications with advance notice of what names and
structures can be used in a particular document type. Using a DTD
and a validating editor means you can be certain that all documents
of that particular type will be constructed and named in a consistent
and conformant manner.
DTDs are not required for processing well-formed
documents, but they are needed if you want to take advantage of
XML’s special attribute types like the built-in ID/IDREF cross-reference
mechanism; or the use of default attribute values; or references
to external non-XML files (’Notations’); or if you simply
want a check on document validity before processing.
There are thousands of DTDs already in existence in all kinds of
areas (see the SGML/XML Web pages for pointers). Many of them can
be downloaded and used freely; or you can write your own (see the
question on creating your own DTD). Old SGML DTDs need to be converted
to XML for use with XML systems: read the question on converting
SGML DTDs to XML, but most popular SGML DTDs are already available
in XML form.
The alternatives to a DTD are various forms of Schema.
These provide more extensive validation features than DTDs, including
character data content validation.
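
To make the idea of checking a document against a declared structure concrete, here is a hand-written check for the list example described earlier (a List containing one or more Items, each Item holding only text). It is only a stand-in for the illustration: Python's standard library has no validating parser, so the function `check_list` encodes the rules by hand rather than reading them from a DTD, and it ignores stray text directly inside List.

```python
import xml.etree.ElementTree as ET


def check_list(text):
    """Return a list of problems; an empty list means the document fits the model."""
    root = ET.fromstring(text)
    problems = []
    if root.tag != "List":
        problems.append(f"root element is {root.tag}, expected List")
        return problems
    if len(root) == 0:
        # Corresponds to the plus sign: at least one Item is required.
        problems.append("List must contain at least one Item")
    for child in root:
        if child.tag != "Item":
            problems.append(f"unexpected element {child.tag} inside List")
        elif len(child) > 0:
            # Corresponds to PCDATA: no further markup inside an Item.
            problems.append("Item may contain only character data")
    return problems


if __name__ == "__main__":
    print(check_list("<List><Item>Chocolate</Item><Item>Music</Item></List>"))
    print(check_list("<List/>"))
```

A real validator derives these checks from the DTD or Schema itself, which is why the same software can handle any document type.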

32. Does XML let me make up my own tags?

No, it lets you make up names for your own element types. If you
think tags and elements are the same thing you are already in considerable
trouble: read the rest of this question carefully.

33. How do I create my own document type?

Document types usually need a formal description, either a DTD
or a Schema. Whilst it is possible to process well-formed XML documents
without any such description, trying to create them without one
is asking for trouble. A DTD or Schema is used with an XML editor
or API interface to guide and control the construction of the document,
making sure the right elements go in the right places.
Creating your own document type therefore begins with an analysis
of the class of documents you want to describe: reports, invoices,
letters, configuration files, credit-card verification requests,
or whatever. Once you have the structure correct, you write code
to express this formally, using DTD or Schema syntax.

34. How do I write my own DTD?

You need to use the XML Declaration Syntax (very simple: declaration
keywords begin with <!). For example:

    <!ELEMENT Shopping-List (Item)+>
    <!ELEMENT Item (#PCDATA)>



It says that there shall be an element called Shopping-List and
that it shall contain elements called Item: there must be at least
one Item (that’s the plus sign) but there may be more than one.
It also says that the Item element may contain only parsed character
data (PCDATA, ie text: no further markup).
Because there is no other element which contains Shopping-List,
that element is assumed to be the ‘root’ element, which
encloses everything else in the document. You can now use it to
create an XML file: give your editor the declarations:

    <?xml version="1.0"?>
    <!DOCTYPE Shopping-List SYSTEM "shoplist.dtd">




(assuming you put the DTD in that file). Now your editor will let
you create files according to the pattern:


    <Shopping-List>
      <Item>Chocolate</Item>
      <Item>Sugar</Item>
      <Item>Butter</Item>
    </Shopping-List>


It is possible to develop complex and powerful DTDs of great subtlety,
but for any significant use you should learn more about document
systems analysis and document type design. See for example Developing
SGML DTDs: From Text to Model to Markup (Maler and el Andaloussi,
1995): this was written for SGML but perhaps 95% of it applies to
XML as well, as XML is much simpler than full SGML—see the
list of restrictions which shows what has been cut out.
Warning
Incidentally, a DTD file never has a DOCTYPE Declaration in it:
that only occurs in an XML document instance (it’s what references
the DTD). And a DTD file also never has an XML Declaration at the
top either. Unfortunately there is still software around which inserts
one or both of these.

35. Can a root element type be explicitly declared in the
DTD?

No. This is done in the document’s Document Type Declaration, not
in the DTD.

36. I keep hearing about alternatives to DTDs. What’s a
Schema?

The W3C XML Schema recommendation provides a means of specifying
formal data typing and validation of element content in terms of
data types, so that document type designers can provide criteria
for checking the data content of elements as well as the markup
itself. Schemas are written in XML Document Syntax, like XML documents
are, avoiding the need for processing software to be able to read
XML Declaration Syntax (used for DTDs).

The term ‘vocabulary’ is sometimes used to refer to
DTDs and Schemas together. Schemas are aimed at e-commerce, data
control, and database-style applications where character data content
requires validation and where stricter data control is needed than
is possible with DTDs; or where strong data typing is required.
They are usually unnecessary for traditional text document publishing
applications.
Unlike DTDs, Schemas cannot be specified in an XML Document Type
Declaration. They can be specified in a Namespace, where Schema-aware
software should pick it up, but this is optional:

    <invoice
      xmlns="http://example.org/ns/books/"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xsi:schemaLocation="http://acme.wilycoyote.org/xsd/invoice.xsd">
      ...
    </invoice>



More commonly, you specify the Schema in your processing software,
which should record separately which Schema is used by which XML
document instance.
In contrast to the complexity of the W3C Schema model, Relax NG
is a lightweight, easy-to-use XML schema language devised by James
Clark with development hosted by OASIS.
It allows similar richness of expression and the use of XML as its
syntax, but it provides an additional, simplified, syntax which
is easier to use for those accustomed to DTDs.
