
What is XML?

Sep 16th, 2003 09:05
Knud van Eeden, Michael Claßen


XML stands for Extensible Markup Language. It is an open standard for
creating structured documents. 
XML is a simplified version of SGML, a well-established standard for
structured documents in the publishing industry. 
Unlike plain text, an XML document contains a mixture of text and
nested sets of so-called tags. Tags are names enclosed in angle
brackets, for instance <name>Michael Claßen</name>. XML supports text
in various encodings, allowing for the creation and exchange of
documents in different international character sets.
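As a small illustration of tags and encodings, here is a minimal
sketch. It assumes Python with its standard library module
xml.etree.ElementTree (neither is mentioned in this entry), and the
<person> wrapper tag is made up. It reads the tagged name and writes
the same document in two different encodings.

```python
import xml.etree.ElementTree as ET

# A tiny document; <person> is a hypothetical wrapper tag.
doc = "<person><name>Michael Claßen</name></person>"

root = ET.fromstring(doc)
name = root.find("name").text      # the text between <name> and </name>

# The same tree serialized in two encodings (both results are bytes).
as_utf8 = ET.tostring(root, encoding="utf-8")
as_latin1 = ET.tostring(root, encoding="iso-8859-1")

# Reading the Latin-1 bytes back gives the same text again.
roundtrip = ET.fromstring(as_latin1).find("name").text

print(name)
print(roundtrip == name)
```

The bytes differ between the two encodings, but the text a program
gets back from the parser is the same.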
-----------------------------------------------------------------------
--- Knud van Eeden - 20 September 2020 - 21:25 ------------------------
The main goal of XML is to separate the data from the representation.
---
To achieve this goal, standardized ways to handle and store this
data have been developed.
---
The idea is that you store your data only once, and use it everywhere.
---
XML can be used to store data in a standardized format that can be
deciphered by a large number of tools.
---
Because of the standardized way the data is stored, easy exchange
of data (where the XML format plays an intermediary role), e.g.
between different databases of companies, is possible.
---
If the involved systems follow this XML standard, automatic exchange
(one of the goals when introducing XML) of data (e.g. via the Internet)
without the intervention of a human operator is a possibility.
---
If you look at how this data is stored -- in a file with the
extension .xml -- it looks very similar to plain HTML, with its tags
like <BODY> and <HTML>, but here you can by definition choose your
own tags. This is because both HTML (=HyperText Markup Language) and
XML (=eXtensible Markup Language) derive from SGML (=Standard
Generalized Markup Language): HTML is one fixed set of tags defined
with SGML, while XML is a simplified subset of SGML for defining your
own tags. (XHTML is the variant of HTML that is written in XML.)
---
To handle and show this data:
You typically have 3 files:
-one file containing your data with extension .xml
-one file describing this data with extension .xsd (or similarly .dtd)
-one file informing how to represent this data with extension .xsl
---
Here the function of these 3 files is:
-the .xml file is the INPUT
 (containing your variables (their names and values), similar to the
  DATA keyword in BASIC,
  or { mydata } in C++)
-the .xsd or .dtd file describes what the structure of this input data
 should look like (so it fixes the order, that is what comes after
 what, how many times (e.g. only 1 time or more) and the fixed set of
 tag names to be used)
-the .xsl file takes its input from the .xml file, and produces
 some (text) OUTPUT with it.
 (so what you basically do when writing this .xsl file is
 telling where these variable values can be found in the input .xml
 file).
So altogether it is just a special case of an INPUT-OUTPUT model.
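The INPUT-OUTPUT idea can be sketched in a few lines. This is only an
analogy, assuming Python and its standard library: the function
render() is a made-up stand-in for the role of the .xsl file, and the
tag names FILES, FILE, NAME and SIZE are hypothetical.

```python
import xml.etree.ElementTree as ET

# INPUT: the variable data (hypothetical tag names and values).
INPUT_XML = """
<FILES>
 <FILE><NAME>a.txt</NAME><SIZE>120</SIZE></FILE>
 <FILE><NAME>b.txt</NAME><SIZE>45</SIZE></FILE>
</FILES>
"""

def render(xml_text):
    """Plays the role of the .xsl file: fixed text plus looked-up values."""
    root = ET.fromstring(xml_text)
    lines = ["File overview:"]
    for f in root.findall("FILE"):
        lines.append(f.findtext("NAME") + " (" + f.findtext("SIZE") + " bytes)")
    return "\n".join(lines)

# OUTPUT: the generated text.
print(render(INPUT_XML))
```

To change a value you only touch INPUT_XML; to change the layout you
only touch render().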
---
The quick way to generate an input .xml and output .xsl file is:
1. Put the wanted original file (e.g. an HTML file) as a whole
   in the .xsl file
2. Then manually put everything in it which is variable (so the DATA,
   e.g. filenames, sizes, ...) in the .xml file.
   E.g. you work from the top to the bottom through your original
   .html file. Every time you find something in the .html file which
   might vary (e.g. filenames), put this in the .xml file, together
   with some tags in order to be able to find these variables.
   If you have to introduce new tags, adapt your .dtd file accordingly.
3. Then you tell in your .xsl file where to find this DATA
   (for example one step at a time, checking the new output result in
   your browser after each step), replacing the original DATA
   information in the .xsl file with the XPath to these variable
   values in the .xml file.
   ---
   This is similar to using e.g.
    READ DATA
   in BASIC, only that when using it in .xsl, you are using
   a specialized language for it.
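To get a feeling for how an XPath points at a value in the .xml file,
here is a minimal sketch. It assumes Python, whose standard library
module xml.etree.ElementTree understands a small subset of XPath; the
tag names PAGE, HEADING, FILE and NAME are made up for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical data file content: everything that may vary.
DATA = """
<PAGE>
 <HEADING>My downloads</HEADING>
 <FILE><NAME>a.zip</NAME></FILE>
 <FILE><NAME>b.zip</NAME></FILE>
</PAGE>
"""
root = ET.fromstring(DATA)

# Each path says where in the tree a value can be found.
heading = root.findtext("./HEADING")
all_names = [n.text for n in root.findall("./FILE/NAME")]
second = root.findtext("./FILE[2]/NAME")    # positions count from 1

print(heading, all_names, second)
```

In an .xsl file such paths appear in the match="..." and select="..."
attributes.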
---
I have used a similar approach many times before,
e.g. in BASIC, when writing programs which created other (text)
programs.
I just put everything that was variable in DATA statements,
then generated the output program text by using PRINT statements
(which got their information via READ DATA statements).
That worked just fine.
But it is just an ad hoc solution.
If you are using XML and XSL the idea is the same. It is more complex
to learn (because of e.g. this specialized XSL language), but what you
gain is that you adapt yourself to a state of the art, generally
accepted industry standard. This makes life much easier once you have
created the .xml, .xsl and .dtd files, as everything is clearly
separated into separate files that are rather independent of each
other. So if you have to change some DATA values, you just go to the
.xml file, and change them there only.
---
-You can then use e.g. XSLT (=eXtensible Stylesheet Language
Transformations) -- in a file, also in XML format, with the
extension .xsl -- to transform this data to the representation you
wish (most of the time HTML, but also e.g. PDF (=Portable Document
Format), TeX, ...).
---
-If you want to produce vector graphics (e.g. statistics presented via
line graphs, bar charts, pie charts, ...), you can use e.g. SVG
(=Scalable Vector Graphics) -- in a file, also in XML format, with the
extension .svg.
[Internet: see also:
http://downloads-zdnet.com.com/3120-20-0.html?qt=6wdsvg6&tg=dl-2001&SWLink=n]
---
-You can inform (others) about the structure of your data by storing
this information in a file with extension .dtd (=Document Type
Definition, which describes the data in a way similar to the
Backus-Naur form used in describing computer languages), or .xsd
(=XML Schema Definition). Both serve the same purpose, but the
advantage of using XSD is that it is itself written in XML format.
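That an XSD file is itself XML means an ordinary XML parser can read
it. The following is a minimal sketch, assuming Python and its
standard library. The schema text is a hypothetical example written
for this illustration (a list MYBOOKS of BOOK entries, each with an
AUTHOR and a TITLE); it is not taken from any tool.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"    # the XSD namespace

# Hypothetical schema: MYBOOKS holds one or more BOOK elements,
# each with an AUTHOR followed by a TITLE.
XSD = """
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
 <xs:element name="MYBOOKS">
  <xs:complexType><xs:sequence>
   <xs:element name="BOOK" maxOccurs="unbounded">
    <xs:complexType><xs:sequence>
     <xs:element name="AUTHOR" type="xs:string"/>
     <xs:element name="TITLE" type="xs:string"/>
    </xs:sequence></xs:complexType>
   </xs:element>
  </xs:sequence></xs:complexType>
 </xs:element>
</xs:schema>
"""

schema = ET.fromstring(XSD)
# Walk the schema like any other XML tree and list the declared names.
declared = [e.get("name") for e in schema.iter(XS + "element")]
print(declared)
```

A .dtd file has its own notation, so it cannot be read this way.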
---
To access all this information, you use e.g. JavaScript with the DOM
(=Document Object Model), which loads the whole document as a tree, or
a SAX (=Simple API for XML) style parser, which reports the tags one
by one while reading. Both come as libraries which contain prebuilt
methods to access and parse this data
(to debug this JavaScript, you can use the free JavaScript debugger
from Microsoft).
Further possibilities are e.g. using Java, C++ or Delphi (but the
parsing can be done in basically any computer language, though
sometimes with much more effort, as you might have to develop these
routines yourself).
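The difference between the DOM and the SAX way of working can be
sketched as follows. This assumes Python, whose standard library
happens to ship both styles (xml.dom.minidom and xml.sax); the tags
LIST and ITEM and the class name ItemCollector are made up.

```python
import xml.sax
from xml.dom import minidom

XML = "<LIST><ITEM>one</ITEM><ITEM>two</ITEM></LIST>"    # made-up data

# DOM style: load everything into a tree, then ask the tree.
dom = minidom.parseString(XML)
dom_items = [n.firstChild.data for n in dom.getElementsByTagName("ITEM")]

# SAX style: the parser calls these methods while it reads.
class ItemCollector(xml.sax.ContentHandler):
    def __init__(self):
        super().__init__()
        self.items = []
        self.inside = False

    def startElement(self, name, attrs):
        if name == "ITEM":
            self.inside = True
            self.items.append("")

    def characters(self, content):
        if self.inside:
            self.items[-1] += content    # text may arrive in pieces

    def endElement(self, name):
        if name == "ITEM":
            self.inside = False

collector = ItemCollector()
xml.sax.parseString(XML.encode("utf-8"), collector)
sax_items = collector.items

print(dom_items, sax_items)
```

The DOM version is shorter, while the SAX version never needs the
whole document in memory at once.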
---
Every XML document is basically a so-called 'tree', so you use methods
like create, add, remove, walk this tree, copy, ... to access your data
stored in that tree. In the DOM this tree consists of a total of 12
different types of nodes, e.g. element nodes (which can have
attributes) and text nodes.
Because all information is in a standard XML format, you can access,
change and handle all of it with one set of similar methods (e.g. the
XML file with your data, the XSD file describing this data, the XSLT
file creating a representation of this data, ... can all be read or
changed with one and the same tree traversing method, e.g. written in
JavaScript).
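The tree operations named above (create, add, copy, remove, walk) can
be tried out with a minimal sketch. It assumes Python and its standard
library module xml.etree.ElementTree instead of JavaScript; the tag
LIST, the tag ITEM and the attribute id are made up.

```python
import copy
import xml.etree.ElementTree as ET

# create the root of the tree
root = ET.Element("LIST")

# add two children, each with an attribute and some text
first = ET.SubElement(root, "ITEM", {"id": "1"})
first.text = "one"
second = ET.SubElement(root, "ITEM", {"id": "2"})
second.text = "two"

# copy a child, change the copy, and add it as well
clone = copy.deepcopy(second)
clone.set("id", "3")
root.append(clone)

# remove the first child again
root.remove(first)

# walk the whole tree, root first
visited = [(e.tag, e.get("id"), e.text) for e in root.iter()]
print(visited)
print(ET.tostring(root, encoding="unicode"))
```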
---
All these files (XML, DTD, XSD, XSL, XSLT, ...) are plain text files,
so you can easily read the content of these files and also create them
in almost any text editor or wordprocessor. You type the text in the
correct format, then you save it as plain text with the matching
extension (.xml, .dtd, .xsd, .xsl, ...).
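Such a file can also be written by a program instead of typed by
hand. A minimal sketch, assuming Python and its standard library; the
tag NOTE and the file name note.xml are made up, and the file is put
in a temporary directory.

```python
import os
import tempfile
import xml.etree.ElementTree as ET

root = ET.Element("NOTE")            # hypothetical tag name
root.text = "plain text inside"

# Save as plain text, with the encoding stated in the first line.
path = os.path.join(tempfile.mkdtemp(), "note.xml")
ET.ElementTree(root).write(path, encoding="utf-8", xml_declaration=True)

# The file is ordinary text and can be opened in any editor.
with open(path, encoding="utf-8") as fh:
    content = fh.read()
print(content)
```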
A good XML editor (XMLSpy v4.0) with a lot of extras can be found and a
trial version downloaded at http://www.xmlspy.com
---
A good book is 'Inside XML', by Steve Holzner -- see
http://www.amazon.com
---
A good overview of all functions which you can use in XML, XSL,
CSS, HTML, JavaScript, ..., to handle and represent your data can be
found at http://www.devguru.com (click on the left on e.g. 'XML', then
click on 'View index').
---
A typical simple minimum scenario when developing with XML is:
1. you have the browser Microsoft Internet Explorer v5.0 (preferably
   v6.0, as it contains the latest versions to handle XML) installed
2. you type the XML text, containing your data, in your favorite
   wordprocessor, and save it as plain text with extension .xml. In
   this file you put references, similar to HREF="..." in plain HTML,
   to the whereabouts of the other needed files (typically the XSD and
   XSL file, so that the browser knows where to look).
3. you type the XSD or DTD text, containing the description of your
   data, in your favorite wordprocessor, and save it as plain text
   with extension .xsd or .dtd
4. you type the XSL text, containing the representation of your data
   (typical output is an HTML file, which shows your data in a
   browser), and save it as plain text with extension .xsl.
5. If you have typed your information without errors (the format will
   be checked, and it will only run if found correct), you load this
   .xml file in your Microsoft Internet Explorer, and view the results.
6. To change and handle the XML information stored in these different
   files, you can e.g. create HTML files containing JavaScript, aided
   by the built-in DOM functions, then load these files in your
   browser and view the changes in the representation of your data.
---
Example:
Create a book database in XML:
I. Create an XML file containing the data itself
II. Create a DTD file containing the description of the structure of
    the data
III. Create an XSL file containing what has to be printed when data is
     found
To keep it simple, keep all these 3 files, book.xml, book.dtd and
book.xsl, in the same directory (e.g. c:\temp)
---
I. Create the XML file containing the data (type or copy the
   following in your favorite wordprocessor, and save it as MSDOS or
   ASCII text, e.g. as book.xml)
You are by definition free to create your own tags and structure, so
let us call one tag 'MYBOOKS', containing all my books, and the books
themselves 'BOOK', where each book has an 'AUTHOR' and a 'TITLE'.
So this becomes in XML:
<MYBOOKS>
 <BOOK>
  <AUTHOR>
  </AUTHOR>
  <TITLE>
  </TITLE>
 </BOOK>
</MYBOOKS>
---
When using XML, you are obliged to always use a begin and end tag
(e.g. <MYBOOKS> and </MYBOOKS>).
Furthermore, XML is case sensitive (thus
<mybooks> and <MYBOOKS> are not the same).
Also be aware of extra spaces and line breaks inside a tag pair, as
they are part of the text and might influence your results.
If you use a slash at the end of a tag, like <BR/>, <HR/>, ..., you
indicate that this is a begin and end tag in one, so you avoid having
to write <BR></BR>, or <HR></HR>, ..., as you would otherwise be
obliged to do when using XML.
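These rules can be checked quickly with any XML parser. A minimal
sketch, assuming Python and its standard library; the helper name
is_well_formed is made up for this example.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return True if the parser accepts the text as XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

ok = is_well_formed("<MYBOOKS></MYBOOKS>")
wrong_case = is_well_formed("<MYBOOKS></mybooks>")        # case differs
no_end_tag = is_well_formed("<MYBOOKS><BR></MYBOOKS>")    # <BR> never closed
short_form = is_well_formed("<MYBOOKS><BR/></MYBOOKS>")   # begin and end in one

# Spaces and line breaks between the tags belong to the text.
raw = ET.fromstring("<AUTHOR>\n  Holzner, Steve\n</AUTHOR>").text
clean = raw.strip()

print(ok, wrong_case, no_end_tag, short_form)
print(repr(raw), repr(clean))
```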
---
Let us add a few books to this:
<MYBOOKS>
 <BOOK>
  <AUTHOR>
    Holzner, Steve
  </AUTHOR>
  <TITLE>
    Inside XML
  </TITLE>
 </BOOK>
 <BOOK>
  <AUTHOR>
    Knuth, Donald
  </AUTHOR>
  <TITLE>
    The art of computer programming
  </TITLE>
 </BOOK>
</MYBOOKS>
---
Now add some extra information describing:
1. the type of XML used:
   <?xml version="1.0" encoding="UTF-8"?>
2. the whereabouts of the DTD file
   <!DOCTYPE MYBOOKS SYSTEM "book.dtd">
3. the whereabouts of the XSL file.
   <?xml-stylesheet type="text/xsl" href="book.xsl"?>
Thus you get all together:
---
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE MYBOOKS SYSTEM "book.dtd">
<?xml-stylesheet type="text/xsl" href="book.xsl"?>
<MYBOOKS>
 <BOOK>
  <AUTHOR>
    Holzner, Steve
  </AUTHOR>
  <TITLE>
    Inside XML
  </TITLE>
 </BOOK>
 <BOOK>
  <AUTHOR>
    Knuth, Donald
  </AUTHOR>
  <TITLE>
    the art of computer programming
  </TITLE>
 </BOOK>
</MYBOOKS>
---
So this is your file containing your data.
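To check that the data can be read back, the file content can be fed
to a parser. A minimal sketch, assuming Python and its standard
library; the content of book.xml is pasted in as a string so the
example does not depend on a file. The parser used here only reads
the data: it does not validate against book.dtd and does not apply
book.xsl.

```python
import xml.etree.ElementTree as ET

BOOK_XML = """<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE MYBOOKS SYSTEM "book.dtd">
<?xml-stylesheet type="text/xsl" href="book.xsl"?>
<MYBOOKS>
 <BOOK>
  <AUTHOR>
    Holzner, Steve
  </AUTHOR>
  <TITLE>
    Inside XML
  </TITLE>
 </BOOK>
 <BOOK>
  <AUTHOR>
    Knuth, Donald
  </AUTHOR>
  <TITLE>
    the art of computer programming
  </TITLE>
 </BOOK>
</MYBOOKS>
"""

# Pass bytes, matching the UTF-8 encoding stated in the first line.
root = ET.fromstring(BOOK_XML.encode("utf-8"))

# strip() removes the line breaks and spaces around each value.
books = [(b.findtext("AUTHOR").strip(), b.findtext("TITLE").strip())
         for b in root.findall("BOOK")]
print(books)
```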
---
---
II. Create the .dtd (or .xsd) file in your favorite wordprocessor,
   and save it (e.g. as book.dtd)  :
---
Now let us proceed to describe the structure of this data.
You have the structure of mybooks, containing one or more books, each
containing 1 author and 1 title.
Using the standard notation for this fact, you can write this as
(you analyse it all the time, by working top down, starting from the
big thing, whole, and descending into more and more detail):
---
1. The fact that mybooks contains one or more books, you can write in
   standard notation as:
 <!ELEMENT MYBOOKS (BOOK+)>
2. The fact that one book contains one author, then one title, you can
   write in standard notation as:
 <!ELEMENT BOOK (AUTHOR,TITLE)>
3. The fact that the author contains describing text you write as:
 <!ELEMENT AUTHOR (#PCDATA)>
4. The fact that the title contains describing text you write as:
 <!ELEMENT TITLE (#PCDATA)>
So you have until now:
 <!ELEMENT MYBOOKS (BOOK+)>
 <!ELEMENT BOOK (AUTHOR,TITLE)>
 <!ELEMENT AUTHOR (#PCDATA)>
 <!ELEMENT TITLE (#PCDATA)>
---
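Here the symbols mean:
!ELEMENT declares an element, that is a tag name together with what
is allowed between its begin and end tag
, (a comma) means that the parts must follow each other in this order
+ means one or more times
#PCDATA means text (parsed character data)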
---
So all together you get:
---
<!ELEMENT MYBOOKS (BOOK+)>
<!ELEMENT BOOK (AUTHOR,TITLE)>
<!ELEMENT AUTHOR (#PCDATA)>
<!ELEMENT TITLE (#PCDATA)>
---
Save this as plain text, and call it e.g. book.dtd
---
If you want to have the XSD version of this file, use e.g. XMLSpy
(see http://www.xmlspy.com) which automatically converts DTD to XSD
and vice versa.
---
The information in this file book.dtd can be used by a validating
parser (e.g. the one in the browser) to check if you are filling in
your data correctly (e.g. here first the author, then the title, in
that order, and not the other way around). It knows this, as you have
written first AUTHOR, then TITLE in the description of BOOK above.
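To make concrete what such a check does, the four rules can be coded
by hand. This is only a sketch, assuming Python and its standard
library, which has no DTD validator built in. The function name
follows_book_rules is made up, it does not read book.dtd, and it is
simplified (e.g. stray text directly inside MYBOOKS or BOOK is not
detected).

```python
import xml.etree.ElementTree as ET

def follows_book_rules(xml_text):
    """Hand-coded version of the four rules in book.dtd (not a DTD parser)."""
    root = ET.fromstring(xml_text)
    if root.tag != "MYBOOKS":
        return False
    books = list(root)
    if len(books) < 1:                    # (BOOK+): one or more
        return False
    for book in books:
        if book.tag != "BOOK":
            return False
        # (AUTHOR,TITLE): exactly these two, in this order
        if [child.tag for child in book] != ["AUTHOR", "TITLE"]:
            return False
        # (#PCDATA): text only, no nested tags
        if any(len(child) for child in book):
            return False
    return True

good = "<MYBOOKS><BOOK><AUTHOR>A</AUTHOR><TITLE>T</TITLE></BOOK></MYBOOKS>"
swapped = "<MYBOOKS><BOOK><TITLE>T</TITLE><AUTHOR>A</AUTHOR></BOOK></MYBOOKS>"
empty = "<MYBOOKS></MYBOOKS>"

print(follows_book_rules(good), follows_book_rules(swapped),
      follows_book_rules(empty))
```

A real validating parser derives these checks from the .dtd file
itself, so you do not have to code them.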
---
---
III. Now create the file which is going to create the representation of
     your data (type it also in your favorite wordprocessor, save it as
     plain MSDOS or ASCII text, and call it e.g. book.xsl)
You have the structure of mybooks, containing one or more books, each
containing 1 author and 1 title.
For each of these parts you have to write what should happen,
using some standard commands (from the XSL language) and describing
how to represent that particular part.
You analyse it all the time by working top down, starting from the
big thing, the whole, and descending into more and more detail.
You ask yourself all the time: if I encounter this part, which code
(=printed text) would I like to see generated? Then you adapt the
routines below according to these wishes.
Let us generate an HTML file in this case.
---
1. Let us start with the part 'MYBOOKS'. You can indicate how to
   handle the representation of this via:
   When the program encounters 'MYBOOKS', it should print out the
   overall structure of a simple HTML file, which is
   <HTML>
    <HEAD>
     <TITLE>
     </TITLE>
    </HEAD>
    <BODY>
    </BODY>
   </HTML>
---
In between I will put some extra information, to arrive at the result
shown further below. By working backward from this result you can
easily determine what each of the parts has to produce, and code
correspondingly. It might help to distinguish the constant,
non-changing parts from the variable parts, which depend on the values
in my data. The variable data I get from my XML, the structure around
it from my XSL.
---
So in general you determine first (e.g. by hand coding) what your HTML
output code should look like.
This allows you to keep an eye on what you want to receive as the
end result.
After that you adapt your XSL until you get that wanted HTML end
result.
So you work backwards from the result to your input, thus backwards
from your HTML to your XSL (and XML).
In any case, whether you write the HTML code yourself manually
(temporarily for testing purposes, say), or let XML and XSL do the
job, the end result should be the same or at least functionally
equivalent HTML source code text.
---
2. So this is what I want to see produced (printed) as a final result
at the end:
   <HTML>
    <HEAD>
     <TITLE>
       My book database
     </TITLE>
    </HEAD>
    <BODY>
     <H3> Overview book database </H3>
     <BR>
     <BR>
     <HR>
     <HR>
      <FONT SIZE="3" COLOR="blue">
      <BR> book:
       <BR> author=Holzner, Steve
       <BR> title=Inside XML
      </FONT>
     <HR>
      <FONT SIZE="3" COLOR="blue">
      <BR> book:
       <BR> author=Knuth, Donald
       <BR> title=the art of computer programming
      </FONT>
     <HR>
    </BODY>
   </HTML>
---
How can I achieve this?
When MYBOOKS is encountered, I let it print the bulk of the above
structure, that is:
   <HTML>
    <HEAD>
     <TITLE>
       My book database
     </TITLE>
    </HEAD>
    <BODY>
     <H3> Overview book database </H3>
     <BR>
     <BR>
     <HR>
     <HR>
       ... and here comes the source code generated by the other parts
           (=book, author and title) ...
    </BODY>
   </HTML>
---
So what I usually do is that I first create an example of the end HTML
code I want to see generated. Then I put this as a whole in the XSL
file (as shown above), connect it with the main node (e.g. '/')
and run the corresponding XML file. Usually you get some errors to
debug, mostly that you will have to add some extra end tags (like
/>, or </IMG>), because these are usually forgotten or not really
necessary in plain HTML.
Then I systematically replace the variable parts in this
HTML code by more specific XSL routines which will generate these
variable HTML parts from the given XML data.
So I work from the whole to the details, and backwards from the
end result to the given input.
---
Each time a book is encountered, I let it print the following
source code:
     <FONT SIZE="3" COLOR="blue">
       <BR> book:
         ... and here comes the source generated by the other parts
             (=author and title) ...
     </FONT>
     <HR>
---
Each time an author is encountered, I let it print the following
source code:
       <BR> author= ... and here comes the current value of the author ...
---
Each time a title is encountered, I let it print the following
source code:
       <BR> title= ... and here comes the current value of the title ...
---
Putting this together, using for example the XSL commands (see e.g.
http://www.devguru.com for the details)
xsl:template, xsl:apply-templates and xsl:value-of.
Where:
1. xsl:template
    tells which part to look for, and what to print when it is found.
2. xsl:apply-templates
    means that you have to go on searching for the parts inside the
    current part
3. xsl:value-of
    means you have to take the currently found value of that part
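How these three commands work together can be imitated in a few
lines. This is only an analogy, assuming Python and its standard
library: it is not an XSL processor, the names TEMPLATES,
apply_templates, value_of and transform are made up, and it prints
plain text instead of HTML.

```python
import xml.etree.ElementTree as ET

def apply_templates(node):
    """Like xsl:apply-templates: handle each part inside this part."""
    return "".join(TEMPLATES[child.tag](child) for child in node)

def value_of(node):
    """Like xsl:value-of select=".": the text of the current part
    (surrounding spaces removed here to keep the output tidy)."""
    return (node.text or "").strip()

# Like xsl:template match="...": one rule per tag name.
TEMPLATES = {
    "MYBOOKS": lambda n: "Overview book database\n" + apply_templates(n),
    "BOOK":    lambda n: "book:\n" + apply_templates(n),
    "AUTHOR":  lambda n: "author= " + value_of(n) + "\n",
    "TITLE":   lambda n: "title= " + value_of(n) + "\n",
}

def transform(xml_text):
    root = ET.fromstring(xml_text)
    return TEMPLATES[root.tag](root)

DATA = ("<MYBOOKS><BOOK><AUTHOR>Knuth, Donald</AUTHOR>"
        "<TITLE>The art of computer programming</TITLE></BOOK></MYBOOKS>")
result = transform(DATA)
print(result)
```

Each rule prints its own fixed text and hands the inner parts on to
the other rules, which is the same division of work as in the .xsl
file.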
---
So what you do is, you use the above story:
1. Just copy the corresponding text from above
2. Add <xsl:template match=""> above, and </xsl:template> below
   that
3. Fill in the name of that part after 'match='
4. In the innermost parts (here author and title) replace
   '... and here comes the current value ...'
   by
   <xsl:value-of select="."/>
5. In the other parts replace
   '... and here comes ...'
   by
   <xsl:apply-templates/>
6. Add an extra '/' at the end of the single tags like
   <BR>, <HR>, ... (giving <BR/>, <HR/>, ...), because in XML each
   tag must have a begin tag and an end tag.
So you keep the order in the structure intact, and work very
systematically.
---
During the analysis, if the XSL interpreter encounters
anything that is not an XSL command, it does not interpret this text,
but simply copies it as such to the screen or output.
---
So you get here for the part MYBOOKS:
 <xsl:template match="MYBOOKS">
   <HTML>
    <HEAD>
     <TITLE>
       My book database
     </TITLE>
    </HEAD>
    <BODY>
     <H3> Overview book database </H3>
     <BR/>
     <BR/>
     <HR/>
     <HR/>
      <xsl:apply-templates/>
     <HR/>
    </BODY>
   </HTML>
 </xsl:template>
---
So you get here for the part BOOK:
 <xsl:template match="BOOK">
  <FONT SIZE="3" COLOR="blue">
   <BR/> book:
    <xsl:apply-templates/>
  </FONT>
  <HR/>
 </xsl:template>
---
So you get here for the part AUTHOR:
 <xsl:template match="AUTHOR">
  <BR/> author=
   <xsl:value-of select="."/>
 </xsl:template>
---
So you get here for the part TITLE:
 <xsl:template match="TITLE">
  <BR/> title=
   <xsl:value-of select="."/>
  </xsl:template>
---
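You will have to add some extra information to these XSL commands,
that is:
1. the type of XML used:
 <?xml version="1.0"?>
2. the type of XSL used, and making sure you have unique names:
 <xsl:stylesheet xmlns:xsl="http://www.w3.org/TR/WD-xsl">
 (this is the working draft version of XSL as understood by
 Microsoft Internet Explorer v5.0; processors following the final
 XSLT 1.0 standard expect version="1.0" and
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform" here instead)
3. In Microsoft Internet Explorer, you will have to explicitly
 indicate what happens when the root '/' is encountered
 So here we add
 <xsl:template match="/">
  <xsl:apply-templates/>
 </xsl:template>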
---
Putting this all together you get:
---
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/TR/WD-xsl">
 <xsl:template match="/">
   <xsl:apply-templates/>
 </xsl:template>
 <xsl:template match="MYBOOKS">
   <HTML>
    <HEAD>
     <TITLE>
       My book database
     </TITLE>
    </HEAD>
    <BODY>
     <H3> Overview book database </H3>
     <BR/>
     <BR/>
     <HR/>
     <HR/>
      <xsl:apply-templates/>
     <HR/>
    </BODY>
   </HTML>
 </xsl:template>
 <xsl:template match="BOOK">
  <FONT SIZE="3" COLOR="blue">
   <BR/> book:
    <xsl:apply-templates/>
  </FONT>
  <HR/>
 </xsl:template>
 <xsl:template match="AUTHOR">
  <BR/> author=
   <xsl:value-of select="."/>
 </xsl:template>
 <xsl:template match="TITLE">
  <BR/> title=
   <xsl:value-of select="."/>
  </xsl:template>
</xsl:stylesheet>
---
Type or copy this in your favorite wordprocessor, and save it as plain
ASCII or MSDOS text as
book.xsl
---
---
Now load the file book.xml in the Microsoft Internet Explorer browser,
e.g. by typing
 c:\temp\book.xml
(given that you stored book.xml, book.dtd and book.xsl in the c:\temp
directory)
---
After some possible debugging (your browser will give you hints when
you made mistakes, like 'Multiple colons are not allowed', and the
like) you will see the result as the output of an HTML file in your
browser screen.
It shows the following text:
---------------------------------------------------------------
Overview book database
---------------------------------------------------------------
book:
author= Holzner, Steve
title= Inside XML
---------------------------------------------------------------
book:
author= Knuth, Donald
title= the art of computer programming
---------------------------------------------------------------
[Internet: see also:
http://www.faqts.com/knowledge_base/view.phtml/aid/11672
http://www.faqts.com/knowledge_base/view.phtml/aid/23933/fid/175
http://www.faqts.com/knowledge_base/view.phtml/aid/24207/fid/671]
--- Knud van Eeden - 21 September 2020 - 05:07 ------------------------
-----------------------------------------------------------------------