eBook - ePub

Beginning XML

Name: Beginning XML
Author: Joe Fawcett, Danny Ayers, Liam R. E. Quin

Joe Fawcett, Danny Ayers, Liam R. E. Quin

Share book

English
ePUB (mobile friendly)
Available on iOS & Android

eBook - ePub

Beginning XML

Joe Fawcett, Danny Ayers, Liam R. E. Quin

Book details

Book preview

Table of contents

Citations

About This Book

A complete update covering the many advances to the XML language

The XML language has become the standard for writing documents on the Internet and is constantly improving and evolving. This new edition covers all the many new XML-based technologies that have appeared since the previous edition four years ago, providing you with an up-to-date introductory guide and reference. Packed with real-world code examples, best practices, and in-depth coverage of the most important and relevant topics, this authoritative resource explores both the advantages and disadvantages of XML and addresses the most current standards and uses of XML.

Features the most updated content built on audience feedback from the previous edition as well as the vast knowledge from XML developer teams
Boasts new chapters on RELAX NG and Schematron, XML functionality in databases, LINQ to XML, Jabber and XMLPP, XHTML, HTML5, and more
Offers in-depth coverage on extracting data from XML and updated material on Web Services

Beginning XML, Fifth Edition delivers the most important aspects of XML in regard to what it is, how it works, what technologies surround it, and how it can best be used in a variety of situations.

Frequently asked questions

How do I cancel my subscription?

Simply head over to the account section in settings and click on “Cancel Subscription” - it’s as simple as that. After you cancel, your membership will stay active for the remainder of the time you’ve paid for. Learn more here.

Can/how do I download books?

At the moment all of our mobile-responsive ePub books are available to download via the app. Most of our PDFs are also available to download and we're working on making the final remaining ones downloadable now. Learn more here.

What is the difference between the pricing plans?

Both plans give you full access to the library and all of Perlego’s features. The only differences are the price and subscription period: With the annual plan you’ll save around 30% compared to 12 months on the monthly plan.

What is Perlego?

We are an online textbook subscription service, where you can get access to an entire online library for less than the price of a single book per month. With over 1 million books across 1000+ topics, we’ve got you covered! Learn more here.

Do you support text-to-speech?

Look out for the read-aloud symbol on your next book to see if you can listen to it. The read-aloud tool reads text aloud for you, highlighting the text as it is being read. You can pause it, speed it up and slow it down. Learn more here.

Is Beginning XML an online PDF/ePUB?

Yes, you can access Beginning XML by Joe Fawcett, Danny Ayers, Liam R. E. Quin in PDF and/or ePUB format, as well as other popular books in Computer Science & Programming Languages. We have over one million books available in our catalogue for you to explore.

Information

Publisher

Wrox

Year

2012

ISBN

9781118239483

Edition

Topic

Computer Science

Subtopic

Programming Languages

Index

Computer Science

PART I

Introducing XML

CHAPTER 1: What Is XML?
CHAPTER 2: Well-Formed XML
CHAPTER 3: XML Namespaces

Chapter 1 What Is XML?

WHAT YOU’LL WILL LEARN IN THIS CHAPTER:

The story before XML
How XML arrived
The basic format of an XML document
Areas where XML is useful
A brief introduction to the technologies surrounding, and associated with, XML

XML stands for Extensible Markup Language (presumably the original authors thought that sounded more exciting than EML) and its development and usage have followed a common path in the software and IT world. It started out more than ten years ago and was originally used by very few; later it caught the public eye and began to pervade the world of data exchange. Subsequently, the tools available to process and manage XML became more sophisticated, to such an extent that many people began to use it without being really aware of its existence. Lately there has been a bit of a backlash in certain quarters over its perceived failings and weak points, which has led to various proposed alternatives and improvements. Nevertheless, XML now has a permanent place in IT systems and it’s hard to imagine any non-trivial application that doesn’t use XML for either its configuration or data to some degree. For this reason it’s essential that modern software developers have a thorough understanding of its principles, what it is capable of, and how to use it to their best advantage. This book can give the reader all those things.

NOTE Although this chapter presents some short examples of XML, you aren’t expected to understand all that’s going on just yet. The idea is simply to introduce the important concepts behind the language so that throughout the book you can see not only how to use XML, but also why it works the way it does.

STEPS LEADING UP TO XML: DATA REPRESENTATION AND MARKUPS

There are two main uses for XML: One is a way to represent low-level data, for example configuration files. The second is a way to add metadata to documents; for example, you may want to stress a particular sentence in a report by putting it in italics or bold.

The first usage for XML is meant as a replacement for the more traditional ways this has been done before, usually by means of lists of name/value pairs as is seen in Windows’ INI or Java’s Property files. The second application of XML is similar to how HTML files work. The document text is contained in an overall container, the <body> element, with individual phrases surrounded by <i> or <b> tags. For both of these scenarios there has been a multiplicity of techniques devised over the years. The problem with these disparate approaches has been more apparent than ever, since the increased use of the Internet and extensive existence of distributed applications, particularly those that rely on components designed and managed by different parties. That problem is one of intercommunication. It’s certainly possible to design a distributed system that has two components, one outputting data using a Windows INI file and the other which turns it into a Java Properties format. Unfortunately, it means a lot of development on both sides, which shouldn’t really be necessary and detracts resources from the main objective, developing new functionality that delivers business value.

XML was conceived as a solution to this kind of problem; it is meant to make passing data between different components much easier and relieve the need to continually worry about different formats of input and output, freeing up developers to concentrate on the more important aspects of coding such as the business logic. XML is also seen as a solution to the question of whether files should be easily readable by software or by humans; XML’s aim is to be both. You’ll be examining the distinction between data-oriented and document-centric XML later in the book, but for now let’s look a bit more deeply into what the choices were before XML when there was need to store or communicate data in an electronic format.

This section takes a mid-level look at data representation, without taking too much time to explain low-level details such as memory addresses and the like. For the purposes here you can store data in files two ways: as binary or as text.

Binary Files

A binary file, at its simplest, is just a stream of bits (1s and 0s). It’s up to the application that created the binary file to understand what all of the bits mean. That’s why binary files can only be read and produced by certain computer programs, which have been specifically written to understand them.

For example, when saving a document in Microsoft Word, using a version before 2003, the file created (which has a doc extension) is in a binary format. If you open the file in a text editor such as Notepad, you won’t be able to see a picture of the original Word document; the best you’ll be able to see is the occasional line of text surrounded by gibberish rather than the prose, which could be in a number of formats such as bold or italic. The characters in the document other than the actual text are metadata, literally information about information. Mixing data and metadata is both common and straightforward in a binary file. Metadata can specify things such as which words should be shown in bold, what text is to be displayed in a table, and so on. To interpret this file you the need the help of the application that created it. Without the help of a converter that has in-depth knowledge of the underlying binary format, you won’t be able to open a document created in Word with another similar application such as WordPerfect. The main advantage of binary formats is that they are concise and can be expressed in a relatively small space. This means that more files can be stored (on a hard drive, for example) but, more importantly nowadays, less bandwidth is used when transporting these files across networks.

Text Files

The main difference between text and binary files is that text files are human and machine readable. Instead of a proprietary format that needs a specific application to decipher it, the data is such that each group of bits represents a character from a known set. This means that many different applications can read text files. On a standard Windows machine you have a choice of Notepad, WordPad, and others, including being able to use command-line–based utilities such as Edit. Non-Windows machines have a similar wide range insert of programs available, such as Emacs and Vim.

NOTE The way that characters are represented by the underlying data stream is referred to as a file’s encoding. The specific encoding used is often present as the first few bytes in the file; an application checks these bytes upon opening the file and then knows how to display and manipulate the data. There is also a default encoding if these first few bytes are not present. XML also has other ways of specifying how a file was encoded, and you’ll see these later on.

The ability to be read and understood by both humans and machines is not the only advantage of text files; they are also comparatively easier to parse than binary files. The main disadvantage however, is their size. In order for text files to contain metadata (for example, a stretch of text to be marked as important), the relevant words are usually surrounded by characters denoting this extra information, which are somehow differentiated from the actual text itself. The most common examples of this can be found in HTML, where angle brackets are special symbols used to convey the meaning that anything within them refers to how the text should be treated rather than the actual data. For example, if I want mark a phrase as important I can wrap it like so:

<strong>returns must include the item order number</strong>

Another disadvantage of text files is their lack of support for metadata. If you open a Word document that contains text in an array of fonts with different styles and save it as a text file, you’ll just get a plain rendition; all of the metadata has been lost. What people were looking for was some way to have the best of both worlds — a human-readable file that could also be read by a wide range of applications, and could carry metadata along with its content. This brings us to the subject of markup.

A Brief History of Markup

The advantages of text files mad...