XML

eXtensible Markup Language

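XML is another common structured data format. It supports hierarchical, nested data, and elements can carry metadata in the form of attributes. XML and HTML are structured similarly, but XML is more general: HTML has a fixed set of tags, whereas XML lets you define your own. An example of XML is given below: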

<CATALOG>
<PLANT>
<COMMON>Bloodroot</COMMON>
<BOTANICAL>Sanguinaria canadensis</BOTANICAL>
<ZONE>4</ZONE>
<LIGHT>Mostly Shady</LIGHT>
<PRICE>$2.44</PRICE>
<AVAILABILITY>031599</AVAILABILITY>
</PLANT>
</CATALOG>
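To make the nesting concrete, here is a minimal sketch that parses a one-plant catalog with Python's standard xml.etree.ElementTree module and walks the hierarchy. The sample string (shortened, and closed with a </CATALOG> tag so that it is well-formed) and the variable names are illustrative only.

```python
import xml.etree.ElementTree as ET

# Illustrative one-plant catalog, closed so that it is well-formed
sample = '''<CATALOG>
<PLANT>
<COMMON>Bloodroot</COMMON>
<BOTANICAL>Sanguinaria canadensis</BOTANICAL>
<ZONE>4</ZONE>
</PLANT>
</CATALOG>'''

root = ET.fromstring(sample)                  # root element: CATALOG
plant = root[0]                               # its first child: PLANT
child_tags = [child.tag for child in plant]   # tags nested inside PLANT

print(root.tag)
print(plant.tag)
print(child_tags)
```

Each element behaves like a list of its children, which is why indexing and iteration work directly on it.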

Load XML

We can fetch the XML document by opening the URL, reading the response, and decoding the bytes to text.

import urllib.request, urllib.parse, urllib.error
import xml.etree.ElementTree as ET
import ssl


# Ignore SSL certificate errors
ctx = ssl.create_default_context()
ctx.check_hostname = False
ctx.verify_mode = ssl.CERT_NONE

url = 'https://bioinf.nl/~fennaf/DSLS/plants.xml'
print('Retrieving', url)
uh = urllib.request.urlopen(url, context=ctx)

data = uh.read()  # read() returns the response body as bytes
print('Retrieved', len(data), 'characters')
print(data.decode())

Retrieving https://bioinf.nl/~fennaf/DSLS/plants.xml
Retrieved 7086 characters
<?xml version="1.0" encoding="ISO8859-1" ?>
<CATALOG>
 <PLANT>
 <COMMON>Bloodroot</COMMON>
 <BOTANICAL>Sanguinaria canadensis</BOTANICAL>
 <ZONE>4</ZONE>
 <LIGHT>Mostly Shady</LIGHT>
 <PRICE>$2.44</PRICE>
 <AVAILABILITY>031599</AVAILABILITY>
 </PLANT>

.....

 <PLANT>
 <COMMON>Cardinal Flower</COMMON>
 <BOTANICAL>Lobelia cardinalis</BOTANICAL>
 <ZONE>2</ZONE>
 <LIGHT>Shade</LIGHT>
 <PRICE>$3.02</PRICE>
 <AVAILABILITY>022299</AVAILABILITY>
 </PLANT>
</CATALOG>

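If we need specific information, we can use ElementTree to extract it: find() returns the first matching child element, .text gives its text content, and get() reads an attribute.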

import xml.etree.ElementTree as ET

data = '''
<person>
  <name>Fenna</name>
  <phone type="intl">
    +31646080034
  </phone>
  <email hide="yes" />
</person>'''

tree = ET.fromstring(data)
print('Name:', tree.find('name').text)
print('Attr:', tree.find('email').get('hide'))

Name: Fenna
Attr: yes
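Two details of the person example are worth a short sketch: the text of the phone element includes the surrounding newlines and indentation, and find() returns None when a tag is missing. The snippet below strips the whitespace and uses findtext() with a default value. The 'fax' tag is hypothetical and is used only to show the missing-element case.

```python
import xml.etree.ElementTree as ET

data = '''
<person>
  <name>Fenna</name>
  <phone type="intl">
    +31646080034
  </phone>
  <email hide="yes" />
</person>'''

tree = ET.fromstring(data)

phone = tree.find('phone')
phone_number = phone.text.strip()   # drop the newlines and indentation around the number
phone_type = phone.get('type')      # attribute value, or None if the attribute is absent

# find() returns None for a missing element; findtext() accepts a default instead.
# 'fax' is a hypothetical tag that does not occur in the data.
fax = tree.findtext('fax', default='(none)')

print('Phone:', phone_number)
print('Type:', phone_type)
print('Fax:', fax)
```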
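
The same approach works on the downloaded catalog: iterating over the root visits each PLANT element, and iterating over a plant visits its child elements, whose tag and text we print.

import urllib.request, urllib.parse, urllib.error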
import xml.etree.ElementTree as ET
import ssl


# Ignore SSL certificate errors
ctx = ssl.create_default_context()
ctx.check_hostname = False
ctx.verify_mode = ssl.CERT_NONE

url = 'https://bioinf.nl/~fennaf/DSLS/plants.xml'
print('Retrieving', url)
uh = urllib.request.urlopen(url, context=ctx)

data = uh.read()
tree = ET.fromstring(data)
for child in tree:
    print('\n')
    for element in child:
        print(element.tag, element.text)

Retrieving https://bioinf.nl/~fennaf/DSLS/plants.xml


COMMON Bloodroot
BOTANICAL Sanguinaria canadensis
ZONE 4
LIGHT Mostly Shady
PRICE $2.44
AVAILABILITY 031599

.....

COMMON Cardinal Flower
BOTANICAL Lobelia cardinalis
ZONE 2
LIGHT Shade
PRICE $3.02
AVAILABILITY 022299
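Printing tag and text pairs is useful for inspection, but for further processing it is often handier to collect each plant into a dictionary. The sketch below is one possible way to do that. It uses two records copied from the catalog output above, inlined so that no download is needed; the names sample, records and prices are illustrative.

```python
import xml.etree.ElementTree as ET

# Two records from the catalog, inlined so the example runs offline
sample = '''<CATALOG>
 <PLANT>
 <COMMON>Bloodroot</COMMON>
 <BOTANICAL>Sanguinaria canadensis</BOTANICAL>
 <ZONE>4</ZONE>
 <LIGHT>Mostly Shady</LIGHT>
 <PRICE>$2.44</PRICE>
 <AVAILABILITY>031599</AVAILABILITY>
 </PLANT>
 <PLANT>
 <COMMON>Cardinal Flower</COMMON>
 <BOTANICAL>Lobelia cardinalis</BOTANICAL>
 <ZONE>2</ZONE>
 <LIGHT>Shade</LIGHT>
 <PRICE>$3.02</PRICE>
 <AVAILABILITY>022299</AVAILABILITY>
 </PLANT>
</CATALOG>'''

root = ET.fromstring(sample)

# One dictionary per PLANT element, mapping each child tag to its text
records = [{el.tag: el.text for el in plant} for plant in root.findall('PLANT')]

# Element text is always a string; convert explicitly where a number is needed
prices = [float(record['PRICE'].lstrip('$')) for record in records]

for record, price in zip(records, prices):
    print(record['COMMON'], price)
```

Values such as AVAILABILITY stay strings, which preserves the leading zero in 031599.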
