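XML is another common structured data format. It supports hierarchical (nested) data, and elements can carry metadata in the form of attributes. XML and HTML share a similar tag-based structure, but XML is more general: it does not prescribe a fixed set of tags, so you define your own. An example of XML is given below: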
<CATALOG>
  <PLANT>
    <COMMON>Bloodroot</COMMON>
    <BOTANICAL>Sanguinaria canadensis</BOTANICAL>
    <ZONE>4</ZONE>
    <LIGHT>Mostly Shady</LIGHT>
    <PRICE>$2.44</PRICE>
    <AVAILABILITY>031599</AVAILABILITY>
  </PLANT>
</CATALOG>
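To make the nesting concrete, here is a minimal sketch that builds the same Bloodroot record in memory with Python's standard `xml.etree.ElementTree` module (the module this section uses for parsing) and serializes it back to text. It assumes only the standard library; the tags and values are copied from the example.

```python
import xml.etree.ElementTree as ET

# CATALOG is the root, PLANT is its child, and each field is a child of PLANT
catalog = ET.Element('CATALOG')
plant = ET.SubElement(catalog, 'PLANT')
for tag, text in [('COMMON', 'Bloodroot'),
                  ('BOTANICAL', 'Sanguinaria canadensis'),
                  ('ZONE', '4'),
                  ('LIGHT', 'Mostly Shady'),
                  ('PRICE', '$2.44'),
                  ('AVAILABILITY', '031599')]:
    ET.SubElement(plant, tag).text = text

# Serialize to a str (no XML declaration, no indentation added)
xml_text = ET.tostring(catalog, encoding='unicode')
print(xml_text)
```

Every value is stored as text: XML itself has no notion of numbers, so `ZONE` is the string `'4'` until your code converts it.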
We can fetch the XML document by opening the URL, reading the response as bytes, and decoding those bytes into a string.
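import urllib.request
import ssl

# Ignore SSL certificate errors: this turns off certificate verification,
# so only do this for a trusted server whose certificate cannot be validated
ctx = ssl.create_default_context()
ctx.check_hostname = False
ctx.verify_mode = ssl.CERT_NONE

url = 'https://bioinf.nl/~fennaf/DSLS/plants.xml'
print('Retrieving', url)
with urllib.request.urlopen(url, context=ctx) as uh:
    data = uh.read()  # raw bytes
# ISO-8859-1 uses one byte per character, so len(data) is also the character count
print('Retrieved', len(data), 'characters')
# The file declares encoding="ISO8859-1", so decode with that codec, not the UTF-8 default
print(data.decode('iso-8859-1'))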
Retrieving https://bioinf.nl/~fennaf/DSLS/plants.xml
Retrieved 7086 characters
<?xml version="1.0" encoding="ISO8859-1" ?>
<CATALOG>
<PLANT>
<COMMON>Bloodroot</COMMON>
<BOTANICAL>Sanguinaria canadensis</BOTANICAL>
<ZONE>4</ZONE>
<LIGHT>Mostly Shady</LIGHT>
<PRICE>$2.44</PRICE>
<AVAILABILITY>031599</AVAILABILITY>
</PLANT>
.....
<PLANT>
<COMMON>Cardinal Flower</COMMON>
<BOTANICAL>Lobelia cardinalis</BOTANICAL>
<ZONE>2</ZONE>
<LIGHT>Shade</LIGHT>
<PRICE>$3.02</PRICE>
<AVAILABILITY>022299</AVAILABILITY>
</PLANT>
</CATALOG>
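The declaration on the first line says the file is encoded as ISO-8859-1. That matters once non-ASCII characters appear: `bytes.decode()` defaults to UTF-8, whereas `ET.fromstring()` given raw bytes reads the declaration itself. A small sketch with hypothetical inline bytes (not the real catalog) illustrates the difference:

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: byte 0xE9 is "é" in ISO-8859-1 but is invalid UTF-8 here
raw = (b'<?xml version="1.0" encoding="ISO-8859-1"?>\n'
       b'<CATALOG><PLANT><COMMON>Caf\xe9 Rose</COMMON></PLANT></CATALOG>')

# Decoding with the default codec (UTF-8) fails on this input
try:
    raw.decode()
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False
print('UTF-8 decode succeeded:', utf8_ok)

# Passing the bytes straight to the parser lets it honor the declared encoding
root = ET.fromstring(raw)
name = root.find('PLANT/COMMON').text
print('Parsed name:', name)
```

Handing bytes to the parser avoids guessing the codec yourself; decode manually only when you need the text as a string.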
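To parse XML in Python we can use the standard library module `xml.etree.ElementTree`. `ET.fromstring()` parses a string into its root element, `find()` returns the first child with a given tag, `.text` holds an element's text, and `.get()` reads an attribute:
import xml.etree.ElementTree as ET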
data = '''
<person>
<name>Fenna</name>
<phone type="intl">
+31646080034
</phone>
<email hide="yes" />
</person>'''
tree = ET.fromstring(data)
print('Name:', tree.find('name').text)
print('Attr:', tree.find('email').get('hide'))
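`find()` returns only the first match (or `None` when there is none), `findall()` returns every match, and `.text` keeps the whitespace around the content, as in the `phone` element above. A minimal sketch with a hypothetical two-person document (names and numbers are placeholders):

```python
import xml.etree.ElementTree as ET

# Hypothetical document with repeated elements and attributes
data = '''
<people>
  <person id="1">
    <name>Alice</name>
    <phone type="intl">
      +00 000 0000
    </phone>
  </person>
  <person id="2">
    <name>Bob</name>
    <email hide="yes" />
  </person>
</people>'''

root = ET.fromstring(data)
rows = []
for person in root.findall('person'):   # every <person>, not just the first
    name = person.find('name').text
    phone = person.find('phone')        # None when <phone> is absent
    # Test against None explicitly: an Element's truthiness depends on its children
    number = phone.text.strip() if phone is not None else None
    rows.append((person.get('id'), name, number))
print(rows)
```

`.strip()` removes the newlines and indentation that surround the phone number inside its element.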
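Combining both steps, we download the catalog, parse the bytes with `ET.fromstring()`, and loop over every `PLANT` element and its child elements:
import urllib.request
import xml.etree.ElementTree as ET
import ssl

# Ignore SSL certificate errors (this disables certificate verification)
ctx = ssl.create_default_context()
ctx.check_hostname = False
ctx.verify_mode = ssl.CERT_NONE

url = 'https://bioinf.nl/~fennaf/DSLS/plants.xml'
print('Retrieving', url)
with urllib.request.urlopen(url, context=ctx) as uh:
    data = uh.read()

# Parse the raw bytes; the parser honors the encoding declared in the file
tree = ET.fromstring(data)
for child in tree:              # each PLANT element under CATALOG
    print()                     # blank line between plants
    for element in child:       # COMMON, BOTANICAL, ZONE, ...
        print(element.tag, element.text)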
Retrieving https://bioinf.nl/~fennaf/DSLS/plants.xml
COMMON Bloodroot
BOTANICAL Sanguinaria canadensis
ZONE 4
LIGHT Mostly Shady
PRICE $2.44
AVAILABILITY 031599
.....
COMMON Cardinal Flower
BOTANICAL Lobelia cardinalis
ZONE 2
LIGHT Shade
PRICE $3.02
AVAILABILITY 022299
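Printing tags is useful for exploration; for analysis it is often handier to turn each `PLANT` into a dictionary. A minimal sketch, assuming the catalog's tag layout and using an inline copy of the two plants shown above so it runs without network access. The helper name `plants_to_records` is hypothetical, and the `[LIGHT='Shade']` filter uses ElementTree's limited XPath support.

```python
import xml.etree.ElementTree as ET

# Inline copy of the two PLANT records from the output above
data = '''<CATALOG>
  <PLANT><COMMON>Bloodroot</COMMON><BOTANICAL>Sanguinaria canadensis</BOTANICAL>
    <ZONE>4</ZONE><LIGHT>Mostly Shady</LIGHT><PRICE>$2.44</PRICE>
    <AVAILABILITY>031599</AVAILABILITY></PLANT>
  <PLANT><COMMON>Cardinal Flower</COMMON><BOTANICAL>Lobelia cardinalis</BOTANICAL>
    <ZONE>2</ZONE><LIGHT>Shade</LIGHT><PRICE>$3.02</PRICE>
    <AVAILABILITY>022299</AVAILABILITY></PLANT>
</CATALOG>'''

def plants_to_records(root):
    """Hypothetical helper: one dict per PLANT, keyed by child tag."""
    records = []
    for plant in root.findall('PLANT'):
        record = {child.tag: child.text for child in plant}
        # Convert a price string such as "$2.44" to a float
        record['PRICE'] = float(record['PRICE'].lstrip('$'))
        records.append(record)
    return records

root = ET.fromstring(data)
records = plants_to_records(root)
print(records[0]['COMMON'], records[0]['PRICE'])

# Select plants whose LIGHT child has exactly the text "Shade"
shade = [p.findtext('COMMON') for p in root.findall("PLANT[LIGHT='Shade']")]
print(shade)
```

The whitespace between elements ends up in each element's `tail`, not its `text`, so the dictionary values come out clean without stripping.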