JSON

JavaScript Object Notation

The JSON format was inspired by the object and array format used in the JavaScript language. But since Python was invented before JavaScript, Python’s syntax for dictionaries and lists influenced the syntax of JSON. So the format of JSON is nearly identical to a combination of Python lists and dictionaries. Here is a JSON encoding that is roughly equivalent to the simple XML format:

{"menu": {
  "id": "file",
  "value": "File",
  "popup": {
    "menuitem": [
      {"value": "New", "onclick": "CreateNewDoc()"},
      {"value": "Open", "onclick": "OpenDoc()"},
      {"value": "Close", "onclick": "CloseDoc()"}
    ]
  }
}}
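To see how closely this maps onto Python, the menu document can be parsed with `json.loads()` from the standard library. A minimal sketch; the string literal simply repeats the menu example:

```python
import json

# The menu example, stored as a Python string
menu_json = """
{"menu": {
  "id": "file",
  "value": "File",
  "popup": {
    "menuitem": [
      {"value": "New", "onclick": "CreateNewDoc()"},
      {"value": "Open", "onclick": "OpenDoc()"},
      {"value": "Close", "onclick": "CloseDoc()"}
    ]
  }
}}
"""

# json.loads() turns JSON objects into dicts and JSON arrays into lists
menu = json.loads(menu_json)

# Navigate the result exactly like nested Python dictionaries and lists
items = menu["menu"]["popup"]["menuitem"]
print(type(menu).__name__, type(items).__name__)  # dict list
print([item["value"] for item in items])          # ['New', 'Open', 'Close']
```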

Source: Python https://docs.python.org/3.6/library/json.html

A JSON object looks like a Python dictionary. Its values can themselves be objects, so dictionaries can be nested inside dictionaries to form a tree. This comes from JavaScript, where an object is written as a collection of key-value pairs, much like a dictionary.
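The resemblance is close but not exact: JSON's `true`, `false` and `null` become Python's `True`, `False` and `None`, and JSON object keys are always strings. A short round trip with `json.loads()` and `json.dumps()` shows the conversion; the sample string is made up for illustration:

```python
import json

# A JSON object containing a nested object and the JSON-only literals true/null
text = '{"name": "IRS2", "active": true, "alias": null, "meta": {"taxid": 9606}}'

obj = json.loads(text)
print(obj)
# {'name': 'IRS2', 'active': True, 'alias': None, 'meta': {'taxid': 9606}}

# Converting back: Python's True/None are written as true/null again
print(json.dumps(obj))
# {"name": "IRS2", "active": true, "alias": null, "meta": {"taxid": 9606}}

# Non-string dict keys are converted to strings on output
print(json.dumps({1: "one"}))
# {"1": "one"}
```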

Load JSON

We can load JSON data into Python with the method json.load(), which returns a dictionary for a JSON object. This is especially handy if we want only certain keys of the JSON file. In the example below I want to make a DataFrame of the hits with a record for each ID. I load the entire JSON file, select the hits list, and turn it into a DataFrame with the method pd.DataFrame.from_dict().

import json

# json.load() parses the whole file; the with-block closes it afterwards
with open('sample.json') as f:
    data = json.load(f)
data
{'max_score': 5.9047804,
 'took': 47,
 'total': 288,
 'hits': [{'_id': '8660',
   '_score': 5.9047804,
   'entrezgene': '8660',
   'name': 'insulin receptor substrate 2',
   'symbol': 'IRS2',
   'taxid': 9606},
  {'_id': '3667',
   '_score': 5.812647,
   'entrezgene': '3667',
   'name': 'insulin receptor substrate 1',
   'symbol': 'IRS1',
   'taxid': 9606},
  {'_id': '3651',
   '_score': 5.288981,
   'entrezgene': '3651',
   'name': 'pancreatic and duodenal homeobox 1',
   'symbol': 'PDX1',
   'taxid': 9606}]}
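`json.load()` reads from an open file object, while `json.loads()` parses a string that is already in memory. A self-contained sketch; since `sample.json` is not included here, it first writes a hypothetical, trimmed-down copy (one hit, fewer fields) to a temporary directory:

```python
import json
import tempfile
from pathlib import Path

# Hypothetical trimmed-down version of sample.json
sample = {"max_score": 5.9047804, "took": 47, "total": 288,
          "hits": [{"_id": "8660", "symbol": "IRS2", "taxid": 9606}]}

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "sample.json"
    path.write_text(json.dumps(sample))

    # json.load() takes a file object; the with-block closes the file afterwards
    with open(path) as f:
        from_file = json.load(f)

    # json.loads() takes a str, here the raw file contents
    from_string = json.loads(path.read_text())

print(from_file == from_string)        # True
print(from_file["hits"][0]["symbol"])  # IRS2
```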

I can investigate the structure of the JSON file by retrieving its keys. In the example below the keys max_score, took, total and hits are returned:

print(data.keys())
dict_keys(['max_score', 'took', 'total', 'hits'])
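`keys()` only shows the top level. For deeper files, a small recursive function can list every access path in the tree. `walk_keys` below is a hypothetical helper, not part of the `json` module, shown on a trimmed copy of the data above:

```python
def walk_keys(node, path="data"):
    """Yield an access path for every dict key and list position in a parsed JSON tree."""
    if isinstance(node, dict):
        for key, value in node.items():
            child = f"{path}[{key!r}]"
            yield child
            yield from walk_keys(value, child)
    elif isinstance(node, list):
        # Only descend into the first element, assuming the list holds uniform records
        if node:
            child = f"{path}[0]"
            yield child
            yield from walk_keys(node[0], child)

# Trimmed copy of the loaded data (one hit, fewer fields)
sample = {"max_score": 5.9047804, "took": 47, "total": 288,
          "hits": [{"_id": "8660", "symbol": "IRS2"}]}

for p in walk_keys(sample):
    print(p)
```

The last three lines printed are `data['hits']`, `data['hits'][0]` and `data['hits'][0]['_id']` followed by `['symbol']`, which points straight at the records worth turning into a table.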

The hits key seems to contain the records we want to investigate further. I will parse these into a pandas DataFrame:

import pandas as pd
df_data = pd.DataFrame.from_dict(data['hits'])
df_data

    _id    _score  entrezgene                                name  symbol  taxid
0  8660  5.904780        8660        insulin receptor substrate 2    IRS2   9606
1  3667  5.812647        3667        insulin receptor substrate 1    IRS1   9606
2  3651  5.288981        3651  pancreatic and duodenal homeobox 1    PDX1   9606
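If only a few fields are needed, the selection can also be made in plain Python before (or instead of) building a DataFrame. A minimal sketch with comprehensions on a trimmed copy of the hits above; the choice of `symbol` and `name` is just for illustration:

```python
# Trimmed copy of data['hits'] from the example above
hits = [
    {"_id": "8660", "_score": 5.9047804, "name": "insulin receptor substrate 2", "symbol": "IRS2"},
    {"_id": "3667", "_score": 5.812647, "name": "insulin receptor substrate 1", "symbol": "IRS1"},
    {"_id": "3651", "_score": 5.288981, "name": "pancreatic and duodenal homeobox 1", "symbol": "PDX1"},
]

wanted = ("symbol", "name")

# Keep only the wanted keys from each record
subset = [{key: hit[key] for key in wanted} for hit in hits]

# Or index the records by ID for direct lookup
by_id = {hit["_id"]: hit["symbol"] for hit in hits}

print(subset[0])      # {'symbol': 'IRS2', 'name': 'insulin receptor substrate 2'}
print(by_id["3651"])  # PDX1
```

The `subset` list has the same shape as `data['hits']`, so it can be passed to pd.DataFrame.from_dict() in the same way to get a narrower table.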

Write to JSON

We can also define a JSON object and write it to a JSON file. We use the method json.dump() to do so:

data = {"menu": {
  "id": "file",
  "value": "File",
  "popup": {
    "menuitem": [
      {"value": "New", "onclick": "CreateNewDoc()"},
      {"value": "Open", "onclick": "OpenDoc()"},
      {"value": "Close", "onclick": "CloseDoc()"}
    ]
  }
}}

with open('output.json', 'w') as f:
    json.dump(data, f)
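By default json.dump() writes everything on a single line. Its `indent` parameter produces a more readable file, and reading the file back with json.load() is a quick check that nothing was lost. A sketch with a shortened menu; the file name `output_pretty.json` is arbitrary:

```python
import json

data = {"menu": {"id": "file", "value": "File",
                 "popup": {"menuitem": [{"value": "New", "onclick": "CreateNewDoc()"}]}}}

# indent=2 puts each key on its own line, nested two spaces per level
with open("output_pretty.json", "w") as f:
    json.dump(data, f, indent=2)

# Read the file back and compare it with the original object
with open("output_pretty.json") as f:
    restored = json.load(f)

print(restored == data)  # True
```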

We can use https://jsonlint.com to validate the output.
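The same check can be done locally: json.loads() raises `json.JSONDecodeError` for malformed input, including the position of the first problem. A minimal sketch; `is_valid_json` is a hypothetical helper name:

```python
import json

def is_valid_json(text):
    """Return (True, None) if text parses as JSON, else (False, error description)."""
    try:
        json.loads(text)
    except json.JSONDecodeError as err:
        # err.lineno / err.colno point at the first problem the parser found
        return False, f"line {err.lineno}, column {err.colno}: {err.msg}"
    return True, None

print(is_valid_json('{"id": "file", "value": "File"}'))  # (True, None)
print(is_valid_json("{'id': 'file'}"))  # single-quoted strings are not valid JSON
```

From a shell, `python -m json.tool output.json` does a similar check and pretty-prints the file when it is valid.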

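A DataFrame can be written to JSON as well. With orient='records' each row becomes one JSON object in a list:

# to_json() already returns a JSON string, so write it to the file as-is
data = df_data.to_json(orient='records')
with open('output.json', 'w') as f:
    f.write(data)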
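Note that DataFrame.to_json() returns a JSON-formatted string. Passing such a string to json.dump() encodes it a second time, so the file holds one quoted string instead of a list of records. The effect can be reproduced with the standard library alone; the record below is a hypothetical stand-in for the DataFrame output:

```python
import json

# Stand-in for df_data.to_json(orient='records'): already a JSON string
records_json = json.dumps([{"_id": "8660", "symbol": "IRS2"}])

# Encoding the string again wraps it in quotes and escapes the inner quotes
double_encoded = json.dumps(records_json)
print(double_encoded)  # "[{\"_id\": \"8660\", \"symbol\": \"IRS2\"}]"

# Reading it back gives a str, not a list of dicts
print(type(json.loads(double_encoded)).__name__)  # str

# The original string parses to a real JSON array
print(type(json.loads(records_json)).__name__)  # list
```

Writing the string with f.write(), or letting pandas write the file directly with df_data.to_json('output.json', orient='records'), avoids the extra layer.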
