Enhancing JSON Serialization In Python With Typing

Rob Blackbourn
Jan 30, 2020

Maintaining valid types when deserializing JSON with Python is hard. This article discusses an approach using typing. You can find the package on GitHub.

The Problem

Here we have a Python dict with a mix of data types.

from datetime import datetime
from decimal import Decimal

dct = {
    'some_text': 'Hello, World!',
    'some_date': datetime.fromisoformat('2020-01-29T12:56:13'),
    'some_int': 42,
    'some_float': 3.14,
    'some_decimal': Decimal('2.414'),
    'some_list': [
        {
            'other_text': 'Hello, World!',
            'other_date': datetime.fromisoformat('2020-01-29T12:56:13'),
            'other_decimal': Decimal('2.414')
        },
        {
            'other_text': 'Hello, World!',
            'other_date': datetime.fromisoformat('2020-01-29T12:56:13'),
            'other_decimal': Decimal('2.414')
        }
    ]
}

Because the dict contains datetime and Decimal values, this won’t serialize without some extra work:

import json

try:
    text = json.dumps(dct)
except Exception as error:
    print(error)

Object of type datetime is not JSON serializable

To support the serialization of these types we need to provide an encoder.

class MyEncoder(json.JSONEncoder):
    def default(self, obj):
        if isinstance(obj, datetime):
            return obj.isoformat()
        if isinstance(obj, Decimal):
            return float(obj)
        # Let the base class raise TypeError for any other unsupported type.
        return super().default(obj)

This is quite nice. We simply check whether the object is of a particular type and return a JSON-compatible object. For the datetime this is an ISO 8601 string (although we’d prefer it to carry a time zone designator), and the Decimal becomes a float. Now our serialization works:

text = json.dumps(dct, cls=MyEncoder, indent=4)
print(text)
{
    "some_text": "Hello, World!",
    "some_date": "2020-01-29T12:56:13",
    "some_int": 42,
    "some_float": 3.14,
    "some_decimal": 2.414,
    "some_list": [
        {
            "other_text": "Hello, World!",
            "other_date": "2020-01-29T12:56:13",
            "other_decimal": 2.414
        },
        {
            "other_text": "Hello, World!",
            "other_date": "2020-01-29T12:56:13",
            "other_decimal": 2.414
        }
    ]
}
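Converting a Decimal to a float, as the encoder above does, is convenient but can be lossy. The following is a minimal sketch of that trade-off, using a made-up value with more digits than a float can hold; encoding the Decimal as a string instead is one alternative, at the cost of changing the JSON type.

```python
import json
from decimal import Decimal

# An illustrative value with more significant digits than a float can hold.
original = Decimal("2.4140000000000000000001")

# Encoding via float (as MyEncoder does) silently drops the trailing digits.
via_float = json.loads(json.dumps(float(original)))

# Encoding via str keeps every digit, but the JSON value is now a string.
via_str = json.loads(json.dumps(str(original)))

print(Decimal(str(via_float)) == original)  # False
print(Decimal(via_str) == original)         # True
```

For the value 2.414 used in this article the float conversion happens to round-trip, which is why the problem is easy to miss.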

Now let’s deserialize our text.

obj1 = json.loads(text)
print(obj1)
{
    'some_text': 'Hello, World!',
    'some_date': '2020-01-29T12:56:13',
    'some_int': 42,
    'some_float': 3.14,
    'some_decimal': 2.414,
    'some_list': [
        {
            'other_text': 'Hello, World!',
            'other_date': '2020-01-29T12:56:13',
            'other_decimal': 2.414
        },
        {
            'other_text': 'Hello, World!',
            'other_date': '2020-01-29T12:56:13',
            'other_decimal': 2.414
        }
    ]
}
print(dct == obj1)
False

Obviously we haven’t provided any information about the dates and decimals, so we just get back the standard JSON types. To recover the original types we need to provide an object hook function. The object hook is called with each dictionary the JSON parser decodes and can change it in any way it sees fit; whatever it returns is used in place of that dictionary.

def object_hook(dct):
    for key, value in dct.items():
        try:
            dct[key] = datetime.fromisoformat(value)
        except (TypeError, ValueError):
            # Not a string, or not an ISO formatted date: leave it as it is.
            pass
    return dct

Detecting the desired type is obviously a problem. Here I “detected” the type by trying to convert every entry. I could have used a regular expression to check first, but I would still have to check everything.
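As a rough illustration of the regular expression variant, here is a minimal sketch. The pattern and the name regex_object_hook are my own assumptions for this example; the pattern only accepts the exact "YYYY-MM-DDTHH:MM:SS" shape used in this article, and every string value is still inspected.

```python
import json
import re
from datetime import datetime

# Hypothetical pattern: only "YYYY-MM-DDTHH:MM:SS", no fraction or offset.
ISO_DATETIME = re.compile(r"^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}$")

def regex_object_hook(dct):
    """Convert string values that look like ISO datetimes; leave the rest."""
    for key, value in dct.items():
        if isinstance(value, str) and ISO_DATETIME.match(value):
            try:
                dct[key] = datetime.fromisoformat(value)
            except ValueError:
                # Right shape but not a real date (e.g. month 13): keep the string.
                pass
    return dct

sample = '{"some_text": "Hello, World!", "some_date": "2020-01-29T12:56:13"}'
parsed = json.loads(sample, object_hook=regex_object_hook)
print(parsed)
```

The check is cheaper than raising and catching an exception for every value, but it is still guesswork: a string that merely looks like a date gets converted too.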

However, my main problem is that there is no way of telling the difference between a float and a decimal. Rob sad :(

The deserialization now looks like this.

obj2 = json.loads(text, object_hook=object_hook)
print(obj2)
{
    'some_text': 'Hello, World!',
    'some_date': datetime.datetime(2020, 1, 29, 12, 56, 13),
    'some_int': 42,
    'some_float': 3.14,
    'some_decimal': 2.414,
    'some_list': [
        {
            'other_text': 'Hello, World!',
            'other_date': datetime.datetime(2020, 1, 29, 12, 56, 13),
            'other_decimal': 2.414
        },
        {
            'other_text': 'Hello, World!',
            'other_date': datetime.datetime(2020, 1, 29, 12, 56, 13),
            'other_decimal': 2.414
        }
    ]
}
print(dct == obj2)
False
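The standard library does offer a parse_float argument on json.loads, but it is all or nothing. The sketch below, using a cut-down version of our data, shows that it turns every JSON float into a Decimal, so the float/decimal distinction is still lost, just in the other direction.

```python
import json
from decimal import Decimal

sample_text = '{"some_float": 3.14, "some_decimal": 2.414}'

# parse_float is called with the text of every JSON float that is decoded.
parsed = json.loads(sample_text, parse_float=Decimal)

print(parsed)
# {'some_float': Decimal('3.14'), 'some_decimal': Decimal('2.414')}
```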

We could put in some business logic to use the key names, but this would require a custom decoder for each data set. We could put some type information in the JSON, but we may not be in control of its generation, and anyway that looks ugly.
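To make the key-name idea concrete, here is a minimal sketch. The CONVERTERS table and keyed_object_hook are hypothetical names for this example, and the table would have to be rewritten for every data set, which is exactly the drawback.

```python
import json
from datetime import datetime
from decimal import Decimal

# Hypothetical per-data-set table mapping key names to converters.
CONVERTERS = {
    "some_date": datetime.fromisoformat,
    "other_date": datetime.fromisoformat,
    # The value has already been parsed as a float, so go via its repr.
    "some_decimal": lambda value: Decimal(str(value)),
    "other_decimal": lambda value: Decimal(str(value)),
}

def keyed_object_hook(dct):
    """Apply the converter registered for each key; pass other values through."""
    return {
        key: CONVERTERS[key](value) if key in CONVERTERS else value
        for key, value in dct.items()
    }

sample = '{"some_float": 3.14, "some_decimal": 2.414, "some_date": "2020-01-29T12:56:13"}'
parsed = json.loads(sample, object_hook=keyed_object_hook)
print(parsed)
```

This does separate floats from decimals, but only because the knowledge of which key holds which type has been hard-coded outside the data.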

Type Annotations

With type annotations we should be able to solve this problem. The data structure we are streaming can be “typed” by using a TypedDict. In Python 3.7 this can be found in the typing_extensions package, while from 3.8 onwards it is in the standard library.

from typing import List
try:
    from typing import TypedDict
except ImportError:
    from typing_extensions import TypedDict

class InnerDict(TypedDict):
    other_text: str
    other_date: datetime
    other_decimal: Decimal

class OuterDict(TypedDict):
    some_text: str
    some_date: datetime
    some_int: int
    some_float: float
    some_decimal: Decimal
    some_list: List[InnerDict]

Using the jetblack-serialization package we can use these type annotations. It relies heavily on the typing_inspect package.
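To give a feel for how annotations can drive the conversion, here is a minimal sketch using only the standard library (assuming Python 3.8 or later for get_origin and get_args). It is not the package’s implementation; the convert helper and the cut-down Inner and Outer types are hypothetical and handle only the cases in this article.

```python
from datetime import datetime
from decimal import Decimal
from typing import Any, List, TypedDict, get_args, get_origin, get_type_hints

class Inner(TypedDict):
    other_date: datetime
    other_decimal: Decimal

class Outer(TypedDict):
    some_float: float
    some_list: List[Inner]

def convert(value: Any, annotation: Any) -> Any:
    """Hypothetical helper: coerce a decoded JSON value to its annotated type."""
    if annotation is datetime:
        return datetime.fromisoformat(value)
    if annotation is Decimal:
        return Decimal(str(value))
    if get_origin(annotation) is list:
        (item_type,) = get_args(annotation)
        return [convert(item, item_type) for item in value]
    if isinstance(value, dict) and isinstance(annotation, type):
        # For a TypedDict, get_type_hints returns the declared key types.
        hints = get_type_hints(annotation)
        return {key: convert(item, hints.get(key, Any)) for key, item in value.items()}
    return value

decoded = {
    "some_float": 3.14,
    "some_list": [{"other_date": "2020-01-29T12:56:13", "other_decimal": 2.414}],
}
typed = convert(decoded, Outer)
print(typed)
```

Because the annotation says which keys are Decimal and which are float, no guessing from the value is needed.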

Let’s see what happens when we serialize our data.

from jetblack_serialization.config import SerializerConfig
from jetblack_serialization.json import deserialize, serialize
from stringcase import camelcase, snakecase

config = SerializerConfig(camelcase, snakecase, pretty_print=True)

text = serialize(dct, OuterDict, config)
print(text)
{
    "someText": "Hello, World!",
    "someDate": "2020-01-29T12:56:13.00Z",
    "someInt": 42,
    "someFloat": 3.14,
    "someDecimal": 2.414,
    "someList": [
        {
            "otherText": "Hello, World!",
            "otherDate": "2020-01-29T12:56:13.00Z",
            "otherDecimal": 2.414
        },
        {
            "otherText": "Hello, World!",
            "otherDate": "2020-01-29T12:56:13.00Z",
            "otherDecimal": 2.414
        }
    ]
}

First it’s camel-cased the keys! The SerializerConfig takes a key serializer and deserializer (using the stringcase package). We can see the date got formatted nicely (the config option also allows for value serializers).
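The key serializer and deserializer are just functions from one string to another. As a rough sketch of what such a pair might look like without the stringcase package, here are two hypothetical helpers, to_camel and to_snake, plus a rename_keys function that applies one of them recursively. They only cover simple names like the ones in this article.

```python
import re

def to_camel(name: str) -> str:
    """Hypothetical key serializer: some_text -> someText."""
    head, *tail = name.split("_")
    return head + "".join(part.capitalize() for part in tail)

def to_snake(name: str) -> str:
    """Hypothetical key deserializer: someText -> some_text."""
    return re.sub(r"(?<!^)(?=[A-Z])", "_", name).lower()

def rename_keys(obj, rename):
    """Apply rename to every dict key, descending into dicts and lists."""
    if isinstance(obj, dict):
        return {rename(key): rename_keys(value, rename) for key, value in obj.items()}
    if isinstance(obj, list):
        return [rename_keys(item, rename) for item in obj]
    return obj

data = {"some_text": "Hello", "some_list": [{"other_text": "World"}]}
camel = rename_keys(data, to_camel)
print(camel)                         # {'someText': 'Hello', 'someList': [{'otherText': 'World'}]}
print(rename_keys(camel, to_snake))  # {'some_text': 'Hello', 'some_list': [{'other_text': 'World'}]}
```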

Now let’s see how the JSON is deserialized.

obj = deserialize(text, OuterDict, config)
print(obj)
{
    'some_text': 'Hello, World!',
    'some_date': datetime.datetime(2020, 1, 29, 12, 56, 13),
    'some_int': 42,
    'some_float': 3.14,
    'some_decimal': Decimal('2.414'),
    'some_list': [
        {
            'other_text': 'Hello, World!',
            'other_date': datetime.datetime(2020, 1, 29, 12, 56, 13),
            'other_decimal': Decimal('2.414')
        },
        {
            'other_text': 'Hello, World!',
            'other_date': datetime.datetime(2020, 1, 29, 12, 56, 13),
            'other_decimal': Decimal('2.414')
        }
    ]
}

The keys are now snake-case and the types have been correctly identified, even the decimals. A quick check shows a successful round trip.

print(dct == obj)
True

Rob happy :)

And just for fun, we can serialize into XML.

from typing_extensions import Annotated
from jetblack_serialization.xml import serialize as serialize_xml, XMLEntity
from stringcase import pascalcase

xml_config = SerializerConfig(pascalcase, snakecase, pretty_print=True)

text = serialize_xml(dct, Annotated[OuterDict, XMLEntity('Dict')], xml_config)
print(text)
<Dict>
    <SomeText>Hello, World!</SomeText>
    <SomeDate>2020-01-29T12:56:13.00Z</SomeDate>
    <SomeInt>42</SomeInt>
    <SomeFloat>3.14</SomeFloat>
    <SomeDecimal>2.414</SomeDecimal>
    <SomeList>
        <OtherText>Hello, World!</OtherText>
        <OtherDate>2020-01-29T12:56:13.00Z</OtherDate>
        <OtherDecimal>2.414</OtherDecimal>
    </SomeList>
    <SomeList>
        <OtherText>Hello, World!</OtherText>
        <OtherDate>2020-01-29T12:56:13.00Z</OtherDate>
        <OtherDecimal>2.414</OtherDecimal>
    </SomeList>
</Dict>

As XML requires a tag to wrap the object, we needed to supply some extra information. This was achieved with the Annotated type, which is available through the typing_extensions package and described in PEP 593.
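For anyone who hasn’t met Annotated before, this minimal sketch shows how metadata attached to a type can be read back at runtime. TagName is a hypothetical stand-in for metadata such as XMLEntity; it is not part of any package.

```python
from dataclasses import dataclass

try:
    from typing import Annotated  # Python 3.9+
except ImportError:
    from typing_extensions import Annotated

@dataclass(frozen=True)
class TagName:
    """Hypothetical metadata: the name of the wrapping XML tag."""
    name: str

annotated = Annotated[dict, TagName("Dict")]

# __origin__ is the underlying type; __metadata__ holds the extra arguments.
base_type = annotated.__origin__
metadata = annotated.__metadata__

print(base_type)         # <class 'dict'>
print(metadata[0].name)  # Dict
```

A type checker sees only the underlying type, while a serializer is free to look for the metadata it understands and ignore the rest.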

I’ve been using this with a REST framework which uses typing to automatically bind variables to reduce the effort to produce the API. You can find that code here.
