Enhancing JSON Serialization In Python With Typing
--
Maintaining valid types when deserializing JSON with Python is hard. This article discusses an approach using typing. You can find the package on GitHub.
The Problem
Here we have a Python dict with a mix of data types.
from datetime import datetime
from decimal import Decimal
dct = {
    'some_text': 'Hello, World!',
    'some_date': datetime.fromisoformat('2020-01-29T12:56:13'),
    'some_int': 42,
    'some_float': 3.14,
    'some_decimal': Decimal("2.414"),
    'some_list': [
        {
            'other_text': 'Hello, World!',
            'other_date': datetime.fromisoformat('2020-01-29T12:56:13'),
            'other_decimal': Decimal("2.414")
        },
        {
            'other_text': 'Hello, World!',
            'other_date': datetime.fromisoformat('2020-01-29T12:56:13'),
            'other_decimal': Decimal('2.414')
        }
    ]
}
Because the dict contains datetime and Decimal values, this won’t serialize without some extra work:
import json
try:
    text = json.dumps(dct)
except Exception as error:
    print(error)
Object of type datetime is not JSON serializable
To support the serialization of these types we need to provide an encoder.
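class MyEncoder(json.JSONEncoder):
    def default(self, obj):
        if isinstance(obj, datetime):
            return obj.isoformat()
        if isinstance(obj, Decimal):
            return float(obj)
        # Anything else: let the base class raise the usual TypeError
        # instead of silently returning None (which would serialize as null).
        return super().default(obj)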
This is quite nice. We simply check whether the object is of a particular type and return a JSON-compatible value. For the date this is an ISO 8601 string (although we’d prefer it to carry an explicit time zone designator), and the Decimal becomes a float. Now our serialization works:
text = json.dumps(dct, cls=MyEncoder, indent=4)
print(text)
{
    "some_text": "Hello, World!",
    "some_date": "2020-01-29T12:56:13",
    "some_int": 42,
    "some_float": 3.14,
    "some_decimal": 2.414,
    "some_list": [
        {
            "other_text": "Hello, World!",
            "other_date": "2020-01-29T12:56:13",
            "other_decimal": 2.414
        },
        {
            "other_text": "Hello, World!",
            "other_date": "2020-01-29T12:56:13",
            "other_decimal": 2.414
        }
    ]
}
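Turning a Decimal into a float is not just a type change; a float holds only about 17 significant digits, so longer decimals are rounded on the way out and no decoder can recover the lost digits. A short sketch using a made-up high-precision value (2.414 happens to survive because it is short):

```python
from decimal import Decimal

precise = Decimal("3.14159265358979323846")
as_float = float(precise)  # the conversion MyEncoder.default performs for a Decimal

print(as_float)                          # 3.141592653589793
print(Decimal(repr(as_float)) == precise)  # False: the trailing digits are gone
```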
Now let's deserialize our text.
obj1 = json.loads(text)
print(obj1)
{
    'some_text': 'Hello, World!',
    'some_date': '2020-01-29T12:56:13',
    'some_int': 42,
    'some_float': 3.14,
    'some_decimal': 2.414,
    'some_list': [
        {
            'other_text': 'Hello, World!',
            'other_date': '2020-01-29T12:56:13',
            'other_decimal': 2.414
        },
        {
            'other_text': 'Hello, World!',
            'other_date': '2020-01-29T12:56:13',
            'other_decimal': 2.414
        }
    ]
}
print(dct == obj1)
False
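To see exactly why the comparison fails, a small check can list the keys whose values came back as a different type. This sketch uses a trimmed copy of the data and passes the same conversions as a `default` function (a hypothetical `encode`) rather than the `MyEncoder` class:

```python
import json
from datetime import datetime
from decimal import Decimal

original = {
    "some_text": "Hello, World!",
    "some_date": datetime(2020, 1, 29, 12, 56, 13),
    "some_int": 42,
    "some_decimal": Decimal("2.414"),
}


def encode(obj):
    # Same conversions as MyEncoder, written as a plain `default` function.
    if isinstance(obj, datetime):
        return obj.isoformat()
    if isinstance(obj, Decimal):
        return float(obj)
    raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")


decoded = json.loads(json.dumps(original, default=encode))

# Map each key whose type changed to (type before, type after).
changed = {
    key: (type(original[key]).__name__, type(decoded[key]).__name__)
    for key in original
    if type(original[key]) is not type(decoded[key])
}
print(changed)  # {'some_date': ('datetime', 'str'), 'some_decimal': ('Decimal', 'float')}
```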
Obviously we haven’t provided any information about the dates and decimals, so we just get back the standard JSON types. To restore them we need to provide an object hook function. The decoder calls the object hook with every dictionary (JSON object) it parses, and uses whatever the hook returns in its place.
def object_hook(dct):
    for key, value in dct.items():
        try:
            dct[key] = datetime.fromisoformat(value)
        except (TypeError, ValueError):
            # Not a string, or not a date string: leave the value alone.
            pass
    return dct
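The decoder applies the hook bottom-up: an inner object is handed to the hook as soon as it is complete, before the object that contains it. That is why a single hook also reaches the dictionaries nested inside some_list. A minimal sketch with a hypothetical recording hook shows the order:

```python
import json

seen = []


def recording_hook(obj):
    # Keep a copy of each dict the decoder passes in, then return it unchanged.
    seen.append(dict(obj))
    return obj


json.loads('{"outer": 1, "inner": {"value": 2}}', object_hook=recording_hook)
print(seen)  # [{'value': 2}, {'outer': 1, 'inner': {'value': 2}}]
```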
Detecting the desired type is obviously a problem. Here I “detected” the type by trying to convert every entry. I could have used a regular expression to check first, but I would still have to check everything.
However, my main problem is that there is no way to tell a float from a decimal: by the time the hook sees the value, 2.414 has already been parsed as a float. Rob sad :(
The deserialization now looks like this.
obj2 = json.loads(text, object_hook=object_hook)
print(obj2)
{
    'some_text': 'Hello, World!',
    'some_date': datetime.datetime(2020, 1, 29, 12, 56, 13),
    'some_int': 42,
    'some_float': 3.14,
    'some_decimal': 2.414,
    'some_list': [
        {
            'other_text': 'Hello, World!',
            'other_date': datetime.datetime(2020, 1, 29, 12, 56, 13),
            'other_decimal': 2.414
        },
        {
            'other_text': 'Hello, World!',
            'other_date': datetime.datetime(2020, 1, 29, 12, 56, 13),
            'other_decimal': 2.414
        }
    ]
}
print(dct == obj2)
False
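The standard library's parse_float argument to json.loads doesn't rescue us either. It is called with the raw text of every JSON number that has a fraction or exponent, whatever key it belongs to, so asking for Decimal turns the genuine float into a Decimal too:

```python
import json
from decimal import Decimal

text = '{"some_float": 3.14, "some_decimal": 2.414}'

# parse_float receives the original number text, e.g. "3.14", for every fractional number.
obj = json.loads(text, parse_float=Decimal)
print(obj)  # {'some_float': Decimal('3.14'), 'some_decimal': Decimal('2.414')}
```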
We could put in some business logic to use the key names, but this would require a custom decoder for each data set. We could put some type information in the JSON, but we may not be in control of its generation, and anyway that looks ugly.
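For comparison, here is a sketch of the key-name approach. It assumes we know in advance which keys hold which types; DATE_KEYS, FLOAT_KEYS, DECIMAL_KEYS and keyed_object_hook are hypothetical names. Pairing the hook with parse_float=Decimal keeps the exact digits, and the hook turns the genuine floats back into float. It round-trips this data, but every new data set needs its own key sets:

```python
import json
from datetime import datetime
from decimal import Decimal

# Hypothetical, data-set specific knowledge of which keys hold which types.
DATE_KEYS = {"some_date", "other_date"}
FLOAT_KEYS = {"some_float"}
DECIMAL_KEYS = {"some_decimal", "other_decimal"}


def keyed_object_hook(obj):
    for key, value in obj.items():
        if key in DATE_KEYS:
            obj[key] = datetime.fromisoformat(value)
        elif key in FLOAT_KEYS:
            # parse_float=Decimal made every fraction a Decimal; undo that for floats.
            obj[key] = float(value)
        elif key in DECIMAL_KEYS:
            # Already a Decimal for fractions; this also covers integers like 2.
            obj[key] = Decimal(value)
    return obj


text = (
    '{"some_date": "2020-01-29T12:56:13", "some_float": 3.14, "some_decimal": 2.414,'
    ' "some_list": [{"other_date": "2020-01-29T12:56:13", "other_decimal": 2.414}]}'
)
obj = json.loads(text, parse_float=Decimal, object_hook=keyed_object_hook)
```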
Type Annotations
With type annotations we should be able to solve this problem. The data structure we are streaming can be “typed” by using a TypedDict. In Python 3.7 this is available from the typing_extensions package, while from 3.8 it is part of the standard typing module.
from typing import List
try:
    from typing import TypedDict
except ImportError:
    # Python 3.7: TypedDict lives in the typing_extensions backport.
    from typing_extensions import TypedDict
class InnerDict(TypedDict):
    other_text: str
    other_date: datetime
    other_decimal: Decimal
class OuterDict(TypedDict):
    some_text: str
    some_date: datetime
    some_int: int
    some_float: float
    some_decimal: Decimal
    some_list: List[InnerDict]
The jetblack-serialization package can make use of these type annotations. It relies heavily on the typing_inspect package.
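The package's internals aren't reproduced here, but the core idea of a type-driven decoder can be sketched with the standard library alone. This minimal sketch swaps typing_inspect for typing.get_origin and typing.get_args (Python 3.8+), handles only the types used in this article, ignores key casing, and reads the text with parse_float=str so each fractional number reaches the converter as its original text. from_json_value is a hypothetical name:

```python
import json
from datetime import datetime
from decimal import Decimal
from typing import Any, List, TypedDict, get_args, get_origin, get_type_hints


class InnerDict(TypedDict):
    other_text: str
    other_date: datetime
    other_decimal: Decimal


class OuterDict(TypedDict):
    some_text: str
    some_date: datetime
    some_int: int
    some_float: float
    some_decimal: Decimal
    some_list: List[InnerDict]


def from_json_value(value: Any, annotation: Any) -> Any:
    """Convert a decoded JSON value into the type named by `annotation`."""
    if get_origin(annotation) is list:
        (item_type,) = get_args(annotation)
        return [from_json_value(item, item_type) for item in value]
    if isinstance(annotation, type) and issubclass(annotation, dict):
        # A TypedDict class: convert each key according to its annotation.
        hints = get_type_hints(annotation)
        return {key: from_json_value(value[key], hints[key]) for key in hints}
    if annotation is datetime:
        return datetime.fromisoformat(value)
    if annotation in (Decimal, float, int):
        # Fractions arrive as text (parse_float=str), so Decimal gets the exact digits.
        return annotation(value)
    return value


text = """{
    "some_text": "Hello, World!",
    "some_date": "2020-01-29T12:56:13",
    "some_int": 42,
    "some_float": 3.14,
    "some_decimal": 2.414,
    "some_list": [
        {"other_text": "Hello, World!", "other_date": "2020-01-29T12:56:13", "other_decimal": 2.414}
    ]
}"""
obj = from_json_value(json.loads(text, parse_float=str), OuterDict)
```

Because the annotation, not the value, decides the target type, the same text 2.414 becomes a Decimal under some_decimal while 3.14 stays a float under some_float.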
Let's see what happens when we serialize our data.
from jetblack_serialization.config import SerializerConfig
from jetblack_serialization.json import deserialize, serialize
from stringcase import camelcase, snakecase
config = SerializerConfig(camelcase, snakecase, pretty_print=True)
text = serialize(dct, OuterDict, config)
print(text)
{
    "someText": "Hello, World!",
    "someDate": "2020-01-29T12:56:13.00Z",
    "someInt": 42,
    "someFloat": 3.14,
    "someDecimal": 2.414,
    "someList": [
        {
            "otherText": "Hello, World!",
            "otherDate": "2020-01-29T12:56:13.00Z",
            "otherDecimal": 2.414
        },
        {
            "otherText": "Hello, World!",
            "otherDate": "2020-01-29T12:56:13.00Z",
            "otherDecimal": 2.414
        }
    ]
}
First, it has camel-cased the keys! The SerializerConfig takes a key serializer and a key deserializer (here from the stringcase package). We can also see the date got formatted nicely (the config also allows for value serializers).
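A key serializer and deserializer are just a pair of string functions applied to every key. As a rough stand-in for stringcase's camelcase and snakecase, a minimal sketch might look like this; to_camel and to_snake are hypothetical names, and this is not the stringcase implementation:

```python
import re


def to_camel(name: str) -> str:
    """snake_case -> camelCase, e.g. 'some_text' -> 'someText'."""
    first, *rest = name.split("_")
    return first + "".join(part.capitalize() for part in rest)


def to_snake(name: str) -> str:
    """camelCase or PascalCase -> snake_case, e.g. 'someText' -> 'some_text'."""
    # Insert '_' before every capital letter except one at the very start.
    return re.sub(r"(?<!^)(?=[A-Z])", "_", name).lower()


record = {"some_text": "Hello, World!", "some_int": 42}
camel = {to_camel(key): value for key, value in record.items()}
print(camel)  # {'someText': 'Hello, World!', 'someInt': 42}
```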
Now let's see how the JSON is deserialized.
obj = deserialize(text, OuterDict, config)
print(obj)
{
    'some_text': 'Hello, World!',
    'some_date': datetime.datetime(2020, 1, 29, 12, 56, 13),
    'some_int': 42,
    'some_float': 3.14,
    'some_decimal': Decimal('2.414'),
    'some_list': [
        {
            'other_text': 'Hello, World!',
            'other_date': datetime.datetime(2020, 1, 29, 12, 56, 13),
            'other_decimal': Decimal('2.414')
        },
        {
            'other_text': 'Hello, World!',
            'other_date': datetime.datetime(2020, 1, 29, 12, 56, 13),
            'other_decimal': Decimal('2.414')
        }
    ]
}
The keys are now snake-case and the types have been correctly identified, even the decimals. A quick check shows a successful round trip.
print(dct == obj)
True
Rob happy :)
And just for fun, we can serialize into XML.
from typing_extensions import Annotated
from jetblack_serialization.xml import serialize as serialize_xml, XMLEntity
from stringcase import pascalcase
xml_config = SerializerConfig(pascalcase, snakecase, pretty_print=True)
text = serialize_xml(dct, Annotated[OuterDict, XMLEntity('Dict')], xml_config)
print(text)
<Dict>
  <SomeText>Hello, World!</SomeText>
  <SomeDate>2020-01-29T12:56:13.00Z</SomeDate>
  <SomeInt>42</SomeInt>
  <SomeFloat>3.14</SomeFloat>
  <SomeDecimal>2.414</SomeDecimal>
  <SomeList>
    <OtherText>Hello, World!</OtherText>
    <OtherDate>2020-01-29T12:56:13.00Z</OtherDate>
    <OtherDecimal>2.414</OtherDecimal>
  </SomeList>
  <SomeList>
    <OtherText>Hello, World!</OtherText>
    <OtherDate>2020-01-29T12:56:13.00Z</OtherDate>
    <OtherDecimal>2.414</OtherDecimal>
  </SomeList>
</Dict>
As XML requires a tag to wrap the object, we needed to supply some extra information. This was done with Annotated, which is available through the typing_extensions package (and in the standard typing module from Python 3.9) and is described in PEP 593.
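Annotated attaches arbitrary metadata to a type without changing the type itself, and a serializer can read that metadata back with typing.get_origin and typing.get_args. A minimal sketch (Python 3.9+) with a hypothetical Tag class standing in for something like XMLEntity; root_tag is also a hypothetical name:

```python
from dataclasses import dataclass
from typing import Annotated, get_args, get_origin


@dataclass(frozen=True)
class Tag:
    """Hypothetical metadata naming the element that wraps a value."""
    name: str


Wrapped = Annotated[dict, Tag("Dict")]


def root_tag(annotation, default: str = "Root") -> str:
    # For Annotated[T, x, ...], get_args returns (T, x, ...).
    if get_origin(annotation) is Annotated:
        _base, *metadata = get_args(annotation)
        for item in metadata:
            if isinstance(item, Tag):
                return item.name
    return default


print(root_tag(Wrapped))  # Dict
print(root_tag(dict))     # Root
```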
I’ve been using this with a REST framework that uses type annotations to bind variables automatically, which reduces the effort needed to produce an API. You can find that code here.