Date/time, back and forth between Javascript, Django and PostgreSQL
by Dima Knivets on Tue, 17 Apr 2018DjaoDjin caters to micro-SaaS products. These are specialized and local products. As an example, A1Ceus targets compliance with New York State professional certification requirements. Usually the teams behind those websites are small (1-3 people), with their thumb on the pulse on daily business numbers. Reporting graphs in UTC did not cut it. Questions kept piling up in the customer support inbox about discrepancies between what those micro-SaaS entrepreneurs were experiencing and what the report charts were saying. Reports had to be presented midnight to midnight local time. Here is the journey of what that meant technically to show revenue reports in local time.
This is a guest post from Dima Knivets, a freelance programmer that was instrumental on getting this datetime/timezone puzzle solved correctly. Please have a look at the first part on his blog, which covers date, time and time zone basics in detail.
A brief overview of the stack
Our core tech is djaodjin-saas, a pluggable Django app. It adds a multitude of API endpoints, among which there are many that return transaction data. The data is stored in PostgreSQL, and API endpoints are powered by Django Rest Framework. On the client side, built on top of Angular.js, this transaction data is used to display graphs.
Client side
On page load, Angular.js makes a GET request like this:
/api/metrics/company_name/funds?ends_at=2018-04-18T00:00:00.000Z
The ends_at
parameter is expected to be an ISO 8601 formatted
timestamp in UTC. All DjaoDjin APIs are using ISO 8601. Stripe decided to use
a UNIX timestamp in their APIs instead, but we found it complicated to debug
date/time-related issues.
This request fetches transactions created during a 12-month period, counting backwards from today’s date,(2018-04-18) until 2017-04-01. In this case it's not a full 12-month period, because the current month hasn’t ended yet. However, a user then can pick from previous months to have a complete 12-month period, for example a period from 2018-03-01 until 2017-03-01.
However, this URL format is deprecated, so it is now required to pass an
additional timezone
parameter besides the ends_at
parameter. The reason behind this will become clear by the end of this post.
Meanwhile, here’s how we get the time zone of the client.
Initially, we construct a Date
object (which is then passed as
ends_at
parameter). This object only has the UTC offset
of the browser. However, a browser doesn’t know the exact time zone name.
In modern browsers there are APIs which standardize the process of obtaining
the time zone name. We use Moment.js
with Moment Timezone,
which leverages these new APIs to guess the client’s time zone name:
moment.tz.guess() // "Europe/Kiev"
We then pass this string as a timezone
parameter, and
a constructed Date
object as an ends_at
parameter,
when making an API call. When Angular.js makes an
actual AJAX request the toISOString
method of Date
object is called, which produces an ISO 8601 timestamp.
Django Rest Framework side
When Django Rest Framework
(aka DRF) receives a request, we parse the timestamp with
django.utils.dateparse.parse_datetime
function,
the return value of which is an aware datetime
object.
Because ISO 8601 permits only UTC or UTC offset to be specified,
the tzinfo
attribute of this object is either a UTC
(pytz.UTC
) or a UTC offset (django.utils.timezone.FixedOffset
).
Once we have an end date, we call saas.managers.metrics.month_periods
to generate a list of 12 datetime
objects, which will be used to
query the transactions later. Here is the original source of the function which
handles UTC-only dates:
def month_periods(nb_months=12, from_date=None, step_months=1): dates = [] from_date = datetime_or_now(from_date) dates.append(from_date) last = datetime( day=from_date.day, month=from_date.month, year=from_date.year, tzinfo=utc) if last.day != 1: last = datetime(day=1, month=last.month, year=last.year, tzinfo=utc) dates.append(last) nb_months = nb_months - 1 for _ in range(0, nb_months, step_months): year = last.year month = last.month - step_months if month < 1: # integer division year = last.year + month // 12 assert isinstance(year, six.integer_types) if month % 12 == 0: year -= 1 month = 12 else: month = month % 12 last = datetime(day=1, month=month, year=year, tzinfo=utc) dates.append(last) dates.reverse() return dates
The problem with the current implementation is that when constructing new
datetime
objects, the time zone is set to UTC:
last = datetime(day=1, month=month, year=year, tzinfo=utc)
This is a bug, because if a HTTP request is made with a timestamp in UTC offset,
the first datetime
object will have a UTC offset time zone,
while the rest of the datetime
objects will still be in UTC.
So, we have a potential 24-hour window for transactions with the wrong date
to creep in. A solution to this problem is to use the time zone of the first
datetime
object when constructing the rest of
the datetime
objects. In our situation, it doesn't make sense
to convert the first object to UTC too; this is because when we construct
datetime
objects in the original time zone, time
is irrelevant when constructing subsequent datetime
objects,
because each period begins at 00:00. However, if we do convert to UTC, the
time is no longer 00:00, and the day is potentially also different — this
complicates the construction of periods. So, it is much easier to stick with
whatever time zone offset was passed with the request.
orig_tz = from_date.tzinfo # later in loop last = datetime(day=1, month=month, year=year, tzinfo=orig_tz)
The other problem is that the user who made the request might be located
in a time zone which has DST. If the time zone has DST, its UTC offset will
not be constant during the year. When generating datetime
objects we use
the same UTC offset for each of them; as a consequence, for dates that fall
into a DST period, the actual local time will be off by one hour. To fix this
we need the time zone name, which we'll use to get the DST rules for this
specific time zone.
There is no way to get the correct offset for a particular date with DST without a time zone name, and we can't get the time zone name based on offset, due to its changing nature. That means we need to send the time zone from the client somehow. Unfortunately, ISO 8601 doesn't specify a way to include a time zone name with the timestamp, only an offset. As a solution, we'll pass a separate timezone parameter from the client. In cases where a client doesn't pass a time zone string or passes a wrong string, we'll have to fall back to using UTC offsets, ignoring DST. We might actually require the timezone parameter in the future to prevent potential date and time errors. Let's modify the month_periods function to accept a time zone string from the view:
def month_periods(nb_months=12, from_date=None, step_months=1, tz=None): ...
OK, now that we have a time zone string, we can solve the last issue with this
function. When constructing datetime
objects, we need to modify the offset
based on the DST, so that the local time is always equal to 00:00. To do this,
we will use a pytz
package, which is an implementation of IANA database in Python. Let's create
a helper function which parses a time zone string into a tzinfo
object:
from pytz import timezone, UnknownTimeZoneError def parse_tz(tz): if tz: try: return timezone(tz) except UnknownTimeZoneError: pass
Going back to the month_periods
function, it’s necessary to parse
the time zone string first. If what we've got is an actual time zone, convert
the from_date
previous tzinfo object to the new time zone object;
otherwise just leave it with the original UTC offset object. Here's the code
that converts the first period's tzinfo
object to the newly parsed
time zone object:
tz_ob = parse_tz(tz) if tz_ob: from_date = from_date.astimezone(tz_ob)
When dealing with the rest of the periods, we can construct a naive
datetime
(an object without a tzinfo
attribute) first.
Then if we have the time zone object, call a tz_ob.localize method which adds
a time zone to the date (making it aware) and applies a correct UTC offset
based on DST rules. If we don't have a time zone object, we'll fall back
to using UTC offset (this ignores DST). Let's wrap this into a helper function
too:
def _handle_tz(dt, tz_ob, orig_tz): if tz_ob: # adding timezone info # this also accounts for DST loc = tz_ob.localize(dt) else: # adding UTC offset only loc = last.replace(tzinfo=orig_tz) return loc
The final month_periods function will look like this:
def month_periods(nb_months=12, from_date=None, step_months=1, tz=None): dates = [] from_date = datetime_or_now(from_date) orig_tz = from_date.tzinfo tz_ob = parse_tz(tz) if tz_ob: from_date = from_date.astimezone(tz_ob) dates.append(from_date) last = datetime(day=from_date.day, month=from_date.month, year=from_date.year) last = _handle_tz(last, tz_ob, orig_tz) if last.day != 1: last = datetime(day=1, month=last.month, year=last.year) last = _handle_tz(last, tz_ob, orig_tz) dates.append(last) nb_months = nb_months - 1 for _ in range(0, nb_months, step_months): year = last.year month = last.month - step_months if month < 1: # integer division year = last.year + month // 12 assert isinstance(year, six.integer_types) if month % 12 == 0: year -= 1 month = 12 else: month = month % 12 last = datetime(day=1, month=month, year=year) last = _handle_tz(last, tz_ob, orig_tz) dates.append(last) dates.reverse()
So, when we call this function we'll have the following result:
>>> from django.utils.dateparse import parse_datetime >>> from pprint import pprint # pretty print >>> from_date = parse_datetime('2018-04-01T00:00:00+03:00') >>> tz = 'Europe/Kiev' >>> dates = month_periods(from_date=from_date, tz=tz) >>> pprint(dates) [datetime.datetime(2017, 4, 1, 0, 0, tzinfo=), datetime.datetime(2017, 5, 1, 0, 0, tzinfo= ), datetime.datetime(2017, 6, 1, 0, 0, tzinfo= ), datetime.datetime(2017, 7, 1, 0, 0, tzinfo= ), datetime.datetime(2017, 8, 1, 0, 0, tzinfo= ), datetime.datetime(2017, 9, 1, 0, 0, tzinfo= ), datetime.datetime(2017, 10, 1, 0, 0, tzinfo= ), datetime.datetime(2017, 11, 1, 0, 0, tzinfo= ), datetime.datetime(2017, 12, 1, 0, 0, tzinfo= ), datetime.datetime(2018, 1, 1, 0, 0, tzinfo= ), datetime.datetime(2018, 2, 1, 0, 0, tzinfo= ), datetime.datetime(2018, 3, 1, 0, 0, tzinfo= ), datetime.datetime(2018, 4, 1, 0, 0, tzinfo= )]
Awesome. Have you noticed how the time zone offset changes during the year, while time stays the same? That's what we were trying to achieve with localize method.
Now that we have the list of correct periods, they are then used to make
database queries. Once we have the transactions in our views, they are passed
to DRF serializers, which encode them to produce a JSON response. DRF
serializers don't modify datetime
objects and their time zone info, so we don't
need to worry about that.
Let's have a look at how Django ORM and PostgreSQL handles time zones in detail.
Django ORM time zone handling
Internally, Django uses pytz package. Django time zone behavior depends on the
USE_TZ setting.
If your application is serious about time you should always
have time zone support enabled and work with aware datetime
objects only, otherwise errors will inevitably creep in.
Django ORM will not modify aware datetime
objects
when constructing an SQL query, so it is pretty safe to pass them as arguments
to the queries. When an ORM compiles a query, it basically calls
the __str__()
method for each of the datetime fields, which
produces an ISO 8601 timestamp. This applies to both the queries produced
by ORM and to raw SQL queries crafted by hand.
So, the final SQL query that will be sent to PostgreSQL will have ISO 8601
timestamps, with whatever UTC offset was in the tzinfo attribute. Here's
an example:
SELECT COUNT(DISTINCT(prev.dest_organization_id)), SUM(prev.dest_amount) FROM saas_transaction prev LEFT OUTER JOIN ( SELECT distinct(dest_organization_id) FROM saas_transaction WHERE created_at >= '2017-05-01 00:00:00+03:00' AND created_at < '2017-06-01 00:00:00+03:00' AND orig_organization_id = '2' AND orig_account = 'Receivable' ) curr ON prev.dest_organization_id = curr.dest_organization_id WHERE prev.created_at >= '2017-04-01 00:00:00+03:00' AND prev.created_at < '2017-05-01 00:00:00+03:00' AND prev.orig_organization_id = '2' AND prev.orig_account = 'Receivable' AND curr.dest_organization_id IS NULL;
When Django establishes a connection with PostgreSQL it specifies a timezone
parameter, which is used by PostgreSQL to determine which time zone the
timestamps should be returned in. If USE_TZ
is set to
True
, the value of the parameter will be set to UTC by default.
This means that ORM will receive the response from PostgreSQL with timestamps
in UTC, and that the response will be used to build a model with datetime
fields in UTC. To construct model datetime fields a parse_datetime method is
used, which parses a UTC timestamp from the database and constructs an aware
datetime
object in UTC. As a result, we don't have the same aware
datetime
object that we stored or used to query initially, because
datetime data is stored and retrieved in UTC, and neither ORM nor PostgreSQL
have any knowledge of time zones. (Actually, Django converts those objects
to a local time zone when they are used in Django Forms or templates, but this
is not relevant to our situation.)
At this point, we have a list of transaction rows with datetime fields in UTC. We can either convert the dates back to the original time zone first, or pass them back to DRF.
Time zones in PostgreSQL
In PostgreSQL all date and time data is stored internally in UTC.
On input, PostgreSQL accepts an ISO 8601 timestamp with a UTC offset, or with a time zone name, even though it is not a standard ISO 8601 string. In our case, Django and Python will produce a timestamp with UTC or UTC offset. When no time zone info is specified in a timestamp, the time zone will be taken from the timezone parameter (in our case it is configured by Django upon connection, and set to UTC by default). Every timestamp with time zone or UTC offset used in SQL queries is converted to UTC by PostgreSQL internally.
Before outputting the data, PostgreSQL converts the timestamps from UTC to the time zone set by the timezone parameter (in our case it is set to UTC, so no conversion is required). Here's an example:
db=> SHOW TimeZone; TimeZone ---------- UTC (1 row) db=> INSERT INTO transactions (created_at) VALUES (TIMESTAMP WITH TIME ZONE '2018-04-19 03:00:00 Europe/Kiev'); INSERT 0 1 db=> SELECT created_at FROM transactions; created_at ------------------------ 2018-04-19 00:00:00+00 (1 row) db=> SET TimeZone TO 'Europe/Kiev'; SET db=> SELECT created_at FROM transactions; created_at ------------------------ 2018-04-19 03:00:00+03 (1 row)
Conclusion
In this post we showed how to present monthly SaaS reports midnight-to-midnight, in local time and through DST. Finally, we described in detail how each piece of software in the stack handles the time zones during this process. Clone djaodjin-saas on GitHub, it’s free! If you need help with your SaaS product, I am available for hire.
More to read
If you are looking for more posts on handling various issues in a multi-language stack, Integrating Django i18n with Jinja2 and Vue.js and Django Rest Framework, AngularJS and permissions are worth reading next.
More technical posts are also available on the DjaoDjin blog, as well as business lessons we learned running a SaaS hosting platform.