Welcome to the DjaoDjin Blog!

A place to share experiences in building Software-as-a-Service.

Date/time, back and forth between Javascript, Django and PostgreSQL

by Dima Knivets on Tue, 17 Apr 2018

DjaoDjin caters to micro-SaaS products. These are specialized and local products. As an example, A1Ceus targets compliance with New York State professional certification requirements. Usually the teams behind those websites are small (1-3 people), with their thumb on the pulse on daily business numbers. Reporting graphs in UTC did not cut it. Questions kept piling up in the customer support inbox about discrepancies between what those micro-SaaS entrepreneurs were experiencing and what the report charts were saying. Reports had to be presented midnight to midnight local time. Here is the journey of what that meant technically to show revenue reports in local time.

This is a guest post from Dima Knivets, a freelance programmer that was instrumental on getting this datetime/timezone puzzle solved correctly. Please have a look at the first part on his blog, which covers date, time and time zone basics in detail.

A brief overview of the stack

Our core tech is djaodjin-saas, a pluggable Django app. It adds a multitude of API endpoints, among which there are many that return transaction data. The data is stored in PostgreSQL, and API endpoints are powered by Django Rest Framework. On the client side, built on top of Angular.js, this transaction data is used to display graphs.

Client side

On page load, Angular.js makes a GET request like this:

/api/metrics/company_name/funds?ends_at=2018-04-18T00:00:00.000Z

The ends_at parameter is expected to be an ISO 8601 formatted timestamp in UTC. All DjaoDjin APIs are using ISO 8601. Stripe decided to use a UNIX timestamp in their APIs instead, but we found it complicated to debug date/time-related issues.

This request fetches transactions created during a 12-month period, counting backwards from today’s date,(2018-04-18) until 2017-04-01. In this case it's not a full 12-month period, because the current month hasn’t ended yet. However, a user then can pick from previous months to have a complete 12-month period, for example a period from 2018-03-01 until 2017-03-01.

However, this URL format is deprecated, so it is now required to pass an additional timezone parameter besides the ends_at parameter. The reason behind this will become clear by the end of this post. Meanwhile, here’s how we get the time zone of the client.

Initially, we construct a Date object (which is then passed as ends_at parameter). This object only has the UTC offset of the browser. However, a browser doesn’t know the exact time zone name. In modern browsers there are APIs which standardize the process of obtaining the time zone name. We use Moment.js with Moment Timezone, which leverages these new APIs to guess the client’s time zone name:

moment.tz.guess() // "Europe/Kiev"

We then pass this string as a timezone parameter, and a constructed Date object as an ends_at parameter, when making an API call. When Angular.js makes an actual AJAX request the toISOString method of Date object is called, which produces an ISO 8601 timestamp.

Django Rest Framework side

When Django Rest Framework (aka DRF) receives a request, we parse the timestamp with django.utils.dateparse.parse_datetime function, the return value of which is an aware datetime object. Because ISO 8601 permits only UTC or UTC offset to be specified, the tzinfo attribute of this object is either a UTC (pytz.UTC) or a UTC offset (django.utils.timezone.FixedOffset).

Once we have an end date, we call saas.managers.metrics.month_periods to generate a list of 12 datetime objects, which will be used to query the transactions later. Here is the original source of the function which handles UTC-only dates:

def month_periods(nb_months=12, from_date=None, step_months=1):
    dates = []
    from_date = datetime_or_now(from_date)
    dates.append(from_date)
    last = datetime(
        day=from_date.day, month=from_date.month, year=from_date.year,
        tzinfo=utc)
    if last.day != 1:
        last = datetime(day=1, month=last.month, year=last.year, tzinfo=utc)
        dates.append(last)
        nb_months = nb_months - 1
    for _ in range(0, nb_months, step_months):
        year = last.year
        month = last.month - step_months
        if month < 1:
            # integer division
            year = last.year + month // 12
            assert isinstance(year, six.integer_types)
            if month % 12 == 0:
                year -= 1
                month = 12
            else:
                month = month % 12
        last = datetime(day=1, month=month, year=year, tzinfo=utc)
        dates.append(last)
    dates.reverse()
    return dates

The problem with the current implementation is that when constructing new datetime objects, the time zone is set to UTC:

last = datetime(day=1, month=month, year=year, tzinfo=utc)

This is a bug, because if a HTTP request is made with a timestamp in UTC offset, the first datetime object will have a UTC offset time zone, while the rest of the datetime objects will still be in UTC. So, we have a potential 24-hour window for transactions with the wrong date to creep in. A solution to this problem is to use the time zone of the first datetime object when constructing the rest of the datetime objects. In our situation, it doesn't make sense to convert the first object to UTC too; this is because when we construct datetime objects in the original time zone, time is irrelevant when constructing subsequent datetime objects, because each period begins at 00:00. However, if we do convert to UTC, the time is no longer 00:00, and the day is potentially also different — this complicates the construction of periods. So, it is much easier to stick with whatever time zone offset was passed with the request.

orig_tz = from_date.tzinfo
# later in loop
last = datetime(day=1, month=month, year=year, tzinfo=orig_tz)

The other problem is that the user who made the request might be located in a time zone which has DST. If the time zone has DST, its UTC offset will not be constant during the year. When generating datetime objects we use the same UTC offset for each of them; as a consequence, for dates that fall into a DST period, the actual local time will be off by one hour. To fix this we need the time zone name, which we'll use to get the DST rules for this specific time zone.

There is no way to get the correct offset for a particular date with DST without a time zone name, and we can't get the time zone name based on offset, due to its changing nature. That means we need to send the time zone from the client somehow. Unfortunately, ISO 8601 doesn't specify a way to include a time zone name with the timestamp, only an offset. As a solution, we'll pass a separate timezone parameter from the client. In cases where a client doesn't pass a time zone string or passes a wrong string, we'll have to fall back to using UTC offsets, ignoring DST. We might actually require the timezone parameter in the future to prevent potential date and time errors. Let's modify the month_periods function to accept a time zone string from the view:

def month_periods(nb_months=12, from_date=None, step_months=1, tz=None):
...

OK, now that we have a time zone string, we can solve the last issue with this function. When constructing datetime objects, we need to modify the offset based on the DST, so that the local time is always equal to 00:00. To do this, we will use a pytz package, which is an implementation of IANA database in Python. Let's create a helper function which parses a time zone string into a tzinfo object:

from pytz import timezone, UnknownTimeZoneError

def parse_tz(tz):
  if tz:
    try:
      return timezone(tz)
    except UnknownTimeZoneError:
      pass

Going back to the month_periods function, it’s necessary to parse the time zone string first. If what we've got is an actual time zone, convert the from_date previous tzinfo object to the new time zone object; otherwise just leave it with the original UTC offset object. Here's the code that converts the first period's tzinfo object to the newly parsed time zone object:

tz_ob = parse_tz(tz)
if tz_ob:
  from_date = from_date.astimezone(tz_ob)

When dealing with the rest of the periods, we can construct a naive datetime (an object without a tzinfo attribute) first. Then if we have the time zone object, call a tz_ob.localize method which adds a time zone to the date (making it aware) and applies a correct UTC offset based on DST rules. If we don't have a time zone object, we'll fall back to using UTC offset (this ignores DST). Let's wrap this into a helper function too:

def _handle_tz(dt, tz_ob, orig_tz):
  if tz_ob:
    # adding timezone info
    # this also accounts for DST
    loc = tz_ob.localize(dt)
  else:
    # adding UTC offset only
    loc = last.replace(tzinfo=orig_tz)
  return loc

The final month_periods function will look like this:

def month_periods(nb_months=12, from_date=None, step_months=1, tz=None):
    dates = []
    from_date = datetime_or_now(from_date)
    orig_tz = from_date.tzinfo
    tz_ob = parse_tz(tz)
    if tz_ob:
        from_date = from_date.astimezone(tz_ob)
    dates.append(from_date)
    last = datetime(day=from_date.day, month=from_date.month, year=from_date.year)
    last = _handle_tz(last, tz_ob, orig_tz)
    if last.day != 1:
        last = datetime(day=1, month=last.month, year=last.year)
        last = _handle_tz(last, tz_ob, orig_tz)
        dates.append(last)
        nb_months = nb_months - 1
    for _ in range(0, nb_months, step_months):
        year = last.year
        month = last.month - step_months
        if month < 1:
            # integer division
            year = last.year + month // 12
            assert isinstance(year, six.integer_types)
            if month % 12 == 0:
                year -= 1
                month = 12
            else:
                month = month % 12
        last = datetime(day=1, month=month, year=year)
        last = _handle_tz(last, tz_ob, orig_tz)
        dates.append(last)
    dates.reverse()

So, when we call this function we'll have the following result:

>>> from django.utils.dateparse import parse_datetime
>>> from pprint import pprint # pretty print
>>> from_date = parse_datetime('2018-04-01T00:00:00+03:00')
>>> tz = 'Europe/Kiev'
>>> dates = month_periods(from_date=from_date, tz=tz)
>>> pprint(dates)
[datetime.datetime(2017, 4, 1, 0, 0, tzinfo=),
 datetime.datetime(2017, 5, 1, 0, 0, tzinfo=),
 datetime.datetime(2017, 6, 1, 0, 0, tzinfo=),
 datetime.datetime(2017, 7, 1, 0, 0, tzinfo=),
 datetime.datetime(2017, 8, 1, 0, 0, tzinfo=),
 datetime.datetime(2017, 9, 1, 0, 0, tzinfo=),
 datetime.datetime(2017, 10, 1, 0, 0, tzinfo=),
 datetime.datetime(2017, 11, 1, 0, 0, tzinfo=),
 datetime.datetime(2017, 12, 1, 0, 0, tzinfo=),
 datetime.datetime(2018, 1, 1, 0, 0, tzinfo=),
 datetime.datetime(2018, 2, 1, 0, 0, tzinfo=),
 datetime.datetime(2018, 3, 1, 0, 0, tzinfo=),
 datetime.datetime(2018, 4, 1, 0, 0, tzinfo=)]

Awesome. Have you noticed how the time zone offset changes during the year, while time stays the same? That's what we were trying to achieve with localize method.

Now that we have the list of correct periods, they are then used to make database queries. Once we have the transactions in our views, they are passed to DRF serializers, which encode them to produce a JSON response. DRF serializers don't modify datetime objects and their time zone info, so we don't need to worry about that.

Let's have a look at how Django ORM and PostgreSQL handles time zones in detail.

Django ORM time zone handling

Internally, Django uses pytz package. Django time zone behavior depends on the USE_TZ setting. If your application is serious about time you should always have time zone support enabled and work with aware datetime objects only, otherwise errors will inevitably creep in.

Django ORM will not modify aware datetime objects when constructing an SQL query, so it is pretty safe to pass them as arguments to the queries. When an ORM compiles a query, it basically calls the __str__() method for each of the datetime fields, which produces an ISO 8601 timestamp. This applies to both the queries produced by ORM and to raw SQL queries crafted by hand. So, the final SQL query that will be sent to PostgreSQL will have ISO 8601 timestamps, with whatever UTC offset was in the tzinfo attribute. Here's an example:

SELECT COUNT(DISTINCT(prev.dest_organization_id)),
       SUM(prev.dest_amount)
FROM saas_transaction prev
LEFT OUTER JOIN (
    SELECT distinct(dest_organization_id)
    FROM saas_transaction
    WHERE created_at >= '2017-05-01 00:00:00+03:00'
    AND created_at < '2017-06-01 00:00:00+03:00'
    AND orig_organization_id = '2'
    AND orig_account = 'Receivable'
) curr
ON prev.dest_organization_id = curr.dest_organization_id
WHERE prev.created_at >= '2017-04-01 00:00:00+03:00'
    AND prev.created_at < '2017-05-01 00:00:00+03:00'
    AND prev.orig_organization_id = '2'
    AND prev.orig_account = 'Receivable'
    AND curr.dest_organization_id IS NULL;

When Django establishes a connection with PostgreSQL it specifies a timezone parameter, which is used by PostgreSQL to determine which time zone the timestamps should be returned in. If USE_TZ is set to True, the value of the parameter will be set to UTC by default. This means that ORM will receive the response from PostgreSQL with timestamps in UTC, and that the response will be used to build a model with datetime fields in UTC. To construct model datetime fields a parse_datetime method is used, which parses a UTC timestamp from the database and constructs an aware datetime object in UTC. As a result, we don't have the same aware datetime object that we stored or used to query initially, because datetime data is stored and retrieved in UTC, and neither ORM nor PostgreSQL have any knowledge of time zones. (Actually, Django converts those objects to a local time zone when they are used in Django Forms or templates, but this is not relevant to our situation.)

At this point, we have a list of transaction rows with datetime fields in UTC. We can either convert the dates back to the original time zone first, or pass them back to DRF.

Time zones in PostgreSQL

In PostgreSQL all date and time data is stored internally in UTC.

On input, PostgreSQL accepts an ISO 8601 timestamp with a UTC offset, or with a time zone name, even though it is not a standard ISO 8601 string. In our case, Django and Python will produce a timestamp with UTC or UTC offset. When no time zone info is specified in a timestamp, the time zone will be taken from the timezone parameter (in our case it is configured by Django upon connection, and set to UTC by default). Every timestamp with time zone or UTC offset used in SQL queries is converted to UTC by PostgreSQL internally.

Before outputting the data, PostgreSQL converts the timestamps from UTC to the time zone set by the timezone parameter (in our case it is set to UTC, so no conversion is required). Here's an example:

db=> SHOW TimeZone;
 TimeZone
----------
 UTC
(1 row)

db=> INSERT INTO transactions (created_at) VALUES (TIMESTAMP WITH TIME ZONE '2018-04-19 03:00:00 Europe/Kiev');
INSERT 0 1
db=> SELECT created_at FROM transactions;
        created_at
------------------------
 2018-04-19 00:00:00+00
(1 row)

db=> SET TimeZone TO 'Europe/Kiev';
SET
db=> SELECT created_at FROM transactions;
        created_at
------------------------
 2018-04-19 03:00:00+03
(1 row)

Conclusion

In this post we showed how to present monthly SaaS reports midnight-to-midnight, in local time and through DST. Finally, we described in detail how each piece of software in the stack handles the time zones during this process. Clone djaodjin-saas on GitHub, it’s free! If you need help with your SaaS product, I am available for hire.

More to read

If you are looking for more posts on handling various issues in a multi-language stack, Django Rest Framework, AngularJS and permissions and Continuous Integration for a Javascript-heavy Django Site are worth reading next.

More technical posts are also available on the DjaoDjin blog, as well as business lessons we learned running a SaaS hosting platform.