pandas join by time range

Usually we need to join two tables based on time range.

One table has timestamp, the second table has time_range_start and time_range_end, so we want to join two tables based on timestamp between range_start and range_end.

However this kind of joining on '<=' or '>=' is not supported in pandas at the moment. A work around is to full outer join the two tables and do filtering based on '<=' and '>=' conditions.


Pandas has now provide a cool merge function: pandas.merge_asof

pandas.merge_asof(left, right, on=None, left_on=None, right_on=None, left_index=False, right_index=False, by=None, left_by=None, right_by=None, suffixes=('_x', '_y'), tolerance=None, allow_exact_matches=True, direction='backward')[source]

Perform an asof merge. This is similar to a left-join except that we match on nearest key rather than equal keys.

Both DataFrames must be sorted by the key.

For each row in the left DataFrame:


The backward join direction simulates joining two time series where  one timestamp must fall within the lag and current of other timestamp. 

The forward join direction similarly simulates joining two time series where  one timestamp must fall within the current and lead of other timestamp. 

This also depends on which dataset is left or right.

The nearest is also useful when it assign rows based on the closest distance.


Range join example:

import pandas as pd

df1 = pd.DataFrame([['A', 1], ['B', 5], ['C', 10]], columns=['name', 'time'])

df2 = pd.DataFrame([['range1', 0, 3], ['range2', 6, 11]], columns=['range', 'start', 'end'])

print(df1)

print(df2)

#compare time with range start and range end separately

df = pd.merge_asof(df1, df2, left_on='time', right_on='start', direction='backward')

df = pd.merge_asof(df, df2, left_on='time', right_on='end', direction='forward')

#if time is within the range, then range_x must be the same as range_y

df = df.loc[df['range_x'] == df['range_y']][['name', 'time', 'range_x', 'start_x', 'end_y']] 

print(df)


 name  time

0    A     1

1    B     5

2    C    10

    range  start  end

0  range1      0    3

1  range2      6   11

  name  time range_x  start_x  end_y

0    A     1  range1        0      3

2    C    10  range2        6     11