InfluxDB
InfluxDB is an open source time series database from InfluxData.
It is designed and optimized specifically for time series data. For example, it assumes an insert-mostly workload and optimizes for high-performance writes and reads; updates and deletes are supported but are much less efficient. InfluxDB does much of its work in memory and periodically flushes to disk.
There are two types of time series data: regular-interval data, such as a temperature reading every minute, and irregular (event) data, which can arrive at any time as events happen. Some other time series databases handle regular-interval data only, but InfluxDB supports both.
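To make the distinction concrete, here is a small sketch in plain Python (the timestamps and values are made up for illustration) contrasting the two kinds of series:

```python
from datetime import datetime, timedelta

# Regular-interval series: a temperature reading every minute.
start = datetime(2021, 11, 25, 5, 0, 0)
regular = [(start + timedelta(minutes=i), 20.0 + i * 0.1) for i in range(3)]

# Irregular (event) series: points arrive whenever the event happens.
irregular = [
    (datetime(2021, 11, 25, 5, 0, 12), "login"),
    (datetime(2021, 11, 25, 5, 3, 47), "purchase"),
    (datetime(2021, 11, 25, 5, 3, 49), "logout"),
]

# The gap between consecutive regular points is constant; for event
# data there is no such guarantee.
gaps = [b[0] - a[0] for a, b in zip(regular, regular[1:])]
print(gaps)  # every gap is exactly 1 minute
```

InfluxDB stores both kinds the same way: every point carries its own timestamp, so a fixed interval is never required.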
There is no concept of a 'table' as in relational databases. InfluxDB uses a 'measurement' to group a set of data; e.g. Stock Price can be a measurement which holds a group of stock time series. There is also no concept of a column. InfluxDB uses a 'tag' to describe/categorize the data; e.g. Nasdaq/NYSE can be a tag of the stock data to indicate which exchange it is from. A time series can have multiple tags, collectively called a tagset, e.g. exchange = Nasdaq, ticker = AAPL. A time series is a series of timestamped values. The name of a value is called a 'field', e.g. Price is a field, and the field value itself is called a 'value', e.g. 20. There can be multiple fields, such as Price, Volume, etc., collectively called a fieldset. Finally, the important thing is Time, which is when the field value was recorded/loaded.
The right-hand side is an illustration of the stock example. A measurement can hold multiple series. Each series is described/identified by its tags. Each series can have multiple fields and field values, and each record is timestamped.
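These pieces (measurement, tagset, fieldset, time) map directly onto InfluxDB's line protocol, which comes up again below. As a sketch, here is a small hand-rolled helper (our own illustration, not part of the client library; it skips the escaping and string-quoting rules the real protocol requires) that assembles the pieces into a line-protocol string:

```python
def to_line_protocol(measurement, tagset, fieldset, timestamp_ns=None):
    """Build 'measurement,tag=v,... field=v,... [timestamp]' (simplified)."""
    tags = ",".join(f"{k}={v}" for k, v in tagset.items())
    fields = ",".join(f"{k}={v}" for k, v in fieldset.items())
    line = f"{measurement},{tags} {fields}"
    if timestamp_ns is not None:  # timestamp is optional; server fills it in
        line += f" {timestamp_ns}"
    return line

line = to_line_protocol("stock_price",
                        {"exchange": "Nasdaq", "ticker": "AAPL"},
                        {"price": 150.5, "volume": 1000})
print(line)  # stock_price,exchange=Nasdaq,ticker=AAPL price=150.5,volume=1000
```

Reading it left to right: the measurement, then the tagset, then (after a space) the fieldset, then an optional nanosecond timestamp.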
Installing InfluxDB is simple. On Windows, simply download and unzip the file; there is only one exe file in the folder. Run the exe from the command line, which starts the service, then access the service through the UI at localhost:8086. On first use it asks for an organization name, a bucket (database) name, and a password. After that, it enters the interface shown below. InfluxDB can also be installed via Docker, on Linux, etc.
It is a very handy interface. The Data section provides data file upload and examples of writing/reading data through Python/Scala/etc., as well as Telegraf plugins for connecting to numerous systems. Note that InfluxDB uses its own line protocol to represent a data point, e.g. "price,commodity=copper value=125". Explore helps to query data. Notebooks are pretty cool: like a Jupyter notebook, you can write data pipelines in one and schedule them to run using Tasks. Boards and Alerts provide dashboards and monitoring.
Here is a hello-world example of inserting and querying data in InfluxDB using Python. To do that, install InfluxDB's Python library first.
pip install influxdb-client
That's all, and below is the program.
from datetime import datetime
from influxdb_client import InfluxDBClient, Point, WritePrecision
from influxdb_client.client.write_api import SYNCHRONOUS
# You can generate an API token from the "API Tokens Tab" in the UI
token = "xxx"
org = "org name"
bucket = "mydb"
client = InfluxDBClient(url="http://localhost:8086", token=token, org=org)
write_api = client.write_api(write_options=SYNCHRONOUS)
#Option 1: Use InfluxDB Line Protocol to write data
data = "price,commodity=copper value=123"
write_api.write(bucket, org, data)
#Option 2: Use a Data Point to write data
point = Point("price") \
    .tag("commodity", "copper") \
    .field("value", 124.0) \
    .time(datetime.utcnow(), WritePrecision.NS)
write_api.write(bucket, org, point)
#Option 3: Use a Batch Sequence to write data
sequence = ["price,commodity=copper value=125",
            "price,commodity=copper value=127"]
write_api.write(bucket, org, sequence)
#query the inserted data
query = '''from(bucket: "mydb")
|> range(start: -1h)
|> filter(fn: (r) => r["_measurement"] == "price")
|> sort(columns: ["_time"])'''
#note there is no table or such thing in InfluxDb
#the table concept here is just for holding the returned data in a table structure
#so here there is no table schema information, simply a collection of records
#each record is a point in time series
tables = client.query_api().query(query, org=org)
for table in tables:
    for record in table.records:
        print(record)
client.close()
The result looks like the following.
{'result': '_result',
'table': 0,
'_start': datetime.datetime(2021, 11, 25, 4, 42, 2, 228102, tzinfo=tzutc()),
'_stop': datetime.datetime(2021, 11, 25, 5, 42, 2, 228102, tzinfo=tzutc()),
'_time': datetime.datetime(2021, 11, 25, 5, 21, 20, 707371, tzinfo=tzutc()),
'_value': 127.0,
'_field': 'value',
'_measurement': 'price',
'commodity': 'copper'}
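Since each record is a flat point like the one above, post-processing in Python is straightforward. As a sketch, the records below are reproduced as plain dicts copied from the sample output (in real code you would pull the same values off each FluxRecord from the query result), and the grouping is just our illustration of how a measurement holds multiple series keyed by measurement + tagset:

```python
from collections import defaultdict

# Two points of the 'price' series, mirroring the printed record above.
records = [
    {"_time": "2021-11-25T05:21:18Z", "_value": 125.0, "_field": "value",
     "_measurement": "price", "commodity": "copper"},
    {"_time": "2021-11-25T05:21:20Z", "_value": 127.0, "_field": "value",
     "_measurement": "price", "commodity": "copper"},
]

# Group samples by series key (measurement + tagset), as in the
# stock illustration earlier.
series = defaultdict(list)
for r in records:
    key = (r["_measurement"], r["commodity"])
    series[key].append((r["_time"], r["_value"]))

print(dict(series))
# {('price', 'copper'): [('2021-11-25T05:21:18Z', 125.0),
#                        ('2021-11-25T05:21:20Z', 127.0)]}
```

With more tags (e.g. commodity=gold) the same loop would naturally split the points into one list per series.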