InfluxDB
InfluxDB is an open source time series database from InfluxData.
It is designed and optimized specifically for time series data. For example, it assumes an insert-mostly workload and optimizes for high-performance writes and reads; updates and deletes are supported but are much less efficient. InfluxDB does much of its work in memory and periodically flushes to disk.
There are two types of time series data: regular-interval data, such as a temperature reading every minute, and irregular (event) data, which can arrive at any time as events happen. Some other time series databases handle regular-interval data only, but InfluxDB supports both.
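To make the distinction concrete, here is a small sketch in plain Python (the timestamps and values are made up for illustration) contrasting the two kinds of series:

```python
from datetime import datetime, timedelta

# Regular-interval series: a temperature reading every minute.
start = datetime(2021, 11, 25, 5, 0, 0)
regular = [(start + timedelta(minutes=i), 20.0 + i * 0.1) for i in range(3)]

# Irregular (event) series: points arrive whenever the event happens.
irregular = [
    (datetime(2021, 11, 25, 5, 0, 12), "login"),
    (datetime(2021, 11, 25, 5, 3, 47), "purchase"),
    (datetime(2021, 11, 25, 5, 3, 49), "logout"),
]

# The gap between consecutive regular points is constant; for event
# data there is no such guarantee.
gaps = [b[0] - a[0] for a, b in zip(regular, regular[1:])]
print(gaps)  # every gap is exactly 1 minute
```

InfluxDB stores both kinds the same way: every point carries its own timestamp, so a fixed interval is never required.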
There is no concept of a 'table' as in relational databases. InfluxDB uses a 'measurement' to group a set of data; e.g. Stock Price can be a measurement which holds a group of stock time series. There is also no concept of a column. InfluxDB uses a 'tag' to describe/categorize the data; e.g. Nasdaq/NYSE can be a tag of the stock data to indicate which exchange it is from. A time series can have multiple tags, collectively called a tagset, e.g. exchange = Nasdaq, ticker = AAPL. A time series is a series of timestamped values. The name of a value is called a 'field', e.g. Price is a field, and the field value itself is called a 'value', e.g. 20. There can be multiple fields, such as Price, Volume, etc., collectively called a fieldset. Finally, the important thing is Time, which is when the field value was recorded/loaded.
The right-hand side is an illustration of the stock example. A measurement can hold multiple series. Each series is described/identified by its tags. Each series can have multiple fields and field values, and each record is timestamped.
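These pieces (measurement, tagset, fieldset, time) map directly onto InfluxDB's line protocol, which comes up again below. As a sketch, here is a small hand-rolled helper (our own illustration, not part of the client library; it skips the escaping and string-quoting rules the real protocol requires) that assembles the pieces into a line-protocol string:

```python
def to_line_protocol(measurement, tagset, fieldset, timestamp_ns=None):
    """Build 'measurement,tag=v,... field=v,... [timestamp]' (simplified)."""
    tags = ",".join(f"{k}={v}" for k, v in tagset.items())
    fields = ",".join(f"{k}={v}" for k, v in fieldset.items())
    line = f"{measurement},{tags} {fields}"
    if timestamp_ns is not None:  # timestamp is optional; server fills it in
        line += f" {timestamp_ns}"
    return line

line = to_line_protocol("stock_price",
                        {"exchange": "Nasdaq", "ticker": "AAPL"},
                        {"price": 150.5, "volume": 1000})
print(line)  # stock_price,exchange=Nasdaq,ticker=AAPL price=150.5,volume=1000
```

Reading it left to right: the measurement, then the tagset, then (after a space) the fieldset, then an optional nanosecond timestamp.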
Installing InfluxDB is simple. On Windows, simply download and unzip the file; there is only one exe file in the folder. Run the exe from the command line, which starts the service, then access the service through the UI at localhost:8086. On first use it asks for an organization name, a bucket (database) name, and a password. After that, it enters the interface shown below. InfluxDB can also be installed via Docker, on Linux, etc.
It is a very handy interface. The Data section provides data file upload and examples of writing/reading data through Python/Scala/etc., as well as Telegraf plugins for connecting to numerous systems. Note that InfluxDB uses its own line protocol to represent a data point, e.g. "price,commodity=copper value=125". Explore helps to query data. Notebooks are pretty cool: like a Jupyter notebook, you can write data pipelines in one and schedule them to run using Tasks. Boards and Alerts provide dashboards and monitoring.
Here is a hello-world example of inserting and querying data in InfluxDB using Python. To do that, install InfluxDB's Python library first.
pip install influxdb-client
That's all, and below is the program.
from datetime import datetime
from influxdb_client import InfluxDBClient, Point, WritePrecision
from influxdb_client.client.write_api import SYNCHRONOUS
# You can generate an API token from the "API Tokens Tab" in the UI
token = "xxx"
org = "org name"
bucket = "mydb"
client = InfluxDBClient(url="http://localhost:8086", token=token, org=org)
write_api = client.write_api(write_options=SYNCHRONOUS)
#Option 1: Use InfluxDB Line Protocol to write data
data = "price,commodity=copper value=123"
write_api.write(bucket, org, data)
#Option 2: Use a Data Point to write data
point = Point("price") \
    .tag("commodity", "copper") \
    .field("value", 124.0) \
    .time(datetime.utcnow(), WritePrecision.NS)
write_api.write(bucket, org, point)
#Option 3: Use a Batch Sequence to write data
sequence = ["price,commodity=copper value=125",
            "price,commodity=copper value=127"]
write_api.write(bucket, org, sequence)
#query the inserted data
query = '''from(bucket: "mydb")
|> range(start: -1h)
|> filter(fn: (r) => r["_measurement"] == "price")
|> sort(columns: ["_time"])'''
#note there is no table or such thing in InfluxDb
#the table concept here is just for holding the returned data in a table structure
#so here there is no table schema information, simply a collection of records
#each record is a point in time series
tables = client.query_api().query(query, org=org)
for table in tables:
    for record in table.records:
        print(record)
client.close()
The result looks like the following.
{'result': '_result',
'table': 0,
'_start': datetime.datetime(2021, 11, 25, 4, 42, 2, 228102, tzinfo=tzutc()),
'_stop': datetime.datetime(2021, 11, 25, 5, 42, 2, 228102, tzinfo=tzutc()),
'_time': datetime.datetime(2021, 11, 25, 5, 21, 20, 707371, tzinfo=tzutc()),
'_value': 127.0,
'_field': 'value',
'_measurement': 'price',
'commodity': 'copper'}
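Since each record is a flat point like the one above, post-processing in Python is straightforward. As a sketch, the records below are reproduced as plain dicts copied from the sample output (in real code you would pull the same values off each FluxRecord from the query result), and the grouping is just our illustration of how a measurement holds multiple series keyed by measurement + tagset:

```python
from collections import defaultdict

# Two points of the 'price' series, mirroring the printed record above.
records = [
    {"_time": "2021-11-25T05:21:18Z", "_value": 125.0, "_field": "value",
     "_measurement": "price", "commodity": "copper"},
    {"_time": "2021-11-25T05:21:20Z", "_value": 127.0, "_field": "value",
     "_measurement": "price", "commodity": "copper"},
]

# Group samples by series key (measurement + tagset), as in the
# stock illustration earlier.
series = defaultdict(list)
for r in records:
    key = (r["_measurement"], r["commodity"])
    series[key].append((r["_time"], r["_value"]))

print(dict(series))
# {('price', 'copper'): [('2021-11-25T05:21:18Z', 125.0),
#                        ('2021-11-25T05:21:20Z', 127.0)]}
```

With more tags (e.g. commodity=gold) the same loop would naturally split the points into one list per series.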