Xml and More: InfluxDB―Knowing Its Key Concepts

InfluxDB is a fast time series database distributed under an open source license, with commercial support available. Timestamps can be stored with up to nanosecond precision.

Design Goals of InfluxDB

The original design goals of InfluxDB include:^[1]

Simple to install and manage
No external dependencies like Zookeeper and Hadoop
HTTP(s) interface for reading and writing data
Horizontally scalable
On disk and in memory

Most data is cold

Compute percentiles and other functions on the fly
Downsample data on different windows of time

Time Series Database

A Time Series Database (TSDB) is a database optimized for time series data. Time series data are measurements or events that are tracked over time and that you want to query, visualize, or summarize over time.

To illustrate the concepts of InfluxDB, we use the sample data below (Table 1), which contain a measurement named census. It shows the number of butterflies and honeybees counted by two scientists (langstroth and perpetua) at two locations (location 1 and location 2) from midnight to 6:12 AM on August 18, 2015.

Table 1. Sample Data (name: census)

time	location	scientist	butterflies	honeybees
2015-08-18T00:00:00Z	1	langstroth	12	23
2015-08-18T00:00:00Z	1	perpetua	1	30
2015-08-18T00:06:00Z	1	langstroth	11	28
2015-08-18T00:06:00Z	1	perpetua	3	28
2015-08-18T05:54:00Z	2	langstroth	2	11
2015-08-18T06:00:00Z	2	langstroth	1	10
2015-08-18T06:06:00Z	2	perpetua	8	23
2015-08-18T06:12:00Z	2	perpetua	7	22
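
To make the sample data concrete, here is a minimal Python sketch that renders the rows of Table 1 in InfluxDB's line protocol, the text format used for writes (measurement, tag set, field set, timestamp). Two things are assumptions rather than facts from this article: the counts are written as integers (the trailing i), and the timestamps are written in nanoseconds since the Unix epoch. The helper names to_epoch_ns and to_line are made up for this illustration.

```python
from datetime import datetime, timezone

# Rows of Table 1: (time, location, scientist, butterflies, honeybees).
ROWS = [
    ("2015-08-18T00:00:00Z", "1", "langstroth", 12, 23),
    ("2015-08-18T00:00:00Z", "1", "perpetua", 1, 30),
    ("2015-08-18T00:06:00Z", "1", "langstroth", 11, 28),
    ("2015-08-18T00:06:00Z", "1", "perpetua", 3, 28),
    ("2015-08-18T05:54:00Z", "2", "langstroth", 2, 11),
    ("2015-08-18T06:00:00Z", "2", "langstroth", 1, 10),
    ("2015-08-18T06:06:00Z", "2", "perpetua", 8, 23),
    ("2015-08-18T06:12:00Z", "2", "perpetua", 7, 22),
]


def to_epoch_ns(rfc3339):
    """Convert a whole-second RFC3339 UTC timestamp to nanoseconds since the epoch."""
    dt = datetime.strptime(rfc3339, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)
    return int(dt.timestamp()) * 1_000_000_000


def to_line(measurement, row):
    """Render one row as: measurement,tag_set field_set timestamp."""
    time, location, scientist, butterflies, honeybees = row
    tags = f"location={location},scientist={scientist}"
    # Assumption: counts are stored as integers, hence the "i" suffix.
    fields = f"butterflies={butterflies}i,honeybees={honeybees}i"
    return f"{measurement},{tags} {fields} {to_epoch_ns(time)}"


lines = [to_line("census", row) for row in ROWS]
print("\n".join(lines))
```

The first line printed is census,location=1,scientist=langstroth butterflies=12i,honeybees=23i 1439856000000000000. Lines like these could be pasted after the INSERT keyword in the influx CLI once a database has been selected with use.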

Influx Client

influx is InfluxDB’s command line interface (CLI) that you can use to interact with an InfluxDB server. For example, you can write data (manually or from a file), query data interactively, and view query output in different formats.

Assuming it is installed on your system, type "influx" to launch the CLI as shown below:

$ influx
Connected to http://localhost:8086 version 1.5.0
InfluxDB shell version: 1.5.0
> help
Usage:
        connect             connects to another node specified by host:port
        auth                prompts for username and password
        pretty              toggles pretty print for the json format
        chunked             turns on chunked responses from server
        chunk size          sets the size of the chunked responses. Set to 0 to reset to the default chunked size
        use                 sets current database
        format              specifies the format of the server responses: json, csv, or column
        precision           specifies the format of the timestamp: rfc3339, h, m, s, ms, u or ns
        consistency         sets write consistency level: any, one, quorum, or all
        history             displays command history
        settings            outputs the current settings for the shell
        clear               clears settings such as database or retention policy. run 'clear' for help
        exit/quit/ctrl+d    quits the influx shell

        show databases      show database names
        show series         show series information
        show measurements   show measurement information
        show tag keys       show tag key information
        show field keys     show field key information

        A full list of influxql commands can be found at:
        https://docs.influxdata.com/influxdb/latest/query_language/spec/

The objects listed by the show commands above (series, measurements, tag keys, and field keys) correspond to the key concepts in InfluxDB:

Series

Is the collection of data that share a retention policy, measurement, and tag set

Measurements

Acts as a container for tags, fields, and the time column
The measurement name is the description of the data that are stored in the associated fields

Tags

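Are made up of tag keys and tag values.

Both tag keys and tag values are stored as strings and record metadata.

Tags are optional and are indexed, so queries that filter on tags are fast

Tag Set

Is the collection of tag key-value pairs on a point; each distinct combination of tag values forms a different tag set

Fields

Are made up of field keys and field values; the field values are the measured data
Fields are NOT indexed, so queries that filter on field values must scan the data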
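
The practical difference between indexed tags and non-indexed fields can be sketched with a toy model in Python. This is only an illustration of the idea, not how InfluxDB implements its index: tag_index, find_by_tag, and find_by_field are hypothetical names, and the points are the rows of Table 1.

```python
from collections import defaultdict

# The eight points of Table 1, split into tags (metadata) and fields (data).
POINTS = [
    {"tags": {"location": "1", "scientist": "langstroth"}, "fields": {"butterflies": 12, "honeybees": 23}},
    {"tags": {"location": "1", "scientist": "perpetua"}, "fields": {"butterflies": 1, "honeybees": 30}},
    {"tags": {"location": "1", "scientist": "langstroth"}, "fields": {"butterflies": 11, "honeybees": 28}},
    {"tags": {"location": "1", "scientist": "perpetua"}, "fields": {"butterflies": 3, "honeybees": 28}},
    {"tags": {"location": "2", "scientist": "langstroth"}, "fields": {"butterflies": 2, "honeybees": 11}},
    {"tags": {"location": "2", "scientist": "langstroth"}, "fields": {"butterflies": 1, "honeybees": 10}},
    {"tags": {"location": "2", "scientist": "perpetua"}, "fields": {"butterflies": 8, "honeybees": 23}},
    {"tags": {"location": "2", "scientist": "perpetua"}, "fields": {"butterflies": 7, "honeybees": 22}},
]

# Toy inverted index: (tag key, tag value) -> positions of matching points.
tag_index = defaultdict(list)
for position, point in enumerate(POINTS):
    for key_value in point["tags"].items():
        tag_index[key_value].append(position)


def find_by_tag(key, value):
    """Tag filter: a direct index lookup, no scan over the points."""
    return [POINTS[i] for i in tag_index.get((key, value), [])]


def find_by_field(key, predicate):
    """Field filter: no index exists, so every point has to be inspected."""
    return [p for p in POINTS if predicate(p["fields"][key])]


print(len(find_by_tag("scientist", "perpetua")))              # 4
print(len(find_by_field("honeybees", lambda v: v > 25)))      # 3
```

The rule of thumb that follows from this: store values you commonly filter or group by as tags, and store the measured values as fields.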

How Data is Organized in Influx

In InfluxDB, data are organized as:

Databases (like in MySQL, Postgres, etc)

A logical container for users, retention policies, continuous queries, and time series data

Time series

Kind of like tables

Primary key is always time
Null values are not stored

A time series is composed of points or events

Points or events

Kind of like rows

Using the sample data (Table 1) as an example:

Fields are

butterflies, honeybees

Tags are

location, scientist

Tag Sets are

location = 1, scientist = langstroth
location = 2, scientist = langstroth
location = 1, scientist = perpetua
location = 2, scientist = perpetua

Measurement is

census

Series are

See Table 2

Table 2. Time Series

Arbitrary series number	Retention policy	Measurement	Tag set
series 1	autogen	census	location = 1,scientist = langstroth
series 2	autogen	census	location = 2,scientist = langstroth
series 3	autogen	census	location = 1,scientist = perpetua
series 4	autogen	census	location = 2,scientist = perpetua
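
The way Table 2 follows from Table 1 can be sketched in a few lines of Python: a series is identified by its retention policy, measurement, and tag set, so collecting the distinct combinations across all points yields the series. The series_key helper and its string layout are made up for this illustration, and autogen is taken from Table 2 as the retention policy.

```python
# (measurement, tags) for each of the eight points in Table 1.
POINTS = [
    ("census", {"location": "1", "scientist": "langstroth"}),
    ("census", {"location": "1", "scientist": "perpetua"}),
    ("census", {"location": "1", "scientist": "langstroth"}),
    ("census", {"location": "1", "scientist": "perpetua"}),
    ("census", {"location": "2", "scientist": "langstroth"}),
    ("census", {"location": "2", "scientist": "langstroth"}),
    ("census", {"location": "2", "scientist": "perpetua"}),
    ("census", {"location": "2", "scientist": "perpetua"}),
]


def series_key(retention_policy, measurement, tags):
    """Build a printable identifier from retention policy, measurement, and tag set."""
    tag_set = ",".join(f"{key}={value}" for key, value in sorted(tags.items()))
    return f"{retention_policy} {measurement} {tag_set}"


# A set keeps one entry per distinct combination; fields play no role here.
series = sorted({series_key("autogen", m, tags) for m, tags in POINTS})
for key in series:
    print(key)
```

Eight points collapse into four series, matching Table 2 (the sketch prints them in alphabetical order, so the numbering differs). Adding a new tag value, such as a third scientist, would create new series, whereas adding a new field would not.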

Summary

In a nutshell, InfluxDB is a

Time series database

Where the timestamp is the key
All data in InfluxDB have a time column. It stores timestamps in RFC3339 UTC format (e.g., 2015-08-18T00:06:00Z), and each timestamp shows the date and time associated with a particular piece of data
Works best with a large number of series, each with few columns

Schemaless database

Which means it’s easy to add new measurements, tags, and fields at any time
It’s designed to make working with time series data easier and faster

InfluxQL is a SQL-like query language for interacting with InfluxDB and providing features specific to storing and analyzing time series data.
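
As an illustration of InfluxQL together with the HTTP interface mentioned in the design goals, the following Python sketch builds (but does not send) a request URL for the /query endpoint of an InfluxDB 1.x server. The query downsamples the honeybees field at location 2 into 12-minute averages. The database name mydb is hypothetical, since the article does not name the database that holds census, and the host and port are taken from the CLI banner shown earlier.

```python
from urllib.parse import urlencode

BASE_URL = "http://localhost:8086"  # address from the CLI banner
DATABASE = "mydb"                   # hypothetical database name

# Tag values are strings, so '2' is quoted; identifiers use double quotes.
QUERY = (
    'SELECT MEAN("honeybees") FROM "census" '
    "WHERE \"location\" = '2' "
    "AND time >= '2015-08-18T05:54:00Z' AND time <= '2015-08-18T06:12:00Z' "
    "GROUP BY time(12m)"
)


def build_query_url(base_url, database, query):
    """Return a GET URL for the /query endpoint with db and q as parameters."""
    return f"{base_url}/query?{urlencode({'db': database, 'q': query})}"


url = build_query_url(BASE_URL, DATABASE, QUERY)
print(url)
```

The same query text can be typed directly at the influx prompt after selecting the database with use. Sending the URL with any HTTP client returns the result as JSON, assuming the server is running and the database exists.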

References

Devoxx france 2015 influxdb
InfluxDB Key Concepts
InfluxQL
Oracle Cloud Infrastructure (redthunder.blog)

Tuesday, January 8, 2019
