Tuesday, January 8, 2019

InfluxDB: Knowing Its Key Concepts

InfluxDB is a fast time-series database distributed under an open source license, with commercial support available. It stores timestamps with up to nanosecond precision.

Design Goals of InfluxDB


The original design goals of InfluxDB include:[1]
  • Simple to install and manage
  • No external dependencies like Zookeeper and Hadoop
  • HTTP(s) interface for reading and writing data
  • Horizontally scalable
  • On disk and in memory
    • Most data is cold
  • Compute percentiles and other functions on the fly
  • Downsample data on different windows of time
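The last goal, downsampling over windows of time, can be pictured with a small sketch. This is plain Python, not InfluxDB code: the function name `downsample`, the sample points, and the 300-second window are illustrative assumptions. In InfluxQL the same idea is expressed with a GROUP BY time(...) clause.

```python
from collections import defaultdict
from statistics import mean


def downsample(points, window_seconds):
    """Average (epoch_seconds, value) points within fixed-size time windows."""
    buckets = defaultdict(list)
    for ts, value in points:
        # Align each timestamp to the start of the window it falls into.
        window_start = ts - ts % window_seconds
        buckets[window_start].append(value)
    # One averaged value per window, ordered by window start.
    return {start: mean(values) for start, values in sorted(buckets.items())}


# Hypothetical raw points: (seconds since some origin, value).
points = [(0, 10), (60, 20), (120, 30), (300, 40), (360, 50)]
result = downsample(points, 300)
print(result)
```

Swapping `mean` for another aggregate (for example a percentile function) changes what is computed per window without changing the bucketing.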

Time Series Database


A Time Series Database (TSDB) is a database optimized for time series data. Time series data are measurements or events, each tied to a timestamp, that you want to ask questions about, visualize, or summarize over time.

To illustrate the concepts of InfluxDB, we use the sample data below (Table 1). It contains a measurement named census, which records the number of butterflies and honeybees counted by two scientists (langstroth and perpetua) at two locations (location 1 and location 2) between midnight and 6:12 AM on August 18, 2015.

Table 1.  Sample Data (name: census)

time                  location  scientist   butterflies  honeybees
2015-08-18T00:00:00Z  1         langstroth  12           23
2015-08-18T00:00:00Z  1         perpetua    1            30
2015-08-18T00:06:00Z  1         langstroth  11           28
2015-08-18T00:06:00Z  1         perpetua    3            28
2015-08-18T05:54:00Z  2         langstroth  2            11
2015-08-18T06:00:00Z  2         langstroth  1            10
2015-08-18T06:06:00Z  2         perpetua    8            23
2015-08-18T06:12:00Z  2         perpetua    7            22
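For a sense of how such rows reach the server, InfluxDB 1.x accepts writes in a text format called line protocol: measurement and tags, then fields, then a timestamp (nanoseconds since the Unix epoch by default). The sketch below formats the first row of Table 1 that way. The helper name `to_line` is hypothetical, the choice to store counts as integers (the `i` suffix) is an assumption, and escaping of spaces or commas inside names is deliberately left out.

```python
from datetime import datetime, timezone


def to_line(measurement, tags, fields, rfc3339):
    """Format one point as an InfluxDB 1.x line protocol string.

    Simplified: assumes integer field values and names without
    characters that would need escaping.
    """
    ts = datetime.strptime(rfc3339, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)
    nanoseconds = int(ts.timestamp()) * 1_000_000_000
    tag_part = ",".join(f"{key}={value}" for key, value in sorted(tags.items()))
    # The trailing "i" marks an integer; without it the value is a float.
    field_part = ",".join(f"{key}={value}i" for key, value in fields.items())
    return f"{measurement},{tag_part} {field_part} {nanoseconds}"


line = to_line(
    "census",
    {"location": "1", "scientist": "langstroth"},
    {"butterflies": 12, "honeybees": 23},
    "2015-08-18T00:00:00Z",
)
print(line)
```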


Influx Client


influx is InfluxDB’s command line interface (CLI) that you can use to interact with an InfluxDB server.  For example, you can write data (manually or from a file), query data interactively, and view query output in different formats.

Assuming it is installed on your system, type "influx" to launch the CLI:

$ influx
Connected to http://localhost:8086 version 1.5.0
InfluxDB shell version: 1.5.0
> help
Usage:
        connect <host:port>   connects to another node specified by host:port
        auth                  prompts for username and password
        pretty                toggles pretty print for the json format
        chunked               turns on chunked responses from server
        chunk size <size>     sets the size of the chunked responses.  Set to 0 to reset to the default chunked size
        use <db_name>         sets current database
        format <format>       specifies the format of the server responses: json, csv, or column
        precision <format>    specifies the format of the timestamp: rfc3339, h, m, s, ms, u or ns
        consistency <level>   sets write consistency level: any, one, quorum, or all
        history               displays command history
        settings              outputs the current settings for the shell
        clear                 clears settings such as database or retention policy.  run 'clear' for help
        exit/quit/ctrl+d      quits the influx shell

        show databases        show database names
        show series           show series information
        show measurements     show measurement information
        show tag keys         show tag key information
        show field keys       show field key information

        A full list of influxql commands can be found at:

As the show commands in the help output suggest, the key concepts in InfluxDB are:
  • Series
    • A series is the collection of data that shares a retention policy, measurement, and tag set
  • Measurements
    • A measurement acts as a container for tags, fields, and the time column
    • The measurement name describes the data stored in the associated fields
  • Tags
    • Tags are made up of tag keys and tag values
      • Both tag keys and tag values are stored as strings and record metadata
    • Tags are optional and indexed, so queries that filter on tags are fast
    • Tag Set
      • A tag set is each distinct combination of all the tag key-value pairs
  • Fields
    • Fields are NOT indexed

How Data is Organized in Influx


In InfluxDB, data are organized as:
  • Databases (like in MySQL, Postgres, etc)
  • Time series 
    • Kind of like tables
      • Primary key is always time
      • Null values are not stored
    • A time series is composed of points or events
  • Points or events
    • Kind of like rows
Using the sample data (Table 1) as an example:
  • Fields are
    • butterflies, honeybees
  • Tags are
    • location, scientist
  • Tag Sets are
    • location = 1, scientist = langstroth
    • location = 2, scientist = langstroth
    • location = 1, scientist = perpetua
    • location = 2, scientist = perpetua
  • Measurement is
    • census
  • Series are
    • See Table 2

Table 2. Time Series

Arbitrary series number  Retention policy  Measurement  Tag set
series 1                 autogen           census       location = 1, scientist = langstroth
series 2                 autogen           census       location = 2, scientist = langstroth
series 3                 autogen           census       location = 1, scientist = perpetua
series 4                 autogen           census       location = 2, scientist = perpetua
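The four series in Table 2 can be derived mechanically from the rows of Table 1: every point with the same retention policy, measurement, and tag set lands in the same series. The sketch below is an illustration in plain Python, not how InfluxDB stores data internally; the tuple used as a series key is an assumed representation.

```python
from collections import defaultdict

# Rows from Table 1: (time, location, scientist, butterflies, honeybees).
ROWS = [
    ("2015-08-18T00:00:00Z", "1", "langstroth", 12, 23),
    ("2015-08-18T00:00:00Z", "1", "perpetua", 1, 30),
    ("2015-08-18T00:06:00Z", "1", "langstroth", 11, 28),
    ("2015-08-18T00:06:00Z", "1", "perpetua", 3, 28),
    ("2015-08-18T05:54:00Z", "2", "langstroth", 2, 11),
    ("2015-08-18T06:00:00Z", "2", "langstroth", 1, 10),
    ("2015-08-18T06:06:00Z", "2", "perpetua", 8, 23),
    ("2015-08-18T06:12:00Z", "2", "perpetua", 7, 22),
]

# Group points by (retention policy, measurement, tag set).
series = defaultdict(list)
for time, location, scientist, butterflies, honeybees in ROWS:
    tag_set = (("location", location), ("scientist", scientist))
    key = ("autogen", "census", tag_set)
    # Fields are the values stored in the point; they do not affect the key.
    series[key].append((time, {"butterflies": butterflies, "honeybees": honeybees}))

for (policy, measurement, tag_set), points in sorted(series.items()):
    tags = ", ".join(f"{k} = {v}" for k, v in tag_set)
    print(policy, measurement, tags, f"({len(points)} points)")
```

Note that the field values never appear in the key: adding a new field leaves the number of series unchanged, while adding a new tag value creates a new series.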

Summary


In a nutshell, InfluxDB is a
  • Time series database
    • Where the timestamp is the key
    • All data in InfluxDB have a time column. It stores timestamps, and each timestamp shows the date and time, in RFC3339 UTC (e.g., 2015-08-18T00:06:00Z), associated with particular data
    • Works best with a large number of series with fewer columns in each one
  • Schemaless database 
    • Which means it’s easy to add new measurements, tags, and fields at any time
    • It’s designed to make working with time series data easier and faster

InfluxQL is a SQL-like query language for interacting with InfluxDB and providing features specific to storing and analyzing time series data.
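As a rough illustration, the sketch below puts an InfluxQL statement against the census sample data into a request URL for the InfluxDB 1.x HTTP /query endpoint. It only builds the URL and sends nothing. The database name `mydb` and the helper `build_query_url` are hypothetical; the host and port match the CLI session shown earlier.

```python
from urllib.parse import urlencode


def build_query_url(base_url, database, influxql):
    """Return the URL for an InfluxDB 1.x /query request (nothing is sent)."""
    params = urlencode({"db": database, "q": influxql})
    return f"{base_url}/query?{params}"


# Select both fields for one scientist; "scientist" is a tag, so the
# filter can use the index.
query = "SELECT \"butterflies\", \"honeybees\" FROM \"census\" WHERE \"scientist\" = 'langstroth'"

url = build_query_url("http://localhost:8086", "mydb", query)
print(url)
```

The same statement can be typed directly at the influx prompt after selecting the database with the use command.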

References

  1. Devoxx France 2015 InfluxDB
  2. InfluxDB Key Concepts
  3. InfluxQL
  4. Oracle Cloud Infrastructure (redthunder.blog)
