Concepts¶

If you are relatively new to the Predictive Grid Platform, there are a few things you should know about interacting with the server. First, time series databases such as BTrDB (the platform's internal time series database) are not relational databases, so they behave differently, have different access methods, and provide different guarantees.

The following sections describe the high-level objects and the aspects of their behavior that will allow you to use them effectively.

BTrDB Server¶

Like most time series databases, the Predictive Grid Platform contains multiple streams, each of which holds time series data. However, the Predictive Grid Platform focuses on univariate data, which brings a host of benefits and is one of the reasons the platform can process very large amounts of data quickly and easily.

Points¶

Points of data within a time series make up the smallest objects you will be dealing with when making calls to the database. Because there are different types of interactions with the database, there are different types of points that could be returned to you: RawPoint and StatPoint.

RawPoint¶

The RawPoint represents a single time/value pair and is the simpler of the two types of points. This is most useful when you need to process every single value within the stream.

    >>> # view time and value of a single point in the stream
    >>> point.time
    1547241923338098176
    >>> point.value
    120.5
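
The `time` value shown above is an integer count of nanoseconds since the Unix epoch. As a minimal, standard-library-only sketch (the helper name `ns_to_utc` is hypothetical and not part of the platform's API), such a value can be converted into a readable UTC time as follows. Python's `datetime` only stores microseconds, so the sketch returns the leftover nanoseconds separately.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def ns_to_utc(ns: int):
    """Split a nanosecond timestamp into a UTC datetime and leftover nanoseconds.

    Hypothetical helper for illustration only.
    """
    micros, leftover_ns = divmod(ns, 1_000)
    return EPOCH + timedelta(microseconds=micros), leftover_ns


when, leftover = ns_to_utc(1547241923338098176)
print(when.isoformat(), leftover)
# 2019-01-11T21:25:23.338098+00:00 176
```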

StatPoint¶

The StatPoint provides statistics about multiple points and gives aggregate values such as min, max, mean, count, and stddev (standard deviation). This is most useful when you don't need to touch every individual value, such as when you only need the count of the values over a range of time.

These statistical queries execute in time proportional to the number of results, not the number of underlying points (i.e. logarithmic time), so you can obtain valuable data in a fraction of the time it would take to retrieve all of the individual values. Due to the internal data structures, the Predictive Grid Platform does not need to read the underlying points to return these statistics!

    >>> # view aggregate values for points in a stream
    >>> point.time
    1547241923338098176

    >>> point.min
    42.1

    >>> point.mean
    78.477

    >>> point.max
    122.4

    >>> point.count
    18600

    >>> point.stddev
    3.4
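
To give some intuition for how statistics can be returned without reading raw points, the sketch below shows that summaries of two adjacent time ranges can be combined into a summary of the whole range using only the summaries themselves. This is an illustration of the general idea, not the platform's actual implementation; the `Summary` class, its `m2` field, and the `summarize` and `merge` helpers are all hypothetical, and a population standard deviation is assumed.

```python
import math
import statistics
from dataclasses import dataclass


@dataclass(frozen=True)
class Summary:
    """Hypothetical stand-in for a pre-computed aggregate over a time range."""

    min: float
    mean: float
    max: float
    count: int
    m2: float  # sum of squared deviations from the mean

    @property
    def stddev(self) -> float:
        # Population standard deviation (an assumption for this sketch).
        return math.sqrt(self.m2 / self.count)


def summarize(values):
    """Build a Summary by reading every value (the slow path)."""
    mean = statistics.fmean(values)
    m2 = sum((v - mean) ** 2 for v in values)
    return Summary(min(values), mean, max(values), len(values), m2)


def merge(a: Summary, b: Summary) -> Summary:
    """Combine two summaries without revisiting the underlying values."""
    count = a.count + b.count
    delta = b.mean - a.mean
    mean = a.mean + delta * b.count / count
    m2 = a.m2 + b.m2 + delta * delta * a.count * b.count / count
    return Summary(min(a.min, b.min), mean, max(a.max, b.max), count, m2)


left = summarize([118.9, 119.1, 118.7])
right = summarize([120.2, 119.8])
combined = merge(left, right)
print(combined.count, combined.min, combined.max)
```

Merging only touches a fixed number of fields per summary, which is why a query answered from stored summaries does not depend on how many raw points those summaries cover.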

Tabular Data¶

Most of the time when accessing the Predictive Grid Platform via the PingThings API, you'll be working with tabular data in the form of PyArrow Tables, a highly efficient format that can be easily converted into fan-favorite data structures like pandas DataFrames.

Streams¶

A Stream represents a single series of time/value pairs, and the database can hold an almost unlimited number of individual streams. Each stream has a collection, which is similar to a "path" or grouping for multiple streams. Each stream also has a name as well as a UUID, which is guaranteed to be unique across streams.

Predictive Grid Platform data is versioned such that changes to a given stream (time series) will result in a new version for the stream. In this manner, you can pin your interactions to a specific version ensuring the values do not change over the course of your interactions.

Note

If you want to work with the most recent version/data then specify a version of 0 (the default).
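
The effect of pinning a version can be illustrated with a toy in-memory model. Everything in this sketch is hypothetical (the `VersionedSeries` class, its methods, and the way version numbers are assigned); it only mirrors the convention that a version of 0 means "latest" and is not how the platform stores data.

```python
LATEST = 0  # mirrors the convention that version 0 means "most recent"


class VersionedSeries:
    """Toy model: every insert creates a new immutable snapshot."""

    def __init__(self):
        self._snapshots = {1: ()}  # version number -> tuple of (time, value)
        self._latest = 1

    def insert(self, time_ns, value):
        """Add a point and return the new version number."""
        previous = self._snapshots[self._latest]
        self._latest += 1
        self._snapshots[self._latest] = tuple(sorted(previous + ((time_ns, value),)))
        return self._latest

    def values(self, version=LATEST):
        """Read the points as of a given version (0 resolves to the latest)."""
        resolved = self._latest if version == LATEST else version
        return list(self._snapshots[resolved])


series = VersionedSeries()
pinned = series.insert(1, 10.0)
series.insert(2, 11.0)

print(series.values(version=pinned))  # unaffected by the later insert
print(series.values())                # latest version sees both points
```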

Each stream has a number of attributes and methods available, which are documented in the API Reference section of this documentation. The most common user interactions, however, are accessing the UUID, tags, annotations, version, and underlying data.

Each stream uses a UUID as its unique identifier, which can also be used when querying for streams. Metadata is provided by tags and annotations, both of which are dictionaries. Tags are used internally and have very specific keys, while annotations are more free-form and can be used to store your own metadata.

    >>> # retrieve stream's UUID
    >>> stream.uuid
    UUID('35bdb8dc-bf18-4523-85ca-8ebe384bd9b5')

    >>> # retrieve stream's current version
    >>> stream.get_latest_version()
    229266

    >>> # retrieve stream tags
    >>> stream.tags()
    {'name': 'L1MAG', 'unit': 'volts', 'ingress': '', 'distiller': ''}

    >>> # retrieve stream annotations
    >>> stream.annotations()
    {'location': 'PV array',
     'impedance': '{"source": "PMU3", "target": "PMU1", "Zreal": "..."}'}

    >>> # find the earliest and latest datapoints from stream version 133
    >>> stream.earliest(version=133)
    Point(time: 1443715704008333000, value: 118.93663787841797)
    >>> start = _.time  # `_` holds the previous result in the REPL
    >>> stream.latest(version=133)
    Point(time: 1443741639999999000, value: 119.0277099609375)

    >>> # pull some raw values
    >>> stream.raw_values(start=start, end=start + pt.utils.ns_delta(seconds=1), version=133)
    pyarrow.Table
    time: timestamp[ns, tz=UTC] not null
    value: float not null
    ----
    time: [[2015-10-01 16:08:24.008333000Z,2015-10-01 16:08:24.008333000Z,2015-10-01 16:08:24.016666000Z,2015-10-01 16:08:24.016666000Z,2015-10-01 16:08:24.024999000Z,...,2015-10-01 16:08:24.983333000Z,2015-10-01 16:08:24.991666000Z,2015-10-01 16:08:24.991666000Z,2015-10-01 16:08:24.999999000Z,2015-10-01 16:08:24.999999000Z]]
    value: [[118.93664,118.93664,118.93267,118.93267,118.93979,...,118.8709,118.87462,118.87462,118.87119,118.87119]]

    >>> # pull some statistical values
    >>> stream.windowed_values(start, start + pt.utils.ns_delta(days=5), width=pt.utils.ns_delta(days=1))
    pyarrow.Table
    time: timestamp[ns, tz=UTC] not null
    min: double not null
    mean: double not null
    max: double not null
    count: uint64 not null
    stddev: double not null
    ----
    time: [[2015-10-01 07:39:15.548954624Z,2015-10-02 03:12:04.293132288Z,2015-10-02 22:44:53.037309952Z,2015-10-03 18:17:41.781487616Z,2015-10-04 13:50:30.525665280Z,2015-10-05 09:23:19.269842944Z]]
    min: [[117.23108673095703,117.31828308105469,117.65534210205078,116.78414916992188,117.88751983642578,117.23599243164062]]
    mean: [[118.79573168499999,119.18348273119825,119.43878387730199,119.59166904577015,119.46326680390561,119.02729526047821]]
    max: [[121.25965881347656,121.36625671386719,121.30216979980469,120.76347351074219,120.76020812988281,120.46713256835938]]
    count: [[4697722,8444249,8444249,8444250,8444249,8444249]]
    stddev: [[0.7508700216994401,0.6816618775653529,0.4419924931102922,0.5311943349684866,0.43286462539457415,0.5332431786098909]]
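
To make the columns of such a statistical result concrete, here is a pure-Python sketch that computes the same five statistics per fixed-width window from raw `(time, value)` pairs. It is a conceptual model only: the function name `windowed_stats` is hypothetical, a population standard deviation is assumed, and how the platform aligns window boundaries is not modeled. Unlike the platform, this sketch reads every point.

```python
import math
from collections import defaultdict


def windowed_stats(points, start, end, width):
    """Aggregate (time_ns, value) pairs into windows of `width` nanoseconds.

    Window k covers the half-open range [start + k*width, start + (k+1)*width).
    """
    buckets = defaultdict(list)
    for time_ns, value in points:
        if start <= time_ns < end:
            buckets[(time_ns - start) // width].append(value)

    rows = []
    for k in sorted(buckets):
        values = buckets[k]
        mean = sum(values) / len(values)
        variance = sum((v - mean) ** 2 for v in values) / len(values)
        rows.append(
            {
                "time": start + k * width,
                "min": min(values),
                "mean": mean,
                "max": max(values),
                "count": len(values),
                "stddev": math.sqrt(variance),
            }
        )
    return rows


# Made-up points with tiny integer timestamps for readability.
points = [(0, 1.0), (500, 3.0), (1000, 10.0), (2500, 4.0)]
for row in windowed_stats(points, start=0, end=3000, width=1000):
    print(row)
```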

StreamSets¶

Often you will want to query and work with multiple streams instead of just an individual stream: StreamSets allow you to do this effectively. A StreamSet is a lightweight wrapper around a list of Stream objects with many of the same methods. This allows you to more effectively query multiple data sources at once rather than rely on making individual calls to each Stream.

    >>> # grab all streams for one PMU
    >>> streamset = conn.streams_in_collection("sunshine/PMU1")
    >>> print(f"Total streams: {len(streamset)}")
    Total streams: 13

    >>> streamset.count()
    {UUID('6ffb2e7e-273c-4963-9143-b416923980b0'): 5143078642,
     UUID('d625793b-721f-46e2-8b8c-18f882366eeb'): 5143123001,
     UUID('fb61e4d1-3e17-48ee-bdf3-43c54b03d7c8'): 5143097287,
     UUID('d765f128-4c00-4226-bacf-0de8ebb090b5'): 5143098143,
     UUID('1187af71-2d54-49d4-9027-bae5d23c4bda'): 5143132662,
     UUID('0be8a8f4-3b45-4fe3-b77c-1cbdadb92039'): 5143148457,
     UUID('e4efd9f6-9932-49b6-9799-90815507aed0'): 5143182368,
     UUID('886203ca-d3e8-4fca-90cc-c88dfd0283d4'): 5143232504,
     UUID('b2936212-253e-488a-87f6-a9927042031f'): 5143199721,
     UUID('51840b07-297a-42e5-a73a-290c0a47bddb'): 5143185555,
     UUID('97de3802-d38d-403c-96af-d23b874b5e95'): 5143164541,
     UUID('35bdb8dc-bf18-4523-85ca-8ebe384bd9b5'): 5143168296,
     UUID('d4cfa9a6-e11a-4370-9eda-16e80773ce8c'): 5143199915}

    >>> streamset.earliest()
    {UUID('6ffb2e7e-273c-4963-9143-b416923980b0'): Point(time: 1443715704008333000, value: 2),
     UUID('d625793b-721f-46e2-8b8c-18f882366eeb'): Point(time: 1443715704008333000, value: 312.8782653808594),
     UUID('fb61e4d1-3e17-48ee-bdf3-43c54b03d7c8'): Point(time: 1443715704008333000, value: 0.14318889379501343),
     UUID('d765f128-4c00-4226-bacf-0de8ebb090b5'): Point(time: 1443715704008333000, value: 0.14414265751838684),
     UUID('1187af71-2d54-49d4-9027-bae5d23c4bda'): Point(time: 1443715704008333000, value: 0.14462821185588837),
     UUID('0be8a8f4-3b45-4fe3-b77c-1cbdadb92039'): Point(time: 1443715704008333000, value: 192.26597595214844),
     UUID('e4efd9f6-9932-49b6-9799-90815507aed0'): Point(time: 1443715704008333000, value: 191.16004943847656),
     UUID('886203ca-d3e8-4fca-90cc-c88dfd0283d4'): Point(time: 1443715704008333000, value: 71.20166778564453),
     UUID('b2936212-253e-488a-87f6-a9927042031f'): Point(time: 1443715704008333000, value: 118.62010955810547),
     UUID('51840b07-297a-42e5-a73a-290c0a47bddb'): Point(time: 1443715704008333000, value: 310.982421875),
     UUID('97de3802-d38d-403c-96af-d23b874b5e95'): Point(time: 1443715704008333000, value: 71.94818115234375),
     UUID('35bdb8dc-bf18-4523-85ca-8ebe384bd9b5'): Point(time: 1443715704008333000, value: 118.93663787841797),
     UUID('d4cfa9a6-e11a-4370-9eda-16e80773ce8c'): Point(time: 1443715704008333000, value: 118.98837280273438)}

    >>> start = max([point.time for point in _.values()])  # latest of the earliest timestamps
    >>> stats = streamset.windowed_values(
    ...     start, start + pt.utils.ns_delta(days=5), width=pt.utils.ns_delta(days=1)
    ... )
    >>> print(f"Pulled a table of {stats.shape[0]} rows and {stats.shape[1]} columns")
    Pulled a table of 6 rows and 66 columns

Because StreamSets are just collections of Stream objects, you can easily build or filter your own StreamSets using standard Python methods:

    >>> # start by grabbing *all* Sunshine streams
    >>> sunshine_streams = conn.streams_in_collection("sunshine")
    >>> print(f"Total streams: {len(sunshine_streams)}")
    Total streams: 78
    >>> phase1_currents = StreamSet(
    ...     [
    ...         stream for stream in sunshine_streams
    ...         if stream.name.startswith("C1") and stream.tags()["unit"] == "amps"
    ...     ]
    ... )
    >>> for stream in phase1_currents:
    ...     print(
    ...         stream.uuid,
    ...         stream.collection,
    ...         stream.name,
    ...         pt.utils.ns_to_datetime(stream.earliest().time)
    ... )
    e610bbdc-296e-4a31-bc38-a3aa152b65f6 sunshine/PMU6 C1MAG 2016-03-01 00:00:00.008333+00:00
    a6df7f78-5b6d-4036-a7ef-4de95d04a53e sunshine/PMU2 C1MAG 2016-03-01 00:00:00.008333+00:00
    1e641edc-d95a-494f-99f3-cbb991ef05bf sunshine/PMU3 C1MAG 2015-07-28 00:00:00.008333+00:00
    305e0cb9-5bff-4377-9af7-929d9dfc1909 sunshine/PMU4 C1MAG 2014-06-10 03:27:20.008333+00:00
    1187af71-2d54-49d4-9027-bae5d23c4bda sunshine/PMU1 C1MAG 2015-10-01 16:08:24.008333+00:00
    bd74aa49-8ccc-4f0b-8f83-7195b6c1818e sunshine/PMU5 C1MAG 1999-12-31 23:59:57.008333+00:00