-
I appreciate your thoughts on establishing effective time series in ArcadeDB. For me, it's not clear how a time series differs from an ordinary embedded Hash with limited update functionality. I am asking myself: why not enhance the already implemented embedded List?
-
Hello! Any update on the time series model?
-
We have some users who are already using ArcadeDB for time series even though it's not an official model. We collected many use cases and came up with a design that should be easy to implement and, most importantly, blazing fast and space efficient.
The idea is simple: when you create a time series type, you define the following special attributes:
You can find some of these concepts in clustered tables.
For example, if you have sensor data with millisecond precision, let's say around 1-10K measurements per minute, you could create a type "Sensor" with the following settings:
This means ArcadeDB will create a new file every day (with a name such as "Sensor_20230721") and it will start storing sensor data from this day in that file.
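As a rough illustration of the daily partitioning, here is a minimal Python sketch of how a millisecond timestamp could be mapped to a daily file name and to a minute slot within that day. The function names are hypothetical, and the use of UTC for the day boundary and of one-minute pages are assumptions made for the example, not the actual implementation.

```python
from datetime import datetime, timezone

MILLIS_PER_MINUTE = 60_000
MINUTES_PER_DAY = 1_440


def partition_file(type_name: str, ts_millis: int) -> str:
    """Return a daily file name such as 'Sensor_20230721'.

    Assumption: the day boundary is computed in UTC.
    """
    day = datetime.fromtimestamp(ts_millis // 1000, tz=timezone.utc)
    return f"{type_name}_{day:%Y%m%d}"


def minute_of_day(ts_millis: int) -> int:
    """Return the minute slot (0..1439) the timestamp falls into."""
    return (ts_millis // MILLIS_PER_MINUTE) % MINUTES_PER_DAY


def offset_in_minute(ts_millis: int) -> int:
    """Return the millisecond offset (0..59999) inside the minute slot."""
    return ts_millis % MILLIS_PER_MINUTE


if __name__ == "__main__":
    ts = 1689956195339
    print(partition_file("Sensor", ts), minute_of_day(ts), offset_in_minute(ts))
```

Storing only the small offset inside the minute, instead of the full 8-byte timestamp, is one way such a layout could save space.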
Each page stores only a minute in this case. The page size is configurable. Let's say you are keeping the following data arrived from a measurement:
Then ArcadeDB will save the record in the page that hosts the minute relative to the timestamp `1689956195339`, that is Friday, July 21, 2023 4:16:35.339 PM (GMT), in the file `Sensor_20230721`.

A page has the following header (8,210 bytes total):
The record above will be stored in the following way:
The record in the example above could be stored in only 3 bytes under the most favorable conditions. A 64K page, excluding the 8,210-byte header, leaves up to 57,326 bytes for content, which corresponds to an average of 13 bytes per record.
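The page-capacity arithmetic can be checked with a few lines of Python. This is only a back-of-the-envelope sketch: the constants come from the numbers quoted above, while `records_that_fit` is a hypothetical helper that assumes records are packed back to back with no per-record overhead.

```python
PAGE_SIZE = 64 * 1024      # 64K page, in bytes
HEADER_SIZE = 8_210        # page header size quoted above
AVG_RECORD_SIZE = 13       # average record size quoted above
BEST_CASE_RECORD_SIZE = 3  # most favorable case quoted above

# Bytes left for record content once the header is subtracted.
CONTENT_BYTES = PAGE_SIZE - HEADER_SIZE


def records_that_fit(record_size: int) -> int:
    """Number of fixed-size records that fit in the content area.

    Assumption: records are packed contiguously without extra overhead.
    """
    return CONTENT_BYTES // record_size


if __name__ == "__main__":
    print(CONTENT_BYTES)
    print(records_that_fit(AVG_RECORD_SIZE))
    print(records_that_fit(BEST_CASE_RECORD_SIZE))
```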
The other page attributes (`previous page id` and `next page id`) work as a linked list. In the ideal scenario where sensor data arrive ordered by timestamp, a dichotomic search would be very efficient for locating the right page during a query. If some records arrive late, the relevant page is updated as long as there is space; otherwise a new page is appended and linked to the previous one.

If you're looking for a sensor in a particular range, you will be able to issue this query:
And ArcadeDB will use this special search to look for the records in this range. This works as a clustered index, and there is no need to create an index on the timestamp for efficient retrieval. Also, this clustered index layout allows fast lookups and minimal storage, which means fast search and blazing fast insert.

We ran some benchmarks internally to simulate this structure with the current buckets, and we were able to measure more than 3M inserts per second on a MacBook Pro 2019 using 7 parallel threads (!)
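To make the dichotomic page lookup more concrete, here is a simplified Python sketch of a binary search over pages ordered by their start timestamp, followed by a range scan. `Page` and `PageDirectory` are hypothetical names for illustration only; overflow pages for late arrivals (the linked-list part) are not modeled, and records are reduced to bare timestamps.

```python
from bisect import bisect_right
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Page:
    page_id: int
    start_millis: int  # first millisecond covered by this page
    records: List[int] = field(default_factory=list)  # timestamps only


class PageDirectory:
    """Pages kept sorted by start time so lookups can use binary search."""

    def __init__(self, pages: List[Page]) -> None:
        self.pages = sorted(pages, key=lambda p: p.start_millis)
        self.starts = [p.start_millis for p in self.pages]

    def find_page(self, ts_millis: int) -> Optional[Page]:
        """Return the last page starting at or before ts_millis, if any."""
        i = bisect_right(self.starts, ts_millis) - 1
        return self.pages[i] if i >= 0 else None

    def range_scan(self, lo: int, hi: int) -> List[int]:
        """Return all timestamps in [lo, hi], visiting only relevant pages."""
        i = max(bisect_right(self.starts, lo) - 1, 0)
        found: List[int] = []
        while i < len(self.pages) and self.pages[i].start_millis <= hi:
            found.extend(t for t in self.pages[i].records if lo <= t <= hi)
            i += 1
        return found
```

Because the pages themselves are ordered by time, the search needs no separate index on the timestamp, which is the point of the clustered layout.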
Another topic is a configurable pre-aggregation of data. In the example above, you could specify to aggregate the temperature by minute, using the `AVERAGE` function. In this way, during the insertion, the aggregated value would be updated and ready to be returned without any calculation. This is meant for phase 2 of the time series module.

WDYT? Any feedback about this?