-
Notifications
You must be signed in to change notification settings - Fork 2.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[core] changes to enable folding YCSB-TS back into YCSB #1095
[core] changes to enable folding YCSB-TS back into YCSB #1095
Conversation
Sets compiler target to 1.8 Additionally removes openjdk7 from travis build-matrix Fixes brianfrankcooper#1033 Additionally includes lexicographic reordering of binding-dependency properties in root pom
This base-class parses the queries generated by TimeSeriesWorkload into a format that's usable for simple consumption. The format has been defined through the work done by Andreas Bader (@baderas) in https://github.com/TSDBBench/YCSB-TS.
Ping @manolama |
For now the follow-up pull-requests are suspended, until this one is merged in, because any subsequent work will require the changes from this PR. Given that merging apparently happens through squashes, I'd prefer to keep rebases and cherry-picks to a minimum. Accordingly I'll just continue working in the overarching PR, which should approach a diff of +/- 0 when follow-up PRs are merged |
Some TimeSeries Databases require a "schema" up front. To generate said schema, the tag keys used by the workload are required. This method should be centrally available for maintainability
TimeSeries databases only rarely support deleting data. As such we can generally assume the operation to always return a Status.NOT_IMPLEMENTED. Similarly to updating that trivial implementation does not need to be repeated in every TimeSeriesDatabase implementation
Ah got it, will complete the CR by Sunday, thanks! |
@@ -0,0 +1,282 @@ | |||
package com.yahoo.ycsb; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Copyright header please.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Copyright header has been added
// seems like this should be a more elaborate query. | ||
// for now we don't support querying ranges | ||
// TODO: Support Timestamp range queries | ||
return Status.NOT_IMPLEMENTED; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should have both start and stop timestamps for the read()
method so I'd go ahead and pass it. There are definitely much more complex queries dealing with timestamps but we really just want to focus on common features in TSDs and querying over a time-range is the most common use case.
Ah never mind, you're right if we are doing a read, we're looking for a data point. In that case you may want to throw a DBException here saying that range queries aren't allowed for a read.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can not throw DBException here, because DB#read
does not declare a throw for that. I'll be changing this to return Status.BAD_REQUEST
though.
* @param tags actual tags that were want to receive (can be empty) | ||
* @return Zero on success, a non-zero error code on error or "not found". | ||
*/ | ||
protected abstract Status read(String metric, Long timestamp, Map<String, List<String>> tags); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Still need to pass through table
though a lot of TSDs will ignore it, and add the end timestamp and convert Long
to long
to use the primitive.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
metric
is the equivalent to table
. See also the callsite (return read(table, timestamp, tagQueries);
)
As established above, range queries for reads make no sense. Not sure what you mean by "add the end timestamp"
public final Status scan(String table, String startkey, int recordcount, Set<String> fields, | ||
Vector<HashMap<String, ByteIterator>> result) { | ||
Map<String, List<String>> tagQueries = new HashMap<>(); | ||
Long start = null; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Primitives please :)
end = Long.valueOf(rangeParts[1]); | ||
} else if (field.startsWith(groupByKey)) { | ||
String groupBySpecifier = field.split(tagPairDelimiter)[1]; | ||
aggregationOperation = TimeseriesDB.AggregationOperation.valueOf(groupBySpecifier); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might be nice to wrap this up in a DBException or WorkloadException saying that the group by wasn't in the enum.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The same problem as mentioned above applies. There is no exception declared in the super method, as such it's not possible to throw an exception here.
* @param tags A HashMap of tag/tagvalue pairs to insert as tagsmv c | ||
* @return A {@link Status} detailing the outcome of the insert | ||
*/ | ||
protected abstract Status insert(String metric, Long timestamp, double value, Map<String, ByteIterator> tags); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Long
-> long
. And I know a lot of TSDs only support doubles (grrr) but we should have an override for signed ints too. Implementations can return a Status.NOT_IMPLEMENTED
if they don't support storing ints.
/** | ||
* NOTE: This operation is usually <b>not</b> supported for Time-Series databases. | ||
* Deletion of data is often instead regulated through automatic cleanup and "retention policies" or similar. | ||
* |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a note for devs who see this comment, it will be more important for TSDs to support arbitrary deletion in the future due to laws like GDPR.
return Status.NOT_IMPLEMENTED; | ||
} | ||
|
||
protected final String buildTagFilter(Map<String, List<String>> tags) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add some docs about what this helper is used for. It seems SQLish so may only apply to certain TSDs and may actually be better in a DB implementation class.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
dropped the helper, since it's as of now only used in InfluxDB
* An enum containing the possible aggregation operations. | ||
*/ | ||
public enum AggregationOperation { | ||
NONE, SUM, AVERAGE, COUNT; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would be good to have MAX
and MIN
as well, those are fairly common. And javadocs would be nice :)
@@ -1,6 +1,6 @@ | |||
<?xml version="1.0" encoding="UTF-8"?> | |||
<!-- |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The sorting is nice but we may want to break it out into another PR.
This adds a copyright header and significant amounts of javadoc for TimeseriesDB Furthermore occurrences of Long have been largely replaced with the primitive long Finally an overload for signed integer values has been added for insertions
…oper#1095) * [core] Drop JDK 7 Support Sets compiler target to 1.8 Additionally removes openjdk7 from travis build-matrix Fixes brianfrankcooper#1033 Additionally includes lexicographic reordering of binding-dependency properties in root pom * [core] introduce baseclass for timeseries database bindings This base-class parses the queries generated by TimeSeriesWorkload into a format that's usable for simple consumption. The format has been defined through the work done by Andreas Bader (@baderas) in https://github.com/TSDBBench/YCSB-TS. * [core] parse debug and test properties for all Timeseries databases * [core] Add method to define possible tagKeys for Timeseries Workloads Some TimeSeries Databases require a "schema" up front. To generate said schema, the tag keys used by the workload are required. This method should be centrally available for maintainability * [core] Provide default implementation for operation TimeSeries databases only rarely support deleting data. As such we can generally assume the operation to always return a Status.NOT_IMPLEMENTED. Similarly to updating that trivial implementation does not need to be repeated in every TimeSeriesDatabase implementation * [core] Address review This adds a copyright header and significant amounts of javadoc for TimeseriesDB Furthermore occurrences of Long have been largely replaced with the primitive long Finally an overload for signed integer values has been added for insertions
This is the first pull-request in a series that intend to fold back the YCSB-TS fork into the mainline.
See #1068 for an "overarching" pull request, as well as TSDBBench/issues/8