Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[core] changes to enable folding YCSB-TS back into YCSB #1095

Merged
merged 6 commits into from
Mar 19, 2018

Conversation

Vogel612
Copy link
Contributor

This is the first pull-request in a series that intend to fold back the YCSB-TS fork into the mainline.

See #1068 for an "overarching" pull request, as well as TSDBBench/issues/8

Sets compiler target to 1.8
Additionally removes openjdk7 from travis build-matrix
Fixes brianfrankcooper#1033

Additionally includes lexicographic reordering of binding-dependency properties in root pom
This base-class parses the queries generated by TimeSeriesWorkload into a format
that's usable for simple consumption.
The format has been defined through the work done by Andreas Bader (@baderas) in
https://github.com/TSDBBench/YCSB-TS.
@Vogel612
Copy link
Contributor Author

Ping @manolama

@Vogel612
Copy link
Contributor Author

For now the follow-up pull-requests are suspended, until this one is merged in, because any subsequent work will require the changes from this PR. Given that merging apparently happens through squashes, I'd prefer to keep rebases and cherry-picks to a minimum. Accordingly I'll just continue working in the overarching PR, which should approach a diff of +/- 0 when follow-up PRs are merged

@Vogel612
Copy link
Contributor Author

Hey @manolama @busbey this has been sitting for a month now. Can I pretty please get a review? Thanks

Some TimeSeries Databases require a "schema" up front.
To generate said schema, the tag keys used by the workload are required.
This method should be centrally available for maintainability
TimeSeries databases only rarely support deleting data. As such we can generally assume
the operation to always return a Status.NOT_IMPLEMENTED. Similarly to updating that trivial
implementation does not need to be repeated in every TimeSeriesDatabase implementation
@manolama
Copy link
Collaborator

Ah got it, will complete the CR by Sunday, thanks!

@@ -0,0 +1,282 @@
package com.yahoo.ycsb;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copyright header please.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copyright header has been added

// seems like this should be a more elaborate query.
// for now we don't support querying ranges
// TODO: Support Timestamp range queries
return Status.NOT_IMPLEMENTED;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should have both start and stop timestamps for the read() method so I'd go ahead and pass it. There are definitely much more complex queries dealing with timestamps but we really just want to focus on common features in TSDs and querying over a time-range is the most common use case.

Ah never mind, you're right if we are doing a read, we're looking for a data point. In that case you may want to throw a DBException here saying that range queries aren't allowed for a read.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can not throw DBException here, because DB#read does not declare a throw for that. I'll be changing this to return Status.BAD_REQUEST though.

* @param tags actual tags that were want to receive (can be empty)
* @return Zero on success, a non-zero error code on error or "not found".
*/
protected abstract Status read(String metric, Long timestamp, Map<String, List<String>> tags);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still need to pass through table though a lot of TSDs will ignore it, and add the end timestamp and convert Long to long to use the primitive.

Copy link
Contributor Author

@Vogel612 Vogel612 Mar 19, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

metric is the equivalent to table. See also the callsite (return read(table, timestamp, tagQueries);)

As established above, range queries for reads make no sense. Not sure what you mean by "add the end timestamp"

public final Status scan(String table, String startkey, int recordcount, Set<String> fields,
Vector<HashMap<String, ByteIterator>> result) {
Map<String, List<String>> tagQueries = new HashMap<>();
Long start = null;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Primitives please :)

end = Long.valueOf(rangeParts[1]);
} else if (field.startsWith(groupByKey)) {
String groupBySpecifier = field.split(tagPairDelimiter)[1];
aggregationOperation = TimeseriesDB.AggregationOperation.valueOf(groupBySpecifier);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might be nice to wrap this up in a DBException or WorkloadException saying that the group by wasn't in the enum.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The same problem as mentioned above applies. There is no exception declared in the super method, as such it's not possible to throw an exception here.

* @param tags A HashMap of tag/tagvalue pairs to insert as tagsmv c
* @return A {@link Status} detailing the outcome of the insert
*/
protected abstract Status insert(String metric, Long timestamp, double value, Map<String, ByteIterator> tags);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Long -> long. And I know a lot of TSDs only support doubles (grrr) but we should have an override for signed ints too. Implementations can return a Status.NOT_IMPLEMENTED if they don't support storing ints.

/**
* NOTE: This operation is usually <b>not</b> supported for Time-Series databases.
* Deletion of data is often instead regulated through automatic cleanup and "retention policies" or similar.
*
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a note for devs who see this comment, it will be more important for TSDs to support arbitrary deletion in the future due to laws like GDPR.

return Status.NOT_IMPLEMENTED;
}

protected final String buildTagFilter(Map<String, List<String>> tags) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add some docs about what this helper is used for. It seems SQLish so may only apply to certain TSDs and may actually be better in a DB implementation class.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dropped the helper, since it's as of now only used in InfluxDB

* An enum containing the possible aggregation operations.
*/
public enum AggregationOperation {
NONE, SUM, AVERAGE, COUNT;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be good to have MAX and MIN as well, those are fairly common. And javadocs would be nice :)

@@ -1,6 +1,6 @@
<?xml version="1.0" encoding="UTF-8"?>
<!--
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The sorting is nice but we may want to break it out into another PR.

This adds a copyright header and significant amounts of javadoc for TimeseriesDB
Furthermore occurrences of Long have been largely replaced with the primitive long
Finally an overload for signed integer values has been added for insertions
@manolama manolama merged commit 2218649 into brianfrankcooper:master Mar 19, 2018
ChrisG55 pushed a commit to ChrisG55/YCSB that referenced this pull request May 4, 2018
…oper#1095)

* [core] Drop JDK 7 Support

Sets compiler target to 1.8
Additionally removes openjdk7 from travis build-matrix
Fixes brianfrankcooper#1033

Additionally includes lexicographic reordering of binding-dependency properties in root pom

* [core] introduce baseclass for timeseries database bindings

This base-class parses the queries generated by TimeSeriesWorkload into a format
that's usable for simple consumption.
The format has been defined through the work done by Andreas Bader (@baderas) in
https://github.com/TSDBBench/YCSB-TS.

* [core] parse debug and test properties for all Timeseries databases

* [core] Add method to define possible tagKeys for Timeseries Workloads

Some TimeSeries Databases require a "schema" up front.
To generate said schema, the tag keys used by the workload are required.
This method should be centrally available for maintainability

* [core] Provide default implementation for  operation

TimeSeries databases only rarely support deleting data. As such we can generally assume
the operation to always return a Status.NOT_IMPLEMENTED. Similarly to updating that trivial
implementation does not need to be repeated in every TimeSeriesDatabase implementation

* [core] Address review

This adds a copyright header and significant amounts of javadoc for TimeseriesDB
Furthermore occurrences of Long have been largely replaced with the primitive long
Finally an overload for signed integer values has been added for insertions
@busbey busbey mentioned this pull request May 19, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants