[core] changes to enable folding YCSB-TS back into YCSB #1095

Merged

manolama merged 6 commits into brianfrankcooper:master from TSDBBench:ycsb-ts-fold-core

Mar 19, 2018

Contributor

Vogel612 commented Feb 14, 2018

This is the first pull request in a series intended to fold the YCSB-TS fork back into the mainline.

See #1068 for an "overarching" pull request, as well as TSDBBench/issues/8

Vogel612 added 3 commits

February 14, 2018 17:00


          [core] Drop JDK 7 Support

e77ed8a

Sets compiler target to 1.8
Additionally removes openjdk7 from travis build-matrix
Fixes brianfrankcooper#1033

Additionally includes lexicographic reordering of binding-dependency properties in root pom


          [core] introduce baseclass for timeseries database bindings

dd32402

This base-class parses the queries generated by TimeSeriesWorkload into a format
that's usable for simple consumption.
The format has been defined through the work done by Andreas Bader (@baderas) in
https://github.com/TSDBBench/YCSB-TS.


          [core] parse debug and test properties for all Timeseries databases

f798f05

Contributor Author

Vogel612 commented Feb 14, 2018

For now, the follow-up pull requests are suspended until this one is merged, because any subsequent work requires the changes from this PR. Given that merging apparently happens through squashes, I'd prefer to keep rebases and cherry-picks to a minimum. Accordingly, I'll just continue working in the overarching PR, which should approach a diff of +/- 0 as the follow-up PRs are merged.

cgmcintyr mentioned this pull request

Fix missing comma in DATABASES dictionary TSDBBench/YCSB-TS#9

Merged

Contributor Author

Vogel612 commented Mar 15, 2018

Hey @manolama @busbey this has been sitting for a month now. Can I pretty please get a review? Thanks

Vogel612 added 2 commits

March 15, 2018 20:29


          [core] Add method to define possible tagKeys for Timeseries Workloads

144ab75

Some TimeSeries Databases require a "schema" up front.
To generate said schema, the tag keys used by the workload are required.
This method should be centrally available for maintainability


          [core] Provide default implementation for delete operation

ea20862

TimeSeries databases only rarely support deleting data. As such, we can generally assume
the operation to always return Status.NOT_IMPLEMENTED. As with updating, that trivial
implementation does not need to be repeated in every TimeSeriesDatabase implementation.

Collaborator

manolama commented Mar 17, 2018

Ah got it, will complete the CR by Sunday, thanks!

manolama requested changes

core/src/main/java/com/yahoo/ycsb/TimeseriesDB.java

		@@ -0,0 +1,282 @@
		package com.yahoo.ycsb;

Collaborator

manolama Mar 17, 2018

Copyright header please.

Contributor Author

Vogel612 Mar 19, 2018

Copyright header has been added

core/src/main/java/com/yahoo/ycsb/TimeseriesDB.java Outdated

+                        // seems like this should be a more elaborate query.
+                        // for now we don't support querying ranges
+                        // TODO: Support Timestamp range queries
+                        return Status.NOT_IMPLEMENTED;

Collaborator

manolama Mar 18, 2018

We should have both start and stop timestamps for the read() method so I'd go ahead and pass it. There are definitely much more complex queries dealing with timestamps but we really just want to focus on common features in TSDs and querying over a time-range is the most common use case.

Ah, never mind, you're right: if we are doing a read, we're looking for a data point. In that case you may want to throw a DBException here saying that range queries aren't allowed for a read.

Contributor Author

Vogel612 Mar 19, 2018

Cannot throw DBException here, because DB#read does not declare that exception. I'll change this to return Status.BAD_REQUEST instead.
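The constraint discussed here can be illustrated with a minimal sketch. It assumes simplified stand-ins (`SketchStatus`, `SketchDB`, `SketchTimeseriesDB`) rather than the real YCSB `Status` and `DB` classes, and the `range=` prefix is a hypothetical encoding chosen for illustration only. The point is that an overriding method cannot add a checked exception the parent does not declare, so a malformed request is reported through the return value.

```java
import java.util.Set;

// Hypothetical stand-in for YCSB's Status, reduced to what the sketch needs.
enum SketchStatus { OK, BAD_REQUEST }

// Hypothetical stand-in for YCSB's DB base class (simplified signature).
abstract class SketchDB {
  // No "throws" clause here, so overrides cannot throw a checked exception.
  abstract SketchStatus read(String table, String key, Set<String> fields);
}

class SketchTimeseriesDB extends SketchDB {
  // Assumed marker for a timestamp range; the real workload may encode this differently.
  static final String RANGE_PREFIX = "range=";

  @Override
  SketchStatus read(String table, String key, Set<String> fields) {
    for (String field : fields) {
      if (field.startsWith(RANGE_PREFIX)) {
        // A read targets one data point, so a range is reported as a bad request.
        return SketchStatus.BAD_REQUEST;
      }
    }
    return SketchStatus.OK;
  }
}
```

Declaring `throws DBException` on the override would be a compile error as long as the parent method declares no such exception, which is why a status code is the only in-band channel available.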

core/src/main/java/com/yahoo/ycsb/TimeseriesDB.java Outdated

+                 * @param tags      actual tags that were want to receive (can be empty)
+                 * @return Zero on success, a non-zero error code on error or "not found".
+                 */
+                protected abstract Status read(String metric, Long timestamp, Map<String, List<String>> tags);

Collaborator

manolama Mar 18, 2018

We still need to pass the table through, even though a lot of TSDs will ignore it. Also add the end timestamp, and convert Long to long to use the primitive.

Contributor Author

Vogel612 Mar 19, 2018 (edited)

metric is the equivalent of table. See also the callsite (return read(table, timestamp, tagQueries);)

As established above, range queries for reads make no sense. Not sure what you mean by "add the end timestamp"

core/src/main/java/com/yahoo/ycsb/TimeseriesDB.java Outdated

+                public final Status scan(String table, String startkey, int recordcount, Set<String> fields,
+                                         Vector<HashMap<String, ByteIterator>> result) {
+                  Map<String, List<String>> tagQueries = new HashMap<>();
+                  Long start = null;

Collaborator

manolama Mar 18, 2018

Primitives please :)

core/src/main/java/com/yahoo/ycsb/TimeseriesDB.java

+                      end = Long.valueOf(rangeParts[1]);
+                    } else if (field.startsWith(groupByKey)) {
+                      String groupBySpecifier = field.split(tagPairDelimiter)[1];
+                      aggregationOperation = TimeseriesDB.AggregationOperation.valueOf(groupBySpecifier);

Collaborator

manolama Mar 18, 2018

Might be nice to wrap this up in a DBException or WorkloadException saying that the group by wasn't in the enum.

Contributor Author

Vogel612 Mar 19, 2018

The same problem as mentioned above applies. There is no exception declared in the super method, as such it's not possible to throw an exception here.
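One way to handle an unknown group-by specifier without a checked exception is sketched below. This is not what the PR does; `AggSketch` and `GroupByParser` are hypothetical names. It relies only on the fact that `Enum.valueOf` throws the unchecked `IllegalArgumentException` for a name that is not an enum constant.

```java
// Hypothetical copy of the aggregation enum from the diff above.
enum AggSketch { NONE, SUM, AVERAGE, COUNT }

final class GroupByParser {
  private GroupByParser() { }

  // Returns the matching constant, or null when the specifier is not in the enum.
  static AggSketch parseOrNull(String specifier) {
    try {
      return AggSketch.valueOf(specifier);
    } catch (IllegalArgumentException e) {
      // Unknown name: let the caller decide how to report it.
      return null;
    }
  }
}
```

A caller such as `scan` could then check for `null` and return an error status, for example `Status.BAD_REQUEST`, instead of letting the runtime exception propagate.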

core/src/main/java/com/yahoo/ycsb/TimeseriesDB.java Outdated

+                 * @param tags      A HashMap of tag/tagvalue pairs to insert as tagsmv c
+                 * @return A {@link Status} detailing the outcome of the insert
+                 */
+                protected abstract Status insert(String metric, Long timestamp, double value, Map<String, ByteIterator> tags);

Collaborator

manolama Mar 18, 2018

Long -> long. And I know a lot of TSDs only support doubles (grrr) but we should have an override for signed ints too. Implementations can return a Status.NOT_IMPLEMENTED if they don't support storing ints.
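The suggestion above could look roughly like the following sketch. The types `InsertStatus`, `InsertSketchDB`, and `DoubleOnlyDB` are hypothetical stand-ins, the tags parameter is omitted for brevity, and giving the integer overload a default body is an assumption made here; the review only says that implementations may return `Status.NOT_IMPLEMENTED`.

```java
// Hypothetical stand-in for YCSB's Status.
enum InsertStatus { OK, NOT_IMPLEMENTED }

abstract class InsertSketchDB {
  // Floating-point values: every binding has to implement this.
  abstract InsertStatus insert(String metric, long timestamp, double value);

  // Signed integer values: assumed default for bindings that only store doubles.
  InsertStatus insert(String metric, long timestamp, long value) {
    return InsertStatus.NOT_IMPLEMENTED;
  }
}

// A binding that stores doubles only and inherits the integer default.
class DoubleOnlyDB extends InsertSketchDB {
  @Override
  InsertStatus insert(String metric, long timestamp, double value) {
    return InsertStatus.OK;
  }
}
```

With both overloads present, a call with a `long` or `int` value resolves to the integer overload, because Java picks the most specific applicable method, and a call with a `double` value resolves to the floating-point one.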

core/src/main/java/com/yahoo/ycsb/TimeseriesDB.java

+                /**
+                 * NOTE: This operation is usually <b>not</b> supported for Time-Series databases.
+                 * Deletion of data is often instead regulated through automatic cleanup and "retention policies" or similar.
+                 *

Collaborator

manolama Mar 18, 2018

Just a note for devs who see this comment, it will be more important for TSDs to support arbitrary deletion in the future due to laws like GDPR.

core/src/main/java/com/yahoo/ycsb/TimeseriesDB.java Outdated

+                  return Status.NOT_IMPLEMENTED;
+                }
+                protected final String buildTagFilter(Map<String, List<String>> tags) {

Collaborator

manolama Mar 18, 2018

Add some docs about what this helper is used for. It seems SQLish so may only apply to certain TSDs and may actually be better in a DB implementation class.

Contributor Author

Vogel612 Mar 19, 2018

Dropped the helper, since as of now it's only used in InfluxDB.

core/src/main/java/com/yahoo/ycsb/TimeseriesDB.java Outdated

+                 * An enum containing the possible aggregation operations.
+                 */
+                public enum AggregationOperation {
+                  NONE, SUM, AVERAGE, COUNT;

Collaborator

manolama Mar 18, 2018

Would be good to have MAX and MIN as well, those are fairly common. And javadocs would be nice :)

pom.xml

		@@ -1,6 +1,6 @@
		<?xml version="1.0" encoding="UTF-8"?>
		<!--

Collaborator

manolama Mar 18, 2018

The sorting is nice but we may want to break it out into another PR.


          [core] Address review

4cb8ec9

This adds a copyright header and significant amounts of javadoc for TimeseriesDB
Furthermore occurrences of Long have been largely replaced with the primitive long
Finally an overload for signed integer values has been added for insertions

manolama approved these changes

manolama merged commit 2218649 into brianfrankcooper:master

busbey mentioned this pull request

[WIP] Folding the YCSB-TS fork back into YCSB #1068

Open

ChrisG55 pushed a commit to ChrisG55/YCSB that referenced this pull request


          [core] changes to enable folding YCSB-TS back into YCSB (brianfrankco…

e80b48e

…oper#1095)

busbey mentioned this pull request

Release 0.14.0 #1117

Closed

smartygus mentioned this pull request

[WIP] TimeSeriesWorkload enhancements + New TS Workloads + Cassandra Adapater for TimeSeriesWorkload #1407

Draft

4 tasks
