- Format version incremented to 9. #2108
- Support building serialization (using capnproto) on Windows #2100
- Config option vfs.s3.sse for S3 server-side encryption support #2130
- Name attribute/dimension files by index. This is fragment-specific and updates the format version to version 9. #2107
- Smoke Test, remove nullable structs from global namespace. #2078
- Added config option vfs.gcs.request_timeout_ms #2148
- Improve fragment info loading by parallelizing fragment_size requests #2143
- Allow open array stats to be printed without read query #2131
- Cleanup the GHA CI scripts - put common code into external shell scripts. #2124
- Reduced memory consumption in the read path for multi-range reads. #2118
- The latest version of was leaving behind a . This ensures that the directory is removed when is run. #2113
- Migrating AZP CI to GA #2111
- Cache non_empty_domain for REST arrays like all other arrays #2105
- Add additional stats printing to breakdown read state initialization timings #2095
- Places the in-memory filesystem under unit test #1961
- Adds a Github Action to automate the HISTORY.md #2075
- Improve GCS multipart locking #2087
- Consolidation functions now use the ctx's config if no config is passed #2126
- Fixes a potential crash when retrying incomplete reads #2137
- Fixes a potential crash when opening an array with consolidated fragment metadata #2135
- Corrected a bug where sparse cells may be incorrectly returned using string dimensions. #2125
- Fix segfault in serialized queries when partition is unsplittable #2120
- Always use original buffer size in serialized read queries serverside. #2115
- Fix an edge-case where a read query may hang on array with string dimensions #2089
- Fix mutex locking bugs on Windows due to unlocking on different thread and missing task join #2077
- Apply 'var_offsets.extra_element' mode to string dimension offsets too #2145
- Removes non-default parameter in tiledb_config_unset. #2099
- Removes non-default parameter in Config::unset. #2099
- Add support for a string-typed, variable-sized, nullable attribute in the C++ API. #2090
- Add new Config constructors for converting from STL map types #2081
- Add support for retrying REST requests that fail with certain http status code such as 503 #2060
- Parallelize across attributes when closing a write #2048
- Support for dimension/attribute names that contain commonly reserved filesystem characters #2047
- Remove unnecessary `is_dir` in `FragmentMetadata::store`; this can increase performance for S3 writes #2050
- Improve S3 multipart locking #2055
- Parallelize loading fragments and array schema #2061
- REST client support for caching redirects #1919
- Add additional timer statistics for opening an array for reads #2027
- Add `rest.creation_access_credentials_name` configuration parameter #2025
- Fixed ArrowAdapter export of string arrays with 64-bit offsets #2037
- Fixed ArrowAdapter export of TILEDB_CHAR arrays with 64-bit offsets #2039
- Add `tiledb_query_set_config` to apply a `tiledb_config_t` to query-level parameters #2030
- Add `tiledb_heap_profiler_enable` to enable heap memory profiling #2035
- Added `Query::set_config` to apply a `tiledb::Config` to query-level parameters #2030
- The tile extent can now be set to null, in which case internally TileDB sets the extent to the dimension domain range. #1880
- The C++ API `std::pair<uint64_t, uint64_t> Query::est_result_size_var` has been changed to 1) return `std::array<uint64_t, 2>` and 2) report the offsets as a size in bytes rather than elements. #1946
- Support for nullable attributes. #1895 #1938 #1948 #1945
- Support for Hilbert order sorting for sparse arrays. #1880
- Support for AWS S3 "AssumeRole" temporary credentials #1882
- Support for zero-copy import/export with the Apache Arrow adapter #2001
- Experimental support for an in-memory backend used with bootstrap option "--enable-memfs" #1873
- Support for element offsets when reading var-sized attributes. #1897
- Support for an extra offset indicating the size of the returned data when reading var-sized attributes. #1932
- Support for 32-bit offsets when reading var-sized attributes. #1950
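
The three offset-related items above can be illustrated with a small, library-independent sketch. It does not call TileDB. The helper name `convert_offsets` and its parameters are hypothetical, and it only shows how byte offsets, element offsets, an extra trailing offset, and a 32-bit encoding relate to each other for one var-sized attribute buffer.

```python
import struct


def convert_offsets(byte_offsets, total_bytes, elem_size,
                    as_elements=False, extra_element=False, bits=64):
    """Hypothetical helper (not a TileDB API): reshape var-sized offsets.

    byte_offsets  -- starting byte position of each cell in the data buffer
    total_bytes   -- total size of the data buffer in bytes
    elem_size     -- size of one element in bytes (e.g. 4 for int32)
    as_elements   -- express offsets in elements instead of bytes
    extra_element -- append one trailing offset equal to the data size
    bits          -- width of each stored offset, 32 or 64
    """
    offsets = list(byte_offsets)
    if extra_element:
        # The extra offset marks the end of the last cell.
        offsets.append(total_bytes)
    if as_elements:
        offsets = [o // elem_size for o in offsets]
    fmt = {32: "I", 64: "Q"}[bits]
    packed = struct.pack(f"<{len(offsets)}{fmt}", *offsets)
    return offsets, packed


# Three int32 cells holding 2, 1 and 3 values: 6 values, 24 bytes in total.
offsets, packed = convert_offsets([0, 8, 12], total_bytes=24, elem_size=4,
                                  as_elements=True, extra_element=True, bits=32)
print(offsets, len(packed))
```

With the extra element present, the length of cell `i` is simply `offsets[i + 1] - offsets[i]`, which is the layout Apache Arrow expects for variable-length arrays.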
- Optimized string dimension performance.
- Added functionality to get fragment information from an array. #1900
- Prevented unnecessary sorting when (1) there is a single fragment and (i) either the query layout is global order, or (ii) the number of dimensions is 1, and (2) when there is a single range for which the result coordinates have already been sorted. #1880
- Added extra stats for consolidation. #1880
- Disabled checking if cells are written in global order when consolidating, as it was redundant (the cells are already being read in global order during consolidation). #1880
- Optimize consolidated fragment metadata loading #1975
- Fix tiledb_dimension_alloc returning a non-null pointer after error #1959
- Fixed issue with string dimensions and non-set subarray (which implies spanning the whole domain). There was an assertion being triggered. Now it works properly.
- Fixed bug when checking the dimension domain for infinity or NaN values. #1880
- Fixed bug with string dimension partitioning. #1880
- Added functions for getting fragment information. #1900
- Added APIs for getting and setting ranges of queries using a dimension name. #1920
- Added class `FragmentInfo` and functions for getting fragment information. #1900
- Added function `Dimension::create` that allows not setting a space tile extent. #1880
- Added APIs for getting and setting ranges of queries using a dimension name. #1920
- Changed `std::pair<uint64_t, uint64_t> Query::est_result_size_var` to `std::array<uint64_t, 2> Query::est_result_size_var`. Additionally, the size estimate for the offsets has been changed from elements to bytes. #1946
- This release was skipped due to an erroneous `2.2.0` git tag.
- Fix deadlock in `ThreadPool::wait_or_work` #1994
- Fix "[TileDB::ChunkedBuffer] Error: Cannot init chunk buffers; Total size must be non-zero." in read path #1992
- Optimize consolidated fragment metadata loading #1975
- Optimize `Reader::load_tile_offsets` for loading only relevant fragments #1976 #1983
- Optimize locking in `FragmentMetadata::load_tile_offsets` and `FragmentMetadata::load_tile_var_offsets` #1979
- Exit early in `Reader::copy_coordinates`/`Reader::copy_attribute_values` when no results #1984
- Fix segfault in optimized `compute_results_sparse<char>` #1969
- Fix GCS "Error:: Read object failed" #1966
- Fix segfault in `ResultTile::str_coords_intersects` #1981
- Optimize `ResultTile::compute_results_sparse<char>` resulting in significant performance increases in certain cases with string dimensions #1963
- Optimized string dimension performance. #1922
- Updated the AWS SDK to v1.8.84 to fix an uncaught exception when using S3 #1899 TileDB-Py #409
- Fixed bug where a read on a sparse array may return duplicate values. #1905
- Fixed bug where an array could not be opened if created with an array schema from an older version #1889
- Fix compilation of TileDB Tools #1926
- Fix ArraySchema not write protecting fill values for only schema version 6 or newer #1868
- Fix segfault that may occur in the VFS read-ahead cache #1871
- The result size estimation routines will no longer return non-zero sizes that cannot contain a single value. #1849
- Fix serialization of dense writes that use ranges #1860
- Fixed a crash from an unhandled SIGPIPE signal that may raise when using S3 #1856
- Empty dense arrays now return cells with fill values. Also the result estimator is adjusted to work properly with this new behavior.
- Added configuration option "sm.compute_concurrency_level" #1766
- Added configuration option "sm.io_concurrency_level" #1766
- Added configuration option "sm.sub_partitioner_memory_budget" #1729
- Added configuration option "vfs.read_ahead_size" #1785
- Added configuration option "vfs.read_ahead_cache_size" #1785
- Added support for getting/setting Apache Arrow buffers from a Query #1816
- Source-built curl only needs HTTP support #1712
- AWS SDK version bumped to 1.8.6 #1718
- Split posix permissions into file and folder permissions #1719
- Support seeking for CURL to allow redirects for posting to REST #1728
- Changed default setting for `vfs.s3.proxy_scheme` from `https` to `http` to match common usage needs #1759
- Enabled parallelization with native system threads when TBB is disabled #1760
- Subarray ranges will be automatically coalesced as they are added #1755
- Update GCS SDK to v1.16.0 to fix multiple reported bugs #1768
- Read-ahead cache for cloud-storage backends #1785
- Allow multiple empty values at the end of a variable-length write #1805
- Build system will raise overridable error if important paths contain regex character #1808
- Lazily create AWS ClientConfiguration to avoid slow context creations for non S3 usage after the AWS SDK version bump #1821
- Moved `Status`, `ThreadPool`, and `Logger` classes from folder `tiledb/sm` to `tiledb/common` #1843
- Deprecated config option "sm.num_async_threads" #1766
- Deprecated config option "sm.num_reader_threads" #1766
- Deprecated config option "sm.num_writer_threads" #1766
- Deprecated config option "vfs.num_threads" #1766
- Support for MacOS older than 10.13 is being dropped when using the AWS SDK. Prebuilt Binaries now target 10.13 #1753
- Use of Intel's Thread Building Blocks (TBB) will be discontinued in the future. It is now disabled by default #1762
- No longer building release artifacts with Intel's Thread Building Blocks (TBB) enabled #1825
- Fixed bug in setting a fill value for var-sized attributes.
- Fixed a bug where the cpp headers would always produce compile-time warnings about using the deprecated c-api "tiledb_coords()" #1765
- Only serialize the Array URI in the array schema client side. #1806
- Fix C++ API `consolidate_metadata` function using incorrect config #1841 #1844
- Added functions `tiledb_attribute_{set,get}_fill_value` to get/set default fill values
- Added functions `Attribute::{set,get}_fill_value` to get/set default fill values
- Lazy initialization for GCS backend #1752
- Add additional release artifacts which include disabling TBB #1753
- Fix crash during GCS backend initialization due to upstream bug. #1752
- Various performance optimizations in the read path. #1689 #1692 #1693 #1694 #1695
- Google Cloud SDK bumped to 1.14. #1687, #1742
- Fixed error "Error: Out of bounds read to internal chunk buffer of size 65536" that may occur when writing var-sized attributes. #1732
- Fixed error "Error: Chunk read error; chunk unallocated error" that may occur when reading arrays with more than one dimension. #1736
- Fix Catch2 detection of system install #1733
- Use libtiledb-detected certificate path for Google Cloud Storage, for better linux binary/wheel portability. #1741
- Fixed a small memory leak when opening arrays. #1690
- Fixed an overflow in the partitioning path that may result in a hang or poor read performance. #1725 #1707
- Fix compilation on gcc 10.1 for blosc #1740
- Fixed a rare hang in the usage of `load_tile_var_sizes`. #1748
- Add new config option `vfs.file.posix_permissions`. #1710
- Return possible env config variables in config iter #1714
- Don't include curl's linking to libz, avoids build issue with double libz linkage #1682
- Fix typo in GCS cmake file for superbuild #1665
- Don't error on GCS client init failure #1667
- Don't include curl's linking to ssl, avoids build issue on fresh macos 10.14/10.15 installs #1671
- Handle ubuntu's cap'n proto package not providing cmake targets #1659
- The C++ Attribute::create API now correctly builds from an STL array #1670
- Allow opening arrays with read-only permission on posix filesystems #1676
- Fixed build issue caused by passing std::string to an Aws method #1678
- Add robust retries for S3 SLOW_DOWN errors #1651
- Improve GCS build process #1655
- Add generation of pkg-config file #1656
- S3 should use HEADObject for file size #1657
- Improvements to stats #1652
- Add artifacts to releases from CI #1663
- Remove unneeded semicolons noticed by the -pedantic flag #1653
- Fix cases where TILEDB_FORCE_ALL_DEPS picked up system builds #1654
- Allow errors to be shown in cmake super build #1658
- Properly check vacuum files and limit fragment loading #1661
- Fix edge case where a consolidated but unvacuumed array can have coordinates reported twice #1662
- Add c-api tiledb_stats_raw_dump[_str] function for raw stats dump #1660
- Add c++-api Stats::raw_dump function for raw stats dump #1660
- Fix hang on open array v1.6 #1645
- Allow empty values for variable length attributes #1646
- Remove deprecated max buffer size APIs from unit tests #1625
- Remove deprecated max buffer API from examples #1626
- Remove zipped coords from examples #1632
- Allow AWSSDK_ROOT_DIR override #1637
- Allow setting zipped coords multiple times #1627
- Fix overflow in check_tile_extent #1635
- Fix C++ Dimension API `{tile_extent,domain}_to_str`. #1638
- Remove xlock in FragmentMetadata::store #1639
- Removed file `__coords.tdb` that stored the zipped coordinates in sparse fragments
- Now storing the coordinate tiles on each dimension in separate files
- Changed fragment name format from `__t1_t2_uuid` to `__t1_t2_uuid_<format_version>`. That was necessary for backwards compatibility
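
As a rough illustration of the fragment name change, the sketch below parses both the old `__t1_t2_uuid` form and the new `__t1_t2_uuid_<format_version>` form. The function and type names are hypothetical and not part of TileDB. It assumes that `t1` and `t2` are integer timestamps and that the UUID part contains no underscores.

```python
from typing import NamedTuple, Optional


class FragmentName(NamedTuple):
    t1: int
    t2: int
    uuid: str
    format_version: Optional[int]  # None for the old naming scheme


def parse_fragment_name(name: str) -> FragmentName:
    """Hypothetical parser for `__t1_t2_uuid[_<format_version>]`."""
    if not name.startswith("__"):
        raise ValueError(f"not a fragment name: {name!r}")
    parts = name[2:].split("_")
    if len(parts) == 3:
        # Old format: no version suffix.
        t1, t2, uuid = parts
        version = None
    elif len(parts) == 4:
        # New format: trailing format version.
        t1, t2, uuid, raw_version = parts
        version = int(raw_version)
    else:
        raise ValueError(f"unexpected fragment name layout: {name!r}")
    return FragmentName(int(t1), int(t2), uuid, version)


old = parse_fragment_name("__100_200_abcdef0123456789")
new = parse_fragment_name("__100_200_abcdef0123456789_5")
print(old, new)
```

Carrying the version in the name lets a reader tell which format a fragment uses before opening any file inside it, which is one plausible reading of the backwards-compatibility remark above.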
- Changed `domain` input of `tiledb_dimension_get_domain` to `const void**` (from `void**`).
- Changed `tile_extent` input of `tiledb_dimension_get_tile_extent` to `const void**` (from `void**`).
- Anonymous attributes and dimensions (i.e., empty strings for attribute/dimension names) are no longer supported. This is because now the user can set separate dimension buffers to the query and, therefore, supporting anonymous attributes and dimensions creates ambiguity in the current API.
- Now the TileDB consolidation process does not clean up the fragments or array metadata it consolidates. This is (i) to avoid exclusively locking at any point during consolidation, and (ii) to enable fine-grained time traveling even in the presence of consolidated fragments or array metadata. Instead, we added a special vacuuming API which explicitly cleans up consolidated fragments or array metadata (with appropriate configuration parameters). The vacuuming functions briefly exclusively lock the array.
- Added string dimension support (currently only `TILEDB_STRING_ASCII`).
- The user can now set separate coordinate buffers to the query. Also any subset of the dimensions is supported.
- The user can set separate filter lists per dimension, as well as the number of values per coordinate.
- Added support for AWS Security Token Service session tokens via configuration option `vfs.s3.session_token`. #1472
- Added support for indicating zero-value metadata by returning `value_num` == 1 from the `_get_metadata` and `Array::get_metadata` APIs #1438 (this is a non-breaking change, as the documented return of `value == nullptr` to indicate missing keys does not change)
- User can set coordinate buffers separately for write queries.
- Added option to enable duplicate coordinates for sparse arrays #1504
- Added support for writing at a timestamp by allowing opening an array at a timestamp (previously disabled).
- Added special files with the same name as a fragment directory and an added suffix ".ok", to indicate a committed fragment. This improved the performance of opening an array on object stores significantly, as it avoids an extra REST request per fragment.
- Added functionality to consolidation, which allows consolidating the fragment metadata footers in a single file by toggling a new config parameter. This leads to a huge performance boost when opening an array, as it avoids fetching a separate footer per fragment from storage.
- Various reader parallelizations that boosted read performance significantly.
- Configuration parameters can now be read from environment variables. `vfs.s3.session_token` -> `TILEDB_VFS_S3_SESSION_TOKEN`. The prefix of `TILEDB_` is configurable via `config.env_var_prefix`. #1600
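
The environment-variable mapping in the last item can be sketched as follows. This is only an inference from the single example given (`vfs.s3.session_token` -> `TILEDB_VFS_S3_SESSION_TOKEN`): the rule "replace dots with underscores, uppercase, prepend the prefix" is an assumption, and both helper names are hypothetical rather than TileDB functions.

```python
import os


def env_var_name(param: str, prefix: str = "TILEDB_") -> str:
    """Assumed mapping: dots become underscores, then uppercase, then prefix."""
    return prefix + param.replace(".", "_").upper()


def lookup_param(param, environ=None, prefix="TILEDB_", default=None):
    """Hypothetical lookup of a config parameter in an environment mapping."""
    environ = os.environ if environ is None else environ
    return environ.get(env_var_name(param, prefix), default)


fake_env = {"TILEDB_VFS_S3_SESSION_TOKEN": "token-123"}
print(env_var_name("vfs.s3.session_token"))
print(lookup_param("vfs.s3.session_token", environ=fake_env))
```

Passing a different `prefix` models the effect of changing `config.env_var_prefix`.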
- The TileDB tiledb_array_consolidate_metadata and tiledb_array_consolidate_metadata_with_key C-API routines have been deprecated and will be removed entirely in a future release. The tiledb_array_consolidate and tiledb_array_consolidate_with_key routines achieve the same behavior when the "sm.consolidation.mode" parameter of the configuration argument is equivalent to "array_meta".
- The TileDB Array::consolidate_metadata CPP-API routine has been deprecated and will be removed entirely in a future release. The Array::consolidate routine achieves the same behavior when the "sm.consolidation.mode" parameter of the configuration argument is equivalent to "array_meta".
- Fixed bug in dense consolidation when the array domain is not divisible by the tile extents.
- Added C API function `tiledb_array_has_metadata_key` and C++ API function `Array::has_metadata_key` #1439
- Added C API functions `tiledb_array_schema_{set,get}_allows_dups` and C++ API functions `Array::set_allows_dups` and `Array::allows_dups`
- Added C API functions `tiledb_dimension_{set,get}_filter_list` and `tiledb_dimension_{set,get}_cell_val_num`
- Added C API functions `tiledb_array_get_non_empty_domain_from_{index,name}`
- Added C API function `tiledb_array_vacuum`
- Added C API functions `tiledb_array_get_non_empty_domain_var_size_from_{index,name}`
- Added C API functions `tiledb_array_get_non_empty_domain_var_from_{index,name}`
- Added C API function `tiledb_array_add_range_var`
- Added C API function `tiledb_array_get_range_var_size`
- Added C API function `tiledb_array_get_range_var`
- Added C++ API functions `Dimension::set_cell_val_num` and `Dimension::cell_val_num`.
- Added C++ API functions `Dimension::set_filter_list` and `Dimension::filter_list`.
- Added C++ API functions `Array::non_empty_domain(unsigned idx)` and `Array::non_empty_domain(const std::string& name)`.
- Added C++ API functions `Domain::dimension(unsigned idx)` and `Domain::dimension(const std::string& name)`.
- Added C++ API function `Array::load_schema(ctx, uri)` and `Array::load_schema(ctx, uri, key_type, key, key_len)`.
- Added C++ API function `Array::vacuum`.
- Added C++ API functions `Array::non_empty_domain_var` (from index and name).
- Added C++ API function `add_range` with string inputs.
- Added C++ API function `range` with string outputs.
- Added C++ API functions `Array` and `Context` constructors which take a c_api object to wrap. #1623
- Fix expanded domain consolidation #1572
- Add MD5 and SHA256 checksum filters #1515
- Added support for AWS Security Token Service session tokens via configuration option `vfs.s3.session_token`. #1472
- Fix new SHA1 for intel TBB in superbuild due to change in repository name #1551
- Avoid useless serialization of Array Metadata on close #1485
- Update CONTRIBUTING and Code of Conduct #1487
- Fix deadlock in writes of TileDB Cloud Arrays #1486
- REST requests now will use http compression if available #1479
- Array metadata fetching is now lazy (fetch on use) to improve array open performance #1466
- libtiledb on Linux will no longer re-export symbols from statically linked dependencies #1461
TileDB 1.7.2 contains bug fixes and several internal optimizations.
- Added support for getting/setting array metadata via REST. #1449
- Fixed several REST query and deserialization bugs. #1433, #1437, #1440, #1444
- Fixed bug in setting certificate path on Linux for the REST client. #1452
TileDB 1.7.1 contains build system and bug fixes, and one non-breaking API update.
- Fixed bug in dense consolidation when the array domain is not divisible by the tile extents. #1442
- Added C API function `tiledb_array_has_metadata_key` and C++ API function `Array::has_metadata_key` #1439
- Added support for indicating zero-value metadata by returning `value_num` == 1 from the `_get_metadata` and `Array::get_metadata` APIs #1438 (this is a non-breaking change, as the documented return of `value == nullptr` to indicate missing keys does not change)
TileDB 1.7.0 contains the new feature of array metadata, and numerous bugfixes.
- Added array metadata. #1377
- Allow writes to older-versioned arrays. #1417
- Added overseen optimization to check the fragment non-empty domain before loading the fragment R-Tree. #1395
- Use `major.minor` for SOVERSION instead of full `major.minor.rev`. #1398
- Numerous query serialization bugfixes and performance improvements.
- Numerous tweaks to build strategy for libcurl dependency.
- Fix crash in StorageManager destructor when GlobalState init fails. #1393
- Fix Windows destructor crash due to missing unlock (mutex/refcount). #1400
- Normalize attribute names in multi-range size estimation C API. #1408
- Added C API functions `tiledb_query_get_{fragment_num,fragment_uri,fragment_timestamp_range}`. #1396
- Added C++ API functions `Query::{fragment_num,fragment_uri,fragment_timestamp_range}`. #1396
- Added C API function `tiledb_ctx_set_tag` and C++ API `Context::set_tag()`. #1406
- Add config support for S3 ca_path, ca_file, and verify_ssl options. #1376
- Removed key-value functionality, `tiledb_kv_*` functions from the C API and Map and MapSchema from the C++ API. #1415
- Added config param `vfs.s3.logging_level`. #1236
- Fixed FP slice point-query with small (eps) gap coordinates. #1384
- Fixed several unused variable warnings in unit tests. #1385
- Fixed missing include in `subarray.h`. #1374
- Fixed missing virtual destructor in C++ API `schema.h`. #1391
- Fixed C++ API build error with clang regarding deleted default constructors. #1394
- Fix incorrect version number listed in `tiledb_version.h` header file and doc page.
- Fix issue with release notes from 1.6.0 release. #1359
- Bug fix in incomplete query behavior. #1358
The 1.6.0 release adds the major new feature of non-continuous range slicing, as well as a number of stability and usability improvements. Support is also introduced for datetime dimensions and attributes.
- Added support for multi-range reads (non-continuous range slicing) for dense and sparse arrays.
- Added support for datetime domains and attributes.
- Removed fragment metadata caching. #1197
- Removed array schema caching. #1197
- The tile MBR in the in-memory fragment metadata are organized into an R-Tree, speeding up tile overlap operations during subarray reads. #1197
- Improved encryption key validation process when opening already open arrays. Fixes issue with indefinite growing of the URI to encryption key mapping in `StorageManager` (the mapping is no longer needed). #1197
- Improved dense write performance in some benchmarks. #1229
- Support for direct writes without using the S3 multi-part API. Allows writing to Google Cloud Storage S3 compatibility mode. #1219
- Removed 256-character length limit from URIs. #1288
- Dense reads and writes now always require a subarray to be set, to avoid confusion. #1320
- Added query and array schema serialization API. #1262
- The TileDB KV API has been deprecated and will be removed entirely in a future release. The KV mechanism will be removed when full support for string-valued dimensions has been added.
- Bug fix with amplification factor in consolidation. #1275
- Fixed thread safety issue in opening arrays. #1252
- Fixed floating point exception when writing fixed-length attributes with a large cell value number. #1289
- Fixed off-by-one limitation with floating point dimension tile extents. #1314
- Added functions `tiledb_query_{get_est_result_size, get_est_result_size_var, add_range, get_range_num, get_range}`.
- Added function `tiledb_query_get_layout`
- Added datatype `tiledb_buffer_t` and functions `tiledb_buffer_{alloc,free,get_type,set_type,get_data,set_data}`.
- Added datatype `tiledb_buffer_list_t` and functions `tiledb_buffer_list_{alloc,free,get_num_buffers,get_total_size,get_buffer,flatten}`.
- Added string conversion functions `tiledb_*_to_str()` and `tiledb_*_from_str()` for all public enum types.
- Added config param `vfs.file.enable_filelocks`
- Added datatypes `TILEDB_DATETIME_*`
- Added function `tiledb_query_get_array`
- Added functions `Query::{query_layout, add_range, range, range_num, array}`.
- Removed ability to set `null` tile extents on dimensions. All dimensions must now have an explicit tile extent.
- Removed cast operators of C++ API objects to their underlying C API objects. This helps prevent inadvertent memory issues such as double-frees.
- Removed ability to set `null` tile extents on dimensions. All dimensions must now have an explicit tile extent.
- Changed argument `config` in `Array::consolidate()` from a const-ref to a pointer.
- Removed default includes of `Map` and `MapSchema`. To use the deprecated KV API temporarily, include `<tiledb/map.h>` explicitly.
- Better handling of `{C,CXX}FLAGS` during the build. #1209
- Update libcurl dependency to v7.64.1 for S3 builds. #1240
- S3 SDK build error fix. #1201
- Fixed thread safety issue with ZStd compressor. #1208
- Fixed crash in consolidation due to accessing invalid entry #1213
- Fixed memory leak in C++ KV API. #1247
- Fixed minor bug when writing in global order with empty buffers. #1248
The 1.5.0 release focuses on stability, performance, and usability improvements, as well as a new consolidation algorithm.
- Added an advanced, tunable consolidation algorithm. #1101
- Small tiles are now batched for larger VFS read operations, improving read performance in some cases. #1093
- POSIX error messages are included in log messages. #1076
- Added `tiledb` command-line tool with several supported actions. #1081
- Added build flag to disable internal statistics. #1111
- Improved memory overhead slightly by lazily allocating memory before checking the tile cache. #1141
- Improved tile cache utilization by removing erroneous use of cache for metadata. #1151
- S3 multi-part uploads are aborted on error. #1166
- Bug fix when reading from a sparse array with real domain. Also added some checks on NaN and INF. #1100
- Fixed C++ API functions `group_by_cell` and `ungroup_var_buffer` to treat offsets in units of bytes. #1047
- Several HDFS test build errors fixed. #1092
- Fixed incorrect indexing in `parallel_for`. #1105
- Fixed incorrect filter statistics counters. #1112
- Preserve anonymous attributes in `ArraySchema` copy constructor. #1144
- Fix non-virtual destructors in C++ API. #1153
- Added zlib dependency to AWS SDK EP. #1165
- Fixed a hang in the 'S3::ls()'. #1183
- Many other small and miscellaneous bug fixes.
- Added function `tiledb_vfs_dir_size`.
- Added function `tiledb_vfs_ls`.
- Added config params `vfs.max_batch_read_size` and `vfs.max_batch_read_amplification`.
- Added functions `tiledb_{array,kv}_encryption_type`.
- Added functions `tiledb_stats_{dump,free}_str`.
- Added function `tiledb_{array,kv}_schema_has_attribute`.
- Added function `tiledb_domain_has_dimension`.
- `{Array,Map}::consolidate{_with_key}` now takes a `Config` as an optional argument.
- Added function `VFS::dir_size`.
- Added function `VFS::ls`.
- Added `{Array,Map}::encryption_type()`.
- Added `{ArraySchema,MapSchema}::has_attribute()`
- Added `Domain::has_dimension()`
- Added constructor overloads for `Array` and `Map` to take a `std::string` encryption key.
- Added overloads for `{Array,Map}::{open,create,consolidate}` to take a `std::string` encryption key.
- Added untyped overloads for `Query::set_buffer()`.
- Deprecated `tiledb_compressor_t` APIs from v1.3.x have been removed, replaced by the `tiledb_filter_list` API. #1128
- `tiledb_{array,kv}_consolidate{_with_key}` now takes a `tiledb_config_t*` as argument.
- Deprecated `tiledb::Compressor` APIs from v1.3.x have been removed, replaced by the `FilterList` API. #1128
- Fixed support for config parameter values `sm.num_reader_threads` and `sm.num_writer_threads`. User-specified values had been ignored for these parameters. #1096
- Fixed GCC 7 linker errors. #1091
- Bug fix in the case of dense reads in the presence of both dense and sparse fragments. #1079
- Fixed double-delta decompression bug on reads for uncompressible chunks. #1074
- Fixed unnecessary linking of shared zlib when using TileDB superbuild. #1125
- Added lazy creation of S3 client instance on first request. #1084
- Added config params `vfs.s3.aws_access_key_id` and `vfs.s3.aws_secret_access_key` for configuring S3 access at runtime. #1036
- Added missing check if coordinates obey the global order in global order sparse writes. #1039
- Fixed bug in incomplete queries, which should always return partial results. An incomplete status with 0 returned results must always mean that the buffers cannot even fit a single cell value. #1056
- Fixed performance bug during global order write finalization. #1065
- Fixed error in linking against static TileDB on Windows. #1058
- Fixed build error when building without TBB. #1051
- Set LZ4, Zlib and Zstd compressors to build in release mode. #1034
- Changed coordinates to always be split before filtering. #1054
- Added type-safe filter option methods to C++ API. #1062
The 1.4.0 release brings two new major features, attribute filter lists and at-rest array encryption, along with bugfixes and performance improvements.
Note: TileDB v1.4.0 changes the on-disk array format. Existing arrays should be re-written using TileDB v1.4.0 before use. Starting from v1.4.0 and on, backwards compatibility for reading old-versioned arrays will be fully supported.
- All array data can now be encrypted at rest using AES-256-GCM symmetric encryption. #968
- Negative and real-valued domain types are now fully supported. #885
- New filter API for transforming attribute data with an ordered list of filters. #912
- Current filters include: previous compressors, bit width reduction, bitshuffle, byteshuffle, and positive-delta encoding.
- The bitshuffle filter uses an implementation by Kiyoshi Masui.
- The byteshuffle filter uses an implementation by Francesc Alted (from the Blosc project).
- Arrays can now be opened at specific timestamps. #984
- The C and C++ APIs for compression have been deprecated. The corresponding filter API should be used instead. The compression API will be removed in a future TileDB version. #1008
- Removed Blosc compressors (obviated by byteshuffle -> compressor filter list).
- Fix issue where performing a read query with empty result could cause future reads to return empty #882
- Fix TBB initialization bug with multiple contexts #898
- Fix bug in max buffer sizes estimation #903
- Fix Buffer allocation size being incorrectly set on realloc #911
- Added check if the coordinates fall out-of-bounds (i.e., outside the array domain) during sparse writes, and added config param `sm.check_coord_oob` to enable/disable the check (enabled by default). #996
- Add config params `sm.num_reader_threads` and `sm.num_writer_threads` for separately controlling I/O parallelism from compression parallelism.
- Added contribution guidelines #899
- Enable building TileDB in Cygwin environment on Windows #890
- Added a simple benchmarking script and several benchmark programs #889
- Changed C API and disk format integer types to have explicit bit widths. #981
- Added `tiledb_{array,kv}_open_at`, `tiledb_{array,kv}_open_at_with_key` and `tiledb_{array,kv}_reopen_at`.
- Added `tiledb_{array,kv}_get_timestamp()`.
- Added `tiledb_kv_is_open`
- Added `tiledb_filter_t`, `tiledb_filter_type_t`, `tiledb_filter_option_t`, and `tiledb_filter_list_t` types
- Added `tiledb_filter_*` and `tiledb_filter_list_*` functions.
- Added `tiledb_attribute_{set,get}_filter_list`, `tiledb_array_schema_{set,get}_coords_filter_list`, `tiledb_array_schema_{set,get}_offsets_filter_list` functions.
- Added `tiledb_query_get_buffer` and `tiledb_query_get_buffer_var`.
- Added `tiledb_array_get_uri`
- Added `tiledb_encryption_type_t`
- Added `tiledb_array_create_with_key`, `tiledb_array_open_with_key`, `tiledb_array_schema_load_with_key`, `tiledb_array_consolidate_with_key`
- Added `tiledb_kv_create_with_key`, `tiledb_kv_open_with_key`, `tiledb_kv_schema_load_with_key`, `tiledb_kv_consolidate_with_key`
- Added encryption overloads for `Array()`, `Array::open()`, `Array::create()`, `ArraySchema()`, `Map()`, `Map::open()`, `Map::create()` and `MapSchema()`.
- Added `Array::timestamp()` and `Array::reopen_at()` methods.
- Added `Filter` and `FilterList` classes
- Added `Attribute::filter_list()`, `Attribute::set_filter_list()`, `ArraySchema::coords_filter_list()`, `ArraySchema::set_coords_filter_list()`, `ArraySchema::offsets_filter_list()`, `ArraySchema::set_offsets_filter_list()` functions.
- Added overloads for `Array()`, `Array::open()`, `Map()`, `Map::open()` for handling timestamps.
- Removed Blosc compressors.
- Removed `tiledb_kv_set_max_buffered_items`.
- Modified `tiledb_kv_open` to not take an attribute subselection, but instead take as input the query type (similar to arrays). This makes the key-value store behave similarly to arrays, which means that the key-value store does not support interleaved reads/writes any more (an opened key-value store is used either for reads or writes, but not both).
- `tiledb_kv_close` does not flush the written items. Instead, `tiledb_kv_flush` must be invoked explicitly.
- Removed Blosc compressors.
- Removed `Map::set_max_buffered_items`.
- Modified `Map::Map` and `Map::open` to not take an attribute subselection, but instead take as input the query type (similar to arrays). This makes the key-value store behave similarly to arrays, which means that the key-value store does not support interleaved reads/writes any more (an opened key-value store is used either for reads or writes, but not both).
- `Map::close` does not flush the written items. Instead, `Map::flush` must be invoked explicitly.
- Fix read query bug from multiple fragments when query layout differs from array layout #869
- Fix error when consolidating empty arrays #861
- Fix bzip2 external project URL #875
- Invalidate cached buffer sizes when query subarray changes #882
- Add check to ensure tile extent greater than zero #866
- Add `TILEDB_INSTALL_LIBDIR` CMake option #858
- Remove `TILEDB_USE_STATIC_*` CMake variables from build #871
- Allow HDFS init to succeed even if libhdfs is not found #873
- Add missing checks when setting subarray for sparse writes #843
- Fix dl linking build issue for C/C++ examples on Linux #844
- Add missing type checks for C++ api Query object #845
- Add missing check that coordinates are provided for sparse writes #846
Version 1.3.0 focused on performance, stability, documentation and API improvements/enhancements.
- New guided tutorial series added to documentation.
- Query times improved dramatically with internal parallelization using TBB (multiple PRs)
- Optional deduplication pass on writes can be enabled #636
- New internal statistics reporting system to aid in performance optimization #736
- Added string type support: ASCII, UTF-8, UTF-16, UTF-32, UCS-2, UCS-4 #415
- Added `TILEDB_ANY` datatype #446
- Added parallelized VFS read operations, enabled by default #499
- SIGINT signals will cancel in-progress queries #578
- Arrays must now be explicitly opened before issuing queries and closed afterwards, which clarifies the TileDB consistency model.
- Improved handling of incomplete queries and variable-length attribute data.
- Added parallel S3, POSIX, and Win32 reads and writes, enabled by default.
- Query performance improvements with parallelism (using TBB as a dependency).
- Got rid of special S3 "directory objects."
- Refactored sparse reads, making them simpler and more amenable to parallelization.
- Refactored dense reads, making them simpler and more amenable to parallelization.
- Refactored dense ordered writes, making them simpler and more amenable to parallelization.
- Refactored unordered writes, making them simpler and more amenable to parallelization.
- Refactored global writes, making them simpler and more amenable to parallelization.
- Added ability to cancel pending background/async tasks. SIGINT signals now cancel pending tasks.
- Async queries now use a configurable number of background threads (default number of threads is 1).
- Added checks for duplicate coordinates and option for coordinate deduplication.
- Map usage via the C++ API `operator[]` is faster, similar to the `MapItem` path.
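
To make the difference between checking for duplicate coordinates and deduplicating them concrete, here is a small standalone sketch. It does not use or reproduce TileDB's code: the `Cell` type, both function names, and the choice to keep the first cell seen for each coordinate are assumptions made purely for illustration.

```cpp
#include <algorithm>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical 2D sparse cell: coordinates plus one attribute value.
using Coord = std::pair<uint64_t, uint64_t>;
struct Cell {
  Coord coord;
  int value;
};

// "Check" behaviour: report whether any two cells share coordinates.
bool has_duplicate_coords(std::vector<Cell> cells) {
  std::sort(cells.begin(), cells.end(),
            [](const Cell& a, const Cell& b) { return a.coord < b.coord; });
  return std::adjacent_find(cells.begin(), cells.end(),
                            [](const Cell& a, const Cell& b) {
                              return a.coord == b.coord;
                            }) != cells.end();
}

// "Dedup" behaviour: keep exactly one cell per coordinate. The stable sort
// preserves input order among equal coordinates, and std::unique keeps the
// first element of each run, so the earliest cell survives (an assumption
// of this sketch, not a documented TileDB guarantee).
std::vector<Cell> dedup_coords(std::vector<Cell> cells) {
  std::stable_sort(cells.begin(), cells.end(),
                   [](const Cell& a, const Cell& b) { return a.coord < b.coord; });
  cells.erase(std::unique(cells.begin(), cells.end(),
                          [](const Cell& a, const Cell& b) {
                            return a.coord == b.coord;
                          }),
              cells.end());
  return cells;
}
```

A write path could run the check and fail on duplicates, or run the dedup pass and continue with the reduced cell list.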
- Fixed bugs with reads/writes of variable-sized attributes.
- Fixed file locking issue with simultaneous queries.
- Fixed S3 issues with simultaneous queries within the same context.
- Added `tiledb_array_alloc`
- Added `tiledb_array_{open, close, free}`
- Added `tiledb_array_reopen`
- Added `tiledb_array_is_open`
- Added `tiledb_array_get_query_type`
- Added `tiledb_array_get_schema`
- Added `tiledb_array_max_buffer_size` and `tiledb_array_max_buffer_size_var`
- Added `tiledb_query_finalize` function.
- Added `tiledb_ctx_cancel_tasks` function.
- Added `tiledb_query_set_buffer` and `tiledb_query_set_buffer_var` which sets a single attribute buffer
- Added `tiledb_query_get_type`
- Added `tiledb_query_has_results`
- Added `tiledb_vfs_get_config` function.
- Added `tiledb_stats_{enable,disable,reset,dump}` functions.
- Added `tiledb_kv_alloc`
- Added `tiledb_kv_reopen`
- Added `tiledb_kv_has_key` to check if a key exists in the key-value store.
- Added `tiledb_kv_free`.
- Added `tiledb_kv_iter_alloc` which takes as input a kv object
- Added `tiledb_kv_schema_{set,get}_capacity`.
- Added `tiledb_kv_is_dirty`
- Added `tiledb_kv_iter_reset`
- Added `sm.num_async_threads`, `sm.num_tbb_threads`, and `sm.enable_signal_handlers` config parameters.
- Added `sm.check_dedup_coords` and `sm.dedup_coords` config parameters.
- Added `vfs.num_threads` and `vfs.min_parallel_size` config parameters.
- Added `vfs.{s3,file}.max_parallel_ops` config parameters.
- Added `vfs.s3.multipart_part_size` config parameter.
- Added `vfs.s3.proxy_{scheme,host,port,username,password}` config parameters.
- Added `Array::{open, close}`
- Added `Array::reopen`
- Added `Array::is_open`
- Added `Array::query_type`
- Added `Context::cancel_tasks()` function.
- Added `Query::finalize()` function.
- Added `Query::query_type`
- Added `Query::has_results`
- Changed the return type of the `Query` setters to return the object reference.
- Added an extra `Query` constructor that omits the query type (this is inherited from the input array).
- Added `Map::{open, close}`
- Added `Map::reopen`
- Added `Map::is_dirty`
- Added `Map::has_key` to check for key presence.
- A `tiledb::Map` defined with only one attribute will allow implicit usage, e.g. `map[key] = val` instead of `map[key][attr] = val`.
- Added optional attributes argument in `Map::Map` and `Map::open`
- `MapIter` can be used to create iterators for a map.
- Added `MapIter::reset`
- Added `MapSchema::set_capacity` and `MapSchema::capacity`
- Support for trivially copyable objects, such as a custom data struct, was added. They will be backed by a `sizeof(T)`-sized `char` attribute.
- `Attribute::create<T>` can now be used with compound `T`, such as `std::string` and `std::vector<T>`, and other objects such as a simple data struct.
- Added a `Dimension::create` factory function that does not take tile extent, which sets the tile extent to `NULL`.
- `tiledb::Attribute` can now be constructed with an enumerated type (e.g. `TILEDB_CHAR`).
- Added `Stats` class (wraps C API `tiledb_stats_*` functions)
- Added `Config::save_to_file`
- `tiledb_query_finalize` must always be called before `tiledb_query_free` after global-order writes.
- Removed `tiledb_vfs_move` and added `tiledb_vfs_move_file` and `tiledb_vfs_move_dir` instead.
- Removed `force` argument from `tiledb_vfs_move_*` and `tiledb_object_move`.
- Removed `vfs.s3.file_buffer_size` config parameter.
- Removed `tiledb_query_get_attribute_status`.
- All `tiledb_*_free` functions now return `void` and do not take `ctx` as input (except for `tiledb_ctx_free`).
- Changed signature of `tiledb_kv_close` to take a `tiledb_kv_t*` argument instead of `tiledb_kv_t**`.
- Renamed `tiledb_domain_get_rank` to `tiledb_domain_get_ndim` to avoid confusion with the matrix definition of rank.
- Changed signature of `tiledb_array_get_non_empty_domain`.
- Removed `tiledb_array_compute_max_read_buffer_sizes`.
- Changed signature of `tiledb_{array,kv}_open`.
- Removed `tiledb_kv_iter_create`
- Renamed all C API functions that create TileDB objects from `tiledb_*_create` to `tiledb_*_alloc`.
- Removed `tiledb_query_set_buffers`
- Removed `tiledb_query_reset_buffers`
- Added query type argument to `tiledb_array_open`
- Changed argument order in `tiledb_config_iter_alloc`, `tiledb_ctx_alloc`, `tiledb_attribute_alloc`, `tiledb_dimension_alloc`, `tiledb_array_schema_alloc`, `tiledb_kv_schema_load`, `tiledb_kv_get_item`, `tiledb_vfs_alloc`
- Fixes with `Array::max_buffer_elements` and `Query::result_buffer_elements` to comply with the API docs. `pair.first` is the number of elements of the offsets buffer. If `pair.first` is 0, it is a fixed-sized attribute or coordinates.
- `std::array<T, N>` is backed by a `char` tiledb attribute since the size is not guaranteed.
- Headers have the `tiledb_cpp_api_` prefix removed. For example, the include is now `#include <tiledb/attribute.h>`
- Removed `VFS::move` and added `VFS::move_file` and `VFS::move_dir` instead.
- Removed `force` argument from `VFS::move_*` and `Object::move`.
- Removed `vfs.s3.file_buffer_size` config parameter.
- `Query::finalize` must always be called before going out of scope after global-order writes.
- Removed `Query::attribute_status`.
- The API was made header only to improve cross-platform compatibility. `config_iter.h`, `filebuf.h`, `map_item.h`, `map_iter.h`, and `map_proxy.h` are no longer available, but grouped into the headers of the objects they support.
- Previously, when a `tiledb::Map` was created from a `std::map`, an anonymous attribute name was defined. The attribute name must now be given explicitly: `tiledb::Map::create(tiledb::Context, std::string uri, std::map, std::string attr_name)`
- Removed `tiledb::Query::reset_buffers`. Any previous usages can safely be removed.
- `Map::begin` refers to the same iterator object. For multiple concurrent iterators, a `MapIter` should be manually constructed instead of using `Map::begin()` more than once.
- Renamed `Domain::rank` to `Domain::ndim` to avoid confusion with the matrix definition of rank.
- Added query type argument to `Array` constructor
- Removed iterator functionality from `Map`.
- Removed `Array::parition_subarray`.
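
The `pair.first` convention described above for `Array::max_buffer_elements` and `Query::result_buffer_elements` can be illustrated without the library. In this sketch the map type `BufferElements` and the helper `is_var_sized` are hypothetical names, and the reading of `pair.second` as the number of data elements is an assumption; only the meaning of `pair.first` comes from the notes above.

```cpp
#include <cstdint>
#include <string>
#include <unordered_map>
#include <utility>

// Assumed shape of the result: attribute name ->
// (number of offset elements, number of data elements).
using BufferElements =
    std::unordered_map<std::string, std::pair<uint64_t, uint64_t>>;

// Hypothetical helper: an attribute is variable-sized exactly when its
// offsets buffer has a non-zero element count. A count of 0 means a
// fixed-sized attribute or the coordinates.
bool is_var_sized(const BufferElements& sizes, const std::string& name) {
  return sizes.at(name).first != 0;
}
```

A caller could use such a test to decide whether to allocate an offsets buffer alongside the data buffer for each attribute.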
- Fix I/O bug on POSIX systems with large reads/writes (#467)
- Memory overflow error handling (moved from constructors to init functions) (#472)
- Memory leaks with realloc in case of error (#472)
- Handle non-existent config param in C++ API (#475)
- Read query overflow handling (#485)
- Changed S3 default config so that AWS S3 just works (#455)
- Minor S3 optimizations and error message fixes (#462)
- Documentation additions including S3 usage (#456, #458, #459)
- Various CI improvements (#449)
- Fixed TileDB header includes for all examples (#409)
- Fixed TileDB library dynamic linking problem for C++ API (#412)
- Fixed VS2015 build errors (#424)
- Bug fix in the sparse case (#434)
- Bug fix for 1D vector query layout (#438)
- Added documentation to API and examples (#410, #414)
- Migrated docs to Readthedocs (#418, #420, #422, #423, #425)
- Added dimension domain/tile extent checks (#429)
The 1.2.0 release of TileDB includes many new features, improvements in stability and performance, and two new language interfaces (Python and C++). There are also several breaking changes in the C API and on-disk format, documented below.
Important Note: due to several improvements and changes in the underlying array storage mechanism, you will need to recreate any existing arrays in order to use them with TileDB v1.2.0.
- Windows support. TileDB is now fully supported on Windows systems (64-bit Windows 7 and newer).
- Python API. We are very excited to announce the initial release of a Python API for TileDB. The Python API makes TileDB accessible to a much broader audience, allowing seamless integration with existing Python libraries such as NumPy, Pandas and the scientific Python ecosystem.
- C++ API. We've included a C++ API, which allows TileDB to integrate into modern C++ applications without having to write code towards the C API. The C++ API is more concise and provides additional compile time type safety.
- S3 object store support. You can now easily store, query, and manipulate your TileDB arrays on S3 API compatible object stores, including Amazon's AWS S3 service.
- Virtual filesystem interface. The TileDB API now exposes a virtual filesystem (or VFS) interface, allowing you to perform tasks such as file creation, deletion, reads, and appends without worrying about whether your files are stored on S3, HDFS, a POSIX or Windows filesystem, etc.
- Key-value store. TileDB now provides a key-value (meta) data storage abstraction. Its implementation is built upon TileDB's sparse arrays and inherits all the properties of TileDB sparse arrays.
- Homebrew formula added for easier installation on macOS. Homebrew is now the preferred method for installing TileDB and its dependencies on macOS.
- Docker images updated to include stable/unstable/dev options, and easy configuration of additional components (e.g. S3 support).
- Tile cache implemented, which will greatly speed up repeated queries on overlapping regions of the same array.
- Ability to pass runtime configuration arguments to TileDB/VFS backends.
- Unnamed (or "anonymous") dimensions are now supported; having a single anonymous attribute is also supported.
- Concurrency bugfixes for several compressors.
- Correctness issue fixed in double-delta compressor for some datatypes.
- Better build behavior on systems with older GCC or CMake versions.
- Several memory leaks and overruns fixed with help of sanitizers.
- Many improved error condition checks and messages for easier debugging.
- Many other small bugs and API inconsistencies fixed.
- `tiledb_config_*`: Types and functions related to the new configuration object and functionality.
- `tiledb_config_iter_*`: Iteration functionality for retrieving parameters/values from the new configuration object.
- `tiledb_ctx_get_config()`: Function to get a configuration from a context.
- `tiledb_filesystem_t`: Filesystem type enum.
- `tiledb_ctx_is_supported_fs()`: Function to check for support for a given filesystem backend.
- `tiledb_error_t`, `tiledb_error_message()` and `tiledb_error_free()`: Type and functions for TileDB error messages.
- `tiledb_ctx_get_last_error()`: Function to get last error from context.
- `tiledb_domain_get_rank()`: Function to retrieve number of dimensions in a domain.
- `tiledb_domain_get_dimension_from_index()` and `tiledb_domain_get_dimension_from_name()`: Replaces dimension iterators.
- `tiledb_dimension_{create,free,get_name,get_type,get_domain,get_tile_extent}()`: Functions related to creation and manipulation of `tiledb_dimension_t` objects.
- `tiledb_array_schema_set_coords_compressor()`: Function to set the coordinates compressor.
- `tiledb_array_schema_set_offsets_compressor()`: Function to set the offsets compressor.
- `tiledb_array_schema_get_attribute_{num,from_index,from_name}()`: Replaces attribute iterators.
- `tiledb_query_create()`: Replaced many arguments with new `tiledb_query_set_*()` setter functions.
- `tiledb_array_get_non_empty_domain()`: Function to retrieve the non-empty domain from an array.
- `tiledb_array_compute_max_read_buffer_sizes()`: Function to compute an upper bound on the buffer sizes required for a read query.
- `tiledb_object_ls()`: Function to visit the children of a path.
- `tiledb_uri_to_path()`: Function to convert a file:// URI to a platform-native path.
- `TILEDB_MAX_PATH` and `tiledb_max_path()`: The maximum length for tiledb resource paths.
- `tiledb_kv_*`: Types and functions related to the new key-value store functionality.
- `tiledb_vfs_*`: Types and functions related to the new virtual filesystem (VFS) functionality.
- Rename `tiledb_array_metadata_t` -> `tiledb_array_schema_t`, and associated `tiledb_array_metadata_*` functions to `tiledb_array_schema_*`.
- Remove `tiledb_attribute_iter_t`.
- Remove `tiledb_dimension_iter_t`.
- Rename `tiledb_delete()`, `tiledb_move()`, `tiledb_walk()` to `tiledb_object_{delete,move,walk}()`.
- `tiledb_ctx_create`: Config argument added.
- `tiledb_domain_create`: Datatype argument removed.
- `tiledb_domain_add_dimension`: Name, domain and tile extent arguments replaced with single `tiledb_dimension_t` argument.
- `tiledb_query_create()`: Replaced many arguments with new `tiledb_query_set_*()` setter functions.
- `tiledb_array_create()`: Added array URI argument.
- `tiledb_*_free()`: All free functions now take a pointer to the object pointer instead of simply object pointer.
- The include files are now installed into a `tiledb` folder. The correct path is now `#include <tiledb/tiledb.h>` (or `#include <tiledb/tiledb>` for the C++ API).
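
The change to `tiledb_*_free()` listed above, taking a pointer to the object pointer, follows a common C pattern that the sketch below illustrates with a stand-in type. `widget_t`, `widget_create` and `widget_free` are hypothetical names and not TileDB functions. One plausible benefit, assumed here, is that the callee can reset the caller's handle so an accidental second free becomes a harmless no-op.

```cpp
#include <cstdlib>

// Hypothetical handle type standing in for a library object.
struct widget_t {
  int payload;
};

// Allocates a new object; returns nullptr on allocation failure.
widget_t* widget_create() {
  auto* w = static_cast<widget_t*>(std::malloc(sizeof(widget_t)));
  if (w != nullptr) {
    w->payload = 0;
  }
  return w;
}

// Takes the address of the caller's pointer, releases the object, and sets
// the caller's pointer to null. Calling it again on the same handle is safe.
void widget_free(widget_t** w) {
  if (w == nullptr || *w == nullptr) {
    return;
  }
  std::free(*w);
  *w = nullptr;
}
```

With this shape, calling `widget_free(&w)` leaves `w` null, so later code can test the handle instead of risking a dangling pointer.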
- Support for moving resources across previous VFS backends (local fs <-> HDFS) has been removed. A more generic implementation for this functionality with improved performance is planned for the next version of TileDB.