Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Transparent Item movements #251

Closed
wants to merge 4 commits into from

Conversation

vinser52
Copy link
Contributor

The new algorithm relies on the moving bit and does not require external synchronization. Data movement happens transparently for the client: if the client thread attempts to get a handle for the item being moved it will get a handle with wait context to wait till the movement is completed.

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Aug 18, 2023
@vinser52 vinser52 force-pushed the upstream_transparent_move branch 2 times, most recently from c0ecd6d to ab70b7d Compare August 18, 2023 12:49
@vinser52
Copy link
Contributor Author

Hi @haowu14, @therealgymmy,

Could you please take a look at this PR?

@haowu14
Copy link
Contributor

haowu14 commented Sep 13, 2023

@vinser52 In the middle of reviewing this. I can publish some of the comments later today.

Copy link
Contributor

@therealgymmy therealgymmy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Just FYI @haowu14 is on vacation and will be out until Oct 1st)

I haven't started CacheAllocator-inl.h yet. Saving comments.

Separately, it's been a while since I reviewed the chained items logic. Just thought of this:

  • Instead of using chained item hash table and using the parent’s address as the chained item’s key which always feels kinda hacky.
    • Can we simply re-allocate parent and add the additional four bytes to link to the first chained item?
    • We can use a “bit” in the item header (which we already have) to denote whether or not an item is a regular or first of a chained item
    • The overhead is in re-allocation of the parent. But that is only once on first expanding the item.
    • We can also add a special API to always allocate the parent as a chained item by default. This works well for use-cases that are always allocating for chains of items.

// Bumps up the reference count only if the new count will be strictly less
// than or equal to the maxCount and the item is not exclusive
// @return true if refcount is bumped. false otherwise (if item is exclusive)
// @throw exception::RefcountOverflow if new count would be greater than
// maxCount
FOLLY_ALWAYS_INLINE bool incRef() {
auto predicate = [](const Value curValue) {
FOLLY_ALWAYS_INLINE incResult incRef() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add comments that detail the scenarios when a refcount bump will fail?

E.g. when exclusive bit is held, but we can still bump it if there're pending access-count

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. refcount bump fail if exclusive bit is set. There are two possible reasons for that:

  1. Item is moving (exclusive bit is set and refcount > 0).
  2. Item is evicted (only exclusive bit is set).

@@ -2175,6 +2280,22 @@ class CacheAllocator : public CacheBase {
// poolResizer_, poolOptimizer_, memMonitor_, reaper_
mutable std::mutex workersMutex_;

static constexpr size_t kShards = 8192; // TODO: need to define right value
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is probably okay. We use 8K shards for NvmCache as well

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should I remove this TODO comment?
As an alternative, we can add a config option to allow customization of this parameter.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah removing the comment is fine. 8192 is a reasonable number.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Comment on lines +2153 to +2145
// We'll return a miss to the user's pending read,
// so we should enqueue a delete via NvmCache.
// TODO: cache.remove(it);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we can change the logic to re-throw the exception (both MoveContext and also for regular WaitContext). I'm thinking of propagating the refcountOverflowed to each write-handle, and when user calls get(), we'll re-throw. This only needs to be applied if there's an outstanding wait-context. What do you think?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't mean to suggest this change in this diff. We can make this change on our side in a separate diff.


size_t wakeUpWaitersLocked(folly::StringPiece key, WriteHandle&& handle);

class MoveCtx {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is similar to GetCtx in NvmCache. Can you abstract out the common logic?

Copy link
Contributor

@therealgymmy therealgymmy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

saving a few more comments

XDCHECK(!oldItem.isInMMContainer());
XDCHECK(newItemHdl->isInMMContainer());

replaceInChainLocked(oldItem, newItemHdl, parent, false);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: false /* fromMove */

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Comment on lines 972 to 973
if (tryGetHandleWithWaitContextForMovingItem(*it, handle))
return handle;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: curly braces

if (...) {
  return handle;
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Comment on lines 1185 to 1187
if (!item.isMoving())
return false;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: curly braces

if (...) {
  return false
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

} else {
// item is being evicted
return WriteHandle{};
while (true) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A couple of questions to clarify the behavior in this block

(1) L972 If the item is in the middle of moving, we will return a wait-handle. What happens when the move has just been completed when we call into tryGetHandleWithWaitContextForMovingItem()? Would we be "incRef" on the previous item that has just been moved?

(2) L966 - Are we guaranteed to end up here in the middle of an eviction? Can there be a case where an item has already been evicted and we happen to call acquire? (IIRC we no longer synchronize on the access-container lock for eviction, right?)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We discussed (1) in a sync. Writing down summary. TLDR: not an issue.

In order for us to unmark an item as moving, we must have removed it from the AccessContainer. acquire() also happens under AccessContainer lock (this is called from within a callback within AccessContainer). This means in order for us to find an item and successfully increment refcount, this item must be in either of the two states: (A) item is valid and has not been moved yet (B) this is a new item that had been moved from an older item.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(1) We rely on the access container (AC) lock. Let me describe the scenario:
Thread 1 (BG Move thread):

  1. Mark moving
  2. Allocate new item
  3. Copy data
  4. Update MM containers
  5. Update Access container (under AC lock)
  6. Unmark moving
  7. Notify waiters and remove context under (under Move lock)

Client thread (lookup):

  1. Lookup in Access Container (under AC lock)
  2. Acquire called under AC lock
  3. a. Acquire called before Thread 1 step 5. In that case this thread will acquire Move lock and add itself to the wait context.
  4. b. Acquire called after Thread 1 step 5. In that case this thread will get the new Item (already moved).

The tryGetHandleWithWaitContextForMovingItem might return false if the move failed and the item should be evicted (markForEvictionWhenMoving is going to be called). That is why we need this while loop - on the next iteration incRef will return RefcountWithFlags::incResult::incFailedEviction.

(2) I think there is no issue with that because the acquire called under AC lock.

reinterpret_cast<uintptr_t>(&parentHandle->asChainedItem())) {
newItemHdl.reset();
reinterpret_cast<uintptr_t>(&parentItem.asChainedItem())) {
XDCHECK(false);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why assert? Is this guaranteed not to happen anymore?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this code was introduced by our team for debugging purposes and wee forgot to clean up it before upstreaming. Furthermore, below at L1315 there is the assert to check the same. So I just remove this if block.

Comment on lines 2750 to 2751
// we can't rely on an item being marked moving because
// it may have previously been a chained item
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this comment relevant?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, you are right. Removed.

Copy link
Contributor

@therealgymmy therealgymmy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Last few comments.

Overall, I'm not yet confident I follow the new logic on slab release yet.

Right now my understanding on the high level flow:

  1. Mark an item as moving (exclusive and +1 refcount) -> blocking future readers behind wait-context
  2. Attempt to move the item when refcount drops to exactly "1"
  3. Move the item (separate logic for regular and chained items)
  4. If failed to move, evict the item (separate logic for regular and chained items)
  5. In both (3) and (4), we'll wake up any pending waiters at the end.

My questions inline are mostly to understand (3) and (4) in more details. Specifically I want to be sure we're indeed able to operate on an item safely (exclusibe bit and refcount == 1 exactly) in both cases.

bool ret = oldItem.isChainedItem()
? moveChainedItem(oldItem.asChainedItem(), newItemHdl)
: moveRegularItem(oldItem, newItemHdl);
removeFromMMContainer(oldItem);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we always remove item from mmContainer here? (Comments seem to suggest we only remove it here if move failed)

Btw, merge this back into moveForSlabRelease logic?


XDCHECK(evicted->isMoving());
token = createPutToken(*evicted);
auto ret = evicted->markForEvictionWhenMoving();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we guaranteed to end here with the item having exclusive bit and also refcount == 1?

Marking an item as moving means the refcount >= 1, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moving state means that exclusive bit is set and refcount is equals 1.
After this method ref count just drops from 1 to 0. And the exclusive bit is still set.

Comment on lines -2523 to -2525
for (unsigned int itemMovingAttempts = 0;
itemMovingAttempts < config_.movingTries;
++itemMovingAttempts) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why did we remove the moving attempts? In the new logic, once we marking the moving bit, the item can still have refcount > 1, right? Future readers will be blocked on a wait-context but current readers are not hindered. Wouldn't we still need this logic?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, with the new approach we are able to mark an Item as Moving only if there are no any active readers. The refcount of regular Item when moving bit is set is always equals 1. For chained Item we are marking the parent Item as moving and refcount of parent Item is also equals 1 in that case.

@vinser52 vinser52 force-pushed the upstream_transparent_move branch 3 times, most recently from 635eb22 to bada673 Compare October 27, 2023 14:15
The new algorithm relies on the moving bit and does not require external
synchronization. Data movement happens transparently for the client: if
the client thread attempts to get a handle for the item being moved it
will get a handle with wait context to wait till the movement is
completed.
@vinser52 vinser52 force-pushed the upstream_transparent_move branch from bada673 to 807dbda Compare October 27, 2023 14:46
@vinser52 vinser52 force-pushed the upstream_transparent_move branch from 807dbda to 3d73cfe Compare November 8, 2023 13:42
@facebook-github-bot
Copy link
Contributor

@therealgymmy has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@facebook-github-bot
Copy link
Contributor

@haowu14 merged this pull request in 8655d6b.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. Merged
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants