-
Notifications
You must be signed in to change notification settings - Fork 269
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Transparent Item movements #251
Conversation
c0ecd6d
to
ab70b7d
Compare
Hi @haowu14, @therealgymmy, Could you please take a look at this PR? |
@vinser52 In the middle of reviewing this. I can publish some of the comments later today. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(Just FYI @haowu14 is on vacation and will be out until Oct 1st)
I haven't started CacheAllocator-inl.h
yet. Saving comments.
Separately, it's been a while since I reviewed the chained items logic. Just thought of this:
- Instead of using chained item hash table and using the parent’s address as the chained item’s key which always feels kinda hacky.
- Can we simply re-allocate parent and add the additional four bytes to link to the first chained item?
- We can use a “bit” in the item header (which we already have) to denote whether or not an item is a regular or first of a chained item
- The overhead is in re-allocation of the parent. But that is only once on first expanding the item.
- We can also add a special API to always allocate the parent as a chained item by default. This works well for use-cases that are always allocating for chains of items.
// Bumps up the reference count only if the new count will be strictly less | ||
// than or equal to the maxCount and the item is not exclusive | ||
// @return true if refcount is bumped. false otherwise (if item is exclusive) | ||
// @throw exception::RefcountOverflow if new count would be greater than | ||
// maxCount | ||
FOLLY_ALWAYS_INLINE bool incRef() { | ||
auto predicate = [](const Value curValue) { | ||
FOLLY_ALWAYS_INLINE incResult incRef() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add comments that detail the scenarios when a refcount bump will fail?
E.g. when exclusive bit is held, but we can still bump it if there're pending access-count
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. refcount bump fail if exclusive bit is set. There are two possible reasons for that:
- Item is moving (exclusive bit is set and refcount > 0).
- Item is evicted (only exclusive bit is set).
@@ -2175,6 +2280,22 @@ class CacheAllocator : public CacheBase { | |||
// poolResizer_, poolOptimizer_, memMonitor_, reaper_ | |||
mutable std::mutex workersMutex_; | |||
|
|||
static constexpr size_t kShards = 8192; // TODO: need to define right value |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is probably okay. We use 8K shards for NvmCache as well
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should I remove this TODO
comment?
As an alternative, we can add a config option to allow customization of this parameter.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah removing the comment is fine. 8192 is a reasonable number.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
// We'll return a miss to the user's pending read, | ||
// so we should enqueue a delete via NvmCache. | ||
// TODO: cache.remove(it); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if we can change the logic to re-throw the exception (both MoveContext and also for regular WaitContext). I'm thinking of propagating the refcountOverflowed
to each write-handle, and when user calls get()
, we'll re-throw. This only needs to be applied if there's an outstanding wait-context. What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't mean to suggest this change in this diff. We can make this change on our side in a separate diff.
|
||
size_t wakeUpWaitersLocked(folly::StringPiece key, WriteHandle&& handle); | ||
|
||
class MoveCtx { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is similar to GetCtx
in NvmCache. Can you abstract out the common logic?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
saving a few more comments
XDCHECK(!oldItem.isInMMContainer()); | ||
XDCHECK(newItemHdl->isInMMContainer()); | ||
|
||
replaceInChainLocked(oldItem, newItemHdl, parent, false); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: false /* fromMove */
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
if (tryGetHandleWithWaitContextForMovingItem(*it, handle)) | ||
return handle; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: curly braces
if (...) {
return handle;
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
if (!item.isMoving()) | ||
return false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: curly braces
if (...) {
return false
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
} else { | ||
// item is being evicted | ||
return WriteHandle{}; | ||
while (true) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A couple of questions to clarify the behavior in this block
(1) L972 If the item is in the middle of moving, we will return a wait-handle. What happens when the move has just been completed when we call into tryGetHandleWithWaitContextForMovingItem()
? Would we be "incRef" on the previous item that has just been moved?
(2) L966 - Are we guaranteed to end up here in the middle of an eviction? Can there be a case where an item has already been evicted and we happen to call acquire? (IIRC we no longer synchronize on the access-container lock for eviction, right?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We discussed (1) in a sync. Writing down summary. TLDR: not an issue.
In order for us to unmark an item as moving, we must have removed it from the AccessContainer. acquire()
also happens under AccessContainer lock (this is called from within a callback within AccessContainer). This means in order for us to find an item and successfully increment refcount, this item must be in either of the two states: (A) item is valid and has not been moved yet (B) this is a new item that had been moved from an older item.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(1) We rely on the access container (AC) lock. Let me describe the scenario:
Thread 1 (BG Move thread):
- Mark moving
- Allocate new item
- Copy data
- Update MM containers
- Update Access container (under AC lock)
- Unmark moving
- Notify waiters and remove context under (under Move lock)
Client thread (lookup):
- Lookup in Access Container (under AC lock)
- Acquire called under AC lock
- a. Acquire called before Thread 1 step 5. In that case this thread will acquire Move lock and add itself to the wait context.
- b. Acquire called after Thread 1 step 5. In that case this thread will get the new Item (already moved).
The tryGetHandleWithWaitContextForMovingItem
might return false
if the move failed and the item should be evicted (markForEvictionWhenMoving is going to be called). That is why we need this while
loop - on the next iteration incRef
will return RefcountWithFlags::incResult::incFailedEviction
.
(2) I think there is no issue with that because the acquire called under AC lock.
reinterpret_cast<uintptr_t>(&parentHandle->asChainedItem())) { | ||
newItemHdl.reset(); | ||
reinterpret_cast<uintptr_t>(&parentItem.asChainedItem())) { | ||
XDCHECK(false); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why assert? Is this guaranteed not to happen anymore?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this code was introduced by our team for debugging purposes and wee forgot to clean up it before upstreaming. Furthermore, below at L1315 there is the assert to check the same. So I just remove this if
block.
// we can't rely on an item being marked moving because | ||
// it may have previously been a chained item |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this comment relevant?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, you are right. Removed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Last few comments.
Overall, I'm not yet confident I follow the new logic on slab release yet.
Right now my understanding on the high level flow:
- Mark an item as moving (exclusive and +1 refcount) -> blocking future readers behind wait-context
- Attempt to move the item when refcount drops to exactly "1"
- Move the item (separate logic for regular and chained items)
- If failed to move, evict the item (separate logic for regular and chained items)
- In both (3) and (4), we'll wake up any pending waiters at the end.
My questions inline are mostly to understand (3) and (4) in more details. Specifically I want to be sure we're indeed able to operate on an item safely (exclusibe bit and refcount == 1 exactly) in both cases.
bool ret = oldItem.isChainedItem() | ||
? moveChainedItem(oldItem.asChainedItem(), newItemHdl) | ||
: moveRegularItem(oldItem, newItemHdl); | ||
removeFromMMContainer(oldItem); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we always remove item from mmContainer here? (Comments seem to suggest we only remove it here if move failed)
Btw, merge this back into moveForSlabRelease logic?
|
||
XDCHECK(evicted->isMoving()); | ||
token = createPutToken(*evicted); | ||
auto ret = evicted->markForEvictionWhenMoving(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are we guaranteed to end here with the item having exclusive bit and also refcount == 1?
Marking an item as moving means the refcount >= 1, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moving state means that exclusive bit is set and refcount is equals 1
.
After this method ref count just drops from 1 to 0. And the exclusive bit is still set.
for (unsigned int itemMovingAttempts = 0; | ||
itemMovingAttempts < config_.movingTries; | ||
++itemMovingAttempts) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why did we remove the moving attempts? In the new logic, once we marking the moving bit, the item can still have refcount > 1, right? Future readers will be blocked on a wait-context but current readers are not hindered. Wouldn't we still need this logic?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, with the new approach we are able to mark an Item as Moving only if there are no any active readers. The refcount
of regular Item when moving bit is set is always equals 1
. For chained Item we are marking the parent Item as moving and refcount of parent Item is also equals 1
in that case.
635eb22
to
bada673
Compare
The new algorithm relies on the moving bit and does not require external synchronization. Data movement happens transparently for the client: if the client thread attempts to get a handle for the item being moved it will get a handle with wait context to wait till the movement is completed.
bada673
to
807dbda
Compare
807dbda
to
3d73cfe
Compare
Use findInternal instead of acquire
@therealgymmy has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
The new algorithm relies on the moving bit and does not require external synchronization. Data movement happens transparently for the client: if the client thread attempts to get a handle for the item being moved it will get a handle with wait context to wait till the movement is completed.