Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cache: fix pfree valueBuffer #273

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

ivan-v-kush
Copy link
Contributor

@ivan-v-kush ivan-v-kush commented Dec 18, 2024

Hello.
This MR fixes an issue with a cache. Currently cache entries contain already freed memory. MR fixes this.

SQL code to reproduce

broken_cache.txt

Steps:

  1. create a table
  2. convert it to columnar
  3. enable caching
  4. try convert to heap...
  5. get an error:
pfree called with invalid pointer 0x5414c18 (header 0x0000000000000000) 

Description

Currently an error

pfree called with invalid pointer 0x5414c18 (header 0x0000000000000000) 

occurs when cache is full and we try to free up space by 10%.
We call in the EvictCache

pfree(str->data);

but str->data contains a garbage, so BogusFree with an ERROR is executed (see Path to BogusFree chapter below)

The reason to a garbage is the following.

A valueBuffer (StringInfo) is added to a cache in the ColumnarAddCacheEntry Also it's added to chunkData->valueBufferArray

static ChunkData *
DeserializeChunkData(StripeBuffers *stripeBuffers, uint64 chunkIndex,
					 uint32 rowCount, TupleDesc tupleDescriptor,
					 List *projectedColumnList, StripeReadState *state, uint64 stripeId)
{
...
				if (shouldCache)
				{
					ColumnarAddCacheEntry(state->relation->rd_id, stripeId, chunkIndex, columnIndex, valueBuffer);
					MemoryContextSwitchTo(oldMemoryContext);
				}
			}
....
			chunkData->valueBufferArray[columnIndex] = valueBuffer;
}

In the function FreeChunkBufferValueArray valueBuffer will be freed

void FreeChunkBufferValueArray(ChunkData *chunkData)
{
....
			pfree(chunkData->valueBufferArray[columnIndex]->data);
			pfree(chunkData->valueBufferArray[columnIndex]);
...
}

and also EvictCache tries to freed it as our cache is full if (totalAllocationLength >= (columnar_page_cache_size * 1024 * 1024))

static void
EvictCache(uint64 size)
{
....
				StringInfo str = entry->store;
				if (str->data) {
					pfree(str->data);
				}

				pfree(str);
}

And we got an ERROR.

To fix this, we mustn't add valueBufferArray to chunkData->valueBufferArray when previously have added it to a cache

static ChunkData *
DeserializeChunkData(StripeBuffers *stripeBuffers, uint64 chunkIndex,
					 uint32 rowCount, TupleDesc tupleDescriptor,
					 List *projectedColumnList, StripeReadState *state, uint64 stripeId)
{
...
			if(shouldCache) {
				/* store current chunk's data buffer to be freed at next chunk read */
				chunkData->valueBufferArray[columnIndex] = NULL;
			}
			else {
				chunkData->valueBufferArray[columnIndex] = valueBuffer;
			}

Path to BogusFree

src/backend/columnar/columnar_cache.c
void
ColumnarAddCacheEntry(uint64 relId, uint64 stripeId, uint64 chunkId,
 uint32 columnId, void *data)
{
| 363/* If we are over our cache allocation, clear until we are at 90%. */     │ 
│364 if (totalAllocationLength >= (columnar_page_cache_size * 1024 * 1024))     │ 
│365 {          │ 
>366 EvictCache((columnar_page_cache_size * 1024 * 1024 * .1) +      │ 
│367   (totalAllocationLength - (columnar_page_cache_size * 1024 * 1024)));   │ 
│368}

static void
EvictCache(uint64 size)
{
				StringInfo str = entry->store;
				if (str->data) {
					pfree(str->data);
				}

src/backend/utils/mmgr/mcxt.c
/*
 * pfree
 *		Release an allocated chunk.
 */
void
pfree(void *pointer)
{
...
	MCXT_METHOD(pointer, free_p) (pointer); // calls BogusFree as pointer is broken
.....
}


src/backend/utils/mmgr/mcxt.c
  /*
 * Support routines to trap use of invalid memory context method IDs
 * (from calling pfree or the like on a bogus pointer).  As a possible
 * aid in debugging, we report the header word along with the pointer
 * address (if we got here, there must be an accessible header word).
 */
static void
BogusFree(void *pointer)
{
	elog(ERROR, "pfree called with invalid pointer %p (header 0x%016llx)",
		 pointer, (unsigned long long) GetMemoryChunkHeader(pointer));
}

Backtrace:

BogusFree(void * pointer) (/build/src/backend/utils/mmgr/mcxt.c:239)
pfree(void * pointer) (/build/src/backend/utils/mmgr/mcxt.c:1467)
pg_columnar.so!FreeChunkBufferValueArray(ChunkData * chunkData) (/build/contrib/hydra/src/backend/columnar/columnar_reader.c:1240)
pg_columnar.so!EndChunkGroupRead(ChunkGroupReadState * chunkGroupReadState) (/build/contrib/hydra/src/backend/columnar/columnar_reader.c:1053)
pg_columnar.so!ReadStripeNextRow(StripeReadState * stripeReadState, Datum * columnValues, _Bool * columnNulls, uint64 stripeFirstRowNumber, Snapshot snapshot, uint64 stripeId) (/build/contrib/hydra/src/backend/columnar/columnar_reader.c:993)
pg_columnar.so!ColumnarReadNextRow(ColumnarReadState * readState, Datum * columnValues, _Bool * columnNulls, uint64 * rowNumber) (/build/contrib/hydra/src/backend/columnar/columnar_reader.c:356)
pg_columnar.so!columnar_getnextslot(TableScanDesc sscan, ScanDirection direction, TupleTableSlot * slot) (/build/contrib/hydra/src/backend/columnar/columnar_tableam.c:403)
pg_columnar.so!table_scan_getnextslot(TableScanDesc sscan, ScanDirection direction, TupleTableSlot * slot) (/build/src/include/access/tableam.h:1066)
pg_columnar.so!ColumnarScanNext(ColumnarScanState * columnarScanState) (/build/contrib/hydra/src/backend/columnar/columnar_customscan.c:2474)
pg_columnar.so!ExecScanFetch(ScanState * node, ExecScanAccessMtd accessMtd, ExecScanRecheckMtd recheckMtd) (/build/contrib/hydra/src/backend/columnar/columnar_customscan.c:2187)
pg_columnar.so!CustomExecScan(ColumnarScanState * columnarScanState, ExecScanAccessMtd accessMtd, ExecScanRecheckMtd recheckMtd) (/build/contrib/hydra/src/backend/columnar/columnar_customscan.c:2217)
pg_columnar.so!ColumnarScan_ExecCustomScan(CustomScanState * node) (/build/contrib/hydra/src/backend/columnar/columnar_customscan.c:2506)
ExecCustomScan(PlanState * pstate) (/build/src/backend/executor/nodeCustom.c:124)
ExecProcNode(PlanState * node) (/build/src/include/executor/executor.h:273)
ExecModifyTable(PlanState * pstate) (/build/src/backend/executor/nodeModifyTable.c:3730)
ExecProcNodeFirst(PlanState * node) (/build/src/backend/executor/execProcnode.c:464)
ExecProcNode(PlanState * node) (/build/src/include/executor/executor.h:273)
ExecutePlan(EState * estate, PlanState * planstate, _Bool use_parallel_mode, CmdType operation, _Bool sendTuples, uint64 numberTuples, ScanDirection direction, DestReceiver * dest, _Bool execute_once) (/build/src/backend/executor/execMain.c:1678)
standard_ExecutorRun(QueryDesc * queryDesc, ScanDirection direction, uint64 count, _Bool execute_once) (/build/src/backend/executor/execMain.c:367)
ExecutorRun(QueryDesc * queryDesc, ScanDirection direction, uint64 count, _Bool execute_once) (/build/src/backend/executor/execMain.c:311)
_SPI_pquery(QueryDesc * queryDesc, _Bool fire_triggers, uint64 tcount) (/build/src/backend/executor/spi.c:2929)
_SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions * options, Snapshot snapshot, Snapshot crosscheck_snapshot, _Bool fire_triggers) (/build/src/backend/executor/spi.c:2699)
SPI_execute_extended(const char * src, const SPIExecuteOptions * options) (/build/src/backend/executor/spi.c:663)
plpgsql.so!exec_stmt_dynexecute(PLpgSQL_execstate * estate, PLpgSQL_stmt_dynexecute * stmt) (/build/src/pl/plpgsql/src/pl_exec.c:4636)
plpgsql.so!exec_stmts(PLpgSQL_execstate * estate, List * stmts) (/build/src/pl/plpgsql/src/pl_exec.c:2127)
plpgsql.so!exec_stmt_block(PLpgSQL_execstate * estate, PLpgSQL_stmt_block * block) (/build/src/pl/plpgsql/src/pl_exec.c:1971)
plpgsql.so!exec_toplevel_block(PLpgSQL_execstate * estate, PLpgSQL_stmt_block * block) (/build/src/pl/plpgsql/src/pl_exec.c:1639)
plpgsql.so!plpgsql_exec_function(PLpgSQL_function * func, FunctionCallInfo fcinfo, EState * simple_eval_estate, ResourceOwner simple_eval_resowner, ResourceOwner procedure_resowner, _Bool atomic) (/build/src/pl/plpgsql/src/pl_exec.c:628)
plpgsql.so!plpgsql_call_handler(FunctionCallInfo fcinfo) (/build/src/pl/plpgsql/src/pl_handler.c:277)
ExecInterpExpr(ExprState * state, ExprContext * econtext, _Bool * isnull) (/build/src/backend/executor/execExprInterp.c:734)
ExecInterpExprStillValid(ExprState * state, ExprContext * econtext, _Bool * isNull) (/build/src/backend/executor/execExprInterp.c:1870)
ExecEvalExprSwitchContext(ExprState * state, ExprContext * econtext, _Bool * isNull) (/build/src/include/executor/executor.h:355)
ExecProject(ProjectionInfo * projInfo) (/build/src/include/executor/executor.h:389)
ExecResult(PlanState * pstate) (/build/src/backend/executor/nodeResult.c:136)
ExecProcNodeFirst(PlanState * node) (/build/src/backend/executor/execProcnode.c:464)
ExecProcNode(PlanState * node) (/build/src/include/executor/executor.h:273)
ExecutePlan(EState * estate, PlanState * planstate, _Bool use_parallel_mode, CmdType operation, _Bool sendTuples, uint64 numberTuples, ScanDirection direction, DestReceiver * dest, _Bool execute_once) (/build/src/backend/executor/execMain.c:1678)
standard_ExecutorRun(QueryDesc * queryDesc, ScanDirection direction, uint64 count, _Bool execute_once) (/build/src/backend/executor/execMain.c:367)
ExecutorRun(QueryDesc * queryDesc, ScanDirection direction, uint64 count, _Bool execute_once) (/build/src/backend/executor/execMain.c:311)
PortalRunSelect(Portal portal, _Bool forward, long count, DestReceiver * dest) (/build/src/backend/tcop/pquery.c:924)
PortalRun(Portal portal, long count, _Bool isTopLevel, _Bool run_once, DestReceiver * dest, DestReceiver * altdest, QueryCompletion * qc) (/build/src/backend/tcop/pquery.c:768)
exec_simple_query(const char * query_string, int16 format) (/build/src/backend/tcop/postgres.c:1299)
PostgresMain(const char * dbname, const char * username) (/build/src/backend/tcop/postgres.c:4836)
BackendRun(Port * port) (/build/src/backend/postmaster/postmaster.c:4486)
BackendStartup(Port * port) (/build/src/backend/postmaster/postmaster.c:4206)
ServerLoop() (/build/src/backend/postmaster/postmaster.c:1794)
PostmasterMain(int argc, char ** argv) (/build/src/backend/postmaster/postmaster.c:1478)
main(int argc, char ** argv) (/build/src/backend/main/main.c:198)

@wuputah
Copy link
Member

wuputah commented Dec 18, 2024

@JerrySievert looks ok to me, wdyt?

@JerrySievert
Copy link
Contributor

I'll reproduce locally and give you an informed response soon :)

@ashenBlade
Copy link
Contributor

I also have noticed bug in current implementation. For PostgreSQL 16+ you have function MemoryContextContains - link

#if PG_VERSION_NUM >= PG_VERSION_16
/* Copied from postgres 15 source, since it was removed from 16. */
static bool
MemoryContextContains(MemoryContext context, void *pointer)
{
        MemoryContext ptr_context;

        /*
         * NB: Can't use GetMemoryChunkContext() here - that performs assertions
         * that aren't acceptable here since we might be passed memory not
         * allocated by any memory context.
         *
         * Try to detect bogus pointers handed to us, poorly though we can.
         * Presumably, a pointer that isn't MAXALIGNED isn't pointing at an
         * allocated chunk.
         */
        if (pointer == NULL || pointer != (void *) MAXALIGN(pointer))
                return false;

        /*
         * OK, it's probably safe to look at the context.
         */
        ptr_context = *(MemoryContext *) (((char *) pointer) - sizeof(void *));

        return ptr_context == context;
}
#endif

But this DOES NOT WORK in pg 16 anymore. Starting from this version, theese 8 bytes store bit fields: size + memory context method. This is NOT POINTER. So, this function will always return false.

This function called in FreeChunkBufferValueArray to check for pfree. But condition just translates into chunkData->valueBufferArray[columnIndex] != NULL.

/* FreeChunkValueArrayBuffer relase valueBufferArray memory. */
void FreeChunkBufferValueArray(ChunkData *chunkData)
{
	uint32 columnIndex = 0;

	if (chunkData == NULL)
	{
		return;
	}

	for (columnIndex = 0; columnIndex < chunkData->columnCount; columnIndex++)
	{
		if (chunkData->valueBufferArray[columnIndex] != NULL && !MemoryContextContains(ColumnarCacheMemoryContext(), chunkData->valueBufferArray[columnIndex]))
		{
			pfree(chunkData->valueBufferArray[columnIndex]->data);
			pfree(chunkData->valueBufferArray[columnIndex]);
		}
	}
}

@ivan-v-kush
Copy link
Contributor Author

ivan-v-kush commented Dec 19, 2024

Hm, in this case can be that the idea behind MemoryContextContains is it checks that cache doesn't contain valueBuffer and we can pree it in FreeChunkBufferValueArray?

@ashenBlade
Copy link
Contributor

@JerrySievert @wuputah I have create another PR that fixes this behaviour #274

@ivan-v-kush
Copy link
Contributor Author

ivan-v-kush commented Dec 19, 2024

Updated MR with new info about MemoryContextContains. Removed MemoryContextContains with check

I think this realization is simplier comparing to #274. Here we only NULLify. But in 274 have added a new struct, repalloc...

How do you think @JerrySievert @wuputah @ashenBlade ?

@ashenBlade
Copy link
Contributor

ashenBlade commented Dec 20, 2024

I think this realization is simplier comparing to #274. Here we only NULLify. But in 274 have added a new struct, repalloc...

If we nullify this, it will not be available for us anymore. We will have to read and decompress it again.

@ivan-v-kush
Copy link
Contributor Author

ivan-v-kush commented Dec 20, 2024

@ashenBlade valueBufferArray is only used in the 3 functions.
We assign a value to valueBufferArray in the DeserializeChunkData and nowhere assign valueBufferArray to something, nowhere read it in conditions (only to pfree in the FreeChunkBufferValueArray.
Could you please describe in more detail in what case we will have to read and decompress it again.

I think this realization is simplier comparing to #274. Here we only NULLify. But in 274 have added a new struct, repalloc...

If we nullify this, it will not be available for us anymore. We will have to read and decompress it again.

  1. CreateEmptyChunkData to create valueBufferArray
ChunkData *
CreateEmptyChunkData(uint32 columnCount, bool *columnMask, uint32 chunkGroupRowCount)
{
....
	chunkData->valueBufferArray = palloc0(columnCount * sizeof(StringInfo));
	for (columnIndex = 0; columnIndex < columnCount; columnIndex++)
	{
...
			chunkData->valueBufferArray[columnIndex] = NULL;
...
	}
...
}
  1. DeserializeChunkData sets value of valueBuffer to valueBufferArray
DeserializeChunkData(StripeBuffers *stripeBuffers, uint64 chunkIndex,
					 uint32 rowCount, TupleDesc tupleDescriptor,
					 List *projectedColumnList, StripeReadState *state, uint64 stripeId)
{
...
			/* store current chunk's data buffer to be freed at next chunk read */
			chunkData->valueBufferArray[columnIndex] = valueBuffer;
...
}
  1. FreeChunkBufferValueArray to free the buffer
void FreeChunkBufferValueArray(ChunkData *chunkData)
{
....
		if (chunkData->valueBufferArray[columnIndex] != NULL && !MemoryContextContains....)
		{
			pfree(chunkData->valueBufferArray[columnIndex]->data);
			pfree(chunkData->valueBufferArray[columnIndex]);
		}
...
}

image

@JerrySievert
Copy link
Contributor

I think a better solution is to make a copy of the data, not mess with trying to synchronize a lower and higher level caching mechanism, nor try to cover all of the possible edge-cases.

and add a test replicating the issue to the regression tests.

@ivan-v-kush
Copy link
Contributor Author

@JerrySievert Maybe simply remove ValueBufferArray? I didn't find it used anywhere
#273 (comment)

I'll think how implement a test. We need to determine that cache was used. The only idea came to mind is to add elog that writes "Columnar: clearing the cache by 10 percent has started". If it will be NOTICE level, it may be spam in daily workload. If it will have a DEBUG level, it will spam in test, all DEBUG messages from postgres will be written to test output..

@JerrySievert
Copy link
Contributor

just setting the cache level to fairly low and using the example you provided, but pared down, should trigger the issue.

there should be a logical existing test file to put it into, though

@JerrySievert
Copy link
Contributor

if caching is disabled, the ValueBufferArray is usually helpful

@ivan-v-kush ivan-v-kush force-pushed the cache/fix-pfree-valueBuffer branch from 2f982fb to 3a2505f Compare December 23, 2024 15:53
@ivan-v-kush ivan-v-kush force-pushed the cache/fix-pfree-valueBuffer branch from 3a2505f to 0b1c73c Compare December 23, 2024 15:54
@ivan-v-kush
Copy link
Contributor Author

ahh... found where it's read (columnar_write.c)

@JerrySievert I've added copying and a test. But from performance view, better to use approach without copying from #274

ashenBlade added a commit to ashenBlade/hydra that referenced this pull request Dec 24, 2024
@ivan-v-kush
Copy link
Contributor Author

@JerrySievert hello, what do you think?

@JerrySievert I've added copying and a test. But from performance view, better to use approach without copying from #274

@JerrySievert
Copy link
Contributor

I'm just here as an advisor, I have no merge powers :)

but, it would be up to @wuputah to determine whether the merge makes sense. this doesn't seem to deal with the multiple copy and cache level synchronization though, which is a much bigger task

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants