Do not crash when locations are null(!) #14

AmlingPalantir · 2015-10-26T23:19:46Z

We have observed null locations (specifically write, flush, and replay) in the field crashing manatee-adm pg-status.

davepacheco · 2015-10-27T00:18:00Z

Thanks for this. Under what conditions are the locations null though?

AmlingPalantir · 2015-10-27T01:31:06Z

Let's see, IIRC this was right after I had restarted the sitter after rebuilding a deposed primary. For some time the row for sync (from which the rebuilt deposed primary would soon be replicating) would show "0/0" under "SENT" and then crash. Inspection of internals showed WRITE, FLUSH, and REPLAY were all null. My guess is this was as the new box was starting up replication, or something. Sorry I don't really have much but I am able to reproduce this very easily (every failover) if you need to know more.

davepacheco · 2015-11-12T22:30:44Z

@AmlingPalantir Given that it's easy to reproduce, could you post the output that came back from Postgres? If it were me, I'd just run with --abort-on-uncaught-exception and look at the actual value in the core file when it crashes. If it's easier, maybe you could console.log() it instead?

If it's not clear, the context is that I don't like to make changes that just assume any arbitrary value can be null. I'd rather have a concrete understanding of exactly what it means when various fields are null, and make sure that's documented and dealt with everywhere it matters.

AmlingPalantir · 2015-11-17T20:47:58Z

Whoops, here's the row from pg_stat_replication that it is barfing on:

{ pid: 61060,
  usesysid: 10,
  usename: 'palantir',
  application_name: 'pg_basebackup',
  client_addr: '<ip>',
  client_hostname: null,
  client_port: 47944,
  backend_start: Tue Nov 17 2015 15:36:08 GMT-0500 (EST),
  backend_xmin: null,
  state: 'backup',
  sent_location: '0/0',
  write_location: null,
  flush_location: null,
  replay_location: null,
  sync_priority: 0,
  sync_state: 'async' }

Our fork uses pg_basebackup in place of ZFS snapshots and this looks to be
[more] fallout from that.

davepacheco · 2015-11-19T22:30:11Z

Given that context, I'm not sure it's sufficient to just set the locations to "-". Wouldn't it make more sense to label the replication kind as "pgbackup" or something to indicate that pg_basebackup is being used? Between that and talking with David yesterday, I wonder if it doesn't make more sense to fold this into the other changes you all have and get them all upstreamed together. Let me know what you think. If for whatever reason it would be helpful to get this change in sooner or separately, we should figure out how we actually want this tool to behave on this kind of replication row. I don't think presenting the locations as "-" is quite right.

Do not crash when locations are null(!)

73d82fa

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Do not crash when locations are null(!) #14

Do not crash when locations are null(!) #14

AmlingPalantir commented Oct 26, 2015

davepacheco commented Oct 27, 2015

AmlingPalantir commented Oct 27, 2015

davepacheco commented Nov 12, 2015

AmlingPalantir commented Nov 17, 2015

davepacheco commented Nov 19, 2015

Do not crash when locations are null(!) #14

Are you sure you want to change the base?

Do not crash when locations are null(!) #14

Conversation

AmlingPalantir commented Oct 26, 2015

davepacheco commented Oct 27, 2015

AmlingPalantir commented Oct 27, 2015

davepacheco commented Nov 12, 2015

AmlingPalantir commented Nov 17, 2015

davepacheco commented Nov 19, 2015