How is ergm using Change Statistics to calculate summary statistics? #527

akumar01 · 2023-05-10T01:35:45Z

akumar01
May 10, 2023

I have a question regarding how the ergm package counts the summary statistics associated with model terms (for example, when calling summary(nw ~ ctriple) to get the number of directed 3-cycles).

I can appreciate that the Metropolis-Hastings steps and subsequent MCMLE optimization steps only require keeping track of how the count of the model terms changes upon edge swaps/toggles, but this whole process is presumably predicated on knowing the initial real count in the observed network.

After digging through the codebase a little, I found the file allstatistics.c, which seems to use all possible toggles to somehow get subgraph counts, though it does not seem to be explicitly used during normal model fitting initialization.

Given that subgraph enumeration is a non-trivial problem with lots of literature (e.g. https://link.springer.com/chapter/10.1007/978-3-540-71681-5_7), I'm a little surprised there is no documentation on how it is done within ergm. Could anyone shed some light on this? I am particularly interested in what is required in order to add new model terms.

Thanks!

Answered by CarterButts

CarterButts · 2023-05-10T02:02:48Z

CarterButts
May 10, 2023
Maintainer

Although ergm 4.0 has added some alternatives, the standard way that statistics are calculated is simple: one begins with the value for the empty graph (which is usually, but not always, 0), and then performs one additive edge toggle per observed edge. The accumulated changescores, added to the empty-graph value, yield the statistic for the final (observed) graph state.

Since most graphs for which ergm is used are sparse, one generally needs only order N toggles for this purpose. While there are faster ways to compute some graph statistics (some are implemented e.g. in the sna package, which is designed to compute descriptives rather than to perform changescore-based calculations), in practice this is usually quite fast. (And if it isn't, you're probably not going to be fitting an ergm with that term anyway, because you are going to be doing a *lot* more toggles than that during the estimation process.)

Another advantage of this scheme is that it greatly simplifies implementation: one only needs to define a changescore function, and to know the statistic for the empty graph (which is passed by the InitErgmTerm function). There is no need for a function to know how to do anything other than handle edge toggles, and one can use it for both summary() and MCMC calls.

As noted at the outset, ergm 4.0 has added the ability to have terms that do implement separate summary() behavior, for cases where this is helpful; it can thus be done, but is optional. It is also possible to have stateful behavior for changescores (which was not supported before); that should certainly speed up some kinds of changescores, but it's pretty new functionality and not widely used yet.

Hope that helps,

-Carter
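To make the scheme described above concrete, here is a minimal sketch in Python of computing a summary statistic by accumulating changescores from the empty graph. It is an illustration only, not ergm's actual C implementation: the names `ctriple_change` and `summary_stat` are hypothetical, the graph is assumed to be directed with no self-loops, and the statistic is a simple count of directed 3-cycles in the spirit of the `ctriple` term.

```python
from typing import Callable, Iterable, Set, Tuple

Edge = Tuple[int, int]  # directed edge (tail, head); self-loops are assumed absent


def ctriple_change(edges: Set[Edge], i: int, j: int) -> int:
    """Hypothetical changescore: change in the number of directed 3-cycles
    caused by toggling the edge i -> j in the current edge set.

    Toggling i -> j creates (or destroys) one cycle i -> j -> k -> i for
    every node k with j -> k and k -> i currently present.
    """
    closing = sum(1 for (a, k) in edges if a == j and (k, i) in edges)
    # Adding the edge gains those cycles; removing it loses them.
    return -closing if (i, j) in edges else closing


def summary_stat(
    observed: Iterable[Edge],
    change: Callable[[Set[Edge], int, int], int],
    empty_value: int = 0,
) -> int:
    """Start from the statistic of the empty graph and add one edge at a
    time, accumulating the changescore of each additive toggle."""
    state: Set[Edge] = set()
    total = empty_value
    for (i, j) in observed:
        total += change(state, i, j)  # score the toggle against the current state
        state.add((i, j))             # then apply the toggle
    return total


# One directed 3-cycle: 1 -> 2 -> 3 -> 1
print(summary_stat([(1, 2), (2, 3), (3, 1)], ctriple_change))  # 1
```

The only term-specific piece is the changescore function plus the empty-graph value; the same `ctriple_change` could be called on a proposed toggle in a sampler, which is the reuse between summary() and MCMC that the answer points out. The number of changescore calls equals the number of observed edges.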


akumar01 · 2023-05-10T03:04:34Z

akumar01
May 10, 2023
Author

Thanks for the detailed response!


drh20drh20 · 2023-05-10T11:22:20Z

drh20drh20
May 10, 2023
Maintainer

To what Carter said, I'd add that if you're interested in allstatistics, there is documentation for ergm.allstats, but it sounds like this isn't quite what you're interested in.

Best,
Dave
