Code for the Location Heatmaps paper. #47
base: master
Conversation
Thanks for your pull request. It looks like this may be your first contribution to a Google open source project (if not, look below for help). Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA). 📝 Please visit https://cla.developers.google.com/ to sign. Once you've signed (or fixed any issues), please reply here with "@googlebot I signed it!" and we'll verify it.

What to do if you already signed the CLA: Individual signers / Corporate signers.

ℹ️ Googlers: Go here for more info.
@googlebot I signed it!
To experiment with the code there is a working [notebook](dp_location_heatmaps.ipynb)
with all the examples from the paper, please don't hesitate to contact the
[author](mailto:[email protected]) or raise an issue.1
extraneous '1' at end of line
fixed
  image: Any
  level_sample_size: int = 10000
  secagg_round_size: int = 10000
  threshold: float = 0
how about 'split_threshold' to contrast with 'collapse_threshold' below?
fixed
@@ -127,6 +123,9 @@ def sample_inverse_prob(self):
  def eps_local(self):
    return np.log(2 * self.num_clients / self.lam - 1)

  def get_noise_tensor(self, input_shape):
    return
IIUC, this is unused for RapporNoise and should not be called by users. Shall we raise a NotImplementedError instead of silently returning? Should this method be _get_noise_tensor instead of get_noise_tensor to discourage direct usage, pointing users toward apply_noise instead?
fixed
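For reference, a minimal sketch of the change being suggested here (RapporNoise and apply_noise are named in the diff; the apply_noise signature is elided, and the merged code may differ):

class RapporNoise:

  def apply_noise(self, *args, **kwargs):
    ...  # supported public entry point (real signature elided)

  def _get_noise_tensor(self, input_shape):
    # Unused for RAPPOR: fail loudly instead of silently returning None,
    # and lead with an underscore to discourage direct calls.
    raise NotImplementedError('Use apply_noise instead.')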
@@ -36,13 +37,32 @@ class Metrics:
    f1: f1 score on the discovered hot spots.
    mutual_info: mutual information metric.
  """
add definition/description of new metrics (mape, smape, maape, nmse)
could also note how the zeros are handled (replaced with the next smallest true value from the image)
updated, added clarification to get_metrics()
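For readers following along, a hedged sketch of how these metrics are commonly defined, including the zero handling described above; the exact formulations in get_metrics() may differ:

import numpy as np

def error_metrics(true_image, test_image):
  """Standard MAPE/SMAPE/MAAPE/NMSE formulations (a sketch, not repo code)."""
  true = true_image.astype(float).ravel()
  test = test_image.astype(float).ravel()
  # Replace zeros with the next smallest true value to avoid division by zero.
  true[true == 0] = true[true > 0].min()
  abs_err = np.abs(true - test)
  mape = np.mean(abs_err / true)                        # mean absolute percentage error
  smape = np.mean(2 * abs_err / (true + np.abs(test)))  # symmetric MAPE
  maape = np.mean(np.arctan2(abs_err, true))            # arctan-bounded variant of MAPE
  nmse = np.mean((true - test) ** 2) / np.mean(true ** 2)  # normalized MSE
  return mape, smape, maape, nmse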
-          f'MSE: {metric.mse:.2e}')
-  ax.imshow(test_image)
+          f'MSE: {metric.mse:.2e}', fontsize=30)
+  ax.imshow(test_image, interpolation='gaussian')
Why Gaussian interpolation for image display? The statistics calculated and displayed over the image would be different if calculated on the gaussian-interpolated image, wouldn't they?
this is just for visualization, it doesn't impact metrics. Mostly it improves rendering of lines in the contour grid image.
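A standalone toy example (not repository code) illustrating the point: matplotlib's interpolation affects only the rendered pixels, not the array the metrics are computed from.

import numpy as np
import matplotlib.pyplot as plt

test_image = np.random.rand(16, 16)
fig, (ax_raw, ax_smooth) = plt.subplots(1, 2)
ax_raw.imshow(test_image)                               # raw pixels
ax_smooth.imshow(test_image, interpolation='gaussian')  # smoothed rendering only
# Metrics are computed on `test_image` itself; the interpolated pixels exist
# only in the figure.
plt.show()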
    # print(f'Collapsed: {collapsed}, created when collapsing: {created},' + \
    #       f'new expanded: {fresh_expand},' + \
    #       f'unchanged: {unchanged}, total: {len(new_tree_prefix_list)}')
    if fresh_expand == 0:  # len(new_tree_prefix_list) <= len(tree_prefix_list):
remove or uncomment extraneous debugging code
done
      neg_image[x_bot:x_top + 1, y_bot:y_top + 1] = count
    else:
      raise ValueError(f'Not supported: {pos}')
  return current_image, pos_image, neg_image


def split_regions(tree_prefix_list,
this function is very long with a lot of nested conditions, which makes it hard to read. can we extract some reasonable helpers to improve readability by making the overall structure of the threshold checks and tree traversal more apparent?
done
@@ -276,3 +435,167 @@ def quantize_vector(vector, left_bound, right_bound):
  scale = (vector - left_bound) // distance
  vector -= distance * scale
  return vector


def makeGaussian(image, total_size, fwhm=3, center=None,
make_gaussian for consistent style
updated
  elif level == 98:
    z = 2.326
  else:
    raise ValueError(f'Incorrect confidence level {level}.')
It'd be nicer to just compute the z score analytically, rather than having a sparse lookup table. I think the following should do the trick:

from scipy.stats import norm
z = norm.ppf(1 - (1 - level / 100) / 2)
updated
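As a quick sanity check, the suggested analytic formula reproduces the hard-coded table entries (1.960 for 95%, 2.326 for 98%):

from scipy.stats import norm

def z_score(level):
  """Two-sided z score for a confidence level given in percent."""
  return norm.ppf(1 - (1 - level / 100) / 2)

assert abs(z_score(95) - 1.960) < 1e-3
assert abs(z_score(98) - 2.326) < 1e-3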
def make_step(samples, eps, threshold, partial,
              prefix_len, dropout_rate, tree, tree_prefix_list,
              noiser, quantize, total_size, positivity, count_min):
please add docstring
added
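One plausible shape for the requested docstring; the parameter descriptions below are guesses from the names and from similar docstrings elsewhere in this diff, not the text that was actually merged:

def make_step(samples, eps, threshold, partial,
              prefix_len, dropout_rate, tree, tree_prefix_list,
              noiser, quantize, total_size, positivity, count_min):
  """Runs a single level of the adaptive tree-building algorithm.

  Args:
    samples: client contributions aggregated in this round.
    eps: privacy budget spent on this step.
    threshold: threshold used to decide whether to split a node.
    partial: batch size used for partial aggregation.
    prefix_len: length of the prefixes at the current tree level.
    dropout_rate: fraction of clients assumed to drop out.
    tree: current prefix tree.
    tree_prefix_list: list of prefixes in the current tree.
    noiser: noise object used to privatize the aggregate.
    quantize: apply quantization to the vectors.
    total_size: size of the image.
    positivity: whether separate pos/neg values are tracked.
    count_min: use count-min sketch.
  """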
@samellem please take a look, addressed your comments
Looking pretty good! Just a few additional ideas.

Also, it looks like you missed a few comments from the previous review that got auto-collapsed in the GitHub UI. Most of those were small nits, and some may not be as relevant after your changes, but please do take a look at them if you missed them the first time.
  Returns:
-    new_tree, new_tree_prefix_list, finished
+    new_tree, new_tree_prefix_list, fresh_expand
The meaning of fresh_expand is not obvious, especially since this function both collapses and expands. Maybe num_newly_expanded_nodes?
great, thanks a lot!
    image_bit_level: stopping criteria once the final resolution is reached.
    collapse_threshold: threshold value used to collapse the nodes.
    expand_all: expand all regions,
can we achieve this functionality by just passing split_threshold = -np.inf and eliminate the extra parameter & special-casing?
yeah, great idea, fixed
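A tiny sketch of why the sentinel works (should_split here is a hypothetical stand-in for the real check): with a -inf threshold every node passes, so no expand_all special case is needed.

import numpy as np

def should_split(node_count, split_threshold):
  return node_count > split_threshold

split_threshold = -np.inf  # "expand all" behaviour, no extra flag
assert should_split(0, split_threshold)
assert should_split(-5, split_threshold)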
@@ -260,12 +260,30 @@ def rebuild_from_vector(vector, tree, image_size, contour=False, threshold=0,
  return current_image, pos_image, neg_image


def update_tree(prefix, tree, tree_prefix_list):
nit: maybe append_to_tree instead of update_tree?
added
  Returns:
-    new_tree, new_tree_prefix_list, finished
+    new_tree, new_tree_prefix_list, fresh_expand
  """
  collapsed = 0
  created = 0
  fresh_expand = 0
  unchanged = 0
collapsed, created, and unchanged do not appear to be used for anything anymore. let's delete them xor do something with them.
done
printing the results at the end of the function now
  collapsed = 0
  created = 0
  fresh_expand = 0
  unchanged = 0
likewise re. collapsed, created, and unchanged being unused
done
  return new_tree, new_tree_prefix_list, fresh_expand


def split_regions_aux(tree_prefix_list,
I suspect that even more of these two functions could be shared (in particular, the basic structure of looping over prefixes and adding nodes to the tree as appropriate for the splitting & collapsing criteria), but acknowledge that it may not actually improve readability much more to do further surgery. Please consider sharing that prefix-looping structure, but if you can't see a clean and easy way to do so, that's fine.
yeah, I agree, it's just that I need to look at both bits in the data, which is hard to unify. Maybe once we go to multiple dimensions we can just unify everything.
@@ -499,18 +500,11 @@ def convert_to_dataset(image, total_size, value=None):


def compute_conf_intervals(sum_vector: np.ndarray, level=95):
  from scipy.stats import norm
I'd prefer not to do imports inline like this.
done
@@ -527,9 +521,46 @@ def compute_conf_intervals(sum_vector: np.ndarray, level=95):
  return conf_intervals, conf_interval_weighted


-def make_step(samples, eps, threshold, partial,
+def create_confidence_interval_condition(last_result, prefix, count, split_threshold):
we're ultimately just returning a boolean here, so maybe evaluate_confidence_interval_condition instead? the current name makes me think that we're returning some kind of predicate function
updated and added a docstring
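To illustrate the naming distinction (hypothetical placeholder bodies, not the real logic):

# "evaluate_..." reads as a function that directly returns the boolean answer:
def evaluate_confidence_interval_condition(last_result, prefix, count,
                                           split_threshold):
  return count > split_threshold  # placeholder check

# "create_..." suggests a factory that returns a predicate to call later:
def create_confidence_interval_condition(split_threshold):
  def condition(last_result, prefix, count):
    return count > split_threshold  # placeholder check
  return condition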
@@ -122,35 +132,70 @@ def run_experiment(true_image,
    quantize: apply quantization to the vectors.
    noise_class: use specific noise, defaults to GeometricNoise.
    save_gif: saves all images as a gif.
    count_min: use count-min sketch.
resolved
yeah, thanks a lot and sorry for the missed comments. Should be all good now.
approved with some minor suggestions
@@ -65,10 +85,15 @@ def coordinates_to_binary_path(xy_tuple, depth=10):
  Returns:
    binary version of the coordinate.
  """
-  x_coord, y_coord = xy_tuple
+  if len(xy_tuple) == 2:
I think this comment applies to aux_data now that that is being pulled from xy_tuple.
"""Returns a quad tree first 4 nodes. If aux_data (boolean) provided expands | ||
to 2 more bits or a specific pos/neg nodes. | ||
Args: | ||
aux_data: a boolean to use additional bit for data, e.g. pos/neg. |
IMO aux_data sounds like an actual data object rather than a boolean parameter; I would prefer a name like "has_aux_data" or even "has_aux_bit" since a single bit is all that's supported here. This also goes for other usages of "aux_data" as a boolean in other functions, below.
Really it would be ideal to just generalize this to support an arbitrary number of extra bits with an automatic encoding from the value specified in "split", rather than a single extra bit with a predefined 'pos'-->1 and 'neg'-->0 encoding, but I understand that is probably out of scope at present.
I agree, let me change it to has_aux_bit for now, and maybe can expand it later
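A hedged sketch of the generalization being discussed (encode_split_value and its arguments are hypothetical names): encode an arbitrary categorical split value into extra bits instead of hard-coding pos -> 1 / neg -> 0.

import math

def encode_split_value(value, split_values):
  """Returns the extra bits for `value` given the list of possible values."""
  num_bits = max(1, math.ceil(math.log2(len(split_values))))
  return format(split_values.index(value), f'0{num_bits}b')

# The current single-bit pos/neg behaviour is the two-value special case:
assert encode_split_value('pos', ['neg', 'pos']) == '1'
assert encode_split_value('neg', ['neg', 'pos']) == '0'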
@@ -227,6 +225,7 @@ def run_experiment(true_image,
    noiser = noise_class(dp_round_size, sensitivity, eps)
    if ignore_start_eps and start_with_level <= i:
      print_output('Ignoring eps spent', flag=output_flag)
Nit: this is a frightening message; it would be nice to have a bit of extra context here (e.g., "Ignoring epsilon spent expanding first {start_with_level} levels, including current level {i}.").
done
This code demonstrates the ability to build location heatmaps using a
distributed differential privacy mechanism and the proposed adaptive
algorithm. The code accompanies this paper: https://arxiv.org/abs/2111.02356.
It also includes a Google Colab example for the experiments.