Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

dm_filter() on remote PostgreSQL db errors #2228

Open
LDalby opened this issue Sep 6, 2024 · 9 comments · May be fixed by #2275
Open

dm_filter() on remote PostgreSQL db errors #2228

LDalby opened this issue Sep 6, 2024 · 9 comments · May be fixed by #2275

Comments

@LDalby
Copy link

LDalby commented Sep 6, 2024

We have a PostgreSQL db and I would like to create local DuckDB copy of a subset of the rows on it.

My plan was to

  1. create the datamodel using dm_from_con
  2. filter the rows using dm_filter
  3. copy to local database using copy_dm_to

I currently get this error when attempting to filter:

library(dm)
#> 
#> Attaching package: 'dm'
#> The following object is masked from 'package:stats':
#> 
#>     filter

con <- camalienr::ca_connect() # internal pkg - only used here to connect to the db
remote_pg <- dm_from_con(con, schema = "data", learn_keys = TRUE)
dm_nrow(remote_pg)
#>                 chunk             detection      detectionsummary 
#>                 33781              83777960               9372360 
#>                 image             imagemeta                   job 
#>               5557936               2890422                   124 
#>               partner          plantnetcall plantnetcall_metadata 
#>                    11                159568                157834 
#> plantnetrequestparams               species     species_whitelist 
#>                     2                 49479                  2419

# In the detectionssummary table we have a column named speciesid
# integer ID on different species.
dm_filter(.dm = remote_pg,
          detectionsummary = (speciesid == 1356471))
#> Warning in igraph::graph_from_data_frame(., directed = directed, vertices =
#> def$table): In `d' `NA' elements were replaced with string "NA"
#> Error in `map()`:
#> ℹ In index: 1.
#> Caused by error in `igraph::graph_from_data_frame()`:
#> ! Some vertex names in edge list are not listed in vertex data frame

Created on 2024-09-06 with reprex v2.1.1

Session info
sessioninfo::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value
#>  version  R version 4.4.1 (2024-06-14)
#>  os       macOS Sonoma 14.6.1
#>  system   aarch64, darwin20
#>  ui       X11
#>  language (EN)
#>  collate  en_US.UTF-8
#>  ctype    en_US.UTF-8
#>  tz       Europe/Copenhagen
#>  date     2024-09-06
#>  pandoc   3.1.11 @ /Applications/RStudio.app/Contents/Resources/app/quarto/bin/tools/aarch64/ (via rmarkdown)
#> 
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  package     * version     date (UTC) lib source
#>  backports     1.5.0       2024-05-23 [1] CRAN (R 4.4.0)
#>  bit           4.0.5       2022-11-15 [1] CRAN (R 4.4.0)
#>  bit64         4.0.5       2020-08-30 [1] CRAN (R 4.4.0)
#>  blob          1.2.4       2023-03-17 [1] CRAN (R 4.4.0)
#>  cachem        1.1.0       2024-05-16 [1] CRAN (R 4.4.0)
#>  camalienr     0.3.7       2024-07-03 [1] git (https://gitlab.au.dk/ECOS/biodiversa/camalien/camalienr.git@67fbee4)
#>  cli           3.6.3       2024-06-21 [1] CRAN (R 4.4.0)
#>  DBI           1.2.3       2024-06-02 [1] CRAN (R 4.4.0)
#>  dbplyr        2.5.0       2024-03-19 [1] CRAN (R 4.4.0)
#>  digest        0.6.37      2024-08-19 [1] CRAN (R 4.4.1)
#>  dm          * 1.0.10.9010 2024-08-30 [1] https://cynkra.r-universe.dev (R 4.4.1)
#>  dplyr         1.1.4       2023-11-17 [1] CRAN (R 4.4.0)
#>  evaluate      0.24.0      2024-06-10 [1] CRAN (R 4.4.0)
#>  fansi         1.0.6       2023-12-08 [1] CRAN (R 4.4.0)
#>  fastmap       1.2.0       2024-05-15 [1] CRAN (R 4.4.0)
#>  fs            1.6.4       2024-04-25 [1] CRAN (R 4.4.0)
#>  generics      0.1.3       2022-07-05 [1] CRAN (R 4.4.0)
#>  glue          1.7.0       2024-01-09 [1] CRAN (R 4.4.0)
#>  hms           1.1.3       2023-03-21 [1] CRAN (R 4.4.0)
#>  htmltools     0.5.8.1     2024-04-04 [1] CRAN (R 4.4.0)
#>  httpuv        1.6.15      2024-03-26 [1] CRAN (R 4.4.0)
#>  igraph        2.0.3       2024-03-13 [1] CRAN (R 4.4.0)
#>  knitr         1.48        2024-07-07 [1] CRAN (R 4.4.0)
#>  later         1.3.2       2023-12-06 [1] CRAN (R 4.4.0)
#>  lifecycle     1.0.4       2023-11-07 [1] CRAN (R 4.4.0)
#>  lubridate     1.9.3       2023-09-27 [1] CRAN (R 4.4.0)
#>  magrittr      2.0.3       2022-03-30 [1] CRAN (R 4.4.0)
#>  memoise       2.0.1       2021-11-26 [1] CRAN (R 4.4.0)
#>  mime          0.12        2021-09-28 [1] CRAN (R 4.4.0)
#>  pillar        1.9.0       2023-03-22 [1] CRAN (R 4.4.0)
#>  pkgconfig     2.0.3       2019-09-22 [1] CRAN (R 4.4.0)
#>  promises      1.3.0       2024-04-05 [1] CRAN (R 4.4.0)
#>  purrr         1.0.2       2023-08-10 [1] CRAN (R 4.4.0)
#>  R6            2.5.1       2021-08-19 [1] CRAN (R 4.4.0)
#>  Rcpp          1.0.13      2024-07-17 [1] CRAN (R 4.4.0)
#>  reprex        2.1.1       2024-07-06 [1] CRAN (R 4.4.0)
#>  rlang         1.1.4       2024-06-04 [1] CRAN (R 4.4.0)
#>  rmarkdown     2.28        2024-08-17 [1] CRAN (R 4.4.0)
#>  RPostgres     1.4.7       2024-05-27 [1] CRAN (R 4.4.0)
#>  rstudioapi    0.16.0      2024-03-24 [1] CRAN (R 4.4.0)
#>  sessioninfo   1.2.2       2021-12-06 [1] CRAN (R 4.4.0)
#>  shiny         1.9.1       2024-08-01 [1] CRAN (R 4.4.0)
#>  tibble        3.2.1       2023-03-20 [1] CRAN (R 4.4.0)
#>  tidyr         1.3.1       2024-01-24 [1] CRAN (R 4.4.0)
#>  tidyselect    1.2.1       2024-03-11 [1] CRAN (R 4.4.0)
#>  timechange    0.3.0       2024-01-18 [1] CRAN (R 4.4.0)
#>  utf8          1.2.4       2023-10-22 [1] CRAN (R 4.4.0)
#>  vctrs         0.6.5       2023-12-01 [1] CRAN (R 4.4.0)
#>  withr         3.0.1       2024-07-31 [1] CRAN (R 4.4.0)
#>  xfun          0.47        2024-08-17 [1] CRAN (R 4.4.0)
#>  xtable        1.8-4       2019-04-21 [1] CRAN (R 4.4.0)
#>  yaml          2.3.10      2024-07-26 [1] CRAN (R 4.4.0)
#> 
#>  [1] /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/library
#> 
#> ──────────────────────────────────────────────────────────────────────────────

I'm a bit confused about the error - why is {iGraph} called here?

@krlmlr
Copy link
Collaborator

krlmlr commented Nov 19, 2024

Thanks. Can you please share the output of constructive::construct(remote_pg) ?

@LDalby
Copy link
Author

LDalby commented Nov 20, 2024

Sure!

The output was too long to post as a comment, so I've attached it as txt.
dm-pg-debug.txt

@krlmlr
Copy link
Collaborator

krlmlr commented Nov 20, 2024

Thanks, this is helpful, but constructive's own logic intervened here. Can you please share constructive::construct(unclass(remote_pg)) -- just the output (which is executable R code already), not the whole reprex?

@moodymudskipper
Copy link
Collaborator

moodymudskipper commented Nov 20, 2024

On one hand it seems like constructive fails at constructing the correct fks because we have calls such as dm::dm_add_fk(NA, NA_character_, chunk, "id") that fail.
On the other hand the way connection objects are constructed now uses external pointers, this would cause the constructive message by itself (see also cynkra/constructive#150).

I think the best reprex here would be the output of :

con <- camalienr::ca_connect() # internal pkg - only used here to connect to the db
remote_pg <- dm_from_con(con, schema = "data", learn_keys = TRUE)
# construct_base will use only base functions so will bypass the shortcomings of `.cstr_construct.dm()`
# the data arg will show the connection object as `con` rather than trying to construct it
constructive::construct_base(remote_pg, data = list(con = con))

FYI Here's a simplification of the original code that fits here:

typnames1 <- data.frame(
  oid = c(
    16L, 17L, 18L, 19L, 20L, 21L, 22L, 23L, 24L, 25L, 26L, 27L, 28L, 29L, 30L,
    71L, 75L, 81L, 83L, 114L, 142L, 194L, 3361L, 3402L, 5017L, 32L, 5069L, 600L,
    601L, 602L, 603L, 604L, 628L, 700L, 701L, 705L, 718L, 790L, 829L, 869L, 650L,
    774L, 1033L, 1042L, 1043L, 1082L, 1083L, 1114L, 1184L, 1186L, 1266L, 1560L,
    1562L, 1700L, 1790L, 2202L, 2203L, 2204L, 2205L, 4191L, 2206L, 4096L, 4089L,
    2950L, 3220L, 3614L, 3642L, 3615L, 3734L, 3769L, 3802L, 4072L, 2970L, 5038L,
    3904L, 3906L, 3908L, 3910L, 3912L, 3926L, 4451L, 4532L, 4533L, 4534L, 4535L,
    4536L, 2249L, 2287L, 2275L, 2276L, 2277L, 2278L, 2279L, 3838L, 2280L, 2281L,
    2283L, 2776L, 3500L, 3115L, 325L, 3310L, 269L, 3831L, 5077L, 5078L, 5079L,
    5080L, 4537L, 4538L, 4600L, 4601L, 1000L, 1001L, 1002L, 1003L, 1016L, 1005L,
    1006L, 1007L, 1008L, 1009L, 1028L, 1010L, 1011L, 1012L, 1013L, 210L, 270L,
    272L, 273L, 199L, 143L, 271L, 1017L, 1018L, 1019L, 1020L, 1027L, 629L, 1021L,
    1022L, 719L, 791L, 1040L, 1041L, 651L, 775L, 1034L, 1014L, 1015L, 1182L,
    1183L, 1115L, 1185L, 1187L, 1270L, 1561L, 1563L, 1231L, 2201L, 2207L, 2208L,
    2209L, 2210L, 4192L, 2211L, 4097L, 4090L, 2951L, 3221L, 3643L, 3644L, 3645L,
    3735L, 3770L, 3807L, 4073L, 2949L, 5039L, 3905L, 3907L, 3909L, 3911L, 3913L,
    3927L, 6150L, 6151L, 6152L, 6153L, 6155L, 6157L, 1263L, 12001L, 12000L,
    12003L, 12002L, 12005L, 12004L, 12007L, 12006L, 12009L, 12008L, 12011L,
    12010L, 12013L, 12012L, 12015L, 12014L, 12017L, 12016L, 12019L, 12018L,
    12021L, 12020L, 12023L, 12022L, 12025L, 12024L, 12027L, 12026L, 12029L,
    12028L, 12031L, 12030L, 12033L, 12032L, 12035L, 12034L, 12037L, 12036L,
    12039L, 12038L, 12041L, 12040L, 12043L, 12042L, 12045L, 12044L, 12047L,
    12046L, 12049L, 12048L, 12051L, 12050L, 1248L, 12052L, 12054L, 12053L, 12056L,
    12055L, 2842L, 12057L, 2843L, 12058L, 12060L, 12059L, 12062L, 12061L, 12064L,
    12063L, 12066L, 12065L, 12068L, 12067L, 12070L, 12069L, 12072L, 12071L,
    12074L, 12073L, 12076L, 12075L, 12078L, 12077L, 12080L, 12079L, 12082L,
    12081L, 12084L, 12083L, 12086L, 12085L, 12088L, 12087L, 12090L, 12089L,
    12092L, 12091L, 4066L, 12093L, 12095L, 12094L, 12097L, 12096L, 12099L, 12098L,
    12101L, 12100L, 12103L, 12102L, 12105L, 12104L, 12107L, 12106L, 6101L, 12108L,
    12110L, 12109L, 12219L, 12218L, 12224L, 12223L, 12229L, 12228L, 12233L,
    12232L, 12237L, 12236L, 12242L, 12241L, 12247L, 12246L, 12252L, 12251L,
    12257L, 12256L, 12262L, 12261L, 12267L, 12266L, 12272L, 12271L, 12277L,
    12276L, 12282L, 12281L, 12287L, 12286L, 12292L, 12291L, 12296L, 12295L,
    12300L, 12299L, 12304L, 12303L, 12309L, 12308L, 12314L, 12313L, 12318L,
    12317L, 12323L, 12322L, 12329L, 12328L, 12333L, 12332L, 12337L, 12336L,
    12341L, 12340L, 12345L, 12344L, 12349L, 12348L, 12353L, 12352L, 12357L,
    12356L, 12362L, 12361L, 12367L, 12366L, 12372L, 12371L, 12376L, 12375L,
    12381L, 12380L, 12385L, 12384L, 12390L, 12389L, 12394L, 12393L, 12398L,
    12397L, 12403L, 12402L, 12407L, 12406L, 12411L, 12410L, 12416L, 12415L,
    12420L, 12419L, 12424L, 12423L, 12429L, 12428L, 12433L, 12432L, 12437L,
    12436L, 12442L, 12441L, 12447L, 12446L, 12451L, 12450L, 12455L, 12454L,
    12460L, 12459L, 12464L, 12463L, 12468L, 12467L, 12473L, 12472L, 12477L,
    12476L, 12482L, 12481L, 12486L, 12485L, 12491L, 12490L, 12496L, 12495L,
    12500L, 12499L, 12504L, 12503L, 12508L, 12507L, 12513L, 12512L, 12518L,
    12517L, 12523L, 12522L, 12528L, 12527L, 12533L, 12532L, 12538L, 12537L,
    12543L, 12542L, 14365L, 14364L, 14368L, 14367L, 14370L, 14369L, 14373L,
    14372L, 14376L, 14375L, 14378L, 14377L, 14382L, 14381L, 14387L, 14386L,
    14391L, 14390L, 14396L, 14395L, 14401L, 14400L, 14406L, 14405L, 14411L,
    14410L, 14416L, 14415L, 14421L, 14420L, 14426L, 14425L, 14431L, 14430L,
    14436L, 14435L, 14441L, 14440L, 14446L, 14445L, 14451L, 14450L, 14456L,
    14455L, 14461L, 14460L, 14466L, 14465L, 14471L, 14470L, 14475L, 14474L,
    14480L, 14479L, 14485L, 14484L, 14490L, 14489L, 14494L, 14493L, 14499L,
    14498L, 14504L, 14503L, 14508L, 14507L, 14513L, 14512L, 14518L, 14517L,
    14523L, 14522L, 14528L, 14527L, 14532L, 14531L, 14537L, 14536L, 14542L,
    14541L, 14547L, 14546L, 14552L, 14551L, 14557L, 14556L, 14562L, 14561L,
    14567L, 14566L, 14571L, 14570L, 14576L, 14575L, 14581L, 14580L, 14586L,
    14585L, 14591L, 14590L, 14596L, 14595L, 14600L, 14599L, 14605L, 14604L,
    14609L, 14608L, 14614L, 14613L, 14619L, 14618L, 14624L, 14623L, 14629L,
    14628L, 14634L, 14633L, 14639L, 14638L, 14644L, 14643L, 14649L, 14648L,
    14653L, 14652L, 14657L, 14656L, 14661L, 14660L, 14665L, 14664L, 14670L,
    14669L, 14674L, 14673L, 14678L, 14677L, 14683L, 14682L, 14687L, 14686L,
    14691L, 14690L, 14696L, 14695L, 14701L, 14700L, 24592L, 24595L, 24596L,
    24604L, 24623L, 24626L, 24627L, 24630L, 24631L, 24634L, 24635L, 24638L,
    24893L, 24892L, 24899L, 24898L, 25006L, 25005L, 25259L, 25265L, 25283L,
    25282L, 25418L, 25417L, 25605L, 25608L, 25615L, 25614L, 25618L, 25617L,
    25645L, 25644L, 25658L, 25657L, 25682L, 25681L, 25747L, 25746L, 25999L,
    25998L, 26054L, 26053L, 26133L, 26132L, 26142L, 26141L, 774169L, 774168L,
    774181L, 774180L, 774194L, 774193L, 774208L, 774207L, 774215L, 774214L,
    774253L, 864631L, 864650L, 864649L, 865108L, 864601L, 774241L, 774240L,
    774276L, 774275L, 774261L, 774260L, 774254L, 864600L, 864632L, 864676L,
    864675L, 864696L, 864695L, 864663L, 864662L, 865109L, 869729L, 869728L,
    869736L, 869735L, 869748L, 869747L, 871659L, 871658L, 871664L, 871663L,
    1004015L, 1004014L, 1004039L, 1004038L, 1004358L, 1004357L, 1004364L,
    1004363L, 1004380L, 1004379L, 1004450L, 1004449L, 1004458L, 1004457L,
    1048798L, 1048797L, 1048873L, 1048872L, 1048884L, 1048883L, 1049004L,
    1049003L, 1049023L, 1049022L, 1049028L, 1049027L, 1049039L, 1049038L,
    1049057L, 1049056L, 1049063L, 1049062L, 1049087L, 1049086L, 1049133L,
    1049132L, 1049163L, 1049162L, 1049685L, 1049684L, 1049943L, 1049942L,
    1050665L, 1050664L
  ),
  typname = c(
    "bool", "bytea", "char", "name", "int8", "int2", "int2vector", "int4",
    "regproc", "text", "oid", "tid", "xid", "cid", "oidvector", "pg_type",
    "pg_attribute", "pg_proc", "pg_class", "json", "xml", "pg_node_tree",
    "pg_ndistinct", "pg_dependencies", "pg_mcv_list", "pg_ddl_command", "xid8",
    "point", "lseg", "path", "box", "polygon", "line", "float4", "float8",
    "unknown", "circle", "money", "macaddr", "inet", "cidr", "macaddr8",
    "aclitem", "bpchar", "varchar", "date", "time", "timestamp", "timestamptz",
    "interval", "timetz", "bit", "varbit", "numeric", "refcursor", "regprocedure",
    "regoper", "regoperator", "regclass", "regcollation", "regtype", "regrole",
    "regnamespace", "uuid", "pg_lsn", "tsvector", "gtsvector", "tsquery",
    "regconfig", "regdictionary", "jsonb", "jsonpath", "txid_snapshot",
    "pg_snapshot", "int4range", "numrange", "tsrange", "tstzrange", "daterange",
    "int8range", "int4multirange", "nummultirange", "tsmultirange",
    "tstzmultirange", "datemultirange", "int8multirange", "record", "_record",
    "cstring", "any", "anyarray", "void", "trigger", "event_trigger",
    "language_handler", "internal", "anyelement", "anynonarray", "anyenum",
    "fdw_handler", "index_am_handler", "tsm_handler", "table_am_handler",
    "anyrange", "anycompatible", "anycompatiblearray", "anycompatiblenonarray",
    "anycompatiblerange", "anymultirange", "anycompatiblemultirange",
    "pg_brin_bloom_summary", "pg_brin_minmax_multi_summary", "_bool", "_bytea",
    "_char", "_name", "_int8", "_int2", "_int2vector", "_int4", "_regproc",
    "_text", "_oid", "_tid", "_xid", "_cid", "_oidvector", "_pg_type",
    "_pg_attribute", "_pg_proc", "_pg_class", "_json", "_xml", "_xid8", "_point",
    "_lseg", "_path", "_box", "_polygon", "_line", "_float4", "_float8",
    "_circle", "_money", "_macaddr", "_inet", "_cidr", "_macaddr8", "_aclitem",
    "_bpchar", "_varchar", "_date", "_time", "_timestamp", "_timestamptz",
    "_interval", "_timetz", "_bit", "_varbit", "_numeric", "_refcursor",
    "_regprocedure", "_regoper", "_regoperator", "_regclass", "_regcollation",
    "_regtype", "_regrole", "_regnamespace", "_uuid", "_pg_lsn", "_tsvector",
    "_gtsvector", "_tsquery", "_regconfig", "_regdictionary", "_jsonb",
    "_jsonpath", "_txid_snapshot", "_pg_snapshot", "_int4range", "_numrange",
    "_tsrange", "_tstzrange", "_daterange", "_int8range", "_int4multirange",
    "_nummultirange", "_tsmultirange", "_tstzmultirange", "_datemultirange",
    "_int8multirange", "_cstring", "pg_attrdef", "_pg_attrdef", "pg_constraint",
    "_pg_constraint", "pg_inherits", "_pg_inherits", "pg_index", "_pg_index",
    "pg_operator", "_pg_operator", "pg_opfamily", "_pg_opfamily", "pg_opclass",
    "_pg_opclass", "pg_am", "_pg_am", "pg_amop", "_pg_amop", "pg_amproc",
    "_pg_amproc", "pg_language", "_pg_language", "pg_largeobject_metadata",
    "_pg_largeobject_metadata", "pg_largeobject", "_pg_largeobject",
    "pg_aggregate", "_pg_aggregate", "pg_statistic", "_pg_statistic",
    "pg_statistic_ext", "_pg_statistic_ext", "pg_statistic_ext_data",
    "_pg_statistic_ext_data", "pg_rewrite", "_pg_rewrite", "pg_trigger",
    "_pg_trigger", "pg_event_trigger", "_pg_event_trigger", "pg_description",
    "_pg_description", "pg_cast", "_pg_cast", "pg_enum", "_pg_enum",
    "pg_namespace", "_pg_namespace", "pg_conversion", "_pg_conversion",
    "pg_depend", "_pg_depend", "pg_database", "_pg_database",
    "pg_db_role_setting", "_pg_db_role_setting", "pg_tablespace",
    "_pg_tablespace", "pg_authid", "_pg_authid", "pg_auth_members",
    "_pg_auth_members", "pg_shdepend", "_pg_shdepend", "pg_shdescription",
    "_pg_shdescription", "pg_ts_config", "_pg_ts_config", "pg_ts_config_map",
    "_pg_ts_config_map", "pg_ts_dict", "_pg_ts_dict", "pg_ts_parser",
    "_pg_ts_parser", "pg_ts_template", "_pg_ts_template", "pg_extension",
    "_pg_extension", "pg_foreign_data_wrapper", "_pg_foreign_data_wrapper",
    "pg_foreign_server", "_pg_foreign_server", "pg_user_mapping",
    "_pg_user_mapping", "pg_foreign_table", "_pg_foreign_table", "pg_policy",
    "_pg_policy", "pg_replication_origin", "_pg_replication_origin",
    "pg_default_acl", "_pg_default_acl", "pg_init_privs", "_pg_init_privs",
    "pg_seclabel", "_pg_seclabel", "pg_shseclabel", "_pg_shseclabel",
    "pg_collation", "_pg_collation", "pg_partitioned_table",
    "_pg_partitioned_table", "pg_range", "_pg_range", "pg_transform",
    "_pg_transform", "pg_sequence", "_pg_sequence", "pg_publication",
    "_pg_publication", "pg_publication_rel", "_pg_publication_rel",
    "pg_subscription", "_pg_subscription", "pg_subscription_rel",
    "_pg_subscription_rel", "pg_roles", "_pg_roles", "pg_shadow", "_pg_shadow",
    "pg_group", "_pg_group", "pg_user", "_pg_user", "pg_policies", "_pg_policies",
    "pg_rules", "_pg_rules", "pg_views", "_pg_views", "pg_tables", "_pg_tables",
    "pg_matviews", "_pg_matviews", "pg_indexes", "_pg_indexes", "pg_sequences",
    "_pg_sequences", "pg_stats", "_pg_stats", "pg_stats_ext", "_pg_stats_ext",
    "pg_stats_ext_exprs", "_pg_stats_ext_exprs", "pg_publication_tables",
    "_pg_publication_tables", "pg_locks", "_pg_locks", "pg_cursors",
    "_pg_cursors", "pg_available_extensions", "_pg_available_extensions",
    "pg_available_extension_versions", "_pg_available_extension_versions",
    "pg_prepared_xacts", "_pg_prepared_xacts", "pg_prepared_statements",
    "_pg_prepared_statements", "pg_seclabels", "_pg_seclabels", "pg_settings",
    "_pg_settings", "pg_file_settings", "_pg_file_settings", "pg_hba_file_rules",
    "_pg_hba_file_rules", "pg_timezone_abbrevs", "_pg_timezone_abbrevs",
    "pg_timezone_names", "_pg_timezone_names", "pg_config", "_pg_config",
    "pg_shmem_allocations", "_pg_shmem_allocations", "pg_backend_memory_contexts",
    "_pg_backend_memory_contexts", "pg_stat_all_tables", "_pg_stat_all_tables",
    "pg_stat_xact_all_tables", "_pg_stat_xact_all_tables", "pg_stat_sys_tables",
    "_pg_stat_sys_tables", "pg_stat_xact_sys_tables", "_pg_stat_xact_sys_tables",
    "pg_stat_user_tables", "_pg_stat_user_tables", "pg_stat_xact_user_tables",
    "_pg_stat_xact_user_tables", "pg_statio_all_tables", "_pg_statio_all_tables",
    "pg_statio_sys_tables", "_pg_statio_sys_tables", "pg_statio_user_tables",
    "_pg_statio_user_tables", "pg_stat_all_indexes", "_pg_stat_all_indexes",
    "pg_stat_sys_indexes", "_pg_stat_sys_indexes", "pg_stat_user_indexes",
    "_pg_stat_user_indexes", "pg_statio_all_indexes", "_pg_statio_all_indexes",
    "pg_statio_sys_indexes", "_pg_statio_sys_indexes", "pg_statio_user_indexes",
    "_pg_statio_user_indexes", "pg_statio_all_sequences",
    "_pg_statio_all_sequences", "pg_statio_sys_sequences",
    "_pg_statio_sys_sequences", "pg_statio_user_sequences",
    "_pg_statio_user_sequences", "pg_stat_activity", "_pg_stat_activity",
    "pg_stat_replication", "_pg_stat_replication", "pg_stat_slru",
    "_pg_stat_slru", "pg_stat_wal_receiver", "_pg_stat_wal_receiver",
    "pg_stat_subscription", "_pg_stat_subscription", "pg_stat_ssl",
    "_pg_stat_ssl", "pg_stat_gssapi", "_pg_stat_gssapi", "pg_replication_slots",
    "_pg_replication_slots", "pg_stat_replication_slots",
    "_pg_stat_replication_slots", "pg_stat_database", "_pg_stat_database",
    "pg_stat_database_conflicts", "_pg_stat_database_conflicts",
    "pg_stat_user_functions", "_pg_stat_user_functions",
    "pg_stat_xact_user_functions", "_pg_stat_xact_user_functions",
    "pg_stat_archiver", "_pg_stat_archiver", "pg_stat_bgwriter",
    "_pg_stat_bgwriter", "pg_stat_wal", "_pg_stat_wal",
    "pg_stat_progress_analyze", "_pg_stat_progress_analyze",
    "pg_stat_progress_vacuum", "_pg_stat_progress_vacuum",
    "pg_stat_progress_cluster", "_pg_stat_progress_cluster",
    "pg_stat_progress_create_index", "_pg_stat_progress_create_index",
    "pg_stat_progress_basebackup", "_pg_stat_progress_basebackup",
    "pg_stat_progress_copy", "_pg_stat_progress_copy", "pg_user_mappings",
    "_pg_user_mappings", "pg_replication_origin_status",
    "_pg_replication_origin_status", "cardinal_number", "_cardinal_number",
    "character_data", "_character_data", "sql_identifier", "_sql_identifier",
    "information_schema_catalog_name", "_information_schema_catalog_name",
    "time_stamp", "_time_stamp", "yes_or_no", "_yes_or_no", "applicable_roles",
    "_applicable_roles", "administrable_role_authorizations",
    "_administrable_role_authorizations", "attributes", "_attributes",
    "character_sets", "_character_sets", "check_constraint_routine_usage",
    "_check_constraint_routine_usage", "check_constraints", "_check_constraints",
    "collations", "_collations", "collation_character_set_applicability",
    "_collation_character_set_applicability", "column_column_usage",
    "_column_column_usage", "column_domain_usage", "_column_domain_usage",
    "column_privileges", "_column_privileges", "column_udt_usage",
    "_column_udt_usage", "columns", "_columns", "constraint_column_usage",
    "_constraint_column_usage", "constraint_table_usage",
    "_constraint_table_usage", "domain_constraints", "_domain_constraints",
    "domain_udt_usage", "_domain_udt_usage", "domains", "_domains",
    "enabled_roles", "_enabled_roles", "key_column_usage", "_key_column_usage",
    "parameters", "_parameters", "referential_constraints",
    "_referential_constraints", "role_column_grants", "_role_column_grants",
    "routine_column_usage", "_routine_column_usage", "routine_privileges",
    "_routine_privileges", "role_routine_grants", "_role_routine_grants",
    "routine_routine_usage", "_routine_routine_usage", "routine_sequence_usage",
    "_routine_sequence_usage", "routine_table_usage", "_routine_table_usage",
    "routines", "_routines", "schemata", "_schemata", "sequences", "_sequences",
    "sql_features", "_sql_features", "sql_implementation_info",
    "_sql_implementation_info", "sql_parts", "_sql_parts", "sql_sizing",
    "_sql_sizing", "table_constraints", "_table_constraints", "table_privileges",
    "_table_privileges", "role_table_grants", "_role_table_grants", "tables",
    "_tables", "transforms", "_transforms", "triggered_update_columns",
    "_triggered_update_columns", "triggers", "_triggers", "udt_privileges",
    "_udt_privileges", "role_udt_grants", "_role_udt_grants", "usage_privileges",
    "_usage_privileges", "role_usage_grants", "_role_usage_grants",
    "user_defined_types", "_user_defined_types", "view_column_usage",
    "_view_column_usage", "view_routine_usage", "_view_routine_usage",
    "view_table_usage", "_view_table_usage", "views", "_views",
    "data_type_privileges", "_data_type_privileges", "element_types",
    "_element_types", "_pg_foreign_table_columns", "__pg_foreign_table_columns",
    "column_options", "_column_options", "_pg_foreign_data_wrappers",
    "__pg_foreign_data_wrappers", "foreign_data_wrapper_options",
    "_foreign_data_wrapper_options", "foreign_data_wrappers",
    "_foreign_data_wrappers", "_pg_foreign_servers", "__pg_foreign_servers",
    "foreign_server_options", "_foreign_server_options", "foreign_servers",
    "_foreign_servers", "_pg_foreign_tables", "__pg_foreign_tables",
    "foreign_table_options", "_foreign_table_options", "foreign_tables",
    "_foreign_tables", "_pg_user_mappings", "__pg_user_mappings",
    "user_mapping_options", "_user_mapping_options", "user_mappings",
    "_user_mappings", "spheroid", "_spheroid", "geometry", "_geometry", "box3d",
    "_box3d", "box2d", "_box2d", "box2df", "_box2df", "gidx", "_gidx",
    "geometry_dump", "_geometry_dump", "spatial_ref_sys", "_spatial_ref_sys",
    "valid_detail", "_valid_detail", "geography", "_geography",
    "geography_columns", "_geography_columns", "geometry_columns",
    "_geometry_columns", "raster", "_raster", "rastbandarg", "_rastbandarg",
    "geomval", "_geomval", "addbandarg", "_addbandarg", "summarystats",
    "_summarystats", "agg_count", "_agg_count", "reclassarg", "_reclassarg",
    "agg_samealignment", "_agg_samealignment", "unionarg", "_unionarg",
    "raster_columns", "_raster_columns", "raster_overviews", "_raster_overviews",
    "partner", "_partner", "job", "_job", "chunk", "_chunk", "imagemeta",
    "_imagemeta", "image", "_image", "_species", "_plantnetcall", "detection",
    "_detection", "_plantnetcall_metadata", "plantnetrequestparams",
    "plantnetcall", "_plantnetcall", "detection", "_detection", "plantnetspecies",
    "_plantnetspecies", "species", "_plantnetrequestparams", "plantnetcall",
    "species", "_species", "detectionsummary", "_detectionsummary",
    "plantnetcall_metadata", "_plantnetcall_metadata", "plantnetcall_metadata",
    "family", "_family", "genus", "_genus", "species_whitelist",
    "_species_whitelist", "image_jpg_20240301", "_image_jpg_20240301",
    "image_tiff_20240301", "_image_tiff_20240301",
    "detection_snapshot_20240809_israel", "_detection_snapshot_20240809_israel",
    "detection_snapshot_20240809", "_detection_snapshot_20240809",
    "annotation_enum", "_annotation_enum", "annotations", "_annotations",
    "experiment_meta", "_experiment_meta", "callmeta", "_callmeta", "results",
    "_results", "imagedata_status", "_imagedata_status", "session", "_session",
    "imagedata", "_imagedata", "detectionsummary", "_detectionsummary",
    "imagedata_status", "_imagedata_status", "session", "_session", "imagedata",
    "_imagedata", "additionalimagedata", "_additionalimagedata",
    "detectionsummary", "_detectionsummary", "imagefile", "_imagefile",
    "call_plantnet", "_call_plantnet", "generated_image", "_generated_image",
    "capturepoint", "_capturepoint", "capturepoint", "_capturepoint",
    "status_overview", "_status_overview"
  )
)

dm1 <- dm::dm(
  chunk = list(
    src = list(
      con = new(
        "PqConnection" |>
          structure(package = "RPostgres"),
        ptr = constructive::.xptr("0x12b7b9b20"),
        bigint = "integer64",
        timezone = "UTC",
        timezone_out = "UTC",
        typnames = typnames1
      ),
      disco = NULL
    ) |>
      structure(class = c("src_PqConnection", "src_dbi", "src_sql", "src")),
    lazy_query = list(
      x = '"camalien"."data"."chunk"' |>
        structure(class = c("dbplyr_table_path", "character")),
      vars = c("id", "name", "created", "jobid", "state"),
      group_vars = character(0),
      order_vars = NULL,
      frame = NULL
    ) |>
      structure(class = c("lazy_base_remote_query", "lazy_base_query", "lazy_query"))
  ) |>
    structure(class = c("tbl_PqConnection", "tbl_dbi", "tbl_sql", "tbl_lazy", "tbl")),
  detection = list(
    src = list(
      con = new(
        "PqConnection" |>
          structure(package = "RPostgres"),
        ptr = constructive::.xptr("0x12b7b9b20"),
        bigint = "integer64",
        timezone = "UTC",
        timezone_out = "UTC",
        typnames = typnames1
      ),
      disco = NULL
    ) |>
      structure(class = c("src_PqConnection", "src_dbi", "src_sql", "src")),
    lazy_query = list(
      x = '"camalien"."data"."detection"' |>
        structure(class = c("dbplyr_table_path", "character")),
      vars = c("id", "callid", "boxid", "score", "boxsize", "centerx", "centery", "speciesid"),
      group_vars = character(0),
      order_vars = NULL,
      frame = NULL
    ) |>
      structure(class = c("lazy_base_remote_query", "lazy_base_query", "lazy_query"))
  ) |>
    structure(class = c("tbl_PqConnection", "tbl_dbi", "tbl_sql", "tbl_lazy", "tbl")),
  detectionsummary = list(
    src = list(
      con = new(
        "PqConnection" |>
          structure(package = "RPostgres"),
        ptr = constructive::.xptr("0x12b7b9b20"),
        bigint = "integer64",
        timezone = "UTC",
        timezone_out = "UTC",
        typnames = typnames1
      ),
      disco = NULL
    ) |>
      structure(class = c("src_PqConnection", "src_dbi", "src_sql", "src")),
    lazy_query = list(
      x = '"camalien"."data"."detectionsummary"' |>
        structure(class = c("dbplyr_table_path", "character")),
      vars = c("id", "callid", "speciesid", "maxscore", "sumscore", "count", "coverage"),
      group_vars = character(0),
      order_vars = NULL,
      frame = NULL
    ) |>
      structure(class = c("lazy_base_remote_query", "lazy_base_query", "lazy_query"))
  ) |>
    structure(class = c("tbl_PqConnection", "tbl_dbi", "tbl_sql", "tbl_lazy", "tbl")),
  image = list(
    src = list(
      con = new(
        "PqConnection" |>
          structure(package = "RPostgres"),
        ptr = constructive::.xptr("0x12b7b9b20"),
        bigint = "integer64",
        timezone = "UTC",
        timezone_out = "UTC",
        typnames = typnames1
      ),
      disco = NULL
    ) |>
      structure(class = c("src_PqConnection", "src_dbi", "src_sql", "src")),
    lazy_query = list(
      x = '"camalien"."data"."image"' |>
        structure(class = c("dbplyr_table_path", "character")),
      vars = c(
        "id", "chunkid", "imagemetaid", "created", "filename", "path", "type",
        "md5expected", "md5actual", "state", "generated"
      ),
      group_vars = character(0),
      order_vars = NULL,
      frame = NULL
    ) |>
      structure(class = c("lazy_base_remote_query", "lazy_base_query", "lazy_query"))
  ) |>
    structure(class = c("tbl_PqConnection", "tbl_dbi", "tbl_sql", "tbl_lazy", "tbl")),
  imagemeta = list(
    src = list(
      con = new(
        "PqConnection" |>
          structure(package = "RPostgres"),
        ptr = constructive::.xptr("0x12b7b9b20"),
        bigint = "integer64",
        timezone = "UTC",
        timezone_out = "UTC",
        typnames = typnames1
      ),
      disco = NULL
    ) |>
      structure(class = c("src_PqConnection", "src_dbi", "src_sql", "src")),
    lazy_query = list(
      x = '"camalien"."data"."imagemeta"' |>
        structure(class = c("dbplyr_table_path", "character")),
      vars = c(
        "id", "tag", "position", "timestamp", "computertime", "exposuretime",
        "velocity", "gain", "altitude", "parsed"
      ),
      group_vars = character(0),
      order_vars = NULL,
      frame = NULL
    ) |>
      structure(class = c("lazy_base_remote_query", "lazy_base_query", "lazy_query"))
  ) |>
    structure(class = c("tbl_PqConnection", "tbl_dbi", "tbl_sql", "tbl_lazy", "tbl")),
  job = list(
    src = list(
      con = new(
        "PqConnection" |>
          structure(package = "RPostgres"),
        ptr = constructive::.xptr("0x12b7b9b20"),
        bigint = "integer64",
        timezone = "UTC",
        timezone_out = "UTC",
        typnames = typnames1
      ),
      disco = NULL
    ) |>
      structure(class = c("src_PqConnection", "src_dbi", "src_sql", "src")),
    lazy_query = list(
      x = '"camalien"."data"."job"' |>
        structure(class = c("dbplyr_table_path", "character")),
      vars = c("id", "created", "checksum", "partnerid", "name", "imagestorageservice", "size"),
      group_vars = character(0),
      order_vars = NULL,
      frame = NULL
    ) |>
      structure(class = c("lazy_base_remote_query", "lazy_base_query", "lazy_query"))
  ) |>
    structure(class = c("tbl_PqConnection", "tbl_dbi", "tbl_sql", "tbl_lazy", "tbl")),
  partner = list(
    src = list(
      con = new(
        "PqConnection" |>
          structure(package = "RPostgres"),
        ptr = constructive::.xptr("0x12b7b9b20"),
        bigint = "integer64",
        timezone = "UTC",
        timezone_out = "UTC",
        typnames = typnames1
      ),
      disco = NULL
    ) |>
      structure(class = c("src_PqConnection", "src_dbi", "src_sql", "src")),
    lazy_query = list(
      x = '"camalien"."data"."partner"' |>
        structure(class = c("dbplyr_table_path", "character")),
      vars = c("id", "name", "partnerkey", "storagepath"),
      group_vars = character(0),
      order_vars = NULL,
      frame = NULL
    ) |>
      structure(class = c("lazy_base_remote_query", "lazy_base_query", "lazy_query"))
  ) |>
    structure(class = c("tbl_PqConnection", "tbl_dbi", "tbl_sql", "tbl_lazy", "tbl")),
  plantnetcall = list(
    src = list(
      con = new(
        "PqConnection" |>
          structure(package = "RPostgres"),
        ptr = constructive::.xptr("0x12b7b9b20"),
        bigint = "integer64",
        timezone = "UTC",
        timezone_out = "UTC",
        typnames = typnames1
      ),
      disco = NULL
    ) |>
      structure(class = c("src_PqConnection", "src_dbi", "src_sql", "src")),
    lazy_query = list(
      x = '"camalien"."data"."plantnetcall"' |>
        structure(class = c("dbplyr_table_path", "character")),
      vars = c(
        "id", "state", "imageid", "imageurl", "responsestatus", "requestparamid",
        "timestamp", "responsetime", "dateparsed", "responsefile"
      ),
      group_vars = character(0),
      order_vars = NULL,
      frame = NULL
    ) |>
      structure(class = c("lazy_base_remote_query", "lazy_base_query", "lazy_query"))
  ) |>
    structure(class = c("tbl_PqConnection", "tbl_dbi", "tbl_sql", "tbl_lazy", "tbl")),
  plantnetcall_metadata = list(
    src = list(
      con = new(
        "PqConnection" |>
          structure(package = "RPostgres"),
        ptr = constructive::.xptr("0x12b7b9b20"),
        bigint = "integer64",
        timezone = "UTC",
        timezone_out = "UTC",
        typnames = typnames1
      ),
      disco = NULL
    ) |>
      structure(class = c("src_PqConnection", "src_dbi", "src_sql", "src")),
    lazy_query = list(
      x = '"camalien"."data"."plantnetcall_metadata"' |>
        structure(class = c("dbplyr_table_path", "character")),
      vars = c("id", "callid", "uid", "query_dbg", "duration", "versions"),
      group_vars = character(0),
      order_vars = NULL,
      frame = NULL
    ) |>
      structure(class = c("lazy_base_remote_query", "lazy_base_query", "lazy_query"))
  ) |>
    structure(class = c("tbl_PqConnection", "tbl_dbi", "tbl_sql", "tbl_lazy", "tbl")),
  plantnetrequestparams = list(
    src = list(
      con = new(
        "PqConnection" |>
          structure(package = "RPostgres"),
        ptr = constructive::.xptr("0x12b7b9b20"),
        bigint = "integer64",
        timezone = "UTC",
        timezone_out = "UTC",
        typnames = typnames1
      ),
      disco = NULL
    ) |>
      structure(class = c("src_PqConnection", "src_dbi", "src_sql", "src")),
    lazy_query = list(
      x = '"camalien"."data"."plantnetrequestparams"' |>
        structure(class = c("dbplyr_table_path", "character")),
      vars = c(
        "id", "slug", "q_stride", "q_stride_x", "q_stride_y", "q_min_size",
        "q_size_factor", "q_multi_scale", "q_min_score", "q_max_rank",
        "q_show_species", "q_show_genus", "q_show_family", "q_show_organ",
        "q_show_location"
      ),
      group_vars = character(0),
      order_vars = NULL,
      frame = NULL
    ) |>
      structure(class = c("lazy_base_remote_query", "lazy_base_query", "lazy_query"))
  ) |>
    structure(class = c("tbl_PqConnection", "tbl_dbi", "tbl_sql", "tbl_lazy", "tbl")),
  species = list(
    src = list(
      con = new(
        "PqConnection" |>
          structure(package = "RPostgres"),
        ptr = constructive::.xptr("0x12b7b9b20"),
        bigint = "integer64",
        timezone = "UTC",
        timezone_out = "UTC",
        typnames = typnames1
      ),
      disco = NULL
    ) |>
      structure(class = c("src_PqConnection", "src_dbi", "src_sql", "src")),
    lazy_query = list(
      x = '"camalien"."data"."species"' |>
        structure(class = c("dbplyr_table_path", "character")),
      vars = c("gbifid", "pnid", "scientificname", "gbifgenusid"),
      group_vars = character(0),
      order_vars = NULL,
      frame = NULL
    ) |>
      structure(class = c("lazy_base_remote_query", "lazy_base_query", "lazy_query"))
  ) |>
    structure(class = c("tbl_PqConnection", "tbl_dbi", "tbl_sql", "tbl_lazy", "tbl")),
  species_whitelist = list(
    src = list(
      con = new(
        "PqConnection" |>
          structure(package = "RPostgres"),
        ptr = constructive::.xptr("0x12b7b9b20"),
        bigint = "integer64",
        timezone = "UTC",
        timezone_out = "UTC",
        typnames = typnames1
      ),
      disco = NULL
    ) |>
      structure(class = c("src_PqConnection", "src_dbi", "src_sql", "src")),
    lazy_query = list(
      x = '"camalien"."data"."species_whitelist"' |>
        structure(class = c("dbplyr_table_path", "character")),
      vars = "pnid",
      group_vars = character(0),
      order_vars = NULL,
      frame = NULL
    ) |>
      structure(class = c("lazy_base_remote_query", "lazy_base_query", "lazy_query"))
  ) |>
    structure(class = c("tbl_PqConnection", "tbl_dbi", "tbl_sql", "tbl_lazy", "tbl")),
)  |>
  dm::dm_add_pk(chunk, "id") |>
  dm::dm_add_pk(detection, "id") |>
  dm::dm_add_pk(detectionsummary, "id") |>
  dm::dm_add_pk(image, "id") |>
  dm::dm_add_pk(imagemeta, "id") |>
  dm::dm_add_pk(job, "id") |>
  dm::dm_add_pk(partner, "id") |>
  dm::dm_add_pk(plantnetcall, "id") |>
  dm::dm_add_pk(plantnetcall_metadata, "id") |>
  dm::dm_add_pk(plantnetrequestparams, "id") |>
  dm::dm_add_pk(species, "pnid")

dm2 <- dm1 |>
  dm::dm_add_fk(image, "chunkid", chunk, "id") |>
  dm::dm_add_fk(NA, NA_character_, chunk, "id") |>
  dm::dm_add_fk(NA, NA_character_, image, "id") |>
  dm::dm_add_fk(chunk, "jobid", job, "id") |>
  dm::dm_add_fk(job, "partnerid", partner, "id") |>
  dm::dm_add_fk(NA, NA_character_, partner, "id") |>
  dm::dm_add_fk(NA, NA_character_, partner, "id") |>
  dm::dm_add_fk(NA, NA_character_, plantnetcall, "id") |>
  dm::dm_add_fk(detection, "callid", plantnetcall, "id") |>
  dm::dm_add_fk(detectionsummary, "callid", plantnetcall, "id") |>
  dm::dm_add_fk(plantnetcall_metadata, "callid", plantnetcall, "id") |>
  dm::dm_add_fk(NA, NA_character_, plantnetrequestparams, "id") |>
  dm::dm_add_fk(plantnetcall, "requestparamid", plantnetrequestparams, "id") |>
  dm::dm_add_fk(species_whitelist, "pnid", species, "pnid")

@LDalby
Copy link
Author

LDalby commented Nov 20, 2024

If I call constructive::construct(unclass(remote_pg)) I still get an output of approx. 3700 lines.
Were I supposed to extract something from the output or is that what @moodymudskipper already did above?

@moodymudskipper
Copy link
Collaborator

I don't think unclassing will help, but constructive::construct_base(remote_pg, data = list(con = con)) might hopefully provide a much shorter output without evaluation error message.

@krlmlr
Copy link
Collaborator

krlmlr commented Nov 20, 2024

The problems with constructive aside: What I'm interested in is why dm thinks that the foreign keys have no corresponding detail tables (the dm_add_fk(NA, ...) lines). For this, we could also try a snapshot of dm:::dm_meta(con, schema = ...) , using constructive or through an .rds or similar file.

@LDalby
Copy link
Author

LDalby commented Dec 4, 2024

I've saved the output of dm::dm_meta(con, schema = "data") to an rds file.
Our sys admin asked me not to share the host publicly if possible, so I was wandering if I can send file directly to you @krlmlr ?

@krlmlr
Copy link
Collaborator

krlmlr commented Dec 5, 2024

Thanks, Lars, sure! If e-mail works, https://cran.r-project.org/web/packages/dm/index.html has my address in the "Maintainer" section.

@krlmlr krlmlr changed the title dm_filter on remote PostgreSQL db errors dm_filter() on remote PostgreSQL db errors Jan 2, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
3 participants