From dff6da6ceec4e9b725f940235125ce470b61c405 Mon Sep 17 00:00:00 2001 From: Xochitl Ortiz-Ross Date: Tue, 3 Dec 2024 15:20:03 -0800 Subject: [PATCH 1/3] Deleted the .md file for my episode Since it is now an .Rmd --- .DS_Store | Bin 6148 -> 6148 bytes episodes/4-minimal-reproducible-data.md | 589 ------------------------ 2 files changed, 589 deletions(-) delete mode 100644 episodes/4-minimal-reproducible-data.md diff --git a/.DS_Store b/.DS_Store index 275c0579b7b243b61b9c222bf9a94ec0ddde95aa..0a0d46ac615b4aa75fcd0f2435cfc6cdbafc7c8c 100644 GIT binary patch delta 115 zcmZoMXfc=|#>B`mu~2NH-rdOtA}o{nnPeDuOx9s4RF|l(HZw8NQ7|?!snt=awlp%( zQ7|(!uC3+d5LMQ<4vNpt$<52}ntY5&p0RuLGbTmWjSWnUo7p+|IeB)qu~2NH-rdCtOpKEaL|7*CGs-aTnykZE=qXiQZD?p=p`&1AXi=-9 zP;F>nX{e)MW?)%c%gG_CtZy9@pPiGNm*2wx28@glnt>Ne!>HcPPZ$+hH?wo_a{#T| XEXeeoc{0C% - filter(genus == "Dipodomys")%>% - mutate(date = lubridate::ymd(paste(year, month, day, sep = "-"))) -``` - - -::: questions -- What is a minimal reproducible dataset and why do I need it? -- What do I need to include in a minimal reproducible dataset? -- How do I create a minimal reproducible dataset? -- How do I make my own dataset reproducible? -::: - -::: objectives -After completing this episode, participants should be able to... - -- Describe a minimal reproducible dataset -- List the requirements for a minimal reproducible dataset -- Identify the important aspects of the dataset that are necessary to reproduce your coding problem -- Create a dataset from scratch to reproduce a given piece of code -- Subset an existing dataset to reproduce a given piece of code -- Share your own dataset in a way that is accessible and reproducible -::: - -## 4.1 What is a minimal reproducible dataset and why do I need it? - -Now that we understand some basic errors and how to fix them, let’s look at what to do when we can’t figure out a solution to our coding problem. 
-This is when you really need to know how to create a minimal reproducible example (MRE) as we talked about in episode 1. -In general, an MRE will need: - A minimal dataset that can reproduce the error (or access to such a dataset) - Minimal runnable code that can reproduce the error using the minimal dataset - Basic information about the system, R version, and packages being used - In case of random functions, a seed that will produce the same results each time (e.g., use `set.seed()`) - -The first step in creating an MRE is to set up some data **for your helper to be able to reproduce your error and play around with the code in order to fix it.** - -Why? -Remember our IT problem? -It would be a lot easier for the IT support person to fix your computer if they could actually touch it, see the screen, and click around. - -Another *example:* you're knitting a sweater and one of the sleeves looks wonky. -You call a friend and ask why it's messed up. -They can't possibly help without being able to hold the sweater and look at the stitches themselves. - -It would be great if we were able to give the helper the entire dataset, but usually, we can't. - -::: callout -There are several reasons why you might need to create a separate dataset that is minimal and reproducible instead of trying to use your actual dataset. -The original dataset may be: - -- too large: the Portal dataset is \~35,000 rows with 13 columns and contains data for decades. That's a lot! -- private - your dataset might not be published yet, or maybe you're studying an endangered species whose locations can't easily be shared. Another example: many medical datasets cannot be shared publicly. -- hard to send - on most online forums, you can't attach supplemental files (more on this later). Even if you are just sending data to a colleague, file paths can get complicated, the data might be too large to attach, etc. -- complicated - it would be hard to locate the relevant information.
For example, your helper should not need a 'data dictionary' to understand what the columns mean (e.g. what is "plot type" in `ratdat`?). We don't want our helper to waste valuable time figuring out what everything means. -- highly derived/modified from the original file. As an example, you may have already done a bunch of preliminary data wrangling and you don't want to include all that code when you send the example (see later: the minimal code section), so you need to provide the intermediate dataset directly to your helper. -::: - -It's useful to strip the dataset to its essential parts to identify where exactly the problem is. -A minimal dataset is a dataset that includes the information necessary to run the code, but removes all other unnecessary parts (extra columns/rows, extra context, etc.). - -We need minimal reproducible datasets to make it easy, simple, and fast for the helper to focus on the problem at hand and "get their hands dirty" tinkering with the dataset. - - -## 4.2 What do I need to include in a minimal reproducible dataset? - -What needs to be included in a reproducible dataset? - -::: instructor -Ask the audience, wait for them to respond -::: - -It's actually all in the name: - -- It needs to be minimal, which means it only contains the information necessary to run the piece of code with which you are struggling. -- It needs to be reproducible. The data you provide must consistently reproduce the output or error with which you are struggling. -- It needs to be "relevant" to the problem. A perhaps silly example: if you're struggling with the behavior of a column that's a factor, it wouldn't be useful to make a dataset that contains only numeric data, no matter how simple and elegant that dataset might appear to be. -- It needs to be complete and accessible.
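To make these requirements concrete, here is a minimal sketch for the factor scenario mentioned in the list. The column names `site` and `count` are invented for illustration and are not part of the lesson data.

```r
# Hypothetical example data: `site` and `count` are made-up names.
sample_data <- data.frame(
  site  = factor(c("A", "B", "B", "C")),  # the factor column the question is about
  count = c(3, 5, 2, 8)                   # one numeric column to go with it
)

# Minimal: two columns, four rows.
# Reproducible and complete: no files or earlier objects are needed.
str(sample_data)
levels(sample_data$site)
```

Because everything is defined inside the script, a helper could paste this into a fresh R session and get the same object.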
- -Remember, your helper may not be in the room with you or have access to your computer. Therefore the data must be able to stand on its own: it must be complete and free of dependencies. -Alternatively, you must ensure your helper has access to the dataset you provide. -More on this later. - -::: callout -Remember that helpers do not have access to files on your computer! - -You might be used to always reading data in as separate files, but helpers can't access those files. -Even if you sent someone a file, they would need to put it in the right directory, make sure to load it in exactly the same way, make sure their computer can open it, etc. -Since the goal is to make everyone's lives as easy as possible, we need to think about our data in a different way: as a dummy object created in the script itself. -::: - -::: callout -It can be helpful to simply look at the `?help` page of a function. -Scroll down to where the examples are. -These will usually be minimal and reproducible. - -For example, let's look at the function `mean`: - -``` r -?mean -``` - -We see examples that can be run directly on the console, with no additional code. - -``` r -x <- c(0:10, 50) -xm <- mean(x) -c(xm, mean(x, trim = 0.10)) -``` -::: - -:::: challenge -### Exercise 1 - -These datasets are not well suited for use in a reprex. -For each one, try to reproduce the dataset on your own in R. -Does it work? -What happened? -Explain. - -A) (A screenshot of our dataset -- need to upload images to repo) -B) `sample_data <- read_csv("/Users/kaija/Desktop/RProjects/ResearchProject/data/sample_dataset.csv")` -C) `dput(complete_old[1:100,])` -D) `sample_data <- data.frame(species = species_vector, weight = c(10, 25, 14, 26, 30, 17))` - -::: solution -### Solution - -A) Not reproducible because it is a screenshot.\ -B) Not reproducible because it is a path to a file that only exists on someone else’s computer, so you cannot access it using that path.
-C) Not minimal: it has far too many columns and probably too many rows. It is also not reproducible because we were not given the source for `complete_old`. -D) Not reproducible because we are not given the source for `species_vector`. -::: - -For an extra challenge, can you edit the above datasets to make them minimal and reproducible? -:::: - -:::: challenge -### Exercise 2 - -Here is a piece of code that is giving me an unexpected result (it is returning NA): - -``` r -mean(rodents$weight) -``` - -Which of the following represents a minimal reproducible dataset for this code? -Can you describe why the other ones are not? - -A) `sample_data <- data.frame(month = rep(7:9, each = 2), hindfoot_length = c(10, 25, 14, 26, 30, 17))` -B) `sample_data <- data.frame(weight = rnorm(10))` -C) `sample_data <- data.frame(weight = c(100, NA, 30, 60, 40, NA))` -D) `sample_data <- sample(rodents$weight, 10)` -E) `sample_data <- rodents_modified[1:20,]` - -::: solution -### Solution - -The correct answer is C! - -A) does not include the variable of interest (weight). -B) does not produce the same problem (an NA result)--the code runs just fine. -D) is not reproducible. `sample()` randomly samples 10 items; sometimes it may include NAs, sometimes it may not (not guaranteed to reproduce the error). It can be used if a seed is set (see next section for more info). -E) uses a dataset that isn't accessible without previous data wrangling code: the object `rodents_modified` doesn't exist. -::: -:::: - -## 4.3 How do I create a minimal reproducible dataset? - -This is where I often get stuck: how do I recreate the key elements of my dataset in order to reproduce my error? -That seems really hard! -If you are like me and find this initially overwhelming, don’t worry. -We will break it down into smaller steps. -First, there are two approaches to providing a dataset. -You can create one from scratch or you can use an already available dataset.
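As a rough preview of the two approaches, the sketch below builds one small dataset from scratch and takes another from the built-in `iris` data. The object names `from_scratch` and `from_builtin`, the column names, and the seed value are arbitrary choices made for this illustration.

```r
# Approach 1: build a small dataset from scratch.
set.seed(42)  # arbitrary seed so the random values are the same on every run
from_scratch <- data.frame(
  group = rep(c("a", "b"), each = 3),  # a simple categorical variable
  value = rnorm(6)                     # a continuous variable
)

# Approach 2: take a few rows and columns of a dataset that ships with R.
from_builtin <- head(iris[, c("Species", "Sepal.Length")], 6)

from_scratch
from_builtin
```

Either object can be pasted straight into a question, since neither depends on a file on your computer.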
- -We then need to know which elements of your dataset are necessary. -- How many variables? -- What data type is each variable? -- How many levels or observations are necessary? -- How many of the values need to be the same/different? - -### 4.3.1 Create a dummy dataset from scratch - -You can create vectors using `c()`: - -``` r - vector <- c(1,2,3,4) -``` - -You can add some randomness by sampling from a vector using `sample()`. - -For example, you can sample the numbers 1 through 10 in a random order: - -``` r - x <- sample(1:10) -``` - -Or you can draw from a random normal distribution: - -``` r - x <- rnorm(10) -``` - -You can also use letters to create factors. - -``` r -x <- factor(sample(letters[1:4], 20, replace = TRUE)) -``` - -You can create a data frame using `data.frame()` (or `tibble()` from the `tibble` package, which `dplyr` re-exports). - -``` r - data <- data.frame(x = sample(letters[1:3], 20, replace = TRUE), y = rnorm(20)) -``` - -Make sure you name your variables and keep it simple. - -::: callout -For more handy functions for creating data frames and variables, see the cheatsheet. -Some questions need data in a specific format. -For these, you can use any of the `as.someType` functions: `as.factor`, `as.integer`, `as.numeric`, `as.character`, `as.Date`, `as.xts` (from the `xts` package). -::: - -Let's come back to our kangaroo rats example. -Since we will be working with the same dataset this year, we want to know how many kangaroo rats of each species were found in each plot type in past years so that we can better estimate what sample size we can expect. - -Here is the code you use: - -```r -krats %>% - ggplot(aes(x = date, fill = plot_type)) + - geom_histogram(alpha=0.6)+ - facet_wrap(~species)+ - theme_bw()+ - scale_fill_viridis_d(option = "plasma")+ - geom_vline(aes(xintercept = lubridate::ymd("1988-01-01")), col = "red") -``` - -Now let's say we saw this and decided we wanted to get rid of "sp." but didn't know how. -We want to ask someone online but we first need to create a minimal reproducible example.
- -What variables would we need to reproduce this figure? - -A variable for species, how many? -4. -Let’s call them A, B, C, and D. - -``` r -species <- c('A','B','C','D') -``` - -We then need a variable for plot type. We have 5, but we could cut it down to 2; let's call them P1 and P2. -In reality, we probably don't even need this for this question, but for the sake of practicing let's add it in. - -``` r -plot.type <- c('P1','P2') -``` - -Lastly we need a variable for date. -The specifics don't matter here, so let's just call it days and make it 1-10. - -``` r -days <- c(1:10) -``` - -Great! -Now we have all of our variables, let's go sampling--we can simulate the data collected each day by using `sample()`. - -We need to sample each plot for 10 days and presumably find a certain number of each species in each plot. -Let's pretend we find a total of 100 individuals, which means we need to sample the vector `species` 100 times. -Since we want species to repeat, we will also set `replace` to `TRUE`. -We then need to sample the plots the same number of times, so that each species sample is associated with either P1 or P2. -Lastly, we want to add each day we sampled. - -``` r -sample_data <- data.frame( - Species = sample(species, 100, replace = TRUE), - Plot = sample(plot.type, 100, replace = TRUE), - Day = days # the 10 days are recycled to fill all 100 rows -) -``` - -Great! -Now we have a sample data set that is minimal, but is it reproducible? - -::: instructor -Give them time to think about it and answer the question. -::: - -It isn't! -Why? - -Remember: `sample()` creates a random dataset! -This will not be consistently reproducible. -In order to make this example fully reproducible we should first call `set.seed()`. - -``` r -set.seed(1) -sample_data <- data.frame( - Species = sample(species, 100, replace = TRUE), - Plot = sample(plot.type, 100, replace = TRUE), - Day = days -) -sample_data -``` - -Now we have our minimal reproducible example! -But are we sure it reproduces what we are trying to reproduce? -Let's test it out.
- -``` r -sample_data %>% - ggplot(aes(x = Day, fill = Plot)) + - geom_histogram(alpha=0.6)+ - facet_wrap(~Species)+ - theme_bw()+ - scale_fill_viridis_d(option = "plasma") -``` - -Yes! -It is certainly simplified, but it has the elements we want it to have. -And now we can ask how to get rid of "C". -Given that this was a very simple question, we could have simplified this example even further; we could have used 2 species and even just 2 days, in which case a simpler example could be - -``` r -set.seed(1) -sample_data2 <- data.frame( -species = sample(c('A','B'), 100, replace = TRUE), -days = 1:2 -) - -sample_data2 %>% - ggplot(aes(x=days)) + - geom_histogram(alpha=0.6)+ - facet_wrap(~species)+ - theme_bw() -``` -which is even simpler than the one before but still contains the elements we are interested in--we have a set of "species" separated into facets and we want to get rid of one of them. In reality, had we realized that we needed to get rid of the rows with "sp." in them, we could have ignored the figure entirely and posed the question about the data alone, e.g., "how do I remove rows that contain a specific name?", and then given just the example dataset we created. - -::: challenge -### Exercise 3 (10 minutes) - -Now practice doing it yourself. -Create a data frame with: - -A. One categorical variable with 5 levels. One continuous variable. -B. One continuous variable normally distributed -C. First name, last name, sex, age, and treatment type. -::: - -### 4.3.2 Create a dataset using an existing dataset - -If you don't want to create a dataset from scratch, maybe because you have too many variables or it's a more complicated structure and you are not sure where the error is, you can subset from an existing dataset. Useful functions for subsetting a dataset include `subset()`, `head()`, `tail()`, and indexing with `[]` (e.g., `iris[1:4,]`). Alternatively, you can use tidyverse functions like `select()` and `filter()`.
You can also use the same `sample()` function we covered earlier. - -A list of readily available datasets can be found using `library(help="datasets")`. You can then use `?` in front of the dataset name to get more information about the contents of the dataset. - -When working with a built-in dataset you still have to edit your code to fit the new data, but it is probably faster than building a large dataset from scratch, and it gets easier with practice! - -Let's keep using our previous example: how can we reproduce that figure using the existing dataset `mpg` (which comes with the `ggplot2` package)? First, let's interrogate this dataset to see what we are working with. - -```r -?mpg -``` - -Which variables from `mpg` do you think we could use to replace our variables? Remember: we need one for species, one for plot type, and one for date. - -There are certainly multiple options! Let's go with model for species, manufacturer for plot type, and year for date. - -```r -data <- mpg %>% select(model, manufacturer, year) -dim(data) -glimpse(data) -``` - -We only need 4 species, and 5 plots. How many do we have here? - -```r -length(unique(data$model)) -length(unique(data$manufacturer)) -``` - -Certainly more than we need. Then let's simplify. - -```r -set.seed(1) -data <- data %>% - filter(model %in% sample(model, 4, replace = F)) -``` -Cool, now we have just 4 models. BUT we also only have 2 years... so maybe year wasn't the best choice after all. Let's change it to hwy. - -```r -set.seed(1) -data <- mpg %>% select(model, manufacturer, hwy) %>% - filter(model %in% sample(model, 4, replace = F)) -``` - -Now we can try out our plot. - -```r -data %>% - ggplot(aes(x = hwy, fill = manufacturer)) + - geom_histogram(alpha=0.6)+ - facet_wrap(~model)+ - theme_bw()+ - scale_fill_viridis_d(option = "plasma") -``` - -Do you think that works? - -It turns out that maybe manufacturer was not the best representation for plot, since we do need each car model to appear in each "plot". What would all cars have?
- -Let's change model to manufacturer, and let's add class. - -```r -set.seed(1) -data2 <- mpg %>% select(manufacturer, class, hwy) %>% - filter(manufacturer %in% sample(manufacturer, 4, replace = F)) - -data2 %>% - ggplot(aes(x = hwy, fill = class)) + - geom_histogram(alpha=0.6)+ - facet_wrap(~manufacturer)+ - theme_bw()+ - scale_fill_viridis_d(option = "plasma") -``` -That's more like it! You can keep playing around with it or you can give it more thought a priori, but either way you get the idea. While what we get is not an exact replica, it's an analogy. The important thing is that we created a figure whose basic elements/structure or "key features" remain intact--namely, the number and type of variables and categories. - -Now it is your turn! - -::: challenge -For each of the following, identify which data are necessary to create a minimal reproducible dataset using `mpg`. -A) We want to know how the highway mpg has changed over the years -B) We need a list of all "types" of cars and their fuel type for each manufacturer -C) We want to compare the average city mpg for a compact car from each manufacturer -::: - -**OR change the above challenge to be about ratdat** - -**OR move to...** - -Now that we know how many of each species were captured over the years, we want to know how many of each species you might expect to catch per day. - -Let's practice how we would do this with our data. - -::: instructor -Let the students try it out and discuss out loud -::: - -We end up with the following code: - -```r -krats_per_day <- krats %>% - group_by(date, year, species) %>% - summarize(n = n()) %>% - group_by(species) - -krats_per_day %>% - ggplot(aes(x = species, y = n))+ - geom_boxplot(outlier.shape = NA)+ - geom_jitter(width = 0.2, alpha = 0.2)+ - theme_classic()+ - ylab("Number per day")+ - xlab("Species") -``` - -::: challenge -How might you reproduce this using the mpg dataset? -::: - -::: solution - Substitute krats with cars, species with class, date with year.
The question becomes: how many cars of each class are produced per year? - ```{r} - set.seed(1) - cars_per_y <- mpg %>% - filter(class %in% sample(class, 4, replace=F)) %>% - group_by(class, year) %>% - summarize(n=n()) %>% - group_by(class) - - cars_per_y %>% - ggplot(aes(x = class, y = n))+ - geom_boxplot(outlier.shape = NA)+ - geom_jitter(width = 0.2, alpha=0.2)+ - theme_classic()+ - ylab("Cars per year")+ - xlab("Class") - - # this is only giving us 3 classes even though we asked for 4, why? - - # Because it is sampling from the column "class" which has many of the same class. - # Therefore, we need to specify that we want to sample from within the unique values in "class". - - cars_per_y <- mpg %>% - filter(class %in% sample(unique(mpg$class), 4, replace=F)) %>% - group_by(class, year) %>% - summarize(n=n()) %>% - group_by(class) - - cars_per_y %>% - ggplot(aes(x = class, y = n))+ - geom_boxplot(outlier.shape = NA)+ - geom_jitter(width = 0.2, alpha=0.2)+ - theme_classic()+ - ylab("Cars per year")+ - xlab("Class") - ``` -::: - -## Using your own data by creating a minimal subset - -Perhaps you are now thinking that if you can use a subset of an existing dataset, wouldn't it be easier to just subset your own data to make it minimal? You are not wrong. There are cases when you can subset your own data in the same way you would subset an existing dataset to make a minimal dataset; the key is to then make it reproducible. That's when we use the function `dput`, which essentially takes your data frame and gives you code to reproduce it! - -For example, using our previous `cars_per_y`: - -```r -dput(cars_per_y) -``` - -As you can see, even with our minimal dataset, it is still quite a chunk of code. What if you tried putting in `krats_per_day`? It is clear that either way you will still need to considerably minimize your data. Even then, it will often be simpler to provide an existing dataset or provide one from scratch.
Furthermore, we are often able to discover the source of our error or solve our own problem when we have to go through the process of breaking it down into its essential components! - -Nevertheless, it remains an option for when your data appears too complex or you are not quite sure where your error lies and therefore are not sure what minimal components are needed to reproduce the example. - -::: callout -*What about NAs?* If your data has NAs and they may be causing the problem, it is important to include them in your minimal reproducible dataset. You can find where there are NAs in your dataset by using `is.na`, for example: `is.na(krats$weight)`. This will return a logical vector with TRUE where the value is NA and FALSE where it is not. -The simplest way to include NAs in your dummy dataset is to directly include them in vectors: `x <- c(1,2,3,NA)`. You can also subset a dataset that already contains NAs, change some of the values to NAs using `mutate(ifelse())`, or substitute all the values in a column by sampling from within a vector that contains NAs. - -One important thing to note when subsetting a dataset with NAs is that subsetting methods that use a condition to match rows won’t necessarily match NA values in the way you expect. For example: -```r -test <- data.frame(x = c(NA, NA, 3, 1), y = rnorm(4)) -test %>% filter(x != 3) -# you might expect that the NA values would be included, since “NA is not equal to 3”. But actually, the expression NA != 3 evaluates to NA, not TRUE. So the NA rows will be dropped! -# Instead you should use is.na() to match NAs -test %>% filter(x != 3 | is.na(x)) -``` -::: - -Here are some more practice exercises if you wish to test your knowledge. - -**(I copied these from exercise 6 in the google doc... but I'm not sure that they are getting at the point of the lesson...)
-::: challenge -**Exercise:** Each of the following examples needs your help to create a dataset that will correctly reproduce the given result and/or warning message when the code is run. Fix the dataset shown or fill in the blanks so it reproduces the problem. -A) set.seed(1) - sample_data <- data.frame(fruit = rep(c("apple", "banana"), 6), weight = rnorm(12)) - ggplot(sample_data, aes(x = fruit, y = weight)) + geom_boxplot() - **HELP: how do I insert an image from clipboard?? Is it even possible?** - -B) bodyweight <- c(12, 13, 14, __, __) - max(bodyweight) - [1] NA - -C) sample_data <- data.frame(x = 1:3, y = 4:6) - mean(sample_data$x) - [1] NA - Warning message: - In mean.default(sample_data$x): argument is not numeric or logical: returning NA - -D) sample_data <- ____ - dim(sample_data) - NULL -::: -::: solution - A) "fruit" needs to be a factor and the order of the levels must be specified: - `sample_data <- data.frame(fruit = factor(rep(c("apple", "banana"), 6), levels = c("banana", "apple")), weight = rnorm(12))` - B) at least one of the blanks must be an NA - C) `x` needs to be a factor: `sample_data <- data.frame(x = factor(1:3), y = 4:6)` **(?? what's really the point of this one?)** - D) `sample_data` must be an object without dimensions, such as a plain vector, e.g. `sample_data <- 1:3` -::: - - -::: keypoints -- A minimal reproducible dataset (a) contains the minimum number of lines, variables, and categories, in the correct format, to reproduce a certain problem; and (b) is fully reproducible, meaning that someone else can reproduce the same problem using only the information provided. -- You can create a dataset from scratch using `data.frame`, you can use available datasets like `iris`, or you can use a subset of your own dataset -- You can share your data by...
-::: From 3bce90e66f206a8a5c5f28ffef480adedb2c30a5 Mon Sep 17 00:00:00 2001 From: Xochitl Ortiz-Ross Date: Tue, 3 Dec 2024 15:36:51 -0800 Subject: [PATCH 2/3] Squashed commit of the following: MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit commit 04fc52c24ec0734aceece6e21c2a710b141e69d2 Merge: 66501ae ff995ea Author: Xochitl Ortiz-Ross Date: Tue Dec 3 14:04:57 2024 -0800 Merge pull request #79 from carpentries-incubator/peter_edits changed relative path to load data commit ff995ea405d348c6de4f60287df1cdf1821019a9 Author: peterlaurin Date: Tue Dec 3 14:01:18 2024 -0800 changed relative path to load data commit 66501ae7d7bab741a3416b3a75b735252fc82c5b Merge: b2ef676 c7d87b9 Author: Xochitl Ortiz-Ross Date: Tue Dec 3 13:34:33 2024 -0800 Merge pull request #78 from carpentries-incubator/peter_edits added identify the problem episode for review, Dec2024. Changed to Rm… commit c7d87b9d9c82f2ba13528e42f9b1cb7d8e0b6b06 Author: peterlaurin Date: Tue Dec 3 12:44:26 2024 -0800 removed open tag for carpentries commit d4196912c762e8c192fd23534e84ccda4c7b81e5 Author: peterlaurin Date: Tue Dec 3 12:35:55 2024 -0800 fixed further bugs commit ad93365dad0a6f15d4b8c974cf15cf4372369503 Author: peterlaurin Date: Tue Dec 3 12:19:41 2024 -0800 fixed some last-minute R bugs commit b2ef67655cc7f64f574d19182c90ae2fec0570f9 Merge: d8b86da d82aa3b Author: peterlaurin <69608241+peterlaurin@users.noreply.github.com> Date: Tue Dec 3 12:04:53 2024 -0800 Merge pull request #77 from carpentries-incubator/xochitl_edits small edits commit c1af3046c013bd5bf05a2867b11d00c52f1ba330 Merge: 97eed2e d8b86da Author: peterlaurin <69608241+peterlaurin@users.noreply.github.com> Date: Tue Dec 3 12:01:33 2024 -0800 Merge branch 'main' into peter_edits commit 97eed2e397b10434f935c59b94d1a04df503a292 Author: peterlaurin Date: Tue Dec 3 11:41:51 2024 -0800 changed config to allow Rmd file for Peter's episode to run commit 3a8d5b8c98c0312fbbfce7cd183646f5f865b0f3 Author: 
peterlaurin Date: Tue Dec 3 11:25:36 2024 -0800 added identify the problem episode for review, Dec2024. Changed to Rmd to run R. commit d8b86daa937114b03fed956e2ff36c9b7885d86a Merge: 96901a7 782e862 Author: peterlaurin <69608241+peterlaurin@users.noreply.github.com> Date: Fri Nov 22 17:38:06 2024 -0800 Merge pull request #73 from carpentries-incubator/minimal-reproducible-code-draft Minimal reproducible code draft -- Resolved discrepancy in yaml with md and Rmd commit 782e8626d709fafc2935987681551097cfa2883c Merge: 761d64f 96901a7 Author: peterlaurin <69608241+peterlaurin@users.noreply.github.com> Date: Fri Nov 22 17:32:47 2024 -0800 Merge branch 'main' into minimal-reproducible-code-draft commit 96901a71051bcdd261b472da475adf0de57903b4 Merge: 5a23b4b fa15716 Author: peterlaurin <69608241+peterlaurin@users.noreply.github.com> Date: Fri Nov 22 17:30:07 2024 -0800 Merge pull request #65 from carpentries-incubator/kaijagahm-patch-3 Update pull_request_template.md commit 761d64f5286e2f061330d2d316f556302965ab53 Author: Kaija Gahm Date: Fri Nov 22 17:00:33 2024 -0800 Draft entire episode except for exercise content commit 7ce1de302f7547fe3acce1216dc218b32e65e179 Author: Kaija Gahm Date: Fri Nov 22 10:58:19 2024 -0800 fixed merge conflicts commit 5a23b4bd696fa37fdddd16ab8298daa198b976a3 Merge: 2335e86 ea9aecf Author: Kaija Gahm <37053323+kaijagahm@users.noreply.github.com> Date: Wed Nov 6 13:58:13 2024 -0800 Merge pull request #69 from carpentries-incubator/xochitl_edits Xochitl edits commit 449d69f60273af590f7997126bf154f75baefa99 Merge: 5c85579 5210f72 Author: Kaija Gahm Date: Wed Oct 23 14:57:12 2024 -0700 fixed merge conflict Merge branch 'minimal-reproducible-code-draft' of github.com:carpentries-incubator/R-help-reprexes into minimal-reproducible-code-draft # Conflicts: # episodes/5-minimal-reproducible-code.Rmd commit 5c8557964adcc5f7dd9b97a83f7303d4b3ec1cf6 Author: Kaija Gahm Date: Wed Oct 23 14:29:36 2024 -0700 15:00 end work session commit 
5210f72dbc39225aa55a19eee537d7ddc690b366 Author: Kaija Gahm Date: Wed Oct 23 14:29:36 2024 -0700 2:30 end work session commit 3ccd70ebf6107083bebc5b7c7a4d6942494d9073 Author: Kaija Gahm Date: Wed Oct 23 14:03:56 2024 -0700 sketching more exercises. reordered road map slightly. commit 18170020ab6841711676807bfd0859c8985c7a31 Author: Kaija Gahm Date: Thu Oct 17 13:16:40 2024 -0700 remembered ??fun shortcut for searching commit 22659111c35e22c616327fc74fd1d2ff08c66a64 Author: Kaija Gahm Date: Wed Oct 16 15:00:23 2024 -0700 First draft of lesson text for the first part of the road map commit c73f662808ad586fec30a4c22777f39055873368 Author: Kaija Gahm Date: Wed Oct 16 14:39:05 2024 -0700 full episode outline commit d95d7cc9c532014081661be71d62130eb6e1c41c Author: Kaija Gahm Date: Wed Oct 16 14:26:29 2024 -0700 Updated LOs for Ep5; moved {reprex} to Ep6 commit 8c33a45f73c5fd3711cb4723feb602ccc3d68278 Author: Kaija Gahm Date: Wed Oct 16 14:13:58 2024 -0700 Moved progress from last week to .Rmd from .md commit fa1571679e0aa1beeba69746430a74038f35a71f Author: Kaija Gahm <37053323+kaijagahm@users.noreply.github.com> Date: Wed Oct 2 14:36:28 2024 -0700 Update pull_request_template.md Just a simple update for our own use commit 2ec2b16cb3f54064c3426aeea83d2204cb0c8705 Author: Kaija Gahm Date: Wed Oct 2 14:03:26 2024 -0700 First crack at narrative and a first exercise commit dcb0d48a799014104e81b93f0b7dfeca3ce6d690 Author: Kaija Gahm Date: Wed Oct 2 13:27:14 2024 -0700 added LOs from google doc, and corresponding questions --- .github/pull_request_template.md | 15 +- config.yaml | 15 +- episodes/3-identify-the-problem.Rmd | 279 ++++ episodes/3-identify-the-problem.md | 59 - .../5-minimal-reproducible-code-draft.Rmd | 166 +++ episodes/5-minimal-reproducible-code.Rmd | 436 ++++++ episodes/5-minimal-reproducible-code.md | 19 - episodes/6-asking-your-question.md | 8 +- renv/activate.R | 1305 +++++++++++++++++ renv/profile | 1 + renv/profiles/lesson-requirements/renv.lock | 1092 
++++++++++++++ .../lesson-requirements/renv/.gitignore | 7 + .../lesson-requirements/renv/settings.json | 19 + 13 files changed, 3328 insertions(+), 93 deletions(-) create mode 100644 episodes/3-identify-the-problem.Rmd delete mode 100644 episodes/3-identify-the-problem.md create mode 100644 episodes/5-minimal-reproducible-code-draft.Rmd create mode 100644 episodes/5-minimal-reproducible-code.Rmd delete mode 100644 episodes/5-minimal-reproducible-code.md create mode 100644 renv/activate.R create mode 100644 renv/profile create mode 100644 renv/profiles/lesson-requirements/renv.lock create mode 100644 renv/profiles/lesson-requirements/renv/.gitignore create mode 100644 renv/profiles/lesson-requirements/renv/settings.json diff --git a/.github/pull_request_template.md b/.github/pull_request_template.md index f7b3e630..1954bd1d 100644 --- a/.github/pull_request_template.md +++ b/.github/pull_request_template.md @@ -1,8 +1,13 @@ -If this pull request addresses an open issue on the repository, please add 'Closes #NN' below, where NN is the issue number. +----- TEMPLATE: PLEASE FILL IN ------ +# Description -Please briefly summarise the changes made in the pull request, and the reason(s) for making these changes. +*Please include a summary of the changes and the related issue. Please also include relevant motivation and context. List any dependencies that are required for this change.* -If any relevant discussions have taken place elsewhere, please provide links to these. +Fixes # [ISSUE NUMBER] -For review requests pertaining to specific episodes, please tag `xortizross` for lesson 4 (minimal reproducible data), `peterlaurin` for -lesson 3 (identify the problem) and `kaijagahm` for lesson 1 and 6. For other lessons, feel free to tag whomever. 
+# Tag maintainers +*Please tag the appropriate maintainer, depending on which episode your change pertains to* +**Identify the Problem**: tag @peterlaurin +**Minimal Reproducible Data**: tag @xortizross +**Minimal Reproducible Code**: tag @kaijagahm +**Anything else**: use your judgment! diff --git a/config.yaml b/config.yaml index 70f700b6..3fa071c6 100644 --- a/config.yaml +++ b/config.yaml @@ -1,3 +1,4 @@ + #------------------------------------------------------------ # Values for this lesson. #------------------------------------------------------------ @@ -58,26 +59,24 @@ contact: 'kgahm@ucla.edu' # - another-learner.md # Order of episodes in your lesson -episodes: +episodes: - 1-intro-reproducible-examples.md - 2-understanding-your-code.md -- 3-identify-the-problem.md +- 3-identify-the-problem.Rmd - 4-minimal-reproducible-data.Rmd -- 5-minimal-reproducible-code.md +- 5-minimal-reproducible-code.Rmd - 6-asking-your-question.md # Information for Learners -learners: +learners: # Information for Instructors -instructors: +instructors: # Learner Profiles -profiles: +profiles: # Customisation --------------------------------------------- # # This space below is where custom yaml items (e.g. pinning # sandpaper and varnish versions) should live - - diff --git a/episodes/3-identify-the-problem.Rmd b/episodes/3-identify-the-problem.Rmd new file mode 100644 index 00000000..eac1a5d5 --- /dev/null +++ b/episodes/3-identify-the-problem.Rmd @@ -0,0 +1,279 @@ + +--- +title: "Identify the problem and make a plan" +teaching: 0 +exercises: 0 +--- + +:::::::::::::::::::::::::::::::::::::: questions + +- What do I do when I encounter an error? +- What do I do when my code outputs something I don’t expect? +- Why do errors and warnings appear in R? +- Which areas of code are responsible for errors? +- How can I fix my code? What other options exist if I can't fix it? 
+ + +:::::::::::::::::::::::::::::::::::::::::::::::: + +::::::::::::::::::::::::::::::::::::: objectives + +After completing this episode, participants should be able to... + +- decode/describe what an error message is trying to communicate +- Identify specific lines and/or functions generating the error message +- Lookup function syntax, use, and examples using R Documentation (?help calls) +- Describe a general category of error message (e.g. syntax error, semantic errors, package-specific errors, etc.) # be more explicit about semantic errors? +- Describe the output of code you are seeking. ## identify relevant warnings or code output +- Identify and quickly fix commonly-encountered R errors ###### what was I thinking here +- Identify which problems are better suited for asking for further help, including online help and reprex + +:::::::::::::::::::::::::::::::::::::::::::::::: + + +```{r} +library(readr) +library(dplyr) +library(ggplot2) +library(stringr) + +# Read in the data +rodents <- read_csv("data/surveys_complete_77_89.csv") + +``` + + +As simple as it may seem, the first step we'll cover is what to do when encountering an error or other undesired output from your code. It is our experience that many seemingly-impossible errors can be fixed on the route to create a reproducible example for an expert helper. With this episode, we hope to teach you the basics about identifying how your error might be occurring, and how to isolate the problem for others to look at. + + +## 3.1 What do I do when I encounter an error? + +Something that may go wrong is an error in your code. By this, we mean any code which generates an error message. This happens when R is unable to run your code, for a variety of reasons: some common ones include R being unable to read or interpret your commands, expecting different input types than those inputted, and user-written errors from checks you or other package creators have added to ensure your code is running correctly. 
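To make the three causes above concrete, here is a minimal sketch that triggers one error of each kind and captures its message with `tryCatch()`. The helper `capture_error()` and the function `mean_weight()` are hypothetical names invented for this illustration; they are not part of the lesson's scripts.

```r
# Hypothetical helper: evaluate an expression and return its error message,
# or NA if the expression runs without an error.
capture_error <- function(expr) {
  tryCatch({
    expr
    NA_character_
  }, error = function(e) conditionMessage(e))
}

# 1. R cannot read the command: the closing parenthesis is missing.
msg_parse <- capture_error(parse(text = "mean(c(1, 2)"))

# 2. R expected a different input type: numbers cannot be added to text.
msg_type <- capture_error(1 + "a")

# 3. A user-written check: the (hypothetical) function author added a stop() call.
mean_weight <- function(x) {
  if (!is.numeric(x)) stop("`x` must be numeric")
  mean(x)
}
msg_check <- capture_error(mean_weight("heavy"))

print(msg_parse)
print(msg_type)
print(msg_check)
```

The exact wording of the first two messages depends on your R version and language settings; the third is whatever text the function's author chose to pass to `stop()`.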
+ +The accompanying error message attempts to tell you exactly how your code failed. For example, consider the following error message that occurs when I run this command in the R console: + +```{r,error=T} +ggplot(x = taxa) + geom_bar() +``` + +Though we know somewhere there is an object called `taxa` (it is actually a column of the dataset `rodents`), R is trying to communicate that it cannot find any such object in the local environment. Let's try again, appropriately pointing ggplot to the `rodents` dataset and `taxa` column using the `$` operator. + +```{r, error=T} +ggplot(aes(x = rodents$taxa)) + geom_bar() +``` + + + Whoops! Here we see another error message -- this time, R responds with a perhaps more-uninterpretable message. + +Let's go over each part briefly. First, we see an error from a function called `fortify`, which we didn't even call! Then a much more helpful informational message: Did we accidentally pass `aes()` to the `data` argument? This does seem to relate to our function call, as we do pass `aes`, potentially where our data should go. A helpful starting place when attempting to decipher an error message is checking the documentation for the function which caused the error: + +` ?ggplot` + +Here, a Help window pops up in RStudio which provides some more information. Skipping the general description at the top, we see ggplot takes positional arguments of `data`, **then** `mapping`, which uses the `aes` call. We can see in "Arguments" that the `aes(x = rodents$taxa)` object used in the plot is attempted by `fortify` to be converted to a data.frame: now the picture is clear! We accidentally passed our `mapping` argument (telling ggplot how to map variables to the plot) into the position it expected `data` in the form of a data frame. And if we scroll down to "Examples", to "Pattern 1", we can see exactly how ggplot expects these arguments in practice. 
Let's amend our result: + +```{r} +ggplot(rodents, aes(x = taxa)) + geom_bar() +``` + + +Here we see our desired plot. + + + +## Summary + +In general, when encountering an error message for which a remedy is not immediately apparent, some steps to take include: + +1. Reading each part of the error message, to see if we can interpret and act on any. + +2. Pulling up the R Documentation for that function (which may be specific to a package, such as with ggplot). + +3. Reading through the documentation's Description, Usage, Arguments, Details and Examples entries for greater insight into our error. + +4. Copying and pasting the error message into a search engine for more interpretable explanations. + +And, when all else fails, preparing our code into a reproducible example for expert help. + + +## 3.2 What do I do when my code outputs something I don't expect? + +Another type of 'error' which you may encounter is when your R code runs without errors, but does not produce the desired output. You may sometimes see these called "semantic errors" (as opposed to syntax errors, though these terms themselves are vague within computer science and describe a variety of different scenarios). As with actual R errors, semantic errors may occur for a variety of non-intuitive reasons, and are often harder to solve as there is no description of the error -- you must work out where your code is defective yourself! + +With our rodent analysis, the next step in the plan is to subset to just the `Rodent` taxa (as opposed to other taxa: Bird, Rabbit, Reptile or NA). Let's quickly check to see how much data we'd be throwing out by doing so: + +```{r} +table(rodents$taxa) +``` + +We're interested in the Rodents, and thankfully it seems like a majority of our observations will be maintained when subsetting to rodents. Except wait. In our plot, we can clearly see the presence of NA values. Why are we not seeing them here?
This is an example of a semantic error -- our command is executed correctly, but the output is not everything we intended. Having no error message to interpret, let's jump straight to the documentation: + +```{r} +?table +``` + +Here, the documentation provides some clues: there seems to be an argument called `useNA` that accepts "no", "ifany", and "always", but it's not immediately apparent which one we should use. As a second approach, let's go to `Examples` to see if we can find any quick fixes. Here we see a couple lines further down: + +```r +table(a) # does not report NA's +table(a, exclude = NULL) # reports NA's +``` + +That seems like it should be inclusive. Let's try again: + +```{r} +table(rodents$taxa, exclude = NULL) +``` +Now, we do see that by subsetting to the "Rodent" taxa, we are losing about 357 NAs, which themselves could be rodents! However, in this case, it seems a small enough portion to safely omit. Let's subset our data to the rodent taxa: + +```{r} +rodents <- rodents %>% filter(taxa == "Rodent") +``` + +## Summary + +In general, when encountering a semantic error for which a remedy is not immediately apparent, some steps to take include: + +1. Reading any warning or informational messages that may pop up when executing your code. + +2. Changing the input to your function call to see if the behavior is ... + +3. Pulling up the R Documentation for that function (which may be specific to a package, such as with ggplot). + +4. Reading through the documentation's Description, Usage, Arguments, Details and Examples entries for greater insight into our error. + +And, when all else fails, preparing our code into a reproducible example for expert help. Note that there are fewer options available than when an error message prevents your code from running. You may find yourself isolating and reproducing your problem more often with semantic errors than with easily solvable syntax errors.
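As a side note on the `useNA` argument mentioned in this section, the three options can be compared directly on a small vector. This is only a sketch on made-up values (the vectors `taxa` and `no_missing` below are hypothetical and are not the Portal data):

```r
# Made-up example values, including two missing entries
taxa <- c("Rodent", "Bird", NA, "Rodent", NA)

# Default behaviour (useNA = "no"): missing values are silently dropped
default_counts <- table(taxa)

# "ifany": an NA category is added only when missing values are present
ifany_counts <- table(taxa, useNA = "ifany")

# "always": an NA category is always shown, even when its count is zero
no_missing <- c("Rodent", "Bird")
always_counts <- table(no_missing, useNA = "always")

print(default_counts)
print(ifany_counts)
print(always_counts)
```

Comparing `sum(default_counts)` with `length(taxa)` is a quick way to notice that some observations were left out of a table.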
+ + +::::::::::::::::::::: callout + +Generally, the more your code deviates from base R functions, or the more you rely on specialized packages, the worse both the documentation and the online help available through search engines tend to get. While base R errors will often be solvable in a couple of minutes with a quick `?help` check or an existing discussion on a website like Stack Overflow, errors arising from little-used packages applied in bespoke analyses might merit isolating your specific problem to a reproducible example for online help, or even getting in touch with the developers! Such community input and questions are often the way packages and documentation improve over time. + +::::::::::::::::::::: + +## 3.3 How can I find where my code is failing? + +Isolating your problem may not be as simple as assessing the output from a single function call on the line of code which produces your error. Often, it may be difficult to determine which line(s) in your code are producing the error. + +Consider the example below, where we now are attempting to see which species of kangaroo rats appear in different plot types over the years. + + +```{r, error=T} +krats <- rodents %>% filter(genus == "Dipadomys") #kangaroo rat genus + +ggplot(krats, aes(year, fill=plot_type)) + +geom_histogram() + +facet_wrap(~species) + +``` + +Uh-oh. Another error here, this time in "combine_vars"? What is that? "Faceting variables must have at least one value": What does that mean? + +Well, it may be clear enough that we seem to be missing "species" values where we intend. Maybe we can try to make a different graph looking at what species values are present? Or perhaps there's an error earlier -- our safest approach may actually be seeing what krats looks like: + + +```{r} +krats +``` +It's empty! What went wrong with our "Dipadomys" filter? + +```{r} +rodents %>% count(genus) +``` +We see two things here.
For one, we've misspelled Dipodomys, which we can now amend. This quick function call also tells us we should expect a data frame with 9573 rows after subsetting to the genus Dipodomys. + +```{r} +krats <- rodents %>% filter(genus == "Dipodomys") #kangaroo rat genus +dim(krats) + +ggplot(krats, aes(year, fill=plot_type)) + +geom_histogram() + +facet_wrap(~species) + +``` + + +Our improved code here looks good. A quick "dim" call confirms we now have all the Dipodomys observations, and our plot is looking better. In general, having a 'print' statement or some other output before plots or other major steps can be a good way to check your code is producing intermediate results consistent with your expectations. + +However, there's something funky happening here. The bins are definitely weirdly spaced -- we can see some bins are not filled with any observations, while those exactly containing one of the integer years happen to contain all the observations for that year. + +:::::::: challenge + +As a group, name some potential problems or undesired outcomes from this graph... + +:::::::: + +:::::::: solution + + - The graph looks sparse, and unevenly so -- many bins have no observations + - Suggests that some years had more observations and others fewer based on somewhat arbitrary measurements (i.e. what calendar year happened to fall on) + - Hard to compare trends across time, or even subsequent years... + +:::::::: + +As we discussed in the challenge, there are some issues with visualizing our data this way. A solution here might be to tinker with the bin width in the histogram code, but let's step back a minute. Do we necessarily need to dive into the depths of tinkering with the plot? We can evaluate this problem not in terms of the plot having a problem, but with our data type having a problem.
There's an opportunity to encode the observation times outside of coarse, somewhat arbitrary year groupings with the real, interpretable date they were collected. Let's do that using the tidyverse's 'lubridate' package. The important details here are that we are creating a 'datetime'-type variable using the recorded years, months, and days, which are currently all encoded as numeric types. + +```{r} +krats <- rodents %>% filter(genus == "Dipodomys") #kangaroo rat genus +dim(krats) + +krats <- krats %>% mutate(date = lubridate::ymd(paste(year,month,day,sep='-'))) + +ggplot(krats, aes(date, fill=plot_type)) + +geom_histogram() + +facet_wrap(~species) + +``` + +This looks much better, and is easier to see the trends over time as well. Note our x axis still shows bins with year labelings, but the continuous spread of our data over these bins shows that dates are treated more continuously and fall more continuously within histogram bins. + +:::::::: callout +One aspect we can see with this exercise above is that by setting up a reproducible example, we can isolate the problem with data rather than simply asking a proximal problem (i.e. 'how can i change my plot to look like so'). This allows helpers and you to directly improve your code, but also allows the community to help in identifying the problem. You don't always need to understand what exact lines of code or function calls are going wrong in order to get help! +:::::::: + + +## Summary + +In general, we need to isolate the specific areas of code causing the bug or problem. There is no general rule of thumb as to how large this needs to be, but in general, think about what we would want to include in a reprex. Any early lines which we know run correctly and as intended may not need to be included, and we should seek to isolate the problem area as much as we can to make it understandable to others. + +Let's add to our list of steps for identifying the problem: + +0. 
Identify the problem area -- add print statements immediately upstream or downstream of problem areas, step into functions, and see whether any intermediate output can be further isolated. + +1. Read each part of the error or warning message (if applicable), to see if we can immediately interpret and act on any. + +2. Pulling up the R Documentation for any function calls causing the error (which may be specific to a package, such as with ggplot). + +3. Reading through the documentation's Description, Usage, Arguments, Details and Examples entries for greater insight into our error. + +4. Copying and pasting the error message into a search engine for more interpretable explanations. + +And, when all else fails, preparing our code into a reproducible example for expert help. + +Whereas before we had a list of steps for addressing spot problems arising in one or two lines, we can now organize identifying the problem into a more organizational workflow. Any step in the above that helps us identify the specific areas or aspects of our code that are failing in particular, we can zoom in on and restart the checklist. We can stop as soon as we don't understand anymore how our code fails, at which point we can excise that area for further help. + + + + +Finally, let's make our plot publication-ready by changing some aesthetics. Let's also add a vertical line to show when the study design changed on the exclosures. + +```{r} +krats <- rodents %>% filter(genus == "Dipodomys") #kangaroo rat genus +dim(krats) + +krats <- krats %>% mutate(date = lubridate::ymd(paste(year,month,day,sep='-'))) + + +krats %>% + ggplot(aes(x = date, fill = plot_type)) + + geom_histogram()+ + facet_wrap(~species, ncol = 1)+ + theme_bw()+ + scale_fill_viridis_d(option = "plasma")+ + geom_vline(aes(xintercept = lubridate::ymd("1988-01-01")), col = "dodgerblue") + +``` + +It looks like the study change helped to reduce merriami sightings in the Rodent and Short-term Krat exclosures. 
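The "identify the problem area" step from the checklist above can be sketched as a small check placed right after a filtering step. This is an illustration only: it uses base R and a made-up data frame (`toy`) rather than the lesson's `rodents` data and dplyr pipeline, and `check_not_empty()` is a hypothetical helper, not a function from any package.

```r
# Made-up stand-in for the rodents data
toy <- data.frame(
  genus = c("Dipodomys", "Dipodomys", "Perognathus"),
  year  = c(1977, 1978, 1978)
)

# Hypothetical helper: stop with a clear message if a data frame has no rows
check_not_empty <- function(df, label) {
  if (nrow(df) == 0) stop("No rows left after filtering: ", label)
  invisible(df)
}

# Misspelled genus: the filter itself runs fine but silently returns zero rows
krats_bad <- toy[toy$genus == "Dipadomys", ]
print(nrow(krats_bad))

# The check turns the silent problem into an immediate, readable error
bad_msg <- tryCatch(
  check_not_empty(krats_bad, "genus == 'Dipadomys'"),
  error = function(e) conditionMessage(e)
)
print(bad_msg)

# Correct spelling: the check passes and the data can be used downstream
krats_ok <- check_not_empty(toy[toy$genus == "Dipodomys", ], "genus == 'Dipodomys'")
print(nrow(krats_ok))
```

Failing at the filtering step points directly at the line that needs fixing, instead of at a later plotting call that merely receives the empty data.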
+ + + diff --git a/episodes/3-identify-the-problem.md b/episodes/3-identify-the-problem.md deleted file mode 100644 index 808d52bc..00000000 --- a/episodes/3-identify-the-problem.md +++ /dev/null @@ -1,59 +0,0 @@ - ---- -title: "Identify the problem and make a plan" -teaching: 0 -exercises: 0 ---- - -:::::::::::::::::::::::::::::::::::::: questions - -- What do I do when I encounter an error? -- What do I do when my code outputs something I don’t expect? -- Why do errors and warnings appear in R? -- Which areas of code are responsible for errors? -- How can I fix my code? What other options exist if I can't fix it? - - -:::::::::::::::::::::::::::::::::::::::::::::::: - -::::::::::::::::::::::::::::::::::::: objectives - -After completing this episode, participants should be able to... - -- decode/describe what an error message is trying to communicate -- Identify specific lines and/or functions generating the error message -- Lookup function syntax, use, and examples using R Documentation (?help calls) -- Describe the general category of error message (e.g. syntax error, semantic errors, package-specific errors, etc.) -- Describe the output of code you are seeking -- Identify and quickly fix commonly-encountered R errors -- Identify which problems are better suited for asking for further help, including online help and reprex - -:::::::::::::::::::::::::::::::::::::::::::::::: - - -:::::::::::::::::::::::::::::::::::::: challenge - -### Predict the output from a base R function call - -Which of the following results when running the following line of code: - -```r -length(5, 6, 7) -``` - -a. 3 -b. Error in length(5, 6, 7) : - 3 arguments passed to 'length' which requires 1 -c. NULL -d. 1, 1, 1 - -:::::::::::::: solution - -### Solution Title - -b. 
Error in length(5, 6, 7) : - 3 arguments passed to 'length' which requires 1 - -::::::::::::::::::::::::: - -:::::::::::::::::::::::::::::::::::::::::::::::::: diff --git a/episodes/5-minimal-reproducible-code-draft.Rmd b/episodes/5-minimal-reproducible-code-draft.Rmd new file mode 100644 index 00000000..25f6ef05 --- /dev/null +++ b/episodes/5-minimal-reproducible-code-draft.Rmd @@ -0,0 +1,166 @@ +--- +title: "Minimal Reproducible Code" +teaching: 10 +exercises: 2 +--- + +::: questions +- Which part of my code is causing an error message or an incorrect result? +- I want to make my code minimal, but where do I even start? +- How do I make non-reproducible code reproducible? +- How do I tell whether a code snippet is reproducible or not? +::: + +::: objectives +- Identify the step that is generating the error +- Implement a stepwise approach to make minimal code +- Edit a piece of code to make it reproducible +- Evaluate whether a piece of code is reproducible as is or not. If not, identify what is missing. +::: + +In the last episode, we focused on how to make minimal datasets that would reproduce a target piece of code, but we didn't talk much about the code itself. In each example, we already had a piece of code that needed a dataset. But how do you know which part of your code is the problem and should be focused on? + +At this point, we as researchers have been exploring our kangaroo rat data for a while and making several plots. Now we know how many kangaroo rats we can expect to catch, and we can move on to our second research question: Do the rodent exclusion plots actually work at keeping the target species out of certain areas? + +If the exclusion plots work, we would expect differences in abundance in the different plot types. Specifically, we'd expect to see fewer kangaroo rats overall in any of the exclusion plots than in the control.
+ +We start by making a plot to examine this + +```{r, include = F} +library(readr) +library(dplyr) +library(ggplot2) +library(stringr) +library(here) # XXX COME BACK TO THIS, SEE https://github.com/carpentries-incubator/R-help-reprexes/issues/61 +rodents <- read_csv(here("scripts/data/surveys_complete_77_89.csv")) +rodents <- rodents %>% + filter(taxa == "Rodent") +krats <- rodents %>% + filter(genus == "Dipodomys") +krats <- krats %>% + mutate(date = lubridate::ymd(paste(year, month, day, sep = "-"))) +krats <- krats %>% + mutate(time_period = ifelse(year < 1988, "early", "late")) +krats <- krats %>% + filter(species != "sp.") +krats_per_day <- krats %>% + group_by(date, year, species) %>% + summarize(n = n()) %>% + group_by(species) +``` + +To start figuring this out, we decide to make a plot of counts per day, per plot type, per species, per year. + +```{r} +counts_per_day <- krats %>% + group_by(year, plot_id, plot_type, month, day, species_id) %>% + summarize(count_per_day = n()) +``` + +Then, we use that information to visualize the distribution of counts per day in the different plot types to see if there's a difference overall. + +```{r} +counts_per_day %>% + ggplot(aes(x = plot_type, y = count_per_day, fill = species_id, group = interaction(plot_type, species_id)))+ + geom_boxplot(outlier.size = 0.5)+ + theme_minimal()+ + labs(title = "Kangaroo rat captures, all years", + x = "Plot type", + y = "Individuals per day", + fill = "Species") +``` + +Interestingly, we don't see a difference in the number of k-rats captured in the different plot types! We expected to catch more k-rats in the control plots (which don't exclude rodents) than in the various rodent exclosure plots, but the rates appear to be about the same. That doesn't bode well for the effectiveness of these experimental plots! + +You're really interested in this result, and you want to talk to your coworker about it. 
One of your coworkers, Taylor, asks to see the code you used to make this plot so that they can investigate it on their own. You say "No problem, I'll send the code over!" + +You send the following email: + +*Hi Taylor,* +*Here's the code I used to make that plot! I hope it works.* +``` +counts_per_day %>% + ggplot(aes(x = plot_type, y = count_per_day, fill = species_id, group = interaction(plot_type, species_id)))+ + geom_boxplot(outlier.size = 0.5)+ + theme_minimal()+ + labs(title = "Kangaroo rat captures, all years", + x = "Plot type", + y = "Individuals per day", + fill = "Species") +``` + +Unfortunately, Taylor soon writes back. +*Hey Sam,* +*That code didn't run properly for me. Maybe you need to include the data?* + +Of course! The data! That's important. +::: challenge +**Exercise:** On the Etherpad or in your own notes, identify the dataset that you'll need to send to Taylor so they can run your code. What are a few different ways that you could give him the data? What are the advantages or disadvantages to each? +::: +::: solution +XXX INSERT SOLUTION +::: + +You decide that instead of sending Taylor the modified file, you're going to send him more of the code so he can reproduce the entire thing himself. You look back over the code you've written so far. It's kind of messy! +XXX TODO: Add messy comments and other tangents to this script to make it long. +XXX TOOD: How do we show a script that includes line numbers? That will be important--they all need to have the same reference point for this challenge. +``` +# Note: your code might look a little different! That's okay. 
+library(readr) +library(dplyr) +library(ggplot2) +library(stringr) +library(here) # XXX COME BACK TO THIS, SEE https://github.com/carpentries-incubator/R-help-reprexes/issues/61 +rodents <- read_csv(here("scripts/data/surveys_complete_77_89.csv")) +rodents <- rodents %>% + filter(taxa == "Rodent") +krats <- rodents %>% + filter(genus == "Dipodomys") +krats <- krats %>% + mutate(date = lubridate::ymd(paste(year, month, day, sep = "-"))) +krats <- krats %>% + mutate(time_period = ifelse(year < 1988, "early", "late")) +krats <- krats %>% + filter(species != "sp.") +krats_per_day <- krats %>% + group_by(date, year, species) %>% + summarize(n = n()) %>% + group_by(species) +counts_per_day <- krats %>% + group_by(year, plot_id, plot_type, month, day, species_id) %>% + summarize(count_per_day = n()) +counts_per_day %>% + ggplot(aes(x = plot_type, y = count_per_day, fill = species_id, group = interaction(plot_type, species_id)))+ + geom_boxplot(outlier.size = 0.5)+ + theme_minimal()+ + labs(title = "Kangaroo rat captures, all years", + x = "Plot type", + y = "Individuals per day", + fill = "Species") +``` + +::: challenge +**Exercise:** Which lines of code in the script above does Taylor absolutely NEED to run in order to reproduce your plot? +Bonus: What are some lines of code that Taylor doesn't NEED to run but which might provide him useful extra context? + +(For this challenge, use the line numbers in the script above, even if your script looks slightly different. For extra practice, do the same exercise with your own script! Do you find any different answers?) +::: +::: solution +XXX INSERT SOLUTION, referencing line numbers +::: + +Taylor emails you back again: +*Hi Sam,* +*Thanks, that looks like it will probably work. Do I really have to install all those packages, though? I'm a little worried about running out of space on my computer.* + +You roll your eyes internally, but he's right--probably not all those packages are totally necessary!
+ +::: challenge +**Exercise:** Email Taylor back and tell him which packages he needs in order to run your code. +::: +::: solution +*Hi Taylor,* +*Good point. Yeah, you don't need all those packages. Some of them were from other parts of the code that I didn't include here, and there are some that I just forgot to remove when I stopped using them! You can just install {dplyr}, {ggplot2}, {here}, and {readr} and the code should run fine!* +*Sam* +::: + diff --git a/episodes/5-minimal-reproducible-code.Rmd b/episodes/5-minimal-reproducible-code.Rmd new file mode 100644 index 00000000..f8d4a851 --- /dev/null +++ b/episodes/5-minimal-reproducible-code.Rmd @@ -0,0 +1,436 @@ +--- +title: "Minimal Reproducible Code" +teaching: 40 +exercises: 35 +--- + +::: questions +- Why can't I just post my whole script? +- Which parts of my code are directly relevant to my problem? +- Which parts of my code are necessary in order for the problem area to run correctly? +- I feel overwhelmed by this script--where do I even start? +::: + +::: objectives +- Explain the value of a minimal code snippet. +- Simplify a script down to a minimal code example. +- Identify the problem area of the code. +- Identify supporting parts of the code that are essential to include. +- Identify pieces of code that can be removed without affecting the central functionality. +- Have a road map to follow to simplify your code. +::: + +You're excited by how much progress you're making in your research. You've made a lot of descriptive plots and gained some interesting insights into your data. Now you're excited to investigate whether the k-rat exclusion plots are actually working. You set to work writing a bunch of code to do this, using a combination of descriptive visualizations and linear models. + +So far, you've been saving all of your analysis in a script called "krat-analysis.R".
At this point, it looks something like this: +```{r "krat-analysis.R"} +#| eval: false +#| code-fold: true +# Kangaroo rat analysis using the Portal data +# Created by: Research McResearchface +# Last updated: 2024-11-22 + +# Load packages to use in this script +library(readr) +library(dplyr) +library(ggplot2) +library(stringr) + +# Read in the data +rodents <- read_csv("scripts/data/surveys_complete_77_89.csv") + +### DATA WRANGLING #### +glimpse(rodents) # or click on the environment +str(rodents) # an alternative that does the same thing +head(rodents) # or open fully with View() or click in environment + +table(rodents$taxa) + +# Abundance distribution of taxa +rodents %>% + ggplot(aes(x=taxa))+ + geom_bar() + +# Examine NA values +## How do we find NAs anyway? ---- +head(is.na(rodents$taxa)) # logical--tells us when an observation is an NA (T or F) + +# Not very helpful. BUT +sum(is.na(rodents$taxa)) # sum considers T = 1 and F = 0 + +# Simplify down to just rodents +rodents <- rodents %>% + filter(taxa == "Rodent") +glimpse(rodents) + +# Just kangaroo rats because this is what we are studying +krats <- rodents %>% + filter(genus == "Dipodomys") +dim(krats) # okay, so that's a lot smaller, great. +glimpse(krats) + +# Prep for time analysis +# To examine trends over time, we'll need to create a date column +krats <- krats %>% + mutate(date = lubridate::ymd(paste(year, month, day, sep = "-"))) + +# Examine differences in different time periods +krats <- krats %>% + mutate(time_period = ifelse(year < 1988, "early", "late")) + +# Check that this went through; check for NAs +table(krats$time_period, exclude = NULL) # learned how to do this earlier + +### QUESTION 1: How many k-rats over time in the past? ### +# How many kangaroo rats of each species were found at the study site in past years (so you know what to expect for a sample size this year)? 
+ +# Numbers over time by plot type +krats %>% + ggplot(aes(x = date, fill = plot_type)) + + geom_histogram()+ + facet_wrap(~species)+ + theme_bw()+ + scale_fill_viridis_d(option = "plasma")+ + geom_vline(aes(xintercept = lubridate::ymd("1988-01-01")), col = "dodgerblue") + +# Oops we gotta get rid of the unidentified k-rats +krats <- krats %>% + filter(species != "sp.") + +# Re-do the plot above +krats %>% + ggplot(aes(x = date, fill = plot_type)) + + geom_histogram()+ + facet_wrap(~species)+ + theme_bw()+ + scale_fill_viridis_d(option = "plasma")+ + geom_vline(aes(xintercept = lubridate::ymd("1988-01-01")), col = "dodgerblue") + +# How many individuals caught per day? +krats_per_day <- krats %>% + group_by(date, year, species) %>% + summarize(n = n()) %>% + group_by(species) + +krats_per_day %>% + ggplot(aes(x = species, y = n))+ + geom_boxplot(outlier.shape = NA)+ + geom_jitter(width = 0.2, alpha = 0.2, aes(col = year))+ + theme_classic()+ + ylab("Number per day")+ + xlab("Species") + +#### QUESTION 2: Do the k-rat exclusion plots work? ##### +# Do the k-rat exclusion plots work? (i.e. Does the abundance of each species differ by plot?) +# If the k-rat plots work, then we would expect: +# A. Fewer k-rats overall in any of the exclusion plots than in the control, with the fewest in the long-term k-rat exclusion plot +counts_per_day <- krats %>% + group_by(year, plot_id, plot_type, month, day, species_id) %>% + summarize(count_per_day = n()) + +counts_per_day %>% + ggplot(aes(x = plot_type, y = count_per_day, fill = species_id, group = interaction(plot_type, species_id)))+ + geom_boxplot(outlier.size = 0.5)+ + theme_minimal()+ + labs(title = "Kangaroo rat captures, all years", + x = "Plot type", + y = "Individuals per day", + fill = "Species") + +# B. For Spectabilis-specific exclosure, we expect a lower proportion of spectabilis there than in the other plots. 
+control_spectab <- krats %>% + filter(plot_type %in% c("Control", "Spectab exclosure")) + +prop_spectab <- control_spectab %>% + group_by(year, plot_type, species_id) %>% + summarize(total_count = n(), .groups = "drop_last") %>% + mutate(prop = total_count/sum(total_count)) %>% + filter(species_id == "DS") # keep only spectabilis + +prop_spectab %>% + ggplot(aes(x = year, y = prop, col = plot_type))+ + geom_point()+ + geom_line()+ + theme_minimal()+ + labs(title = "Spectab exclosures did not reduce proportion of\nspectab captures", + y = "Spectabilis proportion", + x = "Year", + color = "Plot type") + +#### MODELING #### +counts_mod <- lm(count_per_day ~ plot_type + species_id, data = counts_per_day) +summary(counts_mod) + +# with interaction term: +counts_mod_interact <- lm(count_per_day ~ plot_type*species_id, data = counts_per_day) +summary(counts_mod_interact) + +summary(counts_mod) +summary(counts_mod_interact) +``` + +## Why is it important to simplify code? + +Learning how to simplify your code is one of the most important parts of making a minimal reproducible example, asking others for help, and helping yourself. + +::::::::::::::::::::::::::::::::::::::::::: challenge +## Making sense of code + +Reflect on a time when you opened a coding project after a long time away from it. Or maybe you had to look through and try to run someone else's code. + +(If you have easy access to one of your past projects, maybe try opening it now and taking a look through it right now!) + +How do situations like this make you feel? Write some reflections on the Etherpad. + +This exercise should take about 5 minutes. +::::::::::::::::::::::::::::::::::::::::::: + +Debugging is a time when it's common to have to read through long and complex code (either your own or someone else's). That means that the person doing the debugging is likely to experience some of the emotions we just talked about. 
+ +The more we can reduce the negative emotions and make the experience of solving errors easy and painless, the likelier you are to find solutions to your problems (or convince others to take the time to help you). Helpers are doing us a favor--why put barriers in their way? + +Let's illustrate the importance of simplifying our code by focusing on an error in the big long analysis script we created, shown above. Let's imagine we're getting ready to show these preliminary results to our advisor, but when we re-run the whole script, we realize there's a problem. + +[DESCRIPTION OF PROBLEM HERE] + +## A road map for simplifying your code + +In this episode, we're going to walk through a road map for breaking your code down to its simplest form while making sure that 1) it still runs, and 2) it reproduces the problem you care about solving. + +For now, we'll go through this road map step by step. At the end, we'll review the whole thing. One takeaway from this lesson is that there is a step by step process to follow, and you can refer back to it if you feel lost in the future. + +### Step 0. Create a separate script + +When we know there's a problem with our script, it helps to start solving it by examining smaller parts of the code in a separate script, instead of editing the original. + +:::::::::::::::::::::::::::::::::::::challenge +## A separate place for minimal code + +Create a new, blank R script and give it a name, such as "reprex-script.R" + +There are several ways to make an R script +- File > New File > R Script +- Click the white square with a green plus sign at the top left corner of your RStudio window +- Use a keyboard shortcut: Cmd + Shift + N (on a Mac) or Ctrl + Shift + N (on Windows) + +Once you've created the script, click the Save button to name and save it. + +This exercise should take about 2 minutes. +::::::::::::::::::::::::::::::::::::::::::: + +### Step 1. 
Identify the problem area + +Now that we have a script, let's zero in on what's broken. + +First, we should use some of the techniques we learned in the "Identify the Problem" episode and see if they help us solve our error. + +[MORE CONTENT THAT CALLS BACK TO PL'S EPISODE HERE] + +In this particular case, though, we weren't able to completely resolve our error. + +[WHY? maybe because it's not an error but a case of "the plot isn't returning what we want"? Or maybe it's an extra difficult error message that we can't find an easy answer to? + +I need to figure out what error to introduce into the script in the first place... that will determine the justification to use here.] + +(*Using the plot example for now*) + +Okay, so we know that the plot doesn't look the way we want it to. Which part of the code created that plot? One way to figure this out if we're not sure is to step through the code line by line. + +:::::::::::::::::::::::::::::::::::::callout +## Stepping through code, line by line + +Placing your cursor on a line of code and using the keyboard shortcut Cmd + Enter (Mac) or Ctrl + Enter (Windows) will run that line of code *and* it will automatically advance your cursor to the next line. This makes it easy to "step through" your code without having to click or highlight. +::::::::::::::::::::::::::::::::::::::::::: + +Yay, we found the trouble spot! Let's go ahead and copy that line of code and paste it over into the empty script we created, "reprex-script.R". + +### Step 2. Give context: functions and packages + +R code consists primarily of *variables* and *functions*. + +::::::::::::::::::::::::::::::::::::::::::: challenge +## Where do functions come from? + +When coding in R, we use a lot of different functions. Where do those functions come from? How can we make sure that our helpers have access to those sources? Take a moment to brainstorm. + +This exercise should take about 3 minutes. 
+::::::::::::::::::::::::::::::::::::::::::: +::: solution +Functions in R typically come from packages. Some packages, such as `{base}` and `{stats}`, are loaded in R by default, so you might not have realized that they are packages too. + +You can see a complete list of functions in `{base}` and `{stats}` by running `library(help = "base")` or `library(help = "stats")`. + +Some functions might be user-defined. In that case, you'll need to make sure to include the function definition in your reprex. +::: + +::::::::::::::::::::::::::::::::::::::::::: callout +## Finding functions + +Sometimes it can be hard to figure out where a function comes from. Especially if a function comes from a package you use frequently, you might not remember where it comes from! + +You can search for a function in the help docs with `??fun` (where "fun" is the name of the function). To explicitly declare which package a function comes from, you can use a double colon `::`--for example, `dplyr::select()`. Declaring the function with a double colon also allows you to use that function even if the package is not loaded, as long as it's installed. +::::::::::::::::::::::::::::::::::::::::::: + +The quickest way to make sure others have access to the functions contained in packages is to include a `library()` call in your reprex, so they know to load the package too. + +::::::::::::::::::::::::::::::::::::::::::: challenge +## Which packages are essential? + +In each of the following code snippets, identify the necessary packages (or other code) to make the example reproducible. + +- [Example (including an ambiguous function: `dplyr::select()` is a good one because it masks `plyr::select()`)] +- [Example where you have to look up which package a function comes from] +- [Example with a user-defined function that doesn't exist in any package] + +This exercise should take about 5 minutes. 
+:::::::::::::::::::::::::::::::::::::::::::
+::: solution
+FIXME
+:::
+
+Looking through the problem area that we isolated, we can see that we'll need to load the following packages: FIXME
+- `{package}`
+- `{package}`
+- `{package}`
+
+Let's go ahead and add those as `library()` calls to the top of our script.
+
+::::::::::::::::::::::::::::::::::::::::::: callout
+## Installing vs. loading packages
+
+But what if our helper doesn't have all of these packages installed? Won't the code not be reproducible?
+
+Typically, we don't include `install.packages()` in our code for each of the packages that we include in the `library()` calls, because `install.packages()` is a one-time piece of code that doesn't need to be repeated every time the script is run. We assume that our helper will see `library(specialpackage)` and know that they need to go install "specialpackage" on their own.
+
+Technically, this makes that part of the code not reproducible! But it's also much more "polite". Our helper might have their own way of managing package versions, and forcing them to install a package when they run our code risks messing up their workflow. It is a common convention to stick with `library()` and let them figure it out from there.
+FIXME this feels over-explained... pare it down!
+:::::::::::::::::::::::::::::::::::::::::::
+
+::::::::::::::::::::::::::::::::::::::::::: callout
+## Installing packages conditionally
+
+There is an alternative approach to installing packages [insert content/example of the if(require()) thing--but note that explaining this properly requires explaining why require() is different from library(), why it returns a logical, etc. and is kind of a rabbit hole that I don't want to go down here.]
+:::::::::::::::::::::::::::::::::::::::::::
+
+### Step 3. Give context: variables and datasets
+
+Isolating the problem area and loading the necessary packages and functions was an important step to making our example code self-contained.
But we're still not done making the code minimal and reproducible. Almost certainly, our code snippet relies on variables, such as datasets, that our helper won't have access to. + +The piece of code that we copied over came from line [LINE NUMBER] of our analysis script. We had done a lot of analyses before then, including modifying datasets and creating intermediate objects/variables. + +Our code snippet depends on all those previous steps, so when we isolate it in a new script, it might not be able to run anymore. More importantly, when a helper doesn't have access to the rest of our script, the code might not run for them either. + +To fix this, we need to provide some additional context around our reprex so that it runs. + +::::::::::::::::::::::::::::::::::::::::::: challenge +## Identifying variables + +For each of the following code snippets, identify all the variables used + +- [Straightforward example] +- [Example where they use a built-in dataset but it contains a column that that dataset doesn't actually contain, i.e. because it's been modified previously. Might be good to use the `date` column that we put into `krats` for this] + +This exercise should take about 5 minutes. +::::::::::::::::::::::::::::::::::::::::::: +::: solution +FIXME +::: + +As you might have noticed, identifying these variables isn't always straightforward. Sometimes variables depend on other variables, and before you know it, you end up needing the entire script. + +Let's work together as a group to sketch out which variables depend on which others. A helpful way to do this is to start with the variables included in our code snippet and ask, for each one, "Where did this come from?" + +[Make a big dependency graph. The point is to illustrate that it gets very long and you can't always rely on this process to identify a simple way to include the needed variables.] + +How can we make sure that helpers can access these objects too, without providing them the entire long script? 
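
One way to start answering "Where did this come from?" is to let R list the names a snippet refers to. The sketch below uses base R's `all.vars()` on a quoted copy of the first histogram call from the analysis script. It is only a starting point: `all.vars()` can't tell objects like `krats` apart from column names like `date`, so the result still has to be interpreted by hand.

```r
# Quote the snippet so R stores it without running it
# (no packages need to be loaded for this step).
snippet <- quote(
  krats %>%
    ggplot(aes(x = date, fill = plot_type)) +
    geom_histogram()
)

# List every non-function name the snippet refers to.
# This mixes objects (krats) with column names (date, plot_type).
needed <- all.vars(snippet)
needed

# Which of those names are defined in the global environment?
# inherits = FALSE keeps R from finding unrelated objects elsewhere,
# such as the base function date().
defined <- vapply(
  needed,
  function(v) exists(v, envir = globalenv(), inherits = FALSE),
  logical(1)
)
defined
```

In a fresh R session every entry of `defined` is `FALSE`. Any name that is not a column of some other object then has to be created somewhere in the reprex.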
+
+Theoretically, we could meticulously trace each object back and make sure to include the code to create all of its predecessors from the original data, which we would provide to our helper. But pretty soon, we might find that we're just giving the helper the original (long, complicated) script!
+
+As with other types of writing, creating a good minimal reprex takes hard work and time.
+
+> "I would have written a shorter letter, but I did not have the time."
+>
+> - Blaise Pascal, *Lettres Provinciales*, 1657
+
+
+::::::::::::::::::::::::::::::::::::::::::: callout
+## Computational reproducibility
+
+Every object should be able to map back to either a file, a built-in dataset in R, or another intermediate step. If you found any variables where you weren't able to answer the "Where did this come from?" question, then that's a problem! Did you build a script that mistakenly relied on an object that was in your environment but was never properly defined?
+
+Mapping exercises like this can be a great way to check whether your entire script is reproducible. Reproducibility is important in more cases than just debugging! More and more journals are requiring full analysis code to be posted, and if that code isn't reproducible, it will severely hamper other researchers' efforts to confirm and expand on what you've done.
+
+Various packages can help you keep track of your code and make it more reproducible. Check out the [`{targets}`](https://books.ropensci.org/targets/) and [`{renv}`](https://rstudio.github.io/renv/articles/renv.html) packages in particular if you're interested in learning more.
+:::::::::::::::::::::::::::::::::::::::::::
+
+Luckily, we can make our lives easier if we realize that helpers don't always need the exact same variables and datasets, just reasonably good stand-ins. Let's think back to the last episode, where we talked about different ways to create minimal reproducible datasets.
We can lean on those skills here to make our example reproducible and greatly reduce the amount of code that we need to include. + +:::::::::::::::::::::::::::::::::::::::::::challenge +## Incorporating minimal datasets + +Brainstorm some places in our reprex where you could use minimal reproducible data to make your problem area code snippet reproducible. + +Which of the techniques from the [data episode](LINK TO DATA EPISODE) will you choose in each case, and why? + +This exercise should take about 5 minutes. +::::::::::::::::::::::::::::::::::::::::::: +::: solution +FIXME +::: + +_**Using a minimal dataset simplifies not just your data but also your code, because it lets you avoid including data wrangling steps in your reprex!**_ + +### Step 4. Simplify + +We're almost done! Now we have code that runs because it includes the necessary `library()` calls and makes use of minimal datasets that still allow us to showcase the problem. Our script is almost ready to send to our helpers. + +But reading someone else's code can be slow! We want to make it very, very easy for our helper to see which part of the code is important to focus on. Let's see if there are any places where we can trim code down even more to eliminate distractions. + +Often, analysis code contains exploratory steps or other analyses that don't directly relate to the problem, such as calls to `head()`, `View()`, `str()`, or similar functions. (Exception: if you're using these directly to show things like dimension changes that help to illustrate the problem). + +Some other common examples are exploratory analyses, extra formatting added to plots, and [ANOTHER EXAMPLE]. + +When cutting these things, we have to be careful not to remove anything that would cause the code to no longer reproduce our problem. In general, it's a good idea to comment out the line you think is extraneous, re-run the code, and check that the focal problem persists before removing it entirely. 
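
As a sketch of that comment-out-and-re-run habit, here is the histogram from the analysis script with its exploratory and formatting lines commented out. `krats_mini` and its values are made up for illustration (they are not Portal data), and `bins = 10` is added only to avoid the default-bins message. After each change, re-run the block and confirm that whatever problem you are chasing still shows up.

```r
library(ggplot2)

# Hypothetical stand-in for `krats`; the values are invented.
krats_mini <- data.frame(
  date = as.Date(c("1987-06-01", "1987-06-02", "1988-03-10", "1988-03-11")),
  species = c("merriami", "ordii", "merriami", "ordii"),
  plot_type = c("Control", "Control", "Spectab exclosure", "Control")
)

# str(krats_mini)  # exploratory check: commented out, not needed for the plot

# The plot call ends at facet_wrap(); the layers below it are commented out.
p <- ggplot(krats_mini, aes(x = date, fill = plot_type)) +
  geom_histogram(bins = 10) +
  facet_wrap(~species)
  # theme_bw() +                             # formatting only
  # scale_fill_viridis_d(option = "plasma")  # formatting only
p
```

If the problem disappears after commenting out a line, that line was part of the problem and belongs back in the reprex.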
+ +:::::::::::::::::::::::::::::::::::::::::::challenge +## Trimming down the bells and whistles + +[Ex: removing various things, observing what happens, identifying whether or not we care about those things. (Need to include at least one that's tricky, like maybe it does change the actual values but it doesn't change their relationship to each other)] + +This exercise should take about 5 minutes. +::::::::::::::::::::::::::::::::::::::::::: +::: solution +FIXME +::: + +Great work! We've created a minimal reproducible example. In the next episode, we'll learn about `{reprex}`, a package that can help us double-check that our example is reproducible by running it in a clean environment. (As an added bonus, `{reprex}` will format our example nicely so it's easy to post to places like Slack, GitHub, and StackOverflow.) + +More on that soon. For now, let's review the road map that we just practiced. + +## Road map review +### Step 0. Create a separate script + - It helps to have a separate place to work on your minimal code snippet. + +### Step 1. Identify the problem area + - Which part of the code is causing the problem? Move it over to the reprex script so we can focus on it. + +### Step 2. Give context: functions and packages + - Make sure that helpers have access to all the functions they'll need to run your code snippet. + +### Step 3. Give context: variables and datasets + - Make sure that helpers have access to all the variables they'll need to run your code snippet, or reasonable stand-ins. + +### Step 4. Simplify + - Remove any extra code that isn't absolutely necessary to demonstrate your problem. + +:::::::::::::::::::::::::::::::::::::::::::challenge +## Reflection + +Let's take a moment to reflect on this process. + +- What's one thing you learned in this episode? An insight; a new skill; a process? + +- What is one thing you're still confused about? What questions do you have? + +This exercise should take about 5 minutes. 
+::::::::::::::::::::::::::::::::::::::::::: diff --git a/episodes/5-minimal-reproducible-code.md b/episodes/5-minimal-reproducible-code.md deleted file mode 100644 index 790b9638..00000000 --- a/episodes/5-minimal-reproducible-code.md +++ /dev/null @@ -1,19 +0,0 @@ ---- -title: "Minimal Reproducible Code" -teaching: 10 -exercises: 2 ---- - -:::::::::::::::::::::::::::::::::::::: questions - -- To do - -:::::::::::::::::::::::::::::::::::::::::::::::: - -::::::::::::::::::::::::::::::::::::: objectives - -- To do - -:::::::::::::::::::::::::::::::::::::::::::::::: - -Idea: take some of the same exercises from the understanding your code section and use those to get people to break down code into minimal chunks. Emphasize that it's not always going to be linear ( diff --git a/episodes/6-asking-your-question.md b/episodes/6-asking-your-question.md index 77134cf2..b6010c0e 100644 --- a/episodes/6-asking-your-question.md +++ b/episodes/6-asking-your-question.md @@ -6,6 +6,7 @@ exercises: 2 :::::::::::::::::::::::::::::::::::::: questions +- How can I make sure my minimal reproducible example will actually run correctly for someone else? - How can I easily share a reproducible example with a mentor or helper, or online? - How do I ask a good question? @@ -13,13 +14,16 @@ exercises: 2 ::::::::::::::::::::::::::::::::::::: objectives -- To do +- Use the reprex package to support making reproducible examples. +- Use the reprex package to format reprexes for posting online. +- Understand the benefits and drawbacks of different places to get help. +- Have a road map to follow when posting a question to make sure it's a good question. :::::::::::::::::::::::::::::::::::::::::::::::: - ::::::::::::::::::::::::::::::::::::: keypoints - The {reprex} package makes it easy to format and share your reproducible examples. +- Following a certain set of steps will make your questions clearer and likelier to get answered. 
:::::::::::::::::::::::::::::::::::::::::::::::: diff --git a/renv/activate.R b/renv/activate.R new file mode 100644 index 00000000..0eb51088 --- /dev/null +++ b/renv/activate.R @@ -0,0 +1,1305 @@ + +local({ + + # the requested version of renv + version <- "1.0.11" + attr(version, "sha") <- NULL + + # the project directory + project <- Sys.getenv("RENV_PROJECT") + if (!nzchar(project)) + project <- getwd() + + # use start-up diagnostics if enabled + diagnostics <- Sys.getenv("RENV_STARTUP_DIAGNOSTICS", unset = "FALSE") + if (diagnostics) { + start <- Sys.time() + profile <- tempfile("renv-startup-", fileext = ".Rprof") + utils::Rprof(profile) + on.exit({ + utils::Rprof(NULL) + elapsed <- signif(difftime(Sys.time(), start, units = "auto"), digits = 2L) + writeLines(sprintf("- renv took %s to run the autoloader.", format(elapsed))) + writeLines(sprintf("- Profile: %s", profile)) + print(utils::summaryRprof(profile)) + }, add = TRUE) + } + + # figure out whether the autoloader is enabled + enabled <- local({ + + # first, check config option + override <- getOption("renv.config.autoloader.enabled") + if (!is.null(override)) + return(override) + + # if we're being run in a context where R_LIBS is already set, + # don't load -- presumably we're being run as a sub-process and + # the parent process has already set up library paths for us + rcmd <- Sys.getenv("R_CMD", unset = NA) + rlibs <- Sys.getenv("R_LIBS", unset = NA) + if (!is.na(rlibs) && !is.na(rcmd)) + return(FALSE) + + # next, check environment variables + # TODO: prefer using the configuration one in the future + envvars <- c( + "RENV_CONFIG_AUTOLOADER_ENABLED", + "RENV_AUTOLOADER_ENABLED", + "RENV_ACTIVATE_PROJECT" + ) + + for (envvar in envvars) { + envval <- Sys.getenv(envvar, unset = NA) + if (!is.na(envval)) + return(tolower(envval) %in% c("true", "t", "1")) + } + + # enable by default + TRUE + + }) + + # bail if we're not enabled + if (!enabled) { + + # if we're not enabled, we might still need to manually 
load + # the user profile here + profile <- Sys.getenv("R_PROFILE_USER", unset = "~/.Rprofile") + if (file.exists(profile)) { + cfg <- Sys.getenv("RENV_CONFIG_USER_PROFILE", unset = "TRUE") + if (tolower(cfg) %in% c("true", "t", "1")) + sys.source(profile, envir = globalenv()) + } + + return(FALSE) + + } + + # avoid recursion + if (identical(getOption("renv.autoloader.running"), TRUE)) { + warning("ignoring recursive attempt to run renv autoloader") + return(invisible(TRUE)) + } + + # signal that we're loading renv during R startup + options(renv.autoloader.running = TRUE) + on.exit(options(renv.autoloader.running = NULL), add = TRUE) + + # signal that we've consented to use renv + options(renv.consent = TRUE) + + # load the 'utils' package eagerly -- this ensures that renv shims, which + # mask 'utils' packages, will come first on the search path + library(utils, lib.loc = .Library) + + # unload renv if it's already been loaded + if ("renv" %in% loadedNamespaces()) + unloadNamespace("renv") + + # load bootstrap tools + ansify <- function(text) { + if (renv_ansify_enabled()) + renv_ansify_enhanced(text) + else + renv_ansify_default(text) + } + + renv_ansify_enabled <- function() { + + override <- Sys.getenv("RENV_ANSIFY_ENABLED", unset = NA) + if (!is.na(override)) + return(as.logical(override)) + + pane <- Sys.getenv("RSTUDIO_CHILD_PROCESS_PANE", unset = NA) + if (identical(pane, "build")) + return(FALSE) + + testthat <- Sys.getenv("TESTTHAT", unset = "false") + if (tolower(testthat) %in% "true") + return(FALSE) + + iderun <- Sys.getenv("R_CLI_HAS_HYPERLINK_IDE_RUN", unset = "false") + if (tolower(iderun) %in% "false") + return(FALSE) + + TRUE + + } + + renv_ansify_default <- function(text) { + text + } + + renv_ansify_enhanced <- function(text) { + + # R help links + pattern <- "`\\?(renv::(?:[^`])+)`" + replacement <- "`\033]8;;ide:help:\\1\a?\\1\033]8;;\a`" + text <- gsub(pattern, replacement, text, perl = TRUE) + + # runnable code + pattern <- 
"`(renv::(?:[^`])+)`" + replacement <- "`\033]8;;ide:run:\\1\a\\1\033]8;;\a`" + text <- gsub(pattern, replacement, text, perl = TRUE) + + # return ansified text + text + + } + + renv_ansify_init <- function() { + + envir <- renv_envir_self() + if (renv_ansify_enabled()) + assign("ansify", renv_ansify_enhanced, envir = envir) + else + assign("ansify", renv_ansify_default, envir = envir) + + } + + `%||%` <- function(x, y) { + if (is.null(x)) y else x + } + + catf <- function(fmt, ..., appendLF = TRUE) { + + quiet <- getOption("renv.bootstrap.quiet", default = FALSE) + if (quiet) + return(invisible()) + + msg <- sprintf(fmt, ...) + cat(msg, file = stdout(), sep = if (appendLF) "\n" else "") + + invisible(msg) + + } + + header <- function(label, + ..., + prefix = "#", + suffix = "-", + n = min(getOption("width"), 78)) + { + label <- sprintf(label, ...) + n <- max(n - nchar(label) - nchar(prefix) - 2L, 8L) + if (n <= 0) + return(paste(prefix, label)) + + tail <- paste(rep.int(suffix, n), collapse = "") + paste0(prefix, " ", label, " ", tail) + + } + + heredoc <- function(text, leave = 0) { + + # remove leading, trailing whitespace + trimmed <- gsub("^\\s*\\n|\\n\\s*$", "", text) + + # split into lines + lines <- strsplit(trimmed, "\n", fixed = TRUE)[[1L]] + + # compute common indent + indent <- regexpr("[^[:space:]]", lines) + common <- min(setdiff(indent, -1L)) - leave + text <- paste(substring(lines, common), collapse = "\n") + + # substitute in ANSI links for executable renv code + ansify(text) + + } + + startswith <- function(string, prefix) { + substring(string, 1, nchar(prefix)) == prefix + } + + bootstrap <- function(version, library) { + + friendly <- renv_bootstrap_version_friendly(version) + section <- header(sprintf("Bootstrapping renv %s", friendly)) + catf(section) + + # attempt to download renv + catf("- Downloading renv ... 
", appendLF = FALSE) + withCallingHandlers( + tarball <- renv_bootstrap_download(version), + error = function(err) { + catf("FAILED") + stop("failed to download:\n", conditionMessage(err)) + } + ) + catf("OK") + on.exit(unlink(tarball), add = TRUE) + + # now attempt to install + catf("- Installing renv ... ", appendLF = FALSE) + withCallingHandlers( + status <- renv_bootstrap_install(version, tarball, library), + error = function(err) { + catf("FAILED") + stop("failed to install:\n", conditionMessage(err)) + } + ) + catf("OK") + + # add empty line to break up bootstrapping from normal output + catf("") + + return(invisible()) + } + + renv_bootstrap_tests_running <- function() { + getOption("renv.tests.running", default = FALSE) + } + + renv_bootstrap_repos <- function() { + + # get CRAN repository + cran <- getOption("renv.repos.cran", "https://cloud.r-project.org") + + # check for repos override + repos <- Sys.getenv("RENV_CONFIG_REPOS_OVERRIDE", unset = NA) + if (!is.na(repos)) { + + # check for RSPM; if set, use a fallback repository for renv + rspm <- Sys.getenv("RSPM", unset = NA) + if (identical(rspm, repos)) + repos <- c(RSPM = rspm, CRAN = cran) + + return(repos) + + } + + # check for lockfile repositories + repos <- tryCatch(renv_bootstrap_repos_lockfile(), error = identity) + if (!inherits(repos, "error") && length(repos)) + return(repos) + + # retrieve current repos + repos <- getOption("repos") + + # ensure @CRAN@ entries are resolved + repos[repos == "@CRAN@"] <- cran + + # add in renv.bootstrap.repos if set + default <- c(FALLBACK = "https://cloud.r-project.org") + extra <- getOption("renv.bootstrap.repos", default = default) + repos <- c(repos, extra) + + # remove duplicates that might've snuck in + dupes <- duplicated(repos) | duplicated(names(repos)) + repos[!dupes] + + } + + renv_bootstrap_repos_lockfile <- function() { + + lockpath <- Sys.getenv("RENV_PATHS_LOCKFILE", unset = "renv.lock") + if (!file.exists(lockpath)) + return(NULL) + + lockfile 
<- tryCatch(renv_json_read(lockpath), error = identity) + if (inherits(lockfile, "error")) { + warning(lockfile) + return(NULL) + } + + repos <- lockfile$R$Repositories + if (length(repos) == 0) + return(NULL) + + keys <- vapply(repos, `[[`, "Name", FUN.VALUE = character(1)) + vals <- vapply(repos, `[[`, "URL", FUN.VALUE = character(1)) + names(vals) <- keys + + return(vals) + + } + + renv_bootstrap_download <- function(version) { + + sha <- attr(version, "sha", exact = TRUE) + + methods <- if (!is.null(sha)) { + + # attempting to bootstrap a development version of renv + c( + function() renv_bootstrap_download_tarball(sha), + function() renv_bootstrap_download_github(sha) + ) + + } else { + + # attempting to bootstrap a release version of renv + c( + function() renv_bootstrap_download_tarball(version), + function() renv_bootstrap_download_cran_latest(version), + function() renv_bootstrap_download_cran_archive(version) + ) + + } + + for (method in methods) { + path <- tryCatch(method(), error = identity) + if (is.character(path) && file.exists(path)) + return(path) + } + + stop("All download methods failed") + + } + + renv_bootstrap_download_impl <- function(url, destfile) { + + mode <- "wb" + + # https://bugs.r-project.org/bugzilla/show_bug.cgi?id=17715 + fixup <- + Sys.info()[["sysname"]] == "Windows" && + substring(url, 1L, 5L) == "file:" + + if (fixup) + mode <- "w+b" + + args <- list( + url = url, + destfile = destfile, + mode = mode, + quiet = TRUE + ) + + if ("headers" %in% names(formals(utils::download.file))) { + headers <- renv_bootstrap_download_custom_headers(url) + if (length(headers) && is.character(headers)) + args$headers <- headers + } + + do.call(utils::download.file, args) + + } + + renv_bootstrap_download_custom_headers <- function(url) { + + headers <- getOption("renv.download.headers") + if (is.null(headers)) + return(character()) + + if (!is.function(headers)) + stopf("'renv.download.headers' is not a function") + + headers <- headers(url) + 
if (length(headers) == 0L) + return(character()) + + if (is.list(headers)) + headers <- unlist(headers, recursive = FALSE, use.names = TRUE) + + ok <- + is.character(headers) && + is.character(names(headers)) && + all(nzchar(names(headers))) + + if (!ok) + stop("invocation of 'renv.download.headers' did not return a named character vector") + + headers + + } + + renv_bootstrap_download_cran_latest <- function(version) { + + spec <- renv_bootstrap_download_cran_latest_find(version) + type <- spec$type + repos <- spec$repos + + baseurl <- utils::contrib.url(repos = repos, type = type) + ext <- if (identical(type, "source")) + ".tar.gz" + else if (Sys.info()[["sysname"]] == "Windows") + ".zip" + else + ".tgz" + name <- sprintf("renv_%s%s", version, ext) + url <- paste(baseurl, name, sep = "/") + + destfile <- file.path(tempdir(), name) + status <- tryCatch( + renv_bootstrap_download_impl(url, destfile), + condition = identity + ) + + if (inherits(status, "condition")) + return(FALSE) + + # report success and return + destfile + + } + + renv_bootstrap_download_cran_latest_find <- function(version) { + + # check whether binaries are supported on this system + binary <- + getOption("renv.bootstrap.binary", default = TRUE) && + !identical(.Platform$pkgType, "source") && + !identical(getOption("pkgType"), "source") && + Sys.info()[["sysname"]] %in% c("Darwin", "Windows") + + types <- c(if (binary) "binary", "source") + + # iterate over types + repositories + for (type in types) { + for (repos in renv_bootstrap_repos()) { + + # build arguments for utils::available.packages() call + args <- list(type = type, repos = repos) + + # add custom headers if available -- note that + # utils::available.packages() will pass this to download.file() + if ("headers" %in% names(formals(utils::download.file))) { + headers <- renv_bootstrap_download_custom_headers(repos) + if (length(headers) && is.character(headers)) + args$headers <- headers + } + + # retrieve package database + db <- 
tryCatch( + as.data.frame( + do.call(utils::available.packages, args), + stringsAsFactors = FALSE + ), + error = identity + ) + + if (inherits(db, "error")) + next + + # check for compatible entry + entry <- db[db$Package %in% "renv" & db$Version %in% version, ] + if (nrow(entry) == 0) + next + + # found it; return spec to caller + spec <- list(entry = entry, type = type, repos = repos) + return(spec) + + } + } + + # if we got here, we failed to find renv + fmt <- "renv %s is not available from your declared package repositories" + stop(sprintf(fmt, version)) + + } + + renv_bootstrap_download_cran_archive <- function(version) { + + name <- sprintf("renv_%s.tar.gz", version) + repos <- renv_bootstrap_repos() + urls <- file.path(repos, "src/contrib/Archive/renv", name) + destfile <- file.path(tempdir(), name) + + for (url in urls) { + + status <- tryCatch( + renv_bootstrap_download_impl(url, destfile), + condition = identity + ) + + if (identical(status, 0L)) + return(destfile) + + } + + return(FALSE) + + } + + renv_bootstrap_download_tarball <- function(version) { + + # if the user has provided the path to a tarball via + # an environment variable, then use it + tarball <- Sys.getenv("RENV_BOOTSTRAP_TARBALL", unset = NA) + if (is.na(tarball)) + return() + + # allow directories + if (dir.exists(tarball)) { + name <- sprintf("renv_%s.tar.gz", version) + tarball <- file.path(tarball, name) + } + + # bail if it doesn't exist + if (!file.exists(tarball)) { + + # let the user know we weren't able to honour their request + fmt <- "- RENV_BOOTSTRAP_TARBALL is set (%s) but does not exist." 
+ msg <- sprintf(fmt, tarball) + warning(msg) + + # bail + return() + + } + + catf("- Using local tarball '%s'.", tarball) + tarball + + } + + renv_bootstrap_github_token <- function() { + for (envvar in c("GITHUB_TOKEN", "GITHUB_PAT", "GH_TOKEN")) { + envval <- Sys.getenv(envvar, unset = NA) + if (!is.na(envval)) + return(envval) + } + } + + renv_bootstrap_download_github <- function(version) { + + enabled <- Sys.getenv("RENV_BOOTSTRAP_FROM_GITHUB", unset = "TRUE") + if (!identical(enabled, "TRUE")) + return(FALSE) + + # prepare download options + token <- renv_bootstrap_github_token() + if (nzchar(Sys.which("curl")) && nzchar(token)) { + fmt <- "--location --fail --header \"Authorization: token %s\"" + extra <- sprintf(fmt, token) + saved <- options("download.file.method", "download.file.extra") + options(download.file.method = "curl", download.file.extra = extra) + on.exit(do.call(base::options, saved), add = TRUE) + } else if (nzchar(Sys.which("wget")) && nzchar(token)) { + fmt <- "--header=\"Authorization: token %s\"" + extra <- sprintf(fmt, token) + saved <- options("download.file.method", "download.file.extra") + options(download.file.method = "wget", download.file.extra = extra) + on.exit(do.call(base::options, saved), add = TRUE) + } + + url <- file.path("https://api.github.com/repos/rstudio/renv/tarball", version) + name <- sprintf("renv_%s.tar.gz", version) + destfile <- file.path(tempdir(), name) + + status <- tryCatch( + renv_bootstrap_download_impl(url, destfile), + condition = identity + ) + + if (!identical(status, 0L)) + return(FALSE) + + renv_bootstrap_download_augment(destfile) + + return(destfile) + + } + + # Add Sha to DESCRIPTION. This is stop gap until #890, after which we + # can use renv::install() to fully capture metadata. 
+ renv_bootstrap_download_augment <- function(destfile) { + sha <- renv_bootstrap_git_extract_sha1_tar(destfile) + if (is.null(sha)) { + return() + } + + # Untar + tempdir <- tempfile("renv-github-") + on.exit(unlink(tempdir, recursive = TRUE), add = TRUE) + untar(destfile, exdir = tempdir) + pkgdir <- dir(tempdir, full.names = TRUE)[[1]] + + # Modify description + desc_path <- file.path(pkgdir, "DESCRIPTION") + desc_lines <- readLines(desc_path) + remotes_fields <- c( + "RemoteType: github", + "RemoteHost: api.github.com", + "RemoteRepo: renv", + "RemoteUsername: rstudio", + "RemotePkgRef: rstudio/renv", + paste("RemoteRef: ", sha), + paste("RemoteSha: ", sha) + ) + writeLines(c(desc_lines[desc_lines != ""], remotes_fields), con = desc_path) + + # Re-tar + local({ + old <- setwd(tempdir) + on.exit(setwd(old), add = TRUE) + + tar(destfile, compression = "gzip") + }) + invisible() + } + + # Extract the commit hash from a git archive. Git archives include the SHA1 + # hash as the comment field of the tarball pax extended header + # (see https://www.kernel.org/pub/software/scm/git/docs/git-archive.html) + # For GitHub archives this should be the first header after the default one + # (512 byte) header. 
+ renv_bootstrap_git_extract_sha1_tar <- function(bundle) { + + # open the bundle for reading + # We use gzcon for everything because (from ?gzcon) + # > Reading from a connection which does not supply a 'gzip' magic + # > header is equivalent to reading from the original connection + conn <- gzcon(file(bundle, open = "rb", raw = TRUE)) + on.exit(close(conn)) + + # The default pax header is 512 bytes long and the first pax extended header + # with the comment should be 51 bytes long + # `52 comment=` (11 chars) + 40 byte SHA1 hash + len <- 0x200 + 0x33 + res <- rawToChar(readBin(conn, "raw", n = len)[0x201:len]) + + if (grepl("^52 comment=", res)) { + sub("52 comment=", "", res) + } else { + NULL + } + } + + renv_bootstrap_install <- function(version, tarball, library) { + + # attempt to install it into project library + dir.create(library, showWarnings = FALSE, recursive = TRUE) + output <- renv_bootstrap_install_impl(library, tarball) + + # check for successful install + status <- attr(output, "status") + if (is.null(status) || identical(status, 0L)) + return(status) + + # an error occurred; report it + header <- "installation of renv failed" + lines <- paste(rep.int("=", nchar(header)), collapse = "") + text <- paste(c(header, lines, output), collapse = "\n") + stop(text) + + } + + renv_bootstrap_install_impl <- function(library, tarball) { + + # invoke using system2 so we can capture and report output + bin <- R.home("bin") + exe <- if (Sys.info()[["sysname"]] == "Windows") "R.exe" else "R" + R <- file.path(bin, exe) + + args <- c( + "--vanilla", "CMD", "INSTALL", "--no-multiarch", + "-l", shQuote(path.expand(library)), + shQuote(path.expand(tarball)) + ) + + system2(R, args, stdout = TRUE, stderr = TRUE) + + } + + renv_bootstrap_platform_prefix <- function() { + + # construct version prefix + version <- paste(R.version$major, R.version$minor, sep = ".") + prefix <- paste("R", numeric_version(version)[1, 1:2], sep = "-") + + # include SVN revision for 
development versions of R + # (to avoid sharing platform-specific artefacts with released versions of R) + devel <- + identical(R.version[["status"]], "Under development (unstable)") || + identical(R.version[["nickname"]], "Unsuffered Consequences") + + if (devel) + prefix <- paste(prefix, R.version[["svn rev"]], sep = "-r") + + # build list of path components + components <- c(prefix, R.version$platform) + + # include prefix if provided by user + prefix <- renv_bootstrap_platform_prefix_impl() + if (!is.na(prefix) && nzchar(prefix)) + components <- c(prefix, components) + + # build prefix + paste(components, collapse = "/") + + } + + renv_bootstrap_platform_prefix_impl <- function() { + + # if an explicit prefix has been supplied, use it + prefix <- Sys.getenv("RENV_PATHS_PREFIX", unset = NA) + if (!is.na(prefix)) + return(prefix) + + # if the user has requested an automatic prefix, generate it + auto <- Sys.getenv("RENV_PATHS_PREFIX_AUTO", unset = NA) + if (is.na(auto) && getRversion() >= "4.4.0") + auto <- "TRUE" + + if (auto %in% c("TRUE", "True", "true", "1")) + return(renv_bootstrap_platform_prefix_auto()) + + # empty string on failure + "" + + } + + renv_bootstrap_platform_prefix_auto <- function() { + + prefix <- tryCatch(renv_bootstrap_platform_os(), error = identity) + if (inherits(prefix, "error") || prefix %in% "unknown") { + + msg <- paste( + "failed to infer current operating system", + "please file a bug report at https://github.com/rstudio/renv/issues", + sep = "; " + ) + + warning(msg) + + } + + prefix + + } + + renv_bootstrap_platform_os <- function() { + + sysinfo <- Sys.info() + sysname <- sysinfo[["sysname"]] + + # handle Windows + macOS up front + if (sysname == "Windows") + return("windows") + else if (sysname == "Darwin") + return("macos") + + # check for os-release files + for (file in c("/etc/os-release", "/usr/lib/os-release")) + if (file.exists(file)) + return(renv_bootstrap_platform_os_via_os_release(file, sysinfo)) + + # check for 
redhat-release files + if (file.exists("/etc/redhat-release")) + return(renv_bootstrap_platform_os_via_redhat_release()) + + "unknown" + + } + + renv_bootstrap_platform_os_via_os_release <- function(file, sysinfo) { + + # read /etc/os-release + release <- utils::read.table( + file = file, + sep = "=", + quote = c("\"", "'"), + col.names = c("Key", "Value"), + comment.char = "#", + stringsAsFactors = FALSE + ) + + vars <- as.list(release$Value) + names(vars) <- release$Key + + # get os name + os <- tolower(sysinfo[["sysname"]]) + + # read id + id <- "unknown" + for (field in c("ID", "ID_LIKE")) { + if (field %in% names(vars) && nzchar(vars[[field]])) { + id <- vars[[field]] + break + } + } + + # read version + version <- "unknown" + for (field in c("UBUNTU_CODENAME", "VERSION_CODENAME", "VERSION_ID", "BUILD_ID")) { + if (field %in% names(vars) && nzchar(vars[[field]])) { + version <- vars[[field]] + break + } + } + + # join together + paste(c(os, id, version), collapse = "-") + + } + + renv_bootstrap_platform_os_via_redhat_release <- function() { + + # read /etc/redhat-release + contents <- readLines("/etc/redhat-release", warn = FALSE) + + # infer id + id <- if (grepl("centos", contents, ignore.case = TRUE)) + "centos" + else if (grepl("redhat", contents, ignore.case = TRUE)) + "redhat" + else + "unknown" + + # try to find a version component (very hacky) + version <- "unknown" + + parts <- strsplit(contents, "[[:space:]]")[[1L]] + for (part in parts) { + + nv <- tryCatch(numeric_version(part), error = identity) + if (inherits(nv, "error")) + next + + version <- nv[1, 1] + break + + } + + paste(c("linux", id, version), collapse = "-") + + } + + renv_bootstrap_library_root_name <- function(project) { + + # use project name as-is if requested + asis <- Sys.getenv("RENV_PATHS_LIBRARY_ROOT_ASIS", unset = "FALSE") + if (asis) + return(basename(project)) + + # otherwise, disambiguate based on project's path + id <- substring(renv_bootstrap_hash_text(project), 1L, 8L) + 
paste(basename(project), id, sep = "-") + + } + + renv_bootstrap_library_root <- function(project) { + + prefix <- renv_bootstrap_profile_prefix() + + path <- Sys.getenv("RENV_PATHS_LIBRARY", unset = NA) + if (!is.na(path)) + return(paste(c(path, prefix), collapse = "/")) + + path <- renv_bootstrap_library_root_impl(project) + if (!is.null(path)) { + name <- renv_bootstrap_library_root_name(project) + return(paste(c(path, prefix, name), collapse = "/")) + } + + renv_bootstrap_paths_renv("library", project = project) + + } + + renv_bootstrap_library_root_impl <- function(project) { + + root <- Sys.getenv("RENV_PATHS_LIBRARY_ROOT", unset = NA) + if (!is.na(root)) + return(root) + + type <- renv_bootstrap_project_type(project) + if (identical(type, "package")) { + userdir <- renv_bootstrap_user_dir() + return(file.path(userdir, "library")) + } + + } + + renv_bootstrap_validate_version <- function(version, description = NULL) { + + # resolve description file + # + # avoid passing lib.loc to `packageDescription()` below, since R will + # use the loaded version of the package by default anyhow. 
note that + # this function should only be called after 'renv' is loaded + # https://github.com/rstudio/renv/issues/1625 + description <- description %||% packageDescription("renv") + + # check whether requested version 'version' matches loaded version of renv + sha <- attr(version, "sha", exact = TRUE) + valid <- if (!is.null(sha)) + renv_bootstrap_validate_version_dev(sha, description) + else + renv_bootstrap_validate_version_release(version, description) + + if (valid) + return(TRUE) + + # the loaded version of renv doesn't match the requested version; + # give the user instructions on how to proceed + dev <- identical(description[["RemoteType"]], "github") + remote <- if (dev) + paste("rstudio/renv", description[["RemoteSha"]], sep = "@") + else + paste("renv", description[["Version"]], sep = "@") + + # display both loaded version + sha if available + friendly <- renv_bootstrap_version_friendly( + version = description[["Version"]], + sha = if (dev) description[["RemoteSha"]] + ) + + fmt <- heredoc(" + renv %1$s was loaded from project library, but this project is configured to use renv %2$s. + - Use `renv::record(\"%3$s\")` to record renv %1$s in the lockfile. + - Use `renv::restore(packages = \"renv\")` to install renv %2$s into the project library. 
+ ") + catf(fmt, friendly, renv_bootstrap_version_friendly(version), remote) + + FALSE + + } + + renv_bootstrap_validate_version_dev <- function(version, description) { + expected <- description[["RemoteSha"]] + is.character(expected) && startswith(expected, version) + } + + renv_bootstrap_validate_version_release <- function(version, description) { + expected <- description[["Version"]] + is.character(expected) && identical(expected, version) + } + + renv_bootstrap_hash_text <- function(text) { + + hashfile <- tempfile("renv-hash-") + on.exit(unlink(hashfile), add = TRUE) + + writeLines(text, con = hashfile) + tools::md5sum(hashfile) + + } + + renv_bootstrap_load <- function(project, libpath, version) { + + # try to load renv from the project library + if (!requireNamespace("renv", lib.loc = libpath, quietly = TRUE)) + return(FALSE) + + # warn if the version of renv loaded does not match + renv_bootstrap_validate_version(version) + + # execute renv load hooks, if any + hooks <- getHook("renv::autoload") + for (hook in hooks) + if (is.function(hook)) + tryCatch(hook(), error = warnify) + + # load the project + renv::load(project) + + TRUE + + } + + renv_bootstrap_profile_load <- function(project) { + + # if RENV_PROFILE is already set, just use that + profile <- Sys.getenv("RENV_PROFILE", unset = NA) + if (!is.na(profile) && nzchar(profile)) + return(profile) + + # check for a profile file (nothing to do if it doesn't exist) + path <- renv_bootstrap_paths_renv("profile", profile = FALSE, project = project) + if (!file.exists(path)) + return(NULL) + + # read the profile, and set it if it exists + contents <- readLines(path, warn = FALSE) + if (length(contents) == 0L) + return(NULL) + + # set RENV_PROFILE + profile <- contents[[1L]] + if (!profile %in% c("", "default")) + Sys.setenv(RENV_PROFILE = profile) + + profile + + } + + renv_bootstrap_profile_prefix <- function() { + profile <- renv_bootstrap_profile_get() + if (!is.null(profile)) + 
return(file.path("profiles", profile, "renv")) + } + + renv_bootstrap_profile_get <- function() { + profile <- Sys.getenv("RENV_PROFILE", unset = "") + renv_bootstrap_profile_normalize(profile) + } + + renv_bootstrap_profile_set <- function(profile) { + profile <- renv_bootstrap_profile_normalize(profile) + if (is.null(profile)) + Sys.unsetenv("RENV_PROFILE") + else + Sys.setenv(RENV_PROFILE = profile) + } + + renv_bootstrap_profile_normalize <- function(profile) { + + if (is.null(profile) || profile %in% c("", "default")) + return(NULL) + + profile + + } + + renv_bootstrap_path_absolute <- function(path) { + + substr(path, 1L, 1L) %in% c("~", "/", "\\") || ( + substr(path, 1L, 1L) %in% c(letters, LETTERS) && + substr(path, 2L, 3L) %in% c(":/", ":\\") + ) + + } + + renv_bootstrap_paths_renv <- function(..., profile = TRUE, project = NULL) { + renv <- Sys.getenv("RENV_PATHS_RENV", unset = "renv") + root <- if (renv_bootstrap_path_absolute(renv)) NULL else project + prefix <- if (profile) renv_bootstrap_profile_prefix() + components <- c(root, renv, prefix, ...) 
+ paste(components, collapse = "/") + } + + renv_bootstrap_project_type <- function(path) { + + descpath <- file.path(path, "DESCRIPTION") + if (!file.exists(descpath)) + return("unknown") + + desc <- tryCatch( + read.dcf(descpath, all = TRUE), + error = identity + ) + + if (inherits(desc, "error")) + return("unknown") + + type <- desc$Type + if (!is.null(type)) + return(tolower(type)) + + package <- desc$Package + if (!is.null(package)) + return("package") + + "unknown" + + } + + renv_bootstrap_user_dir <- function() { + dir <- renv_bootstrap_user_dir_impl() + path.expand(chartr("\\", "/", dir)) + } + + renv_bootstrap_user_dir_impl <- function() { + + # use local override if set + override <- getOption("renv.userdir.override") + if (!is.null(override)) + return(override) + + # use R_user_dir if available + tools <- asNamespace("tools") + if (is.function(tools$R_user_dir)) + return(tools$R_user_dir("renv", "cache")) + + # try using our own backfill for older versions of R + envvars <- c("R_USER_CACHE_DIR", "XDG_CACHE_HOME") + for (envvar in envvars) { + root <- Sys.getenv(envvar, unset = NA) + if (!is.na(root)) + return(file.path(root, "R/renv")) + } + + # use platform-specific default fallbacks + if (Sys.info()[["sysname"]] == "Windows") + file.path(Sys.getenv("LOCALAPPDATA"), "R/cache/R/renv") + else if (Sys.info()[["sysname"]] == "Darwin") + "~/Library/Caches/org.R-project.R/R/renv" + else + "~/.cache/R/renv" + + } + + renv_bootstrap_version_friendly <- function(version, shafmt = NULL, sha = NULL) { + sha <- sha %||% attr(version, "sha", exact = TRUE) + parts <- c(version, sprintf(shafmt %||% " [sha: %s]", substring(sha, 1L, 7L))) + paste(parts, collapse = "") + } + + renv_bootstrap_exec <- function(project, libpath, version) { + if (!renv_bootstrap_load(project, libpath, version)) + renv_bootstrap_run(version, libpath) + } + + renv_bootstrap_run <- function(version, libpath) { + + # perform bootstrap + bootstrap(version, libpath) + + # exit early if we're just 
testing bootstrap + if (!is.na(Sys.getenv("RENV_BOOTSTRAP_INSTALL_ONLY", unset = NA))) + return(TRUE) + + # try again to load + if (requireNamespace("renv", lib.loc = libpath, quietly = TRUE)) { + return(renv::load(project = getwd())) + } + + # failed to download or load renv; warn the user + msg <- c( + "Failed to find an renv installation: the project will not be loaded.", + "Use `renv::activate()` to re-initialize the project." + ) + + warning(paste(msg, collapse = "\n"), call. = FALSE) + + } + + renv_json_read <- function(file = NULL, text = NULL) { + + jlerr <- NULL + + # if jsonlite is loaded, use that instead + if ("jsonlite" %in% loadedNamespaces()) { + + json <- tryCatch(renv_json_read_jsonlite(file, text), error = identity) + if (!inherits(json, "error")) + return(json) + + jlerr <- json + + } + + # otherwise, fall back to the default JSON reader + json <- tryCatch(renv_json_read_default(file, text), error = identity) + if (!inherits(json, "error")) + return(json) + + # report an error + if (!is.null(jlerr)) + stop(jlerr) + else + stop(json) + + } + + renv_json_read_jsonlite <- function(file = NULL, text = NULL) { + text <- paste(text %||% readLines(file, warn = FALSE), collapse = "\n") + jsonlite::fromJSON(txt = text, simplifyVector = FALSE) + } + + renv_json_read_default <- function(file = NULL, text = NULL) { + + # find strings in the JSON + text <- paste(text %||% readLines(file, warn = FALSE), collapse = "\n") + pattern <- '["](?:(?:\\\\.)|(?:[^"\\\\]))*?["]' + locs <- gregexpr(pattern, text, perl = TRUE)[[1]] + + # if any are found, replace them with placeholders + replaced <- text + strings <- character() + replacements <- character() + + if (!identical(c(locs), -1L)) { + + # get the string values + starts <- locs + ends <- locs + attr(locs, "match.length") - 1L + strings <- substring(text, starts, ends) + + # only keep those requiring escaping + strings <- grep("[[\\]{}:]", strings, perl = TRUE, value = TRUE) + + # compute replacements + 
replacements <- sprintf('"\032%i\032"', seq_along(strings)) + + # replace the strings + mapply(function(string, replacement) { + replaced <<- sub(string, replacement, replaced, fixed = TRUE) + }, strings, replacements) + + } + + # transform the JSON into something the R parser understands + transformed <- replaced + transformed <- gsub("{}", "`names<-`(list(), character())", transformed, fixed = TRUE) + transformed <- gsub("[[{]", "list(", transformed, perl = TRUE) + transformed <- gsub("[]}]", ")", transformed, perl = TRUE) + transformed <- gsub(":", "=", transformed, fixed = TRUE) + text <- paste(transformed, collapse = "\n") + + # parse it + json <- parse(text = text, keep.source = FALSE, srcfile = NULL)[[1L]] + + # construct map between source strings, replaced strings + map <- as.character(parse(text = strings)) + names(map) <- as.character(parse(text = replacements)) + + # convert to list + map <- as.list(map) + + # remap strings in object + remapped <- renv_json_read_remap(json, map) + + # evaluate + eval(remapped, envir = baseenv()) + + } + + renv_json_read_remap <- function(json, map) { + + # fix names + if (!is.null(names(json))) { + lhs <- match(names(json), names(map), nomatch = 0L) + rhs <- match(names(map), names(json), nomatch = 0L) + names(json)[rhs] <- map[lhs] + } + + # fix values + if (is.character(json)) + return(map[[json]] %||% json) + + # handle true, false, null + if (is.name(json)) { + text <- as.character(json) + if (text == "true") + return(TRUE) + else if (text == "false") + return(FALSE) + else if (text == "null") + return(NULL) + } + + # recurse + if (is.recursive(json)) { + for (i in seq_along(json)) { + json[i] <- list(renv_json_read_remap(json[[i]], map)) + } + } + + json + + } + + # load the renv profile, if any + renv_bootstrap_profile_load(project) + + # construct path to library root + root <- renv_bootstrap_library_root(project) + + # construct library prefix for platform + prefix <- renv_bootstrap_platform_prefix() + + # 
construct full libpath + libpath <- file.path(root, prefix) + + # run bootstrap code + renv_bootstrap_exec(project, libpath, version) + + invisible() + +}) diff --git a/renv/profile b/renv/profile new file mode 100644 index 00000000..6d4023b5 --- /dev/null +++ b/renv/profile @@ -0,0 +1 @@ +lesson-requirements diff --git a/renv/profiles/lesson-requirements/renv.lock b/renv/profiles/lesson-requirements/renv.lock new file mode 100644 index 00000000..a6eb81ad --- /dev/null +++ b/renv/profiles/lesson-requirements/renv.lock @@ -0,0 +1,1092 @@ +{ + "R": { + "Version": "4.4.2", + "Repositories": [ + { + "Name": "carpentries", + "URL": "https://carpentries.r-universe.dev" + }, + { + "Name": "carpentries_archive", + "URL": "https://carpentries.github.io/drat" + }, + { + "Name": "CRAN", + "URL": "https://cran.rstudio.com" + } + ] + }, + "Packages": { + "MASS": { + "Package": "MASS", + "Version": "7.3-61", + "Source": "Repository", + "Repository": "CRAN", + "Requirements": [ + "R", + "grDevices", + "graphics", + "methods", + "stats", + "utils" + ], + "Hash": "0cafd6f0500e5deba33be22c46bf6055" + }, + "Matrix": { + "Package": "Matrix", + "Version": "1.7-1", + "Source": "Repository", + "Repository": "CRAN", + "Requirements": [ + "R", + "grDevices", + "graphics", + "grid", + "lattice", + "methods", + "stats", + "utils" + ], + "Hash": "5122bb14d8736372411f955e1b16bc8a" + }, + "R6": { + "Package": "R6", + "Version": "2.5.1", + "Source": "Repository", + "Repository": "CRAN", + "Requirements": [ + "R" + ], + "Hash": "470851b6d5d0ac559e9d01bb352b4021" + }, + "RColorBrewer": { + "Package": "RColorBrewer", + "Version": "1.1-3", + "Source": "Repository", + "Repository": "CRAN", + "Requirements": [ + "R" + ], + "Hash": "45f0398006e83a5b10b72a90663d8d8c" + }, + "Rcpp": { + "Package": "Rcpp", + "Version": "1.0.13-1", + "Source": "Repository", + "Repository": "CRAN", + "Requirements": [ + "methods", + "utils" + ], + "Hash": "6b868847b365672d6c1677b1608da9ed" + }, + "RcppEigen": { + "Package": 
"RcppEigen", + "Version": "0.3.4.0.2", + "Source": "Repository", + "Repository": "CRAN", + "Requirements": [ + "R", + "Rcpp", + "stats", + "utils" + ], + "Hash": "4ac8e423216b8b70cb9653d1b3f71eb9" + }, + "base64enc": { + "Package": "base64enc", + "Version": "0.1-3", + "Source": "Repository", + "Repository": "CRAN", + "Requirements": [ + "R" + ], + "Hash": "543776ae6848fde2f48ff3816d0628bc" + }, + "bit": { + "Package": "bit", + "Version": "4.5.0", + "Source": "Repository", + "Repository": "CRAN", + "Requirements": [ + "R" + ], + "Hash": "5dc7b2677d65d0e874fc4aaf0e879987" + }, + "bit64": { + "Package": "bit64", + "Version": "4.5.2", + "Source": "Repository", + "Repository": "CRAN", + "Requirements": [ + "R", + "bit", + "methods", + "stats", + "utils" + ], + "Hash": "e84984bf5f12a18628d9a02322128dfd" + }, + "boot": { + "Package": "boot", + "Version": "1.3-31", + "Source": "Repository", + "Repository": "CRAN", + "Requirements": [ + "R", + "graphics", + "stats" + ], + "Hash": "de2a4646c18661d6a0a08ec67f40b7ed" + }, + "bslib": { + "Package": "bslib", + "Version": "0.8.0", + "Source": "Repository", + "Repository": "CRAN", + "Requirements": [ + "R", + "base64enc", + "cachem", + "fastmap", + "grDevices", + "htmltools", + "jquerylib", + "jsonlite", + "lifecycle", + "memoise", + "mime", + "rlang", + "sass" + ], + "Hash": "b299c6741ca9746fb227debcb0f9fb6c" + }, + "cachem": { + "Package": "cachem", + "Version": "1.1.0", + "Source": "Repository", + "Repository": "CRAN", + "Requirements": [ + "fastmap", + "rlang" + ], + "Hash": "cd9a672193789068eb5a2aad65a0dedf" + }, + "callr": { + "Package": "callr", + "Version": "3.7.6", + "Source": "Repository", + "Repository": "CRAN", + "Requirements": [ + "R", + "R6", + "processx", + "utils" + ], + "Hash": "d7e13f49c19103ece9e58ad2d83a7354" + }, + "cli": { + "Package": "cli", + "Version": "3.6.3", + "Source": "Repository", + "Repository": "CRAN", + "Requirements": [ + "R", + "utils" + ], + "Hash": "b21916dd77a27642b447374a5d30ecf3" + }, + 
"clipr": { + "Package": "clipr", + "Version": "0.8.0", + "Source": "Repository", + "Repository": "CRAN", + "Requirements": [ + "utils" + ], + "Hash": "3f038e5ac7f41d4ac41ce658c85e3042" + }, + "colorspace": { + "Package": "colorspace", + "Version": "2.1-1", + "Source": "Repository", + "Repository": "CRAN", + "Requirements": [ + "R", + "grDevices", + "graphics", + "methods", + "stats" + ], + "Hash": "d954cb1c57e8d8b756165d7ba18aa55a" + }, + "cpp11": { + "Package": "cpp11", + "Version": "0.5.0", + "Source": "Repository", + "Repository": "CRAN", + "Requirements": [ + "R" + ], + "Hash": "91570bba75d0c9d3f1040c835cee8fba" + }, + "crayon": { + "Package": "crayon", + "Version": "1.5.3", + "Source": "Repository", + "Repository": "CRAN", + "Requirements": [ + "grDevices", + "methods", + "utils" + ], + "Hash": "859d96e65ef198fd43e82b9628d593ef" + }, + "digest": { + "Package": "digest", + "Version": "0.6.37", + "Source": "Repository", + "Repository": "CRAN", + "Requirements": [ + "R", + "utils" + ], + "Hash": "33698c4b3127fc9f506654607fb73676" + }, + "dplyr": { + "Package": "dplyr", + "Version": "1.1.4", + "Source": "Repository", + "Repository": "CRAN", + "Requirements": [ + "R", + "R6", + "cli", + "generics", + "glue", + "lifecycle", + "magrittr", + "methods", + "pillar", + "rlang", + "tibble", + "tidyselect", + "utils", + "vctrs" + ], + "Hash": "fedd9d00c2944ff00a0e2696ccf048ec" + }, + "evaluate": { + "Package": "evaluate", + "Version": "1.0.1", + "Source": "Repository", + "Repository": "CRAN", + "Requirements": [ + "R" + ], + "Hash": "3fd29944b231036ad67c3edb32e02201" + }, + "fansi": { + "Package": "fansi", + "Version": "1.0.6", + "Source": "Repository", + "Repository": "CRAN", + "Requirements": [ + "R", + "grDevices", + "utils" + ], + "Hash": "962174cf2aeb5b9eea581522286a911f" + }, + "farver": { + "Package": "farver", + "Version": "2.1.2", + "Source": "Repository", + "Repository": "CRAN", + "Hash": "680887028577f3fa2a81e410ed0d6e42" + }, + "fastmap": { + "Package": 
"fastmap", + "Version": "1.2.0", + "Source": "Repository", + "Repository": "CRAN", + "Hash": "aa5e1cd11c2d15497494c5292d7ffcc8" + }, + "fontawesome": { + "Package": "fontawesome", + "Version": "0.5.3", + "Source": "Repository", + "Repository": "CRAN", + "Requirements": [ + "R", + "htmltools", + "rlang" + ], + "Hash": "bd1297f9b5b1fc1372d19e2c4cd82215" + }, + "fs": { + "Package": "fs", + "Version": "1.6.5", + "Source": "Repository", + "Repository": "CRAN", + "Requirements": [ + "R", + "methods" + ], + "Hash": "7f48af39fa27711ea5fbd183b399920d" + }, + "generics": { + "Package": "generics", + "Version": "0.1.3", + "Source": "Repository", + "Repository": "CRAN", + "Requirements": [ + "R", + "methods" + ], + "Hash": "15e9634c0fcd294799e9b2e929ed1b86" + }, + "ggplot2": { + "Package": "ggplot2", + "Version": "3.5.1", + "Source": "Repository", + "Repository": "CRAN", + "Requirements": [ + "MASS", + "R", + "cli", + "glue", + "grDevices", + "grid", + "gtable", + "isoband", + "lifecycle", + "mgcv", + "rlang", + "scales", + "stats", + "tibble", + "vctrs", + "withr" + ], + "Hash": "44c6a2f8202d5b7e878ea274b1092426" + }, + "glue": { + "Package": "glue", + "Version": "1.8.0", + "Source": "Repository", + "Repository": "CRAN", + "Requirements": [ + "R", + "methods" + ], + "Hash": "5899f1eaa825580172bb56c08266f37c" + }, + "gtable": { + "Package": "gtable", + "Version": "0.3.6", + "Source": "Repository", + "Repository": "CRAN", + "Requirements": [ + "R", + "cli", + "glue", + "grid", + "lifecycle", + "rlang", + "stats" + ], + "Hash": "de949855009e2d4d0e52a844e30617ae" + }, + "here": { + "Package": "here", + "Version": "1.0.1", + "Source": "Repository", + "Repository": "CRAN", + "Requirements": [ + "rprojroot" + ], + "Hash": "24b224366f9c2e7534d2344d10d59211" + }, + "highr": { + "Package": "highr", + "Version": "0.11", + "Source": "Repository", + "Repository": "CRAN", + "Requirements": [ + "R", + "xfun" + ], + "Hash": "d65ba49117ca223614f71b60d85b8ab7" + }, + "hms": { + "Package": 
"hms", + "Version": "1.1.3", + "Source": "Repository", + "Repository": "CRAN", + "Requirements": [ + "lifecycle", + "methods", + "pkgconfig", + "rlang", + "vctrs" + ], + "Hash": "b59377caa7ed00fa41808342002138f9" + }, + "htmltools": { + "Package": "htmltools", + "Version": "0.5.8.1", + "Source": "Repository", + "Repository": "CRAN", + "Requirements": [ + "R", + "base64enc", + "digest", + "fastmap", + "grDevices", + "rlang", + "utils" + ], + "Hash": "81d371a9cc60640e74e4ab6ac46dcedc" + }, + "isoband": { + "Package": "isoband", + "Version": "0.2.7", + "Source": "Repository", + "Repository": "CRAN", + "Requirements": [ + "grid", + "utils" + ], + "Hash": "0080607b4a1a7b28979aecef976d8bc2" + }, + "jquerylib": { + "Package": "jquerylib", + "Version": "0.1.4", + "Source": "Repository", + "Repository": "CRAN", + "Requirements": [ + "htmltools" + ], + "Hash": "5aab57a3bd297eee1c1d862735972182" + }, + "jsonlite": { + "Package": "jsonlite", + "Version": "1.8.9", + "Source": "Repository", + "Repository": "CRAN", + "Requirements": [ + "methods" + ], + "Hash": "4e993b65c2c3ffbffce7bb3e2c6f832b" + }, + "knitr": { + "Package": "knitr", + "Version": "1.49", + "Source": "Repository", + "Repository": "CRAN", + "Requirements": [ + "R", + "evaluate", + "highr", + "methods", + "tools", + "xfun", + "yaml" + ], + "Hash": "9fcb189926d93c636dea94fbe4f44480" + }, + "labeling": { + "Package": "labeling", + "Version": "0.4.3", + "Source": "Repository", + "Repository": "CRAN", + "Requirements": [ + "graphics", + "stats" + ], + "Hash": "b64ec208ac5bc1852b285f665d6368b3" + }, + "lattice": { + "Package": "lattice", + "Version": "0.22-6", + "Source": "Repository", + "Repository": "CRAN", + "Requirements": [ + "R", + "grDevices", + "graphics", + "grid", + "stats", + "utils" + ], + "Hash": "cc5ac1ba4c238c7ca9fa6a87ca11a7e2" + }, + "lifecycle": { + "Package": "lifecycle", + "Version": "1.0.4", + "Source": "Repository", + "Repository": "CRAN", + "Requirements": [ + "R", + "cli", + "glue", + "rlang" + 
], + "Hash": "b8552d117e1b808b09a832f589b79035" + }, + "lme4": { + "Package": "lme4", + "Version": "1.1-35.5", + "Source": "Repository", + "Repository": "CRAN", + "Requirements": [ + "MASS", + "Matrix", + "R", + "Rcpp", + "RcppEigen", + "boot", + "graphics", + "grid", + "lattice", + "methods", + "minqa", + "nlme", + "nloptr", + "parallel", + "splines", + "stats", + "utils" + ], + "Hash": "16a08fc75007da0d08e0c0388c7c33e6" + }, + "lubridate": { + "Package": "lubridate", + "Version": "1.9.3", + "Source": "Repository", + "Repository": "CRAN", + "Requirements": [ + "R", + "generics", + "methods", + "timechange" + ], + "Hash": "680ad542fbcf801442c83a6ac5a2126c" + }, + "magrittr": { + "Package": "magrittr", + "Version": "2.0.3", + "Source": "Repository", + "Repository": "CRAN", + "Requirements": [ + "R" + ], + "Hash": "7ce2733a9826b3aeb1775d56fd305472" + }, + "memoise": { + "Package": "memoise", + "Version": "2.0.1", + "Source": "Repository", + "Repository": "CRAN", + "Requirements": [ + "cachem", + "rlang" + ], + "Hash": "e2817ccf4a065c5d9d7f2cfbe7c1d78c" + }, + "mgcv": { + "Package": "mgcv", + "Version": "1.9-1", + "Source": "Repository", + "Repository": "CRAN", + "Requirements": [ + "Matrix", + "R", + "graphics", + "methods", + "nlme", + "splines", + "stats", + "utils" + ], + "Hash": "110ee9d83b496279960e162ac97764ce" + }, + "mime": { + "Package": "mime", + "Version": "0.12", + "Source": "Repository", + "Repository": "CRAN", + "Requirements": [ + "tools" + ], + "Hash": "18e9c28c1d3ca1560ce30658b22ce104" + }, + "minqa": { + "Package": "minqa", + "Version": "1.2.8", + "Source": "Repository", + "Repository": "CRAN", + "Requirements": [ + "Rcpp" + ], + "Hash": "785ef8e22389d4a7634c6c944f2dc07d" + }, + "munsell": { + "Package": "munsell", + "Version": "0.5.1", + "Source": "Repository", + "Repository": "CRAN", + "Requirements": [ + "colorspace", + "methods" + ], + "Hash": "4fd8900853b746af55b81fda99da7695" + }, + "nlme": { + "Package": "nlme", + "Version": "3.1-166", + 
"Source": "Repository", + "Repository": "CRAN", + "Requirements": [ + "R", + "graphics", + "lattice", + "stats", + "utils" + ], + "Hash": "ccbb8846be320b627e6aa2b4616a2ded" + }, + "nloptr": { + "Package": "nloptr", + "Version": "2.1.1", + "Source": "Repository", + "Repository": "CRAN", + "Hash": "27550641889a3abf3aec4d91186311ec" + }, + "pillar": { + "Package": "pillar", + "Version": "1.9.0", + "Source": "Repository", + "Repository": "CRAN", + "Requirements": [ + "cli", + "fansi", + "glue", + "lifecycle", + "rlang", + "utf8", + "utils", + "vctrs" + ], + "Hash": "15da5a8412f317beeee6175fbc76f4bb" + }, + "pkgconfig": { + "Package": "pkgconfig", + "Version": "2.0.3", + "Source": "Repository", + "Repository": "CRAN", + "Requirements": [ + "utils" + ], + "Hash": "01f28d4278f15c76cddbea05899c5d6f" + }, + "prettyunits": { + "Package": "prettyunits", + "Version": "1.2.0", + "Source": "Repository", + "Repository": "CRAN", + "Requirements": [ + "R" + ], + "Hash": "6b01fc98b1e86c4f705ce9dcfd2f57c7" + }, + "processx": { + "Package": "processx", + "Version": "3.8.4", + "Source": "Repository", + "Repository": "CRAN", + "Requirements": [ + "R", + "R6", + "ps", + "utils" + ], + "Hash": "0c90a7d71988856bad2a2a45dd871bb9" + }, + "progress": { + "Package": "progress", + "Version": "1.2.3", + "Source": "Repository", + "Repository": "CRAN", + "Requirements": [ + "R", + "R6", + "crayon", + "hms", + "prettyunits" + ], + "Hash": "f4625e061cb2865f111b47ff163a5ca6" + }, + "ps": { + "Package": "ps", + "Version": "1.8.1", + "Source": "Repository", + "Repository": "CRAN", + "Requirements": [ + "R", + "utils" + ], + "Hash": "b4404b1de13758dea1c0484ad0d48563" + }, + "rappdirs": { + "Package": "rappdirs", + "Version": "0.3.3", + "Source": "Repository", + "Repository": "CRAN", + "Requirements": [ + "R" + ], + "Hash": "5e3c5dc0b071b21fa128676560dbe94d" + }, + "ratdat": { + "Package": "ratdat", + "Version": "1.1.0", + "Source": "Repository", + "Repository": "CRAN", + "Requirements": [ + "R" + ], + 
"Hash": "5764aa379c01dc65f275059dd11e4556" + }, + "readr": { + "Package": "readr", + "Version": "2.1.5", + "Source": "Repository", + "Repository": "CRAN", + "Requirements": [ + "R", + "R6", + "cli", + "clipr", + "cpp11", + "crayon", + "hms", + "lifecycle", + "methods", + "rlang", + "tibble", + "tzdb", + "utils", + "vroom" + ], + "Hash": "9de96463d2117f6ac49980577939dfb3" + }, + "renv": { + "Package": "renv", + "Version": "1.0.11", + "Source": "Repository", + "Repository": "CRAN", + "Requirements": [ + "utils" + ], + "Hash": "47623f66b4e80b3b0587bc5d7b309888" + }, + "reprex": { + "Package": "reprex", + "Version": "2.1.1", + "Source": "Repository", + "Repository": "CRAN", + "Requirements": [ + "R", + "callr", + "cli", + "clipr", + "fs", + "glue", + "knitr", + "lifecycle", + "rlang", + "rmarkdown", + "rstudioapi", + "utils", + "withr" + ], + "Hash": "97b1d5361a24d9fb588db7afe3e5bcbf" + }, + "rlang": { + "Package": "rlang", + "Version": "1.1.4", + "Source": "Repository", + "Repository": "CRAN", + "Requirements": [ + "R", + "utils" + ], + "Hash": "3eec01f8b1dee337674b2e34ab1f9bc1" + }, + "rmarkdown": { + "Package": "rmarkdown", + "Version": "2.29", + "Source": "Repository", + "Repository": "CRAN", + "Requirements": [ + "R", + "bslib", + "evaluate", + "fontawesome", + "htmltools", + "jquerylib", + "jsonlite", + "knitr", + "methods", + "tinytex", + "tools", + "utils", + "xfun", + "yaml" + ], + "Hash": "df99277f63d01c34e95e3d2f06a79736" + }, + "rprojroot": { + "Package": "rprojroot", + "Version": "2.0.4", + "Source": "Repository", + "Repository": "CRAN", + "Requirements": [ + "R" + ], + "Hash": "4c8415e0ec1e29f3f4f6fc108bef0144" + }, + "rstudioapi": { + "Package": "rstudioapi", + "Version": "0.17.1", + "Source": "Repository", + "Repository": "CRAN", + "Hash": "5f90cd73946d706cfe26024294236113" + }, + "sass": { + "Package": "sass", + "Version": "0.4.9", + "Source": "Repository", + "Repository": "CRAN", + "Requirements": [ + "R6", + "fs", + "htmltools", + "rappdirs", + 
"rlang" + ], + "Hash": "d53dbfddf695303ea4ad66f86e99b95d" + }, + "scales": { + "Package": "scales", + "Version": "1.3.0", + "Source": "Repository", + "Repository": "CRAN", + "Requirements": [ + "R", + "R6", + "RColorBrewer", + "cli", + "farver", + "glue", + "labeling", + "lifecycle", + "munsell", + "rlang", + "viridisLite" + ], + "Hash": "c19df082ba346b0ffa6f833e92de34d1" + }, + "stringi": { + "Package": "stringi", + "Version": "1.8.4", + "Source": "Repository", + "Repository": "CRAN", + "Requirements": [ + "R", + "stats", + "tools", + "utils" + ], + "Hash": "39e1144fd75428983dc3f63aa53dfa91" + }, + "stringr": { + "Package": "stringr", + "Version": "1.5.1", + "Source": "Repository", + "Repository": "CRAN", + "Requirements": [ + "R", + "cli", + "glue", + "lifecycle", + "magrittr", + "rlang", + "stringi", + "vctrs" + ], + "Hash": "960e2ae9e09656611e0b8214ad543207" + }, + "tibble": { + "Package": "tibble", + "Version": "3.2.1", + "Source": "Repository", + "Repository": "CRAN", + "Requirements": [ + "R", + "fansi", + "lifecycle", + "magrittr", + "methods", + "pillar", + "pkgconfig", + "rlang", + "utils", + "vctrs" + ], + "Hash": "a84e2cc86d07289b3b6f5069df7a004c" + }, + "tidyselect": { + "Package": "tidyselect", + "Version": "1.2.1", + "Source": "Repository", + "Repository": "CRAN", + "Requirements": [ + "R", + "cli", + "glue", + "lifecycle", + "rlang", + "vctrs", + "withr" + ], + "Hash": "829f27b9c4919c16b593794a6344d6c0" + }, + "timechange": { + "Package": "timechange", + "Version": "0.3.0", + "Source": "Repository", + "Repository": "CRAN", + "Requirements": [ + "R", + "cpp11" + ], + "Hash": "c5f3c201b931cd6474d17d8700ccb1c8" + }, + "tinytex": { + "Package": "tinytex", + "Version": "0.54", + "Source": "Repository", + "Repository": "CRAN", + "Requirements": [ + "xfun" + ], + "Hash": "3ec7e3ddcacc2d34a9046941222bf94d" + }, + "tzdb": { + "Package": "tzdb", + "Version": "0.4.0", + "Source": "Repository", + "Repository": "CRAN", + "Requirements": [ + "R", + "cpp11" + ], + 
"Hash": "f561504ec2897f4d46f0c7657e488ae1" + }, + "utf8": { + "Package": "utf8", + "Version": "1.2.4", + "Source": "Repository", + "Repository": "CRAN", + "Requirements": [ + "R" + ], + "Hash": "62b65c52671e6665f803ff02954446e9" + }, + "vctrs": { + "Package": "vctrs", + "Version": "0.6.5", + "Source": "Repository", + "Repository": "CRAN", + "Requirements": [ + "R", + "cli", + "glue", + "lifecycle", + "rlang" + ], + "Hash": "c03fa420630029418f7e6da3667aac4a" + }, + "viridisLite": { + "Package": "viridisLite", + "Version": "0.4.2", + "Source": "Repository", + "Repository": "CRAN", + "Requirements": [ + "R" + ], + "Hash": "c826c7c4241b6fc89ff55aaea3fa7491" + }, + "vroom": { + "Package": "vroom", + "Version": "1.6.5", + "Source": "Repository", + "Repository": "CRAN", + "Requirements": [ + "R", + "bit64", + "cli", + "cpp11", + "crayon", + "glue", + "hms", + "lifecycle", + "methods", + "progress", + "rlang", + "stats", + "tibble", + "tidyselect", + "tzdb", + "vctrs", + "withr" + ], + "Hash": "390f9315bc0025be03012054103d227c" + }, + "withr": { + "Package": "withr", + "Version": "3.0.2", + "Source": "Repository", + "Repository": "CRAN", + "Requirements": [ + "R", + "grDevices", + "graphics" + ], + "Hash": "cc2d62c76458d425210d1eb1478b30b4" + }, + "xfun": { + "Package": "xfun", + "Version": "0.49", + "Source": "Repository", + "Repository": "CRAN", + "Requirements": [ + "R", + "grDevices", + "stats", + "tools" + ], + "Hash": "8687398773806cfff9401a2feca96298" + }, + "yaml": { + "Package": "yaml", + "Version": "2.3.10", + "Source": "Repository", + "Repository": "CRAN", + "Hash": "51dab85c6c98e50a18d7551e9d49f76c" + } + } +} diff --git a/renv/profiles/lesson-requirements/renv/.gitignore b/renv/profiles/lesson-requirements/renv/.gitignore new file mode 100644 index 00000000..0ec0cbba --- /dev/null +++ b/renv/profiles/lesson-requirements/renv/.gitignore @@ -0,0 +1,7 @@ +library/ +local/ +cellar/ +lock/ +python/ +sandbox/ +staging/ diff --git 
a/renv/profiles/lesson-requirements/renv/settings.json b/renv/profiles/lesson-requirements/renv/settings.json new file mode 100644 index 00000000..ffdbb320 --- /dev/null +++ b/renv/profiles/lesson-requirements/renv/settings.json @@ -0,0 +1,19 @@ +{ + "bioconductor.version": null, + "external.libraries": [], + "ignored.packages": [], + "package.dependency.fields": [ + "Imports", + "Depends", + "LinkingTo" + ], + "ppm.enabled": null, + "ppm.ignored.urls": [], + "r.version": null, + "snapshot.type": "implicit", + "use.cache": true, + "vcs.ignore.cellar": true, + "vcs.ignore.library": true, + "vcs.ignore.local": true, + "vcs.manage.ignores": true +} From f1ae7891d94c5fb7c117b434a28873197d535639 Mon Sep 17 00:00:00 2001 From: Xochitl Ortiz-Ross Date: Tue, 3 Dec 2024 15:40:23 -0800 Subject: [PATCH 3/3] Changed episode #s Now that we got rid of original episode 2 --- .DS_Store | Bin 6148 -> 6148 bytes episodes/.DS_Store | Bin 6148 -> 6148 bytes ...problem.Rmd => 2-identify-the-problem.Rmd} | 0 ...ta.Rmd => 3-minimal-reproducible-data.Rmd} | 0 ... 
=> 4-minimal-reproducible-code-draft.Rmd} | 0 ...de.Rmd => 4-minimal-reproducible-code.Rmd} | 0 ...-question.md => 5-asking-your-question.md} | 0 episodes/archive_temp/.DS_Store | Bin 0 -> 6148 bytes .../2-understanding-your-code.md | 0 episodes/{ => archive_temp}/introduction.md | 0 10 files changed, 0 insertions(+), 0 deletions(-) rename episodes/{3-identify-the-problem.Rmd => 2-identify-the-problem.Rmd} (100%) rename episodes/{4-minimal-reproducible-data.Rmd => 3-minimal-reproducible-data.Rmd} (100%) rename episodes/{5-minimal-reproducible-code-draft.Rmd => 4-minimal-reproducible-code-draft.Rmd} (100%) rename episodes/{5-minimal-reproducible-code.Rmd => 4-minimal-reproducible-code.Rmd} (100%) rename episodes/{6-asking-your-question.md => 5-asking-your-question.md} (100%) create mode 100644 episodes/archive_temp/.DS_Store rename episodes/{ => archive_temp}/2-understanding-your-code.md (100%) rename episodes/{ => archive_temp}/introduction.md (100%) diff --git a/.DS_Store b/.DS_Store index 0a0d46ac615b4aa75fcd0f2435cfc6cdbafc7c8c..264cdc415ee44cd2dbedbae684361ab7bbbee2a0 100644 GIT binary patch delta 21 ccmZoMXffEZfRV%0z)(lQ$jET>3dR^Q07mQv2LJ#7 delta 21 ccmZoMXffEZfRV$@#6(BI*u-S>3dR^Q07r5L6#xJL diff --git a/episodes/.DS_Store b/episodes/.DS_Store index 5008ddfcf53c02e82d7eee2e57c38e5672ef89f6..736a694575ee701924ccf0e3fbbfdd7488d204da 100644 GIT binary patch literal 6148 zcmeHK%}T>S5T32oCaBPZg2x4~1)GY3c!{+h^k_s6Dm5|12GeY4Q+p`Iob`RgC-4z` z1fRp%pNiuDMNns8_M6@PCfofIb~^yT>iy6GC94F{My3#%dPW2n z(1Ipp*pmGQ1?0P%0Ux?>3T=4){yLBbgI*fMB0$bVWK!WRVvw~>C&;+ z76UKyvwpAccMoN6Plz}e$9}LMwFb@d+@?q}KT29%l@Nt3q#PYYNhk(&(NDsj%JuYw zZCiG;yf7TDRGkX1)tpg<57%nd3SV`WMZJx)SL|kc+8Ge7tW(|(v8iRND z7oR!aspzC*@<-f`Xf%u#*7nU#@Njvz@i9OA7C&u&{kL3r-6R&t8LmIx_p>cfND7bw zq`;>Nn6ty0{**}sQh*frZwkosful=w4dxov(SeOB0T7EgtPEpWOUN8<&^4HA#2GZE zQxSD4GbIMo={U{}&o!89)ak%X@xjc>%v2~$t&Y#AjHu~2NHo+1YW5HK<@2yDK{Y{s(r0dp1eW_AvK4xj>{$am(+{342+ UKzW7)kiy9(Jj$D6L{=~Z06bFH1@V-^m;4Wg<&0T*E43hX&L&p$$qDprKhvt+--jT7}7np#A3 
zem<@ulZcFPQ@L2!n>{z**++&mCkOWA81W14cNZlEfg7;MkzE(HCqgga^y>{tEnwC%0;vJ&^%eQ zLs35+`xjp>T0