-
Notifications
You must be signed in to change notification settings - Fork 1
/
README.Rmd
181 lines (129 loc) · 6.32 KB
/
README.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
---
output: github_document
---
<!-- README.md is generated from README.Rmd. Please edit that file -->
```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.path = "man/figures/README-",
out.width = "100%"
)
```
# crandep
<!-- badges: start -->
<!-- badges: end -->
The goal of crandep is to provide functions for analysing the dependencies of CRAN packages using social network analysis.
## Installation
You can install crandep from github with:
```{r gh-installation, eval = FALSE}
# install.packages("devtools")
devtools::install_github("clement-lee/crandep")
```
```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>"
)
```
```{r setup, message = FALSE}
library(crandep)
library(dplyr)
library(ggplot2)
library(igraph)
```
## Overview
The functions and example dataset can be divided into the following categories:
1. For obtaining data frames of package dependencies, use `get_dep()`, `get_dep_all_packages()`.
2. For obtaining igraph objects of package dependencies, use `get_graph_all_packages()` and `df_to_graph()`.
3. For modelling the number of dependencies, use `*pol()` and `*mix2()`.
4. There is also an example data set `cran_dependencies`.
## One or multiple types of dependencies
To obtain the information about various kinds of dependencies of a package, we can use the function `get_dep()` which takes the package name and the type of dependencies as the first and second arguments, respectively. Currently, the second argument accepts a character vector of one or more of the following words: `Depends`, `Imports`, `LinkingTo`, `Suggests`, `Enhances`, `Reverse_depends`, `Reverse_imports`, `Reverse_linking_to`, `Reverse_suggests`, and `Reverse_enhances`, or any variations in their letter cases, or if the underscore "_" is replaced by a space.
```{r}
get_dep("dplyr", "Imports")
get_dep("MASS", c("depends", "suggests"))
```
For more information on different types of dependencies, see [the official guidelines](https://cran.r-project.org/doc/manuals/r-release/R-exts.html#Package-Dependencies) and [https://r-pkgs.org/description.html](https://r-pkgs.org/description.html).
In the output, the column `type` is the type of the dependency converted to lower case. Also, `LinkingTo` is now converted to `linking to` for consistency.
```{r}
get_dep("xts", "LinkingTo")
get_dep("xts", "linking to")
```
For the reverse dependencies, instead of including the prefix "Reverse " in `type`, we use the argument `reverse`:
```{r}
get_dep("abc", c("depends", "depends"), reverse = TRUE)
get_dep("xts", c("linking to", "linking to"), reverse = TRUE)
```
Theoretically, for each forward dependency
```{r, echo=FALSE}
data.frame(from = "A", to = "B", type = "c", reverse = FALSE)
```
there should be an equivalent reverse dependency
```{r, echo=FALSE}
data.frame(from = "B", to = "A", type = "c", reverse = TRUE)
```
Aligning the `type` in the forward dependency and the reverse dependency enables this to be checked easily.
To obtain all types of dependencies, we can use `"all"` in the second argument, instead of typing a character vector of all words:
```{r}
df0.rstan <- get_dep("rstan", "all")
dplyr::count(df0.rstan, type)
df1.rstan <- get_dep("rstan", "all", reverse = TRUE) # too many rows to display
dplyr::count(df1.rstan, type) # hence the summary using count()
```
```{r,echo=FALSE}
df0.all <- get_dep_all_packages()
df1.all <- df0.all |> dplyr::count(from, type, reverse) |> dplyr::count(from)
v9.all <- dplyr::filter(df1.all, n == 9L)$from
v0.all <- dplyr::filter(df1.all, n == 10L)$from
```
As of `r Sys.Date()`, there are `r length(v0.all)` packages that have all 10 types of dependencies, and `r length(v9.all)` packages that have 9 types of dependencies: `r paste(v9.all, collapse = ", ")`.
## Building and visualising a dependency network
To build a dependency network, we have to obtain the dependencies for multiple packages. For illustration, we choose the [core packages of the tidyverse](https://www.tidyverse.org/packages/), and find out what each package `Imports`. We put all the dependencies into one data frame, in which the package in the `from` column imports the package in the `to` column. This is essentially the edge list of the dependency network.
```{r}
df0.imports <- rbind(
get_dep("ggplot2", "Imports"),
get_dep("dplyr", "Imports"),
get_dep("tidyr", "Imports"),
get_dep("readr", "Imports"),
get_dep("purrr", "Imports"),
get_dep("tibble", "Imports"),
get_dep("stringr", "Imports"),
get_dep("forcats", "Imports")
)
head(df0.imports)
tail(df0.imports)
```
## All types of dependencies, in a data frame
The example dataset `cran_dependencies` contains all dependencies as of 2020-05-09.
```{r}
data(cran_dependencies)
cran_dependencies
dplyr::count(cran_dependencies, type, reverse)
```
This is essentially a snapshot of CRAN. We can obtain all the current dependencies using `get_dep_all_packages()`, which requires no arguments:
```{r}
df0.cran <- get_dep_all_packages()
head(df0.cran)
dplyr::count(df0.cran, type, reverse) # numbers in general larger than above
```
## Network of one type of dependencies, as an igraph object
We can build dependency network using `get_graph_all_packages()`. Furthermore, we can verify that the forward and reverse dependency networks are (almost) the same, by looking at their size (number of edges) and order (number of nodes).
```{r get_graph_all_packages}
g0.depends <- get_graph_all_packages(type = "depends")
g0.depends
```
We could obtain essentially the same graph, but with the direction of the edges reversed, by specifying `type = "reverse depends"`:
```{r get_graph_all_packages_rev, eval = FALSE}
# Not run
g0.rev_depends <- get_graph_all_packages(type = "depends", reverse = TRUE)
g0.rev_depends
```
The dependency words accepted by the argument `type` is the same as in `get_dep()`. The two networks' size and order should be very close if not identical to each other. Because of the dependency direction, their edge lists should be the same but with the column names `from` and `to` swapped.
For verification, the exact same graphs can be obtained by filtering the data frame for the required dependency and applying `df_to_graph()`:
```{r forward_equivalent}
g1.depends <- df0.cran |>
dplyr::filter(type == "depends" & !reverse) |>
df_to_graph(nodelist = dplyr::rename(df0.cran, name = from))
g1.depends # same as g0.depends
```