The pins package supports back-and-forth collaboration for publishing
and consuming using, for example, board_s3()
. The goal of
this vignette is to show how to publish a board of pins to a website,
bringing your pins to a wider audience. How does this work?
- For consumers,
board_url()
offers read-only collaboration. - For publishers, a helper function is provided,
write_board_manifest()
. A manifest contains a list of pins and versions, enabling aboard_url()
to read like aboard_s3()
orboard_folder()
.
Publishing
The steps for publishing a board that can be read by consumers using
board_url()
are:
- create a board,
- write pins to the board,
- write a manifest file using
write_board_manifest()
, and - publish your website.
The first and last steps will be specific to how you deploy your board on the web; we discuss options in the Publishing platforms section. Regardless of platform, you’ll write the pins and the manifest the same way.
For this first demonstration, we’ll start by creating a board, and finish by showing how the board works after being served.
library(pins)
board <- board_temp(versioned = TRUE)
We’re using a temporary board for this demonstration, but in
practice, you might use board_folder()
in a project folder
or GitHub repo, or perhaps board_s3()
.
Let’s make the mtcars
dataset available as a JSON
file:
board %>% pin_write(mtcars, type = "json")
#> Using `name = 'mtcars'`
#> Creating new version '20241007T174722Z-c2702'
#> Writing to pin 'mtcars'
Let’s make a new version of this data by adding a column:
lper100km
, consumption in liters per 100 km. This could
make our data friendlier to folks outside the United States.
mtcars_metric <- mtcars
mtcars_metric$lper100km <- 235.215 / mtcars$mpg
board %>% pin_write(mtcars_metric, name = "mtcars", type = "json")
#> Creating new version '20241007T174724Z-8416c'
#> Writing to pin 'mtcars'
Let’s check our board to ensure we have one pin named
"mtcars"
, with two versions:
board %>% pin_list()
#> [1] "mtcars"
board %>% pin_versions("mtcars")
#> # A tibble: 2 × 3
#> version created hash
#> <chr> <dttm> <chr>
#> 1 20241007T174722Z-c2702 2024-10-07 17:47:22 c2702
#> 2 20241007T174724Z-8416c 2024-10-07 17:47:24 8416c
Because a board_url()
is consumed over the web, it
doesn’t have access to a file system the way, for example, a
board_folder()
has; we can work around this by creating a
manifest file. When a board_url()
is set up by a consumer
for reading, the pins package uses this file to discover the pins and
their versions. The manifest file is the key to
board_url()
’s ability to discover pins as if it were a
file-system-based board.
After writing pins but before publishing, call
write_board_manifest()
:
board %>% write_board_manifest()
#> Manifest file written to root folder of board, as `_pins.yaml`
The maintenance of this manifest file is not automated; it is your responsibility as the board publisher to keep the manifest up to date.
Let’s confirm that there is a file called
_pins.yaml
:
We can inspect its contents to see each pin in the board, and each version of each pin:
At this point, we would publish the folder containing the board as a
part of a web site. Let’s pretend that we have served the folder from
our fake website, https://not.real.website.co/pins/
.
Consuming
With an up-to-date manifest file, a board_url()
can
behave as a read-only version of a board_folder()
. Let’s
create a board_url()
using our fake URL:
web_board <- board_url("https://not.real.website.co/pins/")
The board_url()
function reads the manifest file to
discover the pins and versions:
web_board %>% pin_list()
#> [1] "mtcars"
versions <- web_board %>% pin_versions("mtcars")
versions
#> # A tibble: 2 × 3
#> version created hash
#> <chr> <dttm> <chr>
#> 1 20241007T174722Z-c2702 2024-10-07 17:47:22 c2702
#> 2 20241007T174724Z-8416c 2024-10-07 17:47:24 8416c
We can read the most-recent version of the "mtcars"
pin:
web_board %>% pin_read("mtcars") %>% head()
#> mpg cyl disp hp drat wt qsec vs am gear carb
#> Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
#> Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
#> Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
#> Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
#> Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
#> Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1
#> lper100km
#> Mazda RX4 11.2007
#> Mazda RX4 Wag 11.2007
#> Datsun 710 10.3164
#> Hornet 4 Drive 10.9914
#> Hornet Sportabout 12.5783
#> Valiant 12.9953
We can also read the first version:
web_board %>% pin_read("mtcars", version = versions$version[[1]]) %>% head()
#> mpg cyl disp hp drat wt qsec vs am gear carb
#> Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
#> Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
#> Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
#> Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
#> Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
#> Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1
Publishing platforms
The goal of this section is to illustrate ways to publish a board as a part of a website.
pkgdown
Pins offers another way for package developers to share data associated with an R package. Publishing a package dataset as a pin can extend your data’s “audience” to those who have not installed the package.
Using pkgdown,
any files you save in the directory pkgdown/assets/
will be
copied to the website’s root directory when
pkgdown::build_site()
is run.
The R
Packages book suggests using a folder called
data-raw
for working with datasets; this can be adapted to
use pins. You would start with usethis::use_data_raw()
. In
a file in your /data-raw
directory, wrangle and clean your
datasets in the same way as if you were going to use
usethis::use_data()
. To offer such datasets on a web-based
board instead of as a built-in package dataset, in your
/data-raw
file you would:
- Create a board:
board_folder(here::here("pkgdown/assets/pins-board"))
(you might use a different name than"pins-board"
). - Write your datasets to the board using
pin_write()
. - At the end, call
write_board_manifest()
.
Now when you build your pkgdown site and serve it (perhaps via GitHub
Pages at a URL like
https://user-name.github.io/repo-name/
), your datasets are
available as pins.
The R Packages book offers this observation on CRAN and package data:
Generally, package data should be smaller than a megabyte - if it’s larger you’ll need to argue for an exemption.
Publishing a board on your pkgdown site provides a way to offer datasets too large for CRAN or extended versions of your data. A consumer can read your pins by setting up a board like:
board <- board_url("https://user-name.github.io/repo-name/pins-board/")
S3
S3 buckets can be made available to different users using permissions;
buckets can even be made publicly accessible. Publishing data as a pin
in an S3 bucket can allow your collaborators to read without dealing
with the authentication required by board_s3()
.
To offer datasets as a pin on S3 via board_url()
you
would:
- Create a board:
board_s3("your-existing-bucket")
(set the bucket’s permissions to give appropriate people access). - Write your datasets to the board using
pin_write()
. - At the end, call
write_board_manifest()
.
S3 buckets typically have a URL like
https://your-existing-bucket.s3.us-west-2.amazonaws.com/
.
For a person who has access to your bucket, they can read your pins by
setting up a board like:
board <- board_url("https://your-existing-bucket.s3.us-west-2.amazonaws.com/")