class: center, middle, inverse, title-slide

# The R language and its applications in bioinformatics

### Anastasia Zharikova, Anna Valyaeva

### 24.09.2021

---

# Functions

- If you notice that you keep reusing the same piece of code, put it into a function.

```r
add_ten <- function(x){
  x + 10
}

add_ten(32)
```

```
[1] 42
```

---

# Looking at a function's code

You can look at your own function:

```r
add_ten
```

```
function(x){
  x + 10
}
```

And at someone else's:

```r
xor
```

```
function (x, y) 
{
    (x | y) & !(x & y)
}
<bytecode: 0x55f4f02562b0>
<environment: namespace:base>
```

---

# Return

- A function returns the value of the last evaluated expression, or the value passed to `return(...)`. `return()` exits the function immediately, so any code after it is never run.

```r
add_ten_or_not <- function(x){
  return(x + 10)
  x + 100
}

add_ten_or_not(10)
```

```
[1] 20
```

---

.pull-left[
## Implicit return

```r
check_sign_i <- function(x){
  # check if x is positive
  if (x > 0){
    "positive"
  }
  # check if x is negative
  else if (x < 0){
    "negative"
  }
  # x is neither positive nor negative
  else{
    "zero"
  }
}

check_sign_i(10)
```

```
[1] "positive"
```
]

.pull-right[
## Explicit return

```r
check_sign_e <- function(x){
  # check if x is positive
  if (x > 0){
    return("positive")
  }
  # check if x is negative
  else if (x < 0){
    return("negative")
  }
  # x is neither positive nor negative
  else{
    return("zero")
  }
}

check_sign_e(10)
```

```
[1] "positive"
```
]

---

# Multiple returns

- To return several objects from a function, return them together as a `list`.

```r
return_two_and_four <- function(){
  list(2, 4)
}

return_two_and_four()
```

```
[[1]]
[1] 2

[[2]]
[1] 4
```

---

# Local variables

The argument `x` inside the function is local: changing it does not affect the global `x`.

```r
x <- 1000

add_ten <- function(x){
  x + 10
}

add_ten(32)
```

```
[1] 42
```

```r
x
```

```
[1] 1000
```

---

# Names to avoid for variables

These names already belong to built-in objects (`c()`, `T`, `mean()`). Reusing them masks the originals, so don't do this:
```r
c <- 10
T <- FALSE
mean <- function(x) sum(x)
```

---

# Default argument values

```r
add_number <- function(x, val = 0){
  x + val
}

add_number(10)
```

```
[1] 10
```

```r
add_number(10, 5)
```

```
[1] 15
```

---

# Guard against errors

```r
add_number("two")
```

```
Error in x + val: non-numeric argument to binary operator
```

--

```r
add_number <- function(x, val = 0){
  if (!is.numeric(x)){
    stop("`x` must be a number")
  }
  x + val
}

add_number("two")
```

```
Error in add_number("two"): `x` must be a number
```

---

# And avoid loops!

- If you need to apply a function to every element of a list (or vector), use `map()`.

```r
# Instead of
out <- vector("list", 3)
for (i in 1:3){
  out[[i]] <- f(i)
}
# or
list(f(1), f(2), f(3))
# all you need is...
map(1:3, f)
```

--

.pull-left[
```r
library(purrr)
# or library(tidyverse)
```
]

.pull-right[
<img src="img/2021-10-01/purrr.png" width="35%" style="display: block; margin: auto;" />
]

---

# The `map` family of functions

.pull-left[
```r
cube <- function(x) x^3

map(1:3, cube)
```

```
[[1]]
[1] 1

[[2]]
[1] 8

[[3]]
[1] 27
```
]

--

.pull-right[
<img src="img/2021-10-01/map.png" width="100%" style="display: block; margin: auto;" />
]

---

# Different `map_` variants

- Plain `map()` always returns a list.
- If you are sure that every call returns a single value of the same type (so the results fit into an atomic vector), use a typed `map_` variant:
  - `map_chr`
  - `map_lgl`
  - `map_int`
  - `map_dbl`

```r
map_dbl(1:3, cube)
```

```
[1]  1  8 27
```

---

# Different `map_` variants

```r
map_chr(mtcars, typeof)
```

```
     mpg      cyl     disp       hp     drat       wt     qsec       vs 
"double" "double" "double" "double" "double" "double" "double" "double" 
      am     gear     carb 
"double" "double" "double" 
```

```r
map_lgl(mtcars, is.double)
```

```
 mpg  cyl disp   hp drat   wt qsec   vs   am gear carb 
TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE 
```

```r
# n_distinct() comes from dplyr
map_int(mtcars, n_distinct)
```

```
 mpg  cyl disp   hp drat   wt qsec   vs   am gear carb 
  25    3   27   22   22   29   30    2    2    3    6 
```

---

# Anonymous functions / lambda functions

```r
# If you forgot about n_distinct() from dplyr:
map_dbl(mtcars, function(x) length(unique(x)))
```

```
 mpg  cyl disp   hp drat   wt qsec   vs   am gear carb 
  25    3   27   22   22   29   30    2    2    3    6 
```

```r
# Too much typing? This shorthand is simpler:
map_dbl(mtcars, ~ length(unique(.x)))
```

```
 mpg  cyl disp   hp drat   wt qsec   vs   am gear carb 
  25    3   27   22   22   29   30    2    2    3    6 
```

---

# `map_df`

- `map_dfr` is `map()` + `bind_rows()`
- `map_dfc` is `map()` + `bind_cols()`

--

```r
input_files <- c("file1.csv", "file2.csv", "file3.csv")
```

--

.pull-left[
```r
file1 <- read_csv("file1.csv")
file2 <- read_csv("file2.csv")
file3 <- read_csv("file3.csv")
file <- bind_rows(file1, file2, file3)
```
]

--

.pull-right[
```r
file <- map_dfr(input_files, read_csv)
```
]

---

# `map` ❤️ lists

```r
x <- list(
  list(-1, x = 1, y = c(2), z = "a"),
  list(-2, x = 4, y = c(5, 6), z = "b"),
  list(-3, x = 8, y = c(9, 10, 11)))

map_dbl(x, "x") # by element name
```

```
[1] 1 4 8
```

```r
map_dbl(x, list("y", 1)) # by name, then by position: first element of `y`
```

```
[1] 2 5 9
```

```r
map_chr(x, "z", .default = NA) # NA where `z` is missing
```

```
[1] "a" "b" NA 
```

---

# Function arguments

.pull-left[
```r
x <- list(1:5, c(1:10, NA))

# Not great
map_dbl(x, ~ mean(.x, na.rm = TRUE))
```

```
[1] 3.0 5.5
```

```r
# Better
map_dbl(x, mean, na.rm = TRUE)
```

```
[1] 3.0 5.5
```
]

--
.pull-right[
<img src="img/2021-10-01/map-arg.png" width="100%" style="display: block; margin: auto;" />
<img src="img/2021-10-01/map-arg-recycle.png" width="100%" style="display: block; margin: auto;" />
]

---

# `map2`

- Use it when you need to iterate over two vectors in parallel: the elements of a list and the matching values of another argument of the function.

```r
xs <- map(1:8, ~ runif(10))
ws <- map(1:8, ~ rpois(10, 5) + 1)
map2_dbl(xs, ws, weighted.mean, na.rm = TRUE)
```

```
[1] 0.3485683 0.4950948 0.5957083 0.6036551 0.4957436 0.5629462 0.4741791
[8] 0.3276506
```

--

<img src="img/2021-10-01/map2-arg.png" width="60%" style="display: block; margin: auto;" />

---

# `pmap`

- For when `map2` is not enough and you need a `map3` or even a `map4`...
- Pass a list of all the function's arguments.
- `map2(x, y, f)` is the same as `pmap(list(x, y), f)`.

```r
pmap_dbl(list(xs, ws), weighted.mean)
```

```
[1] 0.3485683 0.4950948 0.5957083 0.6036551 0.4957436 0.5629462 0.4741791
[8] 0.3276506
```

---

# `imap`

- `imap(x, f)` is the same as `map2(x, names(x), f)` if `x` has names, and `map2(x, seq_along(x), f)` otherwise.

```r
# .x is a list element
# .y is its name or, for an unnamed list like this one, its index
x <- map(1:6, ~ sample(1000, 10))
imap_chr(x, ~ paste0("The highest value of ", .y, " is ", max(.x)))
```

```
[1] "The highest value of 1 is 987" "The highest value of 2 is 742"
[3] "The highest value of 3 is 810" "The highest value of 4 is 980"
[5] "The highest value of 5 is 958" "The highest value of 6 is 990"
```

---

# `walk`

- `walk()` and `walk2()` call a function only for its side effects (e.g. saving files) and return their input invisibly.

.pull-left[
```r
# gg1, gg2, gg3 are ggplot objects created earlier
ggplots <- list(gg1, gg2, gg3)
output_files <- c("plot1.png", "plot2.png", "plot3.png")
walk2(output_files, ggplots, ggsave)
```
]

--

.pull-right[
<img src="img/2021-10-01/walk.png" width="50%" style="display: block; margin: auto;" />
<img src="img/2021-10-01/walk2.png" width="60%" style="display: block; margin: auto;" />
]

---

# Tracking and catching errors
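
**purrr** can wrap a function so that one failing call does not stop the whole iteration. A minimal sketch, assuming a hypothetical `inputs` list in which `log()` fails on the character element:

```r
library(purrr)

# Hypothetical input: log() will fail on the character element "a"
inputs <- list(10, "a", 100)

# safely() wraps log(): each call returns list(result = ..., error = ...)
res <- map(inputs, safely(log))

# Pull out the error component; it is NULL wherever the call succeeded
errors <- map(res, "error")
failed <- !map_lgl(errors, is.null)
failed
#> [1] FALSE  TRUE FALSE

# possibly() returns the `otherwise` value instead of raising the error
safe_vals <- map_dbl(inputs, possibly(log, otherwise = NA_real_))
safe_vals
```
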
- `safely()` returns a list of two elements:
  - `result`: the result, or `NULL` if an error occurred;
  - `error`: the error object, or `NULL` if there was no error.
- `possibly()` returns a default value (its `otherwise` argument) when an error occurs.
- `quietly()` returns `result`, `output`, `warnings` and `messages`.

---

# Further reading on functions and purrr

- [Functions chapter in R4DS](https://r4ds.had.co.nz/functions.html)
- [Functions chapter in Advanced R](https://adv-r.hadley.nz/functions.html)
- [purrr cheatsheet](https://github.com/rstudio/cheatsheets/blob/master/purrr.pdf)
- [Functionals chapter in Advanced R](https://adv-r.hadley.nz/functionals.html)
- [purrr tutorial by Jenny Bryan](https://jennybc.github.io/purrr-tutorial/index.html)
- The images illustrating how the **purrr** functions work are taken from 'Advanced R' by Hadley Wickham.

---

# Joining tables

A dataset about aye-aye lemurs. The first table holds general information about the lemurs that live or have lived at the Duke Lemur Center.

```r
aye_lemurs <- read_csv("data/2021-10-01/aye_info.csv")
str(aye_lemurs)
```

```
spec_tbl_df [51 × 9] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
 $ animal_id     : num [1:51] 6201 6202 6261 6262 6451 ...
 $ name          : chr [1:51] "Nosferatu" "Poe" "Samantha" "Annabel Lee" ...
 $ taxonomic_code: chr [1:51] "DMAD" "DMAD" "DMAD" "DMAD" ...
 $ species       : chr [1:51] "Aye-aye" "Aye-aye" "Aye-aye" "Aye-aye" ...
 $ sex           : chr [1:51] "M" "M" "F" "F" ...
 $ birth_date    : Date[1:51], format: "1985-12-18" "1986-12-19" ...
 $ birth_type    : chr [1:51] "wild" "wild" "wild" "captive" ...
 $ litter_size   : num [1:51] NA NA NA NA NA NA NA NA 1 1 ...
 $ death_date    : Date[1:51], format: "2018-07-30" NA ...
 - attr(*, "spec")=
  .. cols(
  ..   animal_id = col_double(),
  ..   name = col_character(),
  ..   taxonomic_code = col_character(),
  ..   species = col_character(),
  ..   sex = col_character(),
  ..   birth_date = col_date(format = ""),
  ..   birth_type = col_character(),
  ..   litter_size = col_double(),
  ..   death_date = col_date(format = "")
  .. )
 - attr(*, "problems")=<externalptr> 
```

---

# Joining tables

The second table holds the results of three weighings of the lemurs.

```r
lemur_weights <- read_csv("data/2021-10-01/lemurs_weights.csv")
lemur_weights
```

```
# A tibble: 2,270 × 4
   animal_id weight_1 weight_2 weight_3
       <dbl>    <dbl>    <dbl>    <dbl>
 1         5     1190     1190     1086
 2         6     1174      947       NA
 3         9      899      910      899
 4        10     1185     1236       NA
 5        14      897      968       NA
 6        17      848      812       NA
 7        18     1074     1054       NA
 8        21     1048     1048     1004
 9        23      805      915      805
10        24      702      646      702
# … with 2,260 more rows
```

---

# Joining tables

```r
aye <- aye_lemurs %>%
  left_join(lemur_weights, by = "animal_id")
str(aye)
```

```
spec_tbl_df [51 × 12] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
 $ animal_id     : num [1:51] 6201 6202 6261 6262 6451 ...
 $ name          : chr [1:51] "Nosferatu" "Poe" "Samantha" "Annabel Lee" ...
 $ taxonomic_code: chr [1:51] "DMAD" "DMAD" "DMAD" "DMAD" ...
 $ species       : chr [1:51] "Aye-aye" "Aye-aye" "Aye-aye" "Aye-aye" ...
 $ sex           : chr [1:51] "M" "M" "F" "F" ...
 $ birth_date    : Date[1:51], format: "1985-12-18" "1986-12-19" ...
 $ birth_type    : chr [1:51] "wild" "wild" "wild" "captive" ...
 $ litter_size   : num [1:51] NA NA NA NA NA NA NA NA 1 1 ...
 $ death_date    : Date[1:51], format: "2018-07-30" NA ...
 $ weight_1      : num [1:51] 2860 2700 2242 944 2760 ...
 $ weight_2      : num [1:51] 2505 2610 2360 1180 2520 ...
 $ weight_3      : num [1:51] 2930 2680 2415 1689 2620 ...
 - attr(*, "spec")=
  .. cols(
  ..   animal_id = col_double(),
  ..   name = col_character(),
  ..   taxonomic_code = col_character(),
  ..   species = col_character(),
  ..   sex = col_character(),
  ..   birth_date = col_date(format = ""),
  ..   birth_type = col_character(),
  ..   litter_size = col_double(),
  ..   death_date = col_date(format = "")
  .. )
 - attr(*, "problems")=<externalptr> 
```

---

# Joining tables
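
The join verbs differ in which rows they keep. A minimal sketch on two tiny hypothetical tables, `info` and `mass`, that share the key column `id` (the data are made up for illustration):

```r
library(dplyr)

# Hypothetical tables: ids 2 and 3 appear in both, 1 only in `info`, 4 only in `mass`
info <- tibble(id = c(1, 2, 3), name = c("A", "B", "C"))
mass <- tibble(id = c(2, 3, 4), weight = c(900, 1100, 1250))

left_join(info, mass, by = "id")   # every row of `info`; weight is NA for id 1
right_join(info, mass, by = "id")  # every row of `mass`; name is NA for id 4
inner_join(info, mass, by = "id")  # only ids present in both tables: 2, 3
full_join(info, mass, by = "id")   # all ids from either table: 1, 2, 3, 4
semi_join(info, mass, by = "id")   # rows of `info` that have a match, no new columns
anti_join(info, mass, by = "id")   # rows of `info` without a match: id 1
```
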
- `left_join`
- `right_join`
- `inner_join`
- `full_join`

<br>

- `semi_join`
- `anti_join`

---

# What if there are many tables?

- `reduce()` takes a vector of length n and returns a vector of length 1 by applying a two-argument function to the elements cumulatively, pair by pair:
  - `reduce(1:4, f)` is equal to `f(f(f(1, 2), 3), 4)`

.pull-left[
```r
df <- reduce(
  list(df1, df2, df3, df4),
  left_join,
  by = "ID")
```
]

.pull-right[
<img src="img/2021-10-01/reduce-arg.png" width="80%" style="display: block; margin: auto;" />
]

---

# Transforming tables

## Each lemur is a group of its own.

Task: compute the mean weight of each lemur from its 3 weighings.

```r
aye %>%
  select(name, matches("weight")) %>%
  mutate(avg_weight = (weight_1 + weight_2 + weight_3) / 3) %>%
  head(5)
```

```
# A tibble: 5 × 5
  name           weight_1 weight_2 weight_3 avg_weight
  <chr>             <dbl>    <dbl>    <dbl>      <dbl>
1 Nosferatu          2860     2505     2930      2765 
2 Poe                2700     2610     2680      2663.
3 Samantha           2242     2360     2415      2339 
4 Annabel Lee         944     1180     1689      1271 
5 Mephistopheles     2760     2520     2620      2633.
```

---

# Transforming tables

## Each lemur is a group of its own.

Task: compute the mean weight of each lemur from its 3 weighings.

```r
aye %>%
  select(name, matches("weight")) %>%
* rowwise() %>%
  mutate(avg_weight = mean(c(weight_1, weight_2, weight_3), na.rm = TRUE)) %>%
  head(5)
```

```
# A tibble: 5 × 5
# Rowwise: 
  name           weight_1 weight_2 weight_3 avg_weight
  <chr>             <dbl>    <dbl>    <dbl>      <dbl>
1 Nosferatu          2860     2505     2930      2765 
2 Poe                2700     2610     2680      2663.
3 Samantha           2242     2360     2415      2339 
4 Annabel Lee         944     1180     1689      1271 
5 Mephistopheles     2760     2520     2620      2633.
```

---

# Transforming tables

## Each lemur is a group of its own.

Task: compute the mean weight of each lemur from its 3 weighings.

```r
aye %>%
  select(name, matches("weight")) %>%
  rowwise() %>%
  mutate(avg_weight = mean(c_across(where(is.numeric)), na.rm = TRUE)) %>%
  head(5)
```

```
# A tibble: 5 × 5
# Rowwise: 
  name           weight_1 weight_2 weight_3 avg_weight
  <chr>             <dbl>    <dbl>    <dbl>      <dbl>
1 Nosferatu          2860     2505     2930      2765 
2 Poe                2700     2610     2680      2663.
3 Samantha           2242     2360     2415      2339 
4 Annabel Lee         944     1180     1689      1271 
5 Mephistopheles     2760     2520     2620      2633.
```

---

# Selecting columns

tidyselect helpers:

- `starts_with()`
- `ends_with()`
- `contains()`
- `matches()`
- `num_range()`

<br>

- `one_of(col_names_vec)` (superseded by `all_of()` / `any_of()`)
- `where(is.numeric)`

To reorder columns:

```r
aye %>%
  select(taxonomic_code, name, everything())
```

---

# Transforming all columns at once

```r
# tolower() turns every column, including numbers and dates, into character
aye %>%
* mutate(across(everything(), tolower)) %>%
  str()
```

```
spec_tbl_df [51 × 12] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
 $ animal_id     : chr [1:51] "6201" "6202" "6261" "6262" ...
 $ name          : chr [1:51] "nosferatu" "poe" "samantha" "annabel lee" ...
 $ taxonomic_code: chr [1:51] "dmad" "dmad" "dmad" "dmad" ...
 $ species       : chr [1:51] "aye-aye" "aye-aye" "aye-aye" "aye-aye" ...
 $ sex           : chr [1:51] "m" "m" "f" "f" ...
 $ birth_date    : chr [1:51] "1985-12-18" "1986-12-19" "1978-08-15" "1988-05-24" ...
 $ birth_type    : chr [1:51] "wild" "wild" "wild" "captive" ...
 $ litter_size   : chr [1:51] NA NA NA NA ...
 $ death_date    : chr [1:51] "2018-07-30" NA "1995-12-20" "1989-12-09" ...
 $ weight_1      : chr [1:51] "2860" "2700" "2242" "944" ...
 $ weight_2      : chr [1:51] "2505" "2610" "2360" "1180" ...
 $ weight_3      : chr [1:51] "2930" "2680" "2415" "1689" ...
 - attr(*, "spec")=
  .. cols(
  ..   animal_id = col_double(),
  ..   name = col_character(),
  ..   taxonomic_code = col_character(),
  ..   species = col_character(),
  ..   sex = col_character(),
  ..   birth_date = col_date(format = ""),
  ..   birth_type = col_character(),
  ..   litter_size = col_double(),
  ..   death_date = col_date(format = "")
  .. )
 - attr(*, "problems")=<externalptr> 
```

---

# Transforming several columns

```r
aye %>%
* mutate(across(c("name", "taxonomic_code"), tolower)) %>%
  str()
```

```
spec_tbl_df [51 × 12] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
 $ animal_id     : num [1:51] 6201 6202 6261 6262 6451 ...
 $ name          : chr [1:51] "nosferatu" "poe" "samantha" "annabel lee" ...
 $ taxonomic_code: chr [1:51] "dmad" "dmad" "dmad" "dmad" ...
 $ species       : chr [1:51] "Aye-aye" "Aye-aye" "Aye-aye" "Aye-aye" ...
 $ sex           : chr [1:51] "M" "M" "F" "F" ...
 $ birth_date    : Date[1:51], format: "1985-12-18" "1986-12-19" ...
 $ birth_type    : chr [1:51] "wild" "wild" "wild" "captive" ...
 $ litter_size   : num [1:51] NA NA NA NA NA NA NA NA 1 1 ...
 $ death_date    : Date[1:51], format: "2018-07-30" NA ...
 $ weight_1      : num [1:51] 2860 2700 2242 944 2760 ...
 $ weight_2      : num [1:51] 2505 2610 2360 1180 2520 ...
 $ weight_3      : num [1:51] 2930 2680 2415 1689 2620 ...
 - attr(*, "spec")=
  .. cols(
  ..   animal_id = col_double(),
  ..   name = col_character(),
  ..   taxonomic_code = col_character(),
  ..   species = col_character(),
  ..   sex = col_character(),
  ..   birth_date = col_date(format = ""),
  ..   birth_type = col_character(),
  ..   litter_size = col_double(),
  ..   death_date = col_date(format = "")
  .. )
 - attr(*, "problems")=<externalptr> 
```

---

# Transforming several columns

```r
aye %>%
  select(name, matches("weight")) %>%
* mutate(across(where(is.numeric), log))
```

```
# A tibble: 51 × 4
   name           weight_1 weight_2 weight_3
   <chr>             <dbl>    <dbl>    <dbl>
 1 Nosferatu          7.96     7.83     7.98
 2 Poe                7.90     7.87     7.89
 3 Samantha           7.72     7.77     7.79
 4 Annabel Lee        6.85     7.07     7.43
 5 Mephistopheles     7.92     7.83     7.87
 6 Endora             7.86     7.77     7.88
 7 Ozma               7.82     7.80     7.87
 8 Morticia           7.90     7.84     7.72
 9 Blue Devil         7.19     7.51     7.81
10 Goblin             7.07     7.29     7.05
# … with 41 more rows
```

---

# Transforming several columns

```r
aye %>%
  select(name, matches("weight")) %>%
* mutate(across(where(is.numeric), ~ .x/1000))
```

```
# A tibble: 51 × 4
   name           weight_1 weight_2 weight_3
   <chr>             <dbl>    <dbl>    <dbl>
 1 Nosferatu         2.86      2.50     2.93
 2 Poe               2.7       2.61     2.68
 3 Samantha          2.24      2.36     2.42
 4 Annabel Lee       0.944     1.18     1.69
 5 Mephistopheles    2.76      2.52     2.62
 6 Endora            2.6       2.36     2.64
 7 Ozma              2.5       2.44     2.62
 8 Morticia          2.7       2.55     2.26
 9 Blue Devil        1.33      1.82     2.46
10 Goblin            1.18      1.46     1.15
# … with 41 more rows
```

---

# Transforming several columns

```r
aye %>%
  select(name, matches("weight")) %>%
* mutate(across(where(is.numeric),
                  list(kg = ~ .x/1000)))
```

```
# A tibble: 51 × 7
   name           weight_1 weight_2 weight_3 weight_1_kg weight_2_kg weight_3_kg
   <chr>             <dbl>    <dbl>    <dbl>       <dbl>       <dbl>       <dbl>
 1 Nosferatu          2860     2505     2930       2.86         2.50        2.93
 2 Poe                2700     2610     2680       2.7          2.61        2.68
 3 Samantha           2242     2360     2415       2.24         2.36        2.42
 4 Annabel Lee         944     1180     1689       0.944        1.18        1.69
 5 Mephistopheles     2760     2520     2620       2.76         2.52        2.62
 6 Endora             2600     2360     2645       2.6          2.36        2.64
 7 Ozma               2500     2440     2620       2.5          2.44        2.62
 8 Morticia           2700     2550     2255       2.7          2.55        2.26
 9 Blue Devil         1330     1820     2460       1.33         1.82        2.46
10 Goblin             1180     1460     1150       1.18         1.46        1.15
# … with 41 more rows
```

A named list of functions keeps the original columns and adds new ones named `{column}_{function name}`.

---

# `summarise` over several columns

```r
aye %>%
  select(name, matches("weight")) %>%
* summarise(across(where(is.numeric), list(kg = ~ .x/1000), .names = "{.col} KG"))
```

```
# A tibble: 51 × 3
   `weight_1 KG` `weight_2 KG` `weight_3 KG`
           <dbl>         <dbl>         <dbl>
 1         2.86           2.50          2.93
 2         2.7            2.61          2.68
 3         2.24           2.36          2.42
 4         0.944          1.18          1.69
 5         2.76           2.52          2.62
 6         2.6            2.36          2.64
 7         2.5            2.44          2.62
 8         2.7            2.55          2.26
 9         1.33           1.82          2.46
10         1.18           1.46          1.15
# … with 41 more rows
```

Note that this `summarise()` returns 51 rows, one per input row, instead of a single summary row; since dplyr 1.1.0 this usage is deprecated in favour of `reframe()`.

---

# `summarise` over several columns with grouping

```r
aye %>%
  select(sex, birth_type, matches("weight")) %>%
* group_by(sex, birth_type) %>%
  summarise(across(where(is.numeric), ~ mean(.x, na.rm = TRUE)))
```

```
# A tibble: 5 × 5
# Groups:   sex [3]
  sex   birth_type weight_1 weight_2 weight_3
  <chr> <chr>         <dbl>    <dbl>    <dbl>
1 F     captive       1604.    1490.    1722.
2 F     wild          2510.    2428.    2484.
3 M     captive       1893.    1589.    1410.
4 M     wild          2773.    2545     2743.
5 NA    captive        NaN      NaN      NaN 
```

---

# `summarise` over several columns with grouping...
```r
aye %>%
  select(sex, birth_type, matches("weight")) %>%
* group_by(sex, birth_type) %>%
  summarise(across(where(is.numeric), ~ mean(.x, na.rm = TRUE))) %>%
  ungroup() %>%
  drop_na(sex) %>%
  rowwise() %>%
  mutate(avg_weight = mean(c_across(where(is.numeric)), na.rm = TRUE))
```

```
# A tibble: 4 × 6
# Rowwise: 
  sex   birth_type weight_1 weight_2 weight_3 avg_weight
  <chr> <chr>         <dbl>    <dbl>    <dbl>      <dbl>
1 F     captive       1604.    1490.    1722.      1605.
2 F     wild          2510.    2428.    2484.      2474.
3 M     captive       1893.    1589.    1410.      1631.
4 M     wild          2773.    2545     2743.      2687.
```

---

# `filter` over several columns

`if_all()` keeps the rows where the condition holds in every selected column (using `across()` inside `filter()` is deprecated).

```r
aye %>%
  select(name, matches("weight")) %>%
* filter(if_all(where(is.numeric), ~ .x > 2500))
```

```
# A tibble: 3 × 4
  name           weight_1 weight_2 weight_3
  <chr>             <dbl>    <dbl>    <dbl>
1 Nosferatu          2860     2505     2930
2 Poe                2700     2610     2680
3 Mephistopheles     2760     2520     2620
```

---

# Useful tidyr functions
- `unite()` merges several columns into one:

```r
aye %>%
  select(animal_id:species) %>%
  unite(col = "name & ID", name, animal_id, sep = " & ")
```

```
# A tibble: 51 × 3
   `name & ID`           taxonomic_code species
   <chr>                 <chr>          <chr>  
 1 Nosferatu & 6201      DMAD           Aye-aye
 2 Poe & 6202            DMAD           Aye-aye
 3 Samantha & 6261       DMAD           Aye-aye
 4 Annabel Lee & 6262    DMAD           Aye-aye
 5 Mephistopheles & 6451 DMAD           Aye-aye
 6 Endora & 6452         DMAD           Aye-aye
 7 Ozma & 6453           DMAD           Aye-aye
 8 Morticia & 6454       DMAD           Aye-aye
 9 Blue Devil & 6480     DMAD           Aye-aye
10 Goblin & 6514         DMAD           Aye-aye
# … with 41 more rows
```

---

# Useful tidyr functions
- `separate()` splits one column into several:

```r
aye %>%
  select(animal_id:species) %>%
  separate(col = species, into = c("Aye 1", "Aye 2"), sep = "-")
```

```
# A tibble: 51 × 5
   animal_id name           taxonomic_code `Aye 1` `Aye 2`
       <dbl> <chr>          <chr>          <chr>   <chr>  
 1      6201 Nosferatu      DMAD           Aye     aye    
 2      6202 Poe            DMAD           Aye     aye    
 3      6261 Samantha       DMAD           Aye     aye    
 4      6262 Annabel Lee    DMAD           Aye     aye    
 5      6451 Mephistopheles DMAD           Aye     aye    
 6      6452 Endora         DMAD           Aye     aye    
 7      6453 Ozma           DMAD           Aye     aye    
 8      6454 Morticia       DMAD           Aye     aye    
 9      6480 Blue Devil     DMAD           Aye     aye    
10      6514 Goblin         DMAD           Aye     aye    
# … with 41 more rows
```

---

# Further reading on tidyr and advanced dplyr

- [tidyr cheatsheet](https://github.com/rstudio/cheatsheets/blob/master/tidyr.pdf)
- [dplyr cheatsheet](https://github.com/rstudio/cheatsheets/blob/master/data-transformation.pdf)
- [Tidy data chapter in R4DS](https://r4ds.had.co.nz/tidy-data.html)
- `?across` and the other help pages...