The R Language and Its Applications in Bioinformatics

class: center, middle, inverse, title-slide

# The R Language and Its Applications in Bioinformatics
### Анастасия Жарикова, Анна Валяева
### 24.09.2021

---

# Functions

- If you find yourself using the same code several times, wrap it in a function.

```r
add_ten <- function(x){
  x + 10
}

add_ten(32)
```

```
[1] 42
```

---
# Viewing a function's source code

You can inspect your own function:

```r
add_ten
```

```
function(x){
  x + 10
}
```

And someone else's:

```r
xor
```

```
function (x, y) 
{
    (x | y) & !(x & y)
}
<bytecode: 0x55f4f02562b0>
<environment: namespace:base>
```

---
# Return

- A function returns the value of its last evaluated expression, unless `return(...)` is called earlier, in which case the function exits immediately with that value.

```r
add_ten_or_not <- function(x){
  return(x + 10)
  x + 100
}

add_ten_or_not(10)
```

```
[1] 20
```

---
.pull-left[
## Implicit return

```r
check_sign_i <- function(x){
  # check if x is positive
  if (x > 0){
    "positive"
  }
  # check if x is negative
  else if (x < 0){
    "negative"
  }
  # check if x is not positive nor negative
  else{
    "zero"
  }
}

check_sign_i(10)
```

```
[1] "positive"
```
]

.pull-right[
## Explicit return

```r
check_sign_e <- function(x){
  # check if x is positive
  if (x > 0){
    return("positive")
  }
  # check if x is negative
  else if (x < 0){
    return("negative")
  }
  # check if x is not positive nor negative
  else{
    return("zero")
  }
}

check_sign_e(10)
```

```
[1] "positive"
```
]

---
# Multiple returns

- To return several objects from a function, return them as a `list`.

```r
return_two_and_four <- function(){
  list(2, 4)
}

return_two_and_four()
```

```
[[1]]
[1] 2

[[2]]
[1] 4
```
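
Naming the list elements makes the result easier to unpack. A minimal sketch, assuming a hypothetical helper `range_summary()` that is not part of the original example:

```r
# Hypothetical helper: returns the minimum and maximum of a numeric vector
range_summary <- function(x){
  list(min = min(x), max = max(x))
}

res <- range_summary(c(3, 1, 4, 1, 5))
res$min  # 1
res$max  # 5
```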

---
# Local variables

```r
x <- 1000

add_ten <- function(x){
  x + 10
}

add_ten(32)
```

```
[1] 42
```

```r
x
```

```
[1] 1000
```
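
The same holds for assignments inside the function body. A small sketch with hypothetical names (`counter`, `increment`): the assignment creates a local variable and leaves the global one untouched.

```r
counter <- 0

# Hypothetical example: `counter` on the right-hand side is read from the
# global environment, but the assignment creates a new local variable
increment <- function(){
  counter <- counter + 1
  counter
}

local_value <- increment()
local_value  # 1
counter      # still 0
```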

---
# Names that are already taken

These names are not formally reserved, but they already refer to objects in base R, so it is better not to overwrite them.

```r
c <- 10

T <-  FALSE

mean <- function(x) sum(x)
```
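
A short sketch of what goes wrong, and how to undo it. The variable names `masked`, `original` and `restored` are illustrative only:

```r
mean <- function(x) sum(x)           # masks base::mean()
masked <- mean(c(1, 2, 3))           # 6, not 2

original <- base::mean(c(1, 2, 3))   # 2: the base version is still reachable

rm(mean)                             # drop the masking definition
restored <- mean(c(1, 2, 3))         # 2 again
```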

---
# Default argument values

```r
add_number <- function(x, val = 0){
  x + val
}

add_number(10)
```

```
[1] 10
```

```r
add_number(10, 5)
```

```
[1] 15
```

---
# Guard against errors

```r
add_number("two")
```

```
Error in x + val: non-numeric argument to binary operator
```

```r
add_number <- function(x, val = 0){
  if (!is.numeric(x)){
    stop("`x` must be a number")
  }
  
  x + val
}

add_number("two")
```

```
Error in add_number("two"): `x` must be a number
```
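
The caller can also recover from such an error with `tryCatch()` instead of letting the script stop. A sketch, assuming a hypothetical variant `add_number_checked()` that validates both arguments:

```r
# Hypothetical variant that checks both arguments
add_number_checked <- function(x, val = 0){
  if (!is.numeric(x) || !is.numeric(val)){
    stop("`x` and `val` must be numeric")
  }
  x + val
}

# Return NA instead of failing when the input is invalid
result <- tryCatch(
  add_number_checked("two"),
  error = function(e) NA_real_
)
result  # NA
```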

---
# And avoid loops!

- To apply a function to every element of a list, use `map`.

```r
# Instead of
for (i in 1:3){
  f(i)
}

# or
list(f(1), f(2), f(3))

# all you need is...
map(1:3, f)
```

--
.pull-left[

```r
library(purrr)
# library(tidyverse)
```
]

.pull-right[
<img src="img/2021-10-01/purrr.png" width="35%" style="display: block; margin: auto;" />
]

---
# The `map` family of functions

.pull-left[

```r
cube <- function(x) x * 3  # note: despite the name, this multiplies by 3

map(1:3, cube)
```

```
[[1]]
[1] 3

[[2]]
[1] 6

[[3]]
[1] 9
```
]

--
.pull-right[
<img src="img/2021-10-01/map.png" width="100%" style="display: block; margin: auto;" />
]

---
# `map_*` variants

- Plain `map()` always returns a list.
- If you are sure the result fits into an atomic vector (all values of the same type), use one of the typed `map_*` variants:
  - `map_chr`
  - `map_lgl`
  - `map_int`
  - `map_dbl`

```r
map_dbl(1:3, cube)
```

```
[1] 3 6 9
```

---
# `map_*` variants

```r
map_chr(mtcars, typeof)
```

```
     mpg      cyl     disp       hp     drat       wt     qsec       vs 
"double" "double" "double" "double" "double" "double" "double" "double" 
      am     gear     carb 
"double" "double" "double" 
```

```r
map_lgl(mtcars, is.double)
```

```
 mpg  cyl disp   hp drat   wt qsec   vs   am gear carb 
TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE 
```

```r
map_int(mtcars, n_distinct)
```

```
 mpg  cyl disp   hp drat   wt qsec   vs   am gear carb 
  25    3   27   22   22   29   30    2    2    3    6 
```

---
# Anonymous functions / lambda functions

```r
# If you forgot about n_distinct() from dplyr:
map_dbl(mtcars, function(x) length(unique(x)))
```

```
 mpg  cyl disp   hp drat   wt qsec   vs   am gear carb 
  25    3   27   22   22   29   30    2    2    3    6 
```

```r
# Too much typing. The formula shorthand is shorter:
map_dbl(mtcars, ~ length(unique(.x)))
```

```
 mpg  cyl disp   hp drat   wt qsec   vs   am gear carb 
  25    3   27   22   22   29   30    2    2    3    6 
```

---
# `map_df`

- `map_dfr()` is `map()` followed by `bind_rows()`
- `map_dfc()` is `map()` followed by `bind_cols()`

```r
input_files <- c("file1.csv", "file2.csv", "file3.csv")
```
--
.pull-left[

```r
file1 <- read_csv("file1.csv")
file2 <- read_csv("file2.csv")
file3 <- read_csv("file3.csv")

file <- bind_rows(file1, file2, file3)
```
]

--
.pull-right[

```r
file <- map_dfr(input_files, read_csv)
```
]

---
# `map` ❤️ lists

```r
x <- list(
  list(-1, x = 1, y = c(2), z = "a"),
  list(-2, x = 4, y = c(5, 6), z = "b"),
  list(-3, x = 8, y = c(9, 10, 11)))

map_dbl(x, "x") # by element name
```

```
[1] 1 4 8
```

```r
map_dbl(x, list("y", 1)) # by name, then by position
```

```
[1] 2 5 9
```

```r
map_chr(x, "z", .default = NA) 
```

```
[1] "a" "b" NA 
```

---
# Function arguments

.pull-left[

```r
x <- list(1:5, c(1:10, NA))

# Not ideal
map_dbl(x, ~ mean(.x, na.rm = TRUE))
```

```
[1] 3.0 5.5
```

```r
# Better
map_dbl(x, mean, na.rm = TRUE)
```

```
[1] 3.0 5.5
```
]

--
.pull-right[
<img src="img/2021-10-01/map-arg.png" width="100%" style="display: block; margin: auto;" />

]

---
# `map2`

- Use `map2()` when you need to iterate over two inputs in parallel: the elements of a list and a vector of values for a second argument of the function.

```r
xs <- map(1:8, ~ runif(10))
ws <- map(1:8, ~ rpois(10, 5) + 1)

map2_dbl(xs, ws, weighted.mean, na.rm = TRUE)
```

```
[1] 0.3485683 0.4950948 0.5957083 0.6036551 0.4957436 0.5629462 0.4741791
[8] 0.3276506
```

--
<img src="img/2021-10-01/map2-arg.png" width="60%" style="display: block; margin: auto;" />

---
# `pmap`

- For when `map2` is not enough and you need a `map3` or even a `map4`...
- Pass a list containing all of the function's arguments.
- `map2(x, y, f)` is equivalent to `pmap(list(x, y), f)`.

```r
pmap_dbl(list(xs, ws), weighted.mean)
```

```
[1] 0.3485683 0.4950948 0.5957083 0.6036551 0.4957436 0.5629462 0.4741791
[8] 0.3276506
```
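
With three or more arguments, naming the list elements lets `pmap()` match them to the function's arguments by name. A sketch with made-up parameters for `runif(n, min, max)`; the `params` and `draws` objects are illustrative:

```r
library(purrr)

# Hypothetical parameters for three calls to runif(n, min, max)
params <- list(
  n   = c(1, 2, 3),
  min = c(0, 10, 100),
  max = c(1, 20, 200)
)

# List names are matched to the argument names of runif()
draws <- pmap(params, runif)
lengths(draws)  # 1 2 3
```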

---
# `imap`

- `imap(x, f)` is equivalent to `map2(x, names(x), f)` if `x` has names, and to `map2(x, seq_along(x), f)` if it does not.

```r
# .y is the name of the list element (its index if the list is unnamed)
# .x is the list element itself
x <- map(1:6, ~ sample(1000, 10))
imap_chr(x, ~ paste0("The highest value of ", .y, " is ", max(.x)))
```

```
[1] "The highest value of 1 is 987" "The highest value of 2 is 742"
[3] "The highest value of 3 is 810" "The highest value of 4 is 980"
[5] "The highest value of 5 is 958" "The highest value of 6 is 990"
```
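
For a named list, `.y` holds the element names rather than the indices. A sketch with a hypothetical named list `weights_by_name`:

```r
library(purrr)

# Hypothetical named list of weights in grams
weights_by_name <- list(Poe = c(2700, 2610), Ozma = c(2500, 2440, 2620))

labels <- imap_chr(weights_by_name, ~ paste0(.y, ": ", length(.x), " measurements"))
unname(labels)  # "Poe: 2 measurements" "Ozma: 3 measurements"
```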

---
# `walk`

.pull-left[

```r
ggplots <- list(gg1, gg2, gg3)
output_files <- c("plot1.png", "plot2.png", "plot3.png")

walk2(output_files, ggplots, ggsave)
```
]

--
.pull-right[
<img src="img/2021-10-01/walk.png" width="50%" style="display: block; margin: auto;" />

]

---

# Tracking and catching errors <svg aria-hidden="true" role="img" viewBox="0 0 576 512" style="height:1em;width:1.12em;vertical-align:-0.125em;margin-left:auto;margin-right:auto;font-size:inherit;fill:#b74f6f;overflow:visible;position:relative;"><path d="M280.37 148.26L96 300.11V464a16 16 0 0 0 16 16l112.06-.29a16 16 0 0 0 15.92-16V368a16 16 0 0 1 16-16h64a16 16 0 0 1 16 16v95.64a16 16 0 0 0 16 16.05L464 480a16 16 0 0 0 16-16V300L295.67 148.26a12.19 12.19 0 0 0-15.3 0zM571.6 251.47L488 182.56V44.05a12 12 0 0 0-12-12h-56a12 12 0 0 0-12 12v72.61L318.47 43a48 48 0 0 0-61 0L4.34 251.47a12 12 0 0 0-1.6 16.9l25.5 31A12 12 0 0 0 45.15 301l235.22-193.74a12.19 12.19 0 0 1 15.3 0L530.9 301a12 12 0 0 0 16.9-1.6l25.5-31a12 12 0 0 0-1.7-16.93z"/></svg>

- `safely()` wraps a function so that it returns a list of two elements:
  - `result`: the result, or `NULL` if an error occurred,
  - `error`: the error object, or `NULL` if there was no error.

- `possibly()` lets you supply a default value that is returned when an error occurs.

- `quietly()` returns the result, output, messages and warnings.
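
A minimal sketch of the three wrappers applied to base functions. The inputs and the names `safe_log`, `log_or_na` and `quiet_sqrt` are illustrative only:

```r
library(purrr)

inputs <- list(1, 10, "a")

# safely(): every call returns list(result = ..., error = ...)
safe_log <- safely(log)
res <- map(inputs, safe_log)
res[[1]]$result           # 0
is.null(res[[3]]$result)  # TRUE: log("a") failed, see res[[3]]$error

# possibly(): failures are replaced with a default value
log_or_na <- map_dbl(inputs, possibly(log, otherwise = NA_real_))
log_or_na                 # 0, log(10), NA

# quietly(): warnings are captured instead of being printed
quiet_sqrt <- quietly(sqrt)
q <- quiet_sqrt(-1)
q$result                  # NaN
q$warnings                # the captured warning message
```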

---

# Further reading on functions and purrr

- [Functions Chapter in R4DS](https://r4ds.had.co.nz/functions.html)

- [Functions Chapter in Advanced R](https://adv-r.hadley.nz/functions.html)

- [purrr cheatsheet](https://github.com/rstudio/cheatsheets/blob/master/purrr.pdf)

- [Functionals Chapter in Advanced R](https://adv-r.hadley.nz/functionals.html)

- [purrr tutorial by Jenny Bryan](https://jennybc.github.io/purrr-tutorial/index.html)

- The images illustrating how **purrr** functions work are taken from 'Advanced R' by Hadley Wickham.

---

# Joining tables

A dataset on aye-aye lemurs. The first table contains general information about the lemurs that live or have lived at the Duke Lemur Center.

```r
aye_lemurs <- read_csv("data/2021-10-01/aye_info.csv")

str(aye_lemurs)
```

```
spec_tbl_df [51 × 9] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
 $ animal_id     : num [1:51] 6201 6202 6261 6262 6451 ...
 $ name          : chr [1:51] "Nosferatu" "Poe" "Samantha" "Annabel Lee" ...
 $ taxonomic_code: chr [1:51] "DMAD" "DMAD" "DMAD" "DMAD" ...
 $ species       : chr [1:51] "Aye-aye" "Aye-aye" "Aye-aye" "Aye-aye" ...
 $ sex           : chr [1:51] "M" "M" "F" "F" ...
 $ birth_date    : Date[1:51], format: "1985-12-18" "1986-12-19" ...
 $ birth_type    : chr [1:51] "wild" "wild" "wild" "captive" ...
 $ litter_size   : num [1:51] NA NA NA NA NA NA NA NA 1 1 ...
 $ death_date    : Date[1:51], format: "2018-07-30" NA ...
 - attr(*, "spec")=
  .. cols(
  ..   animal_id = col_double(),
  ..   name = col_character(),
  ..   taxonomic_code = col_character(),
  ..   species = col_character(),
  ..   sex = col_character(),
  ..   birth_date = col_date(format = ""),
  ..   birth_type = col_character(),
  ..   litter_size = col_double(),
  ..   death_date = col_date(format = "")
  .. )
 - attr(*, "problems")=<externalptr> 
```

---
# Joining tables

The second table contains the results of three weighings of the lemurs.

```r
lemur_weights <- read_csv("data/2021-10-01/lemurs_weights.csv")

lemur_weights
```

```
# A tibble: 2,270 × 4
   animal_id weight_1 weight_2 weight_3
       <dbl>    <dbl>    <dbl>    <dbl>
 1         5     1190     1190     1086
 2         6     1174      947       NA
 3         9      899      910      899
 4        10     1185     1236       NA
 5        14      897      968       NA
 6        17      848      812       NA
 7        18     1074     1054       NA
 8        21     1048     1048     1004
 9        23      805      915      805
10        24      702      646      702
# … with 2,260 more rows
```

---
# Joining tables

```r
aye <- aye_lemurs %>% 
  left_join(lemur_weights, by = "animal_id")

str(aye)
```

```
spec_tbl_df [51 × 12] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
 $ animal_id     : num [1:51] 6201 6202 6261 6262 6451 ...
 $ name          : chr [1:51] "Nosferatu" "Poe" "Samantha" "Annabel Lee" ...
 $ taxonomic_code: chr [1:51] "DMAD" "DMAD" "DMAD" "DMAD" ...
 $ species       : chr [1:51] "Aye-aye" "Aye-aye" "Aye-aye" "Aye-aye" ...
 $ sex           : chr [1:51] "M" "M" "F" "F" ...
 $ birth_date    : Date[1:51], format: "1985-12-18" "1986-12-19" ...
 $ birth_type    : chr [1:51] "wild" "wild" "wild" "captive" ...
 $ litter_size   : num [1:51] NA NA NA NA NA NA NA NA 1 1 ...
 $ death_date    : Date[1:51], format: "2018-07-30" NA ...
 $ weight_1      : num [1:51] 2860 2700 2242 944 2760 ...
 $ weight_2      : num [1:51] 2505 2610 2360 1180 2520 ...
 $ weight_3      : num [1:51] 2930 2680 2415 1689 2620 ...
 - attr(*, "spec")=
  .. cols(
  ..   animal_id = col_double(),
  ..   name = col_character(),
  ..   taxonomic_code = col_character(),
  ..   species = col_character(),
  ..   sex = col_character(),
  ..   birth_date = col_date(format = ""),
  ..   birth_type = col_character(),
  ..   litter_size = col_double(),
  ..   death_date = col_date(format = "")
  .. )
 - attr(*, "problems")=<externalptr> 
```
---
# Joining tables <svg aria-hidden="true" role="img" viewBox="0 0 576 512" style="height:1em;width:1.12em;vertical-align:-0.125em;margin-left:auto;margin-right:auto;font-size:inherit;fill:#b74f6f;overflow:visible;position:relative;"><path d="M280.37 148.26L96 300.11V464a16 16 0 0 0 16 16l112.06-.29a16 16 0 0 0 15.92-16V368a16 16 0 0 1 16-16h64a16 16 0 0 1 16 16v95.64a16 16 0 0 0 16 16.05L464 480a16 16 0 0 0 16-16V300L295.67 148.26a12.19 12.19 0 0 0-15.3 0zM571.6 251.47L488 182.56V44.05a12 12 0 0 0-12-12h-56a12 12 0 0 0-12 12v72.61L318.47 43a48 48 0 0 0-61 0L4.34 251.47a12 12 0 0 0-1.6 16.9l25.5 31A12 12 0 0 0 45.15 301l235.22-193.74a12.19 12.19 0 0 1 15.3 0L530.9 301a12 12 0 0 0 16.9-1.6l25.5-31a12 12 0 0 0-1.7-16.93z"/></svg>

- `left_join`
- `right_join`
- `inner_join`
- `full_join`

<br>

- `semi_join`
- `anti_join`
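
The difference between the joins is easiest to see on two tiny tables. A sketch with hypothetical toy data (`info` and `wts` are not part of the lemur dataset):

```r
library(dplyr)

# Hypothetical toy tables sharing the key column `animal_id`
info <- tibble(animal_id = c(1, 2, 3), name = c("Poe", "Ozma", "Goblin"))
wts  <- tibble(animal_id = c(2, 3, 4), weight = c(2500, 1180, 900))

# Mutating joins: add columns from the second table
left_join(info, wts, by = "animal_id")   # 3 rows: all of info, weight is NA for id 1
right_join(info, wts, by = "animal_id")  # 3 rows: all of wts, name is NA for id 4
inner_join(info, wts, by = "animal_id")  # 2 rows: ids 2 and 3 only
full_join(info, wts, by = "animal_id")   # 4 rows: ids 1 to 4

# Filtering joins: keep or drop rows of info, no new columns
semi_join(info, wts, by = "animal_id")   # ids 2 and 3
anti_join(info, wts, by = "animal_id")   # id 1
```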

---
# What if there are many tables?

- `reduce()` takes a vector of length n and reduces it to a single value by repeatedly calling a two-argument function on the running result and the next element:
  - `reduce(1:4, f)` is equivalent to `f(f(f(1, 2), 3), 4)`
  
.pull-left[

```r
df <- reduce(
  list(df1, df2, df3, df4), 
  left_join, 
  by = "ID")
```
]

.pull-right[
<img src="img/2021-10-01/reduce-arg.png" width="80%" style="display: block; margin: auto;" />

]
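
A small sketch of the same idea on plain vectors. The inputs are made up, and `accumulate()` is used only to show the intermediate steps:

```r
library(purrr)

total <- reduce(1:4, `+`)      # ((1 + 2) + 3) + 4
total                          # 10

steps <- accumulate(1:4, `+`)  # keeps every intermediate result
steps                          # 1 3 6 10

# Elements present in all three vectors
common <- reduce(list(c(1, 2, 3), c(2, 3, 4), c(3, 4, 5)), intersect)
common                         # 3
```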

---
# Transforming tables
## Each lemur is its own group.

Task: compute each lemur's average weight from its 3 weighings.

```r
aye %>% 
  select(name, matches("weight")) %>% 
  mutate(avg_weight = (weight_1 + weight_2 + weight_3) / 3) %>% 
  head(5)
```

```
# A tibble: 5 × 5
  name           weight_1 weight_2 weight_3 avg_weight
  <chr>             <dbl>    <dbl>    <dbl>      <dbl>
1 Nosferatu          2860     2505     2930      2765 
2 Poe                2700     2610     2680      2663.
3 Samantha           2242     2360     2415      2339 
4 Annabel Lee         944     1180     1689      1271 
5 Mephistopheles     2760     2520     2620      2633.
```

---
# Transforming tables
## Each lemur is its own group.

Task: compute each lemur's average weight from its 3 weighings.

```r
aye %>% 
  select(name, matches("weight")) %>% 
* rowwise() %>%
  mutate(avg_weight = mean(c(weight_1, weight_2, weight_3), na.rm = TRUE)) %>% 
  head(5)
```

```
# A tibble: 5 × 5
# Rowwise: 
  name           weight_1 weight_2 weight_3 avg_weight
  <chr>             <dbl>    <dbl>    <dbl>      <dbl>
1 Nosferatu          2860     2505     2930      2765 
2 Poe                2700     2610     2680      2663.
3 Samantha           2242     2360     2415      2339 
4 Annabel Lee         944     1180     1689      1271 
5 Mephistopheles     2760     2520     2620      2633.
```

---
# Transforming tables
## Each lemur is its own group.

Task: compute each lemur's average weight from its 3 weighings.

```r
aye %>% 
  select(name, matches("weight")) %>% 
  rowwise() %>% 
  mutate(avg_weight = mean(c_across(where(is.numeric)), na.rm = TRUE)) %>% 
  head(5)
```

---
# Selecting columns
tidyselect helpers:

- `starts_with()`
- `ends_with()`
- `contains()`
- `matches()`
- `num_range()`

<br>

- `one_of(col_names_vec)`
- `where(is.numeric)`

Reordering columns:

```r
aye %>% 
  select(taxonomic_code, name, everything())
```
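
A sketch of several helpers on a hypothetical toy table `toy` that follows the same column naming scheme as `aye`:

```r
library(dplyr)

# Hypothetical toy table
toy <- tibble(
  name = c("Poe", "Ozma"),
  sex = c("M", "F"),
  weight_1 = c(2700, 2500),
  weight_2 = c(2610, 2440)
)

toy %>% select(starts_with("weight")) %>% names()      # "weight_1" "weight_2"
toy %>% select(ends_with("_2")) %>% names()            # "weight_2"
toy %>% select(num_range("weight_", 1:2)) %>% names()  # "weight_1" "weight_2"
toy %>% select(where(is.character)) %>% names()        # "name" "sex"
toy %>% select(sex, everything()) %>% names()          # "sex" "name" "weight_1" "weight_2"
```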

---
# Transforming all columns at once

```r
aye %>%  
* mutate(across(everything(), tolower)) %>%
  str()
```

```
spec_tbl_df [51 × 12] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
 $ animal_id     : chr [1:51] "6201" "6202" "6261" "6262" ...
 $ name          : chr [1:51] "nosferatu" "poe" "samantha" "annabel lee" ...
 $ taxonomic_code: chr [1:51] "dmad" "dmad" "dmad" "dmad" ...
 $ species       : chr [1:51] "aye-aye" "aye-aye" "aye-aye" "aye-aye" ...
 $ sex           : chr [1:51] "m" "m" "f" "f" ...
 $ birth_date    : chr [1:51] "1985-12-18" "1986-12-19" "1978-08-15" "1988-05-24" ...
 $ birth_type    : chr [1:51] "wild" "wild" "wild" "captive" ...
 $ litter_size   : chr [1:51] NA NA NA NA ...
 $ death_date    : chr [1:51] "2018-07-30" NA "1995-12-20" "1989-12-09" ...
 $ weight_1      : chr [1:51] "2860" "2700" "2242" "944" ...
 $ weight_2      : chr [1:51] "2505" "2610" "2360" "1180" ...
 $ weight_3      : chr [1:51] "2930" "2680" "2415" "1689" ...
 - attr(*, "spec")=
  .. cols(
  ..   animal_id = col_double(),
  ..   name = col_character(),
  ..   taxonomic_code = col_character(),
  ..   species = col_character(),
  ..   sex = col_character(),
  ..   birth_date = col_date(format = ""),
  ..   birth_type = col_character(),
  ..   litter_size = col_double(),
  ..   death_date = col_date(format = "")
  .. )
 - attr(*, "problems")=<externalptr> 
```

---
# Transforming several columns

```r
aye %>%  
* mutate(across(c("name", "taxonomic_code"), tolower)) %>%
  str()
```

```
spec_tbl_df [51 × 12] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
 $ animal_id     : num [1:51] 6201 6202 6261 6262 6451 ...
 $ name          : chr [1:51] "nosferatu" "poe" "samantha" "annabel lee" ...
 $ taxonomic_code: chr [1:51] "dmad" "dmad" "dmad" "dmad" ...
 $ species       : chr [1:51] "Aye-aye" "Aye-aye" "Aye-aye" "Aye-aye" ...
 $ sex           : chr [1:51] "M" "M" "F" "F" ...
 $ birth_date    : Date[1:51], format: "1985-12-18" "1986-12-19" ...
 $ birth_type    : chr [1:51] "wild" "wild" "wild" "captive" ...
 $ litter_size   : num [1:51] NA NA NA NA NA NA NA NA 1 1 ...
 $ death_date    : Date[1:51], format: "2018-07-30" NA ...
 $ weight_1      : num [1:51] 2860 2700 2242 944 2760 ...
 $ weight_2      : num [1:51] 2505 2610 2360 1180 2520 ...
 $ weight_3      : num [1:51] 2930 2680 2415 1689 2620 ...
 - attr(*, "spec")=
  .. cols(
  ..   animal_id = col_double(),
  ..   name = col_character(),
  ..   taxonomic_code = col_character(),
  ..   species = col_character(),
  ..   sex = col_character(),
  ..   birth_date = col_date(format = ""),
  ..   birth_type = col_character(),
  ..   litter_size = col_double(),
  ..   death_date = col_date(format = "")
  .. )
 - attr(*, "problems")=<externalptr> 
```

---
# Transforming several columns

```r
aye %>%  
  select(name, matches("weight")) %>% 
* mutate(across(where(is.numeric), log))
```

```
# A tibble: 51 × 4
   name           weight_1 weight_2 weight_3
   <chr>             <dbl>    <dbl>    <dbl>
 1 Nosferatu          7.96     7.83     7.98
 2 Poe                7.90     7.87     7.89
 3 Samantha           7.72     7.77     7.79
 4 Annabel Lee        6.85     7.07     7.43
 5 Mephistopheles     7.92     7.83     7.87
 6 Endora             7.86     7.77     7.88
 7 Ozma               7.82     7.80     7.87
 8 Morticia           7.90     7.84     7.72
 9 Blue Devil         7.19     7.51     7.81
10 Goblin             7.07     7.29     7.05
# … with 41 more rows
```

---
# Transforming several columns

```r
aye %>%  
  select(name, matches("weight")) %>% 
* mutate(across(where(is.numeric), ~ .x/1000))
```

```
# A tibble: 51 × 4
   name           weight_1 weight_2 weight_3
   <chr>             <dbl>    <dbl>    <dbl>
 1 Nosferatu         2.86      2.50     2.93
 2 Poe               2.7       2.61     2.68
 3 Samantha          2.24      2.36     2.42
 4 Annabel Lee       0.944     1.18     1.69
 5 Mephistopheles    2.76      2.52     2.62
 6 Endora            2.6       2.36     2.64
 7 Ozma              2.5       2.44     2.62
 8 Morticia          2.7       2.55     2.26
 9 Blue Devil        1.33      1.82     2.46
10 Goblin            1.18      1.46     1.15
# … with 41 more rows
```

---
# Transforming several columns

```r
aye %>%  
  select(name, matches("weight")) %>% 
* mutate(across(where(is.numeric), list(kg = ~ .x/1000)))
```

```
# A tibble: 51 × 7
   name           weight_1 weight_2 weight_3 weight_1_kg weight_2_kg weight_3_kg
   <chr>             <dbl>    <dbl>    <dbl>       <dbl>       <dbl>       <dbl>
 1 Nosferatu          2860     2505     2930       2.86         2.50        2.93
 2 Poe                2700     2610     2680       2.7          2.61        2.68
 3 Samantha           2242     2360     2415       2.24         2.36        2.42
 4 Annabel Lee         944     1180     1689       0.944        1.18        1.69
 5 Mephistopheles     2760     2520     2620       2.76         2.52        2.62
 6 Endora             2600     2360     2645       2.6          2.36        2.64
 7 Ozma               2500     2440     2620       2.5          2.44        2.62
 8 Morticia           2700     2550     2255       2.7          2.55        2.26
 9 Blue Devil         1330     1820     2460       1.33         1.82        2.46
10 Goblin             1180     1460     1150       1.18         1.46        1.15
# … with 41 more rows
```

---
# `summarise` across several columns

```r
aye %>%  
  select(name, matches("weight")) %>% 
* summarise(across(where(is.numeric), list(kg = ~ .x/1000), .names = "{.col} KG"))
```

```
# A tibble: 51 × 3
   `weight_1 KG` `weight_2 KG` `weight_3 KG`
           <dbl>         <dbl>         <dbl>
 1         2.86           2.50          2.93
 2         2.7            2.61          2.68
 3         2.24           2.36          2.42
 4         0.944          1.18          1.69
 5         2.76           2.52          2.62
 6         2.6            2.36          2.64
 7         2.5            2.44          2.62
 8         2.7            2.55          2.26
 9         1.33           1.82          2.46
10         1.18           1.46          1.15
# … with 41 more rows
```

---
# `summarise` across several columns with grouping

```r
aye %>%  
  select(sex, birth_type, matches("weight")) %>% 
* group_by(sex, birth_type) %>%
  summarise(across(where(is.numeric), mean, na.rm = TRUE))
```

```
# A tibble: 5 × 5
# Groups:   sex [3]
  sex   birth_type weight_1 weight_2 weight_3
  <chr> <chr>         <dbl>    <dbl>    <dbl>
1 F     captive       1604.    1490.    1722.
2 F     wild          2510.    2428.    2484.
3 M     captive       1893.    1589.    1410.
4 M     wild          2773.    2545     2743.
5 NA    captive        NaN      NaN      NaN 
```
---
# `summarise` across several columns with grouping... <svg aria-hidden="true" role="img" viewBox="0 0 576 512" style="height:1em;width:1.12em;vertical-align:-0.125em;margin-left:auto;margin-right:auto;font-size:inherit;fill:#b74f6f;overflow:visible;position:relative;"><path d="M280.37 148.26L96 300.11V464a16 16 0 0 0 16 16l112.06-.29a16 16 0 0 0 15.92-16V368a16 16 0 0 1 16-16h64a16 16 0 0 1 16 16v95.64a16 16 0 0 0 16 16.05L464 480a16 16 0 0 0 16-16V300L295.67 148.26a12.19 12.19 0 0 0-15.3 0zM571.6 251.47L488 182.56V44.05a12 12 0 0 0-12-12h-56a12 12 0 0 0-12 12v72.61L318.47 43a48 48 0 0 0-61 0L4.34 251.47a12 12 0 0 0-1.6 16.9l25.5 31A12 12 0 0 0 45.15 301l235.22-193.74a12.19 12.19 0 0 1 15.3 0L530.9 301a12 12 0 0 0 16.9-1.6l25.5-31a12 12 0 0 0-1.7-16.93z"/></svg>

```r
aye %>%  
  select(sex, birth_type, matches("weight")) %>% 
* group_by(sex, birth_type) %>%
  summarise(across(where(is.numeric), mean, na.rm = TRUE)) %>% 
  ungroup() %>% drop_na(sex) %>% rowwise() %>% 
  mutate(avg_weight = mean(c_across(where(is.numeric)), na.rm = TRUE))
```

```
# A tibble: 4 × 6
# Rowwise: 
  sex   birth_type weight_1 weight_2 weight_3 avg_weight
  <chr> <chr>         <dbl>    <dbl>    <dbl>      <dbl>
1 F     captive       1604.    1490.    1722.      1605.
2 F     wild          2510.    2428.    2484.      2474.
3 M     captive       1893.    1589.    1410.      1631.
4 M     wild          2773.    2545     2743.      2687.
```

---
# `filter` across several columns

```r
aye %>%  
  select(name, matches("weight")) %>% 
* filter(if_all(where(is.numeric), ~ .x > 2500))  # if_all() is the filter() counterpart of across()
```

```
# A tibble: 3 × 4
  name           weight_1 weight_2 weight_3
  <chr>             <dbl>    <dbl>    <dbl>
1 Nosferatu          2860     2505     2930
2 Poe                2700     2610     2680
3 Mephistopheles     2760     2520     2620
```

---
# Useful functions from tidyr <svg aria-hidden="true" role="img" viewBox="0 0 576 512" style="height:1em;width:1.12em;vertical-align:-0.125em;margin-left:auto;margin-right:auto;font-size:inherit;fill:#b74f6f;overflow:visible;position:relative;"><path d="M280.37 148.26L96 300.11V464a16 16 0 0 0 16 16l112.06-.29a16 16 0 0 0 15.92-16V368a16 16 0 0 1 16-16h64a16 16 0 0 1 16 16v95.64a16 16 0 0 0 16 16.05L464 480a16 16 0 0 0 16-16V300L295.67 148.26a12.19 12.19 0 0 0-15.3 0zM571.6 251.47L488 182.56V44.05a12 12 0 0 0-12-12h-56a12 12 0 0 0-12 12v72.61L318.47 43a48 48 0 0 0-61 0L4.34 251.47a12 12 0 0 0-1.6 16.9l25.5 31A12 12 0 0 0 45.15 301l235.22-193.74a12.19 12.19 0 0 1 15.3 0L530.9 301a12 12 0 0 0 16.9-1.6l25.5-31a12 12 0 0 0-1.7-16.93z"/></svg>

- `unite`

```r
aye %>%  
  select(animal_id:species) %>%
  unite(col = "name & ID", name, animal_id, sep = " & ")
```

```
# A tibble: 51 × 3
   `name & ID`           taxonomic_code species
   <chr>                 <chr>          <chr>  
 1 Nosferatu & 6201      DMAD           Aye-aye
 2 Poe & 6202            DMAD           Aye-aye
 3 Samantha & 6261       DMAD           Aye-aye
 4 Annabel Lee & 6262    DMAD           Aye-aye
 5 Mephistopheles & 6451 DMAD           Aye-aye
 6 Endora & 6452         DMAD           Aye-aye
 7 Ozma & 6453           DMAD           Aye-aye
 8 Morticia & 6454       DMAD           Aye-aye
 9 Blue Devil & 6480     DMAD           Aye-aye
10 Goblin & 6514         DMAD           Aye-aye
# … with 41 more rows
```

- `separate`

```r
aye %>%  
  select(animal_id:species) %>%
  separate(col = species, into = c("Aye 1", "Aye 2"), sep = "-")
```

```
# A tibble: 51 × 5
   animal_id name           taxonomic_code `Aye 1` `Aye 2`
       <dbl> <chr>          <chr>          <chr>   <chr>  
 1      6201 Nosferatu      DMAD           Aye     aye    
 2      6202 Poe            DMAD           Aye     aye    
 3      6261 Samantha       DMAD           Aye     aye    
 4      6262 Annabel Lee    DMAD           Aye     aye    
 5      6451 Mephistopheles DMAD           Aye     aye    
 6      6452 Endora         DMAD           Aye     aye    
 7      6453 Ozma           DMAD           Aye     aye    
 8      6454 Morticia       DMAD           Aye     aye    
 9      6480 Blue Devil     DMAD           Aye     aye    
10      6514 Goblin         DMAD           Aye     aye    
# … with 41 more rows
```

---

# Further reading on tidyr and advanced dplyr

- [tidyr cheatsheet](https://github.com/rstudio/cheatsheets/blob/master/tidyr.pdf)

- [dplyr cheatsheet](https://github.com/rstudio/cheatsheets/blob/master/data-transformation.pdf)

- [Tidy data Chapter in R4DS](https://r4ds.had.co.nz/tidy-data.html)

- `?across` and other help pages...