Asked by: 小点点

How can I apply a function across these columns without tedious pivoting?


I have a data frame (below) that I want to summarise by column.

sample <- tibble(Scenario = c("Aggressive","Aggressive","Conservative","Aggressive","Likely","Aggressive","Conservative","Likely","Likely","Aggressive","Conservative","Conservative"),
           `Jan 2022` = c(5.5,15,15.77,45.2,NA,NA,NA,NA,NA,NA,NA,NA),
           `Feb 2022` = c(NA,NA,NA,NA,20.5,11.1,14.4,55.5,NA,NA,NA,NA),
           `Mar 2022` = c(NA,NA,NA,NA,NA,NA,NA,NA,88.5,9.5,18.9,25.5))

The output should look like this:

# A tibble: 3 × 4
# Groups:   Scenario [3]
  Scenario     `Feb 2022` `Jan 2022` `Mar 2022`
  <chr>             <dbl>      <dbl>      <dbl>
1 Aggressive         11.1       65.7        9.5
2 Conservative       14.4       15.8       44.4
3 Likely             76          0         88.5

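Below is the code I used to get this output. As you can see, I use pivot_longer, then apply group_by and summarise to get the desired result, and then use pivot_wider to return it to the wide format.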

sample %>% 
  pivot_longer(cols = c(`Jan 2022`:`Mar 2022`), names_to = "Date", values_to = "Hours") %>% 
  group_by(Scenario, Date) %>% 
  summarise(Hours = sum(Hours, na.rm = T)) %>% 
  pivot_wider(names_from = Date, values_from = Hours)

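I'd like to find a more efficient way to do this without pivot_longer. I tried running the following code on the original data frame, but it clearly doesn't work as expected: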

sample %>%
  group_by(Scenario) %>%
  summarise(Hours = lapply(X = c(`Jan 2022`:`Mar 2022`), FUN = function(x){sum(x, na.rm = T)}))

Here are the warnings and errors I received:

 Error: Problem with `summarise()` column `Hours`.
ℹ `Hours = lapply(...)`.
x NA/NaN argument
ℹ The error occurred in group 1: Scenario = "Aggressive".
Run `rlang::last_error()` to see where the error occurred.
In addition: Warning messages:
1: In `Jan 2022`:`Mar 2022` :
  numerical expression has 5 elements: only the first used
2: In `Jan 2022`:`Mar 2022` :
  numerical expression has 5 elements: only the first used

I think there is a way to do this with an apply-family function, but I'm open to any suggestions. The fewer lines of code, the better.


3 Answers

Anonymous user

With the tidyverse, use across() to loop over the columns rather than lapply:

library(dplyr)
sample %>%
   group_by(Scenario) %>%
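   # a lambda is used because passing na.rm through across()'s `...` is deprecated since dplyr 1.1.0
   summarise(across(where(is.numeric), ~ sum(.x, na.rm = TRUE)), .groups = 'drop')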

Output:

# A tibble: 3 × 4
  Scenario     `Jan 2022` `Feb 2022` `Mar 2022`
  <chr>             <dbl>      <dbl>      <dbl>
1 Aggressive         65.7       11.1        9.5
2 Conservative       15.8       14.4       44.4
3 Likely              0         76         88.5
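
For what it's worth, the attempt in the question fails because inside summarise() the expression `Jan 2022`:`Mar 2022` is evaluated as base R's `:` operator on the column vectors themselves (hence the "only the first used" warnings), and for the Aggressive group the first `Mar 2022` value is NA. Inside across(), the same expression is instead read as a column range. A minimal sketch of that variant, assuming the three month columns sit next to each other as in the question's data (the name `totals` is just illustrative):

```r
library(dplyr)
library(tibble)

# Same data as in the question
sample <- tibble(
  Scenario = c("Aggressive", "Aggressive", "Conservative", "Aggressive",
               "Likely", "Aggressive", "Conservative", "Likely",
               "Likely", "Aggressive", "Conservative", "Conservative"),
  `Jan 2022` = c(5.5, 15, 15.77, 45.2, NA, NA, NA, NA, NA, NA, NA, NA),
  `Feb 2022` = c(NA, NA, NA, NA, 20.5, 11.1, 14.4, 55.5, NA, NA, NA, NA),
  `Mar 2022` = c(NA, NA, NA, NA, NA, NA, NA, NA, 88.5, 9.5, 18.9, 25.5)
)

totals <- sample %>%
  group_by(Scenario) %>%
  # Inside across(), `Jan 2022`:`Mar 2022` selects a range of columns by position
  summarise(across(`Jan 2022`:`Mar 2022`, ~ sum(.x, na.rm = TRUE)),
            .groups = "drop")

print(totals)
```

Selecting by range rather than where(is.numeric) may be preferable if the data frame later gains other numeric columns that should not be summed.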

Anonymous user

With data.table, you can do this:

data.table::setDT(sample)[, lapply(.SD, sum, na.rm=T), by=Scenario]

Output:

       Scenario Jan 2022 Feb 2022 Mar 2022
1:   Aggressive    65.70     11.1      9.5
2: Conservative    15.77     14.4     44.4
3:       Likely     0.00     76.0     88.5
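
One thing to keep in mind is that setDT() converts its argument to a data.table by reference, so `sample` itself is changed. If that is not wanted, a possible sketch is to work on a copy made with as.data.table(). The names `sample_df` and `totals` are made up for illustration, and a plain data.frame stands in for the question's tibble:

```r
library(data.table)

# Stand-in for the question's data; check.names = FALSE keeps "Jan 2022" etc. intact
sample_df <- data.frame(
  Scenario = c("Aggressive", "Aggressive", "Conservative", "Aggressive",
               "Likely", "Aggressive", "Conservative", "Likely",
               "Likely", "Aggressive", "Conservative", "Conservative"),
  `Jan 2022` = c(5.5, 15, 15.77, 45.2, NA, NA, NA, NA, NA, NA, NA, NA),
  `Feb 2022` = c(NA, NA, NA, NA, 20.5, 11.1, 14.4, 55.5, NA, NA, NA, NA),
  `Mar 2022` = c(NA, NA, NA, NA, NA, NA, NA, NA, 88.5, 9.5, 18.9, 25.5),
  check.names = FALSE
)

# as.data.table() returns a copy, so sample_df stays an ordinary data.frame
totals <- as.data.table(sample_df)[, lapply(.SD, sum, na.rm = TRUE), by = Scenario]

print(totals)
print(class(sample_df))
```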

Anonymous user

An additional solution option

data.table

library(data.table)

# `sample` is the data frame from the question; only the numeric columns are summed
setDT(sample)[, lapply(.SD, sum, na.rm = TRUE), by = Scenario, .SDcols = is.numeric]

       Scenario Jan 2022 Feb 2022 Mar 2022
1:   Aggressive    65.70     11.1      9.5
2: Conservative    15.77     14.4     44.4
3:       Likely     0.00     76.0     88.5