Change and save only one row in a file

Asked by: 小点点

I would like to know whether there is something in R that lets me update a file rather than saving all of the data again.

Maybe something similar to sqldf::read.csv.sql, but for saving.

For example:

Suppose I have the iris data stored as a .csv file:

  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1          5.1         3.5          1.4         0.2  setosa
2          4.9         3.0          1.4         0.2  setosa
3          4.7         3.2          1.3         0.2  setosa

But then I realize that the second flower is actually virginica, so I want to change the second row to:

2          4.9         3.0          1.4         0.2  virginica

I know I can read the file, change the species, and save it again, but the more rows there are (i.e.

1 answer

Anonymous user

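In general, R is not really designed for in-place file editing, and I know of no (currently available) tools that support it in any context. Even unix-y tools like sed, which can make quick edits, technically still do not edit "in place" (even though they hide how the edit is done). (There may be some tools that do, but they are probably not as accessible as you would like.)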
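To make the read-modify-write cycle concrete, here is a minimal base-R sketch. It assumes a small throwaway CSV (the path `csv_path` and the three-row sample are made up for illustration) whose first line is the header, so data row 2 is line 3 of the file. Only one line changes, yet `writeLines()` still writes every line back out.

```r
# Hypothetical example file: the header plus the first three iris rows.
csv_path <- tempfile(fileext = ".csv")
write.csv(head(iris, 3), csv_path, row.names = FALSE, quote = FALSE)

# Read every line, patch one of them, and write everything back.
lines <- readLines(csv_path)
target <- 3L  # line 1 is the header, so data row 2 is line 3
lines[target] <- sub("setosa$", "virginica", lines[target])
writeLines(lines, csv_path)  # rewrites the entire file, not just one line

patched <- read.csv(csv_path)
```

For a file of a few rows this is harmless; the cost grows with the file because all of it is read and written for a one-line change.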

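There is one notable exception: a file format that was designed for in-place editing (well, interaction). It includes the important in-place add, filter, replace, and delete operations, and in most cases it does not need to grow the file while doing so. That format is SQLite.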

For example:

library(DBI)
# library(RSQLite) # don't need to load it, just need to have it available
fname <- "./iris.sqlite3"
con <- dbConnect(RSQLite::SQLite(), fname)
file.info(fname)$size
# [1] 0
dbWriteTable(con, "iris", iris)
# [1] TRUE
file.info(fname)$size
# [1] 16384
dbGetQuery(con, "select * from iris where [Sepal.Length]=4.7 and [Sepal.Width]=3.2 and [Petal.Length]=1.6")
#   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
# 1          4.7         3.2          1.6         0.2  setosa
file.info(fname)$size
# [1] 16384
dbExecute(con, "update iris set [Species]='virginica' where [Sepal.Length]=4.7 and [Sepal.Width]=3.2 and [Petal.Length]=1.6")
# [1] 1
dbGetQuery(con, "select * from iris where [Sepal.Length]=4.7 and [Sepal.Width]=3.2 and [Petal.Length]=1.6")
#   Sepal.Length Sepal.Width Petal.Length Petal.Width   Species
# 1          4.7         3.2          1.6         0.2 virginica
dbDisconnect(con)
file.info(fname)$size
# [1] 16384
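The example above picks the row by its measurements. The question asks about "the second row", so here is a sketch that addresses a row by position instead. It assumes an in-memory database and relies on SQLite's implicit `rowid` column, which numbers rows in insertion order for a freshly written table; the variable names `n_changed` and `row2` are illustrative only.

```r
library(DBI)

# Assumption: an in-memory database, so the sketch leaves no file behind.
con <- dbConnect(RSQLite::SQLite(), ":memory:")
dbWriteTable(con, "iris", iris)

# SQLite gives ordinary tables an implicit integer "rowid"; rows written by
# dbWriteTable() into a new table are numbered in insertion order.
n_changed <- dbExecute(
  con,
  "update iris set Species = ? where rowid = ?",
  params = list("virginica", 2L)
)

# Fetch the updated row to confirm the change.
row2 <- dbGetQuery(con, "select rowid, * from iris where rowid = 2")
dbDisconnect(con)
```

Passing the values through `params` rather than pasting them into the SQL string avoids quoting problems. If rows are later deleted and re-inserted, `rowid` no longer matches the original order, so an explicit key column is the safer choice for real data.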

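It is cross-platform. It is an integral internal component of the Firefox browser and the Android operating system (and many others), and it is far more widespread than most people realize.
Drivers exist for most programming languages, including R, Python, and Ruby, and many more than can be listed here.
There is effectively no practical limit on how much data a single SQLite file can hold. In theory it supports up to 140TB (https://www.sqlite.org/whentouse.html), though if you are working at that scale there are many (reasonable) arguments for a different solution.
Pulling data is built on the SQL standard; it is not 100% compliant, but it is very close. Query time and performance depend on the size of your query, but it is generally quite fast (see: "Does SQLite performance degrade if the database size is greater than 2GB?").
In fact, it can be faster than individual file operations.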
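As an illustration of pulling data with SQL, the sketch below reads only the rows that match a filter instead of loading the whole table. The in-memory database and the filter values are assumptions chosen for the example.

```r
library(DBI)

# Assumption: an in-memory copy of iris, used only for this illustration.
con <- dbConnect(RSQLite::SQLite(), ":memory:")
dbWriteTable(con, "iris", iris)

# Column names containing a dot must be quoted in SQL.
long_setosa <- dbGetQuery(
  con,
  'select * from iris where Species = ? and "Sepal.Length" > ?',
  params = list("setosa", 5.5)
)
dbDisconnect(con)
```

The filtering happens inside SQLite, so R only receives the matching rows.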

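The file size will carry some "overhead". Notably, iris occupies under 7K in memory (see object.size(iris)), but the file starts at 16K. With larger data, the ratio of file size to actual data shrinks. (I did the same with ggplot2::diamonds: the object is 3456376 bytes and the file is 3780608 bytes, less than 10% larger.)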
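The comparison can be reproduced with a short sketch like the one below. It assumes a temporary file, and the names `mem_bytes`, `file_bytes`, and `overhead_ratio` are made up for illustration; the exact numbers may vary with the R and SQLite versions, so none are shown here.

```r
library(DBI)

# Write iris to a temporary SQLite file (hypothetical location).
fname <- tempfile(fileext = ".sqlite3")
con <- dbConnect(RSQLite::SQLite(), fname)
dbWriteTable(con, "iris", iris)
dbDisconnect(con)

# Compare the in-memory size of the data frame with the size on disk.
mem_bytes  <- as.numeric(object.size(iris))
file_bytes <- file.info(fname)$size
overhead_ratio <- file_bytes / mem_bytes

unlink(fname)  # clean up the temporary file
```

Swapping `iris` for a larger data frame should show the ratio moving closer to 1.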

If you need to import all of the data, you can treat the database like a file on import:

con <- dbConnect(RSQLite::SQLite(), fname)
iris2 <- dbGetQuery(con, "select * from iris")
dbDisconnect(con)

compared with

iris2 <- read.csv("iris.csv", stringsAsFactors = FALSE)

If you want to get fancy:

import_sqlite <- function(fname, tablename = NA) {
  if (length(tablename) > 1L) {
    warning("'tablename' has length > 1 and only the first element will be used")
    tablename <- tablename[[1L]]
  }
  con <- DBI::dbConnect(RSQLite::SQLite(), fname)
  on.exit(DBI::dbDisconnect(con), add = TRUE)
  available_tables <- DBI::dbListTables(con)
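  if (length(available_tables) == 0L) {
    stop("no tables found")
  } else if (is.na(tablename)) {
    if (length(available_tables) == 1L) {
      tablename <- available_tables
    } else {
      # ambiguous: several tables exist and none was requested
      stop("multiple tables found, please specify 'tablename'")
    }
  }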
  if (tablename %in% available_tables) {
    tablename <- DBI::dbQuoteIdentifier(con, tablename)
    qry <- sprintf("select * from %s", tablename)
    out <- tryCatch(list(data = DBI::dbGetQuery(con, DBI::SQL(qry)),
                         err = NULL),
                    error = function(e) list(data = NULL, err = e))
    if (! is.null(out$err)) {
      stop("[sqlite error] ", out$err$message)
    } else {
      return(out$data)
    }    
  } else {
    stop(sprintf("table %s not found", DBI::dbQuoteIdentifier(con, tablename)))
  }
}
head(import_sqlite("iris.sqlite3"))
#   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
# 1          5.1         3.5          1.4         0.2  setosa
# 2          4.9         3.0          1.4         0.2  setosa
# 3          4.7         3.2          1.3         0.2  setosa
# 4          4.6         3.1          1.5         0.2  setosa
# 5          5.0         3.6          1.4         0.2  setosa
# 6          5.4         3.9          1.7         0.4  setosa

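(The function I provided is just a proof of concept showing that you can interact with a single file much as you would with a CSV file. It has some safeguards, but it is really just a hack for this question.)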
