How does the Databricks Delta Lake `mergeSchema` option handle differing data types?

Asked by: 小点点

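What does the Databricks Delta Lake `mergeSchema` option do when appended data gives a pre-existing column a different data type?

For example, given a Delta Lake table with schema foo INT, bar INT, what happens when you try to append new data with schema foo INT, bar DOUBLE while specifying the option mergeSchema = true?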

2 answers

Anonymous user

The write fails. (As of Delta Lake 0.5.0 on Databricks 6.3.)
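To make the failure concrete, here is a minimal sketch that reproduces the scenario from the question and then shows one possible workaround. It assumes a Spark session with Delta Lake available (as in a Databricks notebook); the path `/tmp/delta/merge_schema_demo` and the sample rows are hypothetical, and the workaround (rewriting the table with `overwriteSchema`) is a suggestion of mine, not something the answer above proposes.

```scala
import org.apache.spark.sql.{AnalysisException, SparkSession}
import org.apache.spark.sql.functions.col

// In a Databricks notebook `spark` already exists; getOrCreate() returns it.
val spark = SparkSession.builder().getOrCreate()
import spark.implicits._

// Hypothetical location, used only for this demonstration.
val path = "/tmp/delta/merge_schema_demo"

// Existing table with schema (foo INT, bar INT).
Seq((1, 2)).toDF("foo", "bar")
  .write.format("delta").mode("overwrite").option("overwriteSchema", "true").save(path)

// Incoming batch with schema (foo INT, bar DOUBLE).
val incoming = Seq((3, 4.5)).toDF("foo", "bar")

// mergeSchema can add new columns, but INT and DOUBLE are not treated as
// mergeable types, so this append is expected to be rejected.
try {
  incoming.write.format("delta").mode("append").option("mergeSchema", "true").save(path)
} catch {
  case e: AnalysisException => println(s"Append rejected: ${e.getMessage}")
}

// One possible workaround: rewrite the table with `bar` as DOUBLE, then append.
spark.read.format("delta").load(path)
  .withColumn("bar", col("bar").cast("double"))
  .write.format("delta").mode("overwrite").option("overwriteSchema", "true").save(path)

incoming.write.format("delta").mode("append").save(path)
```

Rewriting the whole table can be expensive for large tables. The alternative is to cast the incoming `bar` column to INT before appending, which keeps the table schema unchanged but discards any fractional part.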

Anonymous user

I think this is what you are looking for.

import org.apache.spark.sql.types.{StructType, StructField, StringType}
import org.apache.spark.sql.functions.input_file_name

val customSchema = StructType(Array(
    StructField("field1", StringType, true),
    StructField("field2", StringType, true),
    StructField("field3", StringType, true),
    StructField("field4", StringType, true),
    StructField("field5", StringType, true),
    StructField("field6", StringType, true),
    StructField("field7", StringType, true)))

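// `spark` is the SparkSession predefined in Databricks notebooks and spark-shell.
// The built-in "csv" source replaces the older "com.databricks.spark.csv" package.
val df = spark.read
    .format("csv")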
    .option("header", "false")
    .option("sep", "|")
    .schema(customSchema)
    .load("mnt/rawdata/corp/ABC*.gz")
    .withColumn("file_name", input_file_name())

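Just rename "field1", "field2", and so on to your actual field names. The "ABC*.gz" part is a wildcard pattern: it matches files whose names begin with a specific string ("ABC" here, or whatever yours is), followed by any combination of characters (the "*"), and ending in ".gz", which indicates a compressed file. Your files may well be named differently, so change the pattern to suit your needs.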
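To see which names a pattern such as "ABC*.gz" selects before pointing Spark at a directory, the sketch below uses the JDK's glob matcher. This is only an illustration: Spark resolves load paths with Hadoop's glob implementation, which I am assuming behaves the same way for a simple "*" pattern like this one, and the file names are hypothetical.

```scala
import java.nio.file.{FileSystems, Paths}

// JDK glob matcher, used here only to illustrate the pattern; Spark itself
// expands load paths with Hadoop's glob support.
val matcher = FileSystems.getDefault.getPathMatcher("glob:ABC*.gz")

// Hypothetical file names, for illustration only.
val names = Seq("ABC_2020_01.gz", "ABC.gz", "XYZ_2020_01.gz", "ABC_2020_01.csv")

// Keep only the names that start with "ABC" and end with ".gz".
val matched = names.filter(n => matcher.matches(Paths.get(n)))
matched.foreach(println)
```

Note that "*" also matches zero characters, so a file named exactly "ABC.gz" is selected as well.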
