Asked by: 小点点

Why is the error "Unable to find encoder for type stored in a Dataset" when encoding JSON using case classes?


I wrote a Spark job:

object SimpleApp {
  def main(args: Array[String]) {
    val conf = new SparkConf().setAppName("Simple Application").setMaster("local")
    val sc = new SparkContext(conf)
    val ctx = new org.apache.spark.sql.SQLContext(sc)
    import ctx.implicits._

    case class Person(age: Long, city: String, id: String, lname: String, name: String, sex: String)
    case class Person2(name: String, age: Long, city: String)

    val persons = ctx.read.json("/tmp/persons.json").as[Person]
    persons.printSchema()
  }
}

When I run the main function in the IDE, I get 2 errors:

Error:(15, 67) Unable to find encoder for type stored in a Dataset.  Primitive types (Int, String, etc) and Product types (case classes) are supported by importing sqlContext.implicits._  Support for serializing other types will be added in future releases.
    val persons = ctx.read.json("/tmp/persons.json").as[Person]
                                                                  ^

Error:(15, 67) not enough arguments for method as: (implicit evidence$1: org.apache.spark.sql.Encoder[Person])org.apache.spark.sql.Dataset[Person].
Unspecified value parameter evidence$1.
    val persons = ctx.read.json("/tmp/persons.json").as[Person]
                                                                  ^

But in the Spark Shell I can run this job without any error. What is the problem?


3 answers

Anonymous user

The error message says that no Encoder could be found for the Person case class.

Error:(15, 67) Unable to find encoder for type stored in a Dataset.  Primitive types (Int, String, etc) and Product types (case classes) are supported by importing sqlContext.implicits._  Support for serializing other types will be added in future releases.

Move the declaration of the case class outside the scope of SimpleApp.
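A minimal sketch of that change, assuming the same Spark 1.6-style SQLContext API and the same /tmp/persons.json path as in the question. The likely reason it helps is that the implicit encoder for a case class needs a TypeTag, which the compiler cannot supply for a class declared locally inside a method such as main; that explanation is an assumption, not something the error message states.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

// Declared at the top level, outside SimpleApp and outside main,
// so that an implicit Encoder[Person] can be resolved at the call to as[Person].
case class Person(age: Long, city: String, id: String, lname: String, name: String, sex: String)

object SimpleApp {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("Simple Application").setMaster("local")
    val sc = new SparkContext(conf)
    val ctx = new SQLContext(sc)
    import ctx.implicits._ // brings the implicit encoders into scope

    // Same call as in the question; only the location of Person has changed.
    val persons = ctx.read.json("/tmp/persons.json").as[Person]
    persons.printSchema()

    sc.stop()
  }
}
```

The Spark Shell probably does not hit this problem because classes typed at the prompt are not local to a method.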

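Anonymous user

You get the same error if you import both sqlContext.implicits._ and spark.implicits._ in SimpleApp (the order does not matter).

Removing one or the other is the solution: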

import org.apache.spark.sql.SparkSession

val spark = SparkSession
  .builder()
  .getOrCreate()

val sqlContext = spark.sqlContext
import sqlContext.implicits._ //sqlContext OR spark implicits
//import spark.implicits._ //sqlContext OR spark implicits

case class Person(age: Long, city: String)
// Read through the session defined above (the original snippet used an undefined ctx)
val persons = spark.read.json("/tmp/persons.json").as[Person]

Tested with Spark 2.1.0

Interestingly, if you import the implicits of the same object twice, you will not have any problem.

Anonymous user

@Milad Khajavi

Define the Person case class outside of the SimpleApp object. Also, add import sqlContext.implicits._ inside the main() function.
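On Spark 2.x the same two steps can be written against SparkSession instead of SQLContext. The following is only a sketch under that assumption: SimpleApp2 is a hypothetical object name, Person2 is the smaller case class from the question, and the input path is the one used there.

```scala
import org.apache.spark.sql.{Dataset, SparkSession}

// Top-level case class, mirroring Person2 from the question.
case class Person2(name: String, age: Long, city: String)

// SimpleApp2 is a hypothetical name used for this illustration only.
object SimpleApp2 {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("Simple Application")
      .master("local")
      .getOrCreate()
    import spark.implicits._ // the only implicits import in scope

    // Columns are matched to the case class fields by name.
    val persons: Dataset[Person2] = spark.read.json("/tmp/persons.json").as[Person2]
    persons.printSchema()

    spark.stop()
  }
}
```

Only one implicits import is used here, which also avoids the clash between sqlContext.implicits._ and spark.implicits._ described in the previous answer.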