"为什么我的程序抛Null Point Exception"

"why NPE?"

Posted by jeanlyn on May 15, 2016

Background

A few days ago a colleague sent me a piece of Spark business code and asked, "Why does this code throw a NullPointerException?" After digging into it I found the answer interesting enough to share.

Code and Analysis

The code looked roughly like this:

List<scala.collection.immutable.HashMap> retList = row.getList(0);
if (null == retList || retList.isEmpty()) {
    ....
}

My colleague said the exception came from the retList.isEmpty() call.

So I took a look at the full stack trace:

java.lang.NullPointerException
	at scala.collection.convert.Wrappers$IterableWrapperTrait$class.isEmpty(Wrappers.scala:25)
	at scala.collection.convert.Wrappers$SeqWrapper.isEmpty(Wrappers.scala:64)
	at com.gf.spark.module.CustomerLoginProcess$2.call(CustomerLoginProcess.java:77)
	at com.gf.spark.module.CustomerLoginProcess$2.call(CustomerLoginProcess.java:70)
	at org.apache.spark.api.java.JavaPairRDD$$anonfun$toScalaFunction$1.apply(JavaPairRDD.scala:1027)
	at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
	at scala.collection.Iterator$$anon$14.hasNext(Iterator.scala:389)
	at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
	at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1$$anonfun$13$$anonfun$apply$6.apply$mcV$sp(PairRDDFunctions.scala:1108)
	at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1$$anonfun$13$$anonfun$apply$6.apply(PairRDDFunctions.scala:1108)
	at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1$$anonfun$13$$anonfun$apply$6.apply(PairRDDFunctions.scala:1108)
	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1285)
	at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1$$anonfun$13.apply(PairRDDFunctions.scala:1116)
	at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1$$anonfun$13.apply(PairRDDFunctions.scala:1095)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:63)
	at org.apache.spark.scheduler.Task.run(Task.scala:70)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:745)

So the exception is really thrown by isEmpty inside the Scala collection library. That made me want to see what Spark's Row.getList actually does (spark-1.5):

def getList[T](i: Int): java.util.List[T] = {
  scala.collection.JavaConversions.seqAsJavaList(getSeq[T](i))
}
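
The interesting part is seqAsJavaList: it wraps whatever Seq it is handed, a null one included, which is exactly what the stack trace above shows. Here is a quick sanity check, as a minimal sketch in plain Java (it assumes the Scala 2.10.x shipped with spark-1.5, whose JavaConversions methods are callable as static methods from Java):

import java.util.List;

import scala.collection.JavaConversions;
import scala.collection.Seq;

public class SeqWrapperCheck {
    public static void main(String[] args) {
        // Wrap a null Seq, which is exactly what Row.getList ends up doing
        // when the underlying column value is null.
        List<Object> wrapped = JavaConversions.seqAsJavaList((Seq<Object>) null);

        System.out.println(wrapped == null);  // prints false: we got a wrapper, not null
        wrapped.isEmpty();                    // throws NullPointerException
    }
}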

In other words, getList delegates to JavaConversions.seqAsJavaList, which returns new SeqWrapper(seq). When getSeq[T](i) comes back null, getList therefore does not return null; it returns SeqWrapper(null), and calling isEmpty on that wrapper is effectively null.isEmpty, so the NullPointerException is hardly surprising. The correct way to write the original code is:

if (!row.isNullAt(0)) {
    List<scala.collection.immutable.HashMap> retList = row.getList(0);
    ....
} else {
    .....
}
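
For completeness, here is a minimal end-to-end reproduction of the problem and the fix. It is only a sketch: RowFactory is used to fabricate a single Row whose first column is null, standing in for the real business data, and it assumes the spark-1.5 behaviour analysed above.

import java.util.List;

import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;

public class GetListNpeDemo {
    public static void main(String[] args) {
        // A one-column Row whose value is null, standing in for the real data.
        Row row = RowFactory.create(new Object[]{null});

        // getList(0) wraps the null Seq instead of returning null,
        // so the original "null == retList" guard never fires.
        List<Object> retList = row.getList(0);
        System.out.println("retList == null ? " + (retList == null));  // false

        // The guard has to look at the Row itself instead.
        if (!row.isNullAt(0)) {
            System.out.println("column 0 has " + retList.size() + " elements");
        } else {
            System.out.println("column 0 is null, nothing to do");
        }

        // Calling isEmpty() on the wrapper reproduces the original exception.
        retList.isEmpty();  // java.lang.NullPointerException
    }
}

If this pattern shows up in many places, it may be worth pulling the check into a small helper, e.g. a hypothetical getListOrEmpty(row, i) that returns Collections.emptyList() when row.isNullAt(i) is true.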