Saturday 15 August 2015

scala - Apache Spark: dealing with Option/Some/None in RDDs


I am mapping over an HBase table, generating one RDD element per HBase row. Sometimes a row contains bad data (which throws a NullPointerException in the parsing code), and in that case I just want to skip it.

I have my initial mapper return an Option to indicate that it returns 0 or 1 elements, then filter for Some, then get the contained value:

  import org.apache.hadoop.hbase.client.Result
  import org.apache.hadoop.hbase.util.Bytes

  // myRDD: RDD[(ImmutableBytesWritable, Result)]
  val output = myRDD
    .map(tuple => getData(tuple._2))
    .filter { case Some(_) => true; case None => false }
    .map(_.get)

  // ... more RDD operations with the output

  def getData(r: Result) = {
    val key = r.getRow
    var id = "(unk)"
    var x = -1L

    try {
      id = Bytes.toString(key, 0, 11)
      x = Long.MaxValue - Bytes.toLong(key, 11)
      // ... more code that can throw exceptions

      Some((id, (List(x) /* , more stuff ... */)))
    } catch {
      case e: NullPointerException =>
        // logWarning comes from Spark's Logging trait mixed into the enclosing class
        logWarning("Skipping id=" + id + ", x=" + x + "; \n" + e)
        None
    }
  }

Is there a more idiomatic way to do this? It feels quite messy to me, both in getData() and in the map.filter.map dance.

Maybe a flatMap could work (generating 0 or 1 items in a Seq), but I don't want it to flatten the tuples I'm creating in the map function, just to eliminate the empty results.
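On the worry about flattening: flatMap treats an Option as a collection of zero or one elements, so it drops each None and unwraps each Some without touching the tuple inside it. The sketch below checks this on a plain Scala Seq standing in for the RDD; parseRow is a hypothetical stand-in for getData that returns an Option, and the "id:number" input format is an assumption. It also shows collect with a partial function, which folds the filter / map(_.get) pair into a single step.

```scala
// Hypothetical stand-in for getData: Some(tuple) for a good row, None for a bad one.
def parseRow(row: String): Option[(String, List[Long])] =
  row.split(":") match {
    case Array(id, n) if n.nonEmpty && n.forall(_.isDigit) => Some((id, List(n.toLong)))
    case _                                                  => None
  }

val rows = Seq("a:1", "garbage", "b:2")

// flatMap sees each Option as 0 or 1 elements: None contributes nothing,
// Some(t) contributes t itself. The tuple t is kept whole, not flattened.
val viaFlatMap: Seq[(String, List[Long])] = rows.flatMap(row => parseRow(row))

// Same result as map / filter / map(_.get), expressed as one collect.
val viaCollect: Seq[(String, List[Long])] = rows.map(parseRow).collect { case Some(t) => t }

println(viaFlatMap) // List((a,List(1)), (b,List(2)))
```

RDD also has flatMap and a collect(PartialFunction) overload, so both calls should carry over to myRDD. Note that the no-argument collect() is a different method: an action that returns every element to the driver.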

If you change your getData to return a scala.util.Try, you can simplify your transformations considerably. Something like this could work:

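  import scala.util.Try
  import org.apache.hadoop.hbase.client.Result
  import org.apache.hadoop.hbase.util.Bytes

  def getData(r: Result) = {
    val key = r.getRow
    var id = "(unk)"
    var x = -1L

    val tr = Try {
      id = Bytes.toString(key, 0, 11)
      x = Long.MaxValue - Bytes.toLong(key, 11)
      // ... more code that can throw exceptions

      (id, (List(x) /* , more stuff ... */))
    }

    // Log only when the Try failed; id and x hold whatever was set before the exception
    tr.failed.foreach(e => logWarning("Skipping id=" + id + ", x=" + x + "; \n" + e))

    tr
  }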
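A small way to see how the Try version behaves: Try wraps the body, failed turns a Failure(e) into a Success(e) so the foreach logging runs only on failure, and the vars assigned before the exception are still visible in the message. This sketch assumes string input of the form "id:number" instead of an HBase Result, and uses println in place of logWarning so it runs on its own; getData here is a hypothetical stand-in.

```scala
import scala.util.Try

// Hypothetical stand-in for getData: parses "id:number" instead of an HBase Result.
def getData(row: String): Try[(String, List[Long])] = {
  var id = "(unk)"
  var x = -1L
  val tr = Try {
    id = row.takeWhile(_ != ':')
    // toLong throws NumberFormatException when the part after ':' is not a number
    x = Long.MaxValue - row.dropWhile(_ != ':').drop(1).toLong
    (id, List(x))
  }
  // failed is Success(e) only for a Failure, so this println runs only on bad rows;
  // id already holds the value assigned before the exception.
  tr.failed.foreach(e => println("Skipping id=" + id + ", x=" + x + "; " + e))
  tr
}

val ok  = getData("a:1")    // Success(("a", List(Long.MaxValue - 1)))
val bad = getData("b:oops") // Failure(NumberFormatException), logged once
```

One behavioral difference worth checking: Try catches every non-fatal exception (here a NumberFormatException), whereas the original try/catch only skipped NullPointerException and let any other exception fail the task.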

Then your transformation could start like this:

  myRDD.flatMap(tuple => getData(tuple._2).toOption)

If your Try is a Failure, toOption turns it into a None, which is then removed as part of the flatMap logic. From that point on, the next step in your transformation works only with the successful cases, i.e. with whatever underlying type getData returns, without any wrapping (no Option).
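The effect of toOption inside flatMap can be checked without Spark. In this sketch a Seq stands in for the RDD, and getData is a hypothetical parser returning Try[(String, Long)]; the failed row disappears, and the following steps work on plain tuples.

```scala
import scala.util.Try

// Hypothetical parser standing in for getData: Success for numeric input, Failure otherwise.
def getData(raw: String): Try[(String, Long)] = Try((raw, raw.toLong))

val rows = Seq("1", "not-a-number", "3")

// toOption maps Failure to None and Success(v) to Some(v); flatMap then drops the Nones.
val parsed: Seq[(String, Long)] = rows.flatMap(row => getData(row).toOption)

// Downstream steps see plain (String, Long) tuples, with no Try or Option wrapper left.
val total: Long = parsed.map(_._2).sum

println(parsed) // List((1,1), (3,3))
println(total)  // 4
```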
