Wednesday, 15 June 2011

hadoop - Weird handling of nulls by Hive distinct on multiple columns?


This query:

  select count(distinct field1, field2, field3, field4) from SOME_TABLE

returns a (much smaller) number than this query:

  select count(distinct coalesce(field1, "null"), coalesce(field2, "null"), coalesce(field3, "null"), coalesce(field4, "null")) from SOME_TABLE

I would expect the results to be the same. Why do they differ?

The results look consistent to me:

  • COUNT(DISTINCT ...) only counts a row when all of the specified fields are non-null.
      • This is documented, but the wording is slightly ambiguous when there is more than one field.
  • When you COALESCE your fieldN values, you are saying "when fieldN is null, give me the string 'null', otherwise give me fieldN".

I was recently trying to validate my beliefs on this too, and could not find it stated explicitly anywhere. But say you have a table like this (where null below is an actual null, not the string "null"):

  -------------------
  | field1 | field2 |
  -------------------
  | foo    | null   |
  | null   | foo    |
  -------------------

then:

  • count(field1) = 1
  • count(distinct field1) = 1
  • count(distinct coalesce(field1, 'bar')) = 2 ('foo' and 'bar' are distinct)
  • count(distinct field1, field2) = 0 (there is no combination where both are non-null)
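
For what it's worth, the rule can be modelled in a few lines of Python. This is only a sketch of the behaviour described above, not something that runs against Hive: the helper names count_distinct and coalesce are made up for illustration, and the two sample rows are an assumed table with one null in each column.

```python
def coalesce(value, default):
    """Model of COALESCE(value, default): return default only when value is NULL (None)."""
    return default if value is None else value


def count_distinct(rows, *columns):
    """Model of COUNT(DISTINCT c1, c2, ...).

    A row contributes only if every listed column is non-NULL,
    which is the behaviour described in the answer above.
    """
    seen = set()
    for row in rows:
        values = tuple(row[c] for c in columns)
        if any(v is None for v in values):
            continue  # any NULL in the tuple drops the whole row
        seen.add(values)
    return len(seen)


# Assumed sample table: None stands for a real NULL, not the string "null".
rows = [
    {"field1": "foo", "field2": None},
    {"field1": None, "field2": "foo"},
]

# count(distinct field1)
distinct_field1 = count_distinct(rows, "field1")  # 1

# count(distinct coalesce(field1, 'bar'))
with_bar = [{**r, "field1": coalesce(r["field1"], "bar")} for r in rows]
distinct_coalesced_field1 = count_distinct(with_bar, "field1")  # 2

# count(distinct field1, field2)
distinct_both = count_distinct(rows, "field1", "field2")  # 0

# count(distinct coalesce(field1, 'null'), coalesce(field2, 'null'))
all_coalesced = [{k: coalesce(v, "null") for k, v in r.items()} for r in rows]
distinct_both_coalesced = count_distinct(all_coalesced, "field1", "field2")  # 2
```

The last two values mirror the two queries in the question: without COALESCE every row that has a null in any listed field is dropped, so the count can only be smaller or equal.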
