Thursday, 15 August 2013

hadoop - What determines how many times map() will get called?


I have a text file and a parser that parses each row and stores it in my custom InputSplit. I parse my custom file in the InputFormat phase, and my splitting is custom. Right now I have 2 splits, and each split holds a list of my data.

But right now, my map() function is being called repeatedly on the same split. I thought map() would only be called once per split?

I do not know whether this is relevant, but my custom InputSplit returns a fixed number from getLength() and an empty string array from getLocations(). I am unsure what I should return from them.

    @Override
    public RecordReader<LongWritable, ArrayWritable> createRecordReader(InputSplit input, TaskAttemptContext taskContext)
            throws IOException, InterruptedException {
        logger.info(">>> Creating record reader");
        // Cast the generic split to the custom split type before handing it to the reader.
        CustomRecordReader recordReader = new CustomRecordReader((CustomInputSplit) input);
        return recordReader;
    }

map() is called once for every record returned by the RecordReader in (or referenced by) your InputFormat. For example, TextInputFormat calls map() for every line in the input, even though there are usually many lines in a split.
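To make the per-record behaviour concrete, here is a minimal sketch in plain Java with no Hadoop dependency. It imitates the loop a map task runs over its reader: as long as nextKeyValue() returns true, map() is called with the current value. The class and method names (MapCallCount, ListRecordReader, runMapTask) are hypothetical and only stand in for the real RecordReader and Mapper, so treat this as an illustration rather than Hadoop's actual code.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class MapCallCount {

    /** Hypothetical stand-in for a RecordReader: hands out one record at a time from a split. */
    static class ListRecordReader {
        private final Iterator<String> records;
        private String current;

        ListRecordReader(List<String> splitRecords) {
            this.records = splitRecords.iterator();
        }

        /** Advances to the next record; returns false once the split is exhausted. */
        boolean nextKeyValue() {
            if (!records.hasNext()) {
                return false;
            }
            current = records.next();
            return true;
        }

        String getCurrentValue() {
            return current;
        }
    }

    /** Placeholder for user map logic; it does nothing in this sketch. */
    static void map(String value) {
    }

    /** Imitates a map task over one split and returns how many times map() was called. */
    static int runMapTask(List<String> splitRecords) {
        ListRecordReader reader = new ListRecordReader(splitRecords);
        int mapCalls = 0;
        while (reader.nextKeyValue()) {
            map(reader.getCurrentValue());
            mapCalls++;
        }
        return mapCalls;
    }

    public static void main(String[] args) {
        // Two splits, as in the question, holding 3 and 2 records.
        List<List<String>> splits = Arrays.asList(
                Arrays.asList("a", "b", "c"),
                Arrays.asList("d", "e"));
        int total = 0;
        for (List<String> split : splits) {
            int calls = runMapTask(split);
            System.out.println("split with " + split.size() + " records: map() called " + calls + " times");
            total += calls;
        }
        System.out.println("total map() calls: " + total);
    }
}
```

With two splits of 3 and 2 records, map() runs 5 times in total, not 2. If each split should instead reach map() as a single call, one option is a custom RecordReader whose nextKeyValue() returns true exactly once per split and whose current value carries the whole list.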

