public class MultiOutputFormat
extends OutputFormat<Writable,Writable>
Multiple output formats can be defined, each with its own OutputFormat class, key class, and value class. Each output format can be configured without interfering with the configuration of the other output formats.
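The isolation described above can be modelled with plain Java maps. The sketch below is hypothetical: `AliasConfigurer`, `addOutput`, `confFor`, and the property keys are illustrative names, not part of MultiOutputFormat, and the real class works with Hadoop `Job` objects rather than maps. It only illustrates the idea that each alias receives its own copy of the base settings.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical model of per-alias configuration: each alias gets its own
// copy of the base settings, so edits to one alias never leak into another.
// AliasConfigurer, addOutput and confFor are illustrative names only.
final class AliasConfigurer {
    private final Map<String, String> base;
    private final Map<String, Map<String, String>> perAlias = new LinkedHashMap<>();

    AliasConfigurer(Map<String, String> base) {
        this.base = base;
    }

    // Registers an alias and snapshots the base settings for it.
    // "output.format.class" is a made-up key for this sketch.
    Map<String, String> addOutput(String alias, String formatClass) {
        Map<String, String> conf = new HashMap<>(base);
        conf.put("output.format.class", formatClass);
        perAlias.put(alias, conf);
        return conf;
    }

    // Returns the alias's private, mutable settings (loosely analogous to
    // configuring the Job returned by configurer.getJob(alias)).
    Map<String, String> confFor(String alias) {
        Map<String, String> conf = perAlias.get(alias);
        if (conf == null) {
            throw new IllegalArgumentException("Unknown alias: " + alias);
        }
        return conf;
    }
}
```

Snapshotting the base settings when an alias is defined keeps later per-alias edits private; whether the real JobConfigurer copies its configuration in exactly this way is not stated on this page.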
Usage pattern for job submission:
```java
Job job = new Job();
FileInputFormat.setInputPaths(job, inDir);
job.setMapperClass(WordCountMap.class);
job.setReducerClass(WordCountReduce.class);
job.setInputFormatClass(TextInputFormat.class);
job.setOutputFormatClass(MultiOutputFormat.class);
// OutputKeyClass and OutputValueClass need not be set; they default to
// Writable.class.
job.setMapOutputKeyClass(Text.class);
job.setMapOutputValueClass(IntWritable.class);

// Create a JobConfigurer that will configure the job with the multiple
// output format information.
MultiOutputFormat.JobConfigurer configurer = MultiOutputFormat.createConfigurer(job);

// Define an additional text-based output named 'text' for the job.
// Any configuration for this OutputFormat must be done on the Job
// returned by configurer.getJob("text"), not on the main job.
configurer.addOutputFormat("text", TextOutputFormat.class,
    IntWritable.class, Text.class);
FileOutputFormat.setOutputPath(configurer.getJob("text"), textOutDir);

// Define an additional sequence-file-based output named 'sequence'.
configurer.addOutputFormat("sequence", SequenceFileOutputFormat.class,
    Text.class, IntWritable.class);
FileOutputFormat.setOutputPath(configurer.getJob("sequence"), seqOutDir);
...
// Call configure() on the JobConfigurer once all the output formats
// have been defined and configured.
configurer.configure();

job.waitForCompletion(true);
...
```
Usage in Reducer:
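```java
import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.mapreduce.Reducer;
// MultiOutputFormat must also be imported.

public class WordCountReduce extends Reducer<Text, IntWritable, Writable, Writable> {

    private IntWritable count = new IntWritable();

    // Reducer.reduce takes an Iterable (not an Iterator) and may throw
    // InterruptedException, which MultiOutputFormat.write also declares.
    @Override
    public void reduce(Text word, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable val : values) {
            sum += val.get();
        }
        count.set(sum);
        // Route the same result to both outputs, with key/value swapped.
        MultiOutputFormat.write("text", count, word, context);
        MultiOutputFormat.write("sequence", word, count, context);
    }
}
```
Map only jobs: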
In map-only jobs, MultiOutputFormat.write("output", key, value, context) can be called from the Mapper in the same way as from a Reducer.
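Conceptually, `write(alias, key, value, context)` picks the output defined for `alias` and hands the record to it. A minimal sketch of that routing, assuming in-memory sinks in place of Hadoop record writers; `AliasRouter`, `Sink`, and `ListSink` are hypothetical, and rejecting an unknown alias is an assumption of this sketch rather than documented behaviour.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of alias-based routing, modelled loosely on
// MultiOutputFormat.write(alias, key, value, context). Each alias maps to
// its own sink; here a sink simply records "key<TAB>value" lines.
final class AliasRouter {
    // Minimal stand-in for a per-alias record writer (illustrative only).
    interface Sink {
        void write(Object key, Object value);
    }

    private final Map<String, Sink> sinks = new HashMap<>();

    void register(String alias, Sink sink) {
        sinks.put(alias, sink);
    }

    // Looks up the sink for the alias and forwards the record to it.
    <K, V> void write(String alias, K key, V value) {
        Sink sink = sinks.get(alias);
        if (sink == null) {
            // Assumption: undefined aliases are rejected.
            throw new IllegalArgumentException("No output defined for alias: " + alias);
        }
        sink.write(key, value);
    }
}

// In-memory sink that keeps records in arrival order.
final class ListSink implements AliasRouter.Sink {
    final List<String> lines = new ArrayList<>();

    @Override
    public void write(Object key, Object value) {
        lines.add(key + "\t" + value);
    }
}
```

Mirroring the reducer above, `router.write("text", 3, "hadoop")` followed by `router.write("sequence", "hadoop", 3)` sends the same count to two outputs with key and value swapped.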
Modifier and Type | Class and Description |
---|---|
static class | MultiOutputFormat.JobConfigurer: Class that supports configuration of the job for multiple output formats. |
class | MultiOutputFormat.MultiOutputCommitter |
Constructor and Description |
---|
MultiOutputFormat() |
Modifier and Type | Method and Description |
---|---|
void | checkOutputSpecs(JobContext context) |
static MultiOutputFormat.JobConfigurer | createConfigurer(Job job): Get a JobConfigurer instance that will support configuration of the job for multiple output formats. |
static JobContext | getJobContext(java.lang.String alias, JobContext context): Get the JobContext with the related OutputFormat configuration populated, given the alias and the actual JobContext. |
OutputCommitter | getOutputCommitter(TaskAttemptContext context) |
RecordWriter<Writable,Writable> | getRecordWriter(TaskAttemptContext context) |
static TaskAttemptContext | getTaskAttemptContext(java.lang.String alias, TaskAttemptContext context): Get the TaskAttemptContext with the related OutputFormat configuration populated, given the alias and the actual TaskAttemptContext. |
static <K,V> void | write(java.lang.String alias, K key, V value, TaskInputOutputContext context): Write the output key and value using the OutputFormat defined by the alias. |
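`getJobContext` and `getTaskAttemptContext` return a context in which the configuration for one alias is populated. One way to picture this is a purely hypothetical storage scheme in which per-alias overrides live in the shared configuration under a `multiout.<alias>.` prefix; `AliasContexts`, `viewFor`, and the prefix are invented for illustration and do not describe how MultiOutputFormat actually stores its settings.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of alias-scoped contexts: the shared configuration
// carries per-alias overrides under the made-up prefix "multiout.<alias>.",
// and viewFor returns a copy with that alias's overrides applied.
final class AliasContexts {
    private AliasContexts() {}

    static String prefix(String alias) {
        return "multiout." + alias + ".";
    }

    // Builds an alias-specific view without mutating the shared settings.
    static Map<String, String> viewFor(String alias, Map<String, String> shared) {
        String prefix = prefix(alias);
        // Start from the shared settings so unrelated keys stay visible.
        Map<String, String> view = new HashMap<>(shared);
        for (Map.Entry<String, String> e : shared.entrySet()) {
            if (e.getKey().startsWith(prefix)) {
                // Overlay the alias-specific value under its plain key name.
                view.put(e.getKey().substring(prefix.length()), e.getValue());
            }
        }
        return view;
    }
}
```

Returning a copy rather than editing the shared map mirrors the documented intent that one output's configuration does not interfere with another's.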
public static MultiOutputFormat.JobConfigurer createConfigurer(Job job)
Parameters:
job - the mapreduce job to be submitted

public static JobContext getJobContext(java.lang.String alias, JobContext context)
Parameters:
alias - the name given to the OutputFormat configuration
context - the JobContext

public static TaskAttemptContext getTaskAttemptContext(java.lang.String alias, TaskAttemptContext context)
Parameters:
alias - the name given to the OutputFormat configuration
context - the Mapper or Reducer Context

public static <K,V> void write(java.lang.String alias, K key, V value, TaskInputOutputContext context) throws java.io.IOException, java.lang.InterruptedException
Parameters:
alias - the name given to the OutputFormat configuration
key - the output key to be written
value - the output value to be written
context - the Mapper or Reducer Context
Throws:
java.io.IOException
java.lang.InterruptedException

public void checkOutputSpecs(JobContext context) throws java.io.IOException, java.lang.InterruptedException
Throws:
java.io.IOException
java.lang.InterruptedException

public RecordWriter<Writable,Writable> getRecordWriter(TaskAttemptContext context) throws java.io.IOException, java.lang.InterruptedException
Throws:
java.io.IOException
java.lang.InterruptedException

public OutputCommitter getOutputCommitter(TaskAttemptContext context) throws java.io.IOException, java.lang.InterruptedException
Throws:
java.io.IOException
java.lang.InterruptedException
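This page does not describe `MultiOutputFormat.MultiOutputCommitter`; its name suggests a committer that fans each call out to the committers of the individual outputs. The sketch below models that assumption only. `Committer` is a stand-in interface (not Hadoop's `OutputCommitter`) and `CompositeCommitter` is hypothetical.

```java
import java.io.IOException;
import java.util.LinkedHashMap;
import java.util.Map;

// Stand-in lifecycle interface for this sketch; NOT Hadoop's OutputCommitter.
interface Committer {
    void commitJob() throws IOException;
    void abortJob() throws IOException;
}

// Hypothetical composite committer, assuming (not confirmed here) that a
// multi-output committer delegates each call to every alias's committer.
final class CompositeCommitter implements Committer {
    private final Map<String, Committer> byAlias = new LinkedHashMap<>();

    void add(String alias, Committer committer) {
        byAlias.put(alias, committer);
    }

    // Commits every alias in registration order; the first failure stops
    // the loop and propagates to the caller.
    @Override
    public void commitJob() throws IOException {
        for (Committer c : byAlias.values()) {
            c.commitJob();
        }
    }

    // Attempts to abort every alias even if an earlier abort fails; the
    // first error is rethrown afterwards with later ones attached as suppressed.
    @Override
    public void abortJob() throws IOException {
        IOException first = null;
        for (Committer c : byAlias.values()) {
            try {
                c.abortJob();
            } catch (IOException e) {
                if (first == null) {
                    first = e;
                } else {
                    first.addSuppressed(e);
                }
            }
        }
        if (first != null) {
            throw first;
        }
    }
}
```

Attempting every abort before rethrowing is a design choice of this sketch, so that one failing output does not leave the others without cleanup.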
Copyright © 2012 The Apache Software Foundation