Why does "set" contain only one element? For example, the first 5 lines of input should produce 4 elements, since those lines share the same URL and have four different IPs. I also tried a for-each loop instead of the iterator, but that did not work either. Can anyone help?

Mapper

    public class WordCount {

        public static class TokenizerMapper extends Mapper<Object, Text, Text, Text> {
            private Text IP = new Text();
            private Text word = new Text();

            public void map(Object key, Text value, Context context)
                    throws IOException, InterruptedException {
                String line = value.toString();
                String[] tokens = line.split(",");
                word.set(tokens[2]);
                IP.set(tokens[0]);
                context.write(word, IP);
            }
        }

Reducer

        public static class IntSumReducer extends Reducer<Text, Text, Text, Text> {
            public void reduce(Text key, Iterable<Text> values, Context context)
                    throws IOException, InterruptedException {
                Set<String> set = new HashSet<String>();
                Iterator<Text> iterator = values.iterator();
                while (iterator.hasNext()) {
                    set.add(iterator.next().toString());
                }
                int a = set.size();
                String str = String.format("%d", a);
                context.write(key, new Text(str));
            }
        }

Job

        public static void main(String[] args) throws Exception {
            Job job = new Job();
            job.setJarByClass(WordCount.class);
            job.setMapperClass(TokenizerMapper.class);
            job.setCombinerClass(IntSumReducer.class);
            job.setReducerClass(IntSumReducer.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(Text.class);
            FileInputFormat.addInputPath(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }
1 Answer

森林海
The Reducer works fine, but the Combiner is not doing what you think it is. With the combiner enabled, this is what happens:
Mapper output:
("GET / HTTP/1.1", "10.31.0.1") ("GET / HTTP/1.1", "10.31.0.2")
Combiner input:
("GET / HTTP/1.1", {"10.31.0.1", "10.31.0.2"})
Combiner output:
("GET / HTTP/1.1", "2") //You have the right answer here...
Reducer input:
("GET / HTTP/1.1", {"2"}) //...but then it gets passed into the Reducer again
Reducer output:
("GET / HTTP/1.1", "1")
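Only one element reaches the Reducer (the string "2"), so the set has size 1 and the output is "1".
Remove the combiner (delete the line job.setCombinerClass(IntSumReducer.class);) and this will work.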
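To make the trace above concrete, here is a minimal plain-Java sketch that replays it without any Hadoop classes. The class name CombinerDemo and the helpers countDistinct and dedupe are made up for illustration; countDistinct mirrors what IntSumReducer does to one key's values. The dedupe variant is not part of the original job: it is one possible combiner that stays safe because it emits IPs rather than a count. The sketch also assumes the combiner runs exactly once over all map output, whereas real Hadoop may run it zero or more times, per map task.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;

// Plain-Java stand-in for the job above; no Hadoop classes are involved.
final class CombinerDemo {

    // Same logic as IntSumReducer: emit the number of distinct values as a string.
    static List<String> countDistinct(List<String> values) {
        return Collections.singletonList(String.valueOf(new HashSet<>(values).size()));
    }

    // Hypothetical alternative combiner: emit the distinct values themselves,
    // so its output has the same meaning as the mapper output.
    static List<String> dedupe(List<String> values) {
        return new ArrayList<>(new LinkedHashSet<>(values));
    }

    public static void main(String[] args) {
        // Values the mapper emitted for the key "GET / HTTP/1.1".
        List<String> mapperOutput = Arrays.asList("10.31.0.1", "10.31.0.2");

        // No combiner: the reducer sees the IPs directly.
        System.out.println("reducer only:      " + countDistinct(mapperOutput));

        // IntSumReducer as combiner: the reducer only sees the string "2".
        System.out.println("counting combiner: " + countDistinct(countDistinct(mapperOutput)));

        // Dedupe-only combiner: the reducer still sees IPs.
        System.out.println("dedupe combiner:   " + countDistinct(dedupe(mapperOutput)));
    }
}
```

The first and third lines print [2], the second prints [1]. The general rule this illustrates is that a combiner's output has to be something the reducer can treat exactly like mapper output; a distinct count does not have that property, while a deduplicated list of IPs does.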
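Other suggested changes:
Have the Reducer output an IntWritable instead of converting the number to Text.
Use a Set<Text> instead of a Set<String>, to save the expensive Text -> String conversion. If you do this, add a copy of each value (new Text(value)), because Hadoop reuses the same Text instance while iterating over the values.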