
Developing MapReduce with IDEA on Windows

30 June ’16

Set up Hadoop on a virtual machine and start it (pseudo-distributed mode is used here).

hdfs-site.xml:

```
<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
    <property>
        <name>dfs.name.dir</name>
        <value>/usr/local/hadoop/hdfs/name</value>
    </property>
    <property>
        <name>dfs.data.dir</name>
        <value>/usr/local/hadoop/hdfs/data</value>
    </property>
    <property>
        <name>dfs.permissions</name>
        <value>false</value>
    </property>
</configuration>
```

core-site.xml:

```
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://192.168.169.148:9000</value>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/usr/local/hadoop/tmp</value>
    </property>
</configuration>
```

mapred-site.xml:

```
[root@bogon hadoop-2.7.2]# cat etc/hadoop/mapred-site.xml
<configuration>
    <property>
        <name>mapred.job.tracker</name>
        <value>192.168.169.148:9001</value>
    </property>
</configuration>
```

Start Hadoop:

```
[root@bogon hadoop-2.7.2]# ./sbin/start-dfs.sh
```

Check whether Hadoop started successfully:

```
[root@bogon hadoop-2.7.2]# jps
3680 DataNode
3571 NameNode
4371 HRegionServer
3880 SecondaryNameNode
6479 Jps
```
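
One step not shown in the session above: on a brand-new install the NameNode normally has to be formatted once before the first `start-dfs.sh` (only on a fresh setup; reformatting an existing cluster destroys its metadata):

```
[root@bogon hadoop-2.7.2]# ./bin/hdfs namenode -format
```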

Prepare the test files:

```
[root@bogon files]# pwd
/root/study/files
[root@bogon files]# ll
total 12
-rw-r--r--. 1 root root 31 Jun 29 18:54 a.txt
-rw-r--r--. 1 root root 27 Jun 29 18:55 b.txt
-rw-r--r--. 1 root root 37 Jun 29 18:55 c.txt
[root@bogon files]#
```

Put some arbitrary words or sentences into a.txt, b.txt, and c.txt.
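
For example, the files could be filled like this (just a sketch; the sample sentences here are made up and are not the exact contents behind the results later in this post):

```
[root@bogon files]# echo "hello you are nice" > a.txt
[root@bogon files]# echo "i like my book" > b.txt
[root@bogon files]# echo "you catch me ok" > c.txt
```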

Upload the files into HDFS:

```
[root@bogon hadoop-2.7.2]# ./bin/hdfs dfs -mkdir -p /wordcount/input
root@ubuntu:~/study/hadoop/hadoop-2.7.2# ./bin/hadoop fs -put ~/study/files/* /wordcount/input
```

After uploading, check that it worked:

```
[root@bogon hadoop-2.7.2]# ./bin/hdfs dfs -ls -R /wordcount
drwxr-xr-x   - root supergroup          0 2016-06-30 00:44 /wordcount/input
-rw-r--r--   1 root supergroup         31 2016-06-30 00:44 /wordcount/input/a.txt
-rw-r--r--   1 root supergroup         27 2016-06-30 00:44 /wordcount/input/b.txt
-rw-r--r--   1 root supergroup         37 2016-06-30 00:44 /wordcount/input/c.txt
```
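
As an optional extra check, the uploaded contents can be printed directly:

```
[root@bogon hadoop-2.7.2]# ./bin/hdfs dfs -cat /wordcount/input/a.txt
```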

Write the MapReduce program in IDEA:

```
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

import java.io.IOException;
import java.util.StringTokenizer;


/**
 * Created by Administrator on 2016/5/31 0031.
 */
public class WordCount  {

    // Mapper: split each line into tokens and emit (word, 1) for every token.
    public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        @Override
        public void map(Object key, Text value, Context context) throws IOException, InterruptedException {
            StringTokenizer stringTokenizer = new StringTokenizer(value.toString());
            while (stringTokenizer.hasMoreTokens()) {
                word.set(stringTokenizer.nextToken());
                context.write(word, one);
            }
        }
    }

    // Reducer (also used as the combiner): sum the counts for each word.
    public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        private IntWritable result = new IntWritable();

        @Override
        public void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    public static void main(String[] args) {
        Configuration conf = new Configuration();
        // Point the client at the HDFS / JobTracker running on the VM,
        // and at the jar that IDEA builds locally so it can be shipped with the job.
        conf.set("dfs.permissions","false");
        conf.set("fs.default.name", "hdfs://192.168.169.148:9000");
        conf.set("hadoop.job.user", "root");
        conf.set("mapred.job.tracker", "192.168.169.148:9001");
        conf.set("mapred.jar","G:\\study\\workspace\\hadooptest\\output\\hadoopTest.jar");
        try {
            Job job = Job.getInstance(conf, "word count");
            job.setJarByClass(WordCount.class);
            job.setMapperClass(TokenizerMapper.class);
            job.setCombinerClass(IntSumReducer.class);
            job.setReducerClass(IntSumReducer.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);
//        FileInputFormat.addInputPath(job, new Path(args[0]));
//        FileOutputFormat.setOutputPath(job, new Path(args[1]));
            FileInputFormat.addInputPath(job, new Path("hdfs://192.168.169.148:9000/wordcount/input"));
            FileOutputFormat.setOutputPath(job, new Path("hdfs://192.168.169.148:9000/wordcount/output"));
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        } catch (IOException e) {
            e.printStackTrace();
        } catch (InterruptedException e) {
            e.printStackTrace();
        } catch (ClassNotFoundException e) {
            e.printStackTrace();
        }
    }
}
```

pom.xml:

```
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <groupId>my</groupId>
    <artifactId>hadoopTest</artifactId>
    <version>1.0-SNAPSHOT</version>
    <packaging>jar</packaging>

    <dependencies>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-hdfs</artifactId>
            <version>2.7.2</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-mapreduce-client-core</artifactId>
            <version>2.7.2</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-common</artifactId>
            <version>2.7.2</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-mapreduce-client-common</artifactId>
            <version>2.7.2</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-yarn-common</artifactId>
            <version>2.7.2</version>
        </dependency>
    </dependencies>
</project>
```
Open the Project Structure dialog (Ctrl+Alt+Shift+S), go to Artifacts, and tick "Build on make".

The jar configured here is exactly the jar produced by that build:
     conf.set("mapred.jar","G:\\study\\workspace\\hadooptest\\output\\hadoopTest.jar");


Run the WordCount class. The code and configuration shown above have already been corrected, so the problems below may no longer appear; before those fixes, there were plenty of them:

Problem 1:

     Could not connect to 192.168.169.146:9000.

     Fix: replace every "localhost" in the Hadoop configuration files with the machine's IP address.
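
A quick way to spot any remaining localhost entries in the configuration on the VM (just a convenience check):

```
[root@bogon hadoop-2.7.2]# grep -n "localhost" etc/hadoop/core-site.xml etc/hadoop/hdfs-site.xml etc/hadoop/mapred-site.xml
```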

Problem 2:

     Server IPC version 9 cannot communicate with client version 4

     Fix: the Hadoop version in use is 2.7.2, so remove this dependency from pom.xml:

```
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-core</artifactId>
    <version>1.2.1</version>
</dependency>
```
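
To confirm that no Hadoop 1.x client remains on the classpath afterwards, the Maven dependency tree can be inspected (an optional check):

```
mvn dependency:tree
```

Look for any remaining org.apache.hadoop:hadoop-core entry in its output.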


Problem 3:

     Cannot initialize Cluster. Please check your configuration for mapreduce.framework.name and the corr…

     Add the dependency:

```
<!-- https://mvnrepository.com/artifact/org.apache.hadoop/hadoop-mapreduce-client-common -->
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-mapreduce-client-common</artifactId>
    <version>2.7.2</version>
</dependency>
```

Problem 4:

     Running job.waitForCompletion(true) throws a NullPointerException.

     Fix: the Hadoop 2.x distribution lacks winutils.exe on Windows, so download one, set the HADOOP_HOME environment variable, update the Path variable, and also put hadoop.dll into the System32 folder.

     Steps:
     1. Download winutils.exe and hadoop.dll matching your Hadoop version.
     2. Replace the files in the bin directory of the Hadoop folder on Windows.
     3. Copy hadoop.dll to C:\Windows\System32.
     4. Set the HADOOP_HOME environment variable and add %HADOOP_HOME%\bin to Path (see the sketch after this list).
     5. Re-run WordCount.java; if it still fails, restart IDEA.
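
A minimal sketch of step 4 from a Windows console; the install path D:\hadoop-2.7.2 is only a placeholder for wherever the Windows copy of Hadoop lives, and the same values can just as well be set through the Environment Variables dialog:

```
rem Values set with setx are picked up by newly started consoles / IDEA, not the current one.
setx HADOOP_HOME "D:\hadoop-2.7.2"
rem Then append %HADOOP_HOME%\bin to the Path variable (e.g. via the Environment Variables dialog).
```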


The run succeeds:

 Process finished with exit code 0

Back on the Linux VM (192.168.169.148), check HDFS; the output directory has been created:

```
[root@bogon hadoop-2.7.2]# ./bin/hdfs dfs -ls -R /
drwxr-xr-x   - root          supergroup          0 2016-06-29 19:26 /hh
drwxr-xr-x   - root          supergroup          0 2016-06-29 19:26 /hh/.tmp
drwxr-xr-x   - root          supergroup          0 2016-06-29 19:26 /hh/.tmp/data
…
-rw-r--r--   3 root          supergroup          7 2016-06-29 19:26 /hh/hbase.version
drwxr-xr-x   - root          supergroup          0 2016-06-29 19:37 /hh/oldWALs
drwxr-xr-x   - root          supergroup          0 2016-06-29 23:57 /wordcount
drwxr-xr-x   - root          supergroup          0 2016-06-29 19:25 /wordcount/input
-rw-r--r--   1 root          supergroup         31 2016-06-29 19:25 /wordcount/input/a.txt
-rw-r--r--   1 root          supergroup         27 2016-06-29 19:25 /wordcount/input/b.txt
-rw-r--r--   1 root          supergroup         37 2016-06-29 19:25 /wordcount/input/c.txt
drwxr-xr-x   - Administrator supergroup          0 2016-06-29 23:57 /wordcount/output
-rw-r--r--   3 Administrator supergroup          0 2016-06-29 23:57 /wordcount/output/_SUCCESS
-rw-r--r--   3 Administrator supergroup        116 2016-06-29 23:57 /wordcount/output/part-r-00000
```

View the output:

```
[root@bogon hadoop-2.7.2]# ./bin/hdfs dfs -cat /wordcount/output/part-r-00000
are    1
book    1
catch    1
dfsa    1
good    1
group    1
hello    1
i    2
like    1
man    1
me    1
my    1
nice    1
ok    2
root    1
super    1
wo    1
you    3
```

To run the job again, either point it at a new output directory or delete the existing one; if the output directory already exists, the job fails with a "directory already exists" error.
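
For example, the old output directory can be removed before re-running:

```
[root@bogon hadoop-2.7.2]# ./bin/hdfs dfs -rm -r /wordcount/output
```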