Friday, July 29, 2016

How to load data in Hadoop?

Hadoop is known for handling big data, and I have been trying to set up a connection between Pentaho and Hadoop. So first of all I needed to install and set up Hadoop (I did this on Red Hat Linux), and it then had to contain a table that could hold some data. In order to load that data I used Hive to migrate data into Hadoop tables.

We used Hive to create the tables and load the data. I am using some pre-downloaded files found after a lot of searching on Google; all of them have data in tab-separated form.


  1. Create the tables from Hive
  2. Go to the Hadoop home directory, from where we copy the files from the root directory into the target HDFS directory
  3. Open the Hive shell
  4. Load the data into the tables using the LOAD DATA command


create table users (id STRING, birth_date STRING, gender STRING) ROW FORMAT DELIMITED FIELDS TERMINATED by '\t' stored as textfile tblproperties ("skip.header.line.count"="1");

create table products (url STRING, category STRING) ROW FORMAT DELIMITED FIELDS TERMINATED by '\t' stored as textfile tblproperties ("skip.header.line.count"="1");

hadoop fs -put /urlmap.tsv /user/hadoop/products.tsv
hadoop fs -put /regusers.tsv /user/hadoop/users.tsv

hive>
LOAD DATA INPATH '/user/hadoop/products.tsv' OVERWRITE INTO TABLE products;
LOAD DATA INPATH '/user/hadoop/users.tsv' OVERWRITE INTO TABLE users;
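
Once the load completes, a quick sanity check from the Hive shell confirms the rows actually made it into the tables. A minimal sketch (the counts you see depend entirely on your input files):

hive> SELECT COUNT(*) FROM users;
hive> SELECT * FROM products LIMIT 5;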


Wednesday, July 27, 2016

How to clear the Run history (saved password of a shared folder)


We generally use the Windows Run dialog to work quickly or open different kinds of windows with short commands (computer shorthand). We also rely on Run to open network or shared folders when the machine is available on the network.

After long use of the Run command, it saves every command as history and offers it to you the next time. If you do not want to see that history, you need to perform the steps below periodically.

Here are the steps to delete a specific IP address entry from the Run command history.

Follow the steps below to resolve the problem (a full command sequence is sketched after the list):
  1. Open cmd
  2. Execute the command "net use"
  3. From the Remote column, copy the path of the server whose cached entry you want to delete
  4. Execute the command below
    • net use \\192.168.1.1\folder /delete (replace the path with the string you copied in step 3)
  5. Re-execute "net use" and confirm that the entry has been removed
  6. Now go to Task Manager
    • kill explorer.exe and start it again.
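
Putting the steps together, the whole cleanup can be run from a single cmd session. This is only a sketch; \\192.168.1.1\folder is a placeholder for whatever path shows up in your own "net use" output:

net use
net use \\192.168.1.1\folder /delete
net use
taskkill /f /im explorer.exe
start explorer.exe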

Tuesday, July 19, 2016

SecondaryNameNode Inconsistent checkpoint fields

While setting up Hadoop on a Linux server, I kept hitting the error below in the log files. It took me a long time to resolve, so here are the steps to clean it up and restart your Hadoop server.

You can find a similar error in the log files:

2016-07-19 12:43:28,702 ERROR org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode: Exception in doCheckpoint
java.io.IOException: Inconsistent checkpoint fields.
LV = -60 namespaceID = 1655575768 cTime = 0 ; clusterId = CID-90fb1076-4e1e-4ab6-bb76-37177e31ad64 ; blockpoolId = BP-532229130-127.0.0.1-1468908471777.
Expecting respectively: -60; 724011492; 0; CID-f9fc0705-4a77-49d1-9220-874fd5f30efe; BP-78957080-127.0.0.1-1456924944705.
at org.apache.hadoop.hdfs.server.namenode.CheckpointSignature.validateStorageInfo(CheckpointSignature.java:134)
at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.doCheckpoint(SecondaryNameNode.java:531)
at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.doWork(SecondaryNameNode.java:395)
at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode$1.run(SecondaryNameNode.java:361)
at org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:412)
at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.run(SecondaryNameNode.java:357)
at java.lang.Thread.run(Thread.java:745)


SOLUTION

  1. Stop all Hadoop services (stop-all.sh)
  2. Open the log file hadoop-root-secondarynamenode-localhost***.log
  3. Look for the path of the secondary namenode directory
  4. Delete the secondary namenode directory
  5. Start the Hadoop services again (a command sketch follows the list)
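
A rough shell sketch of those steps. The checkpoint path below is only the usual default (hadoop.tmp.dir under /tmp); use whatever directory your log file and dfs.namenode.checkpoint.dir actually point to, and make sure the Hadoop sbin scripts are on your PATH:

stop-all.sh
# confirm the path before deleting - this directory is only an example
rm -rf /tmp/hadoop-root/dfs/namesecondary
start-all.sh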

Hadoop datanode/namenode error

java.io.IOException: Incompatible clusterIDs in /home/hadoop/hadoopdata/hdfs/datanode: namenode clusterID =  datanode clusterID = 
at org.apache.hadoop.hdfs.server.datanode.DataStorage.doTransition(DataStorage.java:646)
at org.apache.hadoop.hdfs.server.datanode.DataStorage.addStorageLocations(DataStorage.java:320)
at org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:403)
at org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:422)
at org.apache.hadoop.hdfs.server.datanode.DataNode.initStorage(DataNode.java:1311)
at org.apache.hadoop.hdfs.server.datanode.DataNode.initBlockPool(DataNode.java:1276)
at org.apache.hadoop.hdfs.server.datanode.BPOfferService.verifyAndSetNamespaceInfo(BPOfferService.java:314)
at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:220)
at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:828)
at java.lang.Thread.run(Thread.java:745);


SOLUTION

  1. Check your hdfs-site.xml file to see where dfs.data.dir is pointing
  2. Delete that folder
  3. Stop and start the datanode (a command sketch follows the list)
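
A minimal sketch of those steps, assuming the data directory from hdfs-site.xml is the one shown in the stack trace above and that hadoop-daemon.sh is on your PATH:

hadoop-daemon.sh stop datanode
# double-check the path against dfs.data.dir in hdfs-site.xml before deleting
rm -rf /home/hadoop/hadoopdata/hdfs/datanode
hadoop-daemon.sh start datanode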


Check all your log files for any errors!

There is a chance you will then hit the error below:

org.apache.hadoop.hdfs.server.common.InconsistentFSStateException: Directory /path/to/hadoop/storage/namenode is in an inconsistent state: storage directory does not exist or is not accessible.
at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverStorageDirs(FSImage.java:313)
at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:202)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:1020)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:739)
at org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:536)
at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:595)
at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:762)
at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:746)
at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1438)
at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1504)
2016-07-19 11:33:16,737 INFO org.mortbay.log: Stopped HttpServer2$SelectChannelConnectorWithSafeStartup@0.0.0.0:50070
2016-07-19 11:33:16,838 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Stopping NameNode metrics system...
2016-07-19 11:33:16,839 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: NameNode metrics system stopped.
2016-07-19 11:33:16,839 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: NameNode metrics system shutdown complete.
2016-07-19 11:33:16,839 FATAL org.apache.hadoop.hdfs.server.namenode.NameNode: Failed to start namenode.

PROBLEM
This happens because, in the steps above, you deleted both the namenode and datanode folders.

SOLUTION
You need to format the namenode.

Steps to follow (a command sketch is shown after the list):

  1. Stop the datanode
  2. Delete the folder
  3. Execute "hdfs namenode -format"
  4. Start the datanode
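
A rough sketch of that recovery sequence, plus restarting the namenode that failed to come up. Note that "hdfs namenode -format" wipes the HDFS metadata, so only do this on a cluster whose data you can afford to lose; the data directory path below is just an example taken from the earlier error:

hadoop-daemon.sh stop datanode
# use the directory from your own hdfs-site.xml here
rm -rf /home/hadoop/hadoopdata/hdfs/datanode
hdfs namenode -format
hadoop-daemon.sh start namenode
hadoop-daemon.sh start datanode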

Thursday, July 14, 2016

Unable to instantiate org.apache.hadoop.hive.metastore.HiveMetaStoreClient

ERROR
hive>show databases;
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.metastore.HiveMetaStoreClient

SOLUTION
chown -R hdfs:hadoop *


If the above solution doesn't work, try removing the *.lck files using the command below:

[root@localhost sbin]# rm metastore_db/*.lck
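
After clearing the lock files, reopening the Hive shell and re-running the statement that failed is a quick way to confirm the metastore client now comes up:

[root@localhost sbin]# hive
hive> show databases;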

RuntimeException org.apache.hadoop.security.AccessControlException

ERROR
RuntimeException org.apache.hadoop.security.AccessControlException: Permission denied: user=sa, access=WRITE, inode="/tmp/hive-root":root:supergroup:drwxr-xr-x

SOLUTION
The problem was solved by changing the dfs.permissions setting. Locate the file hdfs-site.xml and, using an editor such as vim, change the property value to false:


 <property>
  <name>dfs.permissions</name>
  <value>false</value>
</property>
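
The new value only takes effect after HDFS is restarted. Assuming the standard sbin scripts are on your PATH, something like:

stop-dfs.sh
start-dfs.sh

Keep in mind that setting dfs.permissions to false turns off HDFS permission checking altogether, which is fine for a sandbox but not something you would want on a shared cluster.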



Wednesday, July 6, 2016

The system cannot find the batch label specified - nodemanager

This usually means you are trying to start YARN (or another Hadoop service) on Windows using files downloaded from the internet.
If that is the case, the problem lies in the files you downloaded or replaced in the "bin" and "etc/hadoop" directories.

PROBLEM
Some of the *.cmd files have LF as the line terminator (use Notepad++ to see this).

SOLUTION
Open the file in Notepad++ and convert the line endings as below:

Go to > Edit > EOL Conversion > Windows Format

After doing this you will see that all line terminators in the file have been changed to CRLF.
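
If you would rather fix all the *.cmd files in one go from the command line, unix2dos does the same conversion. A sketch, assuming the unix2dos utility is installed (it is part of the dos2unix package on most systems) and that you run it from the Hadoop installation directory:

unix2dos bin/*.cmd
unix2dos etc/hadoop/*.cmd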