Friday, July 29, 2016

How to load data into Hadoop?

Hadoop is known for handling big data, and I have been trying to set up a connection between Pentaho and Hadoop. First of all I needed to install and set up Hadoop (I did this on Red Hat Linux), and then it needed a table that could hold some data. To load that data, I used Hive to migrate it into Hadoop tables.

I used Hive to create the tables and load the data. The data comes from some pre-downloaded files that I found after a lot of searching on Google. All of the files hold their data in tab-separated form.

  1. Create the tables from Hive.
  2. Go to Hadoop_Home and copy the files from the root directory to the target HDFS directory.
  3. Open the Hive shell.
  4. Use the LOAD command to load the data into the tables.

CREATE TABLE users (id STRING, birth_date STRING, gender STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' STORED AS TEXTFILE TBLPROPERTIES ("skip.header.line.count"="1");

CREATE TABLE products (url STRING, category STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' STORED AS TEXTFILE TBLPROPERTIES ("skip.header.line.count"="1");
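
Before copying the files into HDFS, it may be worth checking that every row really has the number of tab-separated columns the table expects (3 for users, 2 for products), since Hive will not complain about a malformed row and will simply fill the missing columns with NULL. Here is a minimal sketch of such a check. The check_tsv function is a hypothetical helper I made up for illustration; it is not a Hadoop or Hive command.

```shell
# Sketch: verify that every line of a TSV file has the expected number
# of tab-separated fields. check_tsv is a hypothetical helper name.
check_tsv() {
  file="$1"       # path to the local .tsv file
  expected="$2"   # number of columns the Hive table defines
  awk -F '\t' -v n="$expected" '
    # Report each line whose field count differs from the expected one.
    NF != n { printf "line %d: %d fields, expected %d\n", NR, NF, n; bad = 1 }
    # Exit non-zero if at least one bad line was seen.
    END { exit (bad ? 1 : 0) }
  ' "$file"
}
```

With the files from this post it could be called as check_tsv /regusers.tsv 3 and check_tsv /urlmap.tsv 2. The function prints the offending lines and returns a non-zero status if anything looks wrong, so it can also be used in a script before the put step.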

hadoop fs -put /urlmap.tsv /user/hadoop/products.tsv
hadoop fs -put /regusers.tsv /user/hadoop/users.tsv

LOAD DATA INPATH '/user/hadoop/products.tsv' OVERWRITE INTO TABLE products;
LOAD DATA INPATH '/user/hadoop/users.tsv' OVERWRITE INTO TABLE users;
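
To confirm that the load worked, one option is to compare the number of data rows in the local file (all lines minus the header line) with the result of SELECT COUNT(*) in Hive. The sketch below only does the local half of that comparison. The expected_rows function is a hypothetical helper name, and it assumes each file has exactly one header line, matching the skip.header.line.count setting on the tables.

```shell
# Sketch: print how many data rows a TSV file contains, assuming the
# first line is a header. expected_rows is a hypothetical helper name.
expected_rows() {
  # NR is the total number of lines read; subtract the header line.
  # An empty file is reported as 0 rows rather than -1.
  awk 'END { print (NR > 0 ? NR - 1 : 0) }' "$1"
}

# In the Hive shell, the matching check would be:
#   SELECT COUNT(*) FROM users;
#   SELECT COUNT(*) FROM products;
```

If the header setting is honored, the counts from Hive should match expected_rows /regusers.tsv and expected_rows /urlmap.tsv. Note that LOAD DATA INPATH moves the file out of /user/hadoop into the Hive warehouse, so the local copy is the one to count.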

