Hadoop is known for handling big data, and I have been trying to set up a connection between Pentaho and Hadoop. So first of all I needed to install and set up Hadoop (I did this on Red Hat Linux), and then it had to contain a table that could hold some data. To load that data, I used Hive to migrate it into Hadoop tables.
We used Hive to create the tables and load the data. I am using some pre-downloaded files I found after searching a lot on Google; all of the files have their data in tab-separated form.
- Create the tables from Hive
- Move to Hadoop_Home and copy the files from the root directory into the specified HDFS directory
- Open the Hive shell
- Load the data into the tables with the LOAD DATA command
create table users (id STRING, birth_date STRING, gender STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' STORED AS TEXTFILE TBLPROPERTIES ("skip.header.line.count"="1");
create table products (url STRING, category STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' STORED AS TEXTFILE TBLPROPERTIES ("skip.header.line.count"="1");
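To make sure both tables were actually created with the columns defined above, a quick sanity check from the Hive shell could look something like this (table names match the ones used above):

hive> SHOW TABLES;
hive> DESCRIBE users;
hive> DESCRIBE products;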
hadoop fs -put /urlmap.tsv /user/hadoop/products.tsv
hadoop fs -put /regusers.tsv /user/hadoop/users.tsv
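Before loading, it is worth confirming that the files really landed in HDFS; the paths here are the same target paths used in the put commands above:

hadoop fs -ls /user/hadoop/
hadoop fs -cat /user/hadoop/users.tsv | head -5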
hive>
LOAD DATA INPATH '/user/hadoop/products.tsv' OVERWRITE INTO TABLE products;
LOAD DATA INPATH '/user/hadoop/users.tsv' OVERWRITE INTO TABLE users;
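Once the LOAD commands finish, a couple of simple queries confirm the data is queryable (the row counts and values will simply be whatever your TSV files contain):

hive> SELECT COUNT(*) FROM users;
hive> SELECT * FROM products LIMIT 10;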